path: root/cs_CZ/thesaurus/dictionary-to-thesaurus.py
Commit message | Author | Age | Files | Lines
* tdf#128341 use python3 | Mattia Rizzolo | 2019-10-24 | 1 | -7/+7
  Change-Id: Ic8deb039da037bd270c39da03f8697a9ab034ff0
  Signed-off-by: Mattia Rizzolo <mattia@mapreri.org>
  Reviewed-on: https://gerrit.libreoffice.org/81410
  Reviewed-by: Michael Stahl <michael.stahl@cib.de>
  Tested-by: Michael Stahl <michael.stahl@cib.de>
* flake8 fixes to the dictionary-to-thesaurus script | Mattia Rizzolo | 2019-10-24 | 1 | -11/+19
  tested to work with both python3 and 2.7.

  Change-Id: I52fe00e1f33e605010cd99885c1a41396440e49d
  Signed-off-by: Mattia Rizzolo <mattia@mapreri.org>
  Reviewed-on: https://gerrit.libreoffice.org/81411
  Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
  Reviewed-by: Michael Stahl <michael.stahl@cib.de>
  Tested-by: Michael Stahl <michael.stahl@cib.de>
* Czech thesaurus: regenerate from updated source | Stanislav Horacek | 2016-11-07 | 1 | -3/+3
  changed location and authors of the source
  sorting for source was applied -> thesaurus sorting also changed

  Change-Id: I009688bb1aeaac20dbe0884f1b43b523a2a3eb7b
  Reviewed-on: https://gerrit.libreoffice.org/30612
  Reviewed-by: Jan Holesovsky <kendy@collabora.com>
  Reviewed-by: Stanislav Horáček <stanislav.horacek@gmail.com>
  Tested-by: Stanislav Horáček <stanislav.horacek@gmail.com>
* dictionary-to-thesaurus.py: Put the better categorized words to the front. | Jan Holesovsky | 2016-02-26 | 1 | -7/+13
  Change-Id: Ib5c77f185abeeaef5045780766514a813794c8e8
* dictionary-to-thesaurus.py: Only output the same class of word. | Jan Holesovsky | 2016-02-26 | 1 | -5/+27
  When the class of the word is unambiguous, limit the output to that
  class only; this gives more precise and expected results.

  [It is interesting to see the other possibilities too, but fewer,
  more focused choices are probably preferred.]

  Change-Id: I2876fbb4fa02c00fc7e65189812365f77b9a5ed6
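  The rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from dictionary-to-thesaurus.py: the function name `filter_by_class`, the shape of its arguments, and the class labels are all assumptions, and dropping unclassified candidates is a choice made here rather than something the commit message states.

```python
def filter_by_class(word_classes, candidates):
    """Limit candidates to the headword's class when that class is unambiguous.

    word_classes: set of class labels recorded for the headword
                  (hypothetical shape, e.g. {"n"} or {"n", "v"}).
    candidates:   list of (synonym, word_class) tuples; word_class may be None.
    """
    if len(word_classes) != 1:
        # Ambiguous or unknown class: return every candidate unchanged.
        return [synonym for synonym, _ in candidates]

    (only_class,) = word_classes
    # Assumption: candidates without a class are dropped along with
    # candidates of a different class.
    return [synonym for synonym, cls in candidates if cls == only_class]


# Made-up sample data for illustration only.
candidates = [("budova", "n"), ("stavět", "v"), ("dům", "n")]
unambiguous = filter_by_class({"n"}, candidates)
ambiguous = filter_by_class({"n", "v"}, candidates)
```

  With an unambiguous headword only the same-class words survive; with an ambiguous one the full candidate list is kept, which matches the trade-off the commit message describes.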
* dictionary-to-thesaurus.py: Move blacklist to a separate file. | Jan Holesovsky | 2016-02-25 | 1 | -16/+26
  Change-Id: Ie05e0c0ce8b4f9541a5a143ddf9ccf960940a3b7
* dictionary-to-thesaurus.py: Actually use the Czech names. | Jan Holesovsky | 2016-02-25 | 1 | -4/+4
  Change-Id: Ifb47efe7562ca9ccc2324d4ebd966506cae2bec6
* dictionary-to-thesaurus.py: Various cleanups. | Jan Holesovsky | 2016-02-25 | 1 | -12/+66
  * word classification (when available)
  * word blacklist
  * ignore some non-translations (e.g. irregular verbs)
  * ignore vulgarisms (when marked), they only add confusion
* Czech: Script and dictionary to generate the Czech thesaurus. | Jan Holesovsky | 2016-02-25 | 1 | -0/+104
  slovnik_data_utf8.txt is the English <-> Czech dictionary from
  http://slovnik.zcu.cz/download.php, licensed under GNU Free
  Documentation License 1.1 or later. The data are a snapshot from
  2016-02-24.

  dictionary-to-thesaurus.py is a simple script that generates a
  thesaurus from this dictionary.

  The idea to generate our thesaurus from a dictionary comes from
  Zdenek Zabokrtsky (UFAL, Faculty of Mathematics & Physics, Charles
  University in Prague). The results are far better than I would have
  imagined; I owe Zdenek some beers :-) Many thanks!

  The source data are GNU/FDL 1.1 or later, the resulting thesaurus too.

  The actual addition of the thesaurus to the build system will be done
  in a separate commit later.
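  The commit message does not say how the script derives synonyms. One common way to turn a bilingual dictionary into a thesaurus is to treat Czech words that share an English translation as candidate synonyms, and the sketch below illustrates only that idea. The function name `build_thesaurus`, the `(english, czech)` pair input, and the sample entries are assumptions for illustration; none of them are taken from dictionary-to-thesaurus.py or slovnik_data_utf8.txt.

```python
from collections import defaultdict


def build_thesaurus(pairs):
    """Group Czech words by shared English translation (hypothetical helper).

    pairs: iterable of (english, czech) tuples.
    Returns a dict mapping each Czech word to a sorted list of candidate
    synonyms; words without any candidate are left out.
    """
    # English word -> set of its Czech translations.
    by_english = defaultdict(set)
    for english, czech in pairs:
        by_english[english].add(czech)

    # Czech word -> other Czech words sharing at least one English translation.
    synonyms = defaultdict(set)
    for czech_words in by_english.values():
        for word in czech_words:
            synonyms[word].update(czech_words - {word})

    return {word: sorted(others) for word, others in synonyms.items() if others}


# Made-up sample entries for illustration only.
sample = [
    ("house", "dům"),
    ("house", "budova"),
    ("building", "budova"),
    ("building", "stavba"),
    ("cat", "kočka"),
]
thesaurus = build_thesaurus(sample)
```

  In this sample, "budova" picks up both "dům" and "stavba" because it shares one English translation with each, while "kočka" gets no entry because nothing else translates to "cat".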