Other neat stuff

From Goireasan Akerbeltz

Here are some other neat tools and scripts that might come in handy for people working on l10n tools in under-resourced languages.

Alphabetical words

As in, words in which all letters are in the order of the alphabet (like almost in English, or beefily if you allow double letters). You need a text file with one word on each line, then run the following command:

$ cat FILENAME.txt | while read -r x; do echo "$x" "$(echo "$x" | sed 's/./\n&/g' | sort -u | tr -d "\n")"; done | grep -E '^(.+) \1$'

If you want to allow double letters such as nn, change | sort -u | to | sort |. If there's a lot of output, paste it into a spreadsheet such as LibreOffice Calc or Microsoft Excel and sort it by length.
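If you'd rather stay in the terminal, the length sort can be done there too. The sketch below is one possible approach and not part of the original recipe: sort_by_length is a made-up helper name, and some awk implementations (mawk, for example) count bytes rather than characters, so accented letters may be counted twice.

```shell
# Hypothetical helper: read lines on stdin, print them shortest first.
sort_by_length() {
  # Prefix each line with its length and a tab, sort numerically on that
  # prefix (-s keeps equal-length words in their original order), then
  # cut the prefix off again.
  awk '{ print length($0) "\t" $0 }' | sort -s -n -k1,1 | cut -f2-
}

# Example: in practice you would pipe the output of the word search into it.
printf '%s\n' beefily ago almost | sort_by_length
```

Append | sort_by_length to the end of the command above to get the shortest matches first.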

Clear translations from a po file

If you have a po file with bad translations and can't get a clean pot file, try this:

$ cat old.po | LC_ALL=C sed '1,/^$/!{/^msgstr /,/^$/{/^msgstr /s/.*/msgstr ""/; /^msgstr /!{/./d}}}' | LC_ALL=C sed '${/msgstr\[/s/.*/&\n/}' | LC_ALL=C sed '/^msgstr\[/,/^$/{/./d;/^$/{s/^/msgstr[0] ""\nmsgstr[1] ""\n/}}' | msgattrib --no-obsolete --clear-fuzzy --clear-previous > new.po

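or this:

$ msgen old.po > new.po

Note that msgen keeps any translations already in the file and fills the empty ones with a copy of the msgid, so check that the result is what you want.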
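Another route that may be worth trying is gettext's msgfilter, which pipes each msgstr through a command; a filter that prints nothing leaves every translation empty. This is a sketch rather than the method used in the commands above: clear_translations is a made-up helper name, it assumes GNU gettext is installed, and you should test it on a copy of your file first.

```shell
# Hypothetical helper: clear_translations OLD.po NEW.po
clear_translations() {
  # "sed -e d" reads each msgstr and prints nothing, so every translation
  # becomes empty; --keep-header leaves the header entry untouched.
  # msgattrib then drops obsolete entries, fuzzy flags and previous msgids,
  # reading the filtered catalogue from stdin ("-").
  msgfilter --keep-header -i "$1" sed -e d |
    msgattrib --no-obsolete --clear-fuzzy --clear-previous -o "$2" -
}

# Usage (file names are examples):
#   clear_translations old.po new.po
```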

Converting a .lang file to .pot

If the file is UTF-16 encoded and has DOS-style line endings, try:

$ cat English.lang | iconv -f utf16 -t utf8 | tr -d "\015" | sed 's/["\\]/\\&/g' | sed 's/^\([^=]*\)=\(.*\)$/msgctxt "\1"\nmsgid "\2"\nmsgstr ""\n/' > English.pot
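To see what the two sed steps do, here is the same pipeline wrapped in a small function and run on a single line. It is only an illustration: lang_to_pot is a made-up name, the input is assumed to be UTF-8 already (so the iconv step is left out), and the \n in the replacement needs GNU sed.

```shell
# Hypothetical helper: read key=value lines on stdin, write pot entries.
lang_to_pot() {
  tr -d '\015' |                 # strip DOS carriage returns
    sed 's/["\\]/\\&/g' |        # escape double quotes and backslashes
    sed 's/^\([^=]*\)=\(.*\)$/msgctxt "\1"\nmsgid "\2"\nmsgstr ""\n/'
}

# One sample line with quotes in the value:
printf '%s\r\n' 'greeting=Say "hi"' | lang_to_pot
# Prints:
#   msgctxt "greeting"
#   msgid "Say \"hi\""
#   msgstr ""
# followed by an empty line that separates the entries.
```

The key before the first = becomes the msgctxt, and everything after it becomes the msgid.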

Converting a .po file back to .lang format

$ cat Gàidhlig.po | tr -d "\n" | sed 's/msgctxt "/\n&/g' | sed '1d' | sed 's/^msgctxt "//' | sed 's/"msgid ".*msgstr "/=/' | sed 's/"$//' | sed 's/\\\(["\\]\)/\1/g' | iconv -f utf8 -t utf16 > Gaidhlig.lang
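The reverse direction can be tried out the same way. Again this is only a sketch: po_to_lang is a made-up name, the iconv step is left out, GNU sed is assumed, and every msgid and msgstr is assumed to sit on a single line (strings wrapped over several lines would not survive the tr step cleanly).

```shell
# Hypothetical helper: read a po file on stdin, write key=translation lines.
po_to_lang() {
  tr -d '\n' |                          # join the whole file into one line
    sed 's/msgctxt "/\n&/g' |           # start a new line at every entry
    sed '1d' |                          # drop whatever came before the first msgctxt (the header)
    sed 's/^msgctxt "//' |              # the line now starts with the key
    sed 's/"msgid ".*msgstr "/=/' |     # replace the source text with =
    sed 's/"$//' |                      # remove the closing quote
    sed 's/\\\(["\\]\)/\1/g'            # undo the escaping of " and \
}

# One sample entry:
printf '%s\n' 'msgctxt "quote"' 'msgid "Say \"hi\""' 'msgstr "Can \"hi\""' '' | po_to_lang
# Prints: quote=Can "hi"
```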

Releasing all .ts files in a directory to .qm

$ ls *.ts | xargs lrelease
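If any of the file names contain spaces, the ls | xargs combination splits them into pieces. A variant that passes the names null-separated avoids that. This is a sketch: release_all is a made-up helper name, and the -maxdepth, -print0, -0 and -r options assume GNU (or BSD) find and xargs.

```shell
# Hypothetical helper: release_all [DIRECTORY] [TOOL]
# Runs TOOL (lrelease unless told otherwise) on every .ts file in DIRECTORY.
release_all() {
  dir=${1:-.}
  tool=${2:-lrelease}
  # -print0 and -0 keep names with spaces intact; -r skips the call
  # entirely when there are no .ts files.
  find "$dir" -maxdepth 1 -name '*.ts' -print0 | xargs -0 -r "$tool"
}

# Usage:
#   release_all            # current directory, using lrelease
#   release_all . echo     # dry run: just print the files that would be released
```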

Setting custom shortcuts on Linux

Need some diacritics or characters not on any keyboard? Consider adding shortcut keys. On Linux you can use something like the commands below to map new key combinations. This particular set adds the following to the English (UK) keyboard layout:
AltGr + \ » ǀ
AltGr + / » ǁ
AltGr + 1 » ǃ
AltGr + = » ǂ

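Of course you'll have to modify this for your language, but once you get the principle (or even just realise you CAN do that), it's not that hard. The line numbers (18, 21, 23 and 24) refer to one particular version of the gb symbols file, so check them against your own copy before running this: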

$ sudo sed -i \
-e '18s|.*|key<AB10>{[slash,question,U01C1,dead_abovedot]};|' \
-e '21s|.*|key<AE01>{[1,exclam,U01C3,exclamdown]};|' \
-e '23s|.*|key<LSGT>{[backslash,bar,U01C0,brokenbar]};|' \
-e '24s|.*|key<AE12>{[equal,plus,U01C2,dead_ogonek]};|' \
/usr/share/X11/xkb/symbols/gb && sudo dpkg-reconfigure xkb-data
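Because the sed command overwrites whole lines by number, it may be worth printing those lines first and keeping a copy of the file. The helper below is a sketch with a made-up name (preview_lines); it only reads the file and changes nothing.

```shell
# Hypothetical helper: preview_lines [FILE]
# Prints lines 18, 21, 23 and 24, the ones the sed command would overwrite.
# They should hold the <AB10>, <AE01>, <LSGT> and <AE12> definitions.
preview_lines() {
  sed -n '18p;21p;23p;24p' "${1:-/usr/share/X11/xkb/symbols/gb}"
}

# Usage:
#   preview_lines                                  # inspect the system file
#   cp /usr/share/X11/xkb/symbols/gb ~/gb.orig     # keep a backup before editing
```

If the printed lines show different keys, adjust the line numbers in the sed command to match your file.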
