Projects

From Akerbeltz Resources
Revision as of 18:17, 13 February 2012 by Akerbeltz

I'm writing these up in reverse order: I've done a dozen or so projects by now, but I'm starting with the ones I'm currently working on.

Adaptxt

Adaptxt is a neat open-source project for predictive texting. Given how much people text, you should really consider it for your language if nothing similar exists yet. The product site is here, but the hard work happens here. You might as well bookmark both.

Basics

You'll need the following ingredients:

  • an inclusion.txt file
  • a corpus.txt file
  • an XML file

The inclusion file is a long text file containing all the words in your language you want in the program. If a spellchecker exists for your language, that can be a good starting point.
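If you start from a spellchecker, you will usually need to clean its word list first. The sketch below assumes a Hunspell-style .dic file (an optional entry count on the first line, then entries such as taigh/S with affix flags after a slash), and it also assumes that inclusion.txt is plain UTF-8 with one word per line; the page doesn't spell out the format, so check that. The function names are hypothetical. Dropping the flags leaves only the base entries, not the inflected forms the affix rules would generate, so expect to add those separately.

```python
def build_inclusion_list(dic_lines):
    """Turn Hunspell-style .dic lines into a sorted, de-duplicated word list.

    Assumption: the first line may hold the entry count, and entries may carry
    affix flags after a slash (e.g. "taigh/S"). Flags are dropped, so only the
    base entries survive, not the inflected forms the affix rules describe.
    """
    words = set()
    for i, line in enumerate(dic_lines):
        entry = line.strip()
        if not entry:
            continue
        if i == 0 and entry.isdigit():
            continue  # header line holding the approximate entry count
        # Keep the first field only, then strip any "/FLAGS" suffix.
        word = entry.split()[0].split("/", 1)[0]
        if word:
            words.add(word)
    return sorted(words)


def write_inclusion_file(words, path="inclusion.txt"):
    """Write one word per line (assumed inclusion.txt format)."""
    with open(path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(w + "\n")
```

For example, build_inclusion_list(["3", "taigh/S", "cat", "taigh"]) gives ["cat", "taigh"]: the count line is skipped and the flagged and unflagged copies of taigh collapse into one entry.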

The corpus file is used to give statistical weighting to each word. For example, thimble and thing both begin with thi, but thing is far more common than thimble, and the corpus file provides the raw data from which that information is derived. Before you moan that your language doesn't have a corpus: Adaptxt will work reasonably well even if you can't provide such statistical data, not least because it learns as you use it. But it works better straight off if you do.
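To show why the corpus helps, here is a minimal sketch of frequency-weighted prefix completion. It is not Adaptxt's actual algorithm, only an illustration of the weighting idea: count how often each word from your word list appears in the corpus, then rank the words that match the typed prefix by that count. The function names are hypothetical, and the vocabulary is assumed to be lowercase.

```python
import re
from collections import Counter


def word_frequencies(corpus_text, vocabulary):
    """Count how often each vocabulary word occurs in the corpus text."""
    # \w is Unicode-aware for str patterns, so accented letters like à count.
    tokens = re.findall(r"\w+", corpus_text.lower())
    return Counter(t for t in tokens if t in vocabulary)


def suggest(prefix, vocabulary, freqs, n=3):
    """Return up to n words starting with prefix, most frequent first.

    Words absent from the corpus get a count of 0 but are still offered,
    so prediction still works without a corpus, just with weaker ordering.
    """
    matches = [w for w in vocabulary if w.startswith(prefix)]
    # Sort by descending frequency, then alphabetically to break ties.
    matches.sort(key=lambda w: (-freqs[w], w))
    return matches[:n]
```

With the vocabulary {"thimble", "thing", "think"} and the corpus text "the thing is I think the thing works; a thimble", thing occurs twice, so suggest("thi", ...) puts it first: ["thing", "thimble", "think"].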

The XML file is fairly easy. It contains data such as the codes for your language; for example, Irish Gaelic is gleie (gle for Irish, ie for Ireland), and so on.
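The gleie example suggests a simple pattern: a three-letter language code followed by a two-letter region code, all in lowercase. That pattern is inferred from this single example, so treat the helper below as a hypothetical sketch and check the actual Adaptxt XML before relying on it.

```python
def locale_code(language, region):
    """Join a 3-letter language code and a 2-letter region code, lowercased.

    Assumed pattern, inferred only from gle + ie -> gleie.
    """
    if len(language) != 3 or len(region) != 2:
        raise ValueError("expected a 3-letter language and a 2-letter region code")
    if not (language.isalpha() and region.isalpha()):
        raise ValueError("codes must be alphabetic")
    return (language + region).lower()
```

Under the same assumption, locale_code("gle", "IE") returns "gleie", and Scottish Gaelic (gla) in Great Britain (GB) would come out as "glagb".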
