The big problem with Thai spell-checking in LibreOffice is that segmenting the text into words and then checking the spelling are two independent processes. Although the segmentation logic primarily works from a long list of words, this list is currently independent of the spell checker's knowledge of the language. This is not good, but fixing it needs some careful programming work, and possibly changes to LibreOffice concepts, to ensure segmentation and spell-checking use the same list. This is why it is sometimes necessary to tell the segmentation logic where the word breaks are (by inserting ZWSP: menu sequence Insert, Formatting Mark, No-width optional break, shortcut Ctrl+/) or are not (by inserting WJ: menu sequence Insert, Formatting Mark, No-width no break, no predefined shortcut).
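For anyone preparing text outside the editor, the two marks can also be put in programmatically. This is only a sketch: ZWSP is U+200B and WJ is U+2060, but the helper names `mark_breaks` and `glue` are made up for illustration, and the way เฮมมิงเวย์ is split into pieces below is just an assumed example of a wrong segmentation.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: a word break is allowed here
WJ = "\u2060"    # WORD JOINER: no word break is allowed here


def mark_breaks(words):
    """Join already-separated words with ZWSP so the boundaries are explicit."""
    return ZWSP.join(words)


def glue(pieces):
    """Join the pieces of one word with WJ so it is not split between them."""
    return WJ.join(pieces)


# Hypothetical pieces: assume the segmenter wrongly split the name like this.
name = glue(["เฮม", "มิง", "เวย์"])
phrase = mark_breaks(["แคลิฟอร์เนีย", "ของ"])
print(len(name), len(phrase))
```

Neither mark is visible, so the text looks the same on screen; only the break opportunities change.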
For example, both segmentation and spell-checking know แคลิฟอร์เนีย, so that word is accepted without complaint. However, when นี is mistyped as นื, the segmentation does not recognise the misspelt word, and unsurprisingly comes up with an unhelpful segmentation. I can't see any way round this problem but telling the system where the word breaks should be.
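To show why one wrong vowel wrecks the segmentation, here is a toy greedy longest-match segmenter. It is not LibreOffice's actual algorithm, and the two-word `lexicon` is an assumption for the demo; it only illustrates how a word-list approach falls apart when the typed word is not in the list.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation.

    Characters that start no known word are emitted one at a time.
    """
    pieces = []
    longest = max(map(len, lexicon))
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in lexicon:
                break  # found the longest known word starting at i
        else:
            size = 1  # nothing matched: fall back to a single character
        pieces.append(text[i:i + size])
        i += size
    return pieces


lexicon = {"แคลิฟอร์เนีย", "ของ"}  # toy word list, for illustration only
good = segment("แคลิฟอร์เนียของ", lexicon)
bad = segment("แคลิฟอร์เนืยของ", lexicon)  # นี mistyped as นื
print(good)
print(bad)
```

The correctly spelt text comes back as two words, while the misspelt one is chopped into fragments, so the spell checker never even sees the whole misspelt word.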
With a long foreign name such as เฮมมิงเวย์, what you can reasonably hope for is to be able to override the erroneous segmentation, add the word to the user's personal dictionary, and then have subsequent occurrences accepted as correct without further ado. Until the segmentation code uses the personal dictionary, we are stuck with adding WJ to all the occurrences.
๕ เปอร์เซ็นต์ของ is another matter. First, if you delete the spaces, the string of characters is taken to be a word containing a number, and is therefore not spell checked. (You can override this by the tick box at tools, options, language settings, writing aids, check words with numbers.) Secondly, there is an error in the dictionary file th_TH.aff. เปอร์เซ็นต์ and 62 other entries have trailing spaces in that file. Consequently, เปอร์เซ็นต์ gets passed to the spell-checker, which suggests adding a trailing space. If the trailing space is added, the spell-checker is again passed เปอร์เซ็นต์, and the spell-checker then suggests adding yet another trailing space!
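As a rough illustration of the first point: the Thai digit ๕ is a decimal digit as far as Unicode is concerned, so a "does this word contain a number?" test will fire on it. The check below is my own sketch using Unicode general categories, not LibreOffice's code, and `contains_digit` is a hypothetical name.

```python
import unicodedata


def contains_digit(token):
    """True if any character is a Unicode decimal digit (category Nd)."""
    return any(unicodedata.category(ch) == "Nd" for ch in token)


print(contains_digit("๕เปอร์เซ็นต์ของ"))  # spaces deleted, digit attached
print(contains_digit("เปอร์เซ็นต์ของ"))   # same string without the digit
```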
Thai In Openoffice On Ubuntu Lucid Lynx
Started by Richard W, 2011-02-19 08:18
28 replies to this topic
#26 Posted 2012-03-25 02:15:54
#27 Posted 2012-03-25 17:40:36
Correction: The extraneous spaces are in th_TH.dic, not th_TH.aff.
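If anyone wants to find the affected entries, something like the following should do. It is a sketch under assumptions: that the file follows the usual Hunspell .dic layout (first line an entry count, then one entry per line) and that it is read with the right encoding, which I have not verified for th_TH.dic. The function name and the sample lines are made up.

```python
def entries_with_trailing_space(lines):
    """Yield (line_number, entry) for entries that end in whitespace."""
    for number, raw in enumerate(lines, start=1):
        entry = raw.rstrip("\r\n")
        if number == 1 and entry.strip().isdigit():
            continue  # the first line of a .dic file is the entry count
        if entry and entry != entry.rstrip():
            yield number, entry


# Made-up sample standing in for the real file contents.
sample = ["3\n", "เปอร์เซ็นต์ \n", "แคลิฟอร์เนีย\n", "ของ\n"]
for number, entry in entries_with_trailing_space(sample):
    print(number, repr(entry))

# With the real file (encoding is an assumption):
# with open("th_TH.dic", encoding="utf-8") as f:
#     print(list(entries_with_trailing_space(f)))
```

Stripping the trailing whitespace from the reported lines and saving the file in the same encoding ought to stop the "add a trailing space" suggestions.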
#28 Posted 2012-03-25 21:06:49
Richard, thanks, this explains a lot. You really understand this well. I hope these problems can be worked out. Thailand really needs a good open alternative to Word, which is probably 90 percent pirated here. Out of curiosity, does the Thai version of Word have problems too? I used it only a couple of times but seem to remember the spell-checker wasn't all that great.
#29 Posted 2012-03-31 08:33:38
My comparisons are probably unfair. The latest version of Word I've used for Thai is Word 2002, and in 2003 that must have been better than Star Office, because when I had to type letters or faxes in Thai I used Word rather than Star Office. I do remember having to fight the line-breaker - Word 2002 didn't understand WJ or ZWSP, so my only tool was plain space. Working WJ and ZWSP enable much more robust victories when fighting LibreOffice's line breaker - the battle comes nearer to simply being me helping out the line-breaker.
I've finally just dug out my Word installation disk and installed Word 2002 on a machine that has an idle Windows XP OS so I can compare spell checking. (The machine normally serves as a YouTube viewer running under Ubuntu.) Word 2002 does know the word แคลิฟอร์เนีย, but without an understanding of WJ it cannot handle เฮมมิงเวย์ or the misspelling แคลิฟอร์เนืย at all. It splits the words up and I know no way of fixing that in Word 2002. U+FEFF ZWNBSP doesn't work any better, and actually renders!