List Of 3000 Most Common Thai Words, anyone got one?
2006-12-14 13:48:23
Post
#1
|
|
|
Star Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,473 Joined: 2006-08-31 Member No.: 33,919 |
|
|
|
|
2006-12-14 14:49:15
Post
#2
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 1,261 Joined: 2006-02-18 From: Bangkok Member No.: 27,106 |
Grover, the best I can do is I have a list of the 1000 most common words according to four sources of language corpora. I've attached a spreadsheet that I converted to HTML.
The best one is the Mary Haas list. Not sure about Haas, but the other three I know are all computed automatically, so the digits 0 to 9, among other things, count as "words" in their list, as well as some other things that aren't common Thai at all, but appear frequently in their corpora because of a large number of technical texts. Hope this is helpful. This post has been edited by Rikker: 2006-12-14 14:51:28
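A rough sketch of how the digit and non-Thai entries mentioned above could be filtered out of an automatically computed list. The sample rows and the function name `keep_thai_words` are made up for illustration and are not taken from the attached spreadsheet; the only assumption is that each entry is a (word, count) pair.

```python
import re

# Thai letters, vowels and tone marks in the Unicode Thai block.
# Thai digits (U+0E50-U+0E59) and the baht sign (U+0E3F) are deliberately left out.
THAI_WORD = re.compile(r"[\u0E01-\u0E3A\u0E40-\u0E4E]+")

def keep_thai_words(freq_list):
    """Keep only entries whose word consists purely of Thai letters and marks."""
    return [(word, count) for word, count in freq_list if THAI_WORD.fullmatch(word)]

# Hypothetical rows for illustration only.
sample = [("การ", 120), ("0", 95), ("เป็น", 80), ("๑", 12), ("tax", 9)]
cleaned = keep_thai_words(sample)
print(cleaned)
```

This only removes digits, Latin text and symbols. Technical terms that are real Thai words but over-represented in a narrow corpus would still need checking by hand.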
Attached File(s)
|
|
|
|
2006-12-14 15:15:27
Post
#3
|
|
|
Star Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,473 Joined: 2006-08-31 Member No.: 33,919 |
Rikker, that is awesome. I've been looking for something exactly like this for years. Cheers
|
|
|
|
2006-12-14 21:46:33
Post
#4
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,570 Joined: 2005-02-08 Member No.: 16,612 |
Sadly, there are no phonetic transliterations.
|
|
|
|
2006-12-14 23:25:49
Post
#5
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 1,261 Joined: 2006-02-18 From: Bangkok Member No.: 27,106 |
|
|
|
|
2006-12-16 11:42:49
Post
#6
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 1,261 Joined: 2006-02-18 From: Bangkok Member No.: 27,106 |
QUOTE Rikker, that is awesome. I've been looking for something exactly like this for years. Cheers

You're very welcome. But I forgot to answer your question. The data comes courtesy of Doug Cooper at CRCL.
|
|
|
2006-12-16 13:10:20
Post
#7
|
|
|
The Hubgoblin ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 9,369 Joined: 2003-08-15 Member No.: 3,636 |
You rule Rikker. Fantastic resource!
|
|
|
|
2006-12-16 15:15:21
Post
#8
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,304 Joined: 2004-09-20 Member No.: 13,058 |
How should I be reading this list? What do the column headers Haas, Links, Orchid and Tax represent?
What do the numbers mean and why does each list have different numbers? For example, in the first row, why do three lists have การ but one has เป็น?

| Haas | Links | Orchid | Tax |
|---|---|---|---|
| 366 เป็น | 15978 การ | 11888 การ | 9861 การ |

I've put it in a slightly cleaner Excel format, attached here. This post has been edited by wasabi: 2006-12-16 15:17:54
Attached File(s)
|
|
|
|
2006-12-16 15:21:03
Post
#9
|
|
|
Star Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,473 Joined: 2006-08-31 Member No.: 33,919 |
|
|
|
|
2006-12-16 16:07:22
Post
#10
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 194 Joined: 2005-03-31 Member No.: 17,952 |
QUOTE Grover, the best I can do is I have a list of the 1000 most common words according to four sources of language corpora. I've attached a spreadsheet that I converted to HTML. The best one is the Mary Haas list. Not sure about Haas, but the other three I know are all computed automatically, so the digits 0 to 9, among other things, count as "words" in their list, as well as some other things that aren't common Thai at all, but appear frequently in their corpora because of a large number of technical texts. Hope this is helpful.

This is interesting, thanks Rikker. I did a similar thing a while back using all the text that people paste into thai2english.com, and for comparison the top 100 results in order were:

ที่ , และ , จะ , การ , มี , ใน , ได้ , ของ , เป็น , ให้ , ไป , ก็ , ไม่ , ว่า , แล้ว , มา , กับ , คุณ , ใจ , คน , เรา , ฉัน , แต่ , นะ , นี้ , ครับ , อยู่ , เธอ , กัน , ผม , โดย , มัน , จาก , ต้อง , ด้วย , เลย , ยัง , หรือ , ทำ , ใช้ , คือ , เขา , มาก , ผู้ , บอก , พี่ , ดู , เมื่อ , วัน , อะไร , เรื่อง , ถ้า , ดี , เพราะ , อยาก , ค่ะ , ไม่ได้ , ปี , อีก , เพื่อ , พระ , รัก , นั้น , ตัว , ถึง , งาน , สามารถ , หน้า , เวลา , ใคร , ไทย , เพลง , แบบ , ซึ่ง , ไว้ , ขอ , ส่ง , ต่อ , ความ , ท่าน , อย่าง , ใหม่ , เล่น , ก่อน , หา , บ้าน , ตาม , ทาง , สำหรับ , หนึ่ง , เอา , เค้า , คะ , ทำให้ , ขึ้น , ไม่มี , อ่าน , บาท , ราย , ชื่อ

ที่ was the most common by miles (about twice the count of และ), whereas all the others were relatively close.
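For anyone wanting to try a count like this on their own texts, here is a minimal sketch. It is not how thai2english.com does it. Thai is written without spaces between words, so the sketch assumes the text has already been run through some word segmenter and that the tokens are separated by spaces; the sample sentences and the name `top_words` are invented for illustration.

```python
from collections import Counter

def top_words(segmented_texts, n):
    """Return the n most frequent tokens across texts whose words are space-separated."""
    counts = Counter()
    for text in segmented_texts:
        counts.update(text.split())
    return counts.most_common(n)

# Hypothetical, pre-segmented sample sentences.
texts = ["ผม เป็น หมอ", "เขา เป็น คน ไทย", "เธอ เป็น ครู"]
print(top_words(texts, 1))
```

The quality of the result depends almost entirely on the segmentation step, which is why different sources can disagree on what counts as a "word".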
|
|
|
2006-12-16 18:02:11
Post
#11
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 676 Joined: 2006-06-14 From: ไม่มีที่อยู่ คนจรจัด Member No.: 31,054 |
QUOTE I did a similar thing a while back using all the text that people paste into thai2english.com ... ที่ was the most common by miles (about twice the count of และ), whereas all the others were relatively close.

Mike, that's interesting to see how commonly used they are. Cheers
|
|
|
2006-12-17 13:07:28
Post
#12
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 1,261 Joined: 2006-02-18 From: Bangkok Member No.: 27,106 |
QUOTE How should I be reading this list? What do the column headers Haas, Links, Orchid and Tax represent? What do the numbers mean and why does each list have different numbers? For example the first row, why do three lists have การ but one has เป็น?

| Haas | Links | Orchid | Tax |
|---|---|---|---|
| 366 เป็น | 15978 การ | 11888 การ | 9861 การ |

I've put it in a slightly cleaner Excel Format attached here.

Thanks for doing that. My original is in Excel, I just wanted to make sure everyone could access it.

The four columns are four different text collections/corpora: one from Mary Haas, another from NECTEC's Linguistics and Knowledge Science Laboratory (LINKS), Chula University's Orchid Corpus (appears to be offline right now), and the one labeled Tax. I'm not clear on the exact source of that last one, but I think it might be the Thai tax code or a corpus of legal documents of some kind, given the high frequency of tax-related terms in its top 1000 words.

The number next to each word is the number of times that word appears in the corpus. The number at the top of each column is just the sum of the occurrences of the top 1000 words.

As for why the lists have different words in the top spots, well, that has to do with at least three things: [a] the size of the corpus, [b] the variety (or lack of it) of the subject matter collected in the corpus, [c] the method used to count occurrences.

The line you've quoted is the top word in each of the four corpora. You can see the Haas corpus is much smaller, with its top word occurring only 366 times. The other three, all much larger, agree that การ is more common. Orchid is largest at 416,000, but I don't know what constitutes a "word" for the purposes of counting in the Orchid corpus. While English "words" don't correspond to the collections of letters between spaces as much as we tend to think they do, treating them that way makes it easy to establish a clear meaning of "word" for the purpose of gathering corpora (one that is easily countable by automatic means). Thai is a bit trickier. I know the corpora on thai.sealang.net are all counted by number of characters, not words.

Also, one telltale sign that Tax is a very narrow corpus subject-matter-wise is that while it is 269,000 words large, it has only 2100 distinct words in it, while even Haas has 4000 distinct words out of 27000 total words. This post has been edited by Rikker: 2006-12-17 13:28:30
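To put numbers on the Haas versus Tax comparison, here is a small worked example using only the figures quoted in the post: distinct words divided by total words (often called the type-token ratio), and the share of the corpus taken up by the top word. The dictionary layout and function names are my own. Bear in mind that this ratio naturally falls as a corpus gets larger, so it is only a rough indicator of how narrow the subject matter is.

```python
# Figures quoted in the post: distinct words, total words, count of the top word.
corpora = {
    "Haas": {"distinct": 4_000, "total": 27_000, "top_count": 366},    # top word: เป็น
    "Tax": {"distinct": 2_100, "total": 269_000, "top_count": 9_861},  # top word: การ
}

def type_token_ratio(distinct, total):
    """Distinct words as a fraction of all running words."""
    return distinct / total

def top_word_share(top_count, total):
    """Fraction of the corpus accounted for by its single most common word."""
    return top_count / total

for name, c in corpora.items():
    ttr = type_token_ratio(c["distinct"], c["total"])
    share = top_word_share(c["top_count"], c["total"])
    print(f"{name}: type-token ratio {ttr:.4f}, top word share {share:.2%}")
```

Haas comes out at roughly 0.15 distinct words per running word, against under 0.01 for Tax, which fits the point that Tax covers a very narrow range of vocabulary.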
|
|
|
2006-12-17 14:56:13
Post
#13
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,304 Joined: 2004-09-20 Member No.: 13,058 |
Thanks for the detailed reply, Rikker.
Where are you getting the figures of 2100 for Tax and 4000 for Haas? I see each list having 1000 words. And can you further define what you mean by corpus? Is this some underlying body of work the statistics are based on? What is this body of work for each list?
|
|
|
2006-12-17 18:47:13
Post
#14
|
|
|
The Hubgoblin ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 9,369 Joined: 2003-08-15 Member No.: 3,636 |
About the word corpus as used in linguistics: http://en.wikipedia.org/wiki/Text_corpus
|
|
|
|
2006-12-17 19:00:48
Post
#15
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 312 Joined: 2004-10-01 From: Canada/Thailand Member No.: 13,335 |
|
|
|
|
2006-12-17 19:06:42
Post
#16
|
|
|
The Hubgoblin ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 9,369 Joined: 2003-08-15 Member No.: 3,636 |
QUOTE Where do I get the translation for the words?

Paste them into www.thai2english.com or buy a dictionary. The process of looking words up is good for memorization.

The most common words have many different functions. เป็น for example - while it often just means 'be' or 'is', as in ผมเป็นหมอ "I am a doctor", it can also be a grammatical function word indicating result or manner: หั่นเนื้อเป็นชิ้น "cut beef into slices", เป็นเวลาสองวัน "for two days", as well as ability: เล่นกีตาร์ไม่เป็น "cannot play guitar".
|
|
|
2006-12-23 22:22:15
Post
#17
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 119 Joined: 2005-01-13 Member No.: 15,916 |
QUOTE In English they have a list like this - like the one for the Oxford Advanced Learner's Dictionary, which has a careful selection of common-use words. I've been searching for a list like this in Thai for years but never found one. Any pointers would be appreciated.

Not sure if this would help, but there's a great vocabulary builder from a company called Unforgettable Languages that uses easy memory aids for commonly used words. This is a great addition to your language learning IMHO. It is an easy way to pick up, in this case, about 230 commonly used words. I used it for Thai and Mandarin. It can be found at: www.unforgettablelanguages.com
|
|
|
2006-12-24 00:54:50
Post
#18
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Banned Posts: 110 Joined: 2004-11-03 Member No.: 14,090 |
|
|
|
|
2006-12-24 04:16:52
Post
#19
|
|
|
Star Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,473 Joined: 2006-08-31 Member No.: 33,919 |
QUOTE Not sure if this would help, but there's a great vocabulary builder from a company called Unforgettable Languages that uses easy memory aids for commonly used words. This is a great addition to your language learning IMHO. It is an easy way to pick up, in this case, about 230 commonly used words. I used it for Thai and Mandarin. It can be found at: www.unforgettablelanguages.com

That is a good link indeed. A good system for vocab building, e.g. imagine a fat GUY (gai) eating a large chicken, and so on. This post has been edited by Grover: 2006-12-24 04:43:08
|
|
|
2009-01-05 15:01:59
Post
#20
|
|
|
Member ![]() ![]() Group: Members Posts: 11 Joined: 2007-06-06 From: Chiangrai Member No.: 47,199 |
I realize this thread is getting old, but this has been really helpful to me. Why waste time learning a whole dictionary right away? Start with the 100 most common words and work up to the 1000 most common then perhaps 2500. That's a good vocabulary!
Using Rikker's lists along with thai2eng, and some others on ThailandQA.com, I have massaged the data trying to find consensus or at least trends. I will try to attach the spreadsheets I am using here, but I really don't spend much time using forums, so I may not succeed. There are two spreadsheets. All the lists were included, sorted, duplicates removed, then trimmed. A frequency table is provided showing the degree of correlation between the lists. As Rikker kindly pointed out, the Tax list seems to contain tax-related terms; numbers and special characters were deleted. All errors are mine alone, and suggestions and corrections are gratefully accepted. Happy new year.
Attached File(s)
100_Most_Common___Combined.xls ( 175K )
Number of downloads: 416
1000_Most_Common___Combined.xls ( 1.03MB )
Number of downloads: 367 |
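The combine-and-compare step described in this post could be sketched roughly as below. This is not the method used to build the attached spreadsheets; the three short lists and the name `consensus` are invented for illustration. The idea is simply to count how many source lists each word appears in.

```python
from collections import Counter

def consensus(lists):
    """For each word, count how many of the source lists contain it."""
    seen_in = Counter()
    for words in lists.values():
        seen_in.update(set(words))  # set() removes duplicates within a single list
    # Most widely shared words first; ties broken by Unicode code point order.
    return sorted(seen_in.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical three-word lists standing in for the real 1000-word lists.
lists = {
    "Haas": ["เป็น", "การ", "ที่"],
    "Orchid": ["การ", "ที่", "และ"],
    "thai2english": ["ที่", "และ", "จะ"],
}
result = consensus(lists)
for word, n_lists in result:
    print(word, n_lists)
```

Words that show up in every list are the safest bets for a core vocabulary, while words found in only one list are more likely to reflect the quirks of that corpus.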
|
|
|
2009-01-06 09:14:23
Post
#21
|
|
|
Super Member ![]() ![]() ![]() ![]() ![]() Group: Advanced Members Posts: 1,147 Joined: 2005-01-13 From: Bangkok Member No.: 15,902 |
QUOTE Two spreadsheets. All the lists were included, sorted, duplicates removed, then trimmed. Frequency table provided showing the degree of correlation between the lists.

Nice. I went at it from a slightly different angle. Also, I'm going for 3000 as (I might be wrong) 1000 doesn't have enough word combos. I didn't grab the whole frequency list as you did, only Mary's. Then I added from AUA, Byki (not all), AWL, Thai-language.com starred, etc. Mike from Thai2English.com also has a frequency list that will come in at some point. Then there's a dictionary with the supposedly top 3000, but I found what I believe are 3 mistakes just in the first couple of pages, so I backed off from seriously checking it against mine.

My eventual aim is to put each word with phrases, as words on their own don't work with the way I learn. Then when I get to a certain point, I'll have someone in the know look at it, as there are sure to be a ton of iffy words. But right now, I'm just nibbling away and enjoying the finding of new words as I go.
|
|
|
2009-01-06 13:12:51
Post
#22
|
|
|
Advanced Member ![]() ![]() ![]() Group: Members Posts: 94 Joined: 2008-05-12 Member No.: 61,971 |
A question for the advanced members: how well do you think this list relates to spoken Thai, as opposed to the written Thai that provides the source?
I would say that ก็ has to be number 1 in terms of spoken Thai, surely!
|
|
|
2009-01-06 14:18:11
Post
#23
|
|
|
The Hubgoblin ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Global Moderators Posts: 9,369 Joined: 2003-08-15 Member No.: 3,636 |
I'd forgotten about pinning this topic. Great thing that you brought it back again.

QUOTE My eventual aim is to put each with phrases as words on their own don't work with the way I learn.

It's not just you - it won't work for anyone who wants to speak anything resembling intelligible Thai. The example I posted above about how เป็น is used is just a brief introduction to the word and can be extended, not to mention that the same thing could [should!] be repeated for most of the most common words. In other words, these words can have completely different functions depending on the context. If one doesn't learn grammatical patterns as well as idioms too, the words by themselves, with just one translation in English and no usage examples, won't do you much more good than getting 50 tons of bricks and mortar and an order to reconstruct Wat Benjamabophit, Suvarnabhumi airport, or Baiyoke 2.

QUOTE A question for the advanced members, how well do you think this list relates to spoken Thai as opposed to the written Thai that provides the source ? I would say that ก็ has to be number 1 in terms of spoken Thai surely !

For one, I think you'll find many more particles, especially if you properly distinguish between ครับ อะ ฮะ ค่ะ จ้ะ วะ หว่า etc. You're right about ก็ too - it's a hesitation word.
|
|
|
2009-01-31 12:12:54
Post
#24
|
|
|
Senior Member ![]() ![]() ![]() ![]() Group: Members Posts: 177 Joined: 2006-04-04 From: เเฮปี้เพลส Member No.: 28,513 |
This list has 1172 common words.
http://www.thai-language.com/ref/starred |
|
|
|
2009-03-08 23:03:03
Post
#25
|
|
|
Newbie ![]() Group: Members Posts: 1 Joined: 2009-03-08 Member No.: 78,616 |
thank you very very very much
|
|
|
|