Thailand Forum

List Of 3000 Most Common Thai Words, anyone got one ?
Grover
post 2006-12-14 13:48:23
Post #1


Star Member
*****

Group: Advanced Members
Posts: 1,473
Joined: 2006-08-31
Member No.: 33,919



In English there is a list like this, such as the one for the Oxford Advanced Learner's Dictionary, which has a careful selection of commonly used words. I've been searching for a list like this in Thai for years but never found one. Any pointers would be appreciated.
Rikker
post 2006-12-14 14:49:15
Post #2


Super Member
*****

Group: Global Moderators
Posts: 1,261
Joined: 2006-02-18
From: Bangkok
Member No.: 27,106



Grover, the best I can do is a list of the 1000 most common words according to four language corpora. I've attached a spreadsheet that I converted to HTML.

The best one is the Mary Haas list. I'm not sure about Haas, but I know the other three are all computed automatically, so the digits 0 to 9, among other things, count as "words" in their lists, as do some other items that aren't common Thai at all but appear frequently in their corpora because of a large number of technical texts.

Hope this is helpful.

This post has been edited by Rikker: 2006-12-14 14:51:28
Attached File(s)
Attached File  thai_word_frequencies.htm ( 745.68K ) Number of downloads: 87622
 
Grover
post 2006-12-14 15:15:27
Post #3


Star Member
*****

Group: Advanced Members
Posts: 1,473
Joined: 2006-08-31
Member No.: 33,919



Rikker, that is awesome. I've been looking for something exactly like this for years. Cheers! Where did you get it from, BTW?
johnh101
post 2006-12-14 21:46:33
Post #4


Super Member
*****

Group: Advanced Members
Posts: 1,570
Joined: 2005-02-08
Member No.: 16,612



Sadly, there are no phonetic transliterations.
Rikker
post 2006-12-14 23:25:49
Post #5


Super Member
*****

Group: Global Moderators
Posts: 1,261
Joined: 2006-02-18
From: Bangkok
Member No.: 27,106



thai2english.com to the rescue! copy + paste = phonetic translations
Rikker
post 2006-12-16 11:42:49
Post #6


Super Member
*****

Group: Global Moderators
Posts: 1,261
Joined: 2006-02-18
From: Bangkok
Member No.: 27,106



QUOTE(Grover @ 2006-12-14 15:15:27) *
Rikker, that is awesome. I've been looking for something exactly like this for years. Cheers! Where did you get it from, BTW?


You're very welcome. But I forgot to answer your question. The data comes courtesy of Doug Cooper at CRCL.
meadish_sweetbal...
post 2006-12-16 13:10:20
Post #7


The Hubgoblin
*******

Group: Global Moderators
Posts: 9,369
Joined: 2003-08-15
Member No.: 3,636



You rule, Rikker. Fantastic resource!
wasabi
post 2006-12-16 15:15:21
Post #8


Super Member
*****

Group: Advanced Members
Posts: 1,304
Joined: 2004-09-20
Member No.: 13,058



How should I be reading this list? What do the column headers Haas, Links, Orchid and Tax represent?
What do the numbers mean and why does each list have different numbers?

For example the first row, why do three lists have การ but one has เป็น?

Haas | Links | Orchid | Tax
366 เป็น | 15978 การ | 11888 การ | 9861 การ



I've put it in a slightly cleaner Excel Format attached here.

This post has been edited by wasabi: 2006-12-16 15:17:54
Attached File(s)
Attached File  frequency.xls ( 176K ) Number of downloads: 709
 
Grover
post 2006-12-16 15:21:03
Post #9


Star Member
*****

Group: Advanced Members
Posts: 1,473
Joined: 2006-08-31
Member No.: 33,919



QUOTE(meadish_sweetball @ 2006-12-16 13:10:20) *
You rule, Rikker. Fantastic resource!


Can you pin it, meadish? It's good for beginners.
mike_l
post 2006-12-16 16:07:22
Post #10


Senior Member
****

Group: Members
Posts: 194
Joined: 2005-03-31
Member No.: 17,952



QUOTE
Grover, the best I can do is I have a list of the 1000 most common words according to four sources of language corpora. I've attached a spreadsheet that I converted to HTML.

The best one is the Mary Haas list. Not sure about Haas, but the other three I know are all computed automatically, so the digits 0 to 9, among other things, count as "words" in their list, as well as some other things that aren't common Thai at all, but appear frequently in their corpora because of a large number of technical texts.

Hope this is helpful.


This is interesting, thanks Rikker. I did a similar thing a while back using all the text that people paste into thai2english.com. For comparison, the top 100 results, in order, were:

ที่ , และ , จะ , การ , มี , ใน , ได้ , ของ , เป็น , ให้ , ไป , ก็ , ไม่ , ว่า , แล้ว , มา , กับ , คุณ , ใจ , คน , เรา , ฉัน , แต่ , นะ , นี้ , ครับ , อยู่ , เธอ , กัน , ผม , โดย , มัน , จาก , ต้อง , ด้วย , เลย , ยัง , หรือ , ทำ , ใช้ , คือ , เขา , มาก , ผู้ , บอก , พี่ , ดู , เมื่อ , วัน , อะไร , เรื่อง , ถ้า , ดี , เพราะ , อยาก , ค่ะ , ไม่ได้ , ปี , อีก , เพื่อ , พระ , รัก , นั้น , ตัว , ถึง , งาน , สามารถ , หน้า , เวลา , ใคร , ไทย , เพลง , แบบ , ซึ่ง , ไว้ , ขอ , ส่ง , ต่อ , ความ , ท่าน , อย่าง , ใหม่ , เล่น , ก่อน , หา , บ้าน , ตาม , ทาง , สำหรับ , หนึ่ง , เอา , เค้า , คะ , ทำให้ , ขึ้น , ไม่มี , อ่าน , บาท , ราย , ชื่อ

ที่ was the most common by a wide margin (about twice the count of และ), whereas all the others were relatively close.
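
For anyone who wants to try a count like this on their own text, here is a minimal sketch in Python. It is not the code behind the list above; the function name `top_words` and the toy input are made up for illustration. It also assumes the text has already been split into words, which for Thai is the hard part, since words are not separated by spaces.

```python
from collections import Counter


def top_words(tokens, n):
    """Return the n most frequent tokens as (token, count) pairs."""
    return Counter(tokens).most_common(n)


# Toy input that is already split into words. Real Thai text would
# first need a word segmenter, which is not shown here.
sample = ["ที่", "และ", "ที่", "จะ", "ที่", "และ", "การ"]

for word, count in top_words(sample, 3):
    print(word, count)
```

The ranking depends entirely on how the text is segmented and on what kind of text goes in, so lists built from different sources will not agree exactly.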
In the Rai!
post 2006-12-16 18:02:11
Post #11


Senior Member
****

Group: Advanced Members
Posts: 676
Joined: 2006-06-14
From: ไม่มีที่อยู่ คนจรจัด
Member No.: 31,054



QUOTE(mike_l @ 2006-12-16 16:07:22) *
QUOTE
Grover, the best I can do is I have a list of the 1000 most common words according to four sources of language corpora. I've attached a spreadsheet that I converted to HTML.

The best one is the Mary Haas list. Not sure about Haas, but the other three I know are all computed automatically, so the digits 0 to 9, among other things, count as "words" in their list, as well as some other things that aren't common Thai at all, but appear frequently in their corpora because of a large number of technical texts.

Hope this is helpful.


This is interesting, thanks Rikker. I did a similar thing a while back using all the text that people paste into thai2english.com, and for comparison the top 100 results in order were :

ที่ , และ , จะ , การ , มี , ใน , ได้ , ของ , เป็น , ให้ , ไป , ก็ , ไม่ , ว่า , แล้ว , มา , กับ , คุณ , ใจ , คน , เรา , ฉัน , แต่ , นะ , นี้ , ครับ , อยู่ , เธอ , กัน , ผม , โดย , มัน , จาก , ต้อง , ด้วย , เลย , ยัง , หรือ , ทำ , ใช้ , คือ , เขา , มาก , ผู้ , บอก , พี่ , ดู , เมื่อ , วัน , อะไร , เรื่อง , ถ้า , ดี , เพราะ , อยาก , ค่ะ , ไม่ได้ , ปี , อีก , เพื่อ , พระ , รัก , นั้น , ตัว , ถึง , งาน , สามารถ , หน้า , เวลา , ใคร , ไทย , เพลง , แบบ , ซึ่ง , ไว้ , ขอ , ส่ง , ต่อ , ความ , ท่าน , อย่าง , ใหม่ , เล่น , ก่อน , หา , บ้าน , ตาม , ทาง , สำหรับ , หนึ่ง , เอา , เค้า , คะ , ทำให้ , ขึ้น , ไม่มี , อ่าน , บาท , ราย , ชื่อ

ที่ was the most common by miles (about twice the count of และ), whereas all the others were relatively close. jap.gif



Mike, that's interesting to see how commonly used these are.

Cheers
Rikker
post 2006-12-17 13:07:28
Post #12


Super Member
*****

Group: Global Moderators
Posts: 1,261
Joined: 2006-02-18
From: Bangkok
Member No.: 27,106



QUOTE(wasabi @ 2006-12-16 15:15:21) *
How should I be reading this list? What do the column headers Haas, Links, Orchid and Tax represent?
What do the numbers mean and why does each list have different numbers?

For example the first row, why do three lists have การ but one has เป็น?

Haas Links Orchid Tax
366 เป็น 15978 การ 11888 การ 9861 การ

I've put it in a slightly cleaner Excel Format attached here.


Thanks for doing that. My original is in Excel; I just wanted to make sure everyone could access it.

The four columns are four different text collections (corpora): one from Mary Haas; one from NECTEC's Linguistics and Knowledge Science Laboratory (LINKS); Chula University's Orchid Corpus (which appears to be offline right now); and the one labeled Tax. I'm not clear on the exact source of Tax, but I think it might be the Thai tax code or a corpus of legal documents of some kind, given the high frequency of tax-related terms in its top 1000 words.

The number next to each word is the number of times that word appears in the corpus. The number at the top of each column is the total number of occurrences of that column's top 1000 words.

As for why the lists have different words in the top spots, that has to do with at least three things: [a] the size of the corpus, [b] the variety (or lack of it) of the subject matter collected in the corpus, [c] the method used to count occurrences.

The line you've quoted is the top word in each of the four corpora. You can see that Haas is a much smaller corpus, with its top word occurring only 366 times. The other three, all much larger, agree that การ is more common. Orchid is the largest at 416,000, but I don't know what constitutes a "word" for the purposes of counting in the Orchid corpus. English "words" don't correspond to the strings of letters between spaces as neatly as we tend to think, but spaces do give a clear working definition of "word" for gathering corpora, and one that is easy to count automatically. Thai is a bit trickier. I know the corpora on thai.sealang.net are all measured by number of characters, not words.

Also, one telltale sign that Tax is a very narrow corpus in subject matter is that while it is 269,000 words large, it has only 2100 distinct words, whereas even Haas has 4000 distinct words out of 27,000 total words.
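
As a rough check on those figures, the share of distinct words in each corpus can be worked out directly: about 0.8% for Tax versus about 14.8% for Haas. A minimal sketch in Python follows; the function name `distinct_ratio` is made up for illustration, and the counts are the rounded ones quoted above, not exact corpus statistics.

```python
def distinct_ratio(distinct_words, total_words):
    """Fraction of the corpus made up of distinct words (types / tokens)."""
    return distinct_words / total_words


# (distinct words, total words), using the rounded figures from this post.
corpora = {
    "Tax": (2100, 269000),
    "Haas": (4000, 27000),
}

for name, (distinct, total) in corpora.items():
    print(f"{name}: {distinct_ratio(distinct, total):.1%} distinct")
```

This ratio tends to fall as a corpus grows, so it is only a rough indicator of how narrow the subject matter is, but a gap this large is hard to explain by size alone.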

This post has been edited by Rikker: 2006-12-17 13:28:30
wasabi
post 2006-12-17 14:56:13
Post #13


Super Member
*****

Group: Advanced Members
Posts: 1,304
Joined: 2004-09-20
Member No.: 13,058



Thanks for the detailed reply Rikker,

Where are you getting the figures of 2100 for Tax and 4000 for Haas? I see each list having 1000 words. And can you further define what you mean by corpus? Is this some underlying body of work the statistics are based on? What is this body of work for each list?
meadish_sweetbal...
post 2006-12-17 18:47:13
Post #14


The Hubgoblin
*******

Group: Global Moderators
Posts: 9,369
Joined: 2003-08-15
Member No.: 3,636



About the word corpus as used in linguistics: http://en.wikipedia.org/wiki/Text_corpus
WilliamCave
post 2006-12-17 19:00:48
Post #15


Senior Member
****

Group: Members
Posts: 312
Joined: 2004-10-01
From: Canada/Thailand
Member No.: 13,335



Thanks for the list. It looks very good, but where do I get the translations for the words?
QUOTE(meadish_sweetball @ 2006-12-16 13:10:20) *
You rule Rikker. Fantastic resource! jap.gif
meadish_sweetbal...
post 2006-12-17 19:06:42
Post #16


The Hubgoblin
*******

Group: Global Moderators
Posts: 9,369
Joined: 2003-08-15
Member No.: 3,636



QUOTE
Where do I get the translation for the words?


Paste them into www.thai2english.com or buy a dictionary. The process of looking words up is good for the memorization process.

The most common words have many different functions. เป็น, for example, often just means 'be' or 'is', as in

ผมเป็นหมอ I am a doctor,
...but it can also be a grammatical function word indicating result or manner:
หั่นเนื้อเป็นชิ้น Cut beef into slices.
เป็นเวลาสองวัน ...for two days
...as well as ability:
เล่นกีตาร์ไม่เป็น cannot play guitar...
No beleeeeve!
post 2006-12-23 22:22:15
Post #17


Senior Member
****

Group: Members
Posts: 119
Joined: 2005-01-13
Member No.: 15,916



QUOTE(Grover @ 2006-12-14 13:48:23) *
In english they have a list like this - like the one for the Oxford Advanced learners dictionary, which has a careful selection of common use words. I've been searching for a list like this in Thai for years but never found one. Any pointers would be appreciated.


Not sure if this would help, but there's a great vocabulary builder from a company called Unforgettable Languages that uses easy memory aids for commonly used words. This is a great addition to your language learning IMHO. It is an easy way to pick up, in this case, about 230 commonly used words. I used it for Thai and Mandarin.

It can be found at: www.unforgettablelanguages.com
boxig
post 2006-12-24 00:54:50
Post #18


Senior Member
****

Group: Banned
Posts: 110
Joined: 2004-11-03
Member No.: 14,090



QUOTE(johnh101 @ 2006-12-14 21:46:33) *
Sadly there is no phonetical translations.

Check this page: Translation and Phonetic

box
Grover
post 2006-12-24 04:16:52
Post #19


Star Member
*****

Group: Advanced Members
Posts: 1,473
Joined: 2006-08-31
Member No.: 33,919



QUOTE(No beleeeeve! @ 2006-12-23 22:22:15) *
QUOTE(Grover @ 2006-12-14 13:48:23) *
In english they have a list like this - like the one for the Oxford Advanced learners dictionary, which has a careful selection of common use words. I've been searching for a list like this in Thai for years but never found one. Any pointers would be appreciated.


Not sure if this would help, but there's a great vocabulary builder from a company called Unforgettable Languages that uses easy memory aids for commonly used words. This is a great addition to your language learning IMHO. It is an easy way to pick up, in this case, about 230 commonly used words. I used it for Thai and Mandarin.

It can be found at: www.unforgettablelanguages.com


That is a good link indeed, and a good system for vocab building.
E.g. imagine a fat GUY (gai) eating a large chicken, and so on.

This post has been edited by Grover: 2006-12-24 04:43:08
DJPogo98
post 2009-01-05 15:01:59
Post #20


Member
**

Group: Members
Posts: 11
Joined: 2007-06-06
From: Chiangrai
Member No.: 47,199



I realize this thread is getting old, but this has been really helpful to me. Why waste time learning a whole dictionary right away? Start with the 100 most common words and work up to the 1000 most common then perhaps 2500. That's a good vocabulary!

Using Rikker's lists along with thai2eng, and some others on ThailandQA.com, I have massaged the data to try to find consensus, or at least trends. I will try to attach the spreadsheets I am using here, but I really don't spend much time using forums, so I may not succeed.

Two spreadsheets. All the lists were included, sorted, duplicates removed, then trimmed. Frequency table provided showing the degree of correlation between the lists.

As Rikker kindly pointed out, the Tax list seems to contain tax-related terms. Numbers and special characters were deleted.
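
For readers who would rather script this kind of merge than do it in a spreadsheet, the steps could be sketched in Python roughly as follows. This is only an illustration of the general idea, not how the attached spreadsheets were built; the function names and the three toy lists are made up.

```python
from collections import Counter


def is_word(token):
    """Keep tokens that contain at least one letter, so bare digits
    and special characters are dropped."""
    return any(ch.isalpha() for ch in token)


def agreement(lists):
    """Count, for each word, how many of the lists contain it."""
    counts = Counter()
    for words in lists:
        counts.update(set(words))  # set() so a word counts once per list
    return counts


# Toy stand-ins for three frequency lists (hypothetical contents).
haas = ["เป็น", "ที่", "การ"]
links = ["การ", "ที่", "1"]
tax = ["การ", "ภาษี", "%"]

cleaned = [[w for w in lst if is_word(w)] for lst in (haas, links, tax)]
counts = agreement(cleaned)

# Words found in the most lists come first.
for word, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(word, n)
```

Words that show up in every list are the safest bets for a core vocabulary, while words found in only one list often reflect that corpus's subject matter.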

All errors are mine alone and suggestions and corrections are gratefully accepted.

Happy new year.
Attached File(s)
Attached File  100_Most_Common___Combined.xls ( 175K ) Number of downloads: 416
Attached File  1000_Most_Common___Combined.xls ( 1.03MB ) Number of downloads: 367
 
desi
post 2009-01-06 09:14:23
Post #21


Super Member
*****

Group: Advanced Members
Posts: 1,147
Joined: 2005-01-13
From: Bangkok
Member No.: 15,902



QUOTE (DJPogo98 @ 2009-01-05 15:01:59) *
Two spreadsheets. All the lists were included, sorted, duplicates removed, then trimmed. Frequency table provided showing the degree of correlation between the lists.


Nice. I went at it from a slightly different angle. Also, I'm going for 3000 since (I might be wrong) 1000 doesn't have enough word combos. I didn't grab the whole frequency list as you did, only Mary's. Then I added from AUA, Byki (not all), AWL, Thai-language.com starred, etc. Mike from Thai2English.com also has a frequency list that will come in at some point. Then there's a dictionary with supposedly the top 3000, but I found what I believe are 3 mistakes in just the first couple of pages, so I backed off from seriously checking it against mine.

My eventual aim is to pair each word with phrases, as words on their own don't work with the way I learn. Then, when I get to a certain point, I'll have someone in the know look it over, as there are sure to be a ton of iffy words. But right now I'm just nibbling away and enjoying finding new words as I go.
mynextgig
post 2009-01-06 13:12:51
Post #22


Advanced Member
***

Group: Members
Posts: 94
Joined: 2008-05-12
Member No.: 61,971



A question for the advanced members: how well do you think this list relates to spoken Thai, as opposed to the written Thai that provides the source?

I would say that ก็ has to be number 1 in spoken Thai, surely!
meadish_sweetbal...
post 2009-01-06 14:18:11
Post #23


The Hubgoblin
*******

Group: Global Moderators
Posts: 9,369
Joined: 2003-08-15
Member No.: 3,636



I'd forgotten about pinning this topic. Great that you brought it back again.

QUOTE
My eventual aim is to put each with phrases as words on their own don't work with the way I learn.


It's not just you. It won't work for anyone who wants to speak anything resembling intelligible Thai. The example I posted above of how เป็น is used is just a brief introduction to the word and could be extended, and the same thing could [should!] be done for most of the most common words.

In other words, these words can have completely different functions depending on the context.

If one doesn't learn grammatical patterns as well as idioms too, the words by themselves, with just one translation in English and no usage examples, won't do you much more good than getting 50 tons of bricks and mortar and an order to reconstruct Wat Benjamabophit, Suvarnabhumi airport, or Baiyoke 2.

QUOTE
A question for the advanced members, how well do you think this list relates to spoken Thai as opposed to the written Thai that provides the source ?

I would say that ก็ has to be number 1 in terms of spoken Thai surely !


For one, I think you'll find many more particles, especially if you properly distinguish between ครับ อะ ฮะ ค่ะ จ้ะ วะ หว่า etc. You're right about ก็ too: it's a hesitation word.
klons
post 2009-01-31 12:12:54
Post #24


Senior Member
****

Group: Members
Posts: 177
Joined: 2006-04-04
From: เเฮปี้เพลส
Member No.: 28,513



This list has 1172 common words.
http://www.thai-language.com/ref/starred
smallear
post 2009-03-08 23:03:03
Post #25


Newbie
*

Group: Members
Posts: 1
Joined: 2009-03-08
Member No.: 78,616



Thank you very, very, very much!