Learning vocabulary in a new language

Saturday, March 29, 2008

There's a really good study on how much vocabulary a person has to know in a language to understand most of it, and this page should be bookmarked by all language students:


English is the sample language here but the point is the same for other languages as well. Take a look at how much of a language's written material you can understand simply by learning the most frequent words first:

Table 1: Coverage and Standard Deviation with Varying Vocabulary Size
[Text Length = 1,000 / Sample Size = 4 / Iteration = 1,000]

Vocabulary size   Coverage (%)   SD
100               53.1           1.60
200               60.1           1.63
300               63.9           1.67
400               66.8           1.69
500               69.4           1.68
600               71.2           1.68
700               72.9           1.60
800               74.2           1.66
900               75.5           1.62
1,000             76.8           1.61
2,000             84.2           1.35
3,000             87.9           1.23
4,000             90.4           1.08
5,000             92.0           1.00
6,000             93.1           0.87
7,000             94.0           0.77
8,000             94.7           0.77
9,000             95.2           0.69
10,000            95.7           0.72
11,000            96.0           0.61
12,000            96.3           0.58
13,000            96.6           0.55
14,000            96.9           0.51

That means that with the 100 most frequent words you can already read about half of the words on a page, with 900 words about 75%, and once you've gone past 4,000 words about 90%. There's a certain point you get to in a language where you start to be able to grasp the meaning of words you don't know simply from context. I don't remember at what level this begins, but I think it comes after a few thousand words. Once a person reaches this level, the only way to get fluent is through massive amounts of material, which means just chilling and reading for hours and hours a day.
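For anyone who wants to try the idea on their own material, here is a minimal sketch of how a coverage figure like the ones in the table could be computed. It is not the study's actual method (the study sampled texts repeatedly, which is where the SD column comes from); the function names `tokenize` and `coverage` and the toy sentences are my own assumptions for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(reference_tokens, target_tokens, vocab_size):
    """Percent of target tokens found among the vocab_size most frequent reference words."""
    counts = Counter(reference_tokens)
    known = {word for word, _ in counts.most_common(vocab_size)}
    if not target_tokens:
        return 0.0
    hits = sum(1 for token in target_tokens if token in known)
    return 100.0 * hits / len(target_tokens)


# Toy data, invented for illustration only (hypothetical, not from the study).
reference = tokenize("the cat sat on the mat and the dog sat on the rug")
target = tokenize("the dog sat on the mat")

for size in (1, 3, 8):
    print(size, round(coverage(reference, target, size), 1))
```

With a real frequency list and a real text in place of the toy sentences, the same loop would show how quickly coverage climbs at first and how slowly it climbs later.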

See the Wikipedia page on collocation for more on this subject. Only reading and hearing reams and reams of material will make you aware that a person is tall but mountains are high, that coffee is weak as opposed to thick, and how to use all the other words that might seem acceptable to a non-native speaker but really are not.

This is one reason why I like Ecclesiastes. It has about 3,000 words in total and some 950 or so distinct words to know once you understand the whole thing, plus lots of repetition and context. Going by the table above, anyone who has memorized the whole thing has a vocabulary of the size that covers about 75% of the language, at least to the extent that those 950 words are among the most frequent ones. Not bad for a single book.

(Of course, that requires full memorization, not just skimming through a few times. That's the hard part.)
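The two numbers for Ecclesiastes are a count of total words and a count of distinct words. Here is a rough sketch of how one might get both for any text; the function name `word_stats` is hypothetical and the sample line is just a short stand-in for a full book. On the post's rough figures, 3,000 divided by 950 comes to a little over three occurrences per distinct word on average.

```python
import re


def word_stats(text):
    """Return (total word count, distinct word count) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(tokens), len(set(tokens))


# Short stand-in sample; a real run would read the whole book from a file.
sample = "Vanity of vanities; all is vanity."
total, distinct = word_stats(sample)
print(total, distinct)

# Average number of times each distinct word appears (guarding against empty input).
repetition = total / distinct if distinct else 0.0
print(round(repetition, 2))
```

Note that this counts every surface form separately, so "vanity" and "vanities" are two distinct words here; grouping inflected forms together would need a language-specific step.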


Unknown said...

I have added the link you provided to a Wikipedia article:

Me said...

Cool, that's a good link. Bob Petry first provided it on Auxlang a few months back and I found it again the other day by chance.

