Update on article count for Baltic language Wikipedias, December 2009

Tuesday, December 15, 2009

All three Wikipedias from the Baltic countries are about to reach a new milestone, so here's where they are at the moment:

Lithuanian is now at 98,468, so getting quite close to 100,000. Passing 100,000 might not happen for another week or two or three though. The Lithuanian Wikipedia has always seemed to be quite conscious of the article count, and recently they've added a bar as well:

Next is Estonian, which is about to reach 70,000.

And finally Latvian:

Not sure what the deal is with Latvian: just barely on the verge of 25,000 articles is quite low. Even Occitan has over 20,000 and Latin is now at 33,000+. The populations of the respective countries are 3.3 million for Lithuania, 1.3 million for Estonia, and 2.2 million for Latvia, so relative to population Latvian is well behind the other two. Perhaps it's just lack of interest. The Korean Wikipedia spent quite a bit of time at a pretty tiny size for its population, but after passing 30,000 or so it really began to take off. It stands at 122,000+ today, and has now become quite the source of information for a lot of subjects that can't be found anywhere else. Latvian just may not have reached that critical mass yet.
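To put a rough number on that comparison, here's a quick back-of-the-envelope sketch in Python that divides the article counts by the country populations quoted above. A couple of assumptions: the Estonian and Latvian counts are just the round milestones mentioned in this post (70,000 and 25,000), not exact totals, and a country's population is only a loose stand-in for the number of speakers of its language. The function name articles_per_thousand is made up for this example.

```python
# Figures as quoted in this post (December 2009).
# Assumption: the Estonian and Latvian counts are the round milestones
# mentioned in the text, not exact totals.
wikis = {
    "Lithuanian": {"articles": 98_468, "population": 3_300_000},
    "Estonian": {"articles": 70_000, "population": 1_300_000},
    "Latvian": {"articles": 25_000, "population": 2_200_000},
}


def articles_per_thousand(articles: int, population: int) -> float:
    """Return the number of articles per 1,000 inhabitants."""
    return articles / population * 1000


for name, data in wikis.items():
    rate = articles_per_thousand(data["articles"], data["population"])
    print(f"{name}: {rate:.1f} articles per 1,000 people")
```

With these inputs Estonian comes out at roughly 54 articles per 1,000 people, Lithuanian at roughly 30, and Latvian at only about 11, which is the gap the paragraph above is puzzling over.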

While we're on the subject, a bit of commentary on the Baltic languages: everybody knows that Estonian is not a Baltic language even though Estonia is a Baltic country, and this is written in every guidebook to the Baltic states. This is of course true, but it looks at the languages from a historical perspective: Estonian belongs to one family (Finno-Ugric), while Latvian and Lithuanian belong to another (the Baltic branch of Indo-European). But what is often more important to the learner is not whether two languages are historically related (i.e. descendants of a single ancestor language), but whether learning Language A is helpful in learning Language B, that is, whether the knowledge and habits acquired from learning one help when moving on to the other. And from this point of view, the three languages are quite complementary. Here are a few ways in which the three complement each other:

- no articles. Translating from one language without articles into another without them takes less effort than if one has them and the other doesn't.

- use of suffixes. That is, the grammatical action happens at the end of the word. Lithuanian internetas becomes interneto (of the internet), and Estonian internet becomes interneti. Lithuanian ekonomika becomes ekonomikoje (in the economy), and Estonian majandus (economy) becomes majanduses.

- word order is generally SVO, but is still relatively free and can be switched around for fun or emphasis.

- Latvian and Estonian both have the stress on the first syllable (Lithuanian is different here though).

- Estonian has acquired a few hundred loanwords from the Baltic languages, such as ratas (wheel).

- Estonian doesn't have vowel harmony (many other Finno-Ugric languages, such as Finnish and Hungarian, do).

- And of course there is the influence of other neighboring languages, such as Russian and some others, which have also brought in a certain amount of shared vocabulary.

This is a little similar to Turkish and Persian, two languages that also have completely different linguistic ancestors but a fair amount of shared influence, making one helpful in learning the other. Turkish and Persian have much more shared vocabulary than Estonian and Latvian/Lithuanian do, but then again all three Baltic-state languages use the Latin alphabet, whereas Turkish uses the Latin alphabet and Persian the Perso-Arabic script, so the two may just balance each other out in this way.


Anonymous said...

"The Korean Wikipedia [...] has now become quite the source of information for a lot of subjects that can't be found anywhere else."

Just curious, what are some examples?

데이빛 / Mithridates said...

Here's a good example:


Non-spammy information on a currently popular show. One of the characters on that show is a guy that is half-Korean half-French/Canadian (but looks white):


He was born in St. Pierre et Miquelon. Finding information on subjects like this without having to navigate through script-heavy page after page used to be almost impossible.

