The real list of Wikipedias

Thursday, February 26, 2009

The main list of Wikipedias in all languages can be seen here. This list automatically sorts the various Wikipedias by number of articles, after first dividing them into tiers: 1) over a million, 2) over 100,000, 3) over 10,000, 4) over 1,000, and so on. Thanks to the Volapük Wikipedia, though, it's been shown that the true worth of a Wikipedia does not always come from the number of articles, but often from the number of edits. One reason for this is that different Wikipedias have different criteria for what makes an article: many of them will merge several related pages into one large page where another Wikipedia would prefer to split them into smaller ones. First of all, here's the list by article count for those above 100,000:

(ignoring English which is at #1 with a few bazillion articles)

Rank Language Language (local) Articles
2 German Deutsch 871,777
3 French Français 771,020
4 Polish Polski 582,772
5 Japanese 日本語 565,807
6 Italian Italiano 544,834
7 Dutch Nederlands 521,965
8 Portuguese Português 462,176
9 Spanish Español 448,483
10 Russian Русский 363,562
11 Swedish Svenska 307,663
12 Chinese 中文 235,558
13 Norwegian (Bokmål) Norsk (Bokmål) 209,082
14 Finnish Suomi 195,145
15 Catalan Català 161,975
16 Ukrainian Українська 139,193
17 Turkish Türkçe 124,946
18 Romanian Română 121,754
19 Czech Čeština 120,387
20 Hungarian Magyar 119,588
21 Volapük Volapük 118,759
22 Esperanto Esperanto 111,257
23 Slovak Slovenčina 105,729
24 Danish Dansk 103,774
25 Indonesian Bahasa Indonesia 100,349

In some cases this ordering is accurate: German and French are at the top, other fairly large languages sit in the middle, and languages with around 10 million speakers land somewhere just above 100,000 articles. However, some Wikipedias there have an extraordinarily high number of edits relative to their article count (Spanish), while others have an extremely low number (Volapük). Here's the same list sorted by number of edits (the number on the left is still each Wikipedia's rank by article count):

Rank (by articles) Language Language (local) Edits
2 German Deutsch 59,249,161
3 French Français 40,855,127
9 Spanish Español 25,862,703
5 Japanese 日本語 25,047,937
6 Italian Italiano 24,525,940
4 Polish Polski 16,838,187
7 Dutch Nederlands 16,133,979
10 Russian Русский 14,495,738
8 Portuguese Português 14,484,935
12 Chinese 中文 9,500,869
11 Swedish Svenska 8,757,938
14 Finnish Suomi 6,474,549
17 Turkish Türkçe 5,426,589
13 Norwegian (Bokmål) Norsk (Bokmål) 5,371,832
20 Hungarian Magyar 5,171,646
19 Czech Čeština 3,742,312
15 Catalan Català 3,385,668
24 Danish Dansk 3,004,581
18 Romanian Română 2,849,603
25 Indonesian Bahasa Indonesia 2,385,576
16 Ukrainian Українська 2,311,234
23 Slovak Slovenčina 2,159,140
22 Esperanto Esperanto 2,139,079
21 Volapük Volapük 1,745,080

All of a sudden Spanish rockets up from ninth to fourth place, and Volapük drops to the bottom.
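For anyone who wants to redo the re-sort, here is a minimal Python sketch. The figures are copied from the two tables above; the names `WIKIS`, `by_edits` and `edits_per_article` are made up for this illustration, and numbering the output from 2 assumes that English also holds first place by edits, which the tables don't actually show.

```python
# (language, articles, edits), copied from the two tables above.
# English is left out, as in the post.
WIKIS = [
    ("German", 871_777, 59_249_161),
    ("French", 771_020, 40_855_127),
    ("Polish", 582_772, 16_838_187),
    ("Japanese", 565_807, 25_047_937),
    ("Italian", 544_834, 24_525_940),
    ("Dutch", 521_965, 16_133_979),
    ("Portuguese", 462_176, 14_484_935),
    ("Spanish", 448_483, 25_862_703),
    ("Russian", 363_562, 14_495_738),
    ("Swedish", 307_663, 8_757_938),
    ("Chinese", 235_558, 9_500_869),
    ("Norwegian (Bokmål)", 209_082, 5_371_832),
    ("Finnish", 195_145, 6_474_549),
    ("Catalan", 161_975, 3_385_668),
    ("Ukrainian", 139_193, 2_311_234),
    ("Turkish", 124_946, 5_426_589),
    ("Romanian", 121_754, 2_849_603),
    ("Czech", 120_387, 3_742_312),
    ("Hungarian", 119_588, 5_171_646),
    ("Volapük", 118_759, 1_745_080),
    ("Esperanto", 111_257, 2_139_079),
    ("Slovak", 105_729, 2_159_140),
    ("Danish", 103_774, 3_004_581),
    ("Indonesian", 100_349, 2_385_576),
]


def by_edits(wikis):
    """Return the rows sorted by edit count, largest first."""
    return sorted(wikis, key=lambda row: row[2], reverse=True)


def edits_per_article(row):
    """Average number of edits per article for one row."""
    _, articles, edits = row
    return edits / articles


if __name__ == "__main__":
    # Start counting at 2 on the assumption that English is #1 by edits too.
    for place, row in enumerate(by_edits(WIKIS), start=2):
        name, _, edits = row
        print(f"{place:>2} {name:<20} {edits:>12,} {edits_per_article(row):6.1f}")
```

The last column is simply edits divided by articles, which is a rough way of seeing how much attention the average article on each Wikipedia has received.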

This part of the chart only shows languages with at least 100,000 articles, though, and quite a few Wikipedias with fewer than 100,000 articles have more edits than some of the ones above. These are:

Rank (by articles) Language Language (local) Edits
28 Hebrew עברית 6,931,960
27 Arabic العربية 3,866,850
26 Korean 한국어 3,093,020
33 Bulgarian Български 2,414,382
32 Serbian Српски / Srpski 2,269,694
35 Persian فارسی 1,969,766
30 Vietnamese Tiếng Việt 1,859,050
31 Slovenian Slovenščina 1,751,257

Hebrew really stands out here: with almost 7 million edits it would slot in just after Swedish (11th by article count) and ahead of Finnish.
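To see where the smaller Wikipedias land, the two edit tables can be merged and sorted together. This is only a sketch using the edit counts listed above; `OVER_100K`, `UNDER_100K` and `merged_by_edits` are names invented for the example, and English is again left out.

```python
# Edit counts copied from the two edit tables above (English omitted).
OVER_100K = {
    "German": 59_249_161, "French": 40_855_127, "Spanish": 25_862_703,
    "Japanese": 25_047_937, "Italian": 24_525_940, "Polish": 16_838_187,
    "Dutch": 16_133_979, "Russian": 14_495_738, "Portuguese": 14_484_935,
    "Chinese": 9_500_869, "Swedish": 8_757_938, "Finnish": 6_474_549,
    "Turkish": 5_426_589, "Norwegian (Bokmål)": 5_371_832,
    "Hungarian": 5_171_646, "Czech": 3_742_312, "Catalan": 3_385_668,
    "Danish": 3_004_581, "Romanian": 2_849_603, "Indonesian": 2_385_576,
    "Ukrainian": 2_311_234, "Slovak": 2_159_140, "Esperanto": 2_139_079,
    "Volapük": 1_745_080,
}
UNDER_100K = {
    "Hebrew": 6_931_960, "Arabic": 3_866_850, "Korean": 3_093_020,
    "Bulgarian": 2_414_382, "Serbian": 2_269_694, "Persian": 1_969_766,
    "Vietnamese": 1_859_050, "Slovenian": 1_751_257,
}


def merged_by_edits(*tables):
    """Combine several {language: edits} tables and sort by edits, largest first."""
    combined = {}
    for table in tables:
        combined.update(table)
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, edits in merged_by_edits(OVER_100K, UNDER_100K):
        # Mark the Wikipedias that have fewer than 100,000 articles.
        marker = "*" if name in UNDER_100K else " "
        print(f"{marker} {name:<20} {edits:>12,}")
```

Rows marked with an asterisk are the ones with fewer than 100,000 articles, so it is easy to spot how far up the combined list each of them climbs.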

Finally, Volapük: even though the 100,000+ articles were mostly created by one person using a bot, why is the number of edits still relatively high? If the articles were uploaded once and then left alone, shouldn't the number of edits be only slightly larger than the number of articles?

That would be the case except for the presence of interwiki bots, whose job is to keep the links between Wikipedias in every language up to date. If you create an article on your Wikipedia and link it to its counterpart on another one, a bot will eventually come along and update those links whenever someone in yet another language creates an article on the same subject. Add to that the bots that update image links and a variety of other things, and you end up with quite a few edits for even the smallest of articles, none of which add any content noticeable to the reader. That's why you'll see page histories like this:

For an article that in the end only has this much text for the reader to peruse:

Alà dei Sardi binon zif in topäd: Sardegna, in Litaliyän. Alà dei Sardi topon videtü 40° 39’ N e lunetü 9° 20’ L.

Sürfat ela Alà dei Sardi binon mö 188,6 km².

Alà dei Sardi labon belödanis 1 949 (2001).

Bottom line: Spanish and Hebrew are much bigger than you might think, Volapük is not. Oh, and the Afrikaans Wikipedia is a bit of a hidden gem with a high amount of detail (here's their current featured article) in spite of still having under 12,000 articles. There are some fantastic editors over there.


Anonymous said...

I feel you should take a look at the Malayalam Wikipedia as well. It has the 3rd largest page depth after English and Hebrew. You may want to take a look at the long pages - പ്രത്യേകം:LongPages
