Estimating German language ability / interest by language using Wikipedia

Monday, March 11, 2013

The following are the results of a quick test that took about 15 minutes, one that I assumed would not work but that actually turned up some pretty interesting data. The test was as follows: count the number of users on a given Wikipedia who claim some level of German proficiency via user tags, then compare that number with the total number of users for a rough gauge of how much interest in / ability with German there is among speakers of that language. Some Wikipedias, like Czech, simply did not seem to use these user tags, but most did, and the results came out in line with what one might expect:

  • Frisian: 13,930 users, 151 German (one per 92 users)
  • Polish: 544,192 users, 3711 German (one per 147 users)
  • Danish: 182,113 users, 1064 German (one per 171 users)
  • Dutch: 517,111 users, 2896 German (one per 178 users)
  • Afrikaans: 49,328 users, 263 German (one per 187 users)
  • Slovenian: 107,074 users, 527 German (one per 203 users)
  • Catalan: 139,175 users, 663 German (one per 209 users)
  • Norwegian: 264,415 users, 1180 German (one per 224 users)
  • Bulgarian: 131,623 users, 535 German (one per 246 users)
  • French: 1,509,780 users, 5846 German (one per 258 users)
  • Russian: 1,011,704 users, 2712 German (one per 373 users)
  • Korean: 219,443 users, 390 German (one per 562 users)
  • Japanese: 713,836 users, 1026 German (one per 696 users)
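
For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the calculation behind the list above. The counts are copied from the list; the names `counts` and `users_per_german_user` are just illustrative, and nothing here fetches data from Wikipedia. Rounded results can differ by one from a few of the figures listed, since some of those appear to have been rounded down.

```python
# Counts copied from the list above:
# language -> (total registered users, users tagged with German proficiency)
counts = {
    "Frisian": (13_930, 151),
    "Polish": (544_192, 3_711),
    "Danish": (182_113, 1_064),
    "Dutch": (517_111, 2_896),
    "Afrikaans": (49_328, 263),
    "Slovenian": (107_074, 527),
    "Catalan": (139_175, 663),
    "Norwegian": (264_415, 1_180),
    "Bulgarian": (131_623, 535),
    "French": (1_509_780, 5_846),
    "Russian": (1_011_704, 2_712),
    "Korean": (219_443, 390),
    "Japanese": (713_836, 1_026),
}


def users_per_german_user(total_users, german_users):
    """Return how many registered users there are per German-tagged user."""
    return total_users / german_users


# Lower ratio = a larger share of users claiming some German.
ranked = sorted(
    counts.items(),
    key=lambda item: users_per_german_user(*item[1]),
)

for language, (total_users, german_users) in ranked:
    ratio = users_per_german_user(total_users, german_users)
    print(f"{language}: one per {ratio:.0f} users")
```

Sorting by the ratio reproduces the order of the list, with Frisian at the top and Japanese at the bottom.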

Frisian is a bit small, so its numbers could be biased (German admins helping to run it, for example, or Dutch speakers who are interested in languages and know some Frisian along with a good amount of German), but the larger ones seem about right. Korean and Japanese are right at the bottom, Russian has far fewer German speakers per user than Polish, whose home country borders Germany, Slovenian shows quite a few, and just about every Germanic Wikipedia shows one German speaker per 200 users or fewer (Norwegian, at one per 224, is the exception).


