Testing out Google's translation service from Turkish to English using "Dünyanın en yaşlı insanı Özbekistan'da"

Tuesday, February 03, 2009

Karakalpakça, Türk Lehçelerinin Kıpçak grubuna girmektedir. Kazak ve Nogay lehçelerine de yakındır. Bunlar da yazı dilinde önce Arap alfabesini, 1928-1940 arası Latin alfabesini ve daha sonra Kiril alfabesini kullanmışlardır. <-- Google translates this as "Karakalpakça, Kıpçak group are of Turkic Languages. And is close to the Kazakh NOGAY dialects. This is the first written language of Arabic alphabet, between 1928-1940 and then the Latin alphabet Cyrillic alphabet was used."

Half a day has passed since this post, and Google's Turkish translation service has finally reached Korea, where I live. Now it's time to see how accurate it is. I would expect Turkish to fall somewhere between Korean and a language like French in terms of accuracy: though its grammar is quite different from English, it still has a lot of words that correspond directly to English ones (ekonomi for economy, vs. 경제 - gyeongje - in Korean, is one example), and there's no need to worry about how to translate the varying levels of politeness you find in Korean.

So let's go with this article about the oldest person in the world being in Uzbekistan. I'll keep my translation on the more literal side.


Each sentence below is given as the Turkish original, Google's translation, my translation, and any notes.

Turkish: Dünyanın en yaşlı insanı Özbekistan'da
Google's translation: The world's most elderly people in Uzbekistan
My translation: The world's oldest person is in Uzbekistan

Turkish: Özbekistan nüfus müdürlüğünün yaptığı açıklamaya göre dünyanın en yaşlı insanı özbekistan'da yaşıyor.
Google's translation: According to the directorate that the population of Uzbekistan the world's most elderly people are living in Uzbekistan.
My translation: According to a statement by Uzbekistan's population directorate, the oldest person in the world lives in Uzbekistan.
Notes: I tried a few other combinations, and Google doesn't recognize the word insan as singular unless you put 'bir' in front of it.

Turkish: Özbekistan nüfus müdürlüğü Özbekistan'ın Karakalpakistan Özerk Bölgesi'nin Törtkül ilçesinde yaşayan 1880 doğumlu Tuti Yusupova dünyanın en yaşlı insanı olduğunu açıkladı.
Google's translation: The office population of Uzbekistan Uzbekistan Karakalpakistan Autonomous Region, born in 1880 who live in districts Törtkül the world's most elderly people Tuti Yusupova is announced
My translation: Uzbekistan's population directorate announced that Tuti Yusupova, born in 1880 and living in the district of Törtkül in the Autonomous Region of Karakalpakistan in Uzbekistan, is the oldest person in the world.
Notes: This is where it gets difficult - Google doesn't recognize the key part of the sentence: "Özbekistan nüfus müdürlüğü ... (info here) açıkladı", or "Uzbekistan's population directorate announced (something)". If you change it to "Özbekistan'ın nüfus müdürlüğü açıkladı" then it does well, giving "Uzbekistan's population office announced", but it doesn't recognize it without the 'ın' after Özbekistan.

Turkish: Özbekistanlı yaşlı kadın 1 Temmuz 2009 yılında ise 130 yaşından gün almaya başlayacak
Google's translation: 1 July 2009 in Uzbekistan with the elderly woman started to take 130 years to the day
My translation: On 1 July 2009 the old Uzbek woman will begin her 130th year.
Notes: Seems to confuse Özbekistanlı (Uzbek - add li/lı etc. to a country and you have a demonym) with Özbekistanla (with Uzbekistan).

Turkish: Yuspova'ya emeği ve çalışkan görevleri için daha önceki yıllarda "Şeref" madalyası ile ödüllendiren hükümet, şimdi onun adını dünya rekorlar kitabına yazdırmak istiyor.
Google's translation: For tasks and hardworking labor Yuspova'ya in previous years "Honor" medal to reward the government, it wants to print the name of the book world records.
My translation: The government, which presented Yuspova with a medal of "honour" in previous years for her tasks and industrious work, now wants to have her name written in the book of world records.
Notes: The key part of the sentence here is "hükümet yazdırmak istiyor" - "the government wants to write in (the) book", and everything leading up to the word "hükümet" is explaining the government's previous actions. If I change the sentence to "Hükümet, şimdi kitabına yazdırmak istiyor" then it's a bit better with "The government now wants to print the book". It wouldn't be "print" though - yazdırmak here would mean that they want others to write her name in the book (they don't print the book themselves).

Turkish: Görme ve konuşmada bir sıkıntısı çekmeyen Yusupova, kulakları ağır işittiyor.
Google's translation: In speaking of a problem and you do not see Yusupova, ear heavy işittiyor.
My translation: Yusupova, who has no difficulties with seeing and talking, is hard of hearing.
Notes: This is probably the worst part of the translation. Ağır işitmek means to be hard of hearing, but Google literally translates ağır as heavy and doesn't even recognize the verb to hear (though the article's spelling işittiyor, where işitiyor would be expected, may be partly to blame).

Turkish: Uzun ömürlü olmasının nedenini helal rızka bağlayan Yuspova, mutlu hayatının Allah'ın verdiği hediyesi olarak yorumluyor.
Google's translation: Linking the cause of longevity rızka helal Yuspova, happy life as the gift of God that is interpreted.
My translation: Linking the reason for her longevity to "honest bread", Yuspova interprets her happy life as a gift from God/Allah.
Notes: Rızk means daily bread and helal refers to something that is lawful or honestly earned, so she's talking about the bread earned from an honest day's work.

Turkish: 17 yaşında evlenen ve bir kız bir erkek çocuğunu büyüten Tuti Yuspova'nın 100'den fazla torunları var.
Google's translation: 17 years old, married and raised a daughter to a boy more than 100 grandchildren have Tuti Yuspova'nın.
My translation: Tuti Yuspova, who married at 17 and raised a daughter and a son, has over 100 grandchildren.
Notes: She raised a daughter to a boy, hm?
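The note on Özbekistanlı above mentions adding li/lı etc. to a country name to get a demonym. A minimal Python sketch of that rule, assuming plain fourfold vowel harmony (the suffix vowel follows the last vowel of the name) and ignoring irregular demonyms and loanword exceptions; the function name `demonym` and the lookup tables are made up for illustration:

```python
# Suffix chosen by the last vowel of the place name (fourfold vowel harmony).
SUFFIX_BY_LAST_VOWEL = {
    "a": "lı", "ı": "lı",
    "e": "li", "i": "li",
    "o": "lu", "u": "lu",
    "ö": "lü", "ü": "lü",
}

# Turkish capitals mapped by hand, because I lowercases to ı and İ to i,
# which Python's default str.lower() does not do.
UPPER_TO_LOWER = {
    "A": "a", "I": "ı", "E": "e", "İ": "i",
    "O": "o", "U": "u", "Ö": "ö", "Ü": "ü",
}


def demonym(place: str) -> str:
    """Append -lı/-li/-lu/-lü to a place name according to its last vowel."""
    for ch in reversed(place):
        vowel = UPPER_TO_LOWER.get(ch, ch)
        if vowel in SUFFIX_BY_LAST_VOWEL:
            return place + SUFFIX_BY_LAST_VOWEL[vowel]
    raise ValueError(f"no Turkish vowel found in {place!r}")


print(demonym("Özbekistan"))  # last vowel is a, so the suffix is lı
print(demonym("Kore"))        # last vowel is e, so the suffix is li
```

The -la/-le ending that Google seems to have read instead ("with") differs from the demonym suffix only in its vowel, which may be why the two are easy to confuse.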



Conclusion: still needs a lot of work but certainly better than nothing and a lot of fun. It can of course be used as a quick dictionary as well by putting periods between words like this:
17 yaşında. evlenen. ve. bir. kız. bir. erkek. çocuğunu. büyüten. fazla. torunları. var.
which gives the result:
17 years old. who marry. and. one. girl. one. male. children. raised. much. and grandchildren. there is.
which is much faster than a dictionary, and can also work better in that it recognizes inflected forms, whereas a dictionary won't be able to recognize something like çocuğunu, because the dictionary form is çocuk and you have to know that before you can look it up.
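The period trick is easy to automate if you want to paste whole sentences. Here is a minimal Python sketch that only builds the period-separated text (it does not call Google or any translation API); the function name `to_lookup_query` is made up for illustration, and keeping a bare number attached to the following word, as in "17 yaşında." above, is my own assumption about how to split things:

```python
def to_lookup_query(sentence: str) -> str:
    """Put a period after each word so every word is translated on its own."""
    units = []
    pending_numbers = []  # bare numbers wait here and join the next word
    for token in sentence.split():
        word = token.strip('.,;:!?"')  # drop punctuation at the edges only
        if not word:
            continue
        if word.isdigit():
            pending_numbers.append(word)
            continue
        units.append(" ".join(pending_numbers + [word]))
        pending_numbers = []
    if pending_numbers:  # sentence ended on a number
        units.append(" ".join(pending_numbers))
    return " ".join(unit + "." for unit in units)


print(to_lookup_query(
    "17 yaşında evlenen ve bir kız bir erkek çocuğunu büyüten fazla torunları var"
))
```

Apostrophes inside a word (as in Yuspova'nın) are left alone, since only the edges of each token are stripped.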

I intend to send in a lot of input to Google to improve the translation service. Anybody know how long it takes for them to incorporate user input?

3 comments:

Anonymous said...

Ah - you live in Korea (the RoK, of course). Tell me - Korean used to be written, like Japanese, in a mixture of Chinese and Korean. But it seems that this is extremely rare nowadays? (I am semi-literate in Chinese and Japanese. I also have some notions of Vietnamese.)

Novparl.

데이빛 / Mithridates said...

Yes, now only older Koreans know how to write hanja (hanzi/kanji, the Chinese characters). In newspapers you can find simple hanja, or hanja used for names (kim = 金, lee = 李 etc.), but if you show hanja to a Korean, 8 people out of 10 don't know how to read the characters.

But recently the study of Chinese has become more popular, and now students have to study them in school. Perhaps in a few years you'll be able to find more adults who can read hanja.

Naturally, knowledge of hanja is thought of as proof of education, and many adults think that a young person who doesn't know them is stupid, so if you want to get married with the permission/approval of the parents, a knowledge of hanja is good.

hkyson said...

Thanks a lot for making this comparison. I find the idea of using periods to do dictionary lookups potentially very helpful, and I'll try it out later on.

I am starting to play with the Google translator by

(1) taking some text in simple English from Wikipedia,

(2) translating it into Spanish,

(3) making further changes to the English until I get a reasonable Spanish translation, and then

(4) translating my revised English text into other languages.

If you are interested in taking a look at what I have come up with, go to my blog, "Interlingua multilingue" at http://www.interlinguamultilingue.blogspot.com.

Best regards!

Harleigh Kyson Jr.
