
Interestingly, Google detects these words as Greek. I know they are nonsensical and not actually Greek, but I'm wondering if any Greek speakers might be able to provide some insight. Are these gibberish words close to meaningful words? (A clear shot in the dark here.) Maybe a linguist could find more meaning?


As a native Greek, no, they don't make any sense... sort of. My hunch is that they read significantly more like Latin than Greek. However, it tells us something about Google Translate.

The reason "Apoploe vesrreaitais" is detected as Greek is that the first "word" is "phonetically" similar to the word απόπλους, which means sailing/shipping and is rooted in ancient Greek. If we were to write απόπλους using Roman characters, we would write apoplous or apoploi (plural; in Greek, αποπλοΐ). So I think the model understands that the "oe" suffix represents the Greek suffix "οι", which is used for plurals. The rest of the word is rather close phonetically, so there is some model that maps phonetic representations to the correct word.

The other phrase seems to be a combination of words classified as Portuguese, Spanish, Lithuanian, and Luxembourgish.


This is a great response (I also suspected we'd learn something from the Google Translate black box). And I agree with the idea of being closer to Latin gibberish. The phonetic relationships are a great hint to what's actually going on.

My hypothesis is that these models are trained more on Western languages than on others, and thus our latent representation of "language" is going to look like Latin gibberish, due to a combination of how these languages evolved and human bias. ("It's all Greek to me")


I don't think that's how language detection works. They most likely use n-gram frequencies to estimate the probability of each language. It's still detected as Greek if you change it to "Apoulon vesrreaitais", just because it kind of looks the way Greek words look, not because it resembles any specific word.
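
For anyone unfamiliar with the n-gram idea, here is a minimal sketch of it. This is an assumption about how such detectors commonly work, not a description of what Google Translate actually does. The two tiny training samples and every function name are made up for illustration; a real detector would be trained on large corpora for many languages.

```python
import math
from collections import Counter

# Hypothetical toy training text; a real detector would use large corpora.
TOY_SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog",
    "es": "el rapido zorro marron salta sobre el perro perezoso",
}


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram frequency profile (a Counter) per language."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def score(text, profile, vocab_size, n=3):
    """Log-probability of text under a profile, with add-one smoothing."""
    total = sum(profile.values())
    return sum(
        math.log((profile[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text, n)
    )


def detect(text, profiles, n=3):
    """Return the language whose profile gives text the highest score."""
    # Distinct n-grams seen in any language, plus one slot for unseen n-grams.
    vocab_size = len(set().union(*profiles.values())) + 1
    return max(profiles, key=lambda lang: score(text, profiles[lang], vocab_size, n))
```

With these toy profiles, detect("the dog", train(TOY_SAMPLES)) picks "en" and detect("el perro", train(TOY_SAMPLES)) picks "es". Note that the input never has to match a whole word: only the mix of short character sequences matters, which is the point being argued here.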


You are wrong. Had it been that simple, I would __not__ have suggested that, and for whatever reason I find your reply borderline infuriating but I can't pinpoint exactly why that is.

Regardless, here is me, a native speaker, disproving your hypothesis.

I tried the following words in Google Translate: elefantas, ailaifantas, ailaiphantas, elaiphandas, elaiphandac.

The suggested detections are ελέφαντας, αιλαιφάντας, αιλαιφάντας, ελαϊφάντας, ελαϊφάντας; however, the translations are elephant, illuminated, illuminated, elephant, elephant respectively. The first is correct. When mapping the Roman characters back to Greek, there is loss of information. This is seen in the diaeresis above the iota, which changes the pronunciation from ε-like [e] to αϊ [ai̯], and in the stress denoted by the mark above the epsilon (έ).

Notice that all the words have an edit distance of >= 4, a Soundex distance of at most 1, and a Metaphone distance of at most 1 [1]. As I said above, the suggested words are near homophones of the correct word, bar a few minor details.

[1] http://www.ripelacunae.net/projects/levenshtein
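
To make the comparison reproducible, here is a rough sketch of two of the three measures. The assumptions: "edit distance" means plain Levenshtein distance, "Soundex distance" means the number of positions where two standard American Soundex codes differ, and each variant is compared against "elefantas". Metaphone is left out because its rule set is too long for a short example. The function names are my own, and the linked tool may compute things differently.

```python
# Standard American Soundex consonant groups (ASCII letters only).
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def soundex(word):
    """Four-character Soundex code: first letter plus up to three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels reset prev, so a repeated code is kept after them
    return (code + "000")[:4]


def code_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))
```

Under these assumptions, levenshtein("elefantas", "ailaifantas") is 4, while soundex gives "E415" for the former and "A415" for the latter, so the codes differ only in the retained first letter.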


> for whatever reason I find your reply borderline infuriating but I can't pinpoint exactly why that is.

I guess that says more about you than about my reply. Also, I'm a native speaker as well. That doesn't really have any bearing; my comment above comes from what I know about common implementations of language detection algorithms, not so much from looking at how Google Translate behaves.


And I was honest about how I felt given how you structured it.

It does have a lot of bearing, actually. While I am a native speaker, my spelling skills are atrocious, as everything is a sequence of sounds in my head more so than a sequence of letters. To get around my spelling issues, I frequently type homophones to find the correct spelling of a word. That relies on Soundex or similar algorithms to find the correct word, along with character mappings between the two languages.

Regardless, I believe I have proved the hypothesis to not be true.


Or maybe it’s a subtle joke by Google as a play on the idiom “it’s all Greek to me”?


Or for something that is only somewhat subtle, it's a chicken and egg problem.


One could conjecture that "Apoploe" is similar to από πουλί, "from bird". But I don't have much support for that conjecture.


The word is απόπλους, or αποπλοΐ



