Statistical machine translation

Statistical machine translation is a way of teaching a computer to translate from one language to another by showing it lots of examples. Imagine you want to talk to someone who speaks a different language, like Spanish. You don't know Spanish, but you have a book that gives the same words and phrases in both English and Spanish. You can use this book to translate what you want to say into Spanish, so that the other person can understand you. This is roughly what a computer does in statistical machine translation.

The computer has a big collection of text that people have already translated, so the same sentences appear in two languages, like English and Spanish. It uses this collection to learn how the two languages are related, mostly by counting which words and phrases tend to show up together. For example, it might learn that the English sentence "I am happy" can be translated into Spanish as "Estoy feliz." It then uses what it has counted to translate new sentences from one language into the other.
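To make the idea of learning from example translations a bit more concrete, here is a very small sketch in Python. It is a simplification, not how a real system works: it only counts which Spanish words appear in the same sentence pair as each English word. The three sentence pairs and the names `PARALLEL`, `cooccurrence_table`, and `best_guess` are made up for this illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up set of (English, Spanish) sentence pairs.
# Real systems learn from millions of pairs.
PARALLEL = [
    ("i am happy", "estoy feliz"),
    ("i am tired", "estoy cansado"),
    ("the happy dog", "el perro feliz"),
]

def cooccurrence_table(pairs):
    """Count how often each English word shares a sentence pair with each Spanish word."""
    counts = defaultdict(Counter)
    for english, spanish in pairs:
        for e in english.split():
            for s in spanish.split():
                counts[e][s] += 1
    return counts

def best_guess(counts, english_word):
    """Return the Spanish word seen most often alongside english_word."""
    return counts[english_word].most_common(1)[0][0]

counts = cooccurrence_table(PARALLEL)
print(best_guess(counts, "happy"))  # "feliz" appears with "happy" in two pairs
```

Simple counting like this gets confused easily. Here "i" and "am" both end up paired with "estoy", because they always appear together. Real statistical systems use more careful methods to sort out which word goes with which.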

However, because different languages have different rules and ways of expressing things, the computer might not always get it right, just as you might use the wrong word or phrase when talking to someone in a different language. To make the translations more accurate, the computer looks at the surrounding words in the sentence and picks the translation that is most likely, based on what it has learned about the two languages.
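Picking the most likely translation can be sketched as giving every candidate a score and keeping the best one. The sketch below assumes two ingredients: how well a candidate matches the English sentence, and how natural its word order looks in Spanish. All of the numbers are invented for illustration, and the names `TRANSLATION_PROB`, `BIGRAM_PROB`, `fluency`, `score`, and `best_translation` are hypothetical, not taken from any real system.

```python
import math

# Invented numbers: how well each Spanish candidate matches "I am happy".
TRANSLATION_PROB = {
    "estoy feliz": 0.5,
    "soy feliz": 0.4,
    "feliz estoy": 0.5,
}

# Invented numbers: how likely each Spanish word is to follow the previous one.
# "<s>" marks the start of a sentence.
BIGRAM_PROB = {
    ("<s>", "estoy"): 0.3,
    ("estoy", "feliz"): 0.4,
    ("<s>", "soy"): 0.3,
    ("soy", "feliz"): 0.3,
    ("<s>", "feliz"): 0.05,
    ("feliz", "estoy"): 0.01,
}
UNSEEN = 1e-6  # small fallback for word pairs that are not in the table

def fluency(sentence):
    """Log-probability of the word order, using neighbouring word pairs as context."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(BIGRAM_PROB.get(pair, UNSEEN))
               for pair in zip(words, words[1:]))

def score(candidate):
    """Combine the translation match and the fluency into one log score."""
    return math.log(TRANSLATION_PROB[candidate]) + fluency(candidate)

def best_translation(candidates):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=score)

print(best_translation(list(TRANSLATION_PROB)))  # estoy feliz
```

With these made-up numbers, "feliz estoy" matches the English words just as well as "estoy feliz", but it loses because its word order is much less likely. That is the part that looks at context.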

So, statistical machine translation is like a computer trying to learn how to talk in another language by using a big book with lots of words and phrases in both languages, and then using what it learns to translate between the languages as accurately as possible.

Related topics others have asked about:

AppTek, Cache language model, Duolingo, Europarl corpus, Example-based machine translation, Google Translate, Hybrid machine translation, Language Weaver, Microsoft Translator, Moses (machine translation), Rule-based machine translation