Automatic language translation is important to Facebook as a way to allow the billions of people who use our services to connect and communicate in their preferred language. To do this well, current machine translation (MT) systems require access to a considerable volume of translated text (e.g., pairs of the same text in both English and Spanish). As a result, MT currently works well only for the small subset of languages for which a volume of translations is readily available.

Training an MT model without access to any translation resources at training time (known as unsupervised translation) was the necessary next step. Research we are presenting at EMNLP 2018 outlines our recent accomplishments with that task. Our new approach provides a dramatic improvement over previous state-of-the-art unsupervised approaches and is equivalent to supervised approaches trained with nearly 100,000 reference translations. To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points.

This is an important finding for MT in general, and especially for the majority of the 6,500 languages in the world for which the pool of available translation training resources is either nonexistent or so small that it cannot be used with existing systems. For low-resource languages, there is now a way to learn to translate between, say, Urdu and English by having access only to text in English and completely unrelated text in Urdu, without having any of the respective translations. This new method opens the door to faster, more accurate translations for many more languages. And it may be only the beginning of ways in which these principles can be applied to machine learning and artificial intelligence.

The first step toward our ambitious goal was for the system to learn a bilingual dictionary, which associates a word with its plausible translations in the other language. For this, we used a method we introduced in a previous paper, in which the system first learns word embeddings (vectorial representations of words) for every word in each language. Word embeddings are trained to predict the words around a given word, using context (e.g., the five words preceding and the five words following a given word). Despite their simplicity, word embeddings capture interesting semantic structure.
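The dictionary-building step can be sketched as a nearest-neighbor lookup between the two embedding spaces once they have been aligned by a linear mapping. The sketch below is a minimal illustration, not the paper's implementation: the words, the 4-dimensional vectors, and the identity mapping `W` are all toy assumptions (in practice the embeddings are high-dimensional and `W` is learned from the monolingual spaces themselves, e.g., adversarially and then refined).

```python
import numpy as np

# Toy monolingual embeddings (hypothetical 4-dim vectors; real systems use
# hundreds of dimensions trained on large corpora). Values are illustrative.
src_words = ["gato", "perro", "casa"]
src_emb = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.9, 0.8, 0.1],
])
tgt_words = ["cat", "dog", "house"]
tgt_emb = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.9, 0.8, 0.1],
])

def build_dictionary(src_emb, tgt_emb, W):
    """Map source embeddings into the target space with W, then pick the
    nearest target word by cosine similarity as the candidate translation."""
    mapped = src_emb @ W
    # L2-normalize rows so the dot product equals cosine similarity.
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = mapped @ tgt.T          # pairwise cosine similarities
    return sims.argmax(axis=1)     # index of best target word per source word

# For this toy example the two spaces are already aligned, so W is the
# identity; in the unsupervised setting W must be learned without any
# parallel data.
W = np.eye(4)
pairs = {src_words[i]: tgt_words[j]
         for i, j in enumerate(build_dictionary(src_emb, tgt_emb, W))}
print(pairs)  # {'gato': 'cat', 'perro': 'dog', 'casa': 'house'}
```

The resulting word-by-word dictionary is deliberately noisy; it serves only as a seed that the later translation-model training refines.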