2 |
A Large-Scale Study of Machine Translation in the Turkic Languages ...
|
|
Mirzakhalov, Jamshidbek; Babu, Anoop; Ataman, Duygu; Kariev, Sherzod; Tyers, Francis; Abduraufov, Otabek; Hajili, Mammad; Ivanova, Sardana; Khaytbaev, Abror; Laverghetta, Antonio; Moydinboyev, Behzodbek; Onal, Esra; Pulatova, Shaxnoza; Wahab, Ahsan; Firat, Orhan; Chellappan, Sriram. - : arXiv, 2021
|
|
Abstract:
Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems. However, there is still a large number of languages that are yet to reap the benefits of NMT. In this paper, we provide the first large-scale case study of the practical application of MT in the Turkic language family in order to realize the gains of NMT for Turkic languages under high-resource to extremely low-resource scenarios. In addition to presenting an extensive analysis that identifies the bottlenecks towards building competitive systems to ameliorate data scarcity, our study has several key contributions, including, i) a large parallel corpus covering 22 Turkic languages consisting of common public datasets in combination with new datasets of approximately 2 million parallel sentences, ii) bilingual baselines for 26 language pairs, iii) novel high-quality test sets in three different translation domains and iv) ...
9 pages, 1 figure, 8 tables. Main proceedings of EMNLP 2021 ...
|
|
Keywords:
Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
|
|
URL: https://dx.doi.org/10.48550/arxiv.2109.04593 https://arxiv.org/abs/2109.04593
|
|
BASE
|
|
6 |
A Prototype Free/Open-Source Morphological Analyser and Generator for Sakha ...
|
|
|
|
BASE
|
|
7 |
Evaluating Multiway Multilingual NMT in the Turkic Languages ...
|
|
|
|
BASE
|
|
8 |
Do RNN States Encode Abstract Phonological Alternations? ...
|
|
|
|
BASE
|
|
10 |
A morphological analyser for K’iche’; Un analizador morfológico para el idioma k’iche’
|
|
|
|
BASE
|
|
11 |
Multi-script morphological transducers and transcribers for seven Turkic languages
|
|
|
|
In: Proceedings of the Workshop on Turkic and Languages in Contact with Turkic; Vol 5 (2020); 173-185; 2641-3485 (2021)
|
|
BASE
|
|
15 |
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection ...
|
|
|
|
BASE
|
|
16 |
Dependency analysis of noun incorporation in polysynthetic languages ...
|
|
|
|
BASE
|
|
17 |
Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection ...
|
|
|
|
BASE
|
|
|
|