1 |
An investigation of English-Irish machine translation and associated resources
|
|
Dowling, Meghan. Dublin City University, School of Computing, 2022; Dublin City University, ADAPT, 2022
|
|
In: Dowling, Meghan orcid:0000-0003-1637-4923 (2022) An investigation of English-Irish machine translation and associated resources. PhD thesis, Dublin City University. (2022)
|
|
Abstract:
Because Irish is an official language in both Ireland and the European Union (EU), there is a high demand for English-Irish (EN-GA) translation in public administration. The difficulty that translators currently face in meeting this demand leads to the need for reliable, domain-specific, user-driven EN-GA machine translation (MT). This landscape provides a timely opportunity to address some research questions surrounding MT for the EN-GA language pair. To this end, we assess the corpora available for training data-driven MT systems, including publicly available data, data collected through EU-supported data collection efforts and web-crawling, showing that, though Irish is a low-resource language, it is possible to increase the corpora available through concerted data collection efforts. We investigate how increased corpora affect domain-specific (public administration) statistical MT (SMT) and neural MT (NMT) systems using automatic metrics. The effect that different SMT and NMT parameters have on these automatic values is also explored, using sentence-level metrics to identify specific areas where output differs greatly between MT systems and providing a linguistic analysis of each. With EN-GA SMT and NMT automatic evaluation scores showing inconclusive results, we investigate the usefulness of EN-GA hybrid MT through the use of monolingual data as a source of artificial data creation via backtranslation. We evaluate these results using automatic metrics and linguistic analysis. Although results indicate that the addition of artificial data did not have a positive impact on EN-GA MT, repeated experiments involving Scottish Gaelic show that the method holds promise, given suitable conditions. Finally, given that the intended use-case of EN-GA MT is in the workflow of a professional translator, we conduct an in-depth human evaluation study for EN-GA SMT and NMT, providing a human-derived assessment of EN-GA MT quality and a comparison of EN-GA SMT and NMT.
We include a survey of translator opinions and recommendations surrounding EN-GA SMT and NMT as well as an analysis of data gathered through the post-editing of MT output. We compare these results to those generated automatically and provide recommendations for future work on EN-GA MT, in particular with regard to its use in a professional translation workflow within public administration.
|
|
Keyword:
Artificial intelligence; Computational linguistics; Linguistics; Machine learning; Machine translating; Translating and interpreting
|
|
URL: http://doras.dcu.ie/26574/
|
|
BASE
|
|
2 |
A human evaluation of English-Irish statistical and neural machine translation
|
|
|
|
In: Dowling, Meghan orcid:0000-0003-1637-4923 , Castilho, Sheila orcid:0000-0002-8416-6555 , Moorkens, Joss orcid:0000-0003-4864-5986 , Lynn, Teresa and Way, Andy orcid:0000-0001-5736-5930 (2020) A human evaluation of English-Irish statistical and neural machine translation. In: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 6 Nov 2020, Lisbon, Portugal. (2020)
|
|
BASE
|
|
3 |
Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
|
|
|
|
BASE
|
|
4 |
Leveraging backtranslation to improve machine translation for Gaelic languages
|
|
|
|
In: Dowling, Meghan orcid:0000-0003-1637-4923 , Lynn, Teresa and Way, Andy orcid:0000-0001-5736-5930 (2019) Leveraging backtranslation to improve machine translation for Gaelic languages. In: Third Celtic Language Technology Workshop 2019, 19-23 Aug 2019, Dublin, Ireland. (2019)
|
|
BASE
|
|
5 |
Adapting NMT to caption translation in Wikimedia Commons for low-resource languages
|
|
|
|
In: Poncelas, Alberto orcid:0000-0002-5089-1687 , Sarasola, Kepa orcid:0000-0003-4349-6088 , Dowling, Meghan orcid:0000-0003-1637-4923 , Way, Andy orcid:0000-0001-5736-5930 , Labaka, Gorka orcid:0000-0003-4611-2502 and Alegria, Iñaki orcid:0000-0002-0272-1472 (2019) Adapting NMT to caption translation in Wikimedia Commons for low-resource languages. Procesamiento del Lenguaje Natural, 63 . pp. 33-40. ISSN 1135-5948 (2019)
|
|
BASE
|
|
6 |
Adapting NMT to caption translation in Wikimedia Commons for low-resource languages
|
|
|
|
In: Poncelas, Alberto orcid:0000-0002-5089-1687 , Sarasola, Kepa orcid:0000-0003-4349-6088 , Dowling, Meghan orcid:0000-0003-1637-4923 , Way, Andy orcid:0000-0001-5736-5930 , Labaka, Gorka orcid:0000-0003-4611-2502 and Alegria, Iñaki orcid:0000-0002-0272-1472 (2019) Adapting NMT to caption translation in Wikimedia Commons for low-resource languages. Procesamiento del Lenguaje Natural, 63 . pp. 33-40. ISSN 1135-5948 (2019)
|
|
BASE
|
|
7 |
Investigating backtranslation for the improvement of English-Irish machine translation
|
|
|
|
In: Dowling, Meghan orcid:0000-0003-1637-4923 , Lynn, Teresa and Way, Andy orcid:0000-0001-5736-5930 (2019) Investigating backtranslation for the improvement of English-Irish machine translation. Teanga, 26 . pp. 1-25. ISSN 0332-205X (2019)
|
|
BASE
|
|
8 |
Adapting NMT to caption translation in Wikimedia Commons for low-resource languages ; Adaptando NMT a la traducción de pies de imagen en Wikimedia Commons para idiomas con pocos recursos
|
|
|
|
BASE
|
|
10 |
Is all that glitters in MT quality estimation really gold standard?
|
|
|
|
In: Graham, Yvette, Baldwin, Timothy, Dowling, Meghan orcid:0000-0003-1637-4923 , Eskevich, Maria, Lynn, Teresa and Tounsi, Lamia (2016) Is all that glitters in MT quality estimation really gold standard? In: 26th International Conference on Computational Linguistics, 11-17 Dec 2016, Osaka, Japan. ISBN 978-4-87974-702-0 (2016)
|
|
BASE
|
|
11 |
Tapadóir: developing a statistical machine translation engine and associated resources for Irish
|
|
|
|
In: Dowling, Meghan orcid:0000-0003-1637-4923 , Cassidy, Lauren, Maguire, Eimear, Lynn, Teresa, Srivastava, Ankit and Judge, John (2015) Tapadóir: developing a statistical machine translation engine and associated resources for Irish. In: 4th Biennial Workshop on Less-Resourced Languages (LRC 2015), 28 Nov 2015, Poznan, Poland. (2015)
|
|
BASE
|
|