1 |
An investigation of English-Irish machine translation and associated resources
|
|
Dowling, Meghan. Dublin City University, School of Computing, 2022; Dublin City University, ADAPT, 2022
|
|
In: Dowling, Meghan orcid:0000-0003-1637-4923 (2022) An investigation of English-Irish machine translation and associated resources. PhD thesis, Dublin City University. (2022)
|
|
Abstract:
Because Irish is an official language in both Ireland and the European Union (EU), there is a high demand for English-Irish (EN-GA) translation in public administration. The difficulty that translators currently face in meeting this demand leads to the need for reliable domain-specific user-driven EN-GA machine translation (MT). This landscape provides a timely opportunity to address some research questions surrounding MT for the EN-GA language pair. To this end, we assess the corpora available for training data-driven MT systems, including publicly available data, data collected through EU-supported data collection efforts and web-crawling, showing that though Irish is a low-resource language it is possible to increase the corpora available through concerted data collection efforts. We investigate how increased corpora affect domain-specific (public administration) statistical MT (SMT) and neural MT (NMT) systems using automatic metrics. The effect that different SMT and NMT parameters have on these automatic values is also explored, using sentence-level metrics to identify specific areas where output differs greatly between MT systems and providing a linguistic analysis of each. With EN-GA SMT and NMT automatic evaluation scores showing inconclusive results, we investigate the usefulness of EN-GA hybrid MT through the use of monolingual data as a source of artificial data creation via backtranslation. We evaluate these results using automatic metrics and linguistic analysis. Although results indicate that the addition of artificial data did not have a positive impact on EN-GA MT, repeated experiments involving Scottish Gaelic show that the method holds promise, given suitable conditions. Finally, given that the intended use-case of EN-GA MT is in the workflow of a professional translator, we conduct an in-depth human evaluation study for EN-GA SMT and NMT, providing a human-derived assessment of EN-GA MT quality and a comparison of EN-GA SMT and NMT.
We include a survey of translator opinions and recommendations surrounding EN-GA SMT and NMT as well as an analysis of data gathered through the post-editing of MT output. We compare these results to those generated automatically and provide recommendations for future work on EN-GA MT, in particular with regard to its use in a professional translation workflow within public administration.
|
|
Keywords:
Artificial intelligence; Computational linguistics; Linguistics; Machine learning; Machine translating; Translating and interpreting
|
|
URL: http://doras.dcu.ie/26574/
|
|
BASE
|
|
2 |
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
|
|
|
|
In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
|
|
BASE
|
|
3 |
The contextual logic
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03195162 ; 2022 (2022)
|
|
BASE
|
|
4 |
Is Old French tougher to parse?
|
|
|
|
In: 20th International Workshop on Treebanks and Linguistic Theories ; https://hal.archives-ouvertes.fr/hal-03506500 ; 20th International Workshop on Treebanks and Linguistic Theories, Mar 2022, Sofia, Bulgaria (2022)
|
|
BASE
|
|
5 |
A Novel Multimodal Approach for Studying the Dynamics of Curiosity in Small Group Learning
|
|
|
|
In: https://hal.inria.fr/hal-03536340 ; 2022 (2022)
|
|
BASE
|
|
6 |
Learning and controlling the source-filter representation of speech with a variational autoencoder
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03650569 ; 2022 (2022)
|
|
BASE
|
|
7 |
Thirty Years of Machine Translation in Language Teaching and Learning: A Review of the Literature
|
|
|
|
In: L2 Journal, vol 14, iss 1 (2022)
|
|
BASE
|
|
9 |
Hippocampal ensembles represent sequential relationships among an extended sequence of nonspatial events.
|
|
|
|
In: Nature communications, vol 13, iss 1 (2022)
|
|
BASE
|
|
10 |
Assessing the impact of OCR noise on multilingual event detection over digitised documents
|
|
|
|
In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
|
|
BASE
|
|
11 |
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
|
|
|
|
In: Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II ; https://hal.archives-ouvertes.fr/hal-03635971 ; Matthias Hagen; Suzan Verberne; Craig Macdonald; Christin Seifert; Krisztian Balog; Kjetil Nørvåg; Vinay Setty. Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, 13186, Springer International Publishing, pp.347-354, 2022, Lecture Notes in Computer Science, 978-3-030-99738-0. ⟨10.1007/978-3-030-99739-7_44⟩ (2022)
|
|
BASE
|
|
12 |
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
|
|
|
|
In: Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03527328 ; Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021), Jan 2022, Punta Cana, Dominican Republic ; https://aclanthology.org/2021.wnut-1.47/ (2022)
|
|
BASE
|
|
13 |
Annotation of Morphological Errors in L2 Russian Corpus Analysis
|
|
|
|
In: 21st Annual Second Language Acquisition and Teaching Interdisciplinary Roundtable ; https://hal.archives-ouvertes.fr/hal-03620469 ; 21st Annual Second Language Acquisition and Teaching Interdisciplinary Roundtable, University of Arizona, Feb 2022, Tucson, United States (2022)
|
|
BASE
|
|
14 |
Cross-Situational Learning Towards Robot Grounding
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03628290 ; 2022 (2022)
|
|
BASE
|
|
16 |
A Methodology for the Comparison of Human Judgments With Metrics for Coreference Resolution
|
|
|
|
In: HumEval at ACL ; https://hal.archives-ouvertes.fr/hal-03650294 ; HumEval at ACL, May 2022, Dublin, Ireland ; https://humeval.github.io/ (2022)
|
|
BASE
|
|
17 |
Le modèle Transformer: un « couteau suisse » pour le traitement automatique des langues
|
|
|
|
In: Techniques de l'Ingenieur ; https://hal.archives-ouvertes.fr/hal-03619077 ; Techniques de l'Ingenieur, Techniques de l'ingénieur, 2022, ⟨10.51257/a-v1-in195⟩ ; https://www.techniques-ingenieur.fr/base-documentaire/innovation-th10/innovations-en-electronique-et-tic-42257210/transformer-des-reseaux-de-neurones-pour-le-traitement-automatique-des-langues-in195/ (2022)
|
|
BASE
|
|
18 |
The use of MT by undergraduate translation students for different learning tasks
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03547415 ; 2022 (2022)
|
|
BASE
|
|
19 |
Can machines learn to see without visual databases?
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03526569 ; 2022 (2022)
|
|
BASE
|
|
20 |
АКТУАЛЬНЫЕ ТЕНДЕНЦИИ ЦИФРОВИЗАЦИИ ИНОЯЗЫЧНОГО ОБУЧЕНИЯ В НЕЯЗЫКОВОМ ВУЗЕ ... : CURRENT TRENDS IN DIGITALIZATION OF FOREIGN LANGUAGE EDUCATION IN A NON-LINGUISTIC UNIVERSITY ...
|
|
|
|
BASE
|
|