1. An investigation into multi-word expressions in machine translation
Han, Lifeng. Dublin City University, School of Computing; Dublin City University, ADAPT, 2022
In: Han, Lifeng orcid:0000-0002-3221-2185 (2022) An investigation into multi-word expressions in machine translation. PhD thesis, Dublin City University. (2022)
Abstract:
Multi-word Expressions (MWEs) present challenges in natural language processing and computational linguistics because of their frequent use, great variety, idiomaticity, and non-decompositionality in the texts where they occur. These challenges are typical of the machine translation (MT) field, where algorithms must translate automatically from one human language to another while producing high-quality output: adequate, fluent, and either preserving the source style or making creative yet correct stylistic decisions. In this thesis, we carry out an extensive investigation into MWEs in neural MT. Firstly, we review the relevant literature, including experimental work that re-examines state-of-the-art models integrating MWE knowledge into MT systems, but in new language-pair settings, to see what gaps might exist in the published literature. Secondly, we propose new models for addressing MWE translation. These include a design that treats MWE translation as a problem of translating low-frequency words and phrases, integrating language-specific features, such as stroke and radical representations of Chinese characters, into the learning model in the expectation that this will improve accuracy. Thirdly, to properly examine different MT models' performance on MWEs, we need a new evaluation methodology; to this end, we create a multilingual parallel corpus with MWE annotations (AlphaMWE). While creating this corpus, we classify MT issues on MWE-related content into several categories, with the expectation that this will help future MT researchers focus on one or more of them in order to achieve a new state of the art in MT performance, ultimately moving towards human parity. Finally, we propose a new methodology for human-in-the-loop MT evaluation with MWE considerations (HiLMeMe).
Keyword:
Machine translating
URL: http://doras.dcu.ie/26559/
BASE

2. Chinese character decomposition for neural MT with multi-word expressions
In: Han, Lifeng orcid:0000-0002-3221-2185 , Jones, Gareth J.F. orcid:0000-0003-2923-8365 , Smeaton, Alan F. orcid:0000-0003-1028-8389 and Bolzoni, Paolo (2021) Chinese character decomposition for neural MT with multi-word expressions. In: 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), 31 May - 2 June 2021, Reykjavik, Iceland (Online). (In Press) (2021)
BASE

3. Translation quality assessment: a brief survey on manual and automatic methods
In: Han, Lifeng orcid:0000-0002-3221-2185 , Jones, Gareth J.F. orcid:0000-0003-2923-8365 and Smeaton, Alan F. orcid:0000-0003-1028-8389 (2021) Translation quality assessment: a brief survey on manual and automatic methods. In: MoTra21: Workshop on Modelling Translation: Translatology in the Digital Age, 31 May - 2 June 2021, Reykjavik, Iceland (Online). (In Press) (2021)
BASE

4. Proactive information retrieval
Sen, Procheta. Dublin City University, School of Computing; Dublin City University, ADAPT, 2021
In: Sen, Procheta (2021) Proactive information retrieval. PhD thesis, Dublin City University. (2021)
BASE

5. AlphaMWE: construction of multilingual parallel corpora with MWE annotations
In: Han, Lifeng orcid:0000-0002-3221-2185 , Jones, Gareth J.F. orcid:0000-0003-2923-8365 and Smeaton, Alan F. orcid:0000-0003-1028-8389 (2020) AlphaMWE: construction of multilingual parallel corpora with MWE annotations. In: Joint Workshop on Multiword Expressions and Electronic Lexicons (MWE-LEX 2020), 13 Dec 2020, Barcelona, Spain (Online). (2020)
BASE

6. MultiMWE: building a multi-lingual multi-word expression (MWE) parallel corpora
In: Han, Lifeng orcid:0000-0002-3221-2185 , Jones, Gareth J.F. orcid:0000-0003-2923-8365 and Smeaton, Alan F. orcid:0000-0003-1028-8389 (2020) MultiMWE: building a multi-lingual multi-word expression (MWE) parallel corpora. In: 12th International Conference on Language Resources and Evaluation (LREC), 11-16 May, 2020, Marseille, France. (Virtual). (2020)
BASE

7. LSTM language model adaptation with images and titles for multimedia automatic speech recognition
In: Moriya, Yasufumi and Jones, Gareth J.F. orcid:0000-0003-2923-8365 (2019) LSTM language model adaptation with images and titles for multimedia automatic speech recognition. In: IEEE SLT 2018 - Workshop on Spoken Language Technology, 18-21 Dec 2018, Athens, Greece. ISBN 978-1-5386-4334-1 (2019)
BASE

8. Tempo-lexical context driven word embedding for cross-session search task extraction
In: Sen, Procheta, Ganguly, Debasis orcid:0000-0003-0050-7138 and Jones, Gareth J.F. orcid:0000-0003-2923-8365 (2018) Tempo-lexical context driven word embedding for cross-session search task extraction. In: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1-6 June 2018, New Orleans, LA, USA. (2018)
BASE

9. Spoken content retrieval beyond pipeline integration of automatic speech recognition and information retrieval
Racca, David. Dublin City University, School of Computing; Dublin City University, ADAPT, 2018
In: Racca, David (2018) Spoken content retrieval beyond pipeline integration of automatic speech recognition and information retrieval. PhD thesis, Dublin City University. (2018)
BASE

10. Utilization of multimodal interaction signals for automatic summarisation of academic presentations
Curtis, Keith. Dublin City University, School of Computing, 2018
In: Curtis, Keith (2018) Utilization of multimodal interaction signals for automatic summarisation of academic presentations. PhD thesis, Dublin City University. (2018)
BASE

11. Promoting user engagement and learning in search tasks by effective document representation
Arora, Piyush. Dublin City University, School of Computing; Dublin City University, ADAPT, 2018
In: Arora, Piyush orcid:0000-0002-4261-2860 (2018) Promoting user engagement and learning in search tasks by effective document representation. PhD thesis, Dublin City University. (2018)
BASE

12. Identifying effective translations for cross-lingual Arabic-to-English user-generated speech search
In: Khwileh, Ahmad, Afli, Haithem orcid:0000-0002-7449-4707 , Jones, Gareth J.F. orcid:0000-0003-2923-8365 and Way, Andy orcid:0000-0001-5736-5930 (2017) Identifying effective translations for cross-lingual Arabic-to-English user-generated speech search. In: Third Arabic Natural Language Processing Workshop (WANLP), 3 Apr 2017, Valencia, Spain. (2017)
BASE

13. Identifying effective translations for cross-lingual Arabic-to-English user-generated speech search
In: Khwileh, Ahmad, Afli, Haithem orcid:0000-0002-7449-4707 , Jones, Gareth J.F. orcid:0000-0003-2923-8365 and Way, Andy orcid:0000-0001-5736-5930 (2017) Identifying effective translations for cross-lingual Arabic-to-English user-generated speech search. In: Proceedings of The Third Arabic Natural Language Processing Workshop (WANLP), 3-4 Apr 2017, Valencia, Spain. (2017)
BASE

14. How do users perceive information: analyzing user feedback while annotating textual units
In: Arora, Piyush orcid:0000-0003-0055-345X and Jones, Gareth J.F. orcid:0000-0003-2923-8365 (2017) How do users perceive information: analyzing user feedback while annotating textual units. In: CHIIR 2017 Workshop on Supporting Complex Search Tasks, 11 Mar 2017, Oslo, Norway. (2017)
BASE

15. CLEF 2017 NewsREEL Overview: A Stream-based Recommender Task for Evaluation and Education
BASE

17. Report on CLEF 2017: Experimental IR Meets Multilinguality, Multimodality, and Interaction
BASE

18. Retrievability of code mixed microblogs
In: Ganguly, Debasis orcid:0000-0003-0050-7138 , Bandyopadhyay, Ayan, Mitra, Mandar and Jones, Gareth J.F. orcid:0000-0003-2923-8365 (2016) Retrievability of code mixed microblogs. In: 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 17-21 July 2016, Pisa, Italy. ISBN 978-1-4503-4069-4 (2016)
BASE

19. Joint estimation of topics and hashtag relevance in cross-lingual tweets
In: Sen, Procheta, Ganguly, Debasis orcid:0000-0003-0050-7138 and Jones, Gareth J.F. orcid:0000-0003-2923-8365 (2016) Joint estimation of topics and hashtag relevance in cross-lingual tweets. In: ACM on International Conference on the Theory of Information Retrieval, ICTIR 2016, 12-16 Sept 2016, Newark, DE, USA. ISBN 978-1-4503-4497-5 (2016)
BASE

20. FaDA: fast document aligner using word embedding
In: Lohar, Pintu, Ganguly, Debasis orcid:0000-0003-0050-7138 , Afli, Haithem orcid:0000-0002-7449-4707 , Way, Andy orcid:0000-0001-5736-5930 and Jones, Gareth J.F. orcid:0000-0003-2923-8365 (2016) FaDA: fast document aligner using word embedding. Prague Bulletin of Mathematical Linguistics (106). pp. 169-179. ISSN 1804-0462 (2016)
BASE