
Search in the Catalogues and Directories

Hits 1 – 7 of 7

1
Deep Syntax in Statistical Machine Translation
Graham, Yvette. Dublin City University, National Centre for Language Technology (NCLT); Dublin City University, School of Computing, 2011
In: Graham, Yvette (2011) Deep Syntax in Statistical Machine Translation. PhD thesis, Dublin City University. (2011)
Abstract: Statistical Machine Translation (SMT) via deep syntactic transfer employs a three-stage architecture: (i) parse source language (SL) input, (ii) transfer the SL deep syntactic structure to the target language (TL), and (iii) generate a TL translation. The deep syntactic transfer architecture achieves a high level of language pair independence compared to other Machine Translation (MT) approaches, as translation is carried out at the more language-independent level of deep syntactic representation. TL word order can be generated independently of SL word order, and therefore no reordering model between source and target words is required. In addition, words in dependency relations are adjacent in the deep syntactic structure, whereas such words are often distant in surface form strings. This allows the extraction of more general transfer rules than the rules/phrases extracted from the surface form corpus, and it allows the use of a TL deep syntax language model, which models a deeper notion of fluency than a string-based language model and may lead to better lexical choice. The deep syntactic representation also contains words in lemma form with morpho-syntactic information, so that new inflections of lemmas not observed in bilingual training data, which are out of coverage for other SMT approaches, fall within coverage of deep syntactic transfer. In this thesis, we adapt existing methods already successful in Phrase-Based SMT (PB-SMT) to deep syntactic transfer and present new methods of our own. We present a new definition for consistent deep syntax transfer rules, inspired by the definition for a consistent phrase in PB-SMT, and we extract all rules consistent with the node alignment, as smaller rules provide high coverage of unseen data, while larger rules provide more fluent combinations of TL words.
Since large numbers of consistent transfer rules exist per sentence pair, we also provide an efficient method of extracting rules as well as an efficient method of storing them. We also present a deep syntax translation model: as in other SMT approaches, we use a log-linear combination of feature functions, and include a translation model computed from relative frequencies of transfer rules, lexical weighting, a deep syntax language model and a string-based language model. In addition, we describe methods of carrying out transfer decoding, the search for TL deep syntactic structures, and how we efficiently integrate a deep syntax trigram language model into decoding, as well as methods of translating morpho-syntactic information separately from lemmas, using an adaptation of Factored Models. We then include an experimental evaluation, in which we compare MT output for different configurations of our SMT via deep syntactic transfer system. We investigate various methods of word alignment, methods of translating morpho-syntactic information, limits on transfer rule size, different beam sizes during transfer decoding, generating from different sized lists of TL decoder output structures, as well as deterministic versus non-deterministic generation. We also include an evaluation of the deep syntax language model in isolation from the MT system and compare it to a string-based language model. Finally, we compare the performance and types of translations our system produces with those of a state-of-the-art phrase-based statistical machine translation system. Although the deep syntax system in general currently under-performs, it does achieve state-of-the-art performance for translation of a specific syntactic construction, the compound noun, and for translations within coverage of the TL precision grammar used for generation. We provide the software for transfer rule extraction, as well as the transfer decoder, as open source tools to assist future research.
Keyword: Lexical Functional Grammar; Machine translating
URL: http://doras.dcu.ie/16078/
BASE
2
f-align: An open-source alignment tool for LFG f-structures
In: Bryl, Anton and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) f-align: An open-source alignment tool for LFG f-structures. In: AMTA, 31 Oct - 4 Nov 2010, Denver, Colorado. (2010)
BASE
3
LFG without C-structures
In: Cetinoglu, Ozlem, Foster, Jennifer orcid:0000-0002-7789-4853, Nivre, Joakim, Hogan, Deirdre, Cahill, Aoife orcid:0000-0002-3519-7726 and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) LFG without C-structures. In: the 9th International Workshop on Treebanks and Linguistic Theories, 3 - 4 Dec. 2010, Tartu, Estonia. (2010)
BASE
4
Closing the gap between stochastic and rule-based LFG grammars
In: Hautli, Annette, Cetinoglu, Ozlem and van Genabith, Josef orcid:0000-0003-1322-7944 (2010) Closing the gap between stochastic and rule-based LFG grammars. In: the LFG10 Conference, 18-20 July 2010, Ottawa, Canada. (2010)
BASE
5
German particle verbs and pleonastic prepositions
In: Rehbein, Ines and van Genabith, Josef (2006) German particle verbs and pleonastic prepositions. In: Third ACL-SIGSEM Workshop on Prepositions, 3 April 2006, Trento, Italy. (2006)
BASE
6
Automatic extraction of large-scale multilingual lexical resources
O'Donovan, Ruth. Dublin City University, School of Computing, 2006
In: O'Donovan, Ruth (2006) Automatic extraction of large-scale multilingual lexical resources. PhD thesis, Dublin City University. (2006)
BASE
7
Evaluating automatic LFG f-structure annotation for the Penn-II treebank
In: Burke, Michael, Cahill, Aoife orcid:0000-0002-3519-7726, McCarthy, Mairéad, O'Donovan, Ruth, van Genabith, Josef and Way, Andy orcid:0000-0001-5736-5930 (2004) Evaluating automatic LFG f-structure annotation for the Penn-II treebank. Research on Language and Computation, 2 (4). pp. 523-547. ISSN 1570-7075 (2004)
BASE
