2. The Role of Non-monotonic Reasoning in Future Development of Artificial Intelligence (Dagstuhl Perspectives Workshop 19072)
BASE

4. AUTNES Content Analysis of Party Websites 2013 ...
BASE

5. AUTNES Content Analysis of Party Leader Statements 2002 ...
BASE

6. AUTNES Content Analysis of Party Leader Statements 2006 ...
BASE

7. AUTNES Content Analysis of Party Leader Statements 2008 ...
BASE

8. The Effectiveness of Dual Language and Sheltered English Immersion ESOL Programs: A Comparative Study
In: Doctoral Dissertations and Projects (2017)
BASE

9. Discourse-level features for statistical machine translation
In: http://infoscience.epfl.ch/record/206235 (2015)
Abstract:
Machine Translation (MT) has progressed tremendously in the past two decades. The rule-based and interlingua approaches have been superseded by statistical models, which learn the most likely translations from large parallel corpora. System design no longer amounts to crafting syntactic transfer rules, nor does it rely on a semantic representation of the text. Instead, a statistical MT system learns the most likely correspondences and re-orderings of chunks of source words and target words from parallel corpora that have been word-aligned. With this procedure and millions of parallel source and target language sentences, systems can generate translations that are intelligible and require minimal post-editing effort from the human user. Nevertheless, it has been recognized that the statistical MT paradigm may fall short of modeling a number of linguistic phenomena that are established beyond the phrase level. Research in statistical MT has addressed discourse phenomena explicitly only in the past four years. When it comes to textual coherence structure, cohesive ties relate sentences and entire paragraphs argumentatively to each other. This text structure has to be rendered appropriately in the target text so that it conveys the same meaning as the source text. The lexical and syntactic means through which these cohesive markers are expressed may diverge considerably between languages. Frequently, these markers include discourse connectives, function words such as "however", "instead", "since" and "while", which relate spans of text to each other, e.g. for temporal ordering, contrast or causality. Moreover, to establish the same temporal ordering of events described in a text, the conjugation of verbs has to be translated coherently. The present thesis proposes methods for integrating discourse features into statistical MT.
We pre-process the source text prior to automatic translation, focusing on two specific discourse phenomena: discourse connectives and verb tenses. Hand-crafted rules are not required in our proposal; instead, machine learning classifiers are implemented that learn to recognize discourse relations and predict translations of verb tenses. First, we designed new sets of semantically oriented features and classifiers to advance the state of the art in automatic disambiguation of discourse connectives. In doing so, we profited from our multilingual setting and incorporated features that are based on MT and on the insights we gained from contrastive linguistic analysis of parallel corpora. In their best configurations, our classifiers reach high performance (0.7 to 1.0 F1 score) and can therefore be used reliably to automatically annotate the large corpora needed to train SMT systems. Issues of manual annotation and evaluation are discussed as well, and solutions are provided in the form of new annotation and evaluation procedures. As a second contribution, we implemented entire SMT systems that can make use of the (automatically) annotated discourse information. Overall, the thesis confirms that these techniques are a practical solution that leads to global improvements in translation in the range of 0.2 to 0.5 BLEU score. Further evaluation reveals that, in terms of connectives and verb tenses, our statistical MT systems improve the translation of these phenomena by up to 25%, depending on the performance of the automatic classifiers and on the data sets used.
URL: https://doi.org/10.5075/epfl-thesis-6501 https://infoscience.epfl.ch/record/206235/files/EPFL_TH6501.pdf http://infoscience.epfl.ch/record/206235
BASE

10. Discourse-level features for statistical machine translation ...
BASE

11. Disambiguating Discourse Connectives for Statistical Machine Translation
In: http://infoscience.epfl.ch/record/209083 (2015)
BASE

13. Automatic Speech Recognition and Translation of a Swiss German Dialect: Walliserdeutsch
In: http://infoscience.epfl.ch/record/202570 (2014)
BASE

14. Cross-linguistic annotation of narrativity for English/French verb tense disambiguation
In: http://infoscience.epfl.ch/record/198436 (2014)
BASE

15. English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling
In: http://infoscience.epfl.ch/record/198442 (2014)
BASE

16. English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling
In: ISBN: 978-2-9517408-8-4 ; Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) (2014)
BASE

19. Using a Massively Parallel Computer Architecture to Model the Ocean's Biological Pump
In: Senior Theses (2013)
BASE

20. Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies
In: http://infoscience.epfl.ch/record/192355 (2013)
BASE