
Search in the Catalogues and Directories

Hits 21 – 40 of 167

21
INFODENS: An Open-source Framework for Learning Text Representations ...
BASE
22
Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych ...
BASE
23
Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych ...
BASE
24
A Hybrid Machine Translation Framework for an Improved Translation Workflow
Pal, Santanu. - : Saarländische Universitäts- und Landesbibliothek, 2018
BASE
25
Evaluating Evaluation Measures
Rehbein, Ines [author]; Van Genabith, Josef [author]; Nivre, Joakim [editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
26
Why is it so difficult to compare treebanks? TIGER and TüBa-D
Rehbein, Ines [author]; Van Genabith, Josef [author]; De Smedt, Koenraad [editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
27
Automatic acquisition of LFG resources for German - as good as it gets
Rehbein, Ines [author]; Van Genabith, Josef [author]; Butt, Miriam [editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
28
Treebank Annotation Schemes and Parser Evaluation for German
Rehbein, Ines [author]; Van Genabith, Josef [author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
29
German particle verbs and pleonastic prepositions
Rehbein, Ines [author]; Van Genabith, Josef [author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
30
Massively Multilingual Neural Grapheme-to-Phoneme Conversion ...
BASE
31
An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification ...
BASE
32
Predicting the Law Area and Decisions of French Supreme Court Cases ...
BASE
33
Pluricentric languages : automatic identification and linguistic variation ; Plurizentrische Sprachen : automatische Spracherkennung und linguistische Variation
Abstract: Language identification is a well-known research topic in NLP. State-of-the-art methods apply n-gram language models to distinguish languages automatically with well over 95% accuracy. This level of success is obtained when discriminating between languages that are typologically not closely related (e.g. Finnish and Spanish), or between languages with unique character sets such as Greek or Hebrew. Recent studies show that one of the main difficulties of n-gram-based methods is the identification of closely related languages. The research presented in this thesis goes one step further and investigates computational methods to identify standard national varieties of pluricentric languages such as Portuguese, Spanish, French, and English. It explores different computational methods and different sets of features for this task that go beyond character and word language models. The main objective is to investigate the extent to which it is possible to identify language varieties automatically in both monolingual and real-world (multilingual) settings, and to establish what the main challenges of this task are in comparison to general-purpose language identification models. This research shows, for example, that it is possible to discriminate between Brazilian and European Portuguese with 99.8% accuracy using journalistic texts. Another contribution of this thesis is the use of linguistically motivated features such as POS tags and morphological information to discriminate between language varieties, with results of up to 83.1% accuracy in discriminating between Mexican and Peninsular Spanish texts. An additional aspect of this thesis is the use of classification output in corpus-driven contrastive linguistics research, as explained in Chapter 6. Classification methods combined with linguistically meaningful features are able to provide empirical evidence on the convergences and divergences of language varieties in terms of lexicon, orthography, morphology and syntax.
Abstract (translated from German): Language identification is an important research topic in computational linguistics. Current methods use n-gram language models to distinguish languages automatically, achieving accuracies above 95%. Such performance is achieved in particular when the algorithms classify languages that are typologically not closely related (e.g. Finnish and Spanish), or when they are applied to languages with unique character sets such as Greek or Hebrew. Studies show, however, that one of the main difficulties of n-gram-based methods lies in identifying similar languages. This thesis therefore goes one step beyond existing methods and investigates identification methods for pluricentric languages such as Portuguese, Spanish, French, and English. To this end, it uses algorithms and features that encode richer linguistic information than character- or word-based language models. The main objective is to investigate the extent to which language varieties can be identified automatically in both monolingual and multilingual settings. On the basis of these experiments, it is also possible to assess the main difficulties of the described approach compared with generic models. A secondary aspect of this thesis is the use of classification output in corpus-based contrastive linguistics, since classification methods based on interpretable linguistic features can provide empirical insights into the convergences and divergences of these language varieties in terms of lexicon, orthography, morphology, and syntax.
Keyword: computational linguistics; Computerlinguistik; ddc:400; Korpus; language identification; language varieties; Linguistik; natural language processing; Sprachvariante
URL: https://doi.org/10.22028/D291-23660
http://nbn-resolving.org/urn:nbn:de:bsz:291-scidok-66749
BASE
34
Improving translation memory matching and retrieval using paraphrases
In: 30 ; 1 ; 19 ; 40 (2016)
BASE
35
A Minimally Supervised Approach for Synonym Extraction with Word Embeddings
In: Prague Bulletin of Mathematical Linguistics, Vol 105, Iss 1, Pp 111-142 (2016)
BASE
36
Statistical post-editing and quality estimation for machine translation systems
Béchara, Hanna. - : Dublin City University. School of Computing, 2014
In: Béchara, Hanna (2014) Statistical post-editing and quality estimation for machine translation systems. Master of Science thesis, Dublin City University. (2014)
BASE
37
Predicting sentence translation quality using extrinsic and language independent features
In: Bicici, Ergun, Groves, Declan and van Genabith, Josef orcid:0000-0003-1322-7944 (2013) Predicting sentence translation quality using extrinsic and language independent features. Machine Translation, 27 (3-4). pp. 171-192. ISSN 0922-6567 (2013)
BASE
38
Working with a small dataset - semi-supervised dependency parsing for Irish
In: Lynn, Teresa, Foster, Jennifer orcid:0000-0002-7789-4853 , Dras, Mark orcid:0000-0001-9908-7182 and van Genabith, Josef orcid:0000-0003-1322-7944 (2013) Working with a small dataset - semi-supervised dependency parsing for Irish. In: Fourth Workshop on Statistical Parsing of Morphologically Rich Languages, 18 Oct 2013, Seattle, WA. USA. (2013)
BASE
39
Computer assisted (language) learning (CA(L)L) for the inclusive classroom
Greene, Cara N. - : Dublin City University. Centre for Next Generation Localisation (CNGL), 2013. : Dublin City University. National Centre for Language Technology (NCLT), 2013. : Dublin City University. School of Computing, 2013
In: Greene, Cara N. (2013) Computer assisted (language) learning (CA(L)L) for the inclusive classroom. PhD thesis, Dublin City University. (2013)
BASE
40
Domain adaptation for statistical machine translation of corporate and user-generated content
Banerjee, Pratyush. - : Dublin City University. School of Computing, 2013
In: Banerjee, Pratyush (2013) Domain adaptation for statistical machine translation of corporate and user-generated content. PhD thesis, Dublin City University. (2013)
BASE
