
Search in the Catalogues and Directories

Hits 1 – 10 of 10

1
Data Weighted Training Strategies for Grammatical Error Correction ...
BASE
2
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models ...
BASE
3
Multilingual Open Relation Extraction Using Cross-lingual Projection ...
Faruqui, Manaal; Kumar, Shankar. - : arXiv, 2015
BASE
4
1993-2007 United Nations Parallel Text
Franz, Alex; Kumar, Shankar; Brants, Thorsten. - : Linguistic Data Consortium, 2013. : https://www.ldc.upenn.edu, 2013
Abstract:
*Introduction* 1993-2007 United Nations Parallel Text was developed by Google Research. It consists of United Nations (UN) parliamentary documents from 1993 through 2007 in the official languages of the UN: Arabic, Chinese, English, French, Russian, and Spanish. There are 673,670 raw text documents and 520,283 word alignment documents. UN parliamentary documents are available from the UN Official Document System (UN ODS) at http://ods.un.org/. UN ODS, in its main UNDOC database, contains the full text of all types of UN parliamentary documents. It has complete coverage dating from 1993 and variable coverage before that. Documents exist in one or more of the official languages of the UN. UN ODS also contains a large number of German documents, marked with the language "other", but these are not included in this dataset. For more information, see the UN ODS documentation at http://documents.un.org/help_E.htm. For more details of the UN bibliographic systems, see http://www.un.org/depts/dhl/unbisref_manual/. LDC has released parallel UN parliamentary documents in English, French and Spanish spanning the period 1988-1993 as UN Parallel Text (Complete) (LDC94T4A).
*Data* The data is presented as raw text and word-aligned text. The raw text is very close to what was extracted from the original word processing documents in UN ODS (e.g., Word, WordPerfect, PDF), converted to UTF-8 encoding. The word-aligned text was normalized, tokenized, aligned at the sentence level, further broken into sub-sentential chunk pairs, and then aligned at the word level. The sentence, chunk, and word alignment operations were performed separately for each individual language pair. The files are packaged as tar archives and compressed with the bzip2 utility, which is standard in most Linux releases. For Windows users, there are a variety of decompression software options; 7-Zip will decompress both tar and bzip2 formats. Note that in the data/aligned folder, the en-zh-1993.tar.bz2 and en-zh-1994.tar.bz2 archives decompress into empty folders. This is intentional, as there is no Chinese aligned data for those two years.
*Samples* The LDC catalog entry provides a raw English sample, a raw French sample, and an aligned English-French sample.
*Updates* None at this time.
URL: https://catalog.ldc.upenn.edu/LDC2013T06
BASE
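The abstract above notes that the corpus ships as bzip2-compressed tar archives and that two of the English-Chinese archives are intentionally empty. A minimal sketch of unpacking one such archive with Python's standard `tarfile` module follows. The function name `extract_archive` and the destination layout are illustrative assumptions, not part of the LDC release, and the archive is assumed to come from a trusted source.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: Path, dest_dir: Path) -> list[str]:
    """Extract a .tar.bz2 archive; return the names of the regular files it held.

    Hypothetical helper for illustration only. An empty list means the archive
    contained no files, which the abstract says is expected for
    en-zh-1993.tar.bz2 and en-zh-1994.tar.bz2.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    # "r:bz2" opens a bzip2-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:bz2") as archive:
        members = archive.getmembers()
        # Use the safer "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest_dir, **kwargs)
    # Directories are skipped so that an "empty folder" archive yields [].
    return [m.name for m in members if m.isfile()]
```

Under these assumptions, calling `extract_archive(Path("data/aligned/en-zh-1993.tar.bz2"), Path("out"))` would be expected to return an empty list, since that archive is described as decompressing into an empty folder.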
5
1993-2007 United Nations Parallel Text ...
Franz, Alex; Kumar, Shankar; Brants, Thorsten. - : Linguistic Data Consortium, 2013
BASE
6
Segmentation and alignment of parallel text for statistical machine translation
In: Natural language engineering. - Cambridge : Cambridge University Press 13 (2007) 3, 235-260
BLLDB
7
A weighted finite state transducer translation template model for statistical machine translation
In: Natural language engineering. - Cambridge : Cambridge University Press 12 (2006) 1, 35-75
BLLDB
8
Minimum Bayes-Risk Decoding for Statistical Machine Translation
In: DTIC (2004)
BASE
9
Normalization of non-standard words
In: Computer speech and language. - Amsterdam [u.a.] : Elsevier 15 (2001) 3, 287-334
OLC Linguistik
10
Normalization of non-standard words
In: Computer speech and language. - Amsterdam [u.a.] : Elsevier 15 (2001) 3, 287-333
BLLDB
