41 |
Introduction of Automation for the Production of Bilingual, Parallel-Aligned Text
|
|
|
|
In: DTIC (2011)
|
|
Abstract:
As the study and application of statistical machine translation (SMT) grow, progress is often circumscribed by a lack of data. The statistical models that govern SMT engines rely on many large bilingual text corpora, each comprising vast numbers of bilingual text segments. For certain languages, corpora already exist and help to power translation engines. Regrettably, this is not the case for every language the Army is interested in, making the creation or acquisition of such data a priority. To this end, a language expert in Dari and Pashto was hired to collect, prepare, and ensure the quality of bilingual text. To explore ways to aid the expert, several of the steps the expert performed that were necessary to the process were automated. The hypothesis was that automating selected processes would improve efficiency, measured in terms of both speed of production and quantity of data produced, even after accounting for the time needed to correct automation-caused errors. As predicted, the net result of introducing automation was an increase in both the rate of producing correct bilingual segments and the number produced. The implications of these results for improving larger bilingual data creation and acquisition efforts are discussed. The original document contains color images.
|
|
Keyword:
*AFGHANISTAN; *AUTOMATION; *BILINGUAL DATA PRODUCTION; *BILINGUAL PARALLEL TEXT; *DATA MINING; *ENGLISH LANGUAGE; *FOREIGN LANGUAGES; *MACHINE TRANSLATION; *STATISTICAL ANALYSIS; *STATISTICAL MACHINE TRANSLATION; ACCURACY; ALIGNMENT; ARMY OPERATIONS; ARMY PERSONNEL; Computer Programming and Software; DARI LANGUAGE; DARI-ENGLISH TRANSLATION; EFFICIENCY; Linguistics; NATURAL LANGUAGE; PARSERS; PASHTO LANGUAGE; PASHTO-ENGLISH TRANSLATION; PIPELINE PROJECT; PRODUCTION; SEGMENTATION; SOFTWARE TOOLS; Statistics and Probability
|
|
URL: http://www.dtic.mil/docs/citations/ADA552756 http://oai.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA552756
|
|
BASE
|
|
|
|
44 |
An OCL-Based approach to derive constraint test cases for database applications
|
|
|
|
BASE
|
|
|
|
45 |
Verification Architectures: Compositional Reasoning for Real-time Systems
|
|
|
|
In: Integrated Formal Methods - IFM 2010 ; https://hal.inria.fr/inria-00525132 ; Integrated Formal Methods - IFM 2010, INRIA Nancy Grand Est, Oct 2010, Nancy, France. pp.152-167 (2010)
|
|
BASE
|
|
|
|
46 |
Maritime Domain Awareness via Agent Learning and Collaboration
|
|
|
|
In: DTIC (2010)
|
|
BASE
|
|
|
|
47 |
Enhancing a Web Crawler with Arabic Search Capability
|
|
|
|
In: DTIC (2010)
|
|
BASE
|
|
|
|
48 |
Entity Profiling for Intelligence Using the Graphical Overview of Social and Semantic Interactions of People (GOSSIP) Software Tool
|
|
|
|
In: DTIC (2010)
|
|
BASE
|
|
|
|
49 |
Why Smalltalk wins the host languages shootout
|
|
|
|
In: http://scg.unibe.ch/archive/papers/Reng09bLanguageShootout.pdf (2009)
|
|
BASE
|
|
|
|
51 |
Blog Fingerprinting: Identifying Anonymous Posts Written by an Author of Interest Using Word and Character Frequency Analysis
|
|
|
|
In: DTIC (2009)
|
|
BASE
|
|
|
|
52 |
Using Adversary Text to Detect Adversary Phase Changes
|
|
|
|
In: DTIC (2009)
|
|
BASE
|
|
|
|
53 |
Design Of Domain-Specific Software Systems With Parametric Code Templates ...
|
|
|
|
BASE
|
|
|
|
54 |
Design Of Domain-Specific Software Systems With Parametric Code Templates ...
|
|
|
|
BASE
|
|
|
|
55 |
CEMAP II: An Architecture and Specifications to Facilitate the Importing of Real-World Data into the CASOS Software Suite
|
|
|
|
In: DTIC (2008)
|
|
BASE
|
|
|
|
56 |
A Sensemaking Visualization Tool with Military Doctrinal Elements
|
|
|
|
In: DTIC (2008)
|
|
BASE
|
|
|
|
59 |
DESIDERATA FOR LINGUISTIC SOFTWARE DESIGN
|
|
|
|
In: International Journal of English Studies; Vol. 8 No. 1 (2008): Monograph: Software-aided Analysis of Language; 67-94 ; 1989-6131 ; 1578-7044 (2008)
|
|
BASE
|
|
|
|
60 |
Conceiving and Implementing a language-oriented approach for the design of automated learning scenarios
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-00156874 ; Software engineering [cs.SE]. Université des Sciences et Technologie de Lille - Lille I, 2007. In French (2007)
|
|
BASE
|
|
|
|
|
|