1 |
Methods for Evaluating Text Extraction Toolkits: An Exploratory Investigation
In: DTIC (2015)
BASE
2 |
A Comparative Study of PDF Generation Methods: Measuring Loss of Fidelity When Converting Arabic and Persian MS Word Files to PDF
In: DTIC (2011)
BASE
3 |
Reliable Electronic Text: The Elusive Prerequisite for a Host of Human Language Technologies
In: DTIC (2010)
Abstract:
Electronic text for use by human language technologies originates from a number of sources: direct keyboard entry, optical character recognition, speech recognition, and text-containing computer files. In particular, text-containing computer files may elude processing by an array of human language technology applications (e.g., search, language ID, machine translation, and text analytics). This paper brings to light the effort required to extract electronic text from these files, preserve its integrity, and, for some use cases, preserve its structure. It explores a series of specific human language technologies, highlighting the following aspects for each: relevant use cases, the impact of text extraction or conversion errors, the criticality of dependable text extraction and reliable electronic text, and the importance of experimentation and/or testing prior to use. Overall, this paper promotes the successful use of human language technology by equipping the reader to be discerning about the use of human language technology applications with text-containing files.
Keyword:
*COMPUTER FILES; *LANGUAGE; *MACHINE TRANSLATION; *MICROCOMPUTERS; *SPEECH RECOGNITION; *TEXT PROCESSING; Computer Programming and Software; Computer Systems Management and Standards; CONVERSION; ERRORS; HUMAN RESOURCES; HUMANS; Linguistics; OPTICAL CHARACTER RECOGNITION
URL: http://oai.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA546707 http://www.dtic.mil/docs/citations/ADA546707
BASE
4 |
A Methodology for End-to-End Evaluation of Arabic Document Image Processing Software
In: DTIC (2006)
BASE