1
HuSpaCy: an industrial-strength Hungarian natural language processing toolkit ...
Abstract:
Although a few open-source language processing pipelines are available for Hungarian, none of them satisfies the requirements of today's NLP applications. A language processing pipeline should provide close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition and word embeddings. Industrial text processing applications must also satisfy non-functional software quality requirements; moreover, frameworks supporting multiple languages are increasingly favored. This paper introduces HuSpaCy, an industry-ready Hungarian language processing toolkit. The presented tool provides components for the most important basic linguistic analysis tasks. It is open-source and available under a permissive license. Our system is built upon spaCy's NLP components, resulting in an easily usable, fast yet accurate application. Experiments confirm that HuSpaCy has high accuracy while maintaining resource-efficient prediction capabilities. ... : Camera-ready manuscript: - Fixed various grammatical errors. - Restructured the evaluation section. - Updated scores in accordance with the v0.4.2 release ...
Keyword:
68T50; Computation and Language cs.CL; FOS Computer and information sciences; I.2.7; Machine Learning stat.ML
URL: https://arxiv.org/abs/2201.01956 https://dx.doi.org/10.48550/arxiv.2201.01956
BASE
9
Universal Dependencies 2.2
In: https://hal.archives-ouvertes.fr/hal-01930733 ; 2018 (2018)
BASE
12
Universal Dependencies 2.1
In: https://hal.inria.fr/hal-01682188 ; 2017 (2017)
BASE
15
Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
BASE