1 |
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
In: Proceedings of the International Workshop on Challenges & Perspectives in Creating Large Language Models 2022 (BigScience 2022), May 2022, Dublin, Ireland (2022) ; https://hal.inria.fr/hal-03639144
BASE
2 |
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0 ...
3 |
Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model ...
Abstract:
In this work, we show the process of building a large-scale training set from digital and digitized collections at a national library. The resulting Bidirectional Encoder Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models in several token and sequence classification tasks for both Norwegian Bokmål and Norwegian Nynorsk. Our model also improves the mBERT performance for other languages present in the corpus such as English, Swedish, and Danish. For languages not included in the corpus, the weights degrade moderately while keeping strong multilingual properties. Therefore, we show that building high-quality models within a memory institution using somewhat noisy optical character recognition (OCR) content is feasible, and we hope to pave the way for other memory institutions to follow. Accepted to NoDaLiDa 2021.
Keyword:
Computation and Language cs.CL; Digital Libraries cs.DL; FOS Computer and information sciences
URL: https://arxiv.org/abs/2104.09617 https://dx.doi.org/10.48550/arxiv.2104.09617
4 |
The futility of STILTs for the classification of lexical borrowings in Spanish ...
5 |
Automatic quantitative metrical analysis of Spanish Poetry with Rantanplan: a first approach ...
6 |
Automatic quantitative metrical analysis of Spanish Poetry with Rantanplan: a first approach ...
7 |
Rantanplan: Fast and Accurate Syllabification and Scansion of Spanish Poetry ...
8 |
Rantanplan: Fast and Accurate Syllabification and Scansion of Spanish Poetry ...
9 |
PoetryLab as Infrastructure for the Analysis of Spanish Poetry ...
10 |
PoetryLab: An Open Source Toolkit for the Analysis of Spanish Poetry Corpora ...
11 |
Rantanplan: Fast and Accurate Syllabification and Scansion of Spanish Poetry ...
12 |
Rantanplan: Fast and Accurate Syllabification and Scansion of Spanish Poetry ...
13 |
PoetryLab as Infrastructure for the Analysis of Spanish Poetry ...
14 |
PoetryLab as Infrastructure for the Analysis of Spanish Poetry ...
15 |
PoetryLab as Infrastructure for the Analysis of Spanish Poetry ...
16 |
PoetryLab as Infrastructure for the Analysis of Spanish Poetry ...
17 |
PoetryLab: An Open Source Toolkit for the Analysis of Spanish Poetry Corpora ...
18 |
PoetryLab as Infrastructure for the Analysis of Spanish Poetry ...
19 |
PoetryLab: An Open Source Toolkit for the Analysis of Spanish Poetry Corpora ...
20 |
PoetryLab: An Open Source Toolkit for the Analysis of Spanish Poetry Corpora ...