1. Quality Assurance of Generative Dialog Models in an Evolving Conversational Agent Used for Swedish Language Practice ...
BASE

2. Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang ...
BASE

3. Similarity between person roles in a card sorting experiment ...
BASE

4. SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations ...
BASE

6. Ensemble of Opinion Dynamics Models to Understand the Role of the Undecided in the Vaccination Debate ...
BASE

9. Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation ...
BASE

10. Pirá: A Bilingual Portuguese-English Dataset for Question-Answering about the Ocean ...
Paschoal, André F. A.; Pirozelli, Paulo; Freire, Valdinei; Delgado, Karina V.; Peres, Sarajane M.; José, Marcos M.; Nakasato, Flávio; Oliveira, André S.; Brandão, Anarosa A. F.; Costa, Anna H. R.; Cozman, Fabio G. arXiv, 2022
Abstract: Current research in natural language processing is highly dependent on carefully produced corpora. Most existing resources focus on English; some resources focus on languages such as Chinese and French; few resources deal with more than one language. This paper presents the Pirá dataset, a large set of questions and answers about the ocean and the Brazilian coast in both Portuguese and English. Pirá is, to the best of our knowledge, the first QA dataset with supporting texts in Portuguese, and, perhaps more importantly, the first bilingual QA dataset that includes this language. The Pirá dataset consists of 2261 properly curated question/answer (QA) sets in both languages. The QA sets were manually created based on two corpora: abstracts related to the Brazilian coast and excerpts of United Nations reports about the ocean. The QA sets were validated in a peer-review process with the dataset contributors. We discuss some of the advantages as well as limitations of Pirá, as this new resource can support a set ... https://github.com/C4AI/Pira ...
Keyword: Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2202.02398 https://arxiv.org/abs/2202.02398
BASE

11. A comparative study of several parameterizations for speaker recognition ...
BASE

12. A Neural Pairwise Ranking Model for Readability Assessment ...
BASE

13. A bilingual approach to specialised adjectives through word embeddings in the karstology domain ...
BASE

14. Speaker verification in mismatch training and testing conditions ...
BASE

15. Universal Conditional Masked Language Pre-training for Neural Machine Translation ...
BASE

16. SMDT: Selective Memory-Augmented Neural Document Translation ...
BASE

17. Learning How to Translate North Korean through South Korean ...
BASE

18. When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation? ...
BASE

19. Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation ...
BASE