1. Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
In: https://hal.inria.fr/hal-03550289 ; 2022 (2022)
2. Linguistic resources for paraphrase generation in Portuguese: a Lexicon-Grammar approach
In: ISSN: 1574-020X ; EISSN: 1574-0218 ; Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-03548861 ; Language Resources and Evaluation, Springer Verlag, 2022, ⟨10.1007/s10579-021-09561-5⟩ ; https://link.springer.com/article/10.1007/s10579-021-09561-5 (2022)
4. Machine Learning approaches for Topic and Sentiment Analysis in multilingual opinions and low-resource languages: From English to Guarani
5. Investigating alignment interpretability for low-resource NMT
In: ISSN: 0922-6567 ; EISSN: 1573-0573 ; Machine Translation ; https://hal.archives-ouvertes.fr/hal-03139744 ; Machine Translation, Springer Verlag, 2021, ⟨10.1007/s10590-020-09254-w⟩ (2021)
Abstract:
The attention mechanism in Neural Machine Translation (NMT) models added flexibility to translation systems and made it possible to visualize soft-alignments between source and target representations. While there is much debate about the relationship between attention and the yielded output for neural models [26, 35, 43, 38], in this paper we propose a different assessment, investigating soft-alignment interpretability in low-resource scenarios. We experimented with different architectures (RNN [5], 2D-CNN [15], and Transformer [39]), comparing them with regard to their ability to produce directly exploitable alignments. To evaluate exploitability, we replicated the Unsupervised Word Segmentation (UWS) task from Godard et al. [22]. There, source words are translated into unsegmented phone sequences. After training, the resulting soft-alignments are used to produce a segmentation of the target side. Our results showed that an RNN-based NMT model produced the most exploitable alignments in this scenario. We then investigated methods for increasing its UWS scores by comparing the following methodologies: monolingual pre-training, input representation augmentation (hybrid model), and explicit word length optimization during training. We reached the best results with the hybrid model, which uses an intermediate monolingual-rooted segmentation from a non-parametric Bayesian model [25] to enrich the input representation before training.
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO]Computer Science [cs]; attention mechanism; computational language documentation; low-resource languages; neural machine translation; sequence-to-sequence models; unsupervised word segmentation
URL: https://doi.org/10.1007/s10590-020-09254-w https://hal.archives-ouvertes.fr/hal-03139744
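The segmentation step mentioned in the abstract of this entry (soft-alignments used to segment the target phone sequence) can be illustrated with a minimal sketch. It assumes one common reading of the procedure: each target phone is hard-aligned to the source word that receives its largest attention weight, and a boundary is placed wherever two consecutive phones align to different source words. The function name `segment_from_alignment` and the toy attention matrix are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def segment_from_alignment(
    phones: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[List[str]]:
    """Group consecutive phones that share the same most-attended source word.

    attention[t][s] is the soft-alignment weight between target phone t
    and source word s (assumed layout: one row per phone).
    """
    if len(phones) != len(attention):
        raise ValueError("need exactly one attention row per phone")

    segments: List[List[str]] = []
    previous_word: Optional[int] = None
    for phone, row in zip(phones, attention):
        # Hard-align the phone to the source word with the largest weight.
        best_word = max(range(len(row)), key=row.__getitem__)
        if best_word != previous_word:
            # The aligned source word changed: start a new segment.
            segments.append([])
        segments[-1].append(phone)
        previous_word = best_word
    return segments


# Toy data invented for illustration: 5 phones, 2 source words.
phones = ["h", "a", "l", "o", "u"]
attention = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.1, 0.9],
]
print(segment_from_alignment(phones, attention))
```

On the toy matrix the first three phones attend mostly to the first source word and the last two to the second, so the sketch yields two segments. Real attention matrices are noisier, which is presumably why the abstract compares architectures by how exploitable their alignments are.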
6. The Zero Resource Speech Challenge 2021: Spoken language modelling
In: ISSN: 0162-8828 ; IEEE Transactions on Pattern Analysis and Machine Intelligence ; https://hal.inria.fr/hal-03329301 ; IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2021, pp.1-1. ⟨10.1109/TPAMI.2021.3083839⟩ (2021)
7. The Zero Resource Speech Challenge 2021: Spoken language modelling
In: Interspeech 2021 - Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329301 ; Interspeech 2021 - Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. ⟨10.1109/TPAMI.2021.3083839⟩ (2021)
8. Cross-lingual Representation Learning for Natural Language Processing
9. Sign Language in Light of Mathematics Education: An Exploration Within Semiotic and Embodiment Theories of Learning Mathematics
In: American Annals of the Deaf, vol 166, iss 3 (2021)
10. Multitask Transformer Model-based Fintech Customer Service Chatbot NLU System with DECO-LGG SSP-based Data ; DECO-LGG 반자동 증강 학습데이터 활용 멀티태스크 트랜스포머 모델 기반 핀테크 CS 챗봇 NLU 시스템
In: Annual Conference on Human and Language Technology ; https://hal.archives-ouvertes.fr/hal-03603903 ; Annual Conference on Human and Language Technology, Oct 2021, Seoul, South Korea. pp.461-466 ; http://www.koreascience.or.kr/journal/OOGHAK.page (2021)
11. On Multi-domain Sentence Level Sentiment Analysis for Roman Urdu ...
12. From 'big' to 'much': On the grammaticalization of two gradable adjectives in Swedish ...
13. From 'big' to 'much': On the grammaticalization of two gradable adjectives in Swedish ...
14. Creating Corpus-Informed Materials for the English as a Foreign Language Classroom: A step-by-step guide for (trainee) teachers using online resources ...
15. Creating Corpus-Informed Materials for the English as a Foreign Language Classroom: A step-by-step guide for (trainee) teachers using online resources ...