2. Investigating alignment interpretability for low-resource NMT
In: ISSN: 0922-6567 ; EISSN: 1573-0573 ; Machine Translation ; https://hal.archives-ouvertes.fr/hal-03139744 ; Machine Translation, Springer Verlag, 2021, ⟨10.1007/s10590-020-09254-w⟩ (2021)

3. Is there a bilingual disadvantage for word segmentation? A computational modeling approach
In: ISSN: 0305-0009 ; EISSN: 1469-7602 ; Journal of Child Language ; https://hal.archives-ouvertes.fr/hal-03498905 ; Journal of Child Language, Cambridge University Press (CUP), 2021, pp.1-28. ⟨10.1017/S0305000921000568⟩ (2021)

4. SM to: Is there a bilingual disadvantage for word segmentation? A computational modeling approach ...

5. Early Tashelhiyt Berber word segmentation: the role of the Possible Word Constraint ...

6. Discovering structure in speech recordings: Unsupervised learning of word and phoneme like units for automatic speech recognition
In: Fraunhofer IAIS (2021)

7. Handling cross and out-of-domain samples in Thai word segmentation
In: 1003 ; 1016 (2021)

8. Measuring (online) word segmentation in adults and children
In: Dutch Journal of Applied Linguistics, Vol 10 (2021) (2021)

9. Investigating Language Impact in Bilingual Approaches for Computational Language Documentation
In: Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) ; SLTU-CCURL workshop, LREC 2020 ; https://hal.archives-ouvertes.fr/hal-02895907 ; SLTU-CCURL workshop, LREC 2020, May 2020, Marseille, France (2020)
Abstract:
For endangered languages, data collection campaigns must accommodate the fact that many of these languages come from an oral tradition and that producing transcriptions is costly. It is therefore essential to translate the recordings into a widely spoken language to ensure that they remain interpretable. In this paper, we investigate how the choice of translation language affects subsequent documentation work and any automatic approaches applied to the resulting bilingual corpus. To answer this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) to create 56 bilingual pairs, which we apply to the task of low-resource unsupervised word segmentation and alignment. Our results highlight that the choice of translation language influences word segmentation performance, and that different aligned translations lead to different learned lexicons. Lastly, we propose a hybrid approach to bilingual word segmentation that combines boundary clues extracted from a non-parametric Bayesian model (Goldwater et al., 2009a) with the attentional neural word segmentation model of Godard et al. (2018). Our results suggest that incorporating these clues into the neural models' input representation improves their translation and alignment quality, especially for challenging language pairs.
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO]Computer Science [cs]; attention mechanism; computational language documentation; sequence-to-sequence models; word segmentation
URL: https://hal.archives-ouvertes.fr/hal-02895907 https://hal.archives-ouvertes.fr/hal-02895907/document https://hal.archives-ouvertes.fr/hal-02895907/file/2003.13325.pdf

10. F0 Slope and Mean: Cues to Speech Segmentation in French
In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-03042331 ; Interspeech 2020, Oct 2020, Shanghai, China. pp.1610-1614, ⟨10.21437/Interspeech.2020-2509⟩ (2020)

11. The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...

12. Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...

13. The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...

14. Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech

15. Infants Segment Words from Songs—An EEG Study
In: Brain Sciences ; Volume 10 ; Issue 1 (2020)

16. Not all words are equally acquired: transitional probabilities and instructions affect the electrophysiological correlates of statistical learning

17. Controlling Utterance Length in NMT-based Word Segmentation with Attention
In: International Workshop on Spoken Language Translation ; https://hal.archives-ouvertes.fr/hal-02343206 ; International Workshop on Spoken Language Translation, Nov 2019, Hong-Kong, China (2019)

18. Segmentability Differences Between Child-Directed and Adult-Directed Speech: A Systematic Test With an Ecologically Valid Corpus
In: EISSN: 2470-2986 ; Open Mind ; https://hal.archives-ouvertes.fr/hal-02274050 ; Open Mind, MIT Press, 2019, 3, pp.13-22. ⟨10.1162/opmi_a_00022⟩ (2019)

19. Unsupervised word discovery for computational language documentation ; Découverte non-supervisée de mots pour outiller la linguistique de terrain
In: https://tel.archives-ouvertes.fr/tel-02286425 ; Artificial Intelligence [cs.AI]. Université Paris Saclay (COmUE), 2019. English. ⟨NNT : 2019SACLS062⟩ (2019)

20. MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language
In: Information ; Volume 10 ; Issue 10 (2019)