2 |
Methods, Models and Tools for Improving the Quality of Textual Annotations
In: Modelling; Volume 3; Issue 2; Pages: 224-242 (2022)
BASE
3 |
Models of diachronic semantic change using word embeddings ; Modèles diachroniques à base de plongements de mot pour l'analyse du changement sémantique
In: https://tel.archives-ouvertes.fr/tel-03199801 ; Document and Text Processing. Université Paris-Saclay, 2021. English. ⟨NNT : 2021UPASG006⟩ (2021)
4 |
Teachers of Color's Perception on Identity and Academic Success: A Reflective Narrative
In: All Antioch University Dissertations & Theses (2021)
5 |
A Survey on Multilingual Hate Speech Detection and Classification by Machine Learning Techniques ...
6 |
A Survey on Multilingual Hate Speech Detection and Classification by Machine Learning Techniques ...
7 |
APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text ...
Ionov, Maxim. - : Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021
8 |
Towards Learning Terminological Concept Systems from Multilingual Natural Language Text ...
9 |
Improving Multilingual Models for the Swedish Language : Exploring Cross-Lingual Transferability and Stereotypical Biases
10 |
NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish
In: Applied Sciences ; Volume 11 ; Issue 21 (2021)
11 |
Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning ...
12 |
Analyzing Non-Textual Content Elements to Detect Academic Plagiarism
13 |
Learning to scale multilingual representations for vision-language tasks
16 |
Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning
Abstract:
Most of the world's languages suffer from a paucity of annotated data. This curbs the effectiveness of supervised learning, the most widespread approach to modelling language. Instead, an alternative paradigm could take inspiration from the propensity of children to acquire language from limited stimuli, in order to enable machines to learn any new language from a few examples. The abstract mechanisms underpinning this ability include 1) a set of in-born inductive biases and 2) the deep entrenchment of language in other perceptual and cognitive faculties, combined with the ability to transfer and recombine knowledge across these domains. The main contribution of my thesis is giving concrete form to both these intuitions. Firstly, I argue that endowing a neural network with the correct inductive biases is equivalent to constructing a prior distribution over its weights and its architecture (including connectivity patterns and non-linear activations). This prior is inferred by "reverse-engineering" a representative set of observed languages and harnessing typological features documented by linguists. Thus, I provide a unified framework for cross-lingual transfer and architecture search by recasting them as hierarchical Bayesian neural models. Secondly, the skills relevant to different language varieties and different tasks in natural language processing are deeply intertwined. Hence, the neural weights modelling the data for each of their combinations can be imagined as lying in a structured space. I introduce a Bayesian generative model of this space, which is factorised into latent variables representing each language and each task. By virtue of this modular design, predictions can generalise to unseen combinations by extrapolating from the data of observed combinations.
The proposed models are empirically validated on a spectrum of language-related tasks (character-level language modelling, part-of-speech tagging, named entity recognition, and common-sense reasoning) and a typologically diverse sample of about a hundred languages. Compared to a series of competitive baselines, they achieve better performance in new languages in zero-shot and few-shot learning settings. In general, they hold promise to extend state-of-the-art language technology to under-resourced languages by means of sample efficiency and robustness to cross-lingual variation.
Funding: ERC (Consolidator Grant 648909) Lexical; Google Research Faculty Award 2018
Keywords:
Bayesian Models; Deep Learning; Inductive Bias; Linguistic Typology; Modularity; Multilingual Natural Language Processing; Neural Networks; Sample Efficiency; Systematic Generalisation
URL: https://doi.org/10.17863/CAM.66424 ; https://www.repository.cam.ac.uk/handle/1810/319303
17 |
SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020)
18 |
Assessing English Writing in Multilingual Writers in Higher Education: A Longitudinal Study
In: Applied Linguistics and English as a Second Language Dissertations (2019)
19 |
Character language models for generalization of multilingual named entity recognition
20 |
Multilingual Information Access (MLIA) Tools on Google and WorldCat: Bi/Multilingual University Students’ Experience and Perceptions
In: FIMS Publications (2019)