2 |
The corpora they are a-changing: a case study in Italian newspapers
|
|
|
|
In: Basile, Pierpaolo orcid:0000-0002-0545-1105, Caputo, Annalina orcid:0000-0002-7144-8545, Caselli, Tommaso orcid:0000-0003-2936-0256, Cassotti, Pierluigi and Varvara, Rossella orcid:0000-0001-9957-2807 (2021) The corpora they are a-changing: a case study in Italian newspapers. In: 2nd International Workshop on Computational Approaches to Historical Language Change 2021, Online. (2021)
|
|
BASE
|
|
3 |
DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues
|
|
|
|
In: Castilho, Sheila orcid:0000-0002-8416-6555, Cavalheiro Camargo, João Lucas orcid:0000-0003-3746-1225, Menezes, Miguel and Way, Andy orcid:0000-0001-5736-5930 (2021) DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues. In: Sixth Conference on Machine Translation (WMT21), 10-11 Nov 2021, Punta Cana, Dominican Republic (Online). ISBN 978-1-954085-94-7 (2021)
|
|
BASE
|
|
4 |
English machine reading comprehension: new approaches to answering multiple-choice questions
|
|
Dzendzik, Daria. - : Dublin City University. School of Computing, 2021. : Dublin City University. ADAPT, 2021
|
|
In: Dzendzik, Daria (2021) English machine reading comprehension: new approaches to answering multiple-choice questions. PhD thesis, Dublin City University. (2021)
|
|
BASE
|
|
5 |
Chinese character decomposition for neural MT with multi-word expressions
|
|
|
|
In: Han, Lifeng orcid:0000-0002-3221-2185, Jones, Gareth J.F. orcid:0000-0003-2923-8365, Smeaton, Alan F. orcid:0000-0003-1028-8389 and Bolzoni, Paolo (2021) Chinese character decomposition for neural MT with multi-word expressions. In: 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), 31 May - 2 June 2021, Reykjavik, Iceland (Online). (In Press) (2021)
|
|
BASE
|
|
6 |
cushLEPOR uses LABSE distilled knowledge to improve correlation with human translation evaluations
|
|
|
|
In: Erofeev, Gleb, Sorokina, Irina, Han, Lifeng orcid:0000-0002-3221-2185 and Gladkoff, Serge (2021) cushLEPOR uses LABSE distilled knowledge to improve correlation with human translation evaluations. In: Machine Translation Summit 2021, 16-20 Aug 2021, USA (online). (In Press) (2021)
|
|
BASE
|
|
7 |
Monte Carlo modelling of confidence intervals in translation quality evaluation (TQE) and post-editing distance (PED) measurement
|
|
|
|
In: Alekseeva, Alexandra orcid:0000-0002-7990-4592, Gladkoff, Serge, Sorokina, Irina and Han, Lifeng orcid:0000-0002-3221-2185 (2021) Monte Carlo modelling of confidence intervals in translation quality evaluation (TQE) and post-editing distance (PED) measurement. In: Metrics 2021: Workshop on Informetric and Scientometric Research (SIG-MET), 23-24 Oct 2021, Online. (2021)
|
|
BASE
|
|
8 |
Meta-evaluation of machine translation evaluation methods
|
|
|
|
In: Han, Lifeng orcid:0000-0002-3221-2185 (2021) Meta-evaluation of machine translation evaluation methods. In: Workshop on Informetric and Scientometric Research (SIG-MET), 23-24 Oct 2021, Online. (2021)
|
|
BASE
|
|
9 |
Proactive information retrieval
|
|
Sen, Procheta. - : Dublin City University. School of Computing, 2021. : Dublin City University. ADAPT, 2021
|
|
In: Sen, Procheta (2021) Proactive information retrieval. PhD thesis, Dublin City University. (2021)
|
|
BASE
|
|
10 |
Is there a bilingual disadvantage for word segmentation? A computational modeling approach
|
|
|
|
In: ISSN: 0305-0009 ; EISSN: 1469-7602 ; Journal of Child Language ; https://hal.archives-ouvertes.fr/hal-03498905 ; Journal of Child Language, Cambridge University Press (CUP), 2021, pp.1-28. ⟨10.1017/S0305000921000568⟩ (2021)
|
|
BASE
|
|
11 |
SCALa: A blueprint for computational models of language acquisition in social context
|
|
|
|
In: ISSN: 0010-0277 ; EISSN: 1873-7838 ; Cognition ; https://hal.inria.fr/hal-03373586 ; Cognition, Elsevier, 2021, 213, pp.104779. ⟨10.1016/j.cognition.2021.104779⟩ (2021)
|
|
BASE
|
|
12 |
Buzz or Change: How the Social Network Structure Conditions the Fate of Lexical Innovations on Twitter
|
|
|
|
In: 8th Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora 2021) ; https://hal.archives-ouvertes.fr/hal-03426028 ; 8th Conference on CMC and Social Media Corpora for the Humanities (CMC-Corpora 2021), Oct 2021, Nijmegen, Radboud University, Netherlands (2021)
|
|
BASE
|
|
13 |
Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics ; Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics: Dagstuhl Seminar 21351
|
|
|
|
In: Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-03507948 ; Universals of Linguistic Idiosyncrasy in Multilingual Computational Linguistics, Aug 2021, pp.89-138, 2021, 2192-5283. ⟨10.4230/DagRep.11.7.89⟩ ; https://gitlab.com/unlid/dagstuhl-seminar/-/wikis/home (2021)
|
|
BASE
|
|
14 |
Do Infants Really Learn Phonetic Categories?
|
|
|
|
In: EISSN: 2470-2986 ; Open Mind ; https://hal.archives-ouvertes.fr/hal-03550830 ; Open Mind, MIT Press, 2021, 5, pp.113-131. ⟨10.1162/opmi_a_00046⟩ (2021)
|
|
BASE
|
|
15 |
Type-logical investigations: proof-theoretic, computational and linguistic aspects of modern type-logical grammars
|
|
|
|
In: https://hal-lirmm.ccsd.cnrs.fr/tel-03452731 ; Computation and Language [cs.CL]. Université Montpellier, 2021 (2021)
|
|
BASE
|
|
16 |
Weak supervision for learning discourse structure in multi-party dialogues ; Supervision distante pour l'apprentissage de structures discursives dans les conversations multi-locuteurs
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-03622653 ; Artificial Intelligence [cs.AI]. Université Paul Sabatier - Toulouse III, 2021. English. ⟨NNT : 2021TOU30138⟩ (2021)
|
|
Abstract:
The main objective of this thesis is to improve the automatic capture of semantic information with the goal of modeling and understanding human communication. We have advanced the state of the art in discourse parsing, in particular in the retrieval of discourse structure from chat, in order to implement, at the industrial level, tools to help explore conversations. These include the production of automatic summaries, recommendations, dialogue act detection, and the identification of decisions, planning and semantic relations between dialogue acts in order to understand dialogues. In multi-party conversations it is important to understand not only the meaning of a participant's utterance and to whom it is addressed, but also the semantic relations that tie it to other utterances in the conversation and give rise to different conversation threads. An answer must be recognized as an answer to a particular question; an argument, as an argument for or against a proposal under discussion; a disagreement, as the expression of a point of view contrasted with another idea already expressed. Unfortunately, capturing such information using traditional supervised machine learning methods from quality hand-annotated discourse data is costly and time-consuming, and we do not have nearly enough data to train these machine learning models, much less deep learning models. Another problem is that, arguably, no amount of data will be sufficient for machine learning models to learn the semantic characteristics of discourse relations without some expert guidance; the data are simply too sparse. Long-distance relations, in which an utterance is semantically connected not to the immediately preceding utterance but to another utterance from further back in the conversation, are particularly difficult and rare, though often central to comprehension.
It is therefore necessary to find a more efficient way to retrieve discourse structures from large corpora of multi-party conversations, such as meeting transcripts or chats. This is one goal this thesis achieves. In addition, we wanted not only to design a model that predicts discourse structure for multi-party conversation without requiring large amounts of hand-annotated data, but also to develop an approach that is transparent and explainable so that it can be modified and improved by experts. The method detailed in this thesis achieves this goal as well.
|
|
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Attachment; Computational linguistics; Data programming; Discourse relations; Discourse structure; Weak supervision
|
|
URL: https://tel.archives-ouvertes.fr/tel-03622653/document https://tel.archives-ouvertes.fr/tel-03622653/file/2021TOU30138b.pdf https://tel.archives-ouvertes.fr/tel-03622653
|
|
BASE
|
|
18 |
Arc-Eager Construction Provides Learning Advantage Beyond Stack Management
|
|
|
|
BASE
|
|
19 |
Neuro-computational models of language processing
|
|
|
|
In: EISSN: 2333-9691 ; Annual Review of Linguistics ; https://hal.archives-ouvertes.fr/hal-03334485 ; Annual Review of Linguistics, Annual Reviews, In press, ⟨10.1146/lingbuzz/006147⟩ (2021)
|
|
BASE
|
|
20 |
Handling Heavily Abbreviated Manuscripts: HTR engines vs text normalisation approaches
|
|
|
|
In: International Conference on Document Analysis and Recognition 2021 ; https://hal-enc.archives-ouvertes.fr/hal-03279602 ; International Conference on Document Analysis and Recognition 2021, 2021, Lausanne, Switzerland. pp.306-316, ⟨10.1007/978-3-030-86159-9_21⟩ (2021)
|
|
BASE
|
|
|
|