81 | Adapting BigScience Multilingual Model to Unseen Languages ... | BASE
82 | On Efficiently Acquiring Annotations for Multilingual Models ... | BASE
83 | Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models ... | BASE
84 | Does Corpus Quality Really Matter for Low-Resource Languages? ... | BASE
85 | IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages ... | BASE
86 | mSLAM: Massively multilingual joint pre-training for speech and text ... | BASE
87 | On the Representation Collapse of Sparse Mixture of Experts ... | BASE
88 | Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom ... | BASE
89 | L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models ... | BASE
90 | Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts ... | BASE
  Abstract: Despite the advances in digital healthcare systems offering curated structured knowledge, much of the critical information still lies in large volumes of unlabeled and unstructured clinical texts. These texts, which often contain protected health information (PHI), are exposed to information extraction tools for downstream applications, risking patient identification. Existing works in de-identification rely on using large-scale annotated corpora in English, which often are not suitable in real-world multilingual settings. Pre-trained language models (LM) have shown great potential for cross-lingual transfer in low-resource settings. In this work, we empirically show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke domain. We annotate a gold evaluation dataset to assess few-shot setting performance where we only use a few hundred ... : Accepted by BioNLP'22 ...
  Keywords: Computation and Language (cs.CL); Cryptography and Security (cs.CR); FOS Computer and information sciences; Machine Learning (cs.LG)
  URL: https://arxiv.org/abs/2204.04775
  DOI: https://dx.doi.org/10.48550/arxiv.2204.04775
91 | A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model ... | BASE
92 | A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ... | BASE
93 | Factual Consistency of Multilingual Pretrained Language Models ... | BASE
94 | Examining Scaling and Transfer of Language Model Architectures for Machine Translation ... | BASE
95 | MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ... | BASE
96 | Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi ... | BASE
100 | From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction ... | BASE