1 |
Improved statistical machine translation using monolingual paraphrases ...
Abstract:
We propose a novel monolingual sentence paraphrasing method for augmenting the training data of statistical machine translation (SMT) systems "for free", i.e., by creating it from data that is already available rather than requiring additional aligned data. Starting from a syntactic tree, we recursively generate new sentence variants in which noun compounds are paraphrased using suitable prepositions and, vice versa, noun phrases containing prepositions are turned into noun compounds. The evaluation shows an improvement equivalent to 33%-50% of the improvement obtained by doubling the amount of training data. ... : machine translation, SMT, paraphrasing, data augmentation. arXiv admin note: substantial text overlap with arXiv:1912.01113 ...
Keywords:
68T50; Artificial Intelligence cs.AI; Computation and Language cs.CL; F.2.2; I.2.7; FOS Computer and information sciences
URL: https://arxiv.org/abs/2109.15119 https://dx.doi.org/10.48550/arxiv.2109.15119
BASE
2 |
Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages ...
4 |
A Neighbourhood Framework for Resource-Lean Content Flagging ...
5 |
Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training ...
6 |
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images ...
8 |
SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification ...
10 |
SUper Team at SemEval-2016 Task 3: Building a feature-rich system for community question answering ...
12 |
Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields ...
13 |
RuleBERT: Teaching Soft Rules to Pre-Trained Language Models ...
14 |
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020) ...
15 |
On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Learning ...
16 |
EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering ...
19 |
What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context ...
20 |
SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles ...