1. Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings ...
Abstract:
We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents. ...
To appear in Proceedings of The 16th Conference of the European Chapter of the Association for Computational Linguistics ...
Keywords:
Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences; I.2.7; Information Retrieval cs.IR
URL: https://arxiv.org/abs/2101.11059 https://dx.doi.org/10.48550/arxiv.2101.11059
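The abstract of this record describes a streaming clustering loop in which each incoming document is compared to existing clusters along sparse and dense representations, and is either assigned to the best cluster or starts a new one. The following is a minimal sketch of that general idea only, not the authors' implementation. Everything named here is an assumption for illustration: the class `StreamClusterer`, the hand-set weights `w_sparse` and `w_dense`, and the fixed `threshold`, which stands in for the neural classifier and the triplet-loss-trained weights mentioned in the abstract. Raw token counts stand in for the sparse representation, and the dense vectors are supplied by the caller.

```python
import math
from collections import Counter


def cosine_sparse(a, b):
    """Cosine similarity between two bag-of-words count mappings."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class StreamClusterer:
    """Hypothetical non-parametric streaming clusterer (illustrative only)."""

    def __init__(self, w_sparse=0.5, w_dense=0.5, threshold=0.6):
        # Assumption: fixed weights and threshold replace the learned
        # similarity model and neural decision described in the abstract.
        self.w_sparse = w_sparse
        self.w_dense = w_dense
        self.threshold = threshold
        self.clusters = []  # each: {"sparse": Counter, "dense": list, "n": int}

    def add(self, tokens, embedding):
        """Assign one document to a cluster and return the cluster index."""
        sparse = Counter(tokens)
        best_idx, best_score = None, -1.0
        for i, c in enumerate(self.clusters):
            score = (self.w_sparse * cosine_sparse(sparse, c["sparse"])
                     + self.w_dense * cosine_dense(embedding, c["dense"]))
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is not None and best_score >= self.threshold:
            c = self.clusters[best_idx]
            c["sparse"].update(sparse)
            # Summed vectors have the same cosine as the mean, so no division.
            c["dense"] = [x + y for x, y in zip(c["dense"], embedding)]
            c["n"] += 1
            return best_idx
        # No cluster is similar enough: open a new one (non-parametric step).
        self.clusters.append({"sparse": sparse, "dense": list(embedding), "n": 1})
        return len(self.clusters) - 1


clusterer = StreamClusterer()
first = clusterer.add(["quake", "japan"], [1.0, 0.0])
second = clusterer.add(["quake", "japan", "tsunami"], [0.9, 0.1])
third = clusterer.add(["election", "vote"], [0.0, 1.0])
```

In this toy run the two earthquake documents share a cluster and the election document opens a second one. The number of clusters is never fixed in advance, which is the non-parametric property the abstract refers to.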
BASE
2. Improving Factual Consistency of Abstractive Summarization via Question Answering ...
3. InfoSurgeon: Cross-Media Fine-grained Information Consistency Checking for Fake News Detection ...
6. Cross-language Sentence Selection via Data Augmentation and Rationale Training ...
7. Event Guided Denoising for Multilingual Relation Learning ...
8. WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization ...
9. Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages ...
10. Event-Guided Denoising for Multilingual Relation Learning ...
11. Zero-Shot Stance Detection: A Dataset and Model using Generalized Topic Representations ...
12. Incorporating Terminology Constraints in Automatic Post-Editing ...
13. A Unified Feature Representation for Lexical Connotations ...
14. Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines ...
15. Contextual Analysis of Social Media: The Promise and Challenge of Eliciting Context in Social Media Posts with Natural Language Processing
In: Proc AAAI ACM Conf AI Ethics Soc (2020)
17. Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation
18. Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation ...
19. Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars
20. Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars