1. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
In: Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), Aug 2021, Online, France, pp. 96-120, ⟨10.18653/v1/2021.gem-1.10⟩ (2021) ; https://hal.archives-ouvertes.fr/hal-03466171
2. THEaiTRobot 1.0
Rosa, Rudolf; Dušek, Ondřej; Kocmi, Tom. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021; The Švanda Theatre in Smíchov, 2021; The Academy of Performing Arts in Prague, Theatre Faculty (DAMU), 2021
3. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics ...
4. MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization ...
7. One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech ...
9. Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge ...
10. RankME: Reliable Human Ratings for Natural Language Generation ...
Abstract:
Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments. We show that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods. In addition, we show that it is possible to evaluate NLG systems according to multiple, distinct criteria, which is important for error analysis. Finally, we demonstrate that RankME, in combination with Bayesian estimation of system quality, is a cost-effective alternative for ranking multiple NLG systems.
Accepted to NAACL 2018 (The 2018 Conference of the North American Chapter of the Association for Computational Linguistics) ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.1803.05928 https://arxiv.org/abs/1803.05928
11. Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity ...
16. The E2E Dataset: New Challenges For End-to-End Generation ...