1. Ara-Women-Hate: The first Arabic Hate Speech corpus regarding Women
2. Towards the Early Detection of Child Predators in Chat Rooms: A BERT-based Approach
3. STaCK: Sentence Ordering with Temporal Commonsense Knowledge
4. Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution
5. Graphine: A Dataset for Graph-aware Terminology Definition Generation
6. End-to-end style-conditioned poetry generation: What does it take to learn from examples alone?
7. To what extent do human explanations of model behavior align with actual model behavior?
8. Time-aware Graph Neural Network for Entity Alignment between Temporal Knowledge Graphs
9. What’s Hidden in a One-layer Randomly Weighted Transformer?
10. Finetuning Pretrained Transformers into RNNs

The 2021 Conference on Empirical Methods in Natural Language Processing; Chen, Weizhu; Ilharco, Gabriel; Kasai, Jungo; Mao, Yi; Pappas, Nikolaos; Peng, Hao; Smith, Noah; Yogatama, Dani; Zhang, Yizhe. Underline Science Inc., 2021
Abstract:
Anthology paper link: https://aclanthology.org/2021.emnlp-main.830/
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. This comes with a significant computational overhead, as the attention mechanism scales with quadratic complexity in sequence length. Efficient transformer variants have received increasing interest in recent work. Among them, a linear-complexity recurrent variant has proven well suited for autoregressive generation. It approximates the softmax attention with randomized or heuristic feature maps, but can be difficult to train and may yield suboptimal accuracy. This work aims to convert a pretrained transformer into its efficient recurrent counterpart, improving efficiency while retaining accuracy. Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune. With a learned feature map, our ...
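
The abstract only sketches the idea, so here is a minimal illustration of what "linear-complexity recurrent attention with a learned feature map" can look like. This is not the paper's actual implementation: the class name, the feature dimension, and the linear-plus-ReLU feature map are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    """Causal linear attention with a learned feature map phi (a sketch).

    softmax(QK^T)V is approximated per step by phi(q_t)^T S_t, where
    S_t = sum_{i<=t} phi(k_i) v_i^T is a running sum. Because only the
    running sums are needed, generation becomes an RNN-style update:
    O(1) work per new token instead of attending over the whole prefix.
    """

    def __init__(self, head_dim: int, feature_dim: int = 32):
        super().__init__()
        # Learned feature map: one linear layer + ReLU. The paper's exact
        # parameterization may differ; this is an assumption.
        self.phi = nn.Sequential(nn.Linear(head_dim, feature_dim), nn.ReLU())

    def forward(self, q, k, v):
        # q, k, v: (batch, seq_len, head_dim)
        q, k = self.phi(q), self.phi(k)                      # (B, S, F)
        # Prefix sums play the role of the recurrent state.
        kv = torch.einsum("bsf,bsd->bsfd", k, v).cumsum(1)   # (B, S, F, D)
        z = k.cumsum(1)                                      # (B, S, F)
        num = torch.einsum("bsf,bsfd->bsd", q, kv)           # (B, S, D)
        den = torch.einsum("bsf,bsf->bs", q, z).unsqueeze(-1) + 1e-6
        return num / den

# Swap-then-finetune, schematically: replace each softmax attention module
# in a pretrained transformer with a layer like this, then finetune.
attn = LinearAttention(head_dim=64)
q = k = v = torch.randn(2, 16, 64)
out = attn(q, k, v)  # (2, 16, 64)
```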
Keywords:
Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing; Neural Network

URL: https://dx.doi.org/10.48448/w4sb-sz82
https://underline.io/lecture/37314-finetuning-pretrained-transformers-into-rnns

12. Pruning Neural Machine Translation for Speed Using Group Lasso
13. Elementary-Level Math Word Problem Generation using Pre-Trained Transformers
14. Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
15. The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation
16. Knowledge Graph Representation Learning using Ordinary Differential Equations
17. What Models Know About Their Attackers: Deriving Attacker Information From Latent Representations
18. Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions
20. ProtoInfoMax: Prototypical Networks with Mutual Information Maximization for Out-of-Domain Detection