
Search in the Catalogues and Directories

Hits 1 – 20 of 69

1. Analyzing Gender Representation in Multilingual Models ...
2. Universal Dependencies 2.9
Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. Universal Dependencies Consortium, 2021
3. Universal Dependencies 2.8.1
Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. Universal Dependencies Consortium, 2021
4. Universal Dependencies 2.8
Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. Universal Dependencies Consortium, 2021
5. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models ...
6. Including Signed Languages in Natural Language Processing ...
7. Including Signed Languages in Natural Language Processing ...
8. Contrastive Explanations for Model Interpretability ...
9. Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand? ...
10. Measuring and Improving Consistency in Pretrained Language Models ...
11. Aligning Faithful Interpretations with their Social Attribution ...
12. Amnesic Probing: Behavioral Explanation With Amnesic Counterfactuals ...
13. Data Augmentation for Sign Language Gloss Translation ...
14. Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent ...
Abstract: The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine, typically a variant of gradient descent (GD). To better understand this bias, we study the tendency for transformer parameters to grow in magnitude ($\ell_2$ norm) during training, and its implications for the emergent representations within self-attention layers. Empirically, we document norm growth in the training of transformer language models, including T5 during its pretraining. As the parameters grow in magnitude, we prove that the network approximates a discretized network with saturated activation functions. Such "saturated" networks are known to have a reduced capacity compared to the full network family that can be described in terms of formal languages and automata. Our results suggest saturation is a new characterization of an ...
Keywords: Language Models; Natural Language Processing; Semantic Evaluation; Sociolinguistics
Anthology: https://aclanthology.org/2021.emnlp-main.133/
URL: https://underline.io/lecture/37533-effects-of-parameter-norm-growth-during-transformer-training-inductive-bias-from-gradient-descent
DOI: https://dx.doi.org/10.48448/2yr8-q466
(A short illustrative sketch of the $\ell_2$-norm quantity tracked in this abstract follows the result list below.)
15. Asking It All: Generating Contextualized Questions for any Semantic Role ...
16. Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction ...
17. Neural Extractive Search ...
18. Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction ...
19. Ab Antiquo: Neural Proto-language Reconstruction ...
NAACL 2021; Goldberg, Yoav; Meloni, Carlo. Underline Science Inc., 2021
20. Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora
In: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States, pp. 538-555, ⟨10.18653/v1/2020.acl-main.51⟩. https://hal.inria.fr/hal-03161637 (2020)
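As an illustrative aside on hit 14 (not part of the catalogue entry): the quantity that abstract tracks, the global $\ell_2$ norm of a transformer's parameters, can be logged during training with a few lines of PyTorch. This is a minimal sketch under the assumption that PyTorch is available; `model`, `optimizer`, and `loader` are hypothetical placeholders, and this is not the authors' code.

```python
# Minimal sketch: log the global l2 norm of a model's trainable parameters
# during training to observe whether it grows, as the abstract of hit 14 reports.
import torch

def global_l2_norm(model: torch.nn.Module) -> float:
    """l2 norm of all trainable parameters, treated as one flattened vector."""
    total = sum(p.detach().pow(2).sum() for p in model.parameters() if p.requires_grad)
    return float(torch.sqrt(total))

# Hypothetical usage inside a training loop (model, optimizer, loader are placeholders):
# for step, batch in enumerate(loader):
#     loss = model(**batch).loss
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
#     if step % 1000 == 0:
#         print(f"step {step}: ||theta||_2 = {global_l2_norm(model):.2f}")
```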


Hits by source type:
Catalogues: 0
Bibliographies: 1
Linked Open Data catalogues: 0
Online resources: 1
Open access documents: 67