2 |
A fine-grained recognition of Named Entities in ELTeC collection using cascades
|
|
|
|
In: Final Action Event of COST Action Distant Reading for European Literary History ; https://hal.archives-ouvertes.fr/hal-03615219 ; Final Action Event of COST Action Distant Reading for European Literary History, Christof Schöch, Apr 2022, Krakow, Poland ; https://www.distant-reading.net/events/conference-programme/ (2022)
|
|
Abstract:
International audience ; Within the COST Action “Distant Reading for European Literary History” (Schöch et al. 2021; Patras et al. 2021), Working Group 2 (WG2), responsible for methods and tools, proposed a set of seven named entity (NE) categories for annotating novels (the so-called “level-2” text version). The tags for this set are PERS, LOC, ORG, WORK, EVENT, ROLE and DEMO (Frontini et al. 2020; Šandrih Todorović et al. 2021). The level-2 version of Serbian novels was produced using this set of categories and tags (Krstev et al. 2019). For Serbian and French, fine-grained named entity recognition systems were developed, based on exhaustive lexicons of the respective languages and on rules implemented as cascades of finite-state automata (Maurel and Friburger 2014; Krstev et al. 2014). These systems were developed using the open-source corpus processing suite Unitex/GramLab and its module CasSys. Both systems recognize and tag a rich set of NE categories and subcategories and allow entity embedding; moreover, the French system recognizes NEs that correspond to the TEI Guidelines, chapter 13 (TEI P5). An example that illustrates this in French is (Marquis de la Lande factories): usines de la Lande. Similarly, in Serbian (Queen Elizabeth of Hungary): kraljice Ugarske Elizabete. Besides the broad categories suggested by WG2, both systems also recognize other categories, such as temporal and measurement expressions. In both the Serbian and French systems, the recognition module is separated from the annotation module, which makes it possible to produce output in whatever form is needed. In this paper we will illustrate this on a few Serbian and French novels from the ELTeC corpus, chosen to match with respect to the corpus balance criteria, namely the author’s gender, the novel’s size and the year of first publication.
The novels will be annotated with the simplified tags needed for the level-2 text format and with more elaborate TEI-compliant tags that reflect all nuances of the recognized NEs. The two output formats for the Serbian and French novels will be uploaded into the TXM corpus processing system, which will enable both quantitative and qualitative analysis (Krstev et al. 2019). Besides statistical analysis of the annotated NEs, we will perform a contrastive analysis of Serbian and French NEs and, for both languages, of the fine-grained and simplified versions of the annotation. The qualitative analysis will reveal interesting examples of annotation, open issues and hard cases. Textometric analysis in TXM will be illustrated for both the fine-grained and the simplified versions of the annotated samples. Finally, we will return to the research questions posed by the Action’s Working Group 3 (WG3, literary theory and history) when the Action started. The initial wish of WG3 was to produce fine-grained annotations that would allow, for instance, distinguishing between cities and villages, different roles of persons (professions, family relations, etc.), a person’s gender, types of locations (continent, country, region, city, village, mountain, waterbody, astronym), etc. After an analysis of the availability of NER tools, the fine-grained approach was replaced with a much simpler schema. With this research we would like to reopen these questions and establish whether it is possible to meet the need for more detailed literary analysis based on named entities.
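The abstract describes producing both a fine-grained and a simplified (level-2) annotation from the same recognition output. A minimal Python sketch of that reduction step is given here for illustration only. The seven level-2 tags come from the abstract; the `TAG.subtype` label format, the function names and the sample spans are assumptions, and none of this reflects the actual Unitex/CasSys output format.

```python
from typing import List, Optional, Tuple

# The seven level-2 tags are listed in the abstract.
LEVEL2_TAGS = {"PERS", "LOC", "ORG", "WORK", "EVENT", "ROLE", "DEMO"}


def simplify(label: str) -> Optional[str]:
    """Collapse a hypothetical fine-grained label such as 'LOC.city'
    to its level-2 tag. Returns None when the broad category is not
    part of the level-2 set (e.g. temporal expressions)."""
    broad = label.split(".", 1)[0].upper()
    return broad if broad in LEVEL2_TAGS else None


def simplify_annotations(spans: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (text, fine_label) pairs to (text, level2_tag) pairs,
    dropping spans that have no level-2 equivalent."""
    simplified = []
    for text, label in spans:
        tag = simplify(label)
        if tag is not None:
            simplified.append((text, tag))
    return simplified


# Hypothetical spans; the labels are illustrative, not system output.
fine_grained = [
    ("Krakow", "LOC.city"),
    ("in April 2022", "TIME.date"),
    ("queen", "ROLE.title"),
]
level2 = simplify_annotations(fine_grained)
```

Keeping the fine-grained labels as the primary output and deriving the simplified tags from them means that both versions can be generated from one recognition pass, which is the kind of separation between recognition and annotation that the abstract mentions.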
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Digital humanities; Distant Reading for European Literary History; Named entities recognition; Unitex
|
|
URL: https://hal.archives-ouvertes.fr/hal-03615219
|
|
BASE
|
|
3 |
Automatic Normalisation of Early Modern French
|
|
|
|
In: https://hal.inria.fr/hal-03540226 ; 2022 (2022)
|
|
BASE
|
|
4 |
Leza, Sungu, and Samba: Digital Humanities and Early Bantu History
|
|
|
|
In: Faculty Journal Articles (2022)
|
|
BASE
|
|
5 |
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French
|
|
|
|
In: Proceedings of the 13th Language Resources and Evaluation Conference ; https://hal.inria.fr/hal-03596653 ; Proceedings of the 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France (2022)
|
|
BASE
|
|
6 |
Skin and feminist cyberactivism. The reversal of the social stigma
|
|
|
|
In: EISSN: 2646-1064 ; La Peaulogie - Revue de sciences sociales et humaines sur les peaux ; https://hal.archives-ouvertes.fr/hal-03639171 ; La Peaulogie - Revue de sciences sociales et humaines sur les peaux, La Peaulogie 2022, Tatouage éthique et inclusif : la peau comme marqueuse politique, pp.163-203 ; https://lapeaulogie.fr/article/peau-cybermilitantisme-feministe/ (2022)
|
|
BASE
|
|
7 |
Islands and Bridges of Language: Bio-Inspired Structural Analysis of Language Embedding Data
|
|
|
|
BASE
|
|
8 |
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
|
|
|
|
In: Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II ; https://hal.archives-ouvertes.fr/hal-03635971 ; Matthias Hagen; Suzan Verberne; Craig Macdonald; Christin Seifert; Krisztian Balog; Kjetil Nørvåg; Vinay Setty. Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, 13186, Springer International Publishing, pp.347-354, 2022, Lecture Notes in Computer Science, 978-3-030-99738-0. ⟨10.1007/978-3-030-99739-7_44⟩ (2022)
|
|
BASE
|
|
9 |
« “Twitta” “Intellectuelle” “Influenceuse” ? Être enseignante-chercheuse sur twitter »
|
|
|
|
In: ISSN: 1763-0061 ; EISSN: 1963-1812 ; Tracés : Revue de Sciences Humaines ; https://hal.archives-ouvertes.fr/hal-03592945 ; Tracés : Revue de Sciences Humaines, ENS Éditions, forthcoming (2022)
|
|
BASE
|
|
10 |
Cyberhate in the Context of Migrations
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03572743 ; Springer International Publishing, 2022, 978-3-030-92105-7. ⟨10.1007/978-3-030-92103-3⟩ ; https://link.springer.com/book/10.1007/978-3-030-92103-3 (2022)
|
|
BASE
|
|
11 |
Renouvellement paradigmatique dans l’analyse des discours numériques : le cas de la communication politique sur les RSN
|
|
|
|
In: ISSN: 2116-1747 ; Etudes de stylistique anglaise ; https://hal-amu.archives-ouvertes.fr/hal-03584927 ; Etudes de stylistique anglaise, Société de stylistique anglaise, Lyon, 2022, Renaissance(s)/Rebirth(s), ⟨10.4000/esa.4816⟩ ; https://journals.openedition.org/esa/4816 (2022)
|
|
BASE
|
|
12 |
Проверка показаний на месте с использованием онлайн-трансляции ... : Checking the readings on the spot using an online broadcast ...
|
|
|
|
BASE
|
|
13 |
Правовая политика России в сфере антимонопольного регулирования на современном этапе ... : Legal policy of Russia in the sphere of anti-monopoly regulation at the present stage ...
|
|
|
|
BASE
|
|
14 |
Language Learning Through Contemporary Technologies: A Case Of TPACK Teaching Model ...
|
|
|
|
BASE
|
|
15 |
THE EFFECT OF DIGITAL GAMES IN TEACHING VOCABULARY IN EFL CLASSES ...
|
|
|
|
BASE
|
|
16 |
THE EFFECTIVENESS OF USING DIGITAL STORYTELLING TECHNIQUE IN MULTICULTURAL CLASSROOMS IN ORDER TO RAISE AWARENESS OF TRANSNATIONALISM ...
|
|
|
|
BASE
|
|
17 |
THE EFFECT OF DIGITAL GAMES IN TEACHING VOCABULARY IN EFL CLASSES ...
|
|
|
|
BASE
|
|
18 |
Language Learning Through Contemporary Technologies: A Case Of TPACK Teaching Model ...
|
|
|
|
BASE
|
|
19 |
THE EFFECTIVENESS OF USING DIGITAL STORYTELLING TECHNIQUE IN MULTICULTURAL CLASSROOMS IN ORDER TO RAISE AWARENESS OF TRANSNATIONALISM ...
|
|
|
|
BASE
|
|
20 |
Determinants of Access to Digital Health for Culturally and Linguistically Diverse Females in Regional Communities: A Scoping Review ...
|
|
|
|
BASE
|
|