1 |
TEICORPO: a conversion tool for spoken language transcription with a pivot file in TEI
|
|
|
|
In: ISSN: 2162-5603 ; EISSN: 2162-5603 ; Journal of the Text Encoding Initiative ; https://halshs.archives-ouvertes.fr/halshs-03043572 ; Journal of the Text Encoding Initiative, TEI Consortium, In press (2020)
|
|
Abstract:
CORLI is a consortium of Huma-Num, the French national infrastructure dedicated to the technical support and promotion of digital humanities. The goal of CORLI is to promote and provide tools and information for good and efficient research practices in corpus linguistics, especially for spoken language corpora. Because of the time required to collect and transcribe spoken language resources, their number is limited, and corpora therefore need to be interoperable and reusable in order to improve research on various themes (phonology, prosody, interaction, syntax, textometry…). To help researchers reach this goal, CORLI has designed a set of tools: TEICORPO to assist in the conversion and use of spoken language corpora, and TEIMETA for metadata purposes. TEICORPO is based on the principle of an underlying common format, namely the TEI as described in its specification for spoken language use (ISO/TEI 24624:2016). The tool converts, without losing information, between the TEI format, which plays the role of a pivot format, and transcriptions created with alignment software such as CLAN, Transcriber, Praat or ELAN, as well as common file formats (csv, xlsx, txt or docx). Backward conversion is possible in many cases, with limitations inherent to the target format. TEICORPO can run the TreeTagger part-of-speech tagger and the Stanford CoreNLP tools on TEI files and can export the resulting files to textometric tools such as TXM, Le Trameur, or Iramuteq, making it a tool dedicated to editing spoken language corpora as well as to various research purposes.
|
|
Keyword:
[SHS.LANGUE]Humanities and Social Sciences/Linguistics; annotationBlock; conversion; oral corpora; TEI; transcription
|
|
URL: https://halshs.archives-ouvertes.fr/halshs-03043572 https://halshs.archives-ouvertes.fr/halshs-03043572/document https://halshs.archives-ouvertes.fr/halshs-03043572/file/182-Article%20Text-1407-1-15-20201019.pdf
|
|
BASE
|
|
|
|
2 |
The CORLI Consortium: CORpus, Languages and Interaction
|
|
|
|
In: Digital Humanities 2019 ; https://halshs.archives-ouvertes.fr/halshs-02337690 ; Digital Humanities 2019, Jul 2019, Utrecht, Netherlands (2019)
|
|
BASE
|
|
|
|
3 |
Tei-Meta: a Tool for Editing Metadata in TEI - Application to Oral Language Research Purposes ; Tei-Meta: un outil pour éditer les métadonnées en TEI - Application aux recherches sur la langue orale
|
|
|
|
In: TEI 2018 ; https://halshs.archives-ouvertes.fr/halshs-01955665 ; TEI 2018, Sep 2018, Tokyo, Japan (2018)
|
|
BASE
|
|
|
|
4 |
Workshop "Spoken Language : Tools and Workflow for Creating and Editing Data and Metadata"
|
|
|
|
In: Spoken Language : Tools and Workflow for Creating and Editing Data and Metadata ; https://halshs.archives-ouvertes.fr/halshs-01967492 ; Spoken Language : Tools and Workflow for Creating and Editing Data and Metadata, 2018, Tokyo, Japan (2018)
|
|
BASE
|
|
|
|
5 |
FAIR en linguistique de la langue orale : objectifs, méthode et outils
|
|
|
|
In: Journée "Interopérabilité et pérennisation des données : comment FAIR En pratique?" ; https://halshs.archives-ouvertes.fr/halshs-01958684 ; Journée "Interopérabilité et pérennisation des données : comment FAIR En pratique?", Nov 2018, Paris, France (2018)
|
|
BASE
|
|
|
|
6 |
Workshop "Spoken Language : Tools and Workflow for Creating and Editing Data and Metadata" ; Atelier "Langue orale: Outils et méthodologie pour créer et éditer les métadonnées et les données"
|
|
|
|
In: TEI 2018 ; https://halshs.archives-ouvertes.fr/halshs-01955707 ; TEI 2018, Sep 2018, Tokyo, Japan (2018)
|
|
BASE
|
|
|
|
7 |
Segmentation in macrosyntactic units across different interaction types. A quantitative study ; Segmentation en unités macrosyntaxiques dans différents types d'interaction. Une étude quantitative
|
|
|
|
In: 50 years of corpus linguistics on oral corpora. Its contribution to the study of variation ; https://hal.archives-ouvertes.fr/hal-01927595 ; 50 years of corpus linguistics on oral corpora. Its contribution to the study of variation, Nov 2018, Orléans, France ; https://anniveslo-50ans.sciencesconf.org (2018)
|
|
BASE
|
|
|
|
8 |
CORLI : diffuser, exploiter, et partager les corpus et les outils de linguistique de l’écrit et de l’oral
|
|
|
|
In: Rencontres de la TGIR Huma-Num 2018 ; https://halshs.archives-ouvertes.fr/halshs-01958714 ; Rencontres de la TGIR Huma-Num 2018, Jun 2018, Lyon, France (2018)
|
|
BASE
|
|
|
|
9 |
Atelier "Interopérabilité Pratiques et outils d’exploration de corpus : Métadonnées et conversions de format"
|
|
|
|
In: 6ième Congrès Mondial de Linguistique Française ; https://halshs.archives-ouvertes.fr/halshs-01955728 ; 6ième Congrès Mondial de Linguistique Française, Jul 2018, Mons, Belgique (2018)
|
|
BASE
|
|
|
|
10 |
Atelier Métadonnées
|
|
|
|
In: 9e Journées Internationales de la Linguistique de Corpus JLC2017 ; https://halshs.archives-ouvertes.fr/halshs-02083841 ; 9e Journées Internationales de la Linguistique de Corpus JLC2017, Laboratoire LIDILEM, Jul 2017, Grenoble, France (2017)
|
|
BASE
|
|
|
|
11 |
Vers un format pivot commun pour la mutualisation, l'échange et l'analyse des corpus oraux
|
|
|
|
In: FLORAL ; https://halshs.archives-ouvertes.fr/halshs-01636964 ; FLORAL, Mar 2017, Orléans, France (2017)
|
|
BASE
|
|
|
|
12 |
Agrégation automatisée de corpus de français parlé
|
|
|
|
In: Journées de Linguistique de Corpus ; https://halshs.archives-ouvertes.fr/halshs-01636957 ; Journées de Linguistique de Corpus, Jul 2017, Grenoble, France (2017)
|
|
BASE
|
|
|
|
13 |
Connecting Resources: Which Issues Have to be Solved to Integrate CMC Corpora from Heterogeneous Sources and for Different Languages?
|
|
|
|
In: 5th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora17) ; https://hal.archives-ouvertes.fr/hal-01918880 ; 5th Conference on CMC and Social Media Corpora for the Humanities (cmccorpora17), Oct 2017, Bolzano, Italy. pp.52-55 ; https://doi.org/10.5281/zenodo.1040713 (2017)
|
|
BASE
|
|
|
|
14 |
Metadata in French spoken language corpora
|
|
|
|
In: French-German colloquium on standards for corpora of computer-mediated communication ; https://halshs.archives-ouvertes.fr/halshs-01958740 ; French-German colloquium on standards for corpora of computer-mediated communication, Jun 2017, Duisburg-Essen, Germany (2017)
|
|
BASE
|
|
|
|
15 |
Using the TEI as pivot format for oral and multimodal language corpora
|
|
|
|
In: ISSN: 2162-5603 ; EISSN: 2162-5603 ; Journal of the Text Encoding Initiative ; https://halshs.archives-ouvertes.fr/halshs-01357343 ; Journal of the Text Encoding Initiative, TEI Consortium, 2016 (2016)
|
|
BASE
|
|
|
|
16 |
Utilisation d'un format commun pour structurer les métadonnées de corpus oraux : objectifs, enjeux et méthode
|
|
|
|
In: Données, métadonnées des corpus et catalogage des objets en sciences humaines et sociales ; https://halshs.archives-ouvertes.fr/halshs-01357271 ; Données, métadonnées des corpus et catalogage des objets en sciences humaines et sociales, Jun 2016, Poitiers, France (2016)
|
|
BASE
|
|
|
|
17 |
Using the TEI as a pivot format for oral and multimodal language corpora
|
|
|
|
In: Text Encoding Initiative Conference and Member's meeting 2015 ; https://halshs.archives-ouvertes.fr/halshs-01345777 ; Text Encoding Initiative Conference and Member's meeting 2015, Oct 2015, Lyon, France ; http://tei2015.huma-num.fr/fr/ (2015)
|
|
BASE
|
|
|
|