Search in the Catalogues and Directories

Hits 1 – 9 of 9

1
Multimodal Person Discovery in Broadcast TV: lessons learned from MediaEval 2015
In: ISSN: 1380-7501 ; EISSN: 1573-7721 ; Multimedia Tools and Applications ; https://hal.archives-ouvertes.fr/hal-01690581 ; Multimedia Tools and Applications, Springer Verlag, 2017, 76 (21), pp.22547 - 22567. ⟨10.1007/s11042-017-4730-x⟩ (2017)
BASE
2
Benchmarking Multimedia Technologies with the CAMOMILE Platform: the Case of Multimodal Person Discovery at MediaEval 2015
In: LREC 2016 ; https://hal.archives-ouvertes.fr/hal-01690277 ; LREC 2016, May 2016, Portorož, Slovenia (2016)
BASE
3
The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
In: Proceedings of LREC 2016 ; LREC 2016 Conference ; https://hal.archives-ouvertes.fr/hal-01350096 ; LREC 2016 Conference, May 2016, Portorož, Slovenia (2016)
BASE
4
What Makes a Speaker Recognizable in TV Broadcast? Going Beyond Speaker Identification Error Rate
In: Interspeech 2015 ; ERRARE Workshop, a satellite event of Interspeech 2015. ; https://hal.archives-ouvertes.fr/hal-01433205 ; ERRARE Workshop, a satellite event of Interspeech 2015., 2015, Sinaia, Romania (2015)
BASE
5
Unsupervised Speaker Identification in TV Broadcast Based on Written Names
In: ISSN: 1558-7916 ; IEEE Transactions on Audio, Speech and Language Processing ; https://hal.archives-ouvertes.fr/hal-01060827 ; IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2015, 23 (1), pp.57-68. ⟨10.1109/TASLP.2014.2367822⟩ ; https://dl.acm.org/authorize?N46627 (2015)
BASE
6
Collaborative Annotation for Person Identification in TV Shows
In: Interspeech 2015 (short demo paper) ; https://hal.archives-ouvertes.fr/hal-01170513 ; Interspeech 2015 (short demo paper), Sep 2015, Dresden, Germany (2015)
BASE
7
Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast
In: the 14th Annual Conference of the International Speech Communication Association, INTERSPEECH ; https://hal.inria.fr/hal-00953095 ; the 14th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2013, Lyon, France (2013)
BASE
8
Towards a better integration of written names for unsupervised speakers identification in videos
In: First Workshop on Speech, Language and Audio in Multimedia, SLAM ; https://hal.inria.fr/hal-00953089 ; First Workshop on Speech, Language and Audio in Multimedia, SLAM, 2013, Marseille, France (2013)
BASE
9
Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast
In: Proceedings of the 13th Annual Conference of the International Speech Communication Association (Interspeech) ; Interspeech 2012 - Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-00767427 ; Interspeech 2012 - Conference of the International Speech Communication Association, Sep 2012, Portland, OR, United States. 4p (2012)
Abstract: Poster Session: Speaker Recognition III ; International audience ; We propose an approach for unsupervised speaker identification in TV broadcast videos that combines acoustic speaker diarization with person names obtained via video OCR from overlaid texts. Three methods for propagating the overlaid names to the speech turns are compared. They take into account the co-occurrence duration between the speaker clusters and the names provided by the video OCR, and use a task-adapted variant of the TF-IDF information retrieval coefficient. These methods were tested on the REPERE dry-run evaluation corpus, which contains 3 hours of annotated videos. Our best unsupervised system reaches an F-measure of 70.2% when considering all the speakers, and 81.7% if anchor speakers are left out. By comparison, a mono-modal, supervised speaker identification system with 535 speaker models, trained on matching development data and additional TV and radio data, achieved only a 57.5% F-measure when considering all the speakers and 45.7% without anchors.
Keyword: [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; multimodal fusion; optical character recognition; reproducible results; speaker diarization; unsupervised speaker identification
URL: https://hal.archives-ouvertes.fr/hal-00767427
https://hal.archives-ouvertes.fr/hal-00767427/document
https://hal.archives-ouvertes.fr/hal-00767427/file/Poignant-al_Interspeech2012.pdf
BASE
