A. Allauzen and H. Bonneau-Maynard, Training and evaluation of POS taggers on the French MULTITAG corpus, the 6th International Conference on Language Resources and Evaluation, LREC, 2008.

M. Anthimopoulos, B. Gatos, and I. Pratikakis, A two-stage scheme for text detection in video images, pp.1413-1426, 2010.

M. Bäuml, K. Bernardin, M. Fischer, H. K. Ekenel, and R. Stiefelhagen, Multi-pose face recognition for person retrieval in camera networks, 7th International Conference on Advanced Video and Signal-Based Surveillance, AVSS, pp.441-447, 2010.

M. Bendris, D. Charlet, and G. Chollet, Lip activity detection for talking faces classification in TV-Content, International Conference on Machine Vision, 2010.

M. Bendris, Indexation audio visuelle des personnes dans un contexte de télévision, PhD thesis in computer science, École TÉLÉCOM ParisTech, 2011.

M. Bendris, B. Favre, D. Charlet, G. Damnati, R. Auguste et al., Unsupervised face identification in TV content using audio-visual sources, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.243-249, 2013.
DOI : 10.1109/CBMI.2013.6576591
URL : https://hal.archives-ouvertes.fr/hal-00812334

F. Béchet, B. Favre, and G. Damnati, Detecting person presence in TV shows with linguistic and structural features, the 37th IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5077-5080, 2012.

H. Bredin and J. Poignant, Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast, the 14th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953095

H. Bredin, J. Poignant, G. Fortier, M. Tapaswi, V. Le et al., QCompere at REPERE 2013, First Workshop on Speech, Language and Audio in Multimedia, SLAM, 2013.

H. Bredin, J. Poignant, M. Tapaswi, G. Fortier, V. B. Le et al., Fusion of speech, faces and text for person identification in TV broadcast, Workshop on Information Fusion in Computer Vision for Concept Recognition, ECCV-IFCVCR, pp.385-394, 2012.

G. Bernard, S. Rosset, O. Galibert, E. Bilinski, and G. Adda, The LIMSI Participation in the QAst 2009 Track: Experimenting on Answer Scoring, the 10th Workshop of the Cross-Language Evaluation Forum, CLEF, pp.289-296, 2009.
DOI : 10.1007/978-3-642-15754-7_33

M. Bäuml, M. Tapaswi, and R. Stiefelhagen, Semi-supervised Learning with Constraints for Person Identification in Multimedia Data, IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2013.

H. Bunke, On a relation between graph edit distance and maximum common subgraph, Pattern Recognition Letters, vol.18, issue.8, pp.689-694, 1997.
DOI : 10.1016/S0167-8655(97)00060-3

C. Barras, X. Zhu, S. Meignier, and J. Gauvain, Multistage speaker diarization of broadcast news, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.5, pp.1505-1512, 2006.
DOI : 10.1109/TASL.2006.878261
URL : https://hal.archives-ouvertes.fr/hal-01434241

S. S. Chen and P. S. Gopalakrishnan, Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, DARPA Broadcast News Transcription and Understanding Workshop, pp.127-132, 1998.

M. Chen and A. G. Hauptmann, Searching for a specific person in broadcast news video, the IEEE 29th International Conference on Acoustics, Speech and Signal Processing, 2004.

L. Canseco, L. Lamel, and J. Gauvain, A comparative study using manual and automatic transcriptions for diarization, IEEE Workshop on Automatic Speech Recognition and Understanding, pp.415-419, 2005.
DOI : 10.1109/ASRU.2005.1566507

M. Charhad, D. Moraru, S. Ayache, and G. Quénot, Speaker identity indexing in audio-visual documents, the 3rd Workshop on Content-Based Multimedia Indexing, CBMI, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00953917

T. Cour, B. Sapp, A. Nagle, and B. Taskar, Talking pictures: Temporal grouping and dialog-supervised person recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.1014-1021, 2010.
DOI : 10.1109/CVPR.2010.5540106
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.192.6518

L. Canseco-Rodriguez, L. Lamel, and J. Gauvain, Speaker diarization from speech transcripts, the 5th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2004.

T. Cour, B. Sapp, C. Jordan, and B. Taskar, Learning from ambiguously labeled images, IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp.919-926, 2009.

M. Cai, J. Song, and M. R. Lyu, A new approach for video text detection, the IEEE International Conference on Image Processing, ICIP, pp.117-120, 2002.

T. Cour, B. Sapp, and B. Taskar, Learning from Partial Labels, 2011.

P. Deléglise, Y. Estève, S. Meignier, and T. Merlin, The LIUM speech transcription system : a CMU Sphinx III-based system for French broadcast news, the 6th Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.1653-1656, 2005.

N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, Front-End Factor Analysis for Speaker Verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, pp.788-798, 2011.

T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997.

G. Dupuy, M. Rouvier, S. Meignier, and Y. Estève, I-vectors and ILP clustering adapted to cross-show speaker diarization, the 13th Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.2174-2177, 2012.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2005.

F. Einsele, R. Ingold, and J. Hennebert, A HMM-based approach to recognize ultra low resolution anti-aliased words, the 2nd International Conference on Pattern Recognition and Machine Intelligence, 2007.

E. El-Khoury, A. Laurent, S. Meignier, and S. Petitrenaud, Combining transcription-based and acoustic-based speaker identifications for broadcast news, IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4377-4380, 2012.

Y. Estève, S. Meignier, P. Deléglise, and J. Mauclair, Extracting true speaker identities from transcriptions, the 8th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2007.

M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy" - Automatic naming of characters in TV video, the 17th British Machine Vision Conference, 2006.

M. Everingham, J. Sivic, and A. Zisserman, Taking the bite out of automatic naming of characters in TV video, 2009.

B. Fröba and A. Ernst, Face detection with the modified census transform, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, AFGR, 2004.

J. R. Finkel and C. D. Manning, Enforcing transitivity in coreference resolution, the 46th Annual Meeting of the Association for Computational Linguistics, ACL, 2008.

N. Fourour, Identification et catégorisation automatiques des entités nommées dans les textes français, PhD thesis in computer science, Université de Nantes.

H. Fang, T. Tao, and C. Zhai, A formal study of information retrieval heuristics, the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004.

A. Giraudel, M. Carré, V. Mapelli, J. Kahn, O. Galibert et al., The REPERE corpus : a multimodal corpus for person recognition, the 8th International Conference on Language Resources and Evaluation, LREC, 2012.

J. Gauvain, L. Lamel, and G. Adda, Partitioning and Transcription of Broadcast News Data, the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, ICSLP, pp.1335-1338, 1998.

J. Gauvain, L. Lamel, and G. Adda, The LIMSI Broadcast News Transcription System, Speech Communication, 2002.

M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, Face recognition from caption-based supervision, 2012.

X. Hua, X. Chen, L. Wenyin, and H. Zhang, Automatic location of text in video frames, Proceedings of the 2001 ACM workshops on Multimedia: multimedia information retrieval, MULTIMEDIA '01, pp.24-27, 2001.
DOI : 10.1145/500933.500941

R. Houghton, Named Faces: putting names to faces, IEEE Intelligent Systems, vol.14, issue.5, pp.45-50, 1999.
DOI : 10.1109/5254.796089

M. E. Houle, A generic query-based model for scalable clustering, 2006.

T. Hastie, R. Tibshirani, and J. H. Friedman, The elements of statistical learning : data mining, inference, and prediction : with 200 full-color illustrations, 2001.

M. Huijbregts and D. A. van Leeuwen, Diarization-based Speaker Retrieval for Broadcast Television Archives, the 12th Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.1037-1040, 2011.

I. Ide, T. Ogasawara, T. Takahashi, and H. Murase, Name Identification of People in News Video by Face Matching, the 3rd International Workshop on Computer Vision meets Databases, CVDB, 2007.

V. Jousse, C. Jacquin, S. Meignier, Y. Estève, and B. Daille, Etude pour l'amélioration d'un système d'identification nommée du locuteur, Les Journées d'Étude sur la Parole - Traitement Automatique des Langues Naturelles, JEP-TALN, 2008.

C. Jung, Q. Liu, and J. Kim, A stroke filter and its application to text localization, Pattern Recognition Letters, vol.30, issue.2, pp.114-122, 2009.
DOI : 10.1016/j.patrec.2008.05.014

V. Jousse, S. Meignier, C. Jacquin, S. Petitrenaud, Y. Estève et al., Analyse conjointe du signal sonore et de sa transcription pour l'identification nommée de locuteur, Traitement Automatique des Langues, pp.201-225, 2009.

V. Jousse, S. Petit-Renaud, S. Meignier, Y. Estève, and C. Jacquin, Automatic named identification of speakers using diarization and ASR systems, the 34th IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4557-4560, 2009.

E. Khoury, C. Sénac, and P. Joly, Audiovisual diarization of people in video content, Multimedia Tools and Applications, 2012.
DOI : 10.1007/s11042-012-1080-6

H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, vol.2, issue.1-2, pp.83-97, 1955.

M. Köstinger, P. Wohlhart, P. M. Roth, and H. Bischof, Learning to recognize faces from videos and weakly related information cues, 2011 8th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp.23-28, 2011.
DOI : 10.1109/AVSS.2011.6027287

V. B. Le, C. Barras, and M. Ferràs, On the use of GSV-SVM for Speaker Diarization and Tracking, Odyssey - The Speaker and Language Recognition Workshop, pp.146-150, 2010.

L. Lamel, S. Courcinous, J. Despres, J. Gauvain, Y. Josse et al., Speech Recognition for Machine Translation in Quaero, The International Workshop on Spoken Language Translation, IWSLT, 2011.

T. Lavergne, O. Cappé, and F. Yvon, Practical very large scale CRFs, the 48th Annual Meeting of the Association for Computational Linguistics, ACL, pp.504-513, 2010.

R. Lienhart and W. Effelsberg, Automatic text segmentation and text recognition for video indexing, Multimedia Systems, vol.8, issue.1, pp.69-81, 1998.
DOI : 10.1007/s005300050006
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.3485

A. Liu, J. Fei, J. Fan, L. Pang, Y. Zhang et al., Confusion network based Video OCR post-processing approach, IEEE international conference on Multimedia and Expo, ICME, pp.137-140, 2009.

C. Liu, S. Jiang, and Q. Huang, Naming faces in broadcast news video by image google, Proceeding of the 16th ACM international conference on Multimedia, MM '08, pp.717-720, 2008.
DOI : 10.1145/1459359.1459468

D. Le, S. Satoh, M. E. Houle, and D. Nguyen, Finding Important People in Large News Video Databases Using Multimodal and Clustering Analysis, 2007 IEEE 23rd International Conference on Data Engineering Workshop, pp.127-136, 2007.
DOI : 10.1109/ICDEW.2007.4400982

O. Maron and T. Lozano-Pérez, A Framework for Multiple-Instance Learning, Advances in Neural Information Processing Systems, pp.570-576, 1998.

J. Mauclair, S. Meignier, and Y. Estève, Indexation en locuteur : utilisation d'informations lexicales, Les Journées d'Étude, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01434240

J. Mauclair, S. Meignier, and Y. Estève, Speaker diarization : about whom the speaker is talking ?, IEEE Odyssey 2006 - The Speaker and Language Recognition Workshop, 2006.

C. Ma, P. Nguyen, and M. Mahajan, Finding Speaker Identities with a Conditional Maximum Entropy Model, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07, pp.261-264, 2007.
DOI : 10.1109/ICASSP.2007.366899

M. Dinarelli and S. Rosset, Models Cascade for Tree-Structured Named Entity Detection, the 5th International Joint Conference on Natural Language Processing, IJCNLP, pp.1269-1278, 2011.

D. Ozkan and P. Duygulu, Finding People Frequently Appearing in News, the 5th International Conference on Image and Video Retrieval, pp.173-182, 2006.
DOI : 10.1007/11788034_18

F. Och and H. Ney, A Systematic Comparison of Various Statistical Alignment Models, Computational Linguistics, vol.29, issue.1, pp.19-51, 2003.
DOI : 10.1162/089120103321337421

J. Poignant, H. Bredin, L. Besacier, G. Quénot, and C. Barras, Towards a better integration of written names for unsupervised speakers identification in videos, First Workshop on Speech, Language and Audio in Multimedia, SLAM, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953089

J. Poignant, H. Bredin, V. B. Le, L. Besacier, C. Barras et al., Unsupervised speaker identification using overlaid texts in TV broadcast, the 13th Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.2650-2653, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00767427

J. Poignant, L. Besacier, V. B. Le, S. Rosset, and G. Quénot, Unsupervised naming of speakers in broadcast TV : using written names, pronounced names or both, the 14th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953088

J. Poignant, L. Besacier, and G. Quénot, Nommage non-supervisé des personnes dans les émissions de télévision : une revue du potentiel de chaque modalité, the 10th COnférence en Recherche d'Information et Applications, CORIA, 2013.

J. Poignant, L. Besacier, G. Quénot, and F. Thollard, From Text Detection in Videos to Person Identification, 2012 IEEE International Conference on Multimedia and Expo, 2012.
DOI : 10.1109/ICME.2012.119
URL : https://hal.archives-ouvertes.fr/hal-00767383

S. Petitrenaud, V. Jousse, S. Meignier, and Y. Estève, Reconnaissance Automatique de Locuteurs à l'aide de Fonctions de Croyance, the 17th French-language congress Reconnaissance des Formes et Intelligence Artificielle (RFIA'10), pp.854-859, 2010.

P. T. Pham, M.-F. Moens, and T. Tuytelaars, Naming persons in news video with label propagation, IEEE International Conference on Multimedia and Expo, ICME, 2010.

J. Poignant, Détection et reconnaissance de texte dans les documents vidéos, Et leurs apports à la reconnaissance de personnes, 2011.

S. Petit-Renaud, V. Jousse, S. Meignier, and Y. Estève, Identification of speakers by name using belief functions, the 13th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp.179-188, 2010.

R. Prasad, S. Saleem, E. MacRostie, P. Natarajan, and M. Decerbo, Multi-frame combination for robust videotext recognition, 2008.

P. T. Pham, T. Tuytelaars, and M.-F. Moens, Naming people in news videos with label propagation, 2011.

J. Poignant, F. Thollard, G. Quénot, and L. Besacier, Text detection and recognition for person identification in videos, the 9th Workshop on Content-Based Multimedia Indexing, CBMI, 2011.

G. Quénot, D. Moraru, and L. Besacier, CLIPS at TRECvid : Shot Boundary Detection and Feature Detection, Workshop TRECVID, 2003.

S. E. Robertson and K. Sparck Jones, Relevance weighting of search terms, pp.129-146, 1976.

M. Rouvier and S. Meignier, A Global Optimization Framework For Speaker Diarization, Odyssey - The Speaker and Language Recognition Workshop, 2012.

D. A. Reynolds, E. Singer, B. A. Carlson et al., Blind clustering of speech utterances based on speaker and language characteristics, The 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, 1998.

J. Sivic, M. Everingham, and A. Zisserman, "Who are you?" - Learning person specific classifiers from video, IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp.1145-1152, 2009.

S. Satoh and T. Kanade, Name-It : association of face and name in video, Conference on Computer Vision and Pattern Recognition, CVPR, p.368, 1997.

X. Song, C. Lin, and M. Sun, Cross-modality automatic face model training from large video databases, Conference on Computer Vision and Pattern Recognition Workshop, CVPRW, p.91, 2004.

J. Sang, C. Liang, C. Xu, and J. Cheng, Robust movie character identification and the sensitivity analysis, 2011 IEEE International Conference on Multimedia and Expo, pp.1-6, 2011.
DOI : 10.1109/ICME.2011.6011837

S. Satoh, Y. Nakamura, and T. Kanade, Name-It : naming and detecting faces in video by the integration of image and natural language processing, The Fifteenth International Joint Conference on Artificial Intelligence, IJCAI, pp.1488-1493, 1997.

S. Satoh, Y. Nakamura, and T. Kanade, Name-It : naming and detecting faces in news videos, IEEE Multimedia, pp.22-35, 1999.

J. Sauvola and M. Pietikäinen, Adaptive document image binarization, Pattern Recognition, pp.225-236, 2000.

R. E. Schapire and Y. Singer, Improved boosting algorithms using confidence-rated predictions, Machine Learning, pp.297-336, 1999.

J. Sang and C. Xu, Robust Face-Name Graph Matching for Movie Character Identification, IEEE Transactions on Multimedia, vol.14, issue.3, pp.586-596, 2012.
DOI : 10.1109/TMM.2012.2188784

M. Tapaswi, M. Bäuml, and R. Stiefelhagen, "Knock! Knock! Who is it?" Probabilistic person identification in TV-series, IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp.2658-2665, 2012.

S. E. Tranter, Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, pp.1013-1016, 2006.
DOI : 10.1109/ICASSP.2006.1660195

C. Wolf, J. Jolion, and F. Chassaing, Text Localization, Enhancement and Binarization in Multimedia Documents, the 16th International Conference on Pattern Recognition, ICPR, pp.1037-1040, 2002.

P. Wohlhart, M. Köstinger, P. M. Roth, and H. Bischof, Learning Face Recognition in Videos from Associated Information Sources, Workshop of the Austrian Association for Pattern Recognition, 2011.

P. Wohlhart, M. Köstinger, P. M. Roth, and H. Bischof, Multiple Instance Boosting for Face Recognition in Videos, the 33rd annual Symposium of the German Association for Pattern Recognition, DAGM, pp.132-141, 2011.
DOI : 10.1007/978-3-642-23123-0_14

J. Xi, X. Hua, X. Chen, L. Wenyin, and H. Zhang, A video text detection and recognition system, IEEE International Conference on Multimedia and Expo, ICME, p.222, 2001.

J. Yang, M. Chen, and A. G. Hauptmann, Finding Person X: Correlating Names with Visual Appearances, the 3rd International Conference on Image and Video Retrieval, CIVR, pp.270-278, 2004.
DOI : 10.1007/978-3-540-27814-6_34

J. Yang and A. G. Hauptmann, Naming every individual in news video monologues, Proceedings of the 12th annual ACM international conference on Multimedia , MULTIMEDIA '04, pp.10-16, 2004.
DOI : 10.1145/1027527.1027666

Q. Ye and Q. Huang, A New Text Detection Algorithm in Images/Video Frames, the 5th Pacific Rim Conference on Advances in Multimedia Information Processing, pp.858-865, 2005.
DOI : 10.1007/978-3-540-30542-2_106

J. Yang, R. Yan, and A. G. Hauptmann, Multiple instance learning for labeling faces in broadcasting news video, Proceedings of the 13th annual ACM international conference on Multimedia , MULTIMEDIA '05, pp.31-40, 2005.
DOI : 10.1145/1101149.1101155

Y. Zhang, C. Xu, J. Cheng, and H. Lu, Naming faces in films using hypergraph matching, 2009 IEEE International Conference on Multimedia and Expo, pp.278-281, 2009.
DOI : 10.1109/ICME.2009.5202489

Y. Zhang, C. Xu, and H. Lu, Automatic character identification in feature-length films, IEEE international conference on Multimedia and Expo, ICME, pp.1469-1472, 2008.

Y. Zhang, C. Xu, H. Lu, and Y. Huang, Character Identification in Feature-Length Films Using Global Face-Name Matching, IEEE Transactions on Multimedia, vol.11, issue.7, pp.1276-1288, 2009.
DOI : 10.1109/TMM.2009.2030629

J. Poignant, L. Besacier, V. B. Le, S. Rosset, and G. Quénot, Unsupervised naming of speakers in broadcast TV : using written names, pronounced names or both, the 14th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953088

J. Poignant, L. Besacier, and G. Quénot, Nommage non-supervisé des personnes dans les émissions de télévision : une revue du potentiel de chaque modalité, CORIA 2013, long paper (oral), 2013.

J. Poignant, H. Bredin, L. Besacier, G. Quénot, and C. Barras, Towards a better integration of written names for unsupervised speakers identification in videos, First Workshop on Speech, Language and Audio in Multimedia, SLAM, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953089

H. Bredin and J. Poignant, Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast, the 14th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00953095

H. Bredin, J. Poignant, G. Fortier, M. Tapaswi, V. Le et al., QCOMPERE at REPERE 2013, First Workshop on Speech, Language and Audio in Multimedia, SLAM, 2013.

J. Poignant, H. Bredin, V. Le, L. Besacier, C. Barras et al., Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast, Interspeech 2012, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00767427

J. Poignant, F. Thollard, G. Quénot, and L. Besacier, From Text Detection in Videos to Person Identification, 2012 IEEE International Conference on Multimedia and Expo, 2012.
DOI : 10.1109/ICME.2012.119
URL : https://hal.archives-ouvertes.fr/hal-00767383

H. Bredin, J. Poignant, M. Tapaswi, G. Fortier, V. B. Le et al., Fusion of Speech, Faces and Text for Person Identification in TV Broadcast, ECCV 2012, Workshop on Information Fusion in Computer Vision for Concept Recognition, 2012.
DOI : 10.1007/978-3-642-33885-4_39
URL : https://hal.archives-ouvertes.fr/hal-00722884

J. Poignant, F. Thollard, G. Quénot, and L. Besacier, Text detection and recognition for person identification in videos, 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.245-248, 2011.
DOI : 10.1109/CBMI.2011.5972553

J. Poignant, Détection et reconnaissance de texte dans les documents vidéos, Et leurs apports à la reconnaissance de personnes, RJCRI -CORIA 2011 : 6è Rencontres Jeunes Chercheurs en Recherche d'Information -8è COnférence en Recherche d'Information et Applications, pp.409-414, 2011.