Introduction of quality measures in audio-visual identity verification, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009. ,
DOI : 10.1109/ICASSP.2009.4959983
Introduction of indexing people problematic in TV-Content. Seminar on Information, Signal, Images et Vision : Indexation scalable et Cross Media, 2009. ,
Talking Faces indexing in TV-Content. International Workshop on Content-Based Multimedia Indexing (CBMI), 2010. ,
Lip activity detection for talking faces classification in TV-Content, International Conference in Machine Vision (ICMV). Hong Kong, 2010. ,
Technologies d'indexation pour la valorisation du patrimoine audiovisuel, 2011. ,
People indexing in TV-Content using lip-activity and unsupervised audio-visual identity verification. International Workshop on Content-Based Multimedia Indexing (CBMI), p.167, 2011. ,
The fusion of distributed microphone arrays for sound localization, EURASIP Journal on Applied Signal Processing, vol.4, pp.338-347, 2003. ,
An automatic face detection and recognition system for video indexing applications, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.3644-3647, 2002. ,
Automatic Face Recognition for Film Character Retrieval in Feature-Length Films, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.860-867, 2005. ,
DOI : 10.1109/CVPR.2005.81
Head pan angle estimation by a nonlinear regression on selected features, 2009 16th IEEE International Conference on Image Processing (ICIP), 2009. ,
DOI : 10.1109/ICIP.2009.5414310
The BANCA Database and Evaluation Protocol, 4th International Conference on Audio- and Video-Based Biometric Person Authentication, pp.625-638, 2003. ,
DOI : 10.1007/3-540-44887-X_74
Becars: a free software for speaker verification, Odyssey - The Speaker and Language Recognition Workshop, pp.145-148, 2004. ,
Foveated shot detection for video segmentation, IEEE Transactions on Circuits and Systems for Video Technology, pp.365-377, 2005. ,
DOI : 10.1109/TCSVT.2004.842603
Comparison of video shot boundary detection techniques, Journal of Electronic Imaging, vol.5, issue.2, p.122, 1996. ,
DOI : 10.1117/12.238675
Robust Facial Feature Tracking, Proceedings of the British Machine Vision Conference 2000, pp.232-241, 2000. ,
DOI : 10.5244/C.14.24
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.2333
Efficient audio stream segmentation via the combined T² statistic and Bayesian information criterion, IEEE Transactions on Speech and Audio Processing, vol.13, issue.4, pp.467-474, 2005. ,
DOI : 10.1109/TSA.2005.845790
Vérification de l'identité d'un visage parlant : apport de la mesure de synchronie audiovisuelle face aux tentatives délibérées d'imposture, 2007. ,
Improving acoustic speaker verification with visual Body-Language features, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1909-1912, 2009. ,
DOI : 10.1109/ICASSP.2009.4959982
URL : http://cims.nyu.edu/%7Ebregler/ICASSP09/bregler_icassp09.pdf
Video shot segmentation using singular value decomposition, International Conference on Multimedia and Expo, pp.301-304, 2003. ,
DOI : 10.1109/icme.2003.1221613
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.2309
Speaker, environment and channel change detection and clustering via the Bayesian information criterion, Proc. DARPA Broadcast News Transcription and Understanding Workshop, pp.127-132 ,
A novel method for detecting lips, eyes and faces in real time, Real-Time Imaging, vol.9, issue.4, pp.277-287, 2003. ,
DOI : 10.1016/j.rti.2003.08.003
Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.25, issue.5, pp.564-577, 2003. ,
DOI : 10.1109/TPAMI.2003.1195991
Active appearance models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.6, pp.681-685, 2001. ,
DOI : 10.1109/34.927467
Active shape models - their training and application, Computer Vision and Image Understanding, pp.38-59, 1995. ,
DOI : 10.1006/cviu.1995.1004
URL : https://www.escholar.manchester.ac.uk/api/datastream?publicationPid=uk-ac-man-scw:1d1862&datastreamId=POST-PEER-REVIEW-PUBLISHERS.PDF
Learning from ambiguously labeled images, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.919-926, 2009. ,
DOI : 10.1109/CVPR.2009.5206667
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.1111
Face-texture model based on SGLD and its application in face detection in a color scene, Pattern Recognition, vol.29, issue.6, pp.1007-1017, 1996. ,
DOI : 10.1016/0031-3203(95)00139-5
Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.886-893, 2005. ,
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), vol.39, pp.1-38, 1977. ,
A connexionist approach for robust and precise facial feature detection in complex scenes, Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (ISPA 2005), 2005. ,
DOI : 10.1109/ISPA.2005.195430
Audio-visual speech modeling for continuous speech recognition, IEEE Transactions on Multimedia, vol.2, issue.3, 2000. ,
DOI : 10.1109/6046.865479
"Hello! My name is... Buffy": automatic naming of characters in TV video, The British Machine Vision Conference (BMVC), pp.1-10 ,
A generative framework for real time object detection and classification, Computer Vision and Image Understanding, vol.98, issue.1, pp.182-210, 2005. ,
DOI : 10.1016/j.cviu.2004.07.014
A fast and accurate face detector based on neural networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.1, pp.42-53, 2001. ,
DOI : 10.1109/34.899945
Progressive search space reduction for human pose estimation, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008. ,
DOI : 10.1109/CVPR.2008.4587468
Discriminative multimodal biometric authentication based on quality measures, Pattern Recognition, vol.38, issue.5, pp.777-779, 2005. ,
DOI : 10.1016/j.patcog.2004.11.012
The ESTER 2 evaluation campaign for the rich transcription of French radio broadcast, 10th Annual Conference of the International Speech Communication Association (Interspeech), 2009. ,
A neural architecture for fast and robust face detection. Object recognition supported by user interaction for service robots, pp.44-47, 2002. ,
The NIST meeting room pilot corpus, 4th Conference on Language Resources and Evaluation (LREC), 2004. ,
Combining methods to improve speaker verification decision, Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), pp.1756-1759, 1996. ,
DOI : 10.1109/ICSLP.1996.607968
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.4619
Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news, 5th International Conference on Language Resources and Evaluation (LREC), 2006. ,
Hierarchical clustering of a mixture model, Advances in Neural Information Processing Systems, pp.505-512, 2004. ,
An introduction to multisensor data fusion, Proceedings of the IEEE, vol.85, issue.1, pp.6-23, 1997. ,
DOI : 10.1109/5.554205
A hybrid ANN/HMM audio-visual speech recognition system, International Conference on Auditory-Visual Speech Processing Proceedings (AVSP), 2001. ,
Exploiting multimodal data fusion in robust speech recognition, 2010 IEEE International Conference on Multimedia and Expo, 2010. ,
DOI : 10.1109/ICME.2010.5583086
URL : https://hal.archives-ouvertes.fr/hal-00508288
Indexation de la vidéo par le costume, 2005. ,
Costume : A new feature for automatic video content indexing, Coupling approaches, coupling media and coupling languages for information retrieval, pp.314-325, 2004. ,
Hierarchical clustering schemes, Psychometrika, vol.32, issue.3, pp.241-254, 1967. ,
DOI : 10.1007/BF02289588
Unsupervised Video Indexing based on Audiovisual Characterization of Persons, 2010. ,
URL : https://hal.archives-ouvertes.fr/tel-00515424
Face-and-clothing based people clustering in video content, International Conference on Multimedia Information Retrieval, pp.295-304, 2010. ,
Multimodal speaker diarization using oriented optical flow histograms, International Conference of the International Speech Communication Association (Interspeech), pp.290-293, 2010. ,
Error handling in multimodal biometric systems using reliability measures, 13th European Signal Processing (EUSIPCO), pp.4-8, 2005. ,
Bimodal speaker identification using dynamic Bayesian network, Advances in Biometric Person Authentication, pp.1-24, 2005. ,
Multi-modal biometric verification based on far-score normalization, International Journal of Computer Science and Network Security (IJCSNS), vol.8, pp.250-254, 2008. ,
Face detection based on template matching and neural network verification, International Conference on Image, 2000. ,
Active Appearance Models Revisited, International Journal of Computer Vision, vol.60, issue.2, pp.135-164, 2004. ,
DOI : 10.1023/B:VISI.0000029666.37597.d3
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.8544
Modelling facial colour and identity with Gaussian mixtures, Pattern Recognition, vol.31, issue.12, pp.1883-1892, 1998. ,
DOI : 10.1016/S0031-3203(98)00066-1
E-HMM approach for learning and adapting sound models for speaker indexing, A Speaker Odyssey - The Speaker Recognition Workshop, pp.175-180, 2001. ,
URL : https://hal.archives-ouvertes.fr/hal-01434656
XM2VTSDB : The Extended M2VTS Database, Second International Conference on Audio and Video-based Biometric Person Authentication, pp.72-77, 1999. ,
Locating Facial Features with an Extended Active Shape Model, 10th European Conference on Computer Vision : Part IV, pp.504-513, 2008. ,
DOI : 10.1007/978-3-540-88693-8_37
Learning Multimodal Dictionaries, IEEE Transactions on Image Processing, vol.16, issue.9, pp.538-545, 2006. ,
DOI : 10.1109/TIP.2007.901813
URL : https://hal.archives-ouvertes.fr/inria-00544772
A Bayesian approach to audio-visual speaker identification, 4th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA), pp.761-769, 2003. ,
Shape and the Stereo Correspondence Problem, International Journal of Computer Vision, vol.1, issue.3, pp.147-162, 2005. ,
DOI : 10.1007/s11263-005-3672-3
Guide to biometric reference systems and performance evaluation, 2009. ,
DOI : 10.1007/978-1-84800-292-0
Overview of the multiple biometrics grand challenge, Third International Conference on Advances in Biometrics (ICB), pp.705-714, 2009. ,
Improving fusion with margin-derived confidence in biometric authentication tasks, Audio- and Video-Based Biometric Person Authentication (AVBPA), pp.474-483, 2005. ,
Quality controlled multimodal fusion of biometric experts, 12th Iberoamerican Congress on Pattern Recognition (CIARP). Viña del Mar, pp.881-890, 2007. ,
An approach to speaker identification using multiple classifiers, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.1135-1138, 1997. ,
DOI : 10.1109/ICASSP.1997.596142
Approaches and Applications of Audio Diarization, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., pp.953-956, 2005. ,
DOI : 10.1109/ICASSP.2005.1416463
Confidence and reliability measures in speaker verification, International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.574-595, 2006. ,
Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models, Pattern Analysis and Applications, vol.12, pp.271-284, 2008. ,
Visual speech recognition with loosely synchronized feature streams, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp.1424-1431, 2005. ,
DOI : 10.1109/ICCV.2005.251
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.1928
Identity verification using speech and face information, Digital Signal Processing, vol.14, issue.5, pp.449-480, 2004. ,
DOI : 10.1016/j.dsp.2004.05.001
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.1287
Scene change detection and content-based sampling of video sequences. Digital video compression : algorithms and technologies, pp.2-13, 1995. ,
Automatic segmentation, classification and clustering of broadcast news audio. DARPA Speech Recognition Workshop, pp.97-99, 1997. ,
Computer lipreading for improved accuracy in automatic speech recognition, IEEE Transactions on Speech and Audio Processing, vol.4, issue.5, pp.337-351, 1996. ,
DOI : 10.1109/89.536928
Video shot boundary detection: Seven years of TRECVid activity, Computer Vision and Image Understanding, vol.114, issue.4, pp.411-418, 2010. ,
DOI : 10.1016/j.cviu.2009.03.011
A real-time face tracker for color video, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), pp.1493-1496, 2001. ,
DOI : 10.1109/ICASSP.2001.941214
Speaker clustering using direct maximisation of the MLLR-adapted likelihood, 5th International Conference on Spoken Language Processing (ICSLP), pp.1775-1779, 1998. ,
Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol.3, issue.1, pp.71-86, 1991. ,
DOI : 10.1162/jocn.1991.3.1.71
Exploring cooccurence between speech and body movement for audio-guided video localization, IEEE Transactions on Circuits and Systems for Video Technology, pp.1608-1617, 2008. ,
Robust visual features for the multimodal identification of unregistered speakers in TV talk-shows, 2010 IEEE International Conference on Image Processing, 2010. ,
DOI : 10.1109/ICIP.2010.5653393
Multi-modal identity verification using expert fusion, Information Fusion, vol.1, issue.1, pp.17-33, 2000. ,
DOI : 10.1016/S1566-2535(00)00002-6
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.7260
Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, pp.511-518, 2001. ,
DOI : 10.1109/CVPR.2001.990517
Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963. ,
DOI : 10.1007/BF02289263
Automatic partitioning of full-motion video, Multimedia Systems, vol.1, issue.1, pp.10-28, 1993. ,
DOI : 10.1007/BF01210504
Combining speaker identification and BIC for speaker diarization, International Speech Communication Association (Interspeech), 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-01434281
Mosaic-based 3D scene representation and rendering, 11th International Conference on Image Processing, pp.739-754, 2005. ,
DOI : 10.1016/j.image.2006.08.002
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.3470
Improvements to the Internet have made a large amount of television content available to users. To make these videos easier to navigate, it is useful to develop technologies that index people automatically. Current solutions build the audio index ...
... the visual modality, and their association (interactive dialogue, variations in face pose, asynchrony between speech and appearance, etc.). Approaches based on fusing the audio and visual indexes combine the indexing errors of each modality. The work presented in this report exploits the complementarity between audio and visual information to compensate for the weaknesses of each modality. Thus, one modality can support the indexing of a person when the ...
... we developed a new lip-motion detection method based on measuring the degree of disorder in the direction of the pixels around the lip region. The evaluation, carried out on the corpus of TV talk shows, shows a significant improvement in talking-face detection compared with the state of the art in this context. In particular, our method proves more robust to global motion of the face. Finally, we proposed two correction schemes. The first systematically modifies the modality considered a priori to be the least reliable. The second compares unsupervised identity verification scores to determine which modality failed and to correct it.