Vision-Guided Robot Hearing, Special Issue on Robot Vision, under review, International Journal of Robotics Research, 2013. ,
Ravel: An annotated corpus for training robots with audiovisual abilities, International Conferences and Workshops, 2013. ,
Active-Speaker Detection and Localization with Microphones and Cameras Embedded into a Robotic Head, Proceedings of the International Conference on Humanoid Robotics, 2013. ,
The Geometry of Sound-Source Localization using Non-Coplanar Microphone Arrays, Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013. ,
Benchmarking methods for audio-visual recognition using tiny training sets, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013. ,
Online multimodal speaker detection for humanoid robots, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), 2012. ,
DOI : 10.1109/HUMANOIDS.2012.6651509
URL : https://hal.archives-ouvertes.fr/hal-00768764
Audiovisual robot command recognition, ACM/IEEE International Conference on Multimodal Interaction, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00768761
Sound-event recognition with a companion humanoid, IEEE International Conference on Humanoid Robotics, 2012. ,
Geometrically-constrained robust time delay estimation using non-coplanar microphone arrays, Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012. ,
Finding audio-visual events in informal social gatherings, Proceedings of the 13th International Conference on Multimodal Interaction, 2011. ,
Image compression with generalized lifting and partial knowledge of the signal pdf, IEEE International Conference on Image Processing, 2008. ,
Finding audio-visual events in informal social gatherings, Proceedings of the 13th international conference on multimodal interfaces, ICMI '11, 2011. ,
DOI : 10.1145/2070481.2070527
URL : https://hal.archives-ouvertes.fr/inria-00623489
Geometrically-constrained Robust Time Delay Estimation Using Non-coplanar Microphone Arrays, Proceedings of EUSIPCO, pp.1309-1313, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00768763
RAVEL: An Annotated Corpus for Training Robots with Audio Visual Abilities, Journal of Multimodal User Interfaces, 2012. ,
Vision-guided robot hearing, The International Journal of Robotics Research, vol.26, issue.10, 2013. ,
DOI : 10.1214/aos/1176344136
URL : https://hal.archives-ouvertes.fr/hal-00990766
The geometry of sound-source localization using non-coplanar microphone arrays, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013. ,
DOI : 10.1109/WASPAA.2013.6701849
URL : https://hal.archives-ouvertes.fr/hal-00848876
Benchmarking methods for audio-visual recognition using tiny training sets, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013. ,
DOI : 10.1109/ICASSP.2013.6638341
URL : https://hal.archives-ouvertes.fr/hal-00861645
The BANCA Database and Evaluation Protocol, Proceedings of the International Conference on Audio and Video-Based Biometric Person Authentication, pp.625-638, 2003. ,
DOI : 10.1007/3-540-44887-X_74
Energetic and Informational Masking Effects in an Audiovisual Speech Recognition System, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.3, pp.446-458, 2009. ,
Exact and Approximate Solutions of Source Localization Problems, IEEE Transactions on Signal Processing, vol.56, issue.5, pp.1770-1778, 2008. ,
DOI : 10.1109/TSP.2007.909342
Time Delay Estimation via Minimum Entropy, IEEE Signal Processing Letters, vol.14, issue.3, pp.157-160, 2007. ,
DOI : 10.1109/LSP.2006.884038
Hypothesis testing for evaluating a multimodal pattern recognition framework applied to speaker detection, Journal of NeuroEngineering and Rehabilitation, vol.5, issue.1, p.11, 2008. ,
DOI : 10.1186/1743-0003-5-11
Extraction of Audio Features Specific to Speech Production for Multimodal Speaker Detection, IEEE Transactions on Multimedia, vol.10, issue.1, pp.63-73, 2008. ,
Convex optimization, 2004. ,
A closed-form location estimator for use with room environment microphone arrays, IEEE Transactions on Speech and Audio Processing, vol.5, issue.1, pp.45-50, 1997. ,
A practical methodology for speech source localization with microphone arrays, Computer Speech & Language, vol.11, issue.2, pp.91-126, 1997. ,
DOI : 10.1006/csla.1996.0024
Annotating multimedia / multimodal resources with ELAN, Proceedings of the International Conference on Language Resources and Evaluation, pp.2065-2068, 2004. ,
Comparison between different sound source localization techniques, Hands-Free Speech Communication and Microphone Arrays, pp.69-72, 2008. ,
From error probability to information theoretic (multi-modal) signal processing, Signal Processing, vol.85, issue.5, pp.875-902, 2005. ,
DOI : 10.1016/j.sigpro.2004.11.027
The handbook of multisensory processes, 2004. ,
Acoustic Source Localization With Distributed Asynchronous Microphone Networks, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.2, pp.439-443, 2013. ,
A simple and efficient estimator for hyperbolic location, IEEE Transactions on Signal Processing, vol.42, issue.8, pp.1905-1915, 1994. ,
DOI : 10.1109/78.301830
Acoustic Source Localization and Beamforming: Theory and Practice, EURASIP Journal on Advances in Signal Processing, vol.2003, issue.4, pp.359-370, 2003. ,
DOI : 10.1155/S1110865703212038
Robust time delay estimation exploiting redundancy among multiple microphones, IEEE Transactions on Speech and Audio Processing, vol.11, issue.6, pp.549-557, 2003. ,
Time Delay Estimation in Room Acoustic Environments: An Overview, EURASIP Journal on Advances in Signal Processing, vol.11, issue.6, pp.1-20, 2006. ,
DOI : 10.1155/ASP/2006/26503
Some Experiments on the Recognition of Speech, with One and with Two Ears, The Journal of the Acoustical Society of America, vol.25, issue.5, pp.975-979, 1953. ,
DOI : 10.1121/1.1907229
Integrating pitch and localisation cues at a speech fragment level, INTERSPEECH, pp.2769-2772, 2007. ,
An audio-visual corpus for speech perception and automatic speech recognition, The Journal of the Acoustical Society of America, vol.120, issue.5, pp.384-401, 2007. ,
DOI : 10.1121/1.2229005
Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005. ,
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512
A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.1, issue.2, pp.224-227, 1979. ,
2D sound-source localization on the binaural manifold, 2012 IEEE International Workshop on Machine Learning for Signal Processing, pp.1-6, 2012. ,
DOI : 10.1109/MLSP.2012.6349784
The cocktail party robot, Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction, HRI '12, 2012. ,
DOI : 10.1145/2157689.2157834
Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, issue.1, p.1, 1977. ,
Robust Adaptive Time Delay Estimation for Speaker Localization in Noisy and Reverberant Acoustic Environments, EURASIP Journal on Advances in Signal Processing, vol.2003, issue.11, pp.1110-1124, 2003. ,
DOI : 10.1155/S111086570330602X
New Approaches to Multilateration processing: analysis and field evaluation, 2006 European Radar Conference, pp.116-119, 2006. ,
DOI : 10.1109/EURAD.2006.280287
Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.2, pp.601-616, 2007. ,
DOI : 10.1109/TASL.2006.881678
Automatic nonverbal analysis of social interaction in small groups: A review, Image and Vision Computing, vol.27, issue.12, pp.1775-1787, 2009. ,
DOI : 10.1016/j.imavis.2009.01.004
Is neocortex essentially multisensory?, Trends in Cognitive Sciences, vol.10, issue.6, pp.278-285, 2006. ,
DOI : 10.1016/j.tics.2006.04.008
Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2247-2253, 2007. ,
DOI : 10.1109/TPAMI.2007.70711
Cyclopean geometry of binocular vision, Journal of the Optical Society of America A, vol.25, issue.9, pp.2357-2369, 2008. ,
DOI : 10.1364/JOSAA.25.002357
URL : https://hal.archives-ouvertes.fr/inria-00435548
Conjugate Mixture Models for Clustering Multimodal Data, Neural Computation, vol.49, issue.3, pp.517-557, 2011. ,
DOI : 10.1007/978-94-011-3436-1
URL : https://hal.archives-ouvertes.fr/inria-00590267
Pixels that Sound, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.88-95, 2005. ,
DOI : 10.1109/CVPR.2005.274
Cross-Modal Localization via Sparsity, IEEE Transactions on Signal Processing, vol.55, issue.4, pp.1390-1404, 2007. ,
DOI : 10.1109/TSP.2006.888095
Human-Robot Interaction in Real Environments by Audio-Visual Integration, International Journal of Control, Automation and Systems, vol.5, issue.1, pp.61-69, 2007. ,
2D Binaural Sound Localization: for Urban Search and Rescue Robotics, Mobile Robotics, pp.423-435, 2009. ,
DOI : 10.1142/9789814291279_0053
Audio/video fusion for objects recognition, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009. ,
DOI : 10.1109/IROS.2009.5354442
Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008. ,
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659
AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking, Proceedings of the Workshop on Machine Learning and Multimodal Interaction, 2005. ,
DOI : 10.1007/978-3-540-30568-2_16
Prediction of energy decay in room impulse responses simulated with an image-source model, The Journal of the Acoustical Society of America, vol.124, issue.1, pp.269-277, 2008. ,
DOI : 10.1121/1.2936367
Matlab code for image-source model in room acoustics, 2011. ,
Time Delay Estimation Method Based on Canonical Correlation Analysis, Circuits, Systems and Signal Processing, 2013. ,
An Audio-Visual Fusion Framework with Joint Dimensionality Reduction, Proceedings of the IEEE International Conference on Audio Speech and Signal Processing, 2008. ,
Recognizing Realistic Actions from Videos "in the Wild", Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2009. ,
Audio and Video Feature Fusion for Activity Recognition in Unconstrained Videos, Intelligent Data Engineering and Automated Learning, 2006. ,
DOI : 10.1007/11875581_99
Multisensor integration and fusion in intelligent systems, IEEE Transactions on Systems, Man, and Cybernetics, vol.19, issue.5, pp.901-931, 1989. ,
DOI : 10.1109/21.44007
Object Category Detection Using Audio-Visual Cues, Proceedings of the 6th International Conference on Computer Vision Systems, 2008. ,
DOI : 10.1007/978-3-540-79547-6_52
An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments, Proc. NIPS, pp.953-960, 2007. ,
Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation, 2010. ,
DOI : 10.1007/978-3-642-17711-8_22
URL : https://hal.archives-ouvertes.fr/hal-01318429
Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009. ,
DOI : 10.1109/CVPR.2009.5206557
URL : https://hal.archives-ouvertes.fr/inria-00548645
XM2VTSDB: The Extended M2VTS Database, Proceedings of the International Conference on Audio and Video-based Biometric Person Authentication, pp.72-77, 1999. ,
Multimodal feature fusion for robust event detection in web videos, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012. ,
DOI : 10.1109/CVPR.2012.6247814
Text Classification from Labeled and Unlabeled Documents using EM, Machine Learning, pp.103-134, 2000. ,
Multimodal Speaker Diarization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.1, pp.79-93, 2012. ,
DOI : 10.1109/tpami.2011.47
CUAVE: A new audio-visual database for multimodal human-computer interface research, Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing, pp.2017-2020, 2002. ,
Real-Time Multiple Sound Source Localization and Counting Using a Circular Microphone Array, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2193-2206, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01367320
Acoustic Source Localization in a Room Environment and at Moderate Distances, 2009. ,
Theory and applications of digital speech processing, 2011. ,
Continuous audio analytics by HMM and Viterbi decoding The topography of multivariate normal mixtures, Proceedings of the IEEE International Conference on Audio, Speech and Signal Processing, pp.2396-2399, 2005. ,
Using Reverberation to Improve Range and Elevation Discrimination for Small Array Sound Source Localization, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1781-1792, 2010. ,
DOI : 10.1109/TASL.2010.2052250
The KIT Robo-kitchen data set for the evaluation of view-based activity recognition systems, 2011 11th IEEE-RAS International Conference on Humanoid Robots, 2011. ,
DOI : 10.1109/Humanoids.2011.6100854
Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers, Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction, 2008. ,
DOI : 10.1007/978-3-540-78155-4_4
Adaptive Time Delay Estimation Using Filter Length Constraints for Source Localization in Reverberant Acoustic Environments, IEEE Signal Processing Letters, vol.20, issue.5, pp.507-510, 2013. ,
DOI : 10.1109/LSP.2013.2253319
Action Recognition Robust to Background Clutter by Using Stereo Vision, 4th International Workshop on Video Event Categorization, Tagging and Retrieval (VECTaR), in conjunction with IEEE European Conference on Computer Vision, 2012. ,
DOI : 10.1007/978-3-642-33863-2_33
Spherical microphone array for spatial sound localization for a mobile robot, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.713-718, 2012. ,
DOI : 10.1109/IROS.2012.6385877
Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.32-36, 2004. ,
DOI : 10.1109/ICPR.2004.1334462
A survey of mathematical methods for indoor localization, 2009 IEEE International Symposium on Intelligent Signal Processing, pp.9-14, 2009. ,
DOI : 10.1109/WISP.2009.5286582
Crossmodal binding through neural coherence: implications for multisensory processing, Trends in Neurosciences, vol.31, issue.8, pp.401-409, 2008. ,
DOI : 10.1016/j.tins.2008.05.002
Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models, International Journal of Computer Vision, vol.6, issue.4-5, pp.22-32, 2011. ,
DOI : 10.1007/s11263-010-0384-0
Audio-Visual Fusion and Tracking With Multilevel Iterative Decoding: Framework and Experimental Evaluation, Journal of Selected Topics in Signal Processing, 2010. ,
Closed-form least-squares source location estimation from range-difference measurements, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.12 ,
DOI : 10.1109/TASSP.1987.1165089
Closed-form formulae for time-difference-of-arrival estimation, IEEE Transactions on Signal Processing, vol.56, issue.6, pp.2614-2620, 2008. ,
WaldBoost - Learning for Time Constrained Sequential Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005. ,
DOI : 10.1109/CVPR.2005.373
The SHOGUN Machine Learning Toolbox, Journal of Machine Learning Research, vol.99, pp.1799-1802, 2010. ,
Classification of time delay estimates for robust speaker localization, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), pp.3081-3084, 1999. ,
DOI : 10.1109/ICASSP.1999.757492
The TUM Kitchen Data Set of everyday manipulation activities for motion tracking and action recognition, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009. ,
DOI : 10.1109/ICCVW.2009.5457583
Novel closed-form ML position estimator for hyperbolic location, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004. ,
DOI : 10.1109/ICASSP.2004.1326216
Robust 3D Localization and Tracking of Sound Sources Using Beamforming and Particle Filtering, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006. ,
DOI : 10.1109/ICASSP.2006.1661100
Scene flow estimation by growing correspondence seeds, CVPR 2011, 2011. ,
DOI : 10.1109/CVPR.2011.5995442
Three-Dimensional Scene Flow, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.3, 2005. ,
DOI : 10.1109/iccv.1999.790293
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.3563
Social signal processing, Proceeding of the 16th ACM international conference on Multimedia, MM '08, pp.1743-1759, 2009. ,
DOI : 10.1145/1459359.1459573
On the Use of Spatial Cues to Improve Binaural Source Separation, proc. DAFx, pp.209-213, 2003. ,
Free viewpoint action recognition using motion history volumes, Computer Vision and Image Understanding, vol.104, issue.2-3, pp.249-257, 2006. ,
DOI : 10.1016/j.cviu.2006.07.013
URL : https://hal.archives-ouvertes.fr/inria-00544629
A middleware for collaborative research in experimental robotics, 2011 IEEE/SICE International Symposium on System Integration (SII), 2011. ,
DOI : 10.1109/SII.2011.6147617
Exemplar-based Action Recognition in Video, Proceedings of the British Machine Vision Conference, 2009. ,
Binaural Localization of Multiple Sources in Reverberant and Noisy Environments, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.5, pp.1503-1512, 2012. ,
DOI : 10.1109/TASL.2012.2183869
Realistic Human Action Recognition with Audio Context, 2010 International Conference on Digital Image Computing: Techniques and Applications, 2010. ,
DOI : 10.1109/DICTA.2010.57