X. Alameda-Pineda and R. P. Horaud, Vision-Guided Robot Hearing, International Journal of Robotics Research, Special Issue on Robot Vision, under review, 2013.

X. Alameda-Pineda, J. Sanchez-Riera, J. Wienke, V. Franc, J. Cech et al., RAVEL: An annotated corpus for training robots with audiovisual abilities, International Conferences and Workshops, 2013.

J. Cech, R. Mittal, A. Deleforge, J. Sanchez-Riera, X. Alameda-Pineda et al., Active-Speaker Detection and Localization with Microphones and Cameras Embedded into a Robotic Head, Proceedings of the International Conference on Humanoid Robots, 2013.

X. Alameda-Pineda, R. P. Horaud, and B. Mourrain, The Geometry of Sound-Source Localization using Non-Coplanar Microphone Arrays, Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.

X. Alameda-Pineda, J. Sanchez-Riera, and R. P. Horaud, Benchmarking methods for audio-visual recognition using tiny training sets, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013.

J. Sanchez-Riera, X. Alameda-Pineda, J. Wienke, A. Deleforge, S. Arias et al., Online multimodal speaker detection for humanoid robots, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), 2012.
DOI : 10.1109/HUMANOIDS.2012.6651509
URL : https://hal.archives-ouvertes.fr/hal-00768764

J. Sanchez-Riera, X. Alameda-Pineda, and R. P. Horaud, Audiovisual robot command recognition, ACM/IEEE International Conference on Multimodal Interaction, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768761

M. Janvier, X. Alameda-Pineda, L. Girin, and R. P. Horaud, Sound-event recognition with a companion humanoid, IEEE International Conference on Humanoid Robots, 2012.

X. Alameda-Pineda and R. P. Horaud, Geometrically-constrained robust time delay estimation using non-coplanar microphone arrays, Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012.

X. Alameda-Pineda, V. Khalidov, R. P. Horaud, and F. Forbes, Finding audio-visual events in informal social gatherings, Proceedings of the 13th International Conference on Multimodal Interaction, 2011.

J. C. Rolon, P. Salembier, and X. Alameda-Pineda, Image compression with generalized lifting and partial knowledge of the signal pdf, IEEE International Conference on Image Processing, 2008.

X. Alameda-Pineda, V. Khalidov, R. Horaud, and F. Forbes, Finding audio-visual events in informal social gatherings, Proceedings of the 13th International Conference on Multimodal Interfaces, ICMI '11, 2011.
DOI : 10.1145/2070481.2070527
URL : https://hal.archives-ouvertes.fr/inria-00623489

X. Alameda-Pineda and R. Horaud, Geometrically-Constrained Robust Time Delay Estimation Using Non-Coplanar Microphone Arrays, Proceedings of EUSIPCO, pp.1309-1313, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00768763

X. Alameda-Pineda, J. Sanchez-Riera, V. Franc, J. Wienke, J. Cech et al., RAVEL: An Annotated Corpus for Training Robots with Audiovisual Abilities, Journal on Multimodal User Interfaces, 2012.

X. Alameda-Pineda and R. Horaud, Vision-guided robot hearing, The International Journal of Robotics Research, vol.26, issue.10, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00990766

X. Alameda-Pineda, R. Horaud, and B. Mourrain, The geometry of sound-source localization using non-coplanar microphone arrays, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
DOI : 10.1109/WASPAA.2013.6701849
URL : https://hal.archives-ouvertes.fr/hal-00848876

X. Alameda-Pineda, J. Sanchez-Riera, and R. Horaud, Benchmarking methods for audio-visual recognition using tiny training sets, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6638341
URL : https://hal.archives-ouvertes.fr/hal-00861645

E. Bailly-Baillière, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler et al., The BANCA Database and Evaluation Protocol, Proceedings of the International Conference on Audio and Video-Based Biometric Person Authentication, pp.625-638, 2003.
DOI : 10.1007/3-540-44887-X_74

J. Barker and X. Shao, Energetic and Informational Masking Effects in an Audiovisual Speech Recognition System, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, issue.3, pp.446-458, 2009.

A. Beck, P. Stoica, and J. Li, Exact and Approximate Solutions of Source Localization Problems, IEEE Transactions on Signal Processing, vol.56, issue.5, pp.1770-1778, 2008.
DOI : 10.1109/TSP.2007.909342

J. Benesty, Y. Huang, and J. Chen, Time Delay Estimation via Minimum Entropy, IEEE Signal Processing Letters, vol.14, issue.3, pp.157-160, 2007.
DOI : 10.1109/LSP.2006.884038

P. Besson and M. Kunt, Hypothesis testing for evaluating a multimodal pattern recognition framework applied to speaker detection, Journal of NeuroEngineering and Rehabilitation, vol.5, issue.1, p.11, 2008.
DOI : 10.1186/1743-0003-5-11

P. Besson, V. Popovici, J.-M. Vesin, J.-P. Thiran et al., Extraction of Audio Features Specific to Speech Production for Multimodal Speaker Detection, IEEE Transactions on Multimedia, vol.10, issue.1, pp.63-73, 2008.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

M. Brandstein, J. Adcock, and H. Silverman, A closed-form location estimator for use with room environment microphone arrays, IEEE Transactions on Speech and Audio Processing, vol.5, issue.1, pp.45-50, 1997.

M. Brandstein and H. Silverman, A practical methodology for speech source localization with microphone arrays, Computer Speech & Language, vol.11, issue.2, pp.91-126, 1997.
DOI : 10.1006/csla.1996.0024

H. Brugman and A. Russel, Annotating multimedia/multimodal resources with ELAN, Proceedings of the International Conference on Language Resources and Evaluation, pp.2065-2068, 2004.

A. Brutti, M. Omologo, P. Svaizer, and F. Bruno, Comparison between different sound source localization techniques, Hands-Free Speech Communication and Microphone Arrays, pp.69-72, 2008.

T. Butz and J.-P. Thiran, From error probability to information theoretic (multi-modal) signal processing, Signal Processing, vol.85, issue.5, pp.875-902, 2005.
DOI : 10.1016/j.sigpro.2004.11.027

G. Calvert, C. Spence, and E. Stein, The Handbook of Multisensory Processes, 2004.

A. Canclini, E. Antonacci, A. Sarti, and S. Tubaro, Acoustic Source Localization With Distributed Asynchronous Microphone Networks, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.2, pp.439-443, 2013.

Y. T. Chan and K. C. Ho, A simple and efficient estimator for hyperbolic location, IEEE Transactions on Signal Processing, vol.42, issue.8, pp.1905-1915, 1994.
DOI : 10.1109/78.301830

J. Chen, K. Yao, and R. Hudson, Acoustic Source Localization and Beamforming: Theory and Practice, EURASIP Journal on Advances in Signal Processing, vol.2003, issue.4, pp.359-370, 2003.
DOI : 10.1155/S1110865703212038

J. Chen, J. Benesty, and Y. Huang, Robust time delay estimation exploiting redundancy among multiple microphones, IEEE Transactions on Speech and Audio Processing, vol.11, issue.6, pp.549-557, 2003.

J. Chen, J. Benesty, and Y. Huang, Time Delay Estimation in Room Acoustic Environments: An Overview, EURASIP Journal on Advances in Signal Processing, vol.11, issue.6, pp.1-20, 2006.
DOI : 10.1155/ASP/2006/26503

E. C. Cherry, Some Experiments on the Recognition of Speech, with One and with Two Ears, The Journal of the Acoustical Society of America, vol.25, issue.5, pp.975-979, 1953.
DOI : 10.1121/1.1907229

H. Christensen, N. Ma, S. Wrigley, and J. Barker, Integrating pitch and localisation cues at a speech fragment level, INTERSPEECH, pp.2769-2772, 2007.

M. Cooke, J. Barker, S. Cunningham, and X. Shao, An audio-visual corpus for speech perception and automatic speech recognition, The Journal of the Acoustical Society of America, vol.120, issue.5, pp.384-401, 2007.
DOI : 10.1121/1.2229005

N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.177
URL : https://hal.archives-ouvertes.fr/inria-00548512

D. Davies and D. Bouldin, A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.1, issue.2, pp.224-227, 1979.

A. Deleforge and R. Horaud, 2D sound-source localization on the binaural manifold, 2012 IEEE International Workshop on Machine Learning for Signal Processing, pp.1-6, 2012.
DOI : 10.1109/MLSP.2012.6349784

A. Deleforge and R. Horaud, The cocktail party robot, Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI '12, 2012.
DOI : 10.1145/2157689.2157834

A. Dempster, N. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, issue.1, pp.1-38, 1977.

S. Doclo and M. Moonen, Robust Adaptive Time Delay Estimation for Speaker Localization in Noisy and Reverberant Acoustic Environments, EURASIP Journal on Advances in Signal Processing, vol.2003, issue.11, pp.1110-1124, 2003.
DOI : 10.1155/S111086570330602X

G. Galati, M. Gasbarra, P. Magaro, P. Marco, L. Mene et al., New Approaches to Multilateration processing: analysis and field evaluation, 2006 European Radar Conference, pp.116-119, 2006.
DOI : 10.1109/EURAD.2006.280287

D. Gatica-Perez, G. Lathoud, J.-M. Odobez, and I. McCowan, Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings, IEEE Transactions on Audio, Speech, and Language Processing, vol.15, issue.2, pp.601-616, 2007.
DOI : 10.1109/TASL.2006.881678

D. Gatica-Perez, Automatic nonverbal analysis of social interaction in small groups: A review, Image and Vision Computing, vol.27, issue.12, pp.1775-1787, 2009.
DOI : 10.1016/j.imavis.2009.01.004

A. A. Ghazanfar and C. Schroeder, Is neocortex essentially multisensory?, Trends in Cognitive Sciences, vol.10, issue.6, pp.278-285, 2006.
DOI : 10.1016/j.tics.2006.04.008

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2247-2253, 2007.
DOI : 10.1109/TPAMI.2007.70711

M. Hansard and R. Horaud, Cyclopean geometry of binocular vision, Journal of the Optical Society of America A, vol.25, issue.9, pp.2357-2369, 2008.
DOI : 10.1364/JOSAA.25.002357
URL : https://hal.archives-ouvertes.fr/inria-00435548

V. Khalidov, F. Forbes, and R. Horaud, Conjugate Mixture Models for Clustering Multimodal Data, Neural Computation, vol.49, issue.3, pp.517-557, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00590267

E. Kidron, Y. Y. Schechner, and M. Elad, Pixels that Sound, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp.88-95, 2005.
DOI : 10.1109/CVPR.2005.274

E. Kidron, Y. Schechner, and M. Elad, Cross-Modal Localization via Sparsity, IEEE Transactions on Signal Processing, vol.55, issue.4, pp.1390-1404, 2007.
DOI : 10.1109/TSP.2006.888095

H. Kim, J.-S. Choi, and M. Kim, Human-Robot Interaction in Real Environments by Audio-Visual Integration, International Journal of Control, Automation and Systems, vol.5, issue.1, pp.61-69, 2007.

A. R. Kullaib, M. Al-Mualla, and D. Vernon, 2D Binaural Sound Localization: for Urban Search and Rescue Robotics, Mobile Robotics, pp.423-435, 2009.
DOI : 10.1142/9789814291279_0053

L. Lacheze, Y. Guo, R. Benosman, B. Gas, and C. Couverture, Audio/video fusion for objects recognition, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009.
DOI : 10.1109/IROS.2009.5354442

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

G. Lathoud, J. Odobez, and D. Gatica-Pérez, AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking, Proceedings of the Workshop on Machine Learning and Multimodal Interaction, 2005.
DOI : 10.1007/978-3-540-30568-2_16

E. A. Lehmann and A. M. Johansson, Prediction of energy decay in room impulse responses simulated with an image-source model, The Journal of the Acoustical Society of America, vol.124, issue.1, pp.269-277, 2008.
DOI : 10.1121/1.2936367

E. Lehmann, MATLAB code for image-source model in room acoustics, 2011.

Lim and Pang, Time Delay Estimation Method Based on Canonical Correlation Analysis, Circuits, Systems and Signal Processing, 2013.

M. Liu, Y. Fu, and T. Huang, An Audio-Visual Fusion Framework with Joint Dimensionality Reduction, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

J. Liu, J. Luo, and M. Shah, Recognizing Realistic Actions from Videos "in the Wild", Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2009.

J. Lopes and S. Singh, Audio and Video Feature Fusion for Activity Recognition in Unconstrained Videos, Intelligent Data Engineering and Automated Learning, 2006.
DOI : 10.1007/11875581_99

R. C. Luo and M. Kay, Multisensor integration and fusion in intelligent systems, IEEE Transactions on Systems, Man, and Cybernetics, vol.19, issue.5, pp.901-931, 1989.
DOI : 10.1109/21.44007

J. Luo, B. Caputo, A. Zweig, J. Bach, and J. Anemüller, Object Category Detection Using Audio-Visual Cues, Proceedings of the 6th International Conference on Computer Vision Systems, 2008.
DOI : 10.1007/978-3-540-79547-6_52

M. I. Mandel, D. P. Ellis, and T. Jebara, An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments, Proc. NIPS, pp.953-960, 2007.

S. Marcel, C. McCool, P. Matejka, T. Ahonen, and J. Cernocky, Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation, 2010.
DOI : 10.1007/978-3-642-17711-8_22
URL : https://hal.archives-ouvertes.fr/hal-01318429

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206557
URL : https://hal.archives-ouvertes.fr/inria-00548645

K. Messer, J. Matas, J. Kittler, and K. Jonsson, XM2VTSDB: The Extended M2VTS Database, Proceedings of the International Conference on Audio and Video-based Biometric Person Authentication, pp.72-77, 1999.

P. Natarajan, S. Wu, S. N. Vitaladevuni, X. Zhuang, S. Tsakalidis et al., Multimodal feature fusion for robust event detection in web videos, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247814

K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, Text Classification from Labeled and Unlabeled Documents using EM, Machine Learning, pp.103-134, 2000.

A. Noulas, G. Englebienne, and B. Krose, Multimodal Speaker Diarization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.1, pp.79-93, 2012.
DOI : 10.1109/tpami.2011.47

E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. Gowdy, CUAVE: A new audio-visual database for multimodal human-computer interface research, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2017-2020, 2002.

D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, Real-Time Multiple Sound Source Localization and Counting Using a Circular Microphone Array, IEEE Transactions on Audio, Speech, and Language Processing, vol.21, issue.10, pp.2193-2206, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01367320

P. Pertilä, Acoustic Source Localization in a Room Environment and at Moderate Distances, 2009.

L. R. Rabiner and R. Schafer, Theory and Applications of Digital Speech Processing, 2011.

V. Ramasubramanian, R. Karthik, S. Thiyagarajan, and S. Cherla, Continuous audio analytics by HMM and Viterbi decoding, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2396-2399, 2005.

F. Ribeiro, C. Zhang, D. Florêncio, and D. Ba, Using Reverberation to Improve Range and Elevation Discrimination for Small Array Sound Source Localization, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.7, pp.1781-1792, 2010.
DOI : 10.1109/TASL.2010.2052250

L. Rybok, S. Friedberger, U. D. Hanebeck, and R. Stiefelhagen, The KIT Robo-kitchen data set for the evaluation of view-based activity recognition systems, 2011 11th IEEE-RAS International Conference on Humanoid Robots, 2011.
DOI : 10.1109/Humanoids.2011.6100854

K. Saenko and T. Darrell, Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers, Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction, 2008.
DOI : 10.1007/978-3-540-78155-4_4

D. Salvati and S. Canazza, Adaptive Time Delay Estimation Using Filter Length Constraints for Source Localization in Reverberant Acoustic Environments, IEEE Signal Processing Letters, vol.20, issue.5, pp.507-510, 2013.
DOI : 10.1109/LSP.2013.2253319

J. Sanchez-Riera, X. Alameda-Pineda, J. Wienke, A. Deleforge, S. Arias et al., Online multimodal speaker detection for humanoid robots, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), 2012.
DOI : 10.1109/HUMANOIDS.2012.6651509

J. Sanchez-Riera, J. Cech, and R. Horaud, Action Recognition Robust to Background Clutter by Using Stereo Vision, 4th International Workshop on Video Event Categorization, Tagging and Retrieval (VECTaR), in conjunction with IEEE European Conference on Computer Vision, 2012.
DOI : 10.1007/978-3-642-33863-2_33

Y. Sasaki, M. Kabasawa, S. Thompson, S. Kagami, and K. Oro, Spherical microphone array for spatial sound localization for a mobile robot, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.713-718, 2012.
DOI : 10.1109/IROS.2012.6385877

C. Schüldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

F. Seco, A. Jiménez, C. Prieto, J. Roa, and K. Koutsou, A survey of mathematical methods for indoor localization, 2009 IEEE International Symposium on Intelligent Signal Processing, pp.9-14, 2009.
DOI : 10.1109/WISP.2009.5286582

D. Senkowski, T. R. Schneider, J. J. Foxe, and A. Engel, Crossmodal binding through neural coherence: implications for multisensory processing, Trends in Neurosciences, vol.31, issue.8, pp.401-409, 2008.
DOI : 10.1016/j.tins.2008.05.002

Q. Shi, L. Wang, L. Cheng, and A. Smola, Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models, International Journal of Computer Vision, vol.6, issue.4-5, pp.22-32, 2011.
DOI : 10.1007/s11263-010-0384-0

S. T. Shivappa, B. D. Rao, and M. Trivedi, Audio-Visual Fusion and Tracking With Multilevel Iterative Decoding: Framework and Experimental Evaluation, IEEE Journal of Selected Topics in Signal Processing, 2010.

J. Smith and J. Abel, Closed-form least-squares source location estimation from range-difference measurements, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.35, issue.12, 1987.
DOI : 10.1109/TASSP.1987.1165089

H. C. So, Y. T. Chan, and F. K. W. Chan, Closed-form formulae for time-difference-of-arrival estimation, IEEE Transactions on Signal Processing, vol.56, issue.6, pp.2614-2620, 2008.

J. Sochman and J. Matas, WaldBoost: Learning for Time Constrained Sequential Detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.
DOI : 10.1109/CVPR.2005.373

S. Sonnenburg, G. Rätsch, S. Henschel, C. Widmer, J. Behr et al., The SHOGUN Machine Learning Toolbox, Journal of Machine Learning Research, vol.99, pp.1799-1802, 2010.

N. Strobel and R. Rabenstein, Classification of time delay estimates for robust speaker localization, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), pp.3081-3084, 1999.
DOI : 10.1109/ICASSP.1999.757492

M. Tenorth, J. Bandouch, and M. Beetz, The TUM Kitchen Data Set of everyday manipulation activities for motion tracking and action recognition, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009.
DOI : 10.1109/ICCVW.2009.5457583

A. Urruela and J. Riba, Novel closed-form ML position estimator for hyperbolic location, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.
DOI : 10.1109/ICASSP.2004.1326216

J. Valin and F. Michaud, Robust 3D Localization and Tracking of Sound Sources Using Beamforming and Particle Filtering, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1661100

J. Cech, J. Sanchez-Riera, and R. Horaud, Scene flow estimation by growing correspondence seeds, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995442

S. Vedula, S. Baker, P. Rander, R. Collins, and T. Kanade, Three-Dimensional Scene Flow, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.3, 2005.
DOI : 10.1109/iccv.1999.790293
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.3563

A. Vinciarelli, M. Pantic, and H. Bourlard, Social signal processing, Proceedings of the 16th ACM International Conference on Multimedia, MM '08, pp.1743-1759, 2009.
DOI : 10.1145/1459359.1459573

H. Viste and G. Evangelista, On the Use of Spatial Cues to Improve Binaural Source Separation, Proc. DAFx, pp.209-213, 2003.

D. Weinland, R. Ronfard, and E. Boyer, Free viewpoint action recognition using motion history volumes, Computer Vision and Image Understanding, vol.104, issue.2-3, pp.249-257, 2006.
DOI : 10.1016/j.cviu.2006.07.013
URL : https://hal.archives-ouvertes.fr/inria-00544629

J. Wienke and S. Wrede, A middleware for collaborative research in experimental robotics, 2011 IEEE/SICE International Symposium on System Integration (SII), 2011.
DOI : 10.1109/SII.2011.6147617

G. Willems, J. H. Becker, and T. Tuytelaars, Exemplar-based Action Recognition in Video, Proceedings of the British Machine Vision Conference, 2009.

J. Woodruff and D. Wang, Binaural Localization of Multiple Sources in Reverberant and Noisy Environments, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.5, pp.1503-1512, 2012.
DOI : 10.1109/TASL.2012.2183869

Q. Wu, Z. Wang, F. Deng, and D. Feng, Realistic Human Action Recognition with Audio Context, 2010 International Conference on Digital Image Computing: Techniques and Applications, 2010.
DOI : 10.1109/DICTA.2010.57