Complete architecture of the intentionality detector. The red rectangle highlights the three modalities used as input to the detector, p.26

Complete architecture of the intentionality detector. In red, the decision part of the detector detailed in this section, p.31

The blue and black curves represent, respectively, the output of our detector and the ground truth. The red and green curves describe, respectively, the probability of detecting or not detecting an intentionality, showing the evolution of the two HMM states over time, p.35

Complete architecture of the intentionality detector. The red frame marks the filtering part detailed in this chapter, p.40

Visualization of the system (SMACH). Each grayed area is a sub-state machine. The outputs are represented by the red ovals, p.101

M. A. Aizerman, E. A. Braverman, and L. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control, 1964.

J. Allen, D. Byron, M. Dzikovska, G. Ferguson, L. Galescu et al., An architecture for a generic dialogue shell, Natural Language Engineering, vol.6, issue.3&4, 2000.
DOI : 10.1017/S135132490000245X

C. Andrieu, M. Davy, and A. Doucet, Improved auxiliary particle filtering: applications to time-varying spectral analysis, Proceedings of the 11th IEEE Signal Processing Workshop on Statistical Signal Processing (Cat. No.01TH8563), 2001.
DOI : 10.1109/SSP.2001.955284

L. Bascetta, G. Ferretti, P. Rocco, H. Ardo, H. Bruyninckx et al., Towards safe human-robot interaction in robotic cells: An approach based on visual tracking and intention estimation, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011.
DOI : 10.1109/IROS.2011.6094642

H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-up robust features (SURF), Computer Vision and Image Understanding, 2008.

B. Bogert, M. Healy, and J. Tukey, The quefrency alanysis of time series for echoes : Cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking, Proc. Symp. on Time Series Analysis, pp.209-243, 1963.

R. Brochard, B. Burger, A. Herbulot, and F. Lerasle, Measuring gaze orientation for human-robot interaction, Int. Workshop during IEEE Int. Symp. on Robot and Human Interactive Communication (RO-MAN'09), 2009.

M. Buss, D. Carton, B. Gonsior, K. Kuehnlenz, C. Landsiedel et al., Towards proactive human-robot interaction in human environments, Cognitive Infocommunications (CogInfoCom) 2nd International Conference on, 2011.

B. Can and H. Artuner, A syllable-based Turkish speech recognition system by using time delay neural networks (TDNNs), 2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR), 2013.
DOI : 10.1109/SOCPAR.2013.7054130

C. Yang, R. Duraiswami, and L. Davis, Fast multiple object tracking via a hierarchical particle filter, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.95

S. Y. Chen, Kalman Filter for Robot Vision: A Survey, IEEE Transactions on Industrial Electronics, vol.59, issue.11, 2012.
DOI : 10.1109/TIE.2011.2162714

Y. Chen and G. Medioni, Object modeling by registration of multiple range images, Proceedings. 1991 IEEE International Conference on Robotics and Automation, 1991.
DOI : 10.1109/ROBOT.1991.132043

H. Chen-chien and D. Guo-tang, Multiple object tracking using particle swarm optimization, World Academy of Science, Engineering and Technology, 2012.

Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995.

C. Ching-han and Y. Miao-chun, PSO-Based Multiple People Tracking, Digital Information and Communication Technology and Its Applications, 2011.
DOI : 10.1016/0031-3203(95)00097-6

J. Choi and J. Chang, On using spectral gradient in conditional MAP criterion for robust voice activity detection, 2012 3rd IEEE International Conference on Network Infrastructure and Digital Content, 2012.
DOI : 10.1109/ICNIDC.2012.6418777

A. B. Clair, R. Mead, and M. J. Matarić, Monitoring and guiding user attention and intention in human-robot interaction, IEEE International Conference on Robotics and Automation Workshop on Interactive Communication for Autonomous Intelligent Robots, 2010.

P. Deléglise, Y. Estève, S. Meignier, and T. Merlin, The LIUM speech transcription system: a CMU Sphinx III-based system for French broadcast news, 2005.

L. Deng, J. Li, J. Huang, K. Yao, D. Yu et al., Recent advances in deep learning for speech research at Microsoft, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6639345

A. Doucet, S. Godsill, and C. Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering, Statistics and Computing, 2000.

A. Doucet and N. Gordon, Efficient particle filters for tracking manoeuvring targets in clutter, IEE Colloquium. Target Tracking: Algorithms and Applications, 1999.
DOI : 10.1049/ic:19990505

D. Dov, R. Talmon, and I. Cohen, Audio-visual voice activity detection using diffusion maps, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.

G. Fanelli, M. Dantone, J. Gall, A. Fossati, and L. Van Gool, Random Forests for Real Time 3D Face Analysis, International Journal of Computer Vision, vol.41, issue.5, 2013.
DOI : 10.1007/s11263-012-0549-0

G. Fanelli, J. Gall, and L. Van Gool, Real time head pose estimation with random regression forests, CVPR 2011, 2011.
DOI : 10.1109/CVPR.2011.5995458

S. Fleury, M. Herrb, and R. Chatila, GenoM: A tool for the specification and the implementation of operating modules in a distributed robot architecture, International Conference on Intelligent Robots and Systems, 1997.

C. Fook, M. Hariharan, S. Yaacob, and A. Adom, A review: Malay speech recognition and audio visual speech recognition, 2012 International Conference on Biomedical Engineering (ICoBE), 2012.
DOI : 10.1109/ICoBE.2012.6179063

S. Galliano, E. Geoffrois, G. Gravier, J. Bonastre, D. Mostefa et al., Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news, Proceedings of the 5th International Conference on Language Resources and Evaluation, 2006.

A. Giremus, A. Doucet, V. Calmettes, and J. Tourneret, A Rao-Blackwellized particle filter for INS/GPS integration, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.
DOI : 10.1109/ICASSP.2004.1326707

O. Gorur and A. Erkmen, Elastic networks in reshaping human intentions by proactive social robot moves, The 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014.
DOI : 10.1109/ROMAN.2014.6926385

J. Han, D. Kim, K. , and J. , Physical Learning Activities with a Teaching Assistant Robot in Elementary School Music Class, 2009 Fifth International Joint Conference on INC, IMS and IDC, 2009.
DOI : 10.1109/NCM.2009.407

L. Herranz, R. Xu, and S. Jiang, A probabilistic model for food image recognition in restaurants, 2015 IEEE International Conference on Multimedia and Expo (ICME), 2015.
DOI : 10.1109/ICME.2015.7177464

J. Huang, S. Xuhui, and H. Wechsler, Face pose discrimination using support vector machines (SVM), Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170), 1998.
DOI : 10.1109/ICPR.1998.711102

B. Huber, Foot position as indicator of spatial interest at public displays, CHI '13 Extended Abstracts on Human Factors in Computing Systems on, CHI EA '13, 2013.
DOI : 10.1145/2468356.2479495

E. Hudlicka and M. D. McNeese, Assessment of user affective and belief states for interface adaptation: Application to an Air Force pilot task, 2002.

D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black, M. Ravishankar et al., Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices, 2006 IEEE International Conference on Acoustics Speed and Signal Processing Proceedings, 2006.
DOI : 10.1109/ICASSP.2006.1659988

M. Isard and A. Blake, Condensation - conditional density propagation for visual tracking, International Journal of Computer Vision, 1998.

W. Johal, C. Adam, H. Fiorino, S. Pesty, C. Jost et al., Acceptability of a companion robot for children in daily life situations, 2014 5th IEEE Conference on Cognitive Infocommunications (CogInfoCom), 2014.
DOI : 10.1109/CogInfoCom.2014.7020474

URL : https://hal.archives-ouvertes.fr/hal-01117304

J. Kennedy and R. Eberhart, Particle swarm optimization, Proceedings of ICNN'95, International Conference on Neural Networks, 1995.
DOI : 10.1109/ICNN.1995.488968

Y. Kondo, K. Takemura, J. Takamatsu, and T. Ogasawara, Planning body gesture of android for multi-person human-robot interaction, 2012 IEEE International Conference on Robotics and Automation, 2012.
DOI : 10.1109/ICRA.2012.6224903

A. Kong, J. S. Liu, and W. H. Wong, Sequential Imputations and Bayesian Missing Data Problems, Journal of the American Statistical Association, vol.52, issue.425, 1994.
DOI : 10.1080/01621459.1987.10478458

G. M. Kruijff, H. Zender, P. Jensfelt, and H. I. Christensen, Clarification dialogues in human-augmented mapping, Proceeding of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction , HRI '06, 2006.
DOI : 10.1145/1121241.1121290

J. Kuan, T. Huang, and H. Huang, Human intention estimation method for a new compliant rehabilitation and assistive robot, SICE Annual Conference, 2010.

D. Kulić and E. A. Croft, Estimating intent for human-robot interaction, International Conference on Advanced Robotics, 2003.

S. Kumar, P. Rajasekar, T. Mandharasalam, and S. Vignesh, Handicapped assisting robot, 2013 International Conference on Current Trends in Engineering and Technology (ICCTET), 2013.
DOI : 10.1109/ICCTET.2013.6675917

S. Lemaignan, R. Ros, R. Alami, and M. Beetz, What are you talking about? Grounding dialogue in a perspective-aware robotic architecture, 2011 RO-MAN, 2011.
DOI : 10.1109/ROMAN.2011.6005249

URL : https://hal.archives-ouvertes.fr/hal-00664548

S. Lemaignan, R. Ros, L. Mösenlechner, R. Alami, and M. Beetz, ORO, a knowledge management module for cognitive architectures in robotics, Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010.

T. Li, S. Sun, and T. Sattar, Adapting sample size in particle filters through KLD-resampling, Electronics Letters, vol.49, issue.12, 2013.
DOI : 10.1049/el.2013.0233

Y. Li, Y. Li, T. Hu, and Z. Lv, An automatic semantic Web service composition method based on ontology, 2015 IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS), 2015.
DOI : 10.1109/ICIS.2015.7166656

H. Maganti, D. Gatica-Perez, and I. McCowan, Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array, IEEE Transactions on Audio, Speech, and Language Processing, 2007.
DOI : 10.1109/TASL.2007.906197

A. Mallet, S. Fleury, and H. Bruyninckx, A specification of generic robotics software components: future evolutions of GenoM in the OROCOS context, Intelligent Robots and Systems, 2002.

M. Martin, F. van de Camp, and R. Stiefelhagen, Real Time Head Model Creation and Head Pose Estimation on Consumer Depth Cameras, 2014 2nd International Conference on 3D Vision, 2014.
DOI : 10.1109/3DV.2014.54

S. Mei-ping and G. Guo-chang, Research on particle swarm optimization: a review, Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826), 2004.
DOI : 10.1109/ICMLC.2004.1382171

C. Mollaret, F. Lerasle, I. Ferrané, and J. Pinquier, A particle swarm optimization inspired tracker applied to visual tracking, 2014 IEEE International Conference on Image Processing (ICIP), 2014.
DOI : 10.1109/ICIP.2014.7025085

URL : https://hal.archives-ouvertes.fr/hal-01390848

C. Mollaret, A. Mekonnen, I. Ferrane, J. Pinquier, and F. Lerasle, Perceiving user's intention-for-interaction: A probabilistic multimodal data fusion scheme, 2015 IEEE International Conference on Multimedia and Expo (ICME), 2015.
DOI : 10.1109/ICME.2015.7177514

E. Murphy-Chutorian and M. Trivedi, Head pose estimation in computer vision: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.607-626, 2009.

Y. Onuma, N. Kamado, H. Saruwatari, and K. Shikano, Real-time semi-blind speech extraction with speaker direction tracking on Kinect, Signal Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012.

R. Ooko, R. Ishii, and Y. Nakano, Estimating a User's Conversational Engagement Based on Head Pose Information, Intelligent Virtual Agents, 2011.
DOI : 10.1007/978-3-642-23974-8_29

P. Padeleris, X. Zabulis, and A. Argyros, Head pose estimation on depth data based on Particle Swarm Optimization, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012.
DOI : 10.1109/CVPRW.2012.6239236

A. Pandey, M. Ali, M. Warnier, and R. Alami, Towards multi-state visuo-spatial reasoning based proactive human-robot interaction, 2011 15th International Conference on Advanced Robotics (ICAR), 2011.
DOI : 10.1109/ICAR.2011.6088642

T. Pellegrini, P. Guyot, B. Angles, C. Mollaret, and C. Mangou, Towards soundpainting gesture recognition, Audio Mostly, pp.2014-2017, 2014.

M. Pinheiro, E. Bicho, and W. Erlhagen, A dynamic neural field architecture for a proactive assistant robot, Biomedical Robotics and Biomechatronics (BioRob), 2010.

R. Poli, Analysis of the Publications on the Applications of Particle Swarm Optimisation, Journal of Artificial Evolution and Applications, vol.15, issue.4, 2008.
DOI : 10.1109/TPWRS.2005.846064

T. Qiao and S. Dai, Fast head pose estimation using depth data, 2013 6th International Congress on Image and Signal Processing (CISP), 2013.
DOI : 10.1109/CISP.2013.6745249

L. Rabiner and B. Juang, Fundamentals of Speech Recognition, 1993.

J. Rios-Martinez, A. Escobedo, A. Spalanzani, and C. Laugier, Intention driven human aware navigation for assisted mobility, Workshop on Assistance and Service robotics in a human environment at IROS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00757133

R. Rocha, V. Freire, A. , and M. , Voice segmentation system based on energy estimation, Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European, 2014.

A. Schmid, O. Weede, and H. Worn, Proactive Robot Task Selection Given a Human Intention Estimate, RO-MAN 2007, The 16th IEEE International Symposium on Robot and Human Interactive Communication, 2007.
DOI : 10.1109/ROMAN.2007.4415181

E. Seemann, K. Nickel, and R. Stiefelhagen, Head pose estimation using stereo vision for human-robot interaction, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings., 2004.
DOI : 10.1109/AFGR.2004.1301603

M. Seltzer and R. Stern, Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments, IEEE Transactions on Audio, Speech, and Language Processing, 2006.
DOI : 10.1109/TASL.2006.872614

F. Sha, C. Bae, G. Liu, X. Zhao, Y. Y. Chung et al., A categorized particle swarm optimization for object tracking, 2015 IEEE Congress on Evolutionary Computation (CEC), 2015.
DOI : 10.1109/CEC.2015.7257228

L. S. Simon and E. Vincent, Combining blockwise and multi-coefficient stepwise approaches in a general framework for online audio source separation, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01186948

D. Spiliotopoulos, I. Androutsopoulos, and C. D. Spyropoulos, Human-robot interaction based on spoken natural language dialogue, Proceedings of the European Workshop on Service and Humanoid Robots, 2001.

A. Tavakkoli, R. Kelley, C. King, M. Nicolescu, M. Nicolescu et al., A vision-based architecture for intent recognition, Advances in Visual Computing, 2007.

M. Trawicki, M. Johnson, A. Ji, and T. Osiejuk, Multichannel speech recognition using distributed microphone signal fusion strategies, 2012 International Conference on Audio, Language and Image Processing, 2012.
DOI : 10.1109/ICALIP.2012.6376789

R. Valenti, N. Sebe, and T. Gevers, Combining Head Pose and Eye Location Information for Gaze Estimation, IEEE Transactions on Image Processing, vol.21, issue.2, 2012.
DOI : 10.1109/TIP.2011.2162740

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001.
DOI : 10.1109/CVPR.2001.990517

E. Wan and R. V. Merwe, The unscented Kalman filter for nonlinear estimation, Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373), 2000.
DOI : 10.1109/ASSPCC.2000.882463

Y. Wan, T. Zhang, Z. Wang, J. , and J. , Robust speech recognition based on multi-band spectral subtraction, 2013 6th International Congress on Image and Signal Processing (CISP), 2013.
DOI : 10.1109/CISP.2013.6744019

Y. Wang, S. Huang, W. , and Y. , A voice activity detection algorithm with sub-band detection based on time-frequency characteristics of Mandarin, 2013 6th International Congress on Image and Signal Processing (CISP), 2013.
DOI : 10.1109/CISP.2013.6743871

X. Wei, X. Zhang, W. , and Y. , Research on a detection and recognition method of tactile-slip sensation used to control the Elderly-assistant & Walking-assistant Robot, 2012 IEEE International Conference on Automation Science and Engineering (CASE), 2012.
DOI : 10.1109/CoASE.2012.6386341

J. Wu and L. Etzkorn, Validation of an approach for finding good anchor nodes in ontologies in the semantic web, SoutheastCon 2015, 2015.
DOI : 10.1109/SECON.2015.7132942

Y. Xiao, Z. Zhang, A. Beck, J. Yuan, and D. Thalmann, Human-Robot Interaction by Understanding Upper Body Gestures, Presence: Teleoperators and Virtual Environments, vol.7, issue.1, 2014.
DOI : 10.1109/TMECH.2011.2181977

G. Xiong, J. Gong, T. Zhuang, T. Zhao, D. Liu et al., Development of Assistant Robot with Standing-up Devices for Paraplegic Patients and Elderly People, 2007 IEEE/ICME International Conference on Complex Medical Engineering, 2007.
DOI : 10.1109/ICCME.2007.4381693

K. Yamazaki, R. Ueda, S. Nozawa, M. Kojima, K. Okada et al., Home-Assistant Robot for an Aging Society, Proceedings of the IEEE, 2012.
DOI : 10.1109/JPROC.2012.2200563

S. Young, M. Gasic, B. Thomson, and J. Williams, POMDP-based statistical spoken dialogue systems: a review, Proceedings of the IEEE, 2013.

J. Zhang, Y. Feng, G. Ning, J. , and F. , Noise adaptive stream fusion based on feature component rejection for robust multi-stream speech recognition, 2015 Seventh International Conference on Advanced Computational Intelligence (ICACI), 2015.
DOI : 10.1109/ICACI.2015.7184714

X. Zhang, W. Hu, W. Li, W. Qu, and S. Maybank, Multi-object tracking via species based particle swarm optimization, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009.
DOI : 10.1109/ICCVW.2009.5457581

X. Zhang, W. Hu, S. Maybank, X. Li, and M. Zhu, Sequential particle swarm optimization for visual tracking, CVPR, 2008.

A robotic scenario comprising three main phases is defined: a monitoring phase, in which the robot stays far from the user and observes them from its position while waiting for an interaction request; a proximal interaction phase, in which the robot is close to the user and interacts with them; and the transition that lets the robot move from one phase to the other. The scenario is thus designed to produce an interactive robot that is proactive but non-intrusive. The non-intrusive character is embodied by the monitoring phase. Proactivity, in turn, is embodied by an intentionality detector that lets the robot understand, non-verbally, the user's wish to communicate with it. The scientific contributions of this thesis cover several aspects of the project: the robotic scenario, the intentionality detector, a particle swarm filtering technique, and a Bayesian technique for improving the word error rate using distance information.
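
The three-phase scenario (monitoring, transition, proximal interaction) can be read as a small state machine driven by the output of the intentionality detector. The sketch below is only an illustration under stated assumptions: the names `Phase` and `next_phase`, the inputs `intention_prob`, `robot_is_close` and `interaction_over`, and the threshold value `0.8` are all hypothetical and do not come from the thesis.

```python
from enum import Enum


class Phase(Enum):
    MONITORING = "monitoring"  # robot far from the user, observing
    TRANSITION = "transition"  # robot moving between the two other phases
    PROXIMAL = "proximal"      # robot close to the user, interacting


def next_phase(phase, intention_prob, robot_is_close, interaction_over,
               threshold=0.8):
    """Return the phase that follows `phase` after one time step.

    intention_prob   -- assumed detector output in [0, 1]
    robot_is_close   -- True once the robot has reached the user
    interaction_over -- True once the proximal interaction has ended
    threshold        -- illustrative decision threshold (assumption)
    """
    if phase is Phase.MONITORING:
        # Non-intrusive: stay away until the detector is confident enough.
        if intention_prob >= threshold:
            return Phase.TRANSITION
        return Phase.MONITORING
    if phase is Phase.TRANSITION:
        if interaction_over:
            # Moving away: monitoring resumes once the robot is far again.
            return Phase.TRANSITION if robot_is_close else Phase.MONITORING
        # Moving closer: interaction starts once the robot is near the user.
        return Phase.PROXIMAL if robot_is_close else Phase.TRANSITION
    # Phase.PROXIMAL: keep interacting until the interaction ends.
    return Phase.TRANSITION if interaction_over else Phase.PROXIMAL
```

For example, `next_phase(Phase.MONITORING, 0.9, False, False)` returns `Phase.TRANSITION`, while a detector output of `0.2` keeps the robot in `Phase.MONITORING`. Treating the transition as a phase in its own right mirrors the description of the scenario; a real system would likely replace the fixed threshold with the detector's own decision stage.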