Habilitation à diriger des recherches (accreditation to supervise research)

Weakly Supervised and On-line Machine Learning for Object Tracking and Recognition in Images and Videos

Stefan Duffner 1
1 imagine - Extraction de Caractéristiques et Identification
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : This manuscript summarises the work that I have been involved in during my post-doctoral research and in the context of my PhD supervision activities over the past 11 years. I conducted this work partly as a post-doctoral researcher at the Idiap Research Institute in Switzerland, and partly as an associate professor at the LIRIS laboratory and INSA Lyon in France. The technical section of the manuscript comprises two main parts: the first on on-line learning approaches for visual object tracking in dynamic environments, and the second on similarity metric learning algorithms and Siamese Neural Networks (SNNs). I first present our work on on-line multiple face tracking in a dynamic indoor environment, where we focused on track creation and removal for long-term tracking. The automatic detection of the faces to track is challenging in this setting because faces may go undetected for long periods of time and false detections may occur frequently. Our proposed algorithm consists of a recursive Bayesian framework with a separate track creation and removal step based on Hidden Markov Models, including observation likelihood functions that are learnt off-line on a set of static and dynamic features related to the tracking behaviour and the objects’ appearance. This approach is very efficient and showed superior performance to the state of the art in on-line multiple object tracking. In the same context, we further developed a new on-line algorithm to estimate the Visual Focus of Attention from videos of persons sitting in a room. This unsupervised on-line learning approach is based on an incremental k-means algorithm and is able to automatically extract, from a video stream, the targets that the persons in the room are looking at. I further present our research on on-line learnt robust appearance models for single-object tracking.
In particular, we focused on the problem of model-free, on-line tracking of arbitrary objects, where the state and model of the object to track are initialised in the first frame and updated throughout the rest of the video. Our first approach, called PixelTrack, consists of a combined detection and segmentation framework that robustly learns the appearance of the object to track and avoids drift through an effective on-line co-training algorithm. This method showed excellent tracking performance on public benchmarks, both in terms of robustness and speed, and is particularly suitable for tracking deformable objects. The second tracking approach, called MCT, employs an on-line learnt discriminative classifier that stochastically samples the training instances from a dynamic probability density function computed from moving and possibly distracting image background regions. The use of this motion context proved very effective and led to a significant gain in overall tracking robustness and performance. We extended this idea by designing a set of features that concisely describe the visual context of the overall scene shown in a video at a given point in time. Then, we applied several complementary tracking algorithms to a set of training videos and computed the corresponding context features for each frame. Finally, we trained a discriminative classifier off-line that estimates the most suitable tracker for a given context, and applied it on-line in an effective tracker-selection framework. Evaluated on several different “pools” of individual trackers, the combined model led to increased accuracy and robustness on challenging public benchmarks. In the second part of the manuscript, I present several contributions related to SNNs for similarity metric learning.
First, we proposed a new objective function and training algorithm called Triangular Similarity Metric Learning that improved the convergence behaviour and achieved state-of-the-art results on pairwise verification tasks, such as face, speaker or kinship verification. Then, I present our work on SNNs for gesture classification from inertial sensor data, where we proposed a new class-balanced learning strategy operating on tuples of training samples and an objective function based on a polar sine formulation. Finally, I present several contributions on SNNs with deeper and more complex Convolutional Neural Network models applied to the problem of person re-identification in images. In this context, we proposed different neural architectures and triplet learning methods that include semantic prior knowledge, e.g. on pedestrian attributes, body orientation and surrounding group context, using a combination of supervised and weakly supervised algorithms. In addition, a new learning-to-rank algorithm for SNNs, called Rank-Triplet, was introduced and successfully applied to person re-identification. This recent work achieved state-of-the-art re-identification results on challenging pedestrian image datasets and opened new perspectives for future similarity metric learning approaches.

Cited literature [456 references]
Contributor : Stefan Duffner
Submitted on : Monday, June 17, 2019 - 10:10:38 AM
Last modification on : Wednesday, July 8, 2020 - 12:43:36 PM




  • HAL Id : tel-02157568, version 1


Stefan Duffner. Weakly Supervised and On-line Machine Learning for Object Tracking and Recognition in Images and Videos. Computer Vision and Pattern Recognition [cs.CV]. Université Lyon 1 - Claude Bernard; INSA Lyon, 2019. ⟨tel-02157568⟩


