Skip to Main content Skip to Navigation

Learning objects model and context for recognition and localisation

Abstract : This Thesis addresses the modeling, recognition, localization and use of context for objects manipulation by a robot. We start by presenting the modeling process and its components: the real system, the sensors' data, the properties to reproduce and the model. We show how, by specifying each of them, one can define a modeling process adapted to the problem at hand, namely object manipulation by a robot. This analysis leads us to the adoption of local textured descriptors for object modeling. Modeling with local textured descriptors is not a new concept, it is the subject of many Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM) works. Existing methods include bundler, roboearth modeler and 123DCatch. Still, no method has gained widespread adoption. By implementing a similar approach, we show that they are hard to use even for expert users and produce highly complex models. Such complex techniques are necessary to guaranty the robustness of the model to view point change. There are two ways to handle the problem: the multiple views paradigm and the robust features paradigm. The multiple views paradigm advocate in favor of using a large number of views of the object. The robust feature paradigm relies on robust features able to resist large view point changes. We present a set of experiments to provide an insight into the right balance between both. By varying the number of views and using different features we show that small and fast models can provide robustness to view point changes up to bounded blind spots which can be handled by robotic means. We propose four different methods to build simple models from images only, with as little a priori information as possible. The first one applies to planar or piecewise planar objects and relies on homographies for localization. The second approach is applicable to objects with simple geometry, such as cylinders or spheres, but requires many measures on the object. The third method requires the use of a calibrated 3D sensor but no additional information. The fourth technique doesn't need a priori information at all. We apply this last method to autonomous grocery objects modeling. From images automatically retrieved from a grocery store website, we build a model which allows recognition and localization for tracking. Even using light models, real situations ask for numerous object models to be stored and processed. This poses the problems of complexity, processing multiple models quickly, and ambiguity, distinguishing similar objects. We propose to solve both problems by using contextual information. Contextual information is any information helping the recognition which is not directly provided by sensors. We focus on two contextual cues: the place and the surrounding objects. Some objects are mainly found in some particular places. By knowing the current place, one can restrict the number of possible identities for a given object. We propose a method to autonomously explore a previously labeled environment and establish a correspondence between objects and places. Then this information can be used in a cascade combining simple visual descriptors and context. This experiment shows that, for some objects, recognition can be achieved with as few as two simple features and the location as context. The objects surrounding a given object can also be used as context. Objects like a keyboard, a mouse and a monitor are often close together. We use qualitative spatial descriptors to describe the position of objects with respect to their neighbors. Using a Markov Logic Network, we learn patterns in objects disposition. This information can then be used to recognize an object when surrounding objects are already identified. This Thesis stresses the good match between robotics, context and objects recognition.
Document type :
Complete list of metadata

Cited literature [226 references]  Display  Hide  Download
Contributor : Abes Star :  Contact
Submitted on : Thursday, August 29, 2019 - 4:03:06 PM
Last modification on : Saturday, August 31, 2019 - 1:17:32 AM


Version validated by the jury (STAR)


  • HAL Id : tel-02274280, version 1



Guido Manfredi. Learning objects model and context for recognition and localisation. Robotics [cs.RO]. Université Paul Sabatier - Toulouse III, 2015. English. ⟨NNT : 2015TOU30386⟩. ⟨tel-02274280⟩



Record views


Files downloads