Cognitive processes involved in the visual perception of complex scenes and their consequences for automatic image indexing

Abstract: Human scene understanding is remarkable: a brief glance at an image makes an abundance of information available, including image content and meaning, spatial layout, and semantic label (Potter, 1975; Schyns & Oliva, 1994; Thorpe, Fize, & Marlot, 1996). Several hypotheses have been advanced to explain how scenes are recognized so quickly. First, it could be that a diagnostic object is rapidly identified and that the scene gist is inferred from this object (Friedman, 1979) or from a few objects and their spatial relationships (De Graef, Christiaens, & d'Ydewalle, 1990). Alternatively, contrary to the traditional view in scene-understanding research that treats objects as the atoms of recognition, real-world scenes can be recognized without necessarily identifying the objects they contain (Greene & Oliva, 2006; Schyns & Oliva, 1994; Oliva & Schyns, 2000). On this view, some scene-level features directly suggest identity and gist without requiring identification of any specific objects or of the spatial relationships among them. Past suggestions for these features include large volumetric shapes or other similar large-scale image features (Biederman, 1995). Studies of eye movements in scene recognition have shown that two kinds of information can be coded and stored during the early stages of low-level cognitive processing of complex scenes. The first is contour density and local contrast (Mannan, Ruddock & Wooding, 1996, 1997; Reinagel & Zador, 1999); the second is global layout information (Sanocki & Epstein, 1997; Castelhano & Henderson, 2003; Oliva & Torralba, 2003). These two types of information are manipulated to transform the image into a "structural luminance image". The purpose of this work was to investigate how humans use the "structural luminance image" to process information in a real-world scene.
Findings from extensive experiments demonstrate that subjects are able to identify natural scenes based on large structural regions of different luminance.
