Context-aware person recognition in TV programs

Thomas Petit

Abstract

The automatic recognition and retrieval of faces can be a useful tool for exploiting and promoting large datasets, such as the archival collection of TV shows stored by INA. Although face recognition solutions have improved dramatically in the last decade, they remain prone to mistakes, especially when the number of faces and the number of distinct identities are large. TV shows are, however, quite standardised: most of the time anyone can tell at a glance what a show is about, be it a sports show, an entertainment show or a newscast. Though implicit, this standardisation applies in numerous ways, from the visual appearance of the show to its broadcast time. Moreover, contextual information is known to play a major role in helping the human brain recognize people; in fact, we seldom recognize people based on their facial appearance alone. This also applies to TV shows, where contextual information can help us judge who is likely or unlikely to appear in a given show. The goal of this thesis is to identify and exploit the contextual modalities that are available and potentially useful for identifying the people appearing in TV shows. For each of these modalities, we extract the information as a feature descriptor that can be combined with the facial feature descriptor, either to retrieve other instances of the same person or to identify that person. In particular, we focus on how the social relationships of the people appearing in the shows make them more likely to appear with some people than with others. We introduce an unsupervised method for simultaneously identifying the participants of a TV show by estimating their probability of appearing together, based on previous unannotated observations. We also study the visual context of the shows and highlight how the background and other visual cues can help to identify difficult faces. Finally, we explore how useful contextual modalities such as the time of broadcast or the thematic tags assigned to each show can be, by evaluating the improvement they bring to the face recognition task and how redundant they are with the other modalities.
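
The abstract does not say how a contextual descriptor is combined with the facial descriptor. As a minimal sketch of one possible scheme, assuming a weighted concatenation of L2-normalized vectors followed by cosine-similarity ranking, the idea might look like the following. The function names, the `context_weight` parameter and the toy vectors are all hypothetical and are not taken from the thesis.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(face_desc, context_desc, context_weight=0.5):
    """Hypothetical fusion: concatenate the normalized face descriptor with a
    down-weighted, normalized context descriptor, then renormalize."""
    face = l2_normalize(face_desc)
    ctx = [context_weight * x for x in l2_normalize(context_desc)]
    return l2_normalize(face + ctx)


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank(query, gallery):
    """Return gallery keys sorted by decreasing similarity to the query."""
    return sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)


# Toy data (assumed): both gallery faces are equally close to the query face,
# but only "a" shares the query's context (say, the same kind of show).
query = fuse([1.0, 0.0], [1.0, 0.0])
gallery = {
    "b": fuse([0.8, -0.6], [0.0, 1.0]),
    "a": fuse([0.8, 0.6], [1.0, 0.0]),
}
ranking = rank(query, gallery)
```

In this toy case the face descriptors alone cannot separate the two candidates, while the fused descriptors rank the candidate that shares the query's context first. The weight controls how much the context is trusted relative to the face.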

The automatic identification and similarity-based retrieval of faces can prove to be a useful tool for mining large databases such as the television archives of INA. Although face recognition tools have progressed greatly in recent years, they are not free of errors, notably when the quantity of faces and the number of personalities to recognize become too large. On the other hand, television programs are generally highly codified, so that anyone can tell within a few seconds whether a show is a sports, entertainment or news program. This codification, though implicit, can extend from the visual appearance of the set to the choice of time slot. Moreover, we now also know that context, in the broad sense, plays an important role in how the brain recognizes individuals, and that faces are in reality very rarely recognized from their appearance alone. This naturally also applies to television programs, where such information lets us predict who is likely or unlikely to take part in a given show. The objective of this thesis is thus to exploit all the contextual information that is available and potentially useful for identifying the personalities appearing in television programs. For each of these modalities, we extract the information which, combined with the facial descriptors of the subjects to be recognized, improves the retrieval of new instances or the classification of faces. We are particularly interested in the social relationships between participants, which make some of them more likely than others to appear together on television. We thus propose an unsupervised method for simultaneously identifying all the participants of a television program by estimating their probability of appearing jointly. In a second part, we turn to the information contained in the visual context of television programs and show that the backgrounds visible on screen can help to successfully identify ambiguous faces. We also explore contextual modalities such as broadcast times or the thematic categorisations of programs, for which we evaluate the useful information they contribute to recognizing participants as well as their redundancy with the other modalities studied.

Reconnaissance des personnes grâce au contexte dans les programmes télévisés
