Abstract : Visual lifelog indexing by content has emerged as a high reward application. Enabled by the recent availability of miniaturized recording devices, the demand for automatic extraction of relevant information from wearable sensors generated content has grown. Among many other applications, indoor localization is one challenging problem to be addressed. Many standard solutions perform unreliably in indoors conditions or require significant intervention. In this thesis we address the problem of localization from the perspective of image-based approach using wearable video camera sensors. The key contribution of this work is the development and the study of a location estimation system composed of diverse modules, which perform tasks ranging from low-level visual information extraction to final topological location estimation with the aid of automatic indexing algorithms. Within this framework, important contributions have been made by efficiently leveraging information brought by multiple visual features, unlabeled image data and the temporal continuity of the video. Early and late data fusion were considered, and shown to take advantage of the complementarities of multiple visual features describing the images. Due to the difficulty in obtaining annotated data in our context, semi-supervised approaches were investigated, to use unlabeled data as additional source of information, both for non-linear data-adaptive dimensionality reduction, and for improving classification. Herein we have developed a time-aware co-training approach that combines late datafusion with the semi-supervised exploitation of both unlabeled data and time information. Finally, we have proposed to apply transformation invariant learning to adapt non-invariant descriptors to our localization framework. The methods have been tested on controlled publicly available data sets to evaluate the gain of each contribution. This work has also been applied to the IMMED project, dealing with activity recognition and monitoring of the daily living using a wearable camera. In this context, the developed framework has been used to estimate localization on the real world IMMED project video corpus, which showed the potential of the approaches in such challenging conditions.