Abstract : The objective of this thesis is to extract the informative temporal segments from video sequences, more particularly in aerial video. Manual interpretation of such videos for information gathering faces an ever growing volume of available data. We have thus considered an algorithmic assistance based on different modalities of indexation in order to locate "segments of interest" and avoid a complete visualization of the video. We have chosen two methods in particular and have respectively developed them in each part of this thesis. Part 1 describes how viewing conditions can be used as a method of indexation. The assessment of image quality enables to filter out the temporal segments for which the quality is low and which can thus not be exploited. The classification of global image motion, which is directly linked to camera motion, leads to a method of indexation for video sequences. Indeed, it emphasizes possible segments of interest or, conversely, difficult segments for which motion is very fast or oscillating. Part 2 focuses on the dynamic content of video sequences, especially the presence of moving objects. We first present a local (in time) approach. This approach refines the results obtained after a first classification by supervised learning by using contextual information, spatial then semantic information. We have then investigated several methods for moving object detection which are global in time. Such approaches aim to enforce the temporal consistency of the detected objects and to reduce false detections.