Abstract: Numerous fields of applied sciences and industries have been witnessing a process of digitisation over the past few years. This trend has come with a steady increase in the amount of available digital data, whose processing has become a challenging task. For instance, it is nowadays common to take thousands of pictures, each containing several million pixels, which makes any subsequent image-processing or computer-vision task a computationally demanding exercise. In this context, parsimony--also known as sparsity--has emerged as a key concept in machine learning, statistics and signal processing. It is indeed appealing to represent, analyze, and exploit data through a reduced number of parameters, e.g., performing object recognition over high-resolution images based only on some relevant subsets of pixels. While general sparsity-inducing approaches have already been well studied--with elegant theoretical foundations, efficient algorithmic tools and successful applications--this thesis focuses on a particular and more recent form of sparsity, referred to as structured sparsity. As its name indicates, we shall consider situations where we are not only interested in sparsity, but where some structural prior knowledge is also available. Continuing the example of object recognition, we know that neighbouring pixels in images tend to share similar properties--e.g., the label of the object class to which they belong--so that sparsity-inducing approaches should take advantage of this spatial information. The goal of this thesis is to understand and analyze the concept of structured sparsity, based on statistical, algorithmic and applied considerations. To begin with, we introduce a family of structured sparsity-inducing norms whose properties are closely studied.
In particular, we show what type of structural prior knowledge they correspond to, and we present the statistical conditions under which these norms can consistently perform structured variable selection. We then turn to the study of sparse structured dictionary learning, where we use the aforementioned norms within the framework of matrix factorization. The resulting approach is flexible and versatile, and it is shown to learn representations whose structured sparsity patterns are adapted to the considered class of signals. From an optimization viewpoint, we derive several efficient and scalable algorithmic tools, such as working-set strategies and proximal-gradient techniques. With these methods in place, we illustrate, on numerous real-world applications from various fields, when and why structured sparsity is useful. This includes, for instance, restoration tasks in image processing, the modelling of text documents as hierarchies of topics, the inter-subject prediction of object sizes from fMRI signals, and background-subtraction problems in computer vision.