Abstract: In this thesis, we study various properties of learning from examples using tools from Statistical Mechanics, in particular the replica trick. We consider supervised tasks, corresponding to binary classification of data, as well as unsupervised tasks such as the parametric estimation of a probability density function. In the first part, a variational approach allows us to determine the optimal learning performance for the problem of learning an anisotropy direction, and to deduce a cost function that achieves this optimal performance. In the case of the supervised learning of a linearly separable task, numerical simulations confirm our theoretical results and allow us to determine finite-size effects. In the case of a probability density function composed of a mixture of two Gaussians, the optimal learning performance exhibits several phase transitions as a function of the size of the data set. These results raise a controversy between the variational theory and the Bayesian approach to optimal learning. In the second part, we study two different approaches to learning complex classification tasks. The first is that of support vector machines; we study a family of such machines for which linear and quadratic separations are particular cases, and determine their capacity, the typical value of the margin, and the number of support vectors. The second is that of a parity machine trained with an incremental learning algorithm, which progressively constructs a neural network with one hidden layer. The capacity of this algorithm is found to be close to that of the parity machine.
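The unsupervised task mentioned above, parametric estimation of a density composed of a mixture of two Gaussians, can be illustrated with a short expectation-maximization sketch. This is not the thesis's replica or variational calculation; the function `em_two_gaussians` and all parameter choices below are illustrative assumptions.

```python
import numpy as np

def em_two_gaussians(x, n_iter=200):
    """Fit a 1-D mixture of two Gaussians by EM (illustrative sketch)."""
    # Crude initialisation from the empirical sample
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and widths
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

# Synthetic data set: two well-separated unit-width Gaussians
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 2000),
                    rng.normal(2.0, 1.0, 2000)])
pi, mu, sigma = em_two_gaussians(x)
```

For well-separated components, as here, EM recovers the true parameters; the phase transitions discussed in the thesis concern the regime where the sample size is comparable to the dimension, which this toy 1-D example does not probe.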