**Abstract** : We consider the problem of learning the parameters of a structured output prediction model, that is, learning to predict elements of a complex interdependent output space that correspond to a given input. Unlike many of the existing approaches, we focus on the weakly supervised setting, where most (or all) of the training samples have only been partially annotated. Given such a weakly supervised dataset, our goal is to estimate accurate parameters of the model by minimizing the regularized empirical risk, where the risk is measured by a user-specified loss function. This task has previously been addressed by the well-known latent support vector machine (latent SVM) framework. We argue that, while latent SVM offers a computational efficient solution to loss-based weakly supervised learning, it suffers from the following three drawbacks: (i) the optimization problem corresponding to latent SVM is a difference-of-convex program, which is non-convex, and hence susceptible to bad local minimum solutions; (ii) the prediction rule of latent SVM only relies on the most likely value of the latent variables, and not the uncertainty in the latent variable values; and (iii) the loss function used to measure the risk is restricted to be independent of true (unknown) value of the latent variables. We address the the aforementioned drawbacks using three novel contributions. First, inspired by human learning, we design an automatic self-paced learning algorithm for latent SVM, which builds on the intuition that the learner should be presented in the training samples in a meaningful order that facilitates learning: starting frome easy samples and gradually moving to harder samples. Our algorithm simultaneously selects the easy samples and updates the parameters at each iteration by solving a biconvex optimization problem. Second, we propose a new family of LVMs called max-margin min-entropy (M3E) models, which includes latent SVM as a special case. Given an input, an M3E model predicts the output with the smallest corresponding Renyi entropy of generalized distribution, which relies not only on the probability of the output but also the uncertainty of the latent variable values. Third, we propose a novel learning framework for learning with general loss functions that may depend on the latent variables. Specifically, our framework simultaneously estimates two distributions: (i) a conditional distribution to model the uncertainty of the latent variables for a given input-output pair; and (ii) a delta distribution to predict the output and the latent variables for a given input. During learning, we encourage agreement between the two distributions by minimizing a loss-based dissimilarity coefficient. We demonstrate the efficacy of our contributions on standard machine learning applications using publicly available datasets.