M. Achab, E. Bacry, S. Gaïffas, I. Mastromatteo, and J. Muzy, Uncovering causality from multivariate Hawkes integrated cumulants, International Conference on Machine Learning, pp.1-10, 2017.

P. K. Andersen, O. Borgan, R. D. Gill, and N. Keiding, Statistical models based on counting processes, 2012.

F. Bach, Self-concordant analysis for logistic regression, Electronic Journal of Statistics, vol.4, pp.384-414, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00426227

J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA Journal of Numerical Analysis, vol.8, issue.1, pp.141-148, 1988.

H. H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications, Mathematics of Operations Research, vol.42, issue.2, pp.330-348, 2016.

M. Bertero, P. Boccacci, G. Desiderà, and G. Vicidomini, Image deblurring with Poisson data: from cells to galaxies, Inverse Problems, vol.25, issue.12, p.123006, 2009.

C. Blundell, J. Beck, and K. A. Heller, Modelling reciprocating relationships with Hawkes processes, Advances in Neural Information Processing Systems, pp.2600-2608, 2012.

E. Bacry, S. Delattre, M. Hoffmann, and J. Muzy, Modelling microstructure noise with mutually exciting point processes, Quantitative Finance, vol.13, issue.1, pp.65-77, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01313995

D. P. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, 1999.

H. C. Boshuizen and E. Feskens, Fitting additive Poisson models, Epidemiologic Perspectives & Innovations, vol.7, p.4, 2010.

D. Beazley, W. Fulton, M. Köppe et al., Swig: Simplified wrapper and interface generator, 1996.

E. Bacry, S. Gaïffas, and J. Muzy, Concentration inequalities for matrix martingales in continuous time, Probability Theory and Related Fields, vol.170, p.525, 2018.

E. Bacry, T. Jaisson, and J. Muzy, Estimation of slowly decreasing Hawkes kernels: application to high-frequency order book dynamics, Quantitative Finance, vol.16, issue.8, pp.1179-1201, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01313833

J. Bennett and S. Lanning, The Netflix prize, Proceedings of KDD Cup and Workshop, 2007.

L. Buitinck, G. Louppe, M. Blondel et al., API design for machine learning software: experiences from the scikit-learn project, ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp.108-122, 2013.

P. L. Bartlett and S. Mendelson, Empirical minimization, Probability Theory and Related Fields, vol.135, pp.311-334, 2006.

E. Bacry and J. Muzy, Second order statistics characterization of Hawkes processes and non-parametric estimation, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01313834

E. Bacry, I. Mastromatteo, and J. Muzy, Hawkes processes in finance, Market Microstructure and Liquidity, vol.1, issue.01, p.1550005, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01313838

P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, The Annals of Statistics, vol.37, issue.4, pp.1705-1732, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00401585

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.

W. J. Burroughs, Weather cycles: real or imaginary?, 2003.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

A. Cauchy, Méthode générale pour la résolution des systèmes d'équations simultanées, Comp. Rend. Sci. Paris, vol.25, 1847.

P. Cortez, A. Cerdeira, F. Almeida, T. Matos et al., Modeling wine preferences by data mining from physicochemical properties, Decision Support Systems, vol.47, issue.4, pp.547-553, 2009.

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing, Fixed-point algorithms for inverse problems in science and engineering, pp.185-212, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643807

Y. Chen, D. Pavlov, and J. F. Canny, Large-scale behavioral targeting, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.209-218, 2009.

R. Crane and D. Sornette, Robust dynamic classes revealed by measuring the response function of a social system, Proceedings of the National Academy of Sciences, vol.105, pp.15649-15653, 2008.

E. J. Candès and T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory, vol.51, issue.12, pp.4203-4215, 2005.

E. J. Candès and T. Tao, The power of convex relaxation: Near-optimal matrix completion, IEEE Transactions on Information Theory, vol.56, issue.5, pp.2053-2080, 2010.

A. Defazio, F. Bach, and S. Lacoste-Julien, Saga: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

C. Dubois, C. Butts, and P. Smyth, Stochastic blockmodeling of relational event dynamics, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, pp.238-246, 2013.

M. Deilmann, A guide to vectorization with Intel C++ compilers, Intel Corporation, 2012.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

M. A. de Menezes and A. Barabási, Fluctuations in network dynamics, Phys. Rev. Lett., vol.92, p.28701, 2004.

N. Daneshmand, M. Rodriguez, L. Song, and B. Schölkopf, Estimating diffusion network structure: Recovery conditions, sample complexity, and a soft-thresholding algorithm, International Conference on Machine Learning, 2014.

A. De et al., Learning and forecasting opinion dynamics in social networks, Advances in Neural Information Processing Systems, pp.397-405, 2016.

D. J. Daley and D. Vere-Jones, An introduction to the theory of point processes: volume II: general theory and structure, 2007.

P. Embrechts, T. Liniger, and L. Lin, Multivariate Hawkes processes: an application to financial data, Journal of Applied Probability, vol.48, pp.367-378, 2011.

V. Filimonov and D. Sornette, Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data, Quantitative Finance, vol.15, issue.8, pp.1293-1314, 2015.

K. Fernandes, P. Vinagre, and P. Cortez, A proactive intelligent decision support system for predicting the popularity of online news, 2015.

M. Farajtabar, Y. Wang, M. G. Rodriguez et al., Coevolve: A joint point process model for information diffusion and network co-evolution, Advances in Neural Information Processing Systems, pp.1954-1962, 2015.

A. Galves and E. Löcherbach, Modeling networks of spiking neurons as interacting processes with memory of variable length, 2015.

M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf, Modeling information propagation with survival theory, International Conference on Machine Learning, 2013.

J. D. Hamilton, Time series analysis, vol.2, 1994.

A. G. Hawkes, Point spectra of some mutually exciting point processes, Journal of the Royal Statistical Society. Series B (Methodological), pp.438-443, 1971.

A. G. Hawkes, Spectra of some self-exciting and mutually exciting point processes, Biometrika, vol.58, issue.1, pp.83-90, 1971.

S. J. Hardiman, N. Bercot, and J. Bouchaud, Critical reflexivity in financial markets: a Hawkes process analysis, The European Physical Journal B, vol.86, issue.10, p.442, 2013.

E. Hazan and H. Luo, Variance-reduced and projection-free stochastic optimization, International Conference on Machine Learning, pp.1263-1271, 2016.

Z. T. Harmany, R. F. Marcia, and R. M. Willett, This is SPIRAL-TAP: Sparse Poisson intensity reconstruction algorithms-theory and practice, IEEE Transactions on Image Processing, vol.21, issue.3, pp.1084-1096, 2012.

A. G. Hawkes and D. Oakes, A cluster process representation of a self-exciting process, Journal of Applied Probability, vol.11, issue.3, pp.493-503, 1974.

N. R. Hansen, P. Reynaud-Bouret, and V. Rivoirard, Lasso and probabilistic inequalities for multivariate point processes, Bernoulli, vol.21, issue.1, pp.83-143, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00722668

C. Hsieh, H. Yu, and I. Dhillon, Passcode: Parallel asynchronous stochastic dual co-ordinate descent, ICML, vol.15, pp.2370-2379, 2015.

T. Iwata, A. Shah, and Z. Ghahramani, Discovering latent influence in online social activities via shared cascade Poisson processes, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.266-274, 2013.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, pp.315-323, 2013.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2015.

V. Koltchinskii, K. Lounici, and A. B. Tsybakov, Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion, The Annals of Statistics, vol.39, issue.5, pp.2302-2329, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00676868

V. Koltchinskii, Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Saint-Flour XXXVIII-2008, vol.2033, Springer, 2011.

S. Linderman and R. Adams, Discovering latent network structure in point process data, International Conference on Machine Learning, pp.1413-1421, 2014.

J. Leskovec, L. Backstrom, and J. Kleinberg, Meme-tracking and the dynamics of the news cycle, Proceedings of the 15th ACM SIGKDD, 2009.

J. Leskovec, Dynamics of large networks, Machine Learning Department, 2008.

A. S. Lewis, The convex analysis of unitarily invariant matrix functions, Journal of Convex Analysis, vol.2, issue.1, pp.173-183, 1995.

H. Lu, R. M. Freund, and Y. Nesterov, Relatively smooth convex optimization by first-order methods, and applications, SIAM Journal on Optimization, vol.28, issue.1, pp.333-354, 2018.

M. Lichman, UCI machine learning repository, 2013.

H. Lütkepohl and M. Krätzig, Applied time series econometrics, 2004.

E. Lewis and G. Mohler, A nonparametric EM algorithm for multiscale Hawkes processes, Journal of Nonparametric Statistics, vol.1, issue.1, pp.1-20, 2011.

E. Lewis, G. Mohler, P. J. Brantingham, and A. L. Bertozzi, Self-exciting point process models of civilian deaths in Iraq, Security Journal, vol.25, issue.3, pp.244-264, 2012.

R. Leblond, F. Pedregosa, and S. Lacoste-Julien, Asaga: asynchronous parallel Saga, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01407833

R. Lemonnier, K. Scaman, and A. Kalogeratos, Multivariate Hawkes processes for large-scale inference, AAAI Conference on Artificial Intelligence, 2017.

M. Lukasik, P. K. Srijith, D. Vu et al., Hawkes processes for continuous time sequence classification: an application to rumour stance classification in Twitter, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp.393-398, 2016.

R. Lemonnier and N. Vayatis, Nonparametric Markovian learning of triggering kernels for mutually exciting and mutually inhibiting multivariate Hawkes processes, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.161-176, 2014.

D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, RCV1: A new benchmark collection for text categorization research, Journal of Machine Learning Research, vol.5, pp.361-397, 2004.

P. Massart, Concentration inequalities and model selection, vol.1896, 2007.

G. Mohler, Modeling and estimation of multi-source clustering in crime and security data, The Annals of Applied Statistics, vol.7, issue.3, 2013.

H. Mania, X. Pan, D. Papailiopoulos, B. Recht, K. Ramchandran et al., Perturbed iterate analysis for asynchronous stochastic optimization, SIAM Journal on Optimization, pp.2202-2229, 2017.

S. Moro, P. Rita, and J. Coelho, Stripping customers' feedback on hotels through data mining: the case of Las Vegas Strip, Tourism Management Perspectives, vol.23, pp.41-52, 2017.

S. Moro, P. Rita, and B. Vala, Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach, Journal of Business Research, vol.69, issue.9, pp.3341-3351, 2016.

G. Mohler, M. B. Short, P. J. Brantingham et al., Self-exciting point process modeling of crime, Journal of the American Statistical Association, 2011.

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, pp.372-376, 1983.

Y. Nesterov, Introductory lectures on convex optimization: A basic course, vol.87, 2013.

Y. Nesterov and A. Nemirovskii, Interior-point polynomial algorithms in convex programming, SIAM, 1994.

J. Nocedal, Updating quasi-Newton matrices with limited storage, Mathematics of Computation, vol.35, pp.773-782, 1980.

J. Nocedal and S. J. Wright, Numerical optimization, 2006.

Y. Ogata, The asymptotic behaviour of maximum likelihood estimators for stationary point processes, Annals of the Institute of Statistical Mathematics, vol.30, issue.1, pp.243-261, 1978.

Y. Ogata, On Lewis' simulation method for point processes, IEEE Transactions on Information Theory, vol.27, issue.1, pp.23-31, 1981.

Y. Ogata, Statistical models for earthquake occurrences and residual analysis for point processes, Journal of the American Statistical Association, vol.83, issue.401, pp.9-27, 1988.

Y. Ogata, Space-time point-process models for earthquake occurrences, Annals of the Institute of Statistical Mathematics, vol.50, issue.2, pp.379-402, 1998.

Y. Ogata, Seismicity analysis through point-process modeling: A review, Pure & Applied Geophysics, vol.155, issue.2-4, p.471, 1999.

F. Pedregosa, R. Leblond, and S. Lacoste-Julien, Breaking the nonsmooth barrier: A scalable parallel method for composite optimization, Advances in Neural Information Processing Systems, 2017.

M. Pino, L. Landesa et al., The generalized forward-backward method for analyzing the scattering from targets on ocean-like rough surfaces, vol.47, 1999.

R. Le Priol, A. Touati, and S. Lacoste-Julien, Adaptive stochastic dual coordinate ascent for conditional random fields.

F. Pedregosa, G. Varoquaux et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.

N. Qian, On the momentum term in gradient descent learning algorithms, Neural Networks, vol.12, issue.1, pp.145-151, 1999.

Z. Qu, P. Richtárik, M. Takáč, and O. Fercoq, SDNA: Stochastic dual Newton ascent for empirical risk minimization, International Conference on Machine Learning, pp.1823-1832, 2016.

M. Rambaldi, E. Bacry, and F. Lillo, The role of volume in order book dynamics: a multivariate Hawkes process analysis, Quantitative Finance, vol.17, issue.7, pp.999-1020, 2017.

P. Reynaud-Bouret and V. Rivoirard, Near optimal thresholding estimation of a Poisson intensity on the real line, Electronic Journal of Statistics, vol.4, pp.172-238, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00634406

M. Rodriguez, D. Balduzzi, and B. Schölkopf, Uncovering the temporal dynamics of diffusion networks, International Conference on Machine Learning, 2011.

E. Richard, S. Gaïffas, and N. Vayatis, Link prediction in graphs with autoregressive features, Journal of Machine Learning Research, 2014.

S. Reddi, A. Hefny, S. Sra et al., On variance reduction in stochastic gradient descent and its asynchronous variants, Advances in Neural Information Processing Systems, pp.2647-2655, 2015.

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, issue.6088, p.533, 1986.

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, pp.400-407, 1951.

F. Ricci, L. Rokach, and B. Shapira, Introduction to recommender systems handbook, 2011.

B. Recht, C. Re, S. Wright, and F. Niu, Hogwild: A lock-free approach to parallelizing stochastic gradient descent, Advances in Neural Information Processing Systems, pp.693-701, 2011.

J. D. Scargle, Studies in astronomical time series analysis. II. Statistical aspects of spectral analysis of unevenly spaced data, The Astrophysical Journal, vol.263, pp.835-853, 1982.

M. Schmidt, N. L. Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, vol.162, issue.1-2, pp.83-112, 2017.
URL : https://hal.archives-ouvertes.fr/hal-00860051

S. Shalev-Shwartz, SDCA without duality, regularization, and individual convexity, International Conference on Machine Learning, pp.747-754, 2016.

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, pp.567-599, 2013.

S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, International Conference on Machine Learning, pp.64-72, 2014.

T. Sun and Q. Tran-Dinh, Generalized self-concordant functions: a recipe for Newton-type methods, Mathematical Programming, pp.1-69, 2017.

S. J. Taylor, Modelling financial time series, 2008.

Q. Tran-Dinh, A. Kyrillidis, and V. Cevher, Composite self-concordant minimization, The Journal of Machine Learning Research, vol.16, issue.1, pp.371-416, 2015.

L. Tran, M. Farajtabar, L. Song, and H. Zha, Netcodec: Community detection from individual activities, Proceedings of the 2015 SIAM International Conference on Data Mining, pp.91-99, 2015.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288, 1996.

R. Tibshirani, Regression shrinkage and selection via the lasso: a retrospective, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.73, issue.3, pp.273-282, 2011.

C. Tan, S. Ma, Y. Dai, and Y. Qian, Barzilai-Borwein step size for stochastic gradient descent, Advances in Neural Information Processing Systems, pp.685-693, 2016.

J. A. Tropp, User-friendly tail bounds for sums of random matrices, Foundations of Computational Mathematics, vol.12, issue.4, pp.389-434, 2012.

R. S. Tsay, Analysis of financial time series, vol.543, 2005.

S. van de Geer, Empirical Processes in M-estimation, vol.105, 2000.

H. Xu, M. Farajtabar, and H. Zha, Learning Granger causality for Hawkes processes, International Conference on Machine Learning, pp.1717-1726, 2016.

L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.

S. Yang and H. Zha, Mixture of mutually exciting processes for viral diffusion, International Conference on Machine Learning, pp.1-9, 2013.

M. D. Zeiler, Adadelta: an adaptive learning rate method, 2012.

P. Zhao and T. Zhang, Stochastic optimization with importance sampling for regularized loss minimization, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.1-9, 2015.

K. Zhou, H. Zha, and L. Song, Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes, AISTATS, vol.31, pp.641-649, 2013.

K. Zhou, H. Zha, and L. Song, Learning triggering kernels for multidimensional Hawkes processes, Proceedings of the 30th International Conference on Machine Learning, vol.28, pp.17-19, 2013.