D. S. Alves, J. Paulus, and J. Fonseca, Drum transcription from multichannel recordings with non-negative matrix factorization, Proceedings European Signal Processing Conference (EUSIPCO), pp.894-898, 2009.

E. Battenberg, Techniques for machine understanding of live drum performances, 2012.

E. Battenberg, V. Huang, and D. Wessel, Live drum separation using probabilistic spectral clustering based on itakura-saito divergence, Proceedings of Audio Engineering Society Convention on Time-Frequency Processing in Audio (AES), 2012.

J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies et al., A tutorial on onset detection in musical signals, IEEE Transactions on Speech and Audio Processing, vol.13, issue.5, pp.1035-1047, 2005.

J. P. Bello, E. Ravelli, and M. B. Sandler, Drum sound analysis for the manipulation of rhythm in drum loops, Proceedings of first MIREX, 2006.

J. P. Bello and R. J. Weiss, Music structure segmentation using shift-invariant probabilistic latent component analysis, 2010.

L. Benaroya, L. Macdonagh, F. Bimbot, and R. Gribonval, Non negative sparse representation for wiener based source separation with a single sensor, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2003.
URL : https://hal.archives-ouvertes.fr/inria-00574784

E. Benetos, R. Badeau, T. Weyde, and G. Richard, Template adaptation for improving automatic music transcription, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01083552

E. Benetos and S. Dixon, A shift-invariant latent variable model for automatic music transcription, Computer Music Journal, vol.36, issue.4, pp.81-94, 2012.

E. Benetos, S. Ewert, and T. Weyde, Automatic transcription of pitched and unpitched sounds from polyphonic music, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.3107-3111, 2014.

M. W. Berry and M. Browne, Email surveillance using non-negative matrix factorization, 2005.

N. Bertin, Les factorisations en matrices non-négatives. Approches contraintes et probabilistes, application à la transcription automatique de la musique polyphonique, 2009.

S. Böck, A. Arzt, F. Krebs, and M. Schedl, Online real-time onset detection with recurrent neural networks, Proceedings of the 15th International Conference on Digital Audio Effects (DAFx), 2012.

S. Böck, F. Korzeniowski, J. Schlüter, F. Krebs, and G. Widmer, Madmom : a new python audio and music signal processing library, Late-Breaking Demo Session of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

N. Bogaards, A. Röbel, and X. Rodet, Sound analysis and processing with AudioSculpt 2, Proc. Int. Computer Music Conference (ICMC), 2004.
URL : https://hal.archives-ouvertes.fr/hal-01161198

D. Bouvier, N. Obin, M. Liuni, and A. Roebel, A source/filter model with adaptive constraints for nmf-based speech separation, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.131-135, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01294681

Z. Chen, A. Cichocki, and T. M. Rutkowski, Constrained non-negative matrix factorization method for eeg analysis in early detection of alzheimer's disease, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol.5, pp.893-896, 2006.

K. Cho, B. Van-merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using rnn encoder-decoder for statistical machine translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

A. Cichocki, R. Zdunek, and S. Amari, Csiszár's divergences for non-negative matrix factorization : Family of new algorithms, International Conference on Independent Component Analysis and Blind Signal Separation (ICA), pp.32-39, 2006.

C. Ding, T. Li, and W. Peng, On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing, Computational Statistics and Data Analysis, 2008.

C. Dittmar and D. Gärtner, Real-time transcription and separation of drum recordings based on nmf decomposition, Proceedings of the International Conference on Digital Audio Effects (DAFx), pp.187-194, 2014.

C. Dittmar and C. Uhle, Further steps towards drum transcription of polyphonic music, Proceedings of Audio Engineering Society Convention (AES), 2004.

S. Dixon, On the computer recognition of solo piano music, Australasian Computer Music Conference, pp.31-37, 2000.

K. Drakakis, S. Rickard, R. De-fréin, and A. Cichocki, Analysis of financial data using non-negative matrix factorization, International Mathematical Forum, 2008.

J. Durrieu, A. Ozerov, C. Fevotte, G. Richard, and B. David, Main instrument separation from stereophonic audio signals using a source/filter model, EUSIPCO, vol.1, pp.15-19, 2009.

J. Eggert and E. Korner, Sparse coding and nmf, IEEE International Joint Conference on Neural Networks, vol.4, pp.2529-2533, 2004.

A. Elowsson and A. Friberg, Modelling perception of speed in music audio, Proceedings of the Sound and Music Computing Conference, 2013.

V. Emiya, Transcription Automatique de la Musique de Piano, 2008.
URL : https://hal.archives-ouvertes.fr/pastel-00004867

A. Eronen, Musical instrument recognition using ica-based transform of features and discriminatively trained hmms, Proceedings of Intl. Symposium on Signal Processing and its Application (ISSPA), vol.2, pp.133-136, 2003.

J. A. Fessler and A. O. Hero, Space-alternating generalized expectation-maximization algorithm, IEEE Transactions on signal processing, 1994.

C. Fevotte, N. Bertin, and J. Durrieu, Nonnegative matrix factorization with the itakura-saito divergence : With application to music analysis, Neural computation, vol.21, issue.3, pp.793-830, 2009.

C. Févotte and A. T. Cemgil, Nonnegative matrix factorizations as probabilistic inference in composite models, 17th European Signal Processing Conference, pp.1913-1917, 2009.

D. Fitzgerald, Automatic drum transcription and source separation, 2004.

D. Fitzgerald, Harmonic/percussive separation using median filtering, Proceedings of the International Conference on Digital Audio Effects (DAFx), pp.246-253, 2010.

D. Fitzgerald, R. Lawlor, and E. Coyle, Drum transcription in the presence of pitched instruments using prior subspace analysis, Proceedings of Irish Signals and Systems Conference (ISSC), 2003.

D. Fitzgerald, R. Lawlor, and E. Coyle, Prior subspace analysis for drum transcription, Proceedings of Audio Engineering Society Convention (AES), 2003.

D. Fitzgerald and J. Paulus, Unpitched percussion transcription, 2006.

B. Fuentes, L'analyse probabiliste en composantes latentes et ses adaptations aux signaux musicaux. Application à la transcription automatique de la musique et à la séparation de sources, 2013.
URL : https://hal.archives-ouvertes.fr/tel-01337630

N. Gajhede, O. Beck, and H. Purwins, Convolutional neural networks with batch normalization for classifying hi-hat, snare, and bass percussion sound samples, Proceedings of Audio Mostly : A Conference on Interaction with Sound, pp.111-115, 2016.

O. Gillet and G. Richard, Automatic transcription of drum loops, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol.4, pp.269-272, 2004.

O. Gillet and G. Richard, Automatic transcription of drum sequences using audiovisual features, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.205-208, 2005.

O. Gillet and G. Richard, Enst-drums : an extensive audio-visual database for drum signals processing, Proceedings of the 7th International Society for Music Information Retrieval Conference (ISMIR), 2006.

O. Gillet and G. Richard, Supervised and unsupervised sequence modelling for drum transcription, Proceedings of the 8th International Society for Music Information Retrieval Conference (ISMIR), pp.219-224, 2007.

O. Gillet and G. Richard, Transcription and separation of drum signals from polyphonic music, IEEE Transactions on Audio, Speech and Language Processing, vol.16, issue.3, pp.529-540, 2008.
URL : https://hal.archives-ouvertes.fr/hal-02652666

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Aistats, vol.9, pp.249-256, 2010.

M. Goto, Rwc music database : Music genre database and musical instrument sound database, Proceedings of the 4th International Society for Music Information Retrieval Conference (ISMIR), pp.229-230, 2003.

M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, RWC music database : Popular, classical and jazz music databases, Proceedings of the 3rd International Society on Music Information Retrieval Conference (ISMIR), vol.2, pp.287-288, 2002.

F. Gouyon, F. Pachet, and O. Delerue, On the use of zero-crossing rate for an application of classification of percussive sounds, Proceedings of the International Conference on Digital Audio Effects (DAFx), 2000.

G. Grindlay and D. P. Ellis, Multi-voice polyphonic music transcription using eigeninstruments, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009.

D. Guillamet, B. Schiele, and J. Vitrià, Color histogram classification using nmf, 2001.

H. Hahn, Expressive sampling synthesis -Learning extended source-filter model from instrument sound database for expressive sample manipulations, 2015.

P. Herrera, A. Dehamel, and F. Gouyon, Automatic labeling of unpitched percussion sounds, Proceedings of Audio Engineering Society Convention (AES), 2003.

P. Herrera, V. Sandvold, and F. Gouyon, Percussion-related semantic descriptors of music audio files, Metadata for Audio (AES), 2004.

P. Herrera, A. Yeterian, and F. Gouyon, Automatic classification of drum sounds : A comparison of feature selection methods and classification techniques, Proceedings of Intl. Conf. on Music and Artificial Intelligence (ICMAI), 2002.

M. D. Hoffman, Approximate maximum a posteriori inference with entropic priors, 2010.

T. Hofmann, Probabilistic latent semantic analysis, Uncertainty in Artificial Intelligence, 1999.

P. O. Hoyer, Non-negative sparse coding, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, pp.557-565, 2002.

P. O. Hoyer, Non-negative matrix factorization with sparseness constraints, 2004.

C. Jacques, A. Aknin, and A. Röbel, Mirex 2018 : Automatic drum transcription with convolutional neural networks, MIREX, 2018.

C. Jacques and A. Röbel, Automatic drum transcription with convolutional neural networks, Proceedings of the International Conference on Digital Audio Effects (DAFx), 2018.
URL : https://hal.archives-ouvertes.fr/hal-02018777

C. Jacques and A. Röbel, Data augmentation for onset detection and drum transcription (submitted), 2019.

X. Jaureguiberry, P. Leveau, D. Maller, and J. J. Burred, Adaptation of source-specific dictionaries in non-negative matrix factorization for source separation, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00868399

K. Kashino and H. Murase, A sound source identification system for ensemble music based on template adaptation and music stream extraction, 1997.

R. Kelz, M. Dorfer, F. Korzeniowski, S. Böck, A. Arzt et al., On the potential of simple framewise approaches to piano transcription, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

S. Kim, Y. N. Rao, D. Erdogmus, C. , S. J. Nicolelis et al., Determining patterns in neural activity for reaching movements using nonnegative matrix factorization, EURASIP Journal on Applied Sig. Proc, 2005.

R. Kompass, A generalized divergence measure for non-negative matrix factorization, Neuroinformatics workshop, 2005.

J. Laroche and M. Dolson, Improved phase vocoder time-scale modification of audio, IEEE Transactions on Speech and Audio Processing, vol.7, issue.3, pp.323-332, 1999.

D. Lee and S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol.401, issue.6755, pp.788-791, 1999.

S. Li, X. Hou, H. Zhang, and Q. Cheng, Learning spatially localized, parts-based representation, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol.1, pp.207-212, 2001.

H. Lindsay-smith, S. Mcdonald, and M. Sandler, Drumkit transcription via convolutive nmf, Proceedings of the International Conference on Digital Audio Effects (DAFx), 2012.

W. Liu and K. Yuan, Sparse p-norm nonnegative matrix factorization for clustering gene expression data, International Journal of Data Mining and Bioinformatics, 2008.

M. Marolt, A. Kavcic, and M. Provosnik, Neural network for note onset detection in piano music, Proceedings of the International Computer Music Conference (ICMC), 2002.

R. Marxer and J. Janer, Study of regularizations and constraints in nmf-based drums monaural separation, Proceedings of the International Conference on Digital Audio Effects (DAFx), pp.1-6, 2013.

B. Mcfee, E. Humphrey, and J. Bello, A software framework for musical data augmentation, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015.

M. Miron, M. E. Davies, and F. Gouyon, Improving the real-time performance of a causal audio drum transcription system, Proceedings of Sound and Music Computing Conference (SMC), pp.402-407, 2013.

M. Miron, M. E. Davies, and F. Gouyon, An open-source drum transcription system for pure data and max msp, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.221-225, 2013.

Y. Mitsufuji, M. Liuni, A. Baker, and A. Roebel, Online separation tensor deconvolution for source detection in 3dtv audio, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2007.

A. Moreau and A. Flexer, Drum transcription in polyphonic music using nonnegative matrix factorisation, Proceedings of the 8th International Society for Music Information Retrieval Conference (ISMIR), pp.353-354, 2007.

M. Nakano, H. Kameoka, J. Le-roux, Y. Kitano, N. Ono et al., Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence, International Workshop on Machine Learning for Signal Processing, Proc. IEEE, vol.10, pp.283-288, 2010.

T. Nakano, M. Goto, J. Ogata, and Y. Hiraga, Voice drummer : a music notation interface of drum sounds using voice percussion input, Proceedings of Annual ACM Symposium on User Interface Software and Technology (UIST), pp.49-50, 2005.

P. Paatero and U. Tapper, Positive matrix factorization : A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, vol.5, issue.2, pp.111-126, 1994.

J. Paulus, Signal processing methods for drum transcription and music structure analysis, 2009.

J. Paulus and A. Klapuri, Drum sound detection in polyphonic music with hidden markov models, EURASIP Journal on Audio, 2009.

J. Paulus and T. Virtanen, Drum transcription with non-negative spectrogram factorisation, Proceedings European Signal Processing Conference (EUSIPCO), 2005.

M. Prockup, E. Schmidt, J. Scott, and Y. E. Kim, Toward understanding expressive percussion through content based analysis, Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.

A. Röbel, A new approach to transient processing in the phase vocoder, Proceedings of the 6th International Conference on Digital Audio Effects (DAFx), pp.344-349, 2003.

A. Röbel, J. Pons, M. Liuni, and M. Lagrange, On automatic drum transcription using non-negative matrix deconvolution and itakura-saito divergence, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.414-418, 2015.

A. Röbel and X. Rodet, Efficient Spectral Envelope Estimation and its application to pitch shifting and envelope preservation, International Conference on Digital Audio Effects, pp.30-35, 2005.

M. Rossignol, M. Lagrange, G. Lafay, and E. Benetos, Alternate level clustering for drum transcription, Proceedings of European Signal Processing Conf. (EUSIPCO), pp.2023-2027, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01122006

P. Roy, F. Pachet, and S. Krakowski, Improving the classification of percussive sounds with analytical features : A case study, Proceedings of the 8th International Society for Music Information Retrieval Conference (ISMIR), pp.229-232, 2007.

H. Sak, A. W. Senior, and F. Beaufays, Long short-term memory recurrent neural network architecture for large scale acoustic modelling, Proceedings of 15th Annual Conference of the International Speech Communication Association, pp.338-342, 2014.

J. Salamon and J. P. Bello, Deep convolutional neural networks and data augmentation for environmental sound classification, IEEE Signal Processing Letters, 2017.

W. A. Schloss, On the automatic transcription of percussive music - From acoustic signal to high-level analysis, 1985.

J. Schlüter and S. Böck, Improved musical onset detection with convolutional neural networks, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.

J. Schlüter and T. Grill, Exploring data augmentation for improved singing voice detection with neural networks, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015.

S. Scholler and H. Purwins, Sparse coding for drum classification and its use as a similarity measure, Proceedings of International Workshop on Machine Learning and Music (MML), pp.9-12, 2010.

S. Scholler and H. Purwins, Sparse approximations for drum sound classification, Journal of Selected Topics Signal Processing, vol.5, issue.5, pp.933-940, 2011.

J. Schroeter, Mirex 2018 : Drum transcription, MIREX, 2018.

M. Shashanka, B. Raj, and P. Smaragdis, Probabilistic latent variable model as nonnegative factorizations, Computational Intelligence and Neuroscience, 2008.

P. Smaragdis, Nonnegative matrix factor deconvolution ; extraction of multiple sound sources from monophonic inputs, Independent Component Analysis and Blind Signal Separation, p.3195, 2004.

P. Smaragdis and J. Brown, Non-negative matrix factorization for polyphonic music transcription, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp.177-180, 2003.

P. Smaragdis and B. Raj, Shift-invariant probabilistic latent component analysis, Journal of Machine Learning Research, 2007.

C. Southall, Mirex 2017 drum transcription submissions, 2018.

C. Southall, N. Jillings, R. Stables, and J. Hockman, Adtweb : An open source browser based automatic drum transcription system, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), 2017.

C. Southall, R. Stables, and J. Hockman, Automatic drum transcription using bi-directional recurrent neural networks, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), pp.591-597, 2016.

C. Southall, R. Stables, and J. Hockman, Automatic drum transcription for polyphonic recordings using soft attention mechanisms and convolutional neural networks, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pp.606-612, 2017.

V. M. Souza, G. E. Batista, and N. E. Souza-filho, Automatic classification of drum sounds with indefinite pitch, Proceedings of Intl. Joint Conf. on Neural Networks (IJCNN), 2015.

D. Fitzgerald and J. Paulus, Unpitched percussion transcription, in A. Klapuri and M. Davy (eds.), Signal Processing Methods for Music Transcription, Springer, 2006.

D. V. Steelant, K. Tanghe, S. Degroeve, B. Baets, M. Leman et al., Classification of percussive sounds using support vector machines, Proceedings of Annual Machine Learning Conference of Belgium and The Netherlands (BENELEARN), pp.146-152, 2004.

R. S. Sutton, Two problems with backpropagation and other steepest descent learning procedures for networks, Annual Conference of Cognitive Science Society, pp.823-831, 1986.

L. Thompson, S. Dixon, and M. Mauch, Drum transcription via classification of bar-level rhythmic patterns, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), pp.187-192, 2014.

A. Tindale, A. Kapur, G. Tzanetakis, and I. Fujinaga, Retrieval of percussion gestures using timbre classification techniques, Proceedings of the 5th International Society for Music Information Retrieval Conference (ISMIR), 2004.

G. Tzanetakis, A. Kapur, and R. I. Mcwalter, Subband-based drum transcription for audio signals, Proceedings of Workshop on multimedia Signal Processing, 2005.

E. Vincent, N. Bertin, and R. Badeau, Adaptive harmonic spectral decomposition for multiple pitch estimation, IEEE Transactions on Audio, Speech, and Language Processing, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00544094

T. Virtanen, Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria, IEEE Transactions on Audio, Speech and Language Processing, vol.15, issue.3, pp.1066-1074, 2007.

R. Vogl, M. Dorfer, and P. Knees, Drum transcription from polyphonic music with recurrent neural networks, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.

R. Vogl, M. Dorfer, and P. Knees, Recurrent neural networks for drum transcription, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), pp.730-736, 2016.

R. Vogl, M. Dorfer, G. Widmer, and P. Knees, Drum transcription via joint beat and drum modelling using convolutional recurrent neural networks, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), 2017.

R. Vogl and P. Knees, Mirex submission for drum transcription, 2018.

R. Vogl, G. Widmer, and P. Knees, Towards multi-instrument drum transcription, Proceedings of the International Conference on Digital Audio Effects (DAFx), 2018.

Q. Wang, R. Zhou, and Y. Yan, A two stage approach to note-level transcription of a specific piano, Applied Sciences, 2017.

C. Wu, C. Dittmar, C. Southall, R. Vogl, G. Widmer et al., A review of automatic drum transcription, IEEE Transactions on Audio, Speech, and Language Processing, 2017.

C. Wu and A. Lerch, Drum transcription using partially fixed non-negative matrix factorization, Proceedings of European Signal Processing Conference, 2015.

C. Wu and A. Lerch, Drum transcription using partially fixed non-negative matrix factorization with template adaptation, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015.

C. Wu and A. Lerch, On drum playing technique detection in polyphonic mixtures, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), pp.218-224, 2016.

C. Wu and A. Lerch, Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), 2017.

K. Yoshii, M. Goto, and H. G. Okuno, Automatic drum sound description for real-world music using template adaptation and matching methods, Proceedings of the 5th International Society for Music Information Retrieval Conference (ISMIR), 2004.

K. Yoshii, M. Goto, and H. G. Okuno, Adamast : A drum sound recognizer based on adaptation and matching of spectrogram templates, Annual Music Information Retrieval Evaluation eXchange (MIREX), 2005.

K. Yoshii, M. Goto, and H. G. Okuno, Drum sound recognition for polyphonic audio signals by adaptation and matching of spectrogram template with harmonic structure suppression, IEEE Transactions on Audio, vol.15, issue.1, pp.333-345, 2007.

M. Zivanovic, A. Röbel, and X. Rodet, A new approach to spectral peak classification, Proceedings of the 12th European Signal Processing Conference (EUSIPCO), p.4, 2004.
URL : https://hal.archives-ouvertes.fr/hal-01161188