The sound of the talking box can be heard at

Other examples of the talking box can be heard at

Y. Bayle, P. Hanna, and M. Robine, Classification à grande échelle de morceaux de musique en fonction de la présence de chant, Actes des 23èmes Journées d'Informatique Musicale, pp.144-152, 2016.

Y. Bayle, P. Hanna, and M. Robine, SATIN: A persistent musical database for music information retrieval, Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, pp.1-5, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01570099

Y. Bayle, L. Maršík, M. Rusek, M. Robine, P. Hanna, K. Slaninová, J. Martinovič, and J. Pokorný, Kara1k: A karaoke dataset for cover song identification and singing voice analysis, Proceedings of the 19th IEEE International Symposium on Multimedia, pp.1-8, 2017.

L. Demany, Y. Bayle, E. Puginier, and C. Semal, Detecting temporal changes in acoustic scenes: The variable benefit of selective attention, Hearing Research, vol.353, pp.17-25, 2017.

Y. Bayle, M. Robine, and P. Hanna, SATIN: A persistent musical database for music information retrieval and a supporting deep learning experiment on song instrumental classification, 2018.

Y. Bayle, P. Hanna, and M. Robine, Toward faultless content-based playlists generation for instrumentals, 2018.

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis et al., TensorFlow: A system for large-scale machine learning, Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, pp.265-283, 2016.

N. Aizenberg, Y. Koren, and O. Somekh, Build your own music recommender by modeling internet radio streams, Proceedings of the 21st International Conference on World Wide Web, pp.1-10, 2012.

A. Aljanaki, Emotion in music: Representation and computational modeling, 2016.

L. Andreou, M. Kashino, and M. Chait, The role of temporal regularity in auditory segregation, Hearing Research, vol.280, issue.1, pp.228-235, 2011.

M. Arumugam and M. Kaliappan, An efficient approach for segmentation, feature extraction and classification of audio signals, Circuits and Systems, vol.7, issue.4, pp.255-279, 2016.

J. Aucouturier and F. Pachet, Representing musical genre: A state of the art, Journal of New Music Research, vol.32, issue.1, pp.83-93, 2003.

L. Balkwill and W. F. Thompson, A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural cues, vol.17, pp.43-64, 1999.

L. Balkwill, W. F. Thompson, and R. Matsunaga, Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners, Japanese Psychological Research, vol.46, issue.4, pp.337-349, 2004.

M. Barone, K. Dacosta, G. Vigliensoni, and M. Woolhouse, GRAIL: A music identity space collection and API, Proceedings of the 16th International Society for Music Information Retrieval Conference, pp.4-5, 2015.

M. D. Barone, K. Dacosta, G. Vigliensoni, and M. H. Woolhouse, GRAIL: A music metadata identity API, Proceedings of the 17th International Society for Music Information Retrieval Conference, p.1, 2016.

G. E. Batista, A. L. Bazzan, and M. C. Monard, Balancing training data for automated annotation of keywords: A case study, Proceedings of the 2nd Brazilian Workshop on Bioinformatics, pp.10-18, 2003.

G. E. Batista, R. C. Prati, and M. C. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter, vol.6, issue.1, pp.20-29, 2004.

Y. Bayle, P. Hanna, and M. Robine, Classification à grande échelle de morceaux de musique en fonction de la présence de chant, Actes des 23èmes Journées d'Informatique Musicale, pp.144-152, 2016.

Y. Bayle, L. Maršík, M. Rusek, M. Robine, P. Hanna et al.,

Y. Bayle, M. Robine, and P. Hanna, SATIN: A persistent musical database for music information retrieval and a supporting deep learning experiment on song instrumental classification, Multimedia Tools and Applications, 2018.

Y. Bayle, M. Robine, and P. Hanna, Toward faultless content-based playlists generation for instrumentals, 2018.

J. Bekios-Calfa, J. M. Buenaposada, and L. Baumela, Revisiting linear discriminant techniques in gender recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.4, pp.858-864, 2011.

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.

A. Berenzweig, B. Logan, D. P. Ellis, and B. Whitman, A large-scale evaluation of acoustic and subjective music-similarity measures, Computer Music Journal, vol.28, issue.2, pp.63-76, 2004.

J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kégl, Aggregate features and AdaBoost for music classification, Springer Journal on Machine Learning, vol.65, issue.2-3, pp.473-484, 2006.

J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kégl, Meta-features and AdaBoost for music classification, Machine Learning Journal: Special Issue on Machine Learning in Music, pp.1-28, 2006.

T. Bertin-Mahieux, D. Eck, and M. I. Mandel, Automatic tagging of audio: The state-of-the-art, Machine Audition: Principles, Algorithms and Systems, chapter 14, pp.334-352, 2010.

T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere, The million song dataset, Proceedings of the 12th International Society for Music Information Retrieval Conference, pp.591-596, 2011.

J. Bispham, Rhythm in music: What is it? Who has it? And why?, Music Perception: An Interdisciplinary Journal, vol.24, issue.2, pp.125-134, 2006.

R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello, MedleyDB: A multitrack dataset for annotation-intensive MIR research, Proceedings of the 15th International Society for Music Information Retrieval Conference, pp.155-160, 2014.

D. Bogdanov, J. Serrà, N. Wack, P. Herrera, and X. Serra, Unifying low-level and high-level music similarity measures, IEEE Transactions on Multimedia, vol.13, issue.4, pp.687-701, 2011.

D. Bogdanov, N. Wack, E. Gómez, S. Gulati, P. Herrera et al., Essentia: An audio analysis library for music information retrieval, Proceedings of the 14th International Society for Music Information Retrieval Conference, pp.493-498, 2013.

C. Bonnet, Manuel pratique de psychophysique. A. Colin, 1986.

G. Bonnin and D. Jannach, Automated generation of music playlists: Survey and experiments, ACM Computing Surveys, vol.47, issue.2, pp.1-35, 2014.

P. Branco, L. Torgo, and R. P. Ribeiro, A survey of predictive modeling on imbalanced domains, ACM Computing Surveys, vol.49, issue.2, pp.1-50, 2016.

J. S. Breese, D. Heckerman, and C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp.43-52, 1998.

L. Breiman, Random forests, Machine Learning, vol.45, pp.5-32, 2001.

L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and regression trees, 1984.

J. Bu, S. Tan, C. Chen, C. Wang, H. Wu et al., Music recommendation by unified hypergraph: Combining social media information and music content, Proceedings of the 18th ACM International Conference on Multimedia, pp.391-400, 2010.

S. Carcagno, C. Semal, and L. Demany, Frequency-shift detectors bind binaural as well as monaural frequency representations, Journal of Experimental Psychology: Human Perception and Performance, vol.37, issue.6, pp.1976-1987, 2011.

M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes et al., Content-based music information retrieval: Current directions and future challenges, Proceedings of the IEEE, vol.96, issue.4, pp.668-696, 2008.

Ò. Celma, M. Ramírez, and P. Herrera, Foafing the music: A music recommendation system based on RSS feeds and user preferences, Proceedings of the 6th International Conference on Music Information Retrieval, pp.457-464, 2005.

T.-S. Chan, T.-C. Yeh, Z.-C. Fan, H.-W. Chen, L. Su, Y.-H. Yang, and R. Jang, Vocal activity informed singing voice separation with the iKala dataset, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp.718-722, 2015.

P. Y. Chau, S. Y. Ho, K. K. Ho, and Y. Yao, Examining the effects of malfunctioning personalized services on online users' distrust and behaviors, Decision Support Systems, vol.56, pp.180-191, 2013.

N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, vol.16, pp.321-357, 2002.

S. Chen, J. L. Moore, D. Turnbull, and T. Joachims, Playlist prediction via metric embedding, Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining, pp.714-722, 2012.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proceedings of the 19th Conference on Empirical Methods in Natural Language Processing, pp.1724-1734, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

K. Choi, G. Fazekas, M. Sandler, and K. Cho, Convolutional Recurrent Neural Networks for Music Classification, 2016.

K. Choi, G. Fazekas, K. Cho, and M. B. Sandler, A comparison on audio signal preprocessing methods for deep neural networks on music tagging, 2017.

K. Choi, G. Fazekas, K. Cho, and M. B. Sandler, A tutorial on deep learning for music information retrieval, 2017.

K. Choi, G. Fazekas, and M. B. Sandler, Automatic tagging using deep convolutional neural networks, Proceedings of the 17th International Society for Music Information Retrieval Conference, pp.805-811, 2016.

K. Choi, G. Fazekas, M. B. Sandler, and K. Cho, Transfer learning for music classification and regression tasks, Proceedings of the 18th International Society for Music Information Retrieval Conference, pp.141-149, 2017.

K. Choi, G. Fazekas, and M. Sandler, Understanding Music Playlists, 2015.

F. Chollet, Keras: Deep learning library for Theano and TensorFlow, 2015.

V. Chudáček, G. Georgoulas, L. Lhotská, C. Stylios, M. Petrík et al., Examining cross-database global training to evaluate five different methods for ventricular beat classification, Physiological Measurement, vol.30, pp.661-677, 2009.

A. Coates, A. Ng, and H. Lee, An analysis of single-layer networks in unsupervised feature learning, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp.215-223, 2011.

N. J. Conard, M. Malina, and S. C. Münzel, New flutes document the earliest musical tradition in southwestern Germany, Nature, vol.460, issue.7256, pp.737-740, 2009.

F. C. Constantino, L. Pinggera, S. Paranamana, M. Kashino, and M. Chait, Detection of appearing and disappearing objects in complex acoustic scenes, PLoS One, vol.7, issue.9, p.46167, 2012.

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, issue.3, pp.273-297, 1995.

T. Cover and P. E. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.13, issue.1, pp.21-27, 1967.

A. J. Craft, G. A. Wiggins, and T. Crawford, How many beans make five? The consensus problem in music-genre classification and a new evaluation method for single-genre categorisation systems, Proceedings of the 8th International Conference on Music Information Retrieval, pp.73-76, 2007.

A. de Cheveigné, Harmonic fusion and pitch shifts of mistuned partials, The Journal of the Acoustical Society of America, vol.102, issue.2, pp.1083-1087, 1997.

J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin et al., Large scale distributed deep networks, Proceedings of the 26th Conference on the Advances in Neural Information Processing Systems, pp.1223-1231, 2012.

M. Defferrard, K. Benzi, P. Vandergheynst, and X. Bresson, FMA: A dataset for music analysis, Proceedings of the 18th International Society for Music Information Retrieval Conference, pp.316-323, 2017.

L. Demany, Y. Bayle, E. Puginier, and C. Semal, Detecting temporal changes in acoustic scenes: The variable benefit of selective attention, Hearing Research, pp.17-25, 2017.

L. Demany and C. Ramos, On the binding of successive sounds: Perceiving shifts in nonperceived pitches, The Journal of the Acoustical Society of America, vol.117, issue.2, pp.833-841, 2005.

L. Demany, C. Semal, and D. Pressnitzer, Implicit versus explicit frequency comparisons: Two mechanisms of auditory change detection, vol.37, pp.597-606, 2011.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.248-255, 2009.

S. Dieleman, P. Brakel, and B. Schrauwen, Audio-based music classification with a pretrained convolutional network, Proceedings of the 12th International Society for Music Information Retrieval Conference, pp.669-674, 2011.

S. Dieleman and B. Schrauwen, End-to-end learning for music audio, 2014.

T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, vol.10, issue.7, pp.1895-1923, 1998.

F. Doshi-Velez and B. Kim, Towards a rigorous science of interpretable machine learning, 2017.

C. Drummond, Replicability is not reproducibility: nor is it good science, Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th ICML Conference, pp.1-4, 2009.

D. Eck, P. Lamere, T. Bertin-Mahieux, and S. Green, Automatic generation of social tags for music recommendation, Proceedings of the 21st Conference on Advances in Neural Information Processing Systems, pp.385-392, 2007.

B. Efron, Bootstrap methods: Another look at the jackknife, The Annals of Statistics, vol.7, pp.1-26, 1979.

H. Eghbal-zadeh, B. Lehner, M. Dorfer, and G. Widmer, CP-JKU submissions for DCASE-2016: A hybrid approach using binaural i-vectors and deep convolutional neural networks, IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events, 2016.

H. Eghbal-zadeh and G. Widmer, Noise robust music artist recognition using I-Vector features, Proceedings of the 17th International Society for Music Information Retrieval Conference, pp.709-715, 2016.

R. H. Ehmer, Masking by tones vs noise bands, The Journal of the Acoustical Society of America, vol.31, issue.9, pp.1253-1256, 1959.
DOI : 10.1121/1.1907853

C. Elkan, The foundations of cost-sensitive learning, Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp.973-978, 2001.

D. P. Ellis and C. V. Cotton, LabROSA cover song detection system, 2007.

D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent, and S. Bengio, Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research, vol.11, pp.625-660, 2010.

A. Eronen and A. Klapuri, Musical instrument recognition using cepstral coefficients and temporal features, Proceedings of the 24th IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.753-756, 2000.

Z. Fan, T. S. Chan, and Y. R. Yang, Music signal processing using vector product neural networks, Proceedings of the 1st International Workshop on Deep Learning for Music, pp.26-30, 2017.

C. Fernández, I. Huerta, and A. Prati, A comparative evaluation of regression learning algorithms for facial age estimation, In Face and Facial Expression Recognition from Real World Videos, pp.133-144, 2015.

B. Fields, Contextualize your listening: The playlist as recommendation engine, 2011.

M. A. Fischler and R. C. Bolles, Random sample consensus: A paradigm for model fitting with application to image analysis and automated cartography, Communications of the ACM, vol.24, issue.6, pp.381-395, 1981.

A. Flexer, A closer look on artist filters for musical genre classification, Proceedings of the 8th International Conference on Music Information Retrieval, pp.341-344, 2007.

A. Flexer and D. Schnitzer, Album and artist effects for audio similarity at the scale of the web, Proceedings of the 6th Sound and Music Computing, pp.59-64, 2009.

A. Flexer and D. Schnitzer, Effects of album and artist filters in audio similarity computed for very large music databases, Computer Music Journal, vol.34, issue.3, pp.20-28, 2010.

J. T. Foote, Multimedia Storage and Archiving Systems II, vol.3229, pp.138-148, 1997.

C. Formby, Differential sensitivity to tonal frequency and to the rate of amplitude modulation of broadband noise by normally hearing listeners, The Journal of the Acoustical Society of America, vol.78, issue.1, pp.70-77, 1985.

E. Frank, M. A. Hall, and I. H. Witten, The WEKA Workbench, Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", 2016.

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.

T. Fritz, S. Jentschke, N. Gosselin, D. Sammler, I. Peretz et al., Universal recognition of three basic emotions in music, Current Biology, vol.19, issue.7, pp.573-576, 2009.

Z. Fu, G. Lu, K. M. Ting, and D. Zhang, A Survey of Audio-Based Music Classification and Annotation, IEEE Transactions on Multimedia, vol.13, issue.2, pp.303-319, 2011.

S. Furui, Speaker-independent isolated word recognition based on emphasized spectral dynamics, Proceedings of the 11th IEEE International Conference on Acoustics, Speech, and Signal Processing, 1986.

J. Futrelle and J. S. Downie, Interdisciplinary communities and research issues in music information retrieval, Proceedings of the 3rd International Society for Music Information Retrieval Conference, pp.215-221, 2002.
DOI : 10.1076/jnmr.32.2.121.16740

J. Futrelle and J. S. Downie, Interdisciplinary research issues in music information retrieval: ISMIR, Journal of New Music Research, vol.32, issue.2, pp.121-131, 2003.

A. Gabrielsson and E. Lindström, Music and emotion: Theory and research, 2001.

S. K. Gaikwad, B. W. Gawali, and P. Yannawar, A review on speech recognition technique, International Journal of Computer Applications, vol.10, issue.3, pp.16-24, 2010.

J. F. Gemmeke, D. P. Ellis, D. Freedman, A. Jansen, W. Lawrence et al., Audio Set: An ontology and human-labeled dataset for audio events, Proceedings of the 42nd IEEE International Conference on Acoustics Speech and Signal Processing, pp.1-5, 2017.

A. Ghosal, R. Chakraborty, B. C. Dhara, and S. K. Saha, A hierarchical approach for speech-instrumental-song classification, SpringerPlus, vol.2, issue.526, pp.1-11, 2013.
DOI : 10.1186/2193-1801-2-526
URL : https://springerplus.springeropen.com/track/pdf/10.1186/2193-1801-2-526

A.-L. Giraud, C. Lorenzi, J. Ashburner, J. Wable, I. Johnsrude, R. Frackowiak et al., Representation of the temporal envelope of sounds in the human brain, Journal of Neurophysiology, vol.84, issue.3, pp.1588-1598, 2000.

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp.315-323, 2011.

M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, RWC music database: Popular, classical and jazz music databases, Proceedings of the 3rd International Conference on Music Information Retrieval, pp.287-288, 2002.

F. Gouyon and S. Dixon, Dance music classification: A tempo-based approach, Proceedings of the 5th International Society for Music Information Retrieval Conference, pp.501-504, 2004.

F. Gouyon, B. L. Sturm, J. L. Oliveira, N. Hespanhol, and T. Langlois, On evaluation validity in music autotagging, 2014.

M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann et al., The WEKA data mining software, vol.11, pp.10-18, 2009.

H. Han and W. Wang, Borderline-SMOTE: A new oversampling method in imbalanced data sets learning, Proceedings of the 1st International Conference on Intelligent Computing, pp.878-887, 2005.

P. E. Hart, The condensed nearest neighbor rule, IEEE Transactions on Information Theory, vol.14, issue.3, pp.515-516, 1968.

T. Hastie and R. Tibshirani, Discriminant adaptive nearest neighbor classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.18, issue.6, pp.607-616, 1996.

H. He, Y. Bai, E. A. Garcia, and S. Li, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, Proceedings of the 5th IEEE International Joint Conference on Neural Networks, pp.1322-1328, 2008.

P. Herrera, A. Dehamel, and F. Gouyon, Automatic labeling of unpitched percussion sounds, Proceedings of the 114th Audio Engineering Society Convention, pp.1-14, 2003.

N. Hespanhol, Using Autotagging for Classification of Vocals in Music Signals, 2013.

B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, Session-based recommendations with recurrent neural networks, Proceedings of the 4th International Conference on Learning Representations, vol.1, pp.1-10, 2016.

G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, 2012.

K. Hoashi, K. Matsumoto, and N. Inoue, Personalization of user profiles for content-based music retrieval based on relevance feedback, Proceedings of the 11th ACM International Conference on Multimedia, pp.110-119, 2003.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

C.-L. Hsu and J.-S. R. Jang, On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset, IEEE Transactions on Audio, Speech, and Language Processing, vol.18, issue.2, pp.310-319, 2010.

X. Hu, J. S. Downie, C. Laurier, M. Bay, and A. F. Ehmann, The 2007 MIREX audio mood classification task: Lessons learned, Proceedings of the 9th International Conference on Music Information Retrieval, pp.462-467, 2008.

K. Hua, Modeling singing F0 with neural network driven transition-sustain models, 2018.

C. Huang, Y. Li, C. C. Loy, and X. Tang, Learning deep representation for imbalanced classification, Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, pp.5375-5384, 2016.

D. Huron, Perceptual and cognitive applications in music information retrieval, Proceedings of the 1st International Symposium on Music Information Retrieval, pp.1-2, 2000.

D. Huron, Is music an evolutionary adaptation?, Annals of the New York Academy of Sciences, vol.930, issue.1, pp.43-61, 2001.

S. Ikeda, K. Oku, and K. Kawagoe, Music playlist recommendation using acoustic-feature transition inside the songs, Proceedings of the 15th International Conference on Advances in Mobile Computing and Multimedia, pp.216-219, 2017.

C. Inskip, A. MacFarlane, and P. Rafferty, Towards the disintermediation of creative music search: Analysing queries to determine important facets, International Journal on Digital Libraries, vol.12, issue.2, pp.137-147, 2012.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, pp.448-456, 2015.

A. Jansson, E. J. Humphrey, N. Montecchio, R. Bittner, A. Kumar et al., Singing voice separation with deep U-Net convolutional networks, Proceedings of the 18th International Society for Music Information Retrieval Conference, pp.745-751, 2017.

R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme, Tag recommendations in folksonomies, Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp.506-514, 2007.

B. Jeon, C. Kim, A. Kim, D. Kim, J. Park et al., Music emotion recognition via end-to-end multimodal neural networks, Proceedings of the 11th ACM Conference on Recommender Systems, pp.1-2, 2017.

M. R. Jones, G. Kidd, and R. Wetzel, Evidence for rhythmic attention, Journal of Experimental Psychology: Human Perception and Performance, vol.7, issue.5, pp.1059-1073, 1981.

M. R. Jones, H. Moynihan, N. Mackenzie, and J. Puente, Temporal aspects of stimulus-driven attending in dynamic arrays, Psychological Science, vol.13, issue.4, pp.313-319, 2002.

C. Kereliuk, B. L. Sturm, and J. Larsen, Deep learning and music adversaries, IEEE Transactions on Multimedia, vol.17, issue.11, pp.2059-2071, 2015.

N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. Tang, On large-batch training for deep learning: Generalization gap and sharp minima, Proceedings of the 5th International Conference on Learning Representations, pp.1-16, 2016.

J. Keuper and F. Preundt, Distributed training of deep neural networks: Theoretical and practical limits of parallel scalability, Proceedings of the 1st Workshop on Machine Learning in High Performance Computing Environments, pp.19-26, 2016.

S. Kim, E. Unal, and S. Narayanan, Music fingerprint extraction for classical music cover song identification, Proceedings of the IEEE International Conference on Multimedia and Expo, pp.1261-1264, 2008.

S. C. Kleene, Representation of events in nerve nets and finite automata, Technical report, RAND Project Air Force, 1951.

P. Knees, T. Pohle, M. Schedl, and G. Widmer, A music search engine built upon audio-based and web-based similarity measures, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.447-454, 2007.

S. Kotsiantis, D. Kanellopoulos, and P. Pintelas, Handling imbalanced datasets: A review, GESTS International Transactions on Computer Science and Engineering, vol.30, issue.1, pp.25-36, 2006.

S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, Supervised machine learning: A review of classification techniques, Emerging Artificial Intelligence Applications in Computer Engineering, vol.160, pp.3-24, 2007.

A. Krizhevsky and G. Hinton, Learning multiple layers of features from tiny images, 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems, pp.1097-1105, 2012.

M. Kubat and S. Matwin, Addressing the curse of imbalanced training sets: One-sided selection, Proceedings of the 14th International Conference on Machine Learning, pp.179-186, 1997.

S. Kum and J. Nam, Classification-based singing melody extraction using deep convolutional neural networks, Technical report, Music and Audio Computing Lab, 2017.

T. Langlois and G. Marques, A music classification method based on timbral features, Proceedings of the 10th International Society for Music Information Retrieval Conference, pp.81-86, 2009.

J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, Proceedings of the 8th Conference on Artificial Intelligence in Medicine, pp.63-66, 2001.

E. Law, K. West, M. I. Mandel, M. Bay, and J. S. Downie, Evaluation of algorithms using games: The case of music tagging, Proceedings of the 10th International Society for Music Information Retrieval Conference, pp.387-392, 2009.

E. L. Law, L. von Ahn, R. B. Dannenberg, and M. Crawford, Tagatune: A game for music and sound annotation, Proceedings of the 8th International Conference on Music Information Retrieval, pp.361-364, 2007.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

Y. LeCun, L. D. Jackel, L. Bottou, A. Brunot, C. Cortes et al., Comparison of learning algorithms for handwritten digit recognition, Proceedings of the 1st International Conference on Artificial Neural Networks, pp.53-60, 1995.

H. Lee, P. Pham, Y. Largman, and A. Y. Ng, Unsupervised feature learning for audio classification using convolutional deep belief networks, Proceedings of the 23rd Conference on Advances in Neural Information Processing Systems, pp.1096-1104, 2009.

S. Leglaive, R. Hennequin, and R. Badeau, Singing voice detection with deep recurrent neural networks, Proceedings of the 40th IEEE International Conference on Acoustics Speech and Signal Processing, pp.121-125, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01110035

B. Lehner and G. Widmer, Monaural blind source separation in the context of vocal detection, Proceedings of the 16th International Society for Music Information Retrieval Conference, pp.309-315, 2015.

B. Lehner, G. Widmer, and R. Sonnleitner, On the reduction of false positives in singing voice detection, Proceedings of the 39th IEEE International Conference on Acoustics Speech and Signal Processing, pp.7480-7484, 2014.

B. Lehner, G. Widmer, and S. Böck, A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks, Proceedings of the 23rd European Signal Processing Conference, pp.21-25, 2015.

G. Lemaître, F. Nogueira, and C. K. Aridas, Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning, Journal of Machine Learning Research, vol.18, issue.17, pp.1-5, 2017.

A. Lerch, An introduction to audio content analysis: Applications in signal processing and music informatics, 2012.

C. Lesimple, C. Sankey, M. Richard, and M. Hausberger, Do horses expect humans to solve their problems?, Frontiers in Psychology, vol.3, pp.306-309, 2012.

C. A. Levitan, Y. A. Ban, N. R. Stiles, and S. Shimojo, Rate perception adapts across the senses: Evidence for a unified timing mechanism, Scientific Reports, vol.5, pp.8857-8862, 2015.

M. Levy and M. B. Sandler, A semantic space for music derived from social tags, Proceedings of the 8th International Conference on Music Information Retrieval, pp.411-416, 2007.

T. Li and M. Ogihara, Detecting emotion in music, Proceedings of the 4th International Conference on Music Information Retrieval, pp.1-2, 2003.

H. G. Liddell and R. Scott, A Greek-English lexicon, 1996.

T. Lidy and A. Schindler, CQT-based convolutional neural networks for audio scene classification and domestic audio tagging, Proceedings of the IEEE Audio and Acoustic Signal Processing Challenge Workshop on Detection and Classification of Acoustic Scenes and Events, pp.60-64, 2016.

Y. Lin, Y. Yang, and H. H. Chen, Exploiting online music tags for music emotion classification, ACM Transactions on Multimedia Computing, vol.7, issue.1, pp.26-40, 2011.

G. Litjens, T. Kooi, B. E. Bejnordi, A. A. Setio, F. Ciompi et al., A survey on deep learning in medical image analysis, Medical Image Analysis, pp.60-88, 2017.

A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, and L. Daudet, Kernel additive models for source separation, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4298-4310, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01011044

A. Livshin and X. Rodet, The importance of cross database evaluation in sound classification, Proceedings of the 4th International Conference on Music Information Retrieval, pp.1-2, 2003.

A. Livshin and X. Rodet, Purging musical instrument sample databases using automatic musical instrument recognition methods, IEEE Transactions on Audio, Speech, and Language Processing, vol.17, pp.1046-1051, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01161417

M. Llamedo, A. Khawaja, and J. P. Martinez, Cross-database evaluation of a multilead heartbeat classifier, IEEE Transactions on Information Technology in Biomedicine, vol.16, issue.4, pp.658-664, 2012.

B. Logan, Content-based playlist generation: Exploratory experiments, Proceedings of the 3rd International Conference on Music Information Retrieval, pp.6-7, 2002.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3431-3440, 2015.

R. Longadge and S. Dongre, Class imbalance problem in data mining review, International Journal of Computer Science and Network, vol.2, pp.1-6, 2013.

L. Lu, D. Liu, and H. J. Zhang, Automatic mood detection and tracking of music audio signals, IEEE Transactions on Audio Speech and Language Processing, vol.14, issue.1, pp.5-18, 2006.

H. Lukashevich, M. Gruhne, and C. Dittmar, Effective singing voice detection in popular music using ARMA filtering, Proceedings of the 10th International Workshop on Digital Audio Effects, pp.165-168, 2007.

N. A. Macmillan and C. D. Creelman, Detection theory: A user's guide, 2004.

M. I. Mandel and D. P. Ellis, Multiple-instance learning for music information retrieval, Proceedings of the 9th International Conference on Music Information Retrieval, pp.577-582, 2008.

I. Mani and J. Zhang, kNN approach to unbalanced data distributions: A case study involving information extraction, Proceedings of the Workshop on Learning from Imbalanced Data Sets, p.17, 2003.

M. Markaki, A. Holzapfel, and Y. Stylianou, Singing voice detection using modulation frequency features, Proceedings of the Workshop on Statistical and Perceptual Audition at the 9th Annual Conference of the International Speech Communication Association, pp.7-10, 2008.

G. Marques, M. A. Domingues, T. Langlois, and F. Gouyon, Three current issues in music autotagging, Proceedings of the 12th International Society for Music Information Retrieval Conference, pp.795-800, 2011.

H. Martel Baro, A deep learning approach to source separation and remixing of hip-hop music, 2017.

R. Martín, R. A. Mollineda, and V. García, Melodic track identification in MIDI files considering the imbalanced context, Proceedings of the 4th Iberian Conference on Pattern Recognition and Image Analysis, pp.489-496, 2009.

L. Maršík, harmony-analyser.org: Java Library and Tools for Chordal Analysis, Proceedings of 2016 Joint WOCMAT-IRCAM Forum Conference, pp.38-43, 2016.

B. Mathieu, S. Essid, T. Fillon, J. Prado, and G. Richard, YAAFE, an easy to use and efficient audio feature extraction software, Proceedings of the 11th International Society for Music Information Retrieval Conference, pp.441-446, 2010.

M. Mauch and S. Dixon, Approximate Note Transcription for the Improved Identification of Difficult Chords, Proceedings of the 11th International Society for Music Information Retrieval Conference, pp.135-140, 2010.

M. Mauch, C. Cannam, R. Bittner, G. Fazekas, J. Salamon et al., Computer-aided melody note transcription using the Tony software: Accuracy and efficiency, Proceedings of the 1st International Conference on Technologies for Music Notation and Representation, p.18, 2015.

A. M. Mayer, Researches in acoustics, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol.37, issue.226, pp.259-288, 1894.
DOI : 10.2475/ajs.s3-8.47.362

A. McAfee, E. Brynjolfsson, T. H. Davenport, D. J. Patil, and D. Barton, Big data: The management revolution, Harvard Business Review, vol.90, issue.10, pp.60-68, 2012.

W. S. McCulloch and W. H. Pitts, A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics, vol.5, issue.4, pp.115-133, 1943.

B. McFee, E. J. Humphrey, and J. P. Bello, A software framework for musical data augmentation, Proceedings of the 16th International Society for Music Information Retrieval Conference, pp.248-254, 2015.

B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar et al., librosa: Audio and music signal analysis in Python, Proceedings of the 14th Python in Science Conference, pp.18-25, 2015.

M. McVicar, R. Santos-Rodriguez, and T. De Bie, Learning to separate vocals from polyphonic mixtures via ensemble methods and structured output prediction, Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing, pp.450-454, 2016.

F. Medhat, D. Chesmore, and J. Robinson, Music genre classification using masked conditional neural networks, Proceedings of the 1st International Conference on Neural Information Processing, pp.470-481, 2017.
DOI : 10.1007/978-3-319-70096-0_49
URL : http://eprints.whiterose.ac.uk/143168/1/1802.06432v1.pdf

A. Mencattini, E. Martinelli, G. Costantini, M. Todisco, B. Basile, M. Bozzali, and C. Di Natale, Speech emotion recognition using amplitude modulation parameters and a combined feature selection procedure, Knowledge-Based Systems, vol.63, pp.68-81, 2014.

O. C. Meyers, A mood-based music classification and exploration system, 2007.

S. I. Mimilakis, K. Drossos, J. F. Santos, G. Schuller, T. Virtanen et al., Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask, 2017.

M. Miron, J. Mestres, and E. Gómez-Gutiérrez, Generating data to train convolutional neural networks for classical music source separation, Proceedings of the 14th Sound and Music Computing Conference, pp.227-234, 2017.

D. Moffat, D. Ronan, and J. D. Reiss, An evaluation of audio feature extraction toolboxes, Proceedings of the 18th International Conference on Digital Audio Effects, p.17, 2015.

B. C. Moore, An introduction to the psychology of hearing, 2012.

B. C. Moore and M. J. Shailer, Modulation discrimination interference and auditory grouping, Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol.336, issue.1278, pp.339-346, 1992.
DOI : 10.1098/rstb.1992.0067

R. Munkong and B. Juang, Auditory perception and cognition, IEEE Signal Processing Magazine, vol.25, issue.3, pp.98-117, 2008.

B. Murauer and G. Specht, Proceedings of the Companion of The Web Conference 2018, pp.1923-1927, 2018.

C. V. Nanayakkara and H. A. Caldera, Music emotion recognition with audio and lyrics features, International Journal of Digital Information and Wireless Communications, vol.6, issue.4, pp.260-273, 2016.
DOI : 10.17781/p002150
URL : http://sdiwc.net/digital-library/web-admin/upload-pdf/00001940.pdf

A. Y. Ng, Preventing "overfitting" of cross-validation data, Proceedings of the 14th International Conference on Machine Learning, pp.245-253, 1997.

H. M. Nguyen, E. W. Cooper, and K. Kamei, Borderline over-sampling for imbalanced data classification, Proceedings of the 5th International Workshop on Computational Intelligence and Applications, pp.24-29, 2009.
DOI : 10.1504/ijkesdp.2011.039875
URL : http://ousar.lib.okayama-u.ac.jp/files/public/1/19617/20160528004522391723/IWCIA2009_A1005.pdf

N. J. Nilsson, Learning machines: Foundations of trainable pattern-classifying systems, 1965.

T. L. Nwe, A. Shenoy, and Y. Wang, Singing voice detection in popular music, Proceedings of the 12th annual ACM International Conference on Multimedia, pp.324-327, 2004.

S. Oramas, O. Nieto, M. Sordo, and X. Serra, A deep multimodal approach for cold-start music recommendation, Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems, pp.32-37, 2017.

N. Orio, Music retrieval: A tutorial and review, Foundations and Trends in Information Retrieval, vol.1, issue.1, pp.1-90, 2006.

J. Osmalskyj, S. Piérard, M. Van Droogenbroeck, and J. J. Embrechts, Efficient database pruning for large-scale cover song recognition, Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing, pp.714-718, 2013.

K. Oyamada, H. Kameoka, T. Kaneko, K. Tanaka, N. Hojo et al., Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms, 2018.

F. Pachet and P. Roy, Automatic generation of music programs, Proceedings of the 5th International Conference on Constraint Programming, pp.331-345, 1999.

R. P. Paiva, Moodetector: Automatic music emotion recognition, 2013.

E. Pampalk, A. Flexer, and G. Widmer, Improvements of audio-based music similarity and genre classification, Proceedings of the 6th International Conference on Music Information Retrieval, pp.634-637, 2005.

R. Panda and R. P. Paiva, Using support vector machines for automatic mood tracking in audio music, Proceedings of the 130th Audio Engineering Society Convention, p.18, 2011.

K. L. Payton and L. D. Braida, A method to determine the speech transmission index from speech waveforms, The Journal of the Acoustical Society of America, vol.106, issue.6, pp.3637-3648, 1999.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., and É. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

B. Peng, K. Guan, M. Chen, D. M. Lawrence, Y. Pokhrel et al., T. Arkebauer, and Y. Lu, Improving maize growth processes in the community land model: Implementation and evaluation, Agricultural and Forest Meteorology, pp.64-89, 2018.

O. Pfungst, Clever Hans (the horse of Mr. von Osten): A contribution to experimental animal and human psychology, 1911.

A. Pikrakis, Audio Latin music genre classification: A MIREX 2013 submission based on a deep learning approach to rhythm modelling, Proceedings of the 9th Music Information Retrieval Evaluation eXchange, p.12, 2013.

T. Pohle, P. Knees, M. Schedl, E. Pampalk, and G. Widmer, Reinventing the wheel: A novel approach to music player interfaces, IEEE Transactions on Multimedia, vol.9, issue.3, pp.567-575, 2007.

T. Pohle, E. Pampalk, and G. Widmer, Evaluation of frequently used audio features for classification of music into perceptual categories, Proceedings of the 4th International Workshop on Content-Based Multimedia Indexing, p.18, 2005.

J. Pons, T. Lidy, and X. Serra, Experimenting with musically motivated convolutional neural networks, Proceedings of the 14th International Workshop on Content-Based Multimedia Indexing, p.16, 2016.
DOI : 10.1109/cbmi.2016.7500246
URL : http://repositori.upf.edu/bitstream/10230/27038/1/serra_CBMI16_expe.pdf

J. Pons, O. Nieto, M. Prockup, E. M. Schmidt, A. F. Ehmann et al., End-to-end learning for music audio tagging at scale, Proceedings of the 31st Conference on the Advances in Neural Information Processing Systems, p.15, 2017.

J. Pons, O. Slizovskaia, R. Gong, E. Gómez, and X. Serra, Timbre Analysis of Music Audio Signals with Convolutional Neural Networks, 2017.
DOI : 10.23919/eusipco.2017.8081710
URL : http://repositori.upf.edu/bitstream/10230/33998/1/Pons_EUSIPCO2017_timb.pdf

A. Porter, D. Bogdanov, R. Kaye, R. Tsukanov, and X. Serra, AcousticBrainz: A community platform for gathering music information obtained from audio, Proceedings of the 16th International Conference on Music Information Retrieval, pp.786-792, 2015.

M. Prockup, A. F. Ehmann, F. Gouyon, E. M. Schmidt, O. Celma et al., and Y. E. Kim, Modeling genre with the Music Genome Project: Comparing human-labeled attributes and audio features, Proceedings of the 16th International Society for Music Information Retrieval Conference, pp.31-37, 2015.

Z. Průša and P. Rajmic, Toward high-quality real-time signal reconstruction from STFT magnitude, IEEE Signal Processing Letters, vol.24, issue.6, pp.892-896, 2017.

J. R. Quinlan, C4.5: Programs for machine learning, p.48, 1993.

L. R. Rabiner and B. Juang, Fundamentals of speech recognition, 1993.

R. Raina, A. Madhavan, and A. Y. Ng, Large-scale deep unsupervised learning using graphics processors, Proceedings of the 26th Annual International Conference on Machine Learning, pp.873-880, 2009.

M. Ramona, G. Richard, and B. David, Vocal detection in music with support vector machines, Proceedings of the 32nd IEEE International Conference on Acoustics Speech and Signal Processing, pp.1885-1888, 2008.

B. Recht, C. Re, S. Wright, and F. Niu, Hogwild: A lock-free approach to parallelizing stochastic gradient descent, Proceedings of the 25th Conference on the Advances in Neural Information Processing Systems, pp.693-701, 2011.

L. Regnier and G. Peeters, Singing voice detection in music tracks using direct voice vibrato detection, Proceedings of the 33rd IEEE International Conference on Acoustics Speech and Signal Processing, pp.1685-1688, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00662312

M. T. Ribeiro, S. Singh, and C. Guestrin, Model-agnostic interpretability of machine learning, Proceedings of the Workshop on Human Interpretability in Machine Learning at the 33rd International Conference on Machine Learning, pp.91-95, 2016.

M. Riedmiller, Advanced supervised learning in multi-layer perceptrons: From backpropagation to adaptive learning algorithms, Computer Standards & Interfaces, vol.16, issue.3, pp.265-278, 1994.

M. Rocamora and P. Herrera, Comparing audio descriptors for singing voice detection in music audio files, Proceedings of the 11th Brazilian Symposium on Computer Music, pp.187-196, 2007.

G. Roma, E. M. Grais, A. J. Simpson, and M. D. Plumbley, Singing voice separation using deep neural networks and F0 estimation, Proceedings of the 9th Music Information Retrieval Evaluation eXchange, 2016.

F. Rosenblatt, The perceptron, a perceiving and recognizing automaton (Project Para), 1957.

S. O. Sadjadi, S. M. Ahadi, and O. Hazrati, Unsupervised speech/music classification using one-class support vector machines, Proceedings of the 6th International Conference on Information, Communications and Signal Processing, p.15, 2007.

C. Sanden and J. Z. Zhang, Enhancing multi-label music genre classification through ensemble techniques, Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.705-714, 2011.

J. Schlüter, Learning to pinpoint singing voice from weakly labeled examples, Proceedings of the 17th International Society for Music Information Retrieval Conference, pp.44-50, 2016.

J. Schlüter and T. Grill, Exploring data augmentation for improved singing voice detection with neural networks, Proceedings of the 16th International Society for Music Information Retrieval Conference, pp.121-126, 2015.

J. Schlüter and S. Böck, Improved musical onset detection with convolutional neural networks, Proceedings of the 39th IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6979-6983, 2014.

J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, pp.85-117, 2015.

C. Schreiner and G. Langner, Periodicity coding in the inferior colliculus of the cat. II. Topographical organization, Journal of Neurophysiology, vol.60, issue.6, pp.1823-1840, 1988.

E. Schröger, A. Bendixen, S. L. Denham, R. W. Mill, T. M. Bőhm et al., and I. Winkler, Predictive regularity representations in violation detection and auditory stream segregation: From conceptual to computational models, Brain Topography, vol.27, issue.4, pp.565-577, 2014.

J. Schulman, N. Heess, T. Weber, and P. Abbeel, Gradient estimation using stochastic computation graphs, Proceedings of the 29th Conference on the Advances in Neural Information Processing Systems, pp.3528-3536, 2015.

C. Senac, T. Pellegrini, F. Mouret, and J. Pinquier, Music feature maps with convolutional neural networks for music genre classification, Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, pp.19-23, 2017.

J. Serrà, X. Serra, and R. G. Andrzejak, Cross recurrence quantification for cover song identification, New Journal of Physics, vol.11, issue.9, p.93017, 2009.

U. Shardanand and P. Maes, Social information filtering: Algorithms for automating word of mouth, Proceedings of the Special Interest Group on Computer-Human Interaction Conference on Human Factors in Computing Systems, pp.210-217, 1995.

A. Shepitsen, J. Gemmell, B. Mobasher, and R. Burke, Personalized recommendation in social tagging systems using hierarchical clustering, Proceedings of the Association for Computing Machinery 2nd Conference on Recommender Systems, pp.259-266, 2008.

R. Shwartz-Ziv and N. Tishby, Opening the black box of deep neural networks via information, 2017.

J. Skowronek, M. F. McKinney, and S. van de Par, Ground truth for automatic music mood classification, Proceedings of the 7th International Conference on Music Information Retrieval, pp.395-396, 2006.

J. C. Smith, Correlation analyses of encoded music performance, 2013.

M. R. Smith, T. Martinez, and C. Giraud-Carrier, An instance level analysis of data complexity, Machine Learning, vol.95, issue.2, pp.225-256, 2014.

M. Sordo, C. Laurier, and Ò. Celma, Annotating music collections: How content-based similarity helps to propagate labels, Proceedings of the 8th International Conference on Music Information Retrieval, pp.531-534, 2007.

S. Sridharan, K. Vaidyanathan, D. Kalamkar, D. Das, M. E. Smorkalov et al., On scale-out deep learning training for cloud and HPC, 2018.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, 2014.

S. S. Stevens, J. Volkmann, and E. B. Newman, A scale for the measurement of the psychological magnitude pitch, The Journal of the Acoustical Society of America, vol.8, issue.3, pp.185-190, 1937.

A. Stolcke, S. Kajarekar, and L. Ferrer, Nonparametric feature normalization for SVM-based speaker verification, Proceedings of the 33rd IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.1577-1580, 2008.

D. Stoller, S. Ewert, and S. Dixon, Adversarial semi-supervised audio source separation applied to singing voice extraction, Proceedings of the 43rd IEEE International Conference on Acoustics, Speech and Signal Processing, p.15, 2018.
DOI : 10.1109/icassp.2018.8461722
URL : http://arxiv.org/pdf/1711.00048

B. L. Sturm, The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use, 2013.

B. L. Sturm, A simple method to determine if a music information retrieval system is a "Horse", IEEE Transactions on Multimedia, vol.16, issue.6, pp.1636-1644, 2014.

B. L. Sturm, The state of the art ten years after a state of the art: Future research in music information retrieval, Journal of New Music Research, vol.43, issue.2, pp.147-172, 2014.

B. L. Sturm, Faults in the Latin Music Database and with its use, Proceedings of the Late Breaking Demo of the 16th International Society for Music Information Retrieval Conference, p.12, 2015.

B. L. Sturm, C. Kereliuk, and A. Pikrakis, A closer look at deep learning neural networks with low-level spectral periodicity features, Proceedings of the 4th International Workshop on Cognitive Information Processing, p.16, 2014.

N. Sturmel and L. Daudet, Signal reconstruction from STFT magnitude: A state of the art, Proceedings of the 14th International Conference on Digital Audio Effects, pp.375-386, 2011.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Proceedings of the 28th Conference on the Advances in Neural Information Processing Systems, pp.3104-3112, 2014.

H. Tachibana, T. Ono, N. Ono, and S. Sagayama, Melody line estimation in homophonic music audio signals based on temporal-variability of melodic source, Proceedings of the 34th IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.425-428, 2010.

A. Tajadura-Jiménez, G. Pantelidou, P. Rebacz, D. Västfjäll, and M. Tsakiris, I-space: The effects of emotional valence and source of music on interpersonal distance, PLoS One, vol.6, issue.10, p.26083, 2011.

N. Takahashi, M. Gygli, B. Pfister, and L. Van Gool, Deep convolutional neural networks and data augmentation for acoustic event detection, Proceedings of the Interspeech conference, pp.2982-2986, 2016.

E. Terhardt, G. Stoll, and M. Seewann, Algorithm for extraction of pitch and pitch salience from complex tonal signals, The Journal of the Acoustical Society of America, vol.71, issue.3, pp.679-688, 1982.

J. Thickstun, Z. Harchaoui, D. Foster, and S. M. Kakade, Invariances and data augmentation for supervised music transcription, 2017.

D. Tingle, Y. E. Kim, and D. Turnbull, Exploring automatic music annotation with "acoustically-objective" tags, Proceedings of the 11th ACM International Conference on Multimedia Information Retrieval, pp.55-62, 2010.

N. Tintarev, C. Lofi, and C. Liem, Sequences of diverse song recommendations: An exploratory study in a commercial system, Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pp.391-392, 2017.

I. Tomek, An experiment with the edited nearest-neighbor rule, IEEE Transactions on Systems, Man, and Cybernetics, vol.1, issue.6, pp.448-452, 1976.

I. Tomek, Two modifications of CNN, IEEE Transactions on Systems, Man, and Cybernetics, vol.6, pp.769-772, 1976.

D. Turnbull, L. Barrington, D. Torres, and G. Lanckriet, Towards musical query-by-semantic-description using the CAL500 data set, Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.439-446, Amsterdam, 2007.

D. Turnbull, L. Barrington, D. Torres, and G. Lanckriet, Semantic annotation and retrieval of music and sound effects, IEEE Transactions on Audio Speech and Language Processing, vol.16, issue.2, pp.467-476, 2008.

G. Tzanetakis and P. Cook, Marsyas: A framework for audio analysis, Organised Sound, vol.4, issue.3, pp.169-175, 2000.

G. Tzanetakis and P. Cook, Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing, vol.10, issue.5, pp.293-302, 2002.

J. Urbano, D. Bogdanov, P. Herrera, E. Gómez, and X. Serra, What is the effect of audio quality on the robustness of MFCCs and chroma features?, Proceedings of the 15th International Society for Music Information Retrieval Conference, pp.573-578, 2014.

J. Valin, A hybrid DSP/deep learning approach to real-time fullband speech enhancement, 2017.

J. Van Noort, The structure and connections of the inferior colliculus: An investigation of the lower auditory system, 1969.

G. Velarde, Convolutional methods for music analysis, 2017.

T. von Schroeter, S. Doraisamy, and S. M. Rüger, From raw polyphonic audio to locating recurring themes, Proceedings of the 1st International Symposium on Music Information Retrieval, p.111, 2000.

X. Wan, J. Liu, W. K. Cheung, and T. Tong, Learning to improve medical decision making from imbalanced data without a priori cost, BMC Medical Informatics and Decision Making, vol.14, issue.1, p.111, 2014.
DOI : 10.1186/s12911-014-0111-9
URL : https://bmcmedinformdecismak.biomedcentral.com/track/pdf/10.1186/s12911-014-0111-9

F. Weninger, J. Durrieu, F. Eyben, G. Richard, and B. Schuller, Combining monaural source separation with long short-term memory for increased robustness in vocalist gender recognition, Proceedings of the 36th IEEE International Conference on Acoustics Speech and Signal Processing, pp.2196-2199, 2011.

K. West and S. Cox, Features and classifiers for the automatic classification of musical audio signals, Proceedings of the 5th International Society for Music Information Retrieval Conference, p.16, 2004.

B. Whitman and D. P. Ellis, Automatic record reviews, Proceedings of the 5th International Conference on Music Information Retrieval, pp.470-477, 2004.

G. A. Wiggins, Semantic gap?? Schemantic schmap!! Methodological considerations in the scientific study of music, Proceedings of the 11th IEEE International Symposium on Multimedia, pp.477-482, 2009.

D. L. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man, and Cybernetics, vol.2, issue.3, pp.408-421, 1972.

K. Woods and K. W. Bowyer, Generating ROC curves for artificial neural networks, IEEE Transactions on Medical Imaging, vol.16, issue.3, pp.329-337, 1997.
DOI : 10.1109/cbms.1994.316012

D. Wu, N. Sharma, and M. Blumenstein, Recent advances in video-based human action recognition using deep learning: A review, Proceedings of the 30th International Joint Conference on Neural Networks, pp.2865-2872, 2017.

X. Wu, V. Kumar, R. Quinlan, J. Ghosh, J. Yang et al., D. J. Hand, and D. Steinberg, Top 10 algorithms in data mining, Knowledge and Information Systems, vol.14, issue.1, pp.1-37, 2008.

Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi et al., and J. Dean, Google's neural machine translation system: Bridging the gap between human and machine translation, 2016.

Q. Yang and X. Wu, 10 challenging problems in data mining research, International Journal of Information Technology and Decision Making, vol.5, issue.4, pp.597-604, 2006.

A. Ycart and E. Benetos, A study on LSTM networks for polyphonic music sequence modelling, Proceedings of the 18th International Conference on Music Information Retrieval, pp.421-427, 2017.

D. Yin, S. D. Bond, and H. Zhang, Are bad reviews always stronger than good? Asymmetric negativity bias in the formation of online consumer trust, Proceedings of 31st International Conference of Information Systems, p.118, 2010.

P. Zhang, X. Zheng, W. Zhang, S. Li, S. Qian et al., A deep neural network for modeling music, Proceedings of the 5th Annual ACM International Conference on Multimedia Retrieval, 2015.

T. Zhang and C. J. Kuo, Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing, 2013.
DOI : 10.1007/978-1-4757-3339-6

Y. C. Zhang, D. Ó. Séaghdha, D. Quercia, and T. Jambor, Auralist: Introducing serendipity into music recommendation, Proceedings of the 5th ACM International Conference on Web Search and Data Mining, pp.13-22, 2012.

P. Zhao, R. Jin, T. Yang, and S. C. Hoi, Online AUC maximization, Proceedings of the 28th International Conference on Machine Learning, pp.233-240, 2011.

Glossary

Cent: Scale for representing musical notes that divides a semitone into one hundred units.

Song (Chanson): Piece containing at least one audio track on which sounds produced directly or indirectly by the human voice have been recorded.

Inferior colliculus (Colliculus inférieur): Part of the brain that belongs to the ascending auditory pathway between the ear and the auditory cortex.

Sample (Échantillon): Digital representation of the acoustic pressure value at a given instant at the recording microphone.

Emotion (Émotion): Transient affective reaction of fairly high intensity, usually triggered by a stimulus from the environment.

Recording (Enregistrement): Piece recorded on a physical or digital medium that fixes a performance and consists of a set of samples.

Harmonic (Harmonique): Sound whose frequency is an integer multiple of the fundamental frequency of another sound.

Heuristic (Heuristique): Describes a computational method that provides, relatively quickly (in polynomial time), a feasible but not necessarily optimal solution to an NP-hard optimization problem.

Horse: Describes a method that does not address the problem it claims to solve.

Instrumental: Piece containing no audio track on which sounds produced directly or indirectly by the human voice have been recorded.

Piece (Morceau): Performed musical work.

Big data (Mégadonnées): Growing data sets that have become so large and complex that they exceed human intuition and analytical capacity, as well as the capacity of conventional database or information management tools. The French term mégadonnées is recommended by the Délégation générale à la langue française et aux langues de France; its English equivalent is big data.

Track (Piste): A recording comprises one or more tracks, each corresponding to a single sound source.

Cover (Reprise): Describes an existing piece that is played again, in a similar way or not, by a performer other than that of the original version.

Reproduce (Reproduire): Deterministic ability to generate again the results of a scientific experiment; not to be confused with replicability [Drummond, 2009].

Replicability (Réplicabilité): Reaching the same conclusions as a scientific experiment.

Feeling (Sentiment): Complex and lasting affective state linked to certain emotions or representations.

Streaming (an anglicism in French): Broadcast or transfer of data as a continuous flow.

Serendipity (Sérendipité): Ability or art of making a discovery, notably a scientific one, by chance. Serendipity also denotes the discovery thus made.

Tessitura (Tessiture): Set of notes that an instrument can produce evenly.

Timbre: Set of sound characteristics that make it possible to identify an instrument.

Frame (Trame): Arbitrary subdivision of a recording that groups several audio samples.

Acronyms

ADASYN: ADAptive SYNthetic sampling approach for imbalanced learning

AllKNN: All K-Nearest Neighbours

API: Application Programming Interface

AUC: Area Under the Curve

BPM: Beats Per Minute

bSMOTE: borderline Synthetic Minority Over-sampling TEchnique

CIFAR: Canadian Institute For Advanced Research

CNN: Convolutional Neural Networks

Condensed-NN: Condensed Nearest Neighbours

CSV: Comma Separated Value

DAMP: Digital Archive of Mobile Performances

EFF: Electronic Frontier Foundation

ENN: Edited Nearest Neighbours

FMA: Free Music Archive

GA: Ghosal's algorithm

GRU: Gated Recurrent Unit

IFPI: International Federation of the Phonographic Industry

IHT: Instance Hardness Threshold

INRIA: Institut National de Recherche en Informatique et en Automatique

IRISA: Institut de Recherche en Informatique et Systèmes Aléatoires

ISO: International Organization for Standardization

ISRC: International Standard Recording Code

k-NN: K-Nearest Neighbours

Kara1K: Karaoke database of 1,000 tracks

LSTM: Long Short-Term Memory

MBID: MusicBrainz IDentifier

MDI: Modulation Discrimination Interference

MFCC: Mel-Frequency Cepstral Coefficients

MIREX: Music Information Retrieval Evaluation eXchange

MLP: Multi Layer Perceptron

MP3: Moving Picture Experts Group Phase 1 Audio Layer III

MSD: Million Song Dataset

NCR: Neighbourhood Cleaning Rule

NearMiss: NearMiss using nearest neighbours

OSS: One-Sided Selection

PEA: ProbaH2-10, ETRAll and AdaBoost

RAM: Random-Access Memory

RANSAC: RANdom Sample And Consensus

ReLU: Rectified Linear Unit

RENN: Repeated Edited Nearest Neighbours

RNN: Recurrent Neural Network

ROS: Random minority Over-Sampling

RUS: Random majority Under-Sampling

SATIN: Set of Audio Tags and Identifiers Normalized

SHS: Second Hand Songs

SMOTE: Synthetic Minority Over-sampling TEchnique

SMOTEENN: Synthetic Minority Over-sampling TEchnique followed by Edited Nearest Neighbours

SMOTETOMEK: Synthetic Minority Over-sampling TEchnique followed by Tomek links

SOFT1: first Set Of FeaTures

SPL: Sound Pressure Level

SVM: Support Vector Machine

SVMBFF: Support Vector Machine applied to Bags of Frames of Features

SVMSMOTE: Support Vector Machine and Synthetic Minority Over-sampling TEchnique

Tomek: Extraction of majority-minority Tomek links

VQMM: Vector Quantization and Markov Model

List of figures

Examples of (a) a spectrogram produced with the librosa software [McFee et al., 2015] from the first seconds of the piece Elysium by Al Di Meola and (b) an image that is a personal photograph.

Details of all the steps of the processing pipeline for musical pieces used to build music playlists.