S. Z. Alborzi, Automatic Discovery of Hidden Associations Using Vector Similarity : Application to Biological Annotation Prediction, 2018.
URL : https://hal.archives-ouvertes.fr/tel-01792299

L. Albuquerque, C. Simoes, M. F. Nobre, N. M. Pino, J. R. Battista et al., Truepera radiovictrix gen. nov., sp. nov., a new radiation resistant species and the proposal of Trueperaceae fam. nov., FEMS microbiology letters, vol.247, pp.161-169, 2005.

E. Alpaydın, V. Cheplygina, M. Loog, and D. M. Tax, Single-vs. multiple-instance classification, Pattern Recognition, vol.48, issue.9, pp.2831-2838, 2015.

S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, Basic local alignment search tool, Journal of molecular biology, vol.215, issue.3, pp.403-410, 1990.

S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang et al., Gapped blast and psi-blast: a new generation of protein database search programs, Nucleic acids research, vol.25, issue.17, pp.3389-3402, 1997.

J. Amores, Multiple instance classification: Review, taxonomy and comparative study, Artificial Intelligence, vol.201, pp.81-105, 2013.

J. Amores, MILDE: multiple instance learning by discriminative embedding, Knowledge and Information Systems, vol.42, issue.2, pp.381-407, 2015.

A. Andreeva, D. Howorth, C. Chothia, E. Kulesha, and A. G. Murzin, SCOP2 prototype: a new approach to protein structure mining, Nucleic acids research, vol.42, issue.1, pp.310-314, 2013.

S. Andrews, I. Tsochantaridis, and T. Hofmann, Support Vector Machines for Multiple-Instance Learning, Advances in Neural Information Processing Systems, pp.561-568, 2003.

R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, E. Birney et al., The InterPro database, an integrated documentation resource for protein families, domains and functional sites, Nucleic acids research, vol.29, issue.1, pp.37-40, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00427125

R. Apweiler, A. Bairoch, C. H. Wu, W. C. Barker, B. Boeckmann et al., UniProt: the universal protein knowledgebase, Nucleic acids research, vol.32, issue.1, pp.115-119, 2004.

S. Aridhi, H. Sghaier, M. Zoghlami, M. Maddouri, and E. M. Nguifo, Prediction of ionizing radiation resistance in bacteria using a multiple instance learning model, Journal of Computational Biology, vol.23, issue.1, pp.10-20, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01807946

T. K. Attwood, P. Bradley, D. R. Flower, A. Gaulton, N. Maudling et al., Prints and its automatic supplement, preprints, Nucleic acids research, vol.31, issue.1, pp.400-402, 2003.

A. Bateman, L. Coin, R. Durbin, R. D. Finn, V. Hollich et al., The Pfam protein families database, Nucleic acids research, vol.32, issue.1, pp.138-141, 2004.
URL : https://hal.archives-ouvertes.fr/hal-01294685

A. L. Bauer, W. S. Hlavacek, P. J. Unkefer, and F. Mu, Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites, PLoS computational biology, vol.6, issue.11, p.1001007, 2010.

D. A. Benson, M. Cavanaugh, K. Clark, I. Karsch-mizrachi, D. J. Lipman et al., GenBank, Nucleic acids research, vol.41, issue.1, pp.36-42, 2012.

M. Bhagwat, L. Young, and R. R. Robison, Using BLAT to find sequence similarity in closely related genomes, Current protocols in bioinformatics, vol.37, issue.1, pp.10-18, 2012.

D. Billi, E. Friedmann, K. Hofer, M. Caiola, and R. Ocampo-friedmann, Ionizing-radiation resistance in the desiccation-tolerant cyanobacterium chroococcidiopsis, Applied and Environmental Microbiology, vol.66, pp.1489-1492, 2002.

K. Blekas, D. I. Fotiadis, and A. Likas, Motif-based protein sequence classification using neural networks, Journal of Computational Biology, vol.12, issue.1, pp.64-82, 2005.

J. L. Bouchot, W. L. Trimble, G. Ditzler, Y. Lan, S. Essinger et al., Advances in Machine Learning for Processing and Comparison of Metagenomic Data, pp.295-329, 2014.

E. Boutet, D. Lieberherr, M. Tognolli, M. Schneider, P. Bansal et al., UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view, Plant Bioinformatics, pp.23-54, 2016.

H. Brim, S. C. Mcfarlan, J. K. Fredrickson, K. W. Minton, M. Zhai et al., Engineering deinococcus radiodurans for metal remediation in radioactive mixed waste environments, Nature biotechnology, vol.18, issue.1, pp.85-90, 2000.

H. Brim, A. Venkateswaran, H. M. Kostandarithes, J. K. Fredrickson, and M. J. Daly, Engineering deinococcus geothermalis for bioremediation of high-temperature radioactive waste environments, Applied and environmental microbiology, vol.69, issue.8, pp.4575-4582, 2003.

B. Brooks and R. Murray, Nomenclature for "micrococcus radiodurans" and other radiation-resistant cocci: Deinococcaceae fam. nov. and deinococcus gen. nov., including five species, Journal of Systematic Bacteriology, vol.31, pp.353-360, 1981.

Y. Chen, J. Bi, and J. Z. Wang, MILES: Multiple-instance learning via embedded instance selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.12, pp.1931-1947, 2006.

B. Y. Cheng, J. G. Carbonell, and J. Klein-seetharaman, Protein classification based on text document classification techniques, Proteins: Structure, Function, and Bioinformatics, vol.58, issue.4, pp.955-970, 2005.

V. Cheplygina, D. M. Tax, and M. Loog, Multiple instance learning with bag dissimilarities, Pattern Recognition, vol.48, issue.1, pp.264-275, 2015.

N. A. Chuzhanova, A. J. Jones, and S. Margetts, Feature selection for genetic sequence classification, Bioinformatics, vol.14, issue.2, pp.139-143, 1998.

M. J. Daly, E. K. Gaidamakova, V. Y. Matrosova, A. Vasilenko, M. Zhai et al., Accumulation of Mn(II) in Deinococcus radiodurans facilitates gamma-radiation resistance, Science, vol.306, pp.1025-1028, 2004.

M. Dayhoff, R. Schwartz, and B. Orcutt, A model of evolutionary change in proteins, Atlas of protein sequence and structure, vol.5, pp.345-352, 1978.

T. G. Dietterich, R. H. Lathrop, and T. Lozano-pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997.

L. Dong, A comparison of multi-instance learning algorithms, 2006.

G. Fang, N. Bhardwaj, R. Robilotto, and M. B. Gerstein, Getting started in gene orthology and functional analysis, PLoS computational biology, vol.6, issue.3, p.1000703, 2010.

A. W. Faria, F. G. Coelho, A. Silva, H. Rocha, G. Almeida et al., MILKDE: A new approach for multiple instance learning based on positive instance selection and kernel density estimation, Engineering Applications of Artificial Intelligence, vol.59, pp.196-204, 2017.

S. Federhen, K. Clark, T. Barrett, H. Parkinson, J. Ostell et al., Toward richer metadata for microbial sequences: replacing strain-level NCBI taxonomy taxids with BioProject, BioSample and Assembly records, Standards in genomic sciences, vol.9, issue.3, p.1275, 2014.

M. Federighi and J. Tholozan, Traitements ionisants et hautes pressions des aliments, Economica, pp.161-169, 2001.

D. N. Fedorov, G. A. Ekimova, N. V. Doronina, and Y. A. Trotsenko, 1-Aminocyclopropane-1-carboxylate (ACC) deaminases from Methylobacterium radiotolerans and Methylobacterium nodulans with higher specificity for ACC, FEMS Microbiol Lett, vol.343, issue.1, pp.70-76, 2013.

A. C. Ferreira, M. F. Nobre, E. Moore, F. A. Rainey, J. R. Battista et al., Characterization and radiation resistance of new isolates of Rubrobacter radiotolerans and Rubrobacter xylanophilus, Extremophiles, vol.3, pp.235-238, 1999.

R. D. Finn, T. K. Attwood, P. C. Babbitt, A. Bateman, P. Bork et al., InterPro in 2017: beyond protein family and domain annotations, Nucleic acids research, vol.45, pp.190-199, 2016.

R. D. Finn, P. Coggill, R. Y. Eberhardt, S. R. Eddy, J. Mistry et al., The Pfam protein families database: towards a more sustainable future, Nucleic acids research, vol.44, issue.1, pp.279-285, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01294685

J. Foulds and E. Frank, A review of multi-instance learning assumptions, The Knowledge Engineering Review, vol.25, issue.1, pp.1-25, 2010.

P. Gabani and O. V. Singh, Radiation-resistant extremophiles and their potential in biotechnology and therapeutics, Applied microbiology and biotechnology, vol.97, issue.3, pp.993-1004, 2013.

P. Gane, A. Bateman, M. Mj, C. O'donovan, M. Magrane et al., UniProt: a hub for protein information, Nucleic Acids Research, vol.43, pp.204-212, 2014.

Z. Gao and J. Ruan, A structure-based multiple-instance learning approach to predicting in vitro transcription factor-DNA interaction, BMC genomics, vol.16, pp.9-9, 2013.

J. Gracy and P. Argos, Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment, Bioinformatics, vol.14, issue.2, pp.164-173, 1998.

P. Green and I. Bousfield, Emendation of methylobacterium; methylobacterium rhodinum comb. nov. corrig.; methylobacterium radiotolerans comb. nov. corrig.; and methylobacterium mesophilicum comb, International Journal of Bacteriology, vol.33, pp.875-877, 1983.

M. Gtari, I. Essoussi, R. Maaoui, H. Sghaier, R. Boujmil et al., Contrasted resistance of stone-dwelling geodermatophilaceae species to stresses known to give rise to reactive oxygen species, FEMS Microbiology Ecology, vol.80, issue.3, pp.566-577, 2012.
URL : https://hal.archives-ouvertes.fr/halsde-00722684

D. H. Haft, J. D. Selengut, R. A. Richter, D. Harkins, M. K. Basu et al., Tigrfams and genome properties in 2013, Nucleic acids research, vol.41, issue.1, pp.387-395, 2012.

D. H. Haft, J. D. Selengut, and O. White, The tigrfams database of protein families, Nucleic acids research, vol.31, issue.1, pp.371-373, 2003.

M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann et al., The weka data mining software: an update, ACM SIGKDD explorations newsletter, vol.11, issue.1, pp.10-18, 2009.

S. M. Halling, B. D. Peterson-burch, B. J. Bricker, R. L. Zuerner, Z. Qing et al., Completion of the genome sequence of Brucella abortus and comparison to the highly similar genomes of Brucella melitensis and Brucella suis, Journal of Bacteriology, vol.187, issue.8, pp.2715-2726, 2005.

S. Henikoff and J. G. Henikoff, Amino acid substitution matrices from protein blocks, Proceedings of the National Academy of Sciences, vol.89, pp.10915-10919, 1992.

F. Herrera, S. Ventura, R. Bello, C. Cornelis, A. Zafra et al., Multiple instance learning: foundations and algorithms, 2016.

J. Hu, J. Wang, J. Lin, T. Liu, Y. Zhong et al., MD-SVM: a novel SVM-based algorithm for the motif discovery of transcription factor binding sites, BMC bioinformatics, vol.20, issue.7, p.200, 2019.

H. Ito and H. Iizuka, Taxonomic studies on a radio-resistant Pseudomonas. Part XII. Studies on the microorganisms of cereal grain, Agricultural and Biological Chemistry, vol.35, pp.1566-1571, 1971.

H. Ito, H. Watanabe, M. Takehisa, and H. Iizuka, Isolation and identification of radiation-resistant cocci belonging to the genus Deinococcus from sewage sludges and animal feeds, Agricultural and Biological Chemistry, vol.47, pp.1239-1247, 1983.

I. Jonassen, Efficient discovery of conserved patterns using a pattern graph, Bioinformatics, vol.13, issue.5, pp.509-522, 1997.

P. Jones, D. Binns, H. Chang, M. Fraser, W. Li et al., InterProScan 5: genome-scale protein function classification, Bioinformatics, vol.30, issue.9, pp.1236-1240, 2014.

M. Kandemir and F. A. Hamprecht, Computer-aided diagnosis from weak supervision: a benchmarking study, Computerized medical imaging and graphics, vol.42, pp.44-50, 2015.

J. Kelley, The cutting-plane method for solving convex programs, Journal of the Society for Industrial and Applied Mathematics, pp.703-712, 1960.

W. J. Kent, BLAT - the BLAST-like alignment tool, Genome research, vol.12, issue.4, pp.656-664, 2002.

J. Kim, E. N. Moriyama, C. G. Warr, P. J. Clyne, and J. R. Carlson, Identification of novel multi-transmembrane proteins from genomic databases using quasi-periodic structural properties, Bioinformatics, vol.16, issue.9, pp.767-775, 2000.

R. Leinonen, F. G. Diez, D. Binns, W. Fleischmann, R. Lopez et al., UniProt archive, Bioinformatics, vol.20, pp.3236-3237, 2004.

N. Lesh, M. J. Zaki, and M. Ogihara, Mining features for sequence classification, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.342-346, 1999.

I. Letunic, T. Doerks, and P. Bork, SMART 7: recent updates to the protein domain annotation resource, Nucleic acids research, vol.40, issue.1, pp.302-305, 2011.

W. Li, A. Cowley, M. Uludag, T. Gur, H. Mcwilliam et al., The EMBL-EBI bioinformatics web and programmatic tools framework, Nucleic acids research, vol.43, issue.1, pp.580-584, 2015.

Z. Li, G. Geng, J. Feng, J. Peng, C. Wen et al., Multiple instance learning based on positive instance selection and bag structure construction, Pattern Recognition Letters, vol.40, pp.19-26, 2014.

K. Liolios, K. Mavromatis, N. Tavernarakis, and N. C. Kyrpides, The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata, Nucleic Acids Research, vol.36, issue.1, pp.475-479, 2008.

G. Liu, J. Wu, and Z. Zhou, Key instance detection in multi-instance learning, Journal of Machine Learning Research, vol.25, pp.253-268, 2012.

O. Lukjancenko, T. M. Wassenaar, and D. W. Ussery, Comparison of 61 sequenced Escherichia coli genomes, Microbial ecology, vol.60, issue.4, pp.708-720, 2010.

M. Maddouri and M. Elloumi, Encoding of primary structures of biological macromolecules within a data mining perspective, Journal of Computer Science and Technology, vol.19, issue.1, pp.78-88, 2004.

K. S. Makarova, M. V. Omelchenko, E. K. Gaidamakova, V. Y. Matrosova, A. Vasilenko et al., Deinococcus geothermalis: the pool of extreme radiation resistance genes shrinks, PLoS One, vol.2, issue.9, p.955, 2007.

A. Marchler-bauer, J. B. Anderson, P. F. Cherukuri, C. Deweese-scott, L. Y. Geer et al., CDD: a conserved domain database for protein classification, Nucleic acids research, vol.33, issue.1, pp.192-196, 2005.

A. Marchler-bauer, M. K. Derbyshire, N. R. Gonzales, S. Lu, F. Chitsaz et al., CDD: NCBI's conserved domain database, Nucleic acids research, vol.43, issue.1, pp.222-226, 2014.

O. Maron and T. Lozano-pérez, A framework for multiple-instance learning, Proceedings of the 1997 conference on Advances in neural information processing systems 10, NIPS '97, pp.570-576, 1998.

O. Maron and A. L. Ratan, Multiple-instance learning for natural scene classification, ICML, vol.98, pp.341-349, 1998.

G. Melki, A. Cano, and S. Ventura, MIRSVM: multi-instance support vector machine with bag representatives, Pattern Recognition, vol.79, pp.228-241, 2018.

C. Menichelli, O. Gascuel, and L. Bréhélin, Improving pairwise comparison of protein sequences with domain co-occurrence, PLoS computational biology, vol.14, issue.1, p.1005889, 2018.
URL : https://hal.archives-ouvertes.fr/lirmm-01744475

H. Mi, X. Huang, A. Muruganujan, H. Tang, C. Mills et al., PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements, Nucleic acids research, vol.45, issue.1, pp.183-189, 2016.

J. Mistry, A. Bateman, and R. D. Finn, Predicting active site residue annotations in the Pfam database, BMC bioinformatics, vol.8, issue.1, p.298, 2007.

S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, O. Verezemska et al., Genomes OnLine Database (GOLD) v. 6: data updates and feature enhancements, Nucleic acids research, p.992, 2016.

A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, SCOP: a structural classification of proteins database for the investigation of sequences and structures, Journal of molecular biology, vol.247, issue.4, pp.536-540, 1995.

S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of molecular biology, vol.48, issue.3, pp.443-453, 1970.

H. M. Ni, D. W. Qi, and H. Mu, Applying MSSIM combined chaos game representation to genome sequences analysis, Genomics, vol.110, issue.3, pp.180-190, 2018.

C. A. Orengo, A. Michie, S. Jones, D. T. Jones, M. Swindells et al., CATH - a hierarchic classification of protein domain structures, Structure, vol.5, issue.8, pp.1093-1109, 1997.

I. A. Pagnuco, J. I. Pastore, G. Abras, M. Brun, and V. L. Ballarin, Analysis of genetic association using hierarchical clustering and cluster validation indices, Genomics, vol.109, issue.5-6, pp.438-445, 2017.

E. Parzen, On estimation of a probability density function and mode, The Annals of Mathematical Statistics, vol.33, pp.1065-1076, 1962.

F. M. Pearl, C. Bennett, J. E. Bray, A. P. Harrison, N. Martin et al., The CATH database: an extended protein family resource for structural and functional genomics, Nucleic acids research, vol.31, issue.1, pp.452-455, 2003.

R. W. Phillips, J. Wiegel, C. J. Berry, C. Fliermans, A. D. Peacock et al., Kineococcus radiotolerans sp. nov., a radiation-resistant, gram-positive bacterium, International journal of systematic and evolutionary microbiology, vol.52, issue.3, pp.933-938, 2002.

J. Platt, Sequential minimal optimization: A fast algorithm for training support vector machines, 1998.

K. D. Pruitt, T. Tatusova, G. R. Brown, and D. R. Maglott, NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy, Nucleic acids research, vol.40, issue.1, pp.130-135, 2011.

F. A. Rainey, M. F. Nobre, P. Schumann, E. Stackebrandt, and M. S. da Costa, Phylogenetic diversity of the deinococci as determined by 16S ribosomal DNA sequence comparison, International Journal of Systematic Bacteriology, vol.47, pp.510-514, 1997.

F. A. Rainey, K. Ray, M. Ferreira, B. Z. Gatz, M. F. Nobre et al., Extensive diversity of ionizing-radiation-resistant bacteria recovered from sonoran desert soil and description of nine new species of the genus deinococcus obtained from a single soil sample, Applied and environmental microbiology, vol.71, issue.9, pp.5225-5235, 2005.

S. Ray and M. Craven, Learning statistical models for annotating proteins with function information using biomedical text, BMC bioinformatics, vol.6, issue.1, p.18, 2005.

H. Saigo, J. Vert, N. Ueda, and T. Akutsu, Protein homology detection using string alignment kernels, Bioinformatics, vol.20, issue.11, pp.1682-1689, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00433587

P. Santiago-sotelo and J. H. Ramirez-prado, prfectBLAST: a platform-independent portable front end for the command terminal BLAST+ stand-alone suite, BioTechniques, vol.53, issue.5, pp.299-300, 2012.

J. Schultz, F. Milpetz, P. Bork, and C. P. Ponting, SMART, a simple modular architecture research tool: identification of signaling domains, Proceedings of the National Academy of Sciences, vol.95, issue.11, pp.5857-5864, 1998.

H. Sghaier, K. Ghedira, A. Benkahla, and I. Barkallah, Basal dna repair machinery is subject to positive selection in ionizing-radiation-resistant bacteria, BMC genomics, vol.9, issue.1, p.297, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01358559

R. She, F. Chen, K. Wang, M. Ester, J. L. Gardy et al., Frequent-subsequence-based prediction of outer membrane proteins, Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp.436-445, 2003.

D. Slade and M. Radman, Oxidative stress resistance in deinococcus radiodurans, Microbiology and Molecular Biology Reviews, vol.75, pp.133-191, 2011.

A. J. Smola, S. Vishwanathan, and T. Hofmann, Kernel methods for missing variables, Proceedings of International Workshop on Artificial Intelligence and Statistics, pp.325-332, 2005.

M. Sniedovich, Dynamic programming: foundations and principles, 2010.

P. K. Srivastava, D. K. Desai, S. Nandi, and A. M. Lynn, HMM-ModE-Improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences, BMC bioinformatics, vol.8, issue.1, p.104, 2007.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in neural information processing systems, pp.3104-3112, 2014.

B. E. Suzek, H. Huang, P. Mcgarvey, R. Mazumder, and C. H. Wu, UniRef: comprehensive and non-redundant UniProt reference clusters, Bioinformatics, vol.23, issue.10, pp.1282-1288, 2007.

Q. Tao, S. Scott, N. Vinodchandran, and T. T. Osugi, SVM-based generalized multiple-instance learning via approximate box counting, Proceedings of the twenty-first international conference on Machine learning, pp.799-806, 2004.

P. D. Thomas, A. Kejariwal, M. J. Campbell, H. Mi, K. Diemer et al., PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification, Nucleic acids research, vol.31, issue.1, pp.334-341, 2003.

S. Vinga and J. Almeida, Alignment-free sequence comparison - a review, Bioinformatics, vol.19, issue.4, pp.513-523, 2003.

J. Wang and J. Zucker, Solving the multiple-instance problem: A lazy learning approach, Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, pp.1119-1126, 2000.

W. Wang, Y. Ning, H. Rangwala, and N. Ramakrishnan, A multiple instance learning framework for identifying key sentences and detecting events, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp.509-518, 2016.

M. S. Waterman, Identification of common molecular subsequence, Journal of Molecular Biology, vol.147, pp.195-197, 1981.

D. Wei, Q. Jiang, Y. Wei, and S. Wang, A novel hierarchical clustering algorithm for gene sequences, BMC bioinformatics, vol.13, issue.1, p.174, 2012.

X. Wei, J. Wu, and Z. Zhou, Scalable algorithms for multi-instance learning, IEEE transactions on neural networks and learning systems, vol.28, pp.975-987, 2016.

M. Woolfit and L. Bromham, Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes, Molecular Biology and Evolution, vol.20, issue.9, pp.1545-1555, 2003.

Z. Xing, J. Pei, and E. Keogh, A brief survey on sequence classification, ACM SIGKDD Explorations Newsletter, vol.12, issue.1, pp.40-48, 2010.

O. Yakhnenko, A. Silvescu, and V. Honavar, Discriminatively trained markov model for sequence classification, Proceedings of the fifth IEEE International Conference on Data Mining, pp.498-505, 2005.

J. Yuan, X. Huang, H. Liu, B. Li, and W. Xiong, SubMIL: discriminative subspaces for multi-instance learning, Neurocomputing, vol.173, pp.1768-1774, 2016.

D. Zhang, Y. Liu, L. Si, J. Zhang, and R. D. Lawrence, Multiple instance learning on structured data, Advances in Neural Information Processing Systems, pp.145-153, 2011.

Q. Zhang and S. A. Goldman, EM-DD: an improved multiple-instance learning technique, Advances in neural information processing systems, pp.1073-1080, 2002.

Q. Zhang, Z. Shen, and D. Huang, Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network, Scientific reports, vol.9, issue.1, p.8484, 2019.

Z. Zhou, Y. Sun, and Y. Li, Multi-instance learning by treating instances as non-iid samples, Proceedings of the 26th annual international conference on machine learning, pp.1249-1256, 2009.

M. Zoghlami, S. Aridhi, M. Maddouri, and E. M. Nguifo, ABClass: une approche d'apprentissage multi-instances pour les séquences (ABClass: A multiple instance learning approach for sequence data), Actes de la Conférence Nationale d'Intelligence Artificielle et Rencontres des Jeunes Chercheurs en Intelligence Artificielle (CNIA+RJCIA 2018), pp.10-18, 2018.

M. Zoghlami, S. Aridhi, M. Maddouri, and E. M. Nguifo, An overview of in silico methods for the prediction of ionizing radiation resistance in bacteria, Ionizing Radiation: Advances in Research and Applications, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01807944

M. Zoghlami, S. Aridhi, M. Maddouri, and E. M. Nguifo, Multiple instance learning for sequence data with across bag dependencies, International Journal of Machine Learning and Cybernetics, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02393742

M. Zoghlami, S. Aridhi, M. Maddouri, and E. M. Nguifo, A structure based multiple instance learning approach for bacterial ionizing radiation resistance prediction, 23rd International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02307048

Table A.2: Number of occurrences of each type of protein sequence in the positive and negative bags

Protein ID, Positive bags, Negative bags
P1