6.1.1 ABClass: a motif-based MIL approach for sequence data with across-bag dependencies
ABSim: a similarity-based MIL approach for sequence data with across-bag dependencies
6.2.1 Short-term perspective
Automatic Discovery of Hidden Associations Using Vector Similarity: Application to Biological Annotation Prediction, 2018. ,
URL : https://hal.archives-ouvertes.fr/tel-01792299
, Truepera radiovictrix gen. nov., sp. nov., a new radiation resistant species and the proposal of Trueperaceae fam. nov., FEMS microbiology letters, vol.247, pp.161-169, 2005.
Single- vs. multiple-instance classification, Pattern Recognition, vol.48, issue.9, pp.2831-2838, 2015. ,
Basic local alignment search tool, Journal of molecular biology, vol.215, issue.3, pp.403-410, 1990. ,
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic acids research, vol.25, issue.17, pp.3389-3402, 1997. ,
Multiple instance classification: Review, taxonomy and comparative study, Artificial Intelligence, vol.201, pp.81-105, 2013. ,
MILDE: multiple instance learning by discriminative embedding, Knowledge and Information Systems, vol.42, issue.2, pp.381-407, 2015. ,
SCOP2 prototype: a new approach to protein structure mining, Nucleic acids research, vol.42, issue.1, pp.310-314, 2013. ,
Support Vector Machines for Multiple-Instance Learning, Advances in Neural Information Processing Systems, pp.561-568, 2003. ,
The InterPro database, an integrated documentation resource for protein families, domains and functional sites, Nucleic acids research, vol.29, issue.1, pp.37-40, 2001. ,
URL : https://hal.archives-ouvertes.fr/hal-00427125
UniProt: the universal protein knowledgebase, Nucleic acids research, vol.32, issue.1, pp.115-119, 2004. ,
Prediction of ionizing radiation resistance in bacteria using a multiple instance learning model, Journal of Computational Biology, vol.23, issue.1, pp.10-20, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01807946
PRINTS and its automatic supplement, prePRINTS, Nucleic acids research, vol.31, issue.1, pp.400-402, 2003. ,
The Pfam protein families database, Nucleic acids research, vol.32, issue.1, pp.138-141, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-01294685
Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites, PLoS computational biology, vol.6, issue.11, p.1001007, 2010. ,
, GenBank, Nucleic acids research, vol.41, issue.1, pp.36-42, 2012.
Using BLAT to find sequence similarity in closely related genomes, Current protocols in bioinformatics, vol.37, issue.1, pp.10-18, 2012. ,
Ionizing-radiation resistance in the desiccation-tolerant cyanobacterium Chroococcidiopsis, Applied and Environmental Microbiology, vol.66, pp.1489-1492, 2002. ,
Motif-based protein sequence classification using neural networks, Journal of Computational Biology, vol.12, issue.1, pp.64-82, 2005. ,
Advances in Machine Learning for Processing and Comparison of Metagenomic Data, pp.295-329, 2014. ,
UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view, Plant Bioinformatics, pp.23-54, 2016. ,
Engineering Deinococcus radiodurans for metal remediation in radioactive mixed waste environments, Nature biotechnology, vol.18, issue.1, pp.85-90, 2000. ,
Engineering Deinococcus geothermalis for bioremediation of high-temperature radioactive waste environments, Applied and environmental microbiology, vol.69, issue.8, pp.4575-4582, 2003. ,
Nomenclature for "Micrococcus radiodurans" and other radiation-resistant cocci: Deinococcaceae fam. nov. and Deinococcus gen. nov., including five species, International Journal of Systematic Bacteriology, vol.31, pp.353-360, 1981. ,
MILES: Multiple-instance learning via embedded instance selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.12, pp.1931-1947, 2006. ,
Protein classification based on text document classification techniques, Proteins: Structure, Function, and Bioinformatics, vol.58, issue.4, pp.955-970, 2005. ,
Multiple instance learning with bag dissimilarities, Pattern Recognition, vol.48, issue.1, pp.264-275, 2015. ,
Feature selection for genetic sequence classification, Bioinformatics, vol.14, issue.2, pp.139-143, 1998. ,
, Accumulation of Mn(II) in Deinococcus radiodurans facilitates gamma-radiation resistance, vol.306, pp.1025-1028, 2004.
A model of evolutionary change in proteins, Atlas of protein sequence and structure, vol.5, pp.345-352, 1978. ,
Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997. ,
A comparison of multi-instance learning algorithms, 2006. ,
Getting started in gene orthology and functional analysis, PLoS computational biology, vol.6, issue.3, p.1000703, 2010. ,
MILKDE: A new approach for multiple instance learning based on positive instance selection and kernel density estimation, Engineering Applications of Artificial Intelligence, vol.59, pp.196-204, 2017. ,
, Toward richer metadata for microbial sequences: replacing strain-level NCBI taxonomy taxids with BioProject, BioSample and Assembly records, Standards in genomic sciences, vol.9, issue.3, p.1275, 2014.
Traitements ionisants et hautes pressions des aliments, Economica, pp.161-169, 2001. ,
1-Aminocyclopropane-1-carboxylate (ACC) deaminases from Methylobacterium radiotolerans and Methylobacterium nodulans with higher specificity for ACC, FEMS Microbiol Lett, vol.343, issue.1, pp.70-76, 2013. ,
, Characterization and radiation resistance of new isolates of Rubrobacter radiotolerans and Rubrobacter xylanophilus, vol.3, pp.235-238, 1999.
, 2017-beyond protein family and domain annotations, vol.45, pp.190-199, 2016.
The Pfam protein families database: towards a more sustainable future, Nucleic acids research, vol.44, issue.1, pp.279-285, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01294685
A review of multi-instance learning assumptions, The Knowledge Engineering Review, vol.25, issue.1, pp.1-25, 2010. ,
Radiation-resistant extremophiles and their potential in biotechnology and therapeutics, Applied microbiology and biotechnology, vol.97, issue.3, pp.993-1004, 2013. ,
Uniprot: A hub for protein information, Nucleic Acids Research, vol.43, pp.204-212, 2014. ,
A structure-based multiple-instance learning approach to predicting in vitro transcription factor-DNA interaction, BMC genomics, vol.16, pp.9-9, 2013. ,
Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment, Bioinformatics, vol.14, issue.2, pp.164-173, 1998. ,
Emendation of Methylobacterium; Methylobacterium rhodinum comb. nov. corrig.; Methylobacterium radiotolerans comb. nov. corrig.; and Methylobacterium mesophilicum comb, International Journal of Systematic Bacteriology, vol.33, pp.875-877, 1983. ,
Contrasted resistance of stone-dwelling Geodermatophilaceae species to stresses known to give rise to reactive oxygen species, FEMS Microbiology Ecology, vol.80, issue.3, pp.566-577, 2012. ,
URL : https://hal.archives-ouvertes.fr/halsde-00722684
Tigrfams and genome properties in 2013, Nucleic acids research, vol.41, issue.1, pp.387-395, 2012. ,
The tigrfams database of protein families, Nucleic acids research, vol.31, issue.1, pp.371-373, 2003. ,
The weka data mining software: an update, ACM SIGKDD explorations newsletter, vol.11, issue.1, pp.10-18, 2009. ,
Completion of the genome sequence of Brucella abortus and comparison to the highly similar genomes of Brucella melitensis and Brucella suis, Journal of Bacteriology, vol.187, issue.8, pp.2715-2726, 2005. ,
Amino acid substitution matrices from protein blocks, Proceedings of the National Academy of Sciences, vol.89, pp.10915-10919, 1992. ,
Multiple instance learning: foundations and algorithms, 2016. ,
MD-SVM: a novel SVM-based algorithm for the motif discovery of transcription factor binding sites, BMC bioinformatics, vol.20, issue.7, p.200, 2019. ,
, Taxonomic studies on a radio-resistant Pseudomonas. Part XII. Studies on the microorganisms of cereal grain, Agricultural and Biological Chemistry, vol.35, pp.1566-1571, 1971.
Isolation and identification of radiation-resistant cocci belonging to the genus Deinococcus from sewage sludges and animal feeds, Agricultural and Biological Chemistry, vol.47, pp.1239-1247, 1983. ,
Efficient discovery of conserved patterns using a pattern graph, Bioinformatics, vol.13, issue.5, pp.509-522, 1997. ,
InterProScan 5: genome-scale protein function classification, Bioinformatics, vol.30, issue.9, pp.1236-1240, 2014. ,
Computer-aided diagnosis from weak supervision: a benchmarking study, Computerized medical imaging and graphics, vol.42, pp.44-50, 2015. ,
The cutting-plane method for solving convex programs, Journal of the Society for Industrial and Applied Mathematics, pp.703-712, 1960. ,
BLAT - the BLAST-like alignment tool, Genome research, vol.12, issue.4, pp.656-664, 2002. ,
Identification of novel multi-transmembrane proteins from genomic databases using quasi-periodic structural properties, Bioinformatics, vol.16, issue.9, pp.767-775, 2000. ,
, UniProt archive, vol.20, pp.3236-3237, 2004.
Mining features for sequence classification, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.342-346, 1999. ,
SMART 7: recent updates to the protein domain annotation resource, Nucleic acids research, vol.40, issue.1, pp.302-305, 2011. ,
The EMBL-EBI bioinformatics web and programmatic tools framework, Nucleic acids research, vol.43, issue.1, pp.580-584, 2015. ,
Multiple instance learning based on positive instance selection and bag structure construction, Pattern Recognition Letters, vol.40, pp.19-26, 2014. ,
The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata, Nucleic Acids Research, vol.36, issue.1, pp.475-479, 2008. ,
Key instance detection in multi-instance learning, Journal of Machine Learning Research, vol.25, pp.253-268, 2012. ,
Comparison of 61 sequenced Escherichia coli genomes, Microbial ecology, vol.60, issue.4, pp.708-720, 2010. ,
Encoding of primary structures of biological macromolecules within a data mining perspective, Journal of Computer Science and Technology, vol.19, issue.1, pp.78-88, 2004. ,
Deinococcus geothermalis: the pool of extreme radiation resistance genes shrinks, PLoS One, vol.2, issue.9, p.955, 2007. ,
CDD: a conserved domain database for protein classification, Nucleic acids research, vol.33, issue.1, pp.192-196, 2005. ,
CDD: NCBI's conserved domain database, Nucleic acids research, vol.43, issue.1, pp.222-226, 2014. ,
A framework for multiple-instance learning, Proceedings of the 1997 conference on Advances in neural information processing systems 10, NIPS '97, pp.570-576, 1998. ,
Multiple-instance learning for natural scene classification, ICML, vol.98, pp.341-349, 1998. ,
MIRSVM: multi-instance support vector machine with bag representatives, Pattern Recognition, vol.79, pp.228-241, 2018. ,
Improving pairwise comparison of protein sequences with domain co-occurrence, PLoS computational biology, vol.14, issue.1, p.1005889, 2018. ,
URL : https://hal.archives-ouvertes.fr/lirmm-01744475
PANTHER version 11: expanded annotation data from gene ontology and reactome pathways, and data analysis tool enhancements, Nucleic acids research, vol.45, issue.1, pp.183-189, 2016. ,
Predicting active site residue annotations in the Pfam database, BMC bioinformatics, vol.8, issue.1, p.298, 2007. ,
, Genomes OnLine Database (GOLD) v. 6: data updates and feature enhancements, Nucleic acids research, p.992, 2016.
SCOP: a structural classification of proteins database for the investigation of sequences and structures, Journal of molecular biology, vol.247, issue.4, pp.536-540, 1995. ,
A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of molecular biology, vol.48, issue.3, pp.443-453, 1970. ,
Applying MSSIM combined chaos game representation to genome sequences analysis, Genomics, vol.110, issue.3, pp.180-190, 2018. ,
CATH - a hierarchic classification of protein domain structures, Structure, vol.5, issue.8, pp.1093-1109, 1997. ,
Analysis of genetic association using hierarchical clustering and cluster validation indices, Genomics, vol.109, issue.5-6, pp.438-445, 2017. ,
On estimation of a probability density function and mode, The annals of mathematical statistics, vol.33, pp.1065-1076, 1962. ,
The CATH database: an extended protein family resource for structural and functional genomics, Nucleic acids research, vol.31, issue.1, pp.452-455, 2003. ,
Kineococcus radiotolerans sp. nov., a radiation-resistant, gram-positive bacterium, International journal of systematic and evolutionary microbiology, vol.52, issue.3, pp.933-938, 2002. ,
Sequential minimal optimization: A fast algorithm for training support vector machines, 1998. ,
NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy, Nucleic acids research, vol.40, issue.1, pp.130-135, 2011. ,
Phylogenetic diversity of the deinococci as determined by 16S ribosomal DNA sequence comparison, International Journal of Systematic Bacteriology, vol.47, pp.510-514, 1997. ,
Extensive diversity of ionizing-radiation-resistant bacteria recovered from Sonoran Desert soil and description of nine new species of the genus Deinococcus obtained from a single soil sample, Applied and environmental microbiology, vol.71, issue.9, pp.5225-5235, 2005. ,
Learning statistical models for annotating proteins with function information using biomedical text, BMC bioinformatics, vol.6, issue.1, p.18, 2005. ,
Protein homology detection using string alignment kernels, Bioinformatics, vol.20, issue.11, pp.1682-1689, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00433587
prfectBLAST: a platform-independent portable front end for the command terminal BLAST+ stand-alone suite, BioTechniques, vol.53, issue.5, pp.299-300, 2012. ,
SMART, a simple modular architecture research tool: identification of signaling domains, Proceedings of the National Academy of Sciences, vol.95, issue.11, pp.5857-5864, 1998. ,
Basal DNA repair machinery is subject to positive selection in ionizing-radiation-resistant bacteria, BMC genomics, vol.9, issue.1, p.297, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-01358559
Frequent-subsequence-based prediction of outer membrane proteins, Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp.436-445, 2003. ,
Oxidative stress resistance in Deinococcus radiodurans, Microbiology and Molecular Biology Reviews, vol.75, pp.133-191, 2011. ,
Kernel methods for missing variables, Proceedings of International Workshop on Artificial Intelligence and Statistics, pp.325-332, 2005. ,
Dynamic programming: foundations and principles, 2010. ,
HMM-ModE-Improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences, BMC bioinformatics, vol.8, issue.1, p.104, 2007. ,
Sequence to sequence learning with neural networks, Advances in neural information processing systems, pp.3104-3112, 2014. ,
UniRef: comprehensive and non-redundant UniProt reference clusters, Bioinformatics, vol.23, issue.10, pp.1282-1288, 2007. ,
SVM-based generalized multiple-instance learning via approximate box counting, Proceedings of the twenty-first international conference on Machine learning, pp.799-806, 2004. ,
PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification, Nucleic acids research, vol.31, issue.1, pp.334-341, 2003. ,
Alignment-free sequence comparison - a review, Bioinformatics, vol.19, issue.4, pp.513-523, 2003. ,
Solving the multiple-instance problem: A lazy learning approach, Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, pp.1119-1126, 2000. ,
A multiple instance learning framework for identifying key sentences and detecting events, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp.509-518, 2016. ,
Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, pp.195-197, 1981. ,
A novel hierarchical clustering algorithm for gene sequences, BMC bioinformatics, vol.13, issue.1, p.174, 2012. ,
Scalable algorithms for multi-instance learning, IEEE transactions on neural networks and learning systems, vol.28, pp.975-987, 2016. ,
Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes, Molecular Biology and Evolution, vol.20, issue.9, pp.1545-1555, 2003. ,
A brief survey on sequence classification, ACM SIGKDD Explorations Newsletter, vol.12, issue.1, pp.40-48, 2010. ,
Discriminatively trained markov model for sequence classification, Proceedings of the fifth IEEE International Conference on Data Mining, pp.498-505, 2005. ,
SubMIL: discriminative subspaces for multi-instance learning, Neurocomputing, vol.173, pp.1768-1774, 2016. ,
Multiple instance learning on structured data, Advances in Neural Information Processing Systems, pp.145-153, 2011. ,
EM-DD: an improved multiple-instance learning technique, Advances in neural information processing systems, pp.1073-1080, 2002. ,
Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network, Scientific reports, vol.9, issue.1, p.8484, 2019. ,
Multi-instance learning by treating instances as non-iid samples, Proceedings of the 26th annual international conference on machine learning, pp.1249-1256, 2009. ,
ABClass: une approche d'apprentissage multi-instances pour les séquences (ABClass: A multiple instance learning approach for sequence data), Actes de la Conférence Nationale d'Intelligence Artificielle et Rencontres des Jeunes Chercheurs en Intelligence Artificielle (CNIA+RJCIA 2018), pp.10-18, 2018. ,
An overview of in silico methods for the prediction of ionizing radiation resistance in bacteria, Ionizing Radiation: Advances in Research and Applications, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01807944
Multiple instance learning for sequence data with across bag dependencies, International Journal of Machine Learning and Cybernetics, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02393742
A structure based multiple instance learning approach for bacterial ionizing radiation resistance prediction, 23rd International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02307048
2: Number of occurrences of each type of protein sequence in the positive and negative bags (table columns: Protein ID, Positive bags, Negative bags)