Constraint Score-1 (CS1), Constrained Laplacian Score (CLS), and ReliefF-Sc, the proposed semi-supervised margin-based constrained feature selection method.

ReliefF-Sc is applied in three versions: (1) alone with RCG, denoted ReliefF-Sc+RCG; (2) after applying ACS, denoted ReliefF-Sc+ACS; and (3) after propagating the actively selected constraints, denoted ReliefF-Sc+PACS.

V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos, Recent advances and emerging challenges of feature selection in the context of big data, Knowledge-Based Systems, vol.86, pp.33-45, 2015.

R. J. Urbanowicz, M. Meeker, W. Lacava, R. S. Olson, and J. H. Moore, Relief-based feature selection: introduction and review, Journal of Biomedical Informatics, vol.85, pp.189-203, 2018.

D. Dua and C. Graff, UCI Machine Learning Repository. School of Information and Computer Sciences, 2019.

I. Kononenko, Estimating attributes: analysis and extensions of RELIEF, Proceedings of the European Conference on Machine Learning (ECML'94), pp.6-8, 1994.

M. Robnik-Šikonja and I. Kononenko, Comprehensible interpretation of Relief's estimates, Proceedings of the 18th International Conference on Machine Learning (ICML'01), pp.433-473, 2001.

Y. Sun and J. Li, Iterative RELIEF for feature weighting, Proceedings of the 23rd International Conference on Machine Learning (ICML'06), pp.25-29, 2006.

T. T. Le, R. J. Urbanowicz, J. H. Moore, and B. Mckinney, STatistical Inference Relief (STIR) feature selection, Bioinformatics, vol.35, issue.8, pp.1358-1365, 2019.

H. Liu and H. Motoda, Computational methods of feature selection, Data Mining and Knowledge Discovery Series, 2007.

J. Tang, S. Alelyani, and H. Liu, Feature selection for classification: A review, Data Classification: Algorithms and Applications, pp.37-64, 2014.

R. Sheikhpour, M. A. Sarram, S. Gharaghani, and M. A. Z. Chahooki, A survey on semi-supervised feature selection methods, Pattern Recognition, vol.64, pp.141-158, 2017.

M. Dash and H. Liu, Feature selection for classification, Intelligent Data Analysis, vol.1, issue.1-4, pp.131-156, 1997.

H. Liu and L. Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Transactions on Knowledge and Data Engineering, vol.17, issue.4, pp.491-502, 2005.

J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino et al., Feature selection: A data perspective, ACM Computing Surveys (CSUR), vol.50, issue.6, p.94, 2017.

J. Miao and L. Niu, A survey on feature selection, Procedia Computer Science, vol.91, pp.919-926, 2016.

B. Ghojogh, M. N. Samad, S. A. Mashhadi, T. Kapoor, W. Ali et al., Feature selection and feature extraction in pattern analysis: A literature review, Computing Research Repository (CoRR), 2019.

R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, vol.7, issue.2, pp.179-188, 1936.

S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, pp.41-48, 1999.

E. Barshan, A. Ghodsi, Z. Azimifar, and M. Jahromi, Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds, Pattern Recognition, vol.44, issue.7, pp.1357-1371, 2011.

K. Pearson, LIII. On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol.2, issue.11, pp.559-572, 1901.

M. A. A. Cox and T. F. Cox, Multidimensional scaling, 2008.

J. B. Tenenbaum, V. de Silva, and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, vol.290, issue.5500, pp.2319-2323, 2000.

S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, vol.290, issue.5500, pp.2323-2326, 2000.

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research (JMLR), vol.9, pp.2579-2605, 2008.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research (JMLR), vol.3, pp.1157-1182, 2003.

J. Cai, J. Luo, S. Wang, and S. Yang, Feature selection in machine learning: A new perspective, Neurocomputing, vol.300, pp.70-79, 2018.

L. Yu and H. Liu, Feature selection for high-dimensional data: A fast correlation-based filter solution, Proceedings of the 20th International Conference on Machine Learning (ICML'03), pp.21-24, 2003.

H. Peng, F. Long, and C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.8, pp.1226-1238, 2005.

Y. Li, T. Li, and H. Liu, Recent advances in feature selection and its applications, Knowledge and Information Systems, vol.53, issue.3, pp.551-577, 2017.

J. Liang, F. Wang, C. Dang, and Y. Qian, A group incremental approach to feature selection applying rough set technique, IEEE Transactions on Knowledge and Data Engineering, vol.26, issue.2, pp.294-308, 2014.

P. Moradi and M. Rostami, A graph theoretic approach for unsupervised feature selection, Engineering Applications of Artificial Intelligence, vol.44, pp.33-45, 2015.

M. Liu and D. Zhang, Sparsity score: A novel graph-preserving feature selection method, International Journal of Pattern Recognition and Artificial Intelligence, vol.28, issue.04, p.1450009, 2014.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007.

S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang et al., Graph embedding and extensions: A general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.29, issue.1, pp.40-51, 2007.

C. Cortes and M. Mohri, On transductive regression, Proceedings of the 19th International Conference on Neural Information Processing Systems, pp.4-7, 2006.

Z. Zhao and H. Liu, Spectral feature selection for supervised and unsupervised learning, Proceedings of the 24th International Conference on Machine Learning, pp.20-24, 2007.

F. R. K. Chung, Spectral graph theory, American Mathematical Society, vol.92, 1997.

M. Robnik-Šikonja and I. Kononenko, Theoretical and empirical analysis of ReliefF and RReliefF, Machine Learning, vol.53, pp.23-69, 2003.

X. He, D. Cai, and P. Niyogi, Laplacian score for feature selection, Proceedings of the 18th International Conference on Neural Information Processing Systems, vol.18, pp.5-8, 2005.

Z. Zhao and H. Liu, Semi-supervised feature selection via spectral analysis, Proceedings of the 7th SIAM International Conference on Data Mining, pp.26-28, 2007.

D. Zhang, S. Chen, and Z. Zhou, Constraint score: A new filter method for feature selection with pairwise constraints, Pattern Recognition, vol.41, issue.5, pp.1440-1451, 2008.

L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, Proceedings of the 17th International Conference on Neural Information Processing Systems, pp.13-18, 2004.

Z. A. Zhao and H. Liu, Spectral Feature Selection for Data Mining, Data Mining and Knowledge Discovery Series, 2011.

J. M. Sotoca and F. Pla, Supervised feature selection by clustering using conditional mutual information-based distances, Pattern Recognition, vol.43, issue.6, pp.2068-2081, 2010.

S. Solorio-Fernández, J. A. Carrasco-Ochoa, and J. F. Martínez-Trinidad, A review of unsupervised feature selection methods, Artificial Intelligence Review, pp.1-42, 2019.

A. K. Farahat, A. Ghodsi, and M. S. Kamel, An efficient greedy method for unsupervised feature selection, Proceedings of the 11th IEEE International Conference on Data Mining (ICDM), pp.11-14, 2011.

Y. Li, B. Lu, and Z. Wu, A hybrid method of unsupervised feature selection based on ranking, Proceedings of the 18th IEEE International Conference on Pattern Recognition (ICPR'06), vol.2, pp.20-24, 2006.

M. Liu, D. Sun, and D. Zhang, Sparsity score: A new filter feature selection method based on graph, Proceedings of the 21st IEEE International Conference on Pattern Recognition (ICPR), pp.959-962, 2012.

V. M. Rao and V. N. Sastry, Unsupervised feature ranking based on representation entropy, Proceedings of the 1st IEEE International Conference on Recent Advances in Information Technology (RAIT), pp.15-17, 2012.

K. Kira and L. A. Rendell, The feature selection problem: Traditional methods and a new algorithm, AAAI, pp.12-16, 1992.

Y. Sun, Iterative relief for feature weighting: algorithms, theories, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.6, pp.1035-1051, 2007.

Q. Liu, J. Zhang, J. Xiao, H. Zhu, and Q. Zhao, A supervised feature selection algorithm through minimum spanning tree clustering, Proceedings of the 26th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp.10-12, 2014.

V. T. Hoang, Multi color space LBP-based feature selection for texture classification, 2018.

M. Kalakech, P. Biela, L. Macaire, and D. Hamad, Constraint scores for semi-supervised feature selection: A comparative study. Pattern Recognition Letters, vol.32, pp.656-665, 2011.

K. Benabdeslem and M. Hindawi, Efficient semi-supervised feature selection: constraint, relevance, and redundancy, IEEE Transactions on Knowledge and Data Engineering, vol.26, issue.5, pp.1131-1143, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01301033

Y. Wang, J. Wang, H. Liao, and H. Chen, An efficient semi-supervised representatives feature selection algorithm based on information theory, Pattern Recognition, vol.61, pp.511-523, 2017.

X. Yang, L. He, D. Qu, and W. Zhang, Semi-supervised minimum redundancy maximum relevance feature selection for audio classification, Multimedia Tools and Applications, vol.77, issue.1, pp.713-739, 2018.

I. Davidson, K. L. Wagstaff, and S. Basu, Measuring constraint-set utility for partitional clustering algorithms, Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), pp.18-22, 2006.

M. Hindawi, K. Allab, and K. Benabdeslem, Constraint selection-based semi-supervised feature selection, Proceedings of the 11th IEEE International Conference on Data Mining (ICDM), pp.11-14, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00874960

M. Hindawi, Feature selection for semi-supervised data analysis in decisional information systems, 2013.
URL : https://hal.archives-ouvertes.fr/tel-01371515

M. M. Kabir, M. M. Islam, and K. Murase, A new wrapper feature selection approach using neural network, Neurocomputing, vol.73, pp.3273-3283, 2010.

A. El Akadi, A. Amine, A. El Ouardighi, and D. Aboutajdine, A two-stage gene selection scheme utilizing MRMR filter and GA wrapper, Knowledge and Information Systems, vol.26, pp.487-500, 2011.

Z. Ma, Y. Yang, F. Nie, J. Uijlings, and N. Sebe, Exploiting the entire feature space with sparsity for automatic image annotation, Proceedings of the 19th ACM International Conference on Multimedia, pp.283-292, 2011.

C. Shi, Q. Ruan, and G. An, Sparse feature selection based on graph laplacian for web image annotation, Image and Vision Computing, vol.32, issue.3, pp.189-201, 2014.

C. M. Bishop, Neural Networks: A Pattern Recognition Perspective, Handbook of Neural Computation, 1996.

M. Dash and H. Liu, Feature selection for clustering, Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.110-121, 2000.

K. Kira and L. A. Rendell, A practical approach to feature selection, Proceedings of the 9th International Workshop on Machine Learning, pp.1-3, 1992.

Y. Yang, H. T. Shen, Z. Ma, Z. Huang, and X. Zhou, L2,1-norm regularized discriminative feature selection for unsupervised learning, Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI'11), pp.16-22, 2011.

M. A. Hall, Correlation-based feature selection for machine learning, 1999.

P. A. Estévez, M. Tesmer, C. A. Perez, and J. M. Zurada, Normalized mutual information feature selection, IEEE Transactions on Neural Networks, vol.20, issue.2, pp.189-201, 2009.

H. Liu, J. Sun, L. Liu, and H. Zhang, Feature selection with dynamic mutual information, Pattern Recognition, vol.42, issue.7, pp.1330-1339, 2009.

Y. Jiang and J. Ren, Eigenvalue sensitive feature selection, Proceedings of the 28th International Conference on Machine Learning (ICML'11), pp.89-96, 2011.

M. Masaeli, Y. Yan, Y. Cui, G. Fung, and J. G. Dy, Convex principal feature selection, Proceedings of the 10th SIAM International Conference on Data Mining (SDM), pp.619-628, 2010.

S. Maldonado and R. Weber, A wrapper method for feature selection using support vector machines, Information Sciences, vol.179, issue.13, pp.2208-2217, 2009.

S. Solorio-Fernández, J. A. Carrasco-Ochoa, and J. F. Martínez-Trinidad, A new hybrid filter-wrapper feature selection method for clustering based on ranking, Neurocomputing, vol.214, pp.866-880, 2016.

T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol.13, issue.1, pp.21-27, 1967.

N. Cristianini and J. Shawe-taylor, An introduction to support vector machines and other kernel-based learning methods, 2000.

G. H. John and P. Langley, Estimating continuous distributions in Bayesian classifiers, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp.18-20, 1995.

J. R. Quinlan, C4.5: Programs for Machine Learning, 1993.

J. Zhao, K. Lu, and X. He, Locality sensitive semi-supervised feature selection, Neurocomputing, vol.71, pp.1842-1849, 2008.

C. Ding and H. Peng, Minimum redundancy feature selection from microarray gene expression data, Journal of Bioinformatics and Computational Biology, vol.3, issue.02, pp.185-205, 2005.

M. J. Kearns and U. V. Vazirani, An Introduction to Computational Learning Theory, 1994.

Z. Zhao, L. Wang, and H. Liu, Efficient spectral feature selection with minimum redundancy, Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp.11-15, 2010.

J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping, Use of the zero-norm with linear models and kernel methods, Journal of Machine Learning Research, vol.3, pp.1439-1461, 2003.

L. Yu and H. Liu, Efficient feature selection via analysis of relevance and redundancy, Journal of Machine Learning Research, vol.5, pp.1205-1224, 2004.

Z. Zhao, L. Wang, H. Liu, and J. Ye, On similarity preserving feature selection, IEEE Transactions on Knowledge and Data Engineering, vol.25, pp.619-632, 2011.

I. S. Dhillon, S. Mallela, and R. Kumar, A divisive information-theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, vol.3, pp.1265-1287, 2003.

D. Ienco and R. Meo, Exploration and reduction of the feature space by hierarchical clustering, Proceedings of the 8th SIAM International Conference on Data Mining, pp.24-26, 2008.

L. A. Goodman and W. H. Kruskal, Measures of association for cross classifications III: Approximate sampling theory, Journal of the American Statistical Association, vol.58, issue.302, pp.310-364, 1963.

W.-H. Au, K. C. C. Chan, A. K. C. Wong, and Y. Wang, Attribute clustering for grouping, selection, and classification of gene expression data, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.2, issue.2, pp.83-101, 2005.

Q. Song, J. Ni, and G. Wang, A fast clustering-based feature subset selection algorithm for high-dimensional data, IEEE Transactions on Knowledge and Data Engineering, vol.25, issue.1, pp.1-14, 2011.

H. Liu, X. Wu, and S. Zhang, Feature selection using hierarchical feature clustering, Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM '11), pp.24-28, 2011.

X. Zhao, W. Deng, and Y. Shi, Feature selection with attributes clustering by maximal information coefficient, Procedia Computer Science, vol.17, pp.70-79, 2013.

H. Liu and H. Motoda, Feature extraction, construction and selection: A data mining perspective. The Springer International Series in Engineering and Computer Science, 1998.

Y. Sun, S. Todorovic, and S. Goodison, Local-learning-based feature selection for high-dimensional data analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1610-1626, 2010.

D. W. Aha, Incremental constructive induction: An instance-based approach, Machine Learning Proceedings, pp.117-121, 1991.

J. P. Callan, T. Fawcett, and E. L. Rissland, CABOT: An adaptive approach to case-based search, Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI), vol.2, pp.24-30, 1991.

I. Kononenko, E. Šimec, and M. Robnik-Šikonja, Overcoming the myopia of inductive learning algorithms with ReliefF, Applied Intelligence, vol.7, issue.1, pp.39-55, 1997.

I. Kononenko and M. Robnik-Šikonja, Non-myopic feature quality evaluation with (R)ReliefF, Computational Methods of Feature Selection, pp.169-191, 2007.

I. Kononenko, M. Robnik-Šikonja, and U. Pompe, ReliefF for estimation and discretization of attributes in classification, regression, and ILP problems, Artificial Intelligence: Methodology, Systems, Applications, pp.31-40, 1996.

R. Kohavi and G. H. John, Wrappers for feature subset selection, Artificial Intelligence, vol.97, issue.1-2, pp.273-324, 1997.

M. Robnik-Šikonja and I. Kononenko, An adaptation of Relief for attribute estimation in regression, Proceedings of the 14th International Conference on Machine Learning (ICML'97), pp.8-12, 1997.

R. Gilad-Bachrach, A. Navot, and N. Tishby, Margin based feature selection - theory and algorithms, Proceedings of the 21st International Conference on Machine Learning, pp.4-8, 2004.

R. E. Schapire, Y. Freund, P. Bartlett, and W. Lee, Boosting the margin: A new explanation for the effectiveness of voting methods, The Annals of Statistics, vol.26, issue.5, pp.1651-1686, 1998.

I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, Feature extraction: foundations and applications, 2008.

B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, Proceedings of the 5th Annual Workshop on Computational Learning Theory, pp.27-29, 1992.

K. Crammer, R. Gilad-Bachrach, A. Navot, and N. Tishby, Margin analysis of the LVQ algorithm, Proceedings of the 15th International Conference on Neural Information Processing Systems, pp.9-14, 2002.

M. Yang and J. Song, A novel hypothesis-margin based approach for feature selection with side pairwise constraints, Neurocomputing, vol.73, pp.2859-2872, 2010.

B. Draper, C. Kaito, and J. Bins, Iterative relief, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, vol.6, pp.16-22, 2003.

A. Moujahid and F. Dornaika, Feature selection for spatially enhanced LBP: application to face recognition, International Journal of Data Science and Analytics, vol.5, issue.1, pp.11-18, 2018.

C. Stover and E. W. Weisstein, Closed-form solution, MathWorld - A Wolfram Web Resource, 2019.

E. K. P. Chong and S. H. Żak, An introduction to optimization, Wiley Series in Discrete Mathematics and Optimization, 2013.

G. Mclachlan and T. Krishnan, The EM algorithm and extensions, Wiley Series in Probability and Statistics, 2007.

Y. Cheng, Y. Cai, Y. Sun, and J. Li, Semi-supervised feature selection under Logistic I-Relief framework, Proceedings of the 19th International Conference on Pattern Recognition (ICPR), pp.8-11, 2008.

B. Tang and L. Zhang, Semi-supervised feature selection based on logistic I-RELIEF for multi-classification, Proceedings of the 15th Pacific Rim International Conference on Artificial Intelligence, pp.28-31, 2018.

B. Tang and L. Zhang, Multi-class semi-supervised logistic I-RELIEF feature selection based on nearest neighbor, Proceedings of the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.14-17, 2019.

J. Bins and B. A. Draper, Feature selection from huge feature sets, Proceedings of the 18th IEEE International Conference on Computer Vision (ICCV'01), pp.7-14, 2001.

Y. Sun, S. Todorovic, and S. Goodison, A feature selection algorithm capable of handling extremely large data dimensionality, Proceedings of the 8th SIAM International Conference on Data Mining, pp.24-26, 2008.

S. Hijazi, M. Kalakech, D. Hamad, and A. Kalakech, Feature selection approach based on hypothesis-margin and pairwise constraints, Proceedings of the IEEE Middle East and North Africa Communications Conference (MENA-COMM), pp.18-20, 2018.

C. Krier, D. François, F. Rossi, and M. Verleysen, Feature clustering and mutual information for the selection of variables in spectral data, Proceedings of the 15th European Symposium on Artificial Neural Networks (ESANN), pp.25-27, 2007.

J. Jiao, X. Mo, and C. Shen, Image clustering via sparse representation, Proceedings of the 16th International Conference on Multimedia Modeling, pp.6-8, 2010.

D. M. Witten and R. Tibshirani, A framework for feature selection in clustering, Journal of the American Statistical Association, vol.105, issue.490, pp.713-726, 2010.

X. Huang, L. Zhang, B. Wang, Z. Zhang, and F. Li, Feature weight estimation based on dynamic representation and neighbor sparse reconstruction, Pattern Recognition, vol.81, pp.388-403, 2018.

Y. Zhu, X. Zhang, R. Wang, W. Zheng, and Y. Zhu, Self-representation and PCA embedding for unsupervised feature selection, World Wide Web, vol.21, issue.6, pp.1675-1688, 2018.

L. Qiao, S. Chen, and X. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition, vol.43, issue.1, pp.331-341, 2010.

J. Xu, G. Yang, H. Man, and H. He, L1 graph based on sparse coding for feature selection, Proceedings of the 10th International Symposium on Neural Networks, pp.4-6, 2013.

X. Zhu, X. Li, S. Zhang, C. Ju, and X. Wu, Robust joint graph sparse coding for unsupervised spectral feature selection, IEEE Transactions on Neural Networks and Learning Systems, vol.28, issue.6, pp.1263-1275, 2017.

C. Hou, F. Nie, X. Li, D. Yi, and Y. Wu, Joint embedding learning and sparse regression: A framework for unsupervised feature selection, IEEE Transactions on Cybernetics, vol.44, issue.6, pp.793-804, 2014.

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Proceedings of the 14th International Conference on Neural Information Processing Systems, pp.3-8, 2001.

J. Liu, S. Ji, and J. Ye, SLEP: Sparse learning with efficient projections, 2009.

T. Ronan, Z. Qi, and K. M. Naegle, Avoiding common pitfalls when clustering biological data, Science Signaling, vol.9, issue.432, pp.6-6, 2016.

U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra et al., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences, vol.96, issue.12, pp.6745-6750, 1999.

T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, vol.286, issue.5439, pp.531-537, 1999.

S. J. Hong, Use of contextual information for feature ranking and discretization, IEEE Transactions on Knowledge and Data Engineering, vol.9, issue.5, pp.718-730, 1997.

H. Liu, L. Yu, M. Dash, and H. Motoda, Active feature selection using classes, Proceedings of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.474-485, 2003.

A. Kumar Shekar, T. Bocklisch, P. I. Sánchez, C. N. Straehle, and E. Müller, Including multi-feature interactions and redundancy for feature ranking in mixed datasets, Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.18-22, 2017.

C. Xiong, D. M. Johnson, and J. J. Corso, Active clustering with model-based uncertainty reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, issue.1, pp.5-17, 2017.

Z. Lu and M. A. Carreira-Perpiñán, Constrained spectral clustering through affinity propagation, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp.24-26, 2008.

K. Wagstaff, C. Cardie, S. Rogers, and S. Schrödl, Constrained k-means clustering with background knowledge, Proceedings of the 18th International Conference on Machine Learning (ICML'01), pp.577-584, 2001.

P. K. Mallapragada, R. Jin, and A. K. Jain, Active query selection for semi-supervised clustering, Proceedings of the 19th IEEE International Conference on Pattern Recognition (ICPR), pp.8-11, 2008.

A. A. Abin and H. Beigy, Active selection of clustering constraints: a sequential approach, Pattern Recognition, vol.47, issue.3, pp.1443-1458, 2014.

F. L. Wauthier, N. Jojic, and M. I. Jordan, Active spectral clustering via iterative uncertainty reduction, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.12-16, 2012.

K. Benabdeslem, H. Elghazel, and M. Hindawi, Ensemble constrained laplacian score for efficient and robust semi-supervised feature selection, Knowledge and Information Systems, vol.49, issue.3, pp.1161-1185, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01233971

G. W. Stewart and J.-G. Sun, Matrix perturbation theory, 1990.

D. Klein, S. D. Kamvar, and C. D. Manning, From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering, 2002.

Q. Xu, M. Desjardins, and K. L. Wagstaff, Active constrained clustering by examining spectral eigenvectors, Proceedings of the 8th International Conference on Discovery Science, pp.8-11, 2005.

Z. Fu and Z. Lu, Pairwise constraint propagation: A survey, 2015.

S. D. Kamvar, D. Klein, and C. D. Manning, Spectral learning, Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp.9-15, 2003.

Z. Lu and H. H. S. Ip, Constrained spectral clustering via exhaustive and efficient constraint propagation, Proceedings of the 11th European Conference on Computer Vision, pp.5-11, 2010.

B. J. Frey and D. Dueck, Clustering by passing messages between data points, Science, vol.315, issue.5814, pp.972-976, 2007.

I. Givoni and B. Frey, Semi-supervised affinity propagation with instance-level constraints, Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, pp.16-18, 2009.

Z. Li, J. Liu, and X. Tang, Pairwise constraint propagation by semidefinite programming for semi-supervised classification, Proceedings of the 25th International Conference on Machine Learning, pp.5-9, 2008.

O. Zoidi, A. Tefas, N. Nikolaidis, and I. Pitas, Person identity label propagation in stereo videos, IEEE Transactions on Multimedia, vol.16, pp.1358-1368, 2014.

D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, Learning with local and global consistency, Proceedings of the 16th International Conference on Neural Information Processing Systems, pp.8-13, 2003.

X. Wang, J. Wang, B. Qian, F. Wang, and I. Davidson, Self-taught spectral clustering via constraint augmentation, Proceedings of the 14th SIAM International Conference on Data Mining, pp.24-26, 2014.

S. Basu, A. Banerjee, and R. J. Mooney, Active semi-supervision for pairwise constrained clustering, Proceedings of the 4th SIAM International Conference on Data Mining, pp.22-24, 2004.

Y. Jiang and J. Ren, Eigenvector sensitive feature selection for spectral clustering, Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.10-14, 2011.

L. Huang, D. Yan, N. Taft, and M. I. Jordan, Spectral clustering with perturbed data, Proceedings of the 21st International Conference on Neural Information Processing Systems, pp.8-11, 2008.

H. Ning, W. Xu, Y. Chi, Y. Gong, and T. Huang, Incremental spectral clustering by efficiently updating the eigen-system, Pattern Recognition, vol.43, pp.113-127, 2010.

H. Liu, H. Motoda, and L. Yu, A selective sampling approach to active feature selection, Artificial Intelligence, vol.159, issue.1-2, pp.49-74, 2004.

A. Rodriguez and A. Laio, Clustering by fast search and find of density peaks, Science, vol.344, issue.6191, pp.1492-1496, 2014.

S. Balakrishnama and A. Ganapathiraju, Linear discriminant analysis - a brief tutorial, Institute for Signal and Information Processing, vol.18, pp.1-8, 1998.

S. Hijazi, M. Kalakech, D. Hamad, and A. Kalakech, Feature selection approach based on hypothesis-margin and pairwise constraints, IEEE Middle East and North Africa Communications Conference (MENACOMM), pp.1-6, 2018.

S. Hijazi, R. Fawaz, M. Kalakech, A. Kalakech, and D. Hamad, Top development indicators for Middle Eastern countries, IEEE Sixth International Conference on Digital Information, Networking, and Wireless Communications (DINWC), pp.98-102, 2018.

Accepted and submitted articles to international journals:

S. Hijazi, D. Hamad, M. Kalakech, and A. Kalakech, Advances in Data Analysis and Classification, 40 pages, 2019.

S. Hijazi, D. Hamad, M. Kalakech, and A. Kalakech, A constrained feature selection approach based on feature clustering and hypothesis margin maximization, International Journal of Data Science and Analytics, 27 pages, submitted 2020.

The running time (in ms) for Simba-Sc when considering the maximum number of cannot-link constraints and the maximum number of starting points, i.e. 220 for ColonCancer and 297 for Leukemia. Each constraint is randomly selected without replacement.

The running times (in ms) for each of the Fisher and Laplacian scores, averaged over 20 runs on the two high-dimensional datasets.


Average execution time (in ms) of the different unsupervised, supervised, and semi-supervised feature selection methods over 10 independent runs.


The highest classification accuracy rates Acc (in %) obtained using the 1-NN classifier on the features ranked by the ReliefF-Sc algorithm with RCG and ACS, where d represents the dimension at which Acc is reached and F represents the original feature space.

The highest classification accuracy rates Acc (in %) obtained using the 1-NN classifier on the features ranked by ReliefF-Sc before and after constraint propagation, with d representing the dimension at which Acc is reached.

The highest accuracy rates Acc (in %) and their corresponding dimensions d using 1-NN and SVM classifiers vs. different numbers of selected features obtained by supervised (Fisher, ReliefF), unsupervised (Laplacian) and constrained ReliefF-Sc, Simba-Sc, CS1, CS2, CS4 and CLS algorithms with RCG and ACS on 2 UCI datasets: Wine and Segmentation; and 1 high-dimensional gene-expression dataset: ColonCancer.

The two types of nearest neighbor margin. We show how to measure the margin (radius of the dotted circles) with respect to a new data point considering a set of labeled data points: (a) the sample-margin, i.e. the distance between the considered data point and the decision boundary; (b) the hypothesis-margin.
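For reference, the hypothesis-margin pictured in this figure is the quantity introduced by Gilad-Bachrach et al. (cited in the bibliography above): writing nearhit(x) for the nearest neighbor of x with the same label and nearmiss(x) for the nearest neighbor with a different label,

$$\theta(x) = \frac{1}{2}\Big(\lVert x - \mathrm{nearmiss}(x)\rVert - \lVert x - \mathrm{nearhit}(x)\rVert\Big),$$

so a positive margin means x lies closer to its own class than to any other class.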

The evolution of the nearmiss and nearhit notions from the supervised context to the semi-supervised constrained one.

Classification accuracy vs. different numbers of selected features on 4 UCI datasets: (a) Sonar; (b) Soybean; (c) Wine; (d) Heart. The number of constraints used for each dataset is presented in Table 3.1.

Classification accuracy vs. different numbers of selected features on 2 high-dimensional gene-expression datasets using 1-NN: (a) ColonCancer; (b) Leukemia. The number of constraints used for each dataset is presented in Table 3.1.

Classification accuracies after successive runs of Relief-Sc and Simba-Sc on (a) Wine; (b) Heart, with a fixed set of cannot-link constraints.

The averaged classification accuracy rates using the C4.5 classifier vs. the number of ranked features obtained by the constrained algorithms Simba-Sc, Relief-Sc and the proposed FCRSC over 10 independent runs on (a) Wine, (c) WDBC, and (e) Ionosphere datasets. The averaged Representation Entropy (RE) of each dataset is also shown in (b), (d), and (f), respectively.

The averaged classification accuracy rates using the C4.5 classifier vs. the number of ranked features obtained by the constrained algorithms Simba-Sc, Relief-Sc and the proposed FCRSC over 10 independent runs on (a) Spambase, (c) Sonar, and (e) Arrhythmia datasets. The averaged Representation Entropy (RE) of each dataset is also shown in (b), (d), and (f), respectively.

Wine dataset: the averaged classification accuracy rates using (a) K-NN, (b) SVM, and (c) NB classifiers vs. the number of ranked features obtained by Variance score, Laplacian score, ReliefF, mRMR and the proposed FCRSC over 10 independent runs.

WDBC dataset: the averaged classification accuracy rates using (a) K-NN, (b) SVM, and (c) NB classifiers vs. the number of ranked features obtained by Variance score, Laplacian score, ReliefF, mRMR and the proposed FCRSC over 10 independent runs.

Ionosphere dataset: the averaged classification accuracy rates using (a) K-NN, (b) SVM, and (c) NB classifiers vs. the number of ranked features obtained by Variance score, Laplacian score, ReliefF, mRMR and the proposed FCRSC over 10 independent runs.

Spambase dataset: the averaged classification accuracy rates using (a) K-NN, (b) SVM, and (c) NB classifiers vs. the number of ranked features obtained by Variance score, Laplacian score, ReliefF, mRMR and the proposed FCRSC over 10 independent runs.

Sonar dataset: the averaged classification accuracy rates using (a) K-NN, (b) SVM, and (c) NB classifiers vs. the number of ranked features obtained by Variance score, Laplacian score, ReliefF, mRMR and the proposed FCRSC over 10 independent runs.

Arrhythmia dataset: the averaged classification accuracy rates using (a) K-NN, (b) SVM, and (c) NB classifiers vs. the number of ranked features obtained by Variance score, Laplacian score, ReliefF, mRMR and the proposed FCRSC over 10 independent runs.

Three-dimensional example dataset: (a) dataset; (b) 3D-scatter of data points; (c),(d),(e) data projected on features A1, A2, A3 respectively; (f) shows RCG; (g) shows the ACS by Algorithm 4.1; (h) shows the results of features ranked by different methods; (i) shows changes in v2 during ACS.

The effects of propagating cannot-link constraints to the data in (a): knowing that the true partitioning is shown in (c) and that Feature 2 should be selected, two instance-level constraints alone may not be enough to select Feature 2 based on Relief-family distance-based feature selection, whereas a stronger space-level propagation changes the results to clearly select Feature 2 as the discriminative feature (c).

The resulting propagation of our previously actively selected constraints: (a) propagation of the initial random constraints (6, 2) and (6, 1); (b) propagation of the actively selected constraints (5, 2) and (3, 6) by Algorithm 4.1; and (c) improved feature selection results after the propagation of constraints.

Relationship between the terms used in the performance measures: Precision and Distance.

Accuracy rates using the 1-NN classifier vs. different numbers of selected features obtained by ReliefF-Sc+RCG and ReliefF-Sc+ACS on 4 UCI datasets: (a) Soybean; (b) Wine; (c) Heart; (d) Sonar; and on 2 high-dimensional gene-expression datasets: (e) ColonCancer; (f) Leukemia.

Accuracy rates using the 1-NN classifier vs. different numbers of selected features obtained by ReliefF-Sc+ACS and ReliefF-Sc+PACS on 4 UCI datasets: (a) Soybean; (b) Wine; (c) Heart; (d) Sonar; and 2 high-dimensional gene-expression datasets: (e) ColonCancer; (f) Leukemia.

Accuracy rates over the first 50% ranked features using the 1-NN classifier vs. different numbers of constraints.

Accuracy rates using 1-NN and SVM classifiers vs. different numbers of selected features obtained by supervised (Fisher, ReliefF), unsupervised (Laplacian) and constrained ReliefF-Sc and Simba-Sc algorithms with RCG and ACS on 2 UCI datasets: (a,b) Wine; (c,d) Segmentation; and 1 high-dimensional gene-expression dataset: (e,f) ColonCancer.

Accuracy rates obtained on the first 50% ranked features by the margin-based constrained ReliefF-Sc and Simba-Sc algorithms with RCG and ACS, followed by 1-NN and SVM classifiers, vs. different numbers of constraints on 2 UCI datasets: (a,b) Wine; (c,d) Segmentation; and 1 high-dimensional gene-expression dataset: (e,f) ColonCancer. Fisher, ReliefF and Laplacian do not use constraints and serve as supervised and unsupervised baselines.

Three groups of two bar graphs each. Every two graphs in a row respectively show the Precision Pr and Distance Dt measures of the first 50% ranked features by ReliefF-Sc, Simba-Sc, CS1, and CS2.

Three examples illustrating the notions of relevance and redundancy: (a) shows two irrelevant features; (b) shows one relevant feature (feature 2) and one irrelevant feature (feature 1); and (c) shows two redundant features.

In the first chapter, we present the definitions of dimensionality reduction, feature extraction, feature selection, and feature relevance and redundancy. We also present the main data notations and knowledge representation, as well as graph-based data construction methods. In addition, we categorize the feature selection process according to the availability of supervision information (class labels and pairwise constraints) and according to the evaluation performance criterion.

In the second chapter, we review the most popular filter-type Relief algorithms and emphasize the importance of margins in these algorithms. The original supervised Relief algorithm is explained in detail, with emphasis on its strengths, its weaknesses, and its applications in different contexts as a context-sensitive algorithm. We also cover all the Relief variants and extensions proposed to deal with noisy, incomplete, and multi-class data. The chapter is divided into four main sections.
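The weight update at the heart of Relief is compact enough to sketch. Below is a minimal Python sketch of the original supervised Relief of Kira and Rendell (cited in the bibliography); it assumes features are scaled to [0, 1] and uses the L1 distance, and the function name and defaults are illustrative rather than the thesis's implementation.

```python
import numpy as np

def relief(X, y, n_iter=100, rng=None):
    """Minimal Relief sketch: reward features that differ on the nearest
    miss (other class) and agree on the nearest hit (same class)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        x = X[i]
        dists = np.abs(X - x).sum(axis=1)   # L1 distance to every sample
        dists[i] = np.inf                   # never pick the point itself
        same = (y == y[i])
        hit = X[np.where(same)[0][np.argmin(dists[same])]]
        miss = X[np.where(~same)[0][np.argmin(dists[~same])]]
        w += np.abs(x - miss) - np.abs(x - hit)
    return w / n_iter                       # rank features by descending w
```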

In the third chapter, we present Relief-Sc and its robust version ReliefF-Sc. Their objective is to reduce the high dimensionality of data by finding a subset of unique relevant features.
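The exact Relief-Sc update is the one given in Chapter 3 of the thesis; as a rough, assumption-laden Python sketch of the idea summarized above (the two points of a cannot-link constraint play the nearmiss role for each other, while the nearest remaining neighbor plays the nearhit role), one might write the following. The function name and the neighbor rule are hypothetical, not the thesis's definition.

```python
import numpy as np

def relief_sc_sketch(X, cannot_link):
    """Hypothetical Relief-Sc-style score driven only by cannot-link
    pairs: the partner point acts as the nearmiss, and the nearest
    remaining point acts as the nearhit."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    for i, j in cannot_link:
        for a, b in ((i, j), (j, i)):
            x = X[a]
            dists = np.abs(X - x).sum(axis=1)
            dists[[a, b]] = np.inf      # exclude self and the partner
            hit = X[np.argmin(dists)]   # nearest neighbor as nearhit
            miss = X[b]                 # cannot-link partner as nearmiss
            w += np.abs(x - miss) - np.abs(x - hit)
    return w / max(1, 2 * len(cannot_link))
```

Features would then be ranked by descending w, mirroring the ReliefF-Sc rankings evaluated in the figure list above.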