Theses

Semi-supervised Margin-based Feature Selection for Classification

Abstract : Feature selection is a preprocessing step that is crucial to the performance of machine learning algorithms. It reduces computational cost, improves classification performance, and yields simple, understandable models. Recently, pairwise constraints, a cheaper kind of supervision information that does not require revealing the class labels of data points, have received a great deal of interest in the domain of feature selection. Accordingly, we first proposed a semi-supervised margin-based constrained feature selection algorithm called Relief-Sc. It is a modification of the well-known Relief algorithm, derived from its optimization perspective. It uses cannot-link constraints only and solves a simple convex problem in closed form, giving a unique solution. However, we noticed that in the literature these pairwise constraints are generally provided passively and generated randomly over multiple algorithmic runs, over which the results are averaged. This leads to the need for a large number of constraints that might be redundant, unnecessary, and under some circumstances even detrimental to the algorithm’s performance. It also masks the individual effect of each constraint set and introduces a human labor cost. Therefore, we suggested a framework for actively selecting and then propagating constraints for feature selection. For that, we made use of the similarity matrix associated with the graph Laplacian. We assumed that when a small perturbation of the similarity value between a pair of data points leads to a better-separated cluster indicator based on the second eigenvector of the graph Laplacian, this pair is expected to be a pairwise query of higher and more significant impact. Constraint propagation, on the other hand, increases the supervision information while decreasing the cost of human labor.
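As an illustration of the cluster indicator mentioned above, the following is a minimal sketch of computing the second eigenvector of a graph Laplacian (often called the Fiedler vector) from a similarity matrix. It is not the selection criterion of the thesis: the Gaussian similarity, the unnormalized Laplacian, the `sigma` parameter, the function name `fiedler_vector` and the toy data are all assumptions made for this example.

```python
import numpy as np

def fiedler_vector(X, sigma=1.0):
    """Second-smallest eigenpair of the unnormalized graph Laplacian.

    Assumption: a fully connected graph with Gaussian similarities.
    """
    # Pairwise squared Euclidean distances between all rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian similarity matrix, with no self-loops.
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized Laplacian L = D - W, where D is the degree matrix.
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[1], eigvecs[:, 1]

# Toy data (hypothetical): two well-separated groups of three points each.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [3.0, 3.0], [3.1, 3.0], [3.0, 3.1],
])
lam2, v = fiedler_vector(X)
# The sign pattern of the second eigenvector acts as a two-cluster indicator.
labels = (v > 0).astype(int)
```

In an active-selection setting, one could perturb a single entry of `W` and observe how the separation of `v` changes; how the thesis scores such perturbations is not specified in this abstract.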
Besides, to handle feature redundancy, we proposed extending Relief-Sc to a feature selection approach that combines feature clustering and hypothesis-margin maximization. This approach addresses the two core aspects of feature selection, i.e. maximizing relevance while minimizing redundancy (maximizing diversity) among features. Finally, we experimentally validated the proposed algorithms against other known feature selection methods on multiple well-known UCI benchmark datasets, where they proved effective. With only little supervision information, the proposed algorithms proved comparable to supervised feature selection algorithms and superior to unsupervised ones.
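The hypothesis-margin idea behind Relief-style methods can be sketched as follows. This is a hedged illustration, not the Relief-Sc algorithm as defined in the thesis. It assumes that each cannot-link pair contributes the per-feature difference between the two constrained points minus the per-feature difference to the first point's nearest neighbor (used here as a stand-in for the near-hit), and that the weights solve the usual Relief margin objective, maximize w·z subject to ||w||₂ = 1 and w ≥ 0, whose closed-form solution is w = (z)⁺ / ||(z)⁺||₂. The function name `margin_weights` and the toy data are hypothetical.

```python
import numpy as np

def margin_weights(X, cannot_links):
    """Feature weights from cannot-link pairs (illustrative sketch only)."""
    n_samples, n_features = X.shape
    z = np.zeros(n_features)
    for a, b in cannot_links:
        # Assumption: the nearest neighbor of point a (L1 distance,
        # excluding a itself) approximates its same-class near-hit.
        dists = np.abs(X - X[a]).sum(axis=1)
        dists[a] = np.inf
        near = int(np.argmin(dists))
        # Per-feature margin: far from the cannot-link partner,
        # close to the assumed near-hit.
        z += np.abs(X[a] - X[b]) - np.abs(X[a] - X[near])
    # Closed form: keep the positive part of z and normalize to unit length.
    z_pos = np.maximum(z, 0.0)
    norm = np.linalg.norm(z_pos)
    return z_pos / norm if norm > 0 else z_pos

# Toy data (hypothetical): feature 0 separates two groups, feature 1 is noise.
X = np.array([
    [0.0, 0.5], [0.1, 0.9], [0.2, 0.1],
    [1.0, 0.4], [1.1, 0.8], [0.9, 0.2],
])
cannot_links = [(0, 3), (1, 4), (2, 5)]
w = margin_weights(X, cannot_links)
```

On this toy data the accumulated margin is positive for the first feature and negative for the second, so the noisy feature receives zero weight after the positive-part step.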
Cited literature : 204 references

https://tel.archives-ouvertes.fr/tel-02489733
Contributor : Abes Star
Submitted on : Monday, February 24, 2020 - 4:01:08 PM
Last modification on : Friday, October 23, 2020 - 4:45:54 PM
Long-term archiving on : Monday, May 25, 2020 - 6:17:34 PM

File

these_Hijazi_Samah.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02489733, version 1

Citation

Samah Hijazi. Semi-supervised Margin-based Feature Selection for Classification. Artificial Intelligence [cs.AI]. Université du Littoral Côte d'Opale; École Doctorale des Sciences et de Technologie (Beyrouth), 2019. English. ⟨NNT : 2019DUNK0546⟩. ⟨tel-02489733⟩

Metrics

Record views : 125

Files downloads : 148