KNN-LC: Classification in Unbalanced Datasets using a KNN-Based Algorithm and Local Centralities

Omar Jaafor; B. Birregah

doi:10.1007/978-3-030-13697-0_7

Book chapter, Year: 2020

Omar Jaafor

Role: Author

Laboratoire Modélisation et Sûreté des Systèmes

B. Birregah

Role: Author
PersonId: 747491
IdHAL: babiga-birregah
ORCID: 0000-0002-1274-3034
IdRef: 181479141

Laboratoire Modélisation et Sûreté des Systèmes

Abstract

Classification is one of the central topics in machine learning. Yet most of the algorithms that solve the classification problem assume that the training datasets are balanced. While this assumption is reasonable for many classification problems, it often does not hold. For example, application domains such as fraud and spam detection are characterized by highly unbalanced classes, in which malicious examples are far less numerous than benign ones. This paper proposes a KNN-based algorithm adapted to unbalanced classes. The algorithm precomputes the distances within the training set, as well as a centrality score for every training item. It then weights the distances between each item to be classified and its K nearest training neighbors, accounting for the distribution of distances in every class and for the centrality (and outlierness) of those neighbors. This reduces the noise from outliers of the majority class and increases the weights of central data points, allowing the proposed algorithm to achieve high accuracy as well as a high true positive rate (TPR) on the minority class.
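The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' KNN-LC formula, which this record does not give. The following choices are assumptions made purely for illustration: PageRank computed on a directed k-NN graph of the training set serves as the centrality score; each class's distance scale is the mean pairwise distance between its members; and each neighbor's vote is its centrality divided by its class-normalized distance. The function names `fit` and `predict` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def fit(X, y, k, damping=0.85, iters=100):
    """Hypothetical sketch (not the paper's method): precompute training distances,
    a PageRank centrality per training item, and a per-class distance scale.
    Assumes at least two training items."""
    n = len(X)
    D = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Directed k-NN graph: edge i -> j when j is among the k nearest training items of i.
    out = [sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
           for i in range(n)]
    # PageRank by power iteration; every node has out-edges, so no mass is lost.
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            share = damping * rank[i] / len(out[i])
            for j in out[i]:
                new[j] += share
        rank = new
    # Assumed per-class scale: mean pairwise distance between members of the same class.
    members = defaultdict(list)
    for i, label in enumerate(y):
        members[label].append(i)
    scale = {}
    for label, idx in members.items():
        pairs = [D[i][j] for i, j in combinations(idx, 2)]
        mean = sum(pairs) / len(pairs) if pairs else 0.0
        scale[label] = mean if mean > 0 else 1.0
    return {"X": X, "y": y, "k": k, "rank": rank, "scale": scale}


def predict(model, x):
    X, y, k = model["X"], model["y"], model["k"]
    n = len(X)
    dists = [euclidean(x, xi) for xi in X]
    neighbors = sorted(range(n), key=lambda j: dists[j])[:k]
    votes = defaultdict(float)
    for j in neighbors:
        # Distance relative to the neighbor's own class spread, so neighbors from a
        # sparse (often minority) class are not penalized just for being farther away.
        rel = dists[j] / model["scale"][y[j]]
        # Central neighbors count more; rank * n has mean 1 over the training set,
        # so low-centrality outliers contribute less than typical points.
        votes[y[j]] += model["rank"][j] * n / (rel + 1e-9)
    return max(votes, key=votes.get)
```

A toy usage, with nine majority points near the origin and three minority points near (5, 5): `model = fit(X, y, k=3)` followed by `predict(model, (5.2, 5.2))` votes for the minority class, since all three nearest neighbors belong to it. Dividing by a class-specific scale is one simple way to account for differing distance distributions across classes; the paper may use a different statistic.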

Keywords

Classification; Unbalanced classes; KNN; Centrality; PageRank

Domains

Data Structures and Algorithms [cs.DS]

Contributor: Jean-Baptiste VU VAN

https://utt.hal.science/hal-02330150

Submitted on: Wednesday, 23 October 2019, 18:45:00

Last modified on: Friday, 12 January 2024, 16:47:12

Dates and versions

hal-02330150, version 1 (23-10-2019)

Identifiers

HAL Id: hal-02330150, version 1
DOI : 10.1007/978-3-030-13697-0_7

Cite

Omar Jaafor, B. Birregah. KNN-LC: Classification in Unbalanced Datasets using a KNN-Based Algorithm and Local Centralities. Data-Driven Modeling for Sustainable Engineering, Springer, Cham, pp.85-97, 2020, 978-3-030-13697-0. ⟨10.1007/978-3-030-13697-0_7⟩. ⟨hal-02330150⟩

Collections

CNRS, UTT, UTT-LIST3N, LM2S-UTT