
KNN-LC: Classification in Unbalanced Datasets using a KNN-Based Algorithm and Local Centralities

Abstract : Classification is one of the most central topics in machine learning. Yet most algorithms that solve the classification problem assume that the training datasets are balanced. While this assumption is reasonable for many classification problems, it often does not hold. For example, application domains such as fraud and spam detection are characterized by highly unbalanced classes, where examples of malicious items are far less numerous than benign ones. This paper proposes a KNN-based algorithm adapted to unbalanced classes. The algorithm precomputes the distances within the training set as well as a centrality score for every training item. It then weights the distances between each item to be classified and its K nearest training neighbors, accounting for the distribution of distances in every class and the centrality (and outlierness) of the neighbors. This reduces the noise from outliers of the majority class and increases the weights of central data points, allowing the proposed algorithm to achieve high accuracy in addition to a high true positive rate (TPR) in the minority class.
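The general idea in the abstract can be illustrated with a minimal sketch. This is not the authors' KNN-LC algorithm, whose exact formulas are not given in this record. The centrality score (inverse of the mean distance to the nearest same-class training items), the per-class distance scale, the voting weight, and the names `fit`, `predict`, and `k_local` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit(points, labels, k_local=3):
    """Precompute training distances, a centrality score per item,
    and a typical distance scale per class (all hypothetical choices)."""
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    centrality = []
    scale_sum = defaultdict(float)
    count = defaultdict(int)
    for i in range(n):
        # Distances from item i to the other items of its own class.
        same = sorted(dist[i][j] for j in range(n)
                      if j != i and labels[j] == labels[i])
        nearest = same[:k_local]
        mean_d = sum(nearest) / len(nearest) if nearest else 0.0
        # Assumed centrality: high when same-class neighbors are close,
        # low for outliers that sit far from their own class.
        centrality.append(1.0 / (1.0 + mean_d))
        scale_sum[labels[i]] += mean_d
        count[labels[i]] += 1
    class_scale = {c: scale_sum[c] / count[c] for c in count}
    return {"points": points, "labels": labels,
            "centrality": centrality, "class_scale": class_scale}


def predict(model, query, k=3, eps=1e-9):
    """Weighted vote among the k nearest training items."""
    ranked = sorted((euclidean(query, p), i)
                    for i, p in enumerate(model["points"]))
    votes = defaultdict(float)
    for d, i in ranked[:k]:
        label = model["labels"][i]
        # Assumed weighting: rescale the distance by the class's typical
        # distance, then favor central neighbors over outliers.
        scaled = d / (model["class_scale"][label] + eps)
        votes[label] += model["centrality"][i] / (scaled + eps)
    return max(votes, key=votes.get)


# Toy data: a majority class (0) with one outlier near the minority class (1).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1, 0.5), (4.5, 4.5),
          (5, 5), (5, 5.4), (5.4, 5)]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
model = fit(points, labels)
print(predict(model, (4.8, 4.8)))
```

In this toy setup the majority-class outlier at (4.5, 4.5) receives a low centrality score, so its vote near the minority cluster counts for less than the votes of the central minority items.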
Document type :
Book sections

https://hal-utt.archives-ouvertes.fr/hal-02330150
Contributor : Jean-Baptiste Vu Van
Submitted on : Wednesday, October 23, 2019 - 6:45:00 PM
Last modification on : Thursday, October 24, 2019 - 1:44:53 AM

Collections

CNRS | ROSAS | UTT

Citation

Omar Jaafor, Babiga Birregah. KNN-LC: Classification in Unbalanced Datasets using a KNN-Based Algorithm and Local Centralities. Data-Driven Modeling for Sustainable Engineering, pp.85-97, In press, ⟨10.1007/978-3-030-13697-0_7⟩. ⟨hal-02330150⟩
