Title
A similarity-based approach for data stream classification
Date Issued
01 July 2014
Access level
metadata only access
Resource Type
journal article
Author(s)
University Pablo de Olavide
Abstract
Incremental learning techniques have been used extensively to address the data stream classification problem. The most important issue is to maintain a balance between accuracy and efficiency, i.e., the algorithm should provide good classification performance with a reasonable response time. This work introduces a new technique, named Similarity-based Data Stream Classifier (SimC), which achieves good performance by introducing a novel insertion/removal policy that adapts quickly to the trend of the data and maintains a small, representative set of examples and estimators that guarantees good classification rates. The methodology is also able to detect novel classes/labels during the running phase and to remove useless ones that do not add any value to the classification process. Statistical tests were used to evaluate model performance from two points of view: efficacy (classification rate) and efficiency (online response time). Five well-known techniques were compared across sixteen data streams using the Friedman test. To find out which schemes were significantly different, the Nemenyi, Holm and Shaffer tests were also considered. The results show that SimC is very competitive in terms of accuracy (absolute and streaming) and classification/updating time when compared with several of the most popular methods in the literature. © 2014 Elsevier Ltd. All rights reserved.
Start page
4224
End page
4234
Volume
41
Issue
9
Language
English
OECD Knowledge area
Computer and Information Sciences
Scopus EID
2-s2.0-84893460871
Source
Expert Systems with Applications
ISSN of the container
0957-4174
DOI of the container
10.1016/j.eswa.2013.12.041
Source funding
Ministerio de Ciencia e Innovación
Asociación Universitaria Iberoamericana de Postgrado
Sponsor(s)
Special thanks to Asociación Universitaria Iberoamericana de Postgrado (AUIP) for funding the research visits, and the Spanish Ministry of Science and Innovation for supporting the research under Grant TIN2011-28956-C02-01. We are grateful to the anonymous referees for their invaluable suggestions to improve the paper.
Sources of information: Directorio de Producción Científica Scopus