Title
Cluster of Reuters 21578 collections using genetic algorithms and NZIPF method
Date Issued
01 January 2009
Access level
metadata only access
Resource Type
conference paper
Author(s)
Del Castillo J.R.F.
Sotos L.G.
University of Alcalá
Abstract
In this paper, we discuss a feature reduction technique and its application to document clustering, showing that feature reduction improves both efficiency and accuracy. Our method, called NZIPF, uses Zipf's law to select terms from a suitable transition zone starting at the Goffman point. We demonstrate experimentally that the transition zone giving the best results consists of the 40 terms starting from the Goffman point, when the documents are clustered with an unsupervised genetic algorithm. The experiments are carried out on the Reuters 21578 collection, and the documents are grouped by new genetic operators designed to find the affinity and similarity of documents without prior knowledge of other characteristics. © 2009 IADIS.
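The term selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' NZIPF implementation: the abstract does not specify how the transition zone is built, so the window rule below (start at the first frequency-ranked term whose frequency does not exceed the transition point, then take the next `n_terms` terms) is an assumption, and the function names are hypothetical. The transition point uses the commonly cited Goffman formula T = (sqrt(1 + 8 * I1) - 1) / 2, where I1 is the number of terms that occur exactly once.

```python
import math
from collections import Counter


def goffman_transition_point(freqs):
    """Return T = (sqrt(1 + 8 * I1) - 1) / 2, where I1 counts terms occurring once."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(1 + 8 * i1) - 1) / 2


def select_transition_terms(tokens, n_terms=40):
    """Pick n_terms terms starting at the transition point (assumed window rule)."""
    freqs = Counter(tokens)
    t = goffman_transition_point(freqs)
    # Rank terms by decreasing frequency; ties are broken alphabetically
    # so that the result is deterministic.
    ranked = sorted(freqs, key=lambda w: (-freqs[w], w))
    # Assumption: the window starts at the first ranked term whose
    # frequency is less than or equal to the transition point.
    start = next((i for i, w in enumerate(ranked) if freqs[w] <= t), len(ranked))
    return ranked[start:start + n_terms]
```

Under these assumptions, `select_transition_terms(tokens)` with the default `n_terms=40` would yield the reduced vocabulary over which the documents are represented before clustering; the genetic algorithm itself is outside the scope of this sketch.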
Start page
174
End page
176
Language
English
OECD Knowledge area
Other engineering and technologies
Scopus EID
2-s2.0-77955621567
ISBN
9789728924881
Source
Proceedings of the IADIS European Conference on Data Mining 2009, ECDM'09, Part of the IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2009
Sources of information: Directorio de Producción Científica, Scopus