Title
Selection of statistically representative subset from a large data set
Date Issued
01 January 2017
Access level
metadata only access
Resource Type
conference paper
Author(s)
ALEXANDROV, MIKHAIL
SKITALINSKAYA, GABRIELLA
STEFANOVSKIY, DMITRY
Autonomous University of Barcelona
Moscow Institute of Physics and Technology
Russian Presidential Academy of National Economy and Public Administration
Publisher(s)
Springer Verlag
Abstract
Selecting a representative subset of objects is an effective way to process large data sets. This applies both to time-consuming automatic algorithms and to the manual study of object properties by experts. ‘Representativity’ is considered here in a narrow sense: the statistical distributions of the object parameters are equal for the subset and for the whole set. We propose a simple method for selecting such a subset, based on testing complex statistical hypotheses, including an artificial hypothesis introduced to avoid ambiguity. We demonstrate the method on two data sets: one concerns mobile communication companies in Russia, and the other concerns intercity bus services in Peru.
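As an illustration of the narrow notion of representativity described in the abstract, the following is a minimal Python sketch, not the authors' method. It assumes that a plain two-sample Kolmogorov-Smirnov comparison per parameter is an acceptable stand-in for the paper's complex and artificial hypotheses, which this record does not detail. The names `ks_statistic`, `is_representative`, and `select_subset` are hypothetical, and the constant 1.358 is the usual asymptotic coefficient for a 5% significance level.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def is_representative(subset, full, c_alpha=1.358):
    """Accept the subset if every parameter (column) passes the KS comparison.

    Assumption: rows are equal-length tuples of numeric parameters, and the
    asymptotic critical value c_alpha * sqrt((n + m) / (n * m)) is adequate.
    """
    n, m = len(subset), len(full)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return all(
        ks_statistic([row[j] for row in subset], [row[j] for row in full]) < critical
        for j in range(len(full[0]))
    )


def select_subset(full, size, max_tries=100, seed=0):
    """Draw random subsets until one is accepted; return None if none is found."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = rng.sample(full, size)
        if is_representative(candidate, full):
            return candidate
    return None
```

A call such as `select_subset(rows, 200)` on a list of numeric tuples returns either an accepted subset of 200 rows or `None`. Because the subset is drawn from the full set, the two samples are not independent, so the test here is only a rough screening criterion.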
Start page
476
End page
483
Volume
10125 LNCS
Language
English
OECD Knowledge area
Statistics, Probability; Economics, Business, Administration
Scopus EID
2-s2.0-85013498380
Source
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Resource of which it is part
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN of the container
0302-9743
ISBN of the container
978-3-319-52276-0
Sources of information: Directorio de Producción Científica Scopus