Selection of statistically representative subset from a large data set

TEJADA CARCAMO, JAVIER LEANDRO; ALEXANDROV, MIKHAIL; SKITALINSKAYA, GABRIELLA; STEFANOVSKIY, DMITRY

Date Issued

01 January 2017

Access level

metadata only access

Resource Type

conference paper

Author(s)

TEJADA CARCAMO, JAVIER LEANDRO

ALEXANDROV, MIKHAIL

SKITALINSKAYA, GABRIELLA

STEFANOVSKIY, DMITRY

Universidad Católica San Pablo

Autonomous University of Barcelona

Moscow Institute of Physics and Technology

Russian Presidential Academy of National Economy and Public Administration

Publisher(s)

Springer Verlag

Abstract

Selecting a representative subset of objects is an effective way to process large data sets. It benefits both time-consuming automatic algorithms and the manual study of object properties by experts. ‘Representativity’ is considered here in a narrow sense: the statistical distributions of the object parameters are equal for the subset and for the whole set. We propose a simple method for selecting such a subset, based on testing complex statistical hypotheses, including an artificial hypothesis introduced to avoid ambiguity. We demonstrate the method on two data sets: one concerns mobile communication companies in Russia, the other intercity bus services in Peru.
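The abstract does not say which statistical tests the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a single numeric parameter per object and substitutes a two-sample Kolmogorov-Smirnov check with the usual asymptotic critical value. The function names (`ks_statistic`, `is_representative`, `select_subset`) and the synthetic data are hypothetical. Because the subset is drawn from the whole set, the two samples are not independent, so the threshold should be read as approximate.

```python
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples so ties are handled together.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def is_representative(subset, full_set, alpha=0.05):
    """Accept the subset if the KS statistic stays below the asymptotic critical value."""
    n, m = len(subset), len(full_set)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(subset, full_set) <= critical


def select_subset(full_set, size, alpha=0.05, max_tries=100, seed=0):
    """Draw random subsets until one passes the check; return None if none does."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = rng.sample(full_set, size)
        if is_representative(candidate, full_set, alpha):
            return candidate
    return None


if __name__ == "__main__":
    # Synthetic stand-in for one object parameter (assumption, not data from the paper).
    data_rng = random.Random(42)
    population = [data_rng.gauss(50.0, 10.0) for _ in range(5000)]
    chosen = select_subset(population, size=200)
    print("subset found:", chosen is not None)
```

With several parameters per object, one simple extension is to require the check to pass for each parameter separately, at the cost of a multiple-testing correction.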

Start page

476

End page

483

Volume

10125 LNCS

Language

English

OECD Knowledge area

Statistics, Probability; Business, Administration; Economics

DOI

10.1007/978-3-319-52277-7_58

Scopus EID

2-s2.0-85013498380

Source

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Resource of which it is part

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ISSN of the container

0302-9743

ISBN of the container

978-3-319-52276-0

Sources of information: Directorio de Producción Científica; Scopus