Title
Analyzing how we do Analysis and Consume Data, Results from the SciDAC-Data Project
Date Issued
23 November 2017
Access level
metadata only access
Resource Type
conference paper
Author(s)
Ding P.
Mubarak M.
Tsaris A.
Norman A.
Lyon A.
Ross R.
Fermi National Accelerator Laboratory
Publisher(s)
Institute of Physics Publishing
Abstract
One of the main goals of the Department of Energy (DOE)-funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their role in the HEP analysis chains of major experiments to understand how modern computing and data delivery are being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes, and into the physics analysis requirements for the next generation of HEP computing facilities. In particular, we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and a first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data-driven models describing their consumption.
Volume
898
Issue
9
Language
English
OECD Knowledge area
Computer and Information Sciences; Statistics, Probability
Scopus EID
2-s2.0-85039422312
Source
Journal of Physics: Conference Series
ISSN of the container
1742-6588
Conference
22nd International Conference on Computing in High Energy and Nuclear Physics, CHEP 2016
Sponsor(s)
The author acknowledges support for this research that was carried out by Fermilab and Argonne National Laboratory. Fermilab is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy. Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.
Sources of information
Directorio de Producción Científica, Scopus