Title
Searching for rules to detect defective modules: A subgroup discovery approach
Date Issued
15 May 2012
Access level
open access
Resource Type
journal article
Author(s)
Rodríguez D.
Ruiz R.
Riquelme J.C.
Pablo de Olavide University
Abstract
Data mining methods in software engineering are becoming increasingly important as they can support several aspects of the software development life-cycle such as quality. In this work, we present a data mining approach to induce rules extracted from static software metrics characterising fault-prone modules. Due to the special characteristics of defect prediction data (imbalance, inconsistency, redundancy), not all classification algorithms are capable of dealing with this task conveniently. To deal with these problems, Subgroup Discovery (SD) algorithms can be used to find groups of statistically different data given a property of interest. We propose EDER-SD (Evolutionary Decision Rules for Subgroup Discovery), an SD algorithm based on evolutionary computation that induces rules describing only fault-prone modules. Rules are a well-known model representation that can be easily understood and applied by project managers and quality engineers. Thus, rules can help them to develop software systems that can be justifiably trusted. Contrary to other approaches in SD, our algorithm has the advantage of working with continuous variables, as the conditions of the rules are defined using intervals. We describe the rules obtained by applying our algorithm to seven publicly available datasets from the PROMISE repository, showing that they are capable of characterising subgroups of fault-prone modules. We also compare our results with those of three other well-known SD algorithms, and the EDER-SD algorithm performs well in most cases. © 2011 Elsevier Inc. All rights reserved.
Start page
14
End page
30
Volume
191
Language
English
OECD Knowledge area
Bioinformatics; Other engineering and technologies
Scopus EID
2-s2.0-84857597379
Source
Information Sciences
ISSN of the container
0020-0255
Sponsor(s)
The authors are grateful to the anonymous reviewers and Prof Rachel Harrison for their useful comments. This work has been supported by the projects TIN2007-68084-C02-00 and TIN2010-21715-C02-01 (Spanish Ministry of Education and Science). D. Rodríguez carried out part of this work as a visiting research fellow at Oxford Brookes University, UK.
Sources of information: Directorio de Producción Científica Scopus