Title
A corpus builder: Retrieving raw data from GitHub for knowledge reuse in requirements elicitation
Date Issued
01 January 2016
Access level
metadata only access
Resource Type
conference paper
Author(s)
Roque H.
Do Prado Leite J.C.S.
PUC-Rio
Publisher(s)
CEUR-WS
Abstract
Requirements elicitation is an important task that can reduce costs across the overall software process, because it avoids failures caused by a poor understanding of what to build. However, too little time is usually devoted to proper elicitation during software construction. We assume that information from similar projects is valuable knowledge for requirements engineers facing a new project in the same or a related domain, and that acquiring it can be sped up by knowing those projects' main features. This information is usually located in the Readme documents of GitHub projects. We present a tool that helps handle this large amount of information by retrieving a corpus of Readme documents given a domain-related query. We describe in detail how a corpus is created and stress the importance of having a quality corpus as a base for data mining or as input for qualitative data analysis tools.
Start page
48
End page
54
Volume
1743
Language
English
OECD Knowledge area
Computer science; Automation systems, Control systems
Scopus EID
2-s2.0-85006115858
Source
CEUR Workshop Proceedings
ISSN of the container
1613-0073
Conference
CEUR Workshop Proceedings
Sources of information: Directorio de Producción Científica Scopus