A corpus builder: Retrieving raw data from GitHub for knowledge reuse in requirements elicitation

QUINTANILLA PORTUGAL, ROXANA LISETTE; Roque H.; Do Prado Leite J.C.S.

Date Issued

01 January 2016

Access level

metadata only access

Resource Type

conference paper

Author(s)

QUINTANILLA PORTUGAL, ROXANA LISETTE

Roque H.

Do Prado Leite J.C.S.

PUC-Rio

Publisher(s)

CEUR-WS

Abstract

Requirements elicitation is an important task that can reduce costs across the software process, because it avoids failures caused by a poor understanding of what to build. However, too little time is usually devoted to proper elicitation during software construction. We assume that information from similar projects is valuable knowledge for requirements engineers facing a new project in the same or a related domain, and that acquiring it can be sped up by knowing those projects' main features. This information is usually located in Readme documents on GitHub. We present a tool that helps handle this large amount of information by retrieving a corpus of Readme documents for a given domain-related query. We describe in detail how a corpus is created and stress the importance of a quality corpus as a basis for data mining or as input for qualitative data analysis tools.

Start page

48

End page

54

Volume

1743

Language

English

OECD Knowledge area

Automation systems, Control systems; Computer science

Scopus EID

2-s2.0-85006115858

Source

CEUR Workshop Proceedings

ISSN of the container

1613-0073

Conference

CEUR Workshop Proceedings

Sources of information: Directorio de Producción Científica; Scopus
