Title
Biomedical term extraction: overview and a new methodology
Date Issued
01 April 2016
Access level
open access
Resource Type
research article
Author(s)
Jonquet C.
Roche M.
Teisseire M.
Abstract
Terminology extraction is an essential task in domain knowledge acquisition and in information retrieval. It is also a mandatory first step in building or enriching terminologies and ontologies. As often proposed in the literature, existing terminology extraction methods combine linguistic and statistical aspects and solve some, but not all, of the problems related to term extraction, e.g. noise, silence, low frequency, large corpora, and the complexity of the multi-word term extraction process. In contrast, we propose a cutting-edge methodology to extract and rank biomedical terms that covers all of the problems mentioned. This methodology offers several measures based on linguistic, statistical, graph-based and web aspects. These measures extract and rank candidate terms with excellent precision: we demonstrate that they outperform previously reported precision results for automatic term extraction and work with different languages (English, French, and Spanish). We also demonstrate how using graphs and the web to assess the significance of a term candidate enables us to improve on previously reported precision results. We evaluated our methodology on the biomedical GENIA and LabTestsOnline corpora and compared it with previously reported measures.
Start page
59
End page
99
Volume
19
Issue
February 1
Language
English
OECD Knowledge area
Information Sciences
Scopus EID
2-s2.0-84956658909
Source
Information Retrieval
ISSN of the container
13864564
Sponsor(s)
This work was supported in part by the French National Research Agency under the JCJC program, Grant ANR-12-JS02-01001, as well as by the University of Montpellier, CNRS, the IBC of Montpellier project and the FINCyT program, Peru. Finally, the authors thank DIST (Scientific and Technical Information Service) for the acquisition of the Cirad corpus.
Sources of information: Directorio de Producción Científica Scopus