Title
The Ontologies Community of Practice: A CGIAR Initiative for Big Data in Agrifood Systems
Date Issued
09 October 2020
Access level
open access
Resource Type
journal article
Author(s)
Arnaud E.
Laporte M.A.
Kim S.
Aubert C.
Leonelli S.
Miro B.
Cooper L.
Jaiswal P.
Kruseman G.
Shrestha R.
Buttigieg P.L.
Mungall C.J.
Pietragalla J.
Agbona A.
Muliro J.
Detras J.
Hualla V.
Rathore A.
Das R.R.
Dieng I.
Bauchet G.
Menda N.
Pommier C.
Shaw F.
Lyon D.
Mwanzia L.
Bonaiuti E.
Chiputwa B.
Obileye O.
Auzoux S.
Yeumo E.D.
Mueller L.A.
Silverstein K.
Lafargue A.
Antezana E.
Devare M.
King B.
Publisher(s)
Cell Press
Elsevier B.V.
Abstract
Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems requires quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive full set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.
Digital technology use in agriculture and agrifood systems research accelerates the production of multidisciplinary data, which spans genetics, environment, agroecology, biology, and socio-economics. Quality labeling of data secures its online findability, reusability, interoperability, and reliable interpretation, through controlled vocabularies organized into meaningful and computer-readable knowledge domains called ontologies. There is currently no full set of recommended ontologies for agricultural research, so data scientists, data managers, and database developers struggle to find validated terminology. The Ontologies Community of Practice of the CGIAR Platform for Big Data in Agriculture harnesses international expertise in knowledge representation and ontology development to produce missing ontologies, identifies best practices, and guides data labeling by teams managing multidisciplinary information platforms to release the FAIR data underpinning the evidence of research impact.
The deployment of digital technology in Agriculture and Food Science accelerates the production of large quantities of multidisciplinary data. The Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture harnesses the international ontology expertise that can guide teams managing multidisciplinary agricultural information platforms to increase the data interoperability and reusability. The CoP develops and promotes ontologies to support quality data labeling across domains, e.g., Agronomy Ontology, Crop Ontology, Environment Ontology, Plant Ontology, and Socio-Economic Ontology.
Volume
1
Issue
7
Language
English
OECD Knowledge area
Agriculture
Subjects
Scopus EID
2-s2.0-85102967785
Source
Patterns
ISSN of the container
2666-3899
Sponsor(s)
The Ontologies CoP and Socio-Economic Data CoP are financially supported by the CGIAR Platform for Big Data in Agriculture, which is mainly supported by the CGIAR Trust Fund (https://www.cgiar.org/funders/) and UKAID. The Crop Ontology is currently supported by the CGIAR Platform for Big Data in Agriculture and the CGIAR Research Programs on Roots, Tubers, and Bananas; Wheat, Maize, and Rice Programs; Grain Legumes and Dryland Cereals (CRP-GLDC); and by each CGIAR Center for its mandate crops. The rice example is based on data generated by the International Rice Research Institute (IRRI) for the RICE Research program. The Planteome Project, led by P.J. (Oregon State University), is funded by the National Science Foundation, USA (IOS:1340112 award). The coordinator of the Environment Ontology and SDG Interface Ontology is funded by the Frontiers in Arctic Marine Monitoring (FRAM) program of the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI). COPO was initially funded by a BBSRC Biological and Bioinformatics Resources (BBR) grant (BB/L024055/1, BB/L024101/1, and BB/L024071/1) and is now funded by the BBSRC Core Strategic Program grant awarded to the Earlham Institute (BBS/E/T/000PR9817). COPO is hosted within the CyVerse UK academic cloud, funded by BBSRC (BB/M018431/1 and BB/R000662/1). S.L. is funded by the Alan Turing Institute under the EPSRC grant EP/N510129/1. The Elixir and Emphasis contribution to the Crop Ontology and its adoption have been supported by the Infrastructure Biologie Santé “Phenome-FPPN” supported by the French National Research Agency (ANR-11-INBS-0012), the TransPLANT project (EU 7th Framework Program, contract no. 283496), the H2020 ELIXIR-EXCELERATE project (funded by the European Commission within the Research Infrastructures program of Horizon 2020, grant agreement no. 676559), and the “Investments for the Future program” (PIA) (ANR-11-INBS-0012) as well as by INRAe. Developments of wheat, protein crops, rapeseed, and miscanthus ontologies have been supported by the Breedwheat (ANR-10-BTBR-03), BFF (11-BTBR-0006), Rapsodyn (11-BTBR-0004), and Peamust (11-BTBR-0002) PIA projects.
We acknowledge the contributions of Kate Dreher, data steward at CIMMYT, for actively supporting discussions on semantics within the Data Management Working Group of the CGIAR Excellence in Breeding Platform; Aman Sidhu, consultant, for formatting and facilitating the CoP webinars; and Olga Spellman, The Alliance Bioversity International-CIAT, for paper technical review and English editing.
E. Arnaud, who oversees and leads the Ontologies Community of Practice (CoP) activity planning and execution, wrote the manuscript. P.J. leads the Planteome project and secured funding for the work. C.A., E. Antezana, P.L.B., L.C., P.J., G.K., S.L., S.K., M.-A.L., J.M., and C.J.M., who lead investigation activities and develop ontologies and recommendations, contributed to the manuscript. B.M. provided the rice data example, performed the annotation, and contributed to the manuscript. A.A., G.B., R.D.D., J.P., V.H., J.M., N.M., C.P., A.R., R.S., and R.D. actively contribute to the development and curation of the ontologies for agriculture. S.A., E.B., B.C., I.D., E.D.Y., H.J., A.L., D.L., L.A.M., O.O., F.S., and K.S. are data managers and IT developers of ontology-supported tools and repositories. P.L.B. and C.J.M. provide expert advice to the CoP. M.D. and B.K. lead modules in the CGIAR Big Data Platform and actively and financially support the CoP. G.K. leads the Socio-Economic Data CoP. S.L. and L.A.M. are supportive project leaders. All authors are members of the Ontologies Community of Practice (CoP).
The authors declare no competing interests.
In 2008, CGIAR initiated the development of the Crop Ontology (CO; http://www.cropontology.org) because breeding data management systems and field books needed access to valid lists of defined breeders' traits and variables. Currently, the CO comprises 4,235 traits and 6,151 variables for 31 plant species. By providing descriptions of agronomic, morphological, physiological, quality, and stress traits along with a standard for composing the variables, the CO enables digital capture and aggregation of crop trait data, as well as comparison across projects and locations [7]. The CO was integrated into the Planteome ontology project funded by the National Science Foundation, US (IOS:1340112 award; http://planteome.org) and was successfully adopted by the CGIAR Integrated Breeding Platform (https://www.integratedbreeding.net/) and by the Boyce Thompson Institute's Breedbase (https://breedbase.org/), both of which are comprehensive breeding management systems and analysis software, and by national databases, such as GnpIS (https://urgi.versailles.inra.fr/Tools/GnpIS) [9] in France, or international projects, such as Emphasis (European Plant Phenotyping Infrastructures; https://emphasis.plant-phenotyping.eu/). Both the Minimum Information About a Plant Phenotype Experiment (MIAPPE; https://www.miappe.org/) metadata schema [10, 11] and the Breeding Application Programming Interface (BrAPI; https://brapi.org/) [12], which enable the extraction of genotype and phenotype data across databases, are compliant with the CO format.
Sources of information:
Directorio de Producción Científica
Scopus