NeoLeg, a graph database for translational research in pulses

Legumes, and especially pulses, are an important source of protein for food and feed, and are appreciated for their positive impact on "One Health". However, their sometimes unstable yields and their limited tolerance to some biotic and abiotic stresses highlight the need for varietal improvement to increase the cultivated area and stabilize production.

With the advent of sequencing technologies, a large pool of genetic and genomic resources, heterogeneous at both the inter- and intra-species scale, is emerging. It is now time to capitalize on these scattered, heterogeneous data to develop translational research and boost breeding projects.

To meet this need, we undertook the development of NeoLeg, a graph database built on Neo4j and dedicated to translational research among legumes. Starting from genome sequences and annotation files, we inferred orthologous relationships between genes and proposed the associated syntenic blocks between the chromosomes of four pulse species, namely Pisum sativum, Vicia faba, Lens culinaris and Vigna radiata, and the model legume Medicago truncatula. Available information on quantitative trait loci (QTL) for multiple traits is being included. The proposed data model was tested on basic case studies and on further scenarios related to identifying the genetic determinants of resistance to an insect pest. The main achievements, as well as the remaining challenges and perspectives, will be discussed.
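
As an illustration of the kind of translational query such a graph is meant to support, the following is a minimal in-memory sketch in plain Python, not the NeoLeg schema or its Neo4j queries. It assumes that genes carry a species, a chromosome and coordinates, that orthology is stored as undirected gene pairs, and that a QTL is a genomic interval. All class names, function names, gene identifiers, the QTL name and the coordinates are hypothetical toy values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical gene node: identifier plus genomic position.
    gene_id: str
    species: str
    chromosome: str
    start: int
    end: int


@dataclass(frozen=True)
class QTL:
    # Hypothetical QTL node: a named interval on one chromosome.
    name: str
    species: str
    chromosome: str
    start: int
    end: int


def genes_under_qtl(qtl, genes):
    """Return genes of the QTL's species whose coordinates overlap the QTL interval."""
    return [
        g for g in genes
        if g.species == qtl.species
        and g.chromosome == qtl.chromosome
        and g.start <= qtl.end
        and g.end >= qtl.start
    ]


def translate_qtl(qtl, genes, ortholog_pairs, target_species):
    """Return sorted IDs of target-species genes orthologous to genes under the QTL."""
    by_id = {g.gene_id: g for g in genes}
    hits = {g.gene_id for g in genes_under_qtl(qtl, genes)}
    found = set()
    for a, b in ortholog_pairs:
        # Orthology is treated as undirected, so check both orientations.
        for src, dst in ((a, b), (b, a)):
            if src in hits and by_id[dst].species == target_species:
                found.add(dst)
    return sorted(found)


# Toy data only: identifiers and coordinates are invented for this sketch.
GENES = [
    Gene("toy_ps_1", "Pisum sativum", "chr1", 100, 200),
    Gene("toy_ps_2", "Pisum sativum", "chr1", 5000, 5100),
    Gene("toy_mt_1", "Medicago truncatula", "chr3", 40, 140),
    Gene("toy_mt_2", "Medicago truncatula", "chr3", 900, 990),
]
ORTHOLOGS = [("toy_ps_1", "toy_mt_1"), ("toy_ps_2", "toy_mt_2")]
TOY_QTL = QTL("toy_resistance_qtl", "Pisum sativum", "chr1", 50, 300)

candidates = translate_qtl(TOY_QTL, GENES, ORTHOLOGS, "Medicago truncatula")
print(candidates)
```

In a graph database the same question would be expressed as a path traversal from a QTL node through overlapping gene nodes to their orthologs, which is the main reason a graph model suits this use case; the lists and loops above only stand in for that traversal.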