AskOmics, a Semantic-Web application to integrate and query meaningful biological datasets

AskOmics is a visual SPARQL query builder. Its web interface lets users upload heterogeneous data files (GFF, BED, and tabulated formats) and integrate them into RDF, so that all the data end up in a single directed labeled graph. AskOmics first relies on the internal structure of the raw data (column headers or GFF attributes) to build an abstraction describing the links between classes of data. The raw data are then automatically converted into triples and stored in a SPARQL endpoint according to this abstraction of relations. AskOmics also offers a query builder that allows users to compose, execute, and share expressive, semantically rich SPARQL queries. Shared queries can be tailored by other users to better suit their own research needs. Results can be downloaded locally or exported to an existing Galaxy server for further processing. To reduce the strain that massive data integration puts on the local infrastructure, federated queries on external SPARQL endpoints are also available.

AskOmics supports integrative analyses of heterogeneous datasets related to plant resistance to biotic and abiotic factors. Such analyses depend on the ability to merge genetic, phenotypic, and molecular data obtained and compared across many different conditions (tissues, developmental stages, stresses). Integrated datasets can include epigenomes, transcriptomes, proteomes, and metabolomes for many genotypes. These data are supplemented by physiological characterization (yield, earliness, response to stress) in various environments and, in recent metagenomics studies, by descriptions of the soil microbial communities. BBIP's AskOmics instance aims to integrate this large volume of heterogeneous information and to extract information from it through complex queries, using the FALDO ontology (which describes the positions of features on sequences) to further refine gene-related queries.
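To make the integration step described above more concrete, the following sketch turns a small tabulated file into subject-predicate-object triples and derives, from the header alone, an abstraction listing the entity class, its attributes, and its relations to other classes. It is a minimal illustration, not AskOmics' actual converter: the base URI `http://example.org/askomics/`, the rule that the first column names the entity class, the hypothetical `relation@Class` header syntax for links, and the helper names `uri`, `literal`, and `tsv_to_triples` are all assumptions made for this example.

```python
import csv
import io

# Illustrative namespace (assumption): a real AskOmics instance uses its own configured URIs.
BASE = "http://example.org/askomics/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def uri(local):
    """Wrap a local name into a full URI in the illustrative namespace.

    A real converter would also percent-encode unsafe characters.
    """
    return f"<{BASE}{local}>"


def literal(value):
    """Quote a raw cell value as an N-Triples string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def tsv_to_triples(text):
    """Convert a tab-separated table into (triples, abstraction).

    Hypothetical header convention, for illustration only:
      * the first header cell names the entity class (e.g. "Gene");
      * a header "relation@Class" links each row to an entity of that class;
      * any other header is a literal-valued attribute.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    entity_class = header[0]

    # Abstraction inferred from the header alone: class, attributes, relations.
    abstraction = {"class": entity_class, "attributes": [], "relations": {}}
    for column in header[1:]:
        if "@" in column:
            relation, target = column.split("@", 1)
            abstraction["relations"][relation] = target
        else:
            abstraction["attributes"].append(column)

    triples = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        subject = uri(row[0])
        triples.append((subject, RDF_TYPE, uri(entity_class)))
        for column, value in zip(header[1:], row[1:]):
            if value == "":
                continue  # an empty cell produces no triple
            if "@" in column:
                relation = column.split("@", 1)[0]
                triples.append((subject, uri(relation), uri(value)))
            else:
                triples.append((subject, uri(column), literal(value)))
    return triples, abstraction


if __name__ == "__main__":
    data = (
        "Gene\torganism\tlocated_in@Chromosome\n"
        "AT1G01010\tArabidopsis thaliana\tChr1\n"
    )
    triples, abstraction = tsv_to_triples(data)
    print(abstraction)
    for s, p, o in triples:
        print(s, p, o, ".")  # one N-Triples statement per line
```

Each row yields one `rdf:type` triple plus one triple per non-empty cell. Because the abstraction depends only on the header, it is computed before any row is read, which mirrors the idea of describing links between classes first and then converting the raw data accordingly.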

Another current use case is data FAIRification (making data Findable, Accessible, Interoperable, and Reusable), a topical issue that raises several challenges, such as the need to annotate data with rich, searchable metadata. Because data formats in the life sciences are heterogeneous and rich metadata formats are still being designed, maintaining technical solutions to deposit and query these metadata is time-consuming. Since AskOmics generates its query form on the fly from the integrated data, it is well suited to providing an up-to-date interface even for metadata whose format evolves, without the software itself needing to be updated. By synchronizing with the GenOuest CeSGO project, one existing instance provides a direct download link to any dataset stored on disk, thereby acting as a functional data catalog for all users.
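Why a form generated on the fly needs no software update can be illustrated with a small sketch: the form fields are read from an abstraction of the integrated data, so a newly integrated metadata column appears as a new field automatically, and the user's selection is then compiled into a SPARQL query. The abstraction layout, the `form_fields` and `build_query` helpers, and the `http://example.org/askomics/` namespace are hypothetical; AskOmics' own abstraction model and query builder are considerably richer than this.

```python
# Illustrative namespace (assumption), not an actual AskOmics URI scheme.
PREFIX = "http://example.org/askomics/"


def literal(value):
    """Quote a user-typed value as a SPARQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def form_fields(abstraction):
    """Return, for each class, the attribute fields the form should display.

    Fields are read from the abstraction rather than hard-coded, so a newly
    integrated metadata column shows up as a new field with no code change.
    """
    return {cls: sorted(info["attributes"]) for cls, info in abstraction.items()}


def build_query(entity_class, filters, shown):
    """Compile a form selection into a SPARQL SELECT query.

    entity_class: class chosen in the form (e.g. "Dataset").
    filters: attribute -> exact value typed by the user.
    shown: attributes to return as result columns.
    Attribute names are assumed to be valid SPARQL local names and variables.
    """
    head = " ".join(["?entity"] + [f"?{attr}" for attr in shown])
    lines = [
        f"PREFIX : <{PREFIX}>",
        f"SELECT DISTINCT {head} WHERE {{",
        f"  ?entity a :{entity_class} .",
    ]
    for attr in shown:
        lines.append(f"  ?entity :{attr} ?{attr} .")
    for attr, value in filters.items():
        if attr in shown:
            # Keep the column in the results and constrain its value.
            lines.append(f"  FILTER(?{attr} = {literal(value)})")
        else:
            # Constrain the value directly in a triple pattern.
            lines.append(f"  ?entity :{attr} {literal(value)} .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    abstraction = {"Dataset": {"attributes": ["title", "owner"]}}
    print(form_fields(abstraction))

    # A new metadata column is integrated: the form picks it up automatically.
    abstraction["Dataset"]["attributes"].append("license")
    print(form_fields(abstraction))

    print(build_query("Dataset", {"owner": "alice"}, ["title", "license"]))
```

Sorting the attributes only makes the field order deterministic; the point of the design is that nothing in `form_fields` or `build_query` names a specific metadata field, so evolving metadata formats are handled by re-reading the abstraction.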

The development of AskOmics is still ongoing. Planned features include the management of ontological terms (such as selecting the ancestors or descendants of a term), modular subqueries to improve query management and reusability, improvements to the current system of federated queries on external AskOmics instances, and enhancements to the overall user experience.