Opportunities and pitfalls of the Semantic Web in life sciences

The Semantic Web was introduced by Tim Berners-Lee in 1999 with the aim of connecting and describing every document on the Web. Faced with the rapid expansion of the Web, Semantic Web technologies failed to scale to this original purpose, but today they offer an efficient way to represent and integrate heterogeneous data in the life sciences. Data are represented as statements in a graph of entities and relations, and are semantically described using dedicated vocabularies. The resulting Knowledge Graphs power SPARQL query engines, which can extract direct or inferred relations between entities by taking advantage of their semantic descriptions. We built a Knowledge Graph containing several billion statements, gathering and connecting data from major life science providers (PubChem, MeSH, PubMed, etc.), and we exploit complex paths of relations to extract relevant associations using SPARQL queries. Despite some limitations, the opportunities offered by linked data can directly benefit users by supporting interaction and communication between data providers that usually remain isolated from each other.
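
To make the idea of statements and paths of relations concrete, here is a minimal sketch in plain Python. It is not the Knowledge Graph or the SPARQL queries described above: the identifiers (compound:C1, article:A1, descriptor:D1, ...) and the predicate names (mentionedIn, hasSubject, subClassOf) are hypothetical placeholders, and a small in-memory index stands in for a real triple store. The sketch follows a two-step path from a compound to the descriptors of the articles that mention it, then adds descriptors inferred through a class hierarchy.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All identifiers and predicate names are made up for illustration.
TRIPLES = [
    ("compound:C1", "mentionedIn", "article:A1"),
    ("compound:C1", "mentionedIn", "article:A2"),
    ("compound:C2", "mentionedIn", "article:A2"),
    ("article:A1", "hasSubject", "descriptor:D1"),
    ("article:A2", "hasSubject", "descriptor:D1"),
    ("article:A2", "hasSubject", "descriptor:D2"),
    # Semantic description: D2 is a narrower concept than D3.
    ("descriptor:D2", "subClassOf", "descriptor:D3"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for fast lookups."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def follow_path(index, start, predicates):
    """Return every node reachable from `start` along the given predicate sequence."""
    frontier = {start}
    for predicate in predicates:
        frontier = {
            obj
            for node in frontier
            for obj in index.get((node, predicate), ())
        }
    return frontier


def with_ancestors(index, nodes, predicate="subClassOf"):
    """Return `nodes` plus everything reachable by repeatedly following `predicate`."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in index.get((node, predicate), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


INDEX = build_index(TRIPLES)

# Direct associations: compound -> article -> descriptor.
direct = follow_path(INDEX, "compound:C1", ["mentionedIn", "hasSubject"])

# Inferred associations: descriptors reached only through the class hierarchy.
inferred = with_ancestors(INDEX, direct) - direct

print(sorted(direct))
print(sorted(inferred))
```

In a SPARQL engine, the same kind of traversal would typically be written as a property path over the graph, with the hierarchy step handled either by the path itself or by the engine's inference rules.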