ARA-LD, an RDF knowledge-based system containing Arabidopsis gene annotation and interaction data

Expression quantitative trait loci (eQTL) are genomic regions associated with variation in gene expression. Identifying the regulatory genes underlying an eQTL region would improve our understanding of the role of genetic polymorphisms in the regulation of gene expression. However, finding such genes is difficult because eQTL intervals are often large, harboring tens to hundreds of candidate regulatory genes. Genomic data can help shortlist candidates for further validation, but doing this manually requires considerable effort because the data are heterogeneous and spread across different databases. Here, we developed a Resource Description Framework (RDF) knowledge-based system containing Arabidopsis gene annotation and interaction data using Semantic Web technologies. The system currently includes protein-protein interaction, gene annotation, and transcription factor binding-site data from various sources. In addition to supporting SPARQL queries, we developed a simple, interactive method to retrieve data from the RDF graphs and visualize the result as an interaction network. This method is incorporated into the previously developed AraQTL workbench for eQTL studies, allowing easy exploration of candidate regulatory genes.
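
To illustrate the kind of retrieval described above, the following is a minimal sketch in plain Python rather than the system's actual SPARQL interface. It stores a few subject-predicate-object triples in memory and selects genes that lie inside an eQTL interval and have binding-site or protein-protein interaction evidence linking them to a target gene. The gene identifiers, the predicate names (ex:chromosome, ex:start, ex:bindsPromoterOf, ex:interactsWith), and the coordinates are all hypothetical placeholders, not the vocabulary or data of the system.

```python
# Toy in-memory triple store standing in for an RDF graph.
# Every identifier, predicate, and coordinate below is a made-up placeholder.
TRIPLES = [
    ("ex:geneA", "ex:chromosome", 1), ("ex:geneA", "ex:start", 1000),
    ("ex:geneB", "ex:chromosome", 1), ("ex:geneB", "ex:start", 2000),
    ("ex:geneC", "ex:chromosome", 1), ("ex:geneC", "ex:start", 9000),
    ("ex:geneD", "ex:chromosome", 2), ("ex:geneD", "ex:start", 1500),
    ("ex:geneE", "ex:chromosome", 1), ("ex:geneE", "ex:start", 1500),
    ("ex:geneA", "ex:bindsPromoterOf", "ex:geneT"),
    ("ex:geneB", "ex:interactsWith", "ex:geneT"),
    ("ex:geneC", "ex:bindsPromoterOf", "ex:geneT"),
    ("ex:geneD", "ex:interactsWith", "ex:geneT"),
]

# Predicates treated as evidence of a regulatory link (an assumption).
EVIDENCE = ("ex:bindsPromoterOf", "ex:interactsWith")


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def genes_in_interval(triples, chrom, start, end):
    """Return the set of genes on `chrom` whose start lies in [start, end]."""
    on_chrom = {s for s, _, _ in match(triples, p="ex:chromosome", o=chrom)}
    return {
        s for s, _, pos in match(triples, p="ex:start")
        if s in on_chrom and start <= pos <= end
    }


def candidate_regulators(triples, target, chrom, start, end):
    """Return sorted (gene, evidence predicate) pairs for genes in the
    interval that are linked to `target` by one of the EVIDENCE predicates."""
    in_interval = genes_in_interval(triples, chrom, start, end)
    hits = set()
    for pred in EVIDENCE:
        for s, p, _ in match(triples, p=pred, o=target):
            if s in in_interval:
                hits.add((s, p))
    return sorted(hits)


if __name__ == "__main__":
    # Candidate regulators of the hypothetical target within chr1:500-3000.
    print(candidate_regulators(TRIPLES, "ex:geneT", 1, 500, 3000))
```

In a real RDF store the same selection would be written as a single SPARQL graph pattern with a FILTER on the position, and the returned (gene, evidence) pairs would correspond to the edges of the interaction network shown to the user.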