| Recent technological and scientific advances have resulted in an abundance of data that describe and model phenomena as graphs. Graphs have been used for modelling biological pathways, chemical compounds, protein structures, social networks, and (RDF) ontologies. This proposal explores the field of graph databases, in particular the problem of querying large databases of sparse graphs, as found in these domains. Graph queries may mix value- or keyword-based predicates with graph predicates, both using the query modality of finding exact answers as well as ranked top-N retrieval. The predicates of interest are: finding subgraphs, paths and closures. The specific complication addressed here is that during data exploration, a user is likely to continuously transform the graph database (i.e. shrink the graph by contracting certain kinds of edges or expand the graph by creating new edges based on e.g. value-based joins formulated in query languages such as SQL, SPARQL and XQuery). This problem challenges the use of pre-computed graph database indexing schemes, as used by state-of-the-art in graph data management. The main topic of this research is thus to find truly scalable graph data structures/algorithms and query optimization strategies that can handle large datasets (>>RAM), yet can cope with graph transformations without requiring full recomputation of graph indexing structures. The solution direction taken is to devise new techniques to store and query large amounts graph data in relational database structures and investigate opportunities for graph-specific query optimization and index re-use strategies. |