Web Data Management
Advanced Database Presentation
By: Navid Sedighpour
Professor: Dr. Alireza Bagheri
November 2015
Interest
Lack of schema: data is unstructured or at best “semi-structured” (missing data, additional attributes, similar but not identical data)
Volatility: data may conform to one schema now, but not later
Scale: how to capture everything?
Querying difficulty: what is the user language? What are the primitives? Aren’t search engines sufficient?
Introduction Naïve Triple Store Design
Property Tables Binary Tables Graph-Based Conclusion
More Recent Approaches to Web Querying: Fusion Tables
Users contribute data in spreadsheets
Possible joins between multiple data sets
Extensive visualization
More Recent Approaches to Web Querying: XML
Data exchange language
Tree-based structure
More Recent Approaches to Web Querying: RDF
W3C Recommendation
Simple, self-descriptive model
RDF Data Volumes
90% of world's data generated over last two years
Data are growing fast
Size almost doubling every year
Number of datasets over time:
March 2009: 89 datasets
September 2010: 203 datasets
September 2011: 295 datasets
April 2014: 1091 datasets
RDF Introduction
Everything is a uniquely named resource
Prefixes can be used to shorten names
Properties of resources can be defined
Relationships with other resources can be defined
Resource descriptions can be contributed by different people/groups and can be located anywhere on the web
Integrated web “database”
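As a small illustration of how prefixes shorten resource names, the sketch below expands a prefixed name into its full URI. The `rdf` namespace is the standard W3C one; the `ex` prefix, the `PREFIXES` table, and the `expand` helper are assumptions made up for this example.

```python
# Prefix table: "rdf" is the standard RDF namespace; "ex" is a made-up example.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",
}

def expand(name):
    """Expand a prefixed name such as 'rdf:type' into a full URI.

    Names whose prefix is not in the table are returned unchanged.
    """
    prefix, separator, local_part = name.partition(":")
    if separator and prefix in PREFIXES:
        return PREFIXES[prefix] + local_part
    return name

full_type = expand("rdf:type")
full_film = expand("ex:film1")
untouched = expand("http://example.org/film1")
```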
RDF Data Model
Triple: Subject, Predicate (Property), Object
Subject: the entity that is described (URI or blank node)
Predicate: a feature of the entity
Object: the value of the feature
A set of RDF triples is called an “RDF graph”
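The triple model can be sketched in a few lines of Python: a graph is simply a set of (subject, predicate, object) tuples, and indexing it by subject gives the adjacency view of the graph. All resource names below (`ex:film1`, `ex:person1`, and so on) are hypothetical example data, not taken from the slides.

```python
from collections import defaultdict

# An RDF graph as a set of (subject, predicate, object) triples.
# The ex: names are made up for illustration.
triples = {
    ("ex:film1", "rdf:type", "ex:Film"),
    ("ex:film1", "ex:title", '"Example Film"'),
    ("ex:film1", "ex:directedBy", "ex:person1"),
    ("ex:person1", "ex:name", '"Example Director"'),
}

def to_adjacency(graph):
    """Index the triples by subject: subject -> list of (predicate, object)."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in sorted(graph):
        adjacency[subject].append((predicate, obj))
    return dict(adjacency)

adjacency = to_adjacency(triples)
```

Each key of `adjacency` is a node of the graph, and each (predicate, object) pair is a labelled outgoing edge.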
RDF Example Instance
RDF Graph
SPARQL Queries
Naïve Triple Store Design
Easy to implement, but too many self-joins
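The self-join problem is easy to see in a small sketch. Assuming a single three-column table named `triples` (a hypothetical schema and hypothetical data, held in an in-memory SQLite database), a query with four triple patterns has to join the table with itself three times:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holds every triple (hypothetical schema and data).
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("film1", "type", "Film"),
        ("film1", "title", "Example Film"),
        ("film1", "directedBy", "person1"),
        ("person1", "name", "Example Director"),
        ("book1", "type", "Book"),
        ("book1", "title", "Example Book"),
    ],
)

# "Title and director name of every film": four triple patterns,
# so the single table is joined with itself three times.
query = """
SELECT t2.object AS title, t4.object AS director
FROM triples AS t1
JOIN triples AS t2 ON t2.subject = t1.subject
JOIN triples AS t3 ON t3.subject = t1.subject
JOIN triples AS t4 ON t4.subject = t3.object
WHERE t1.predicate = 'type' AND t1.object = 'Film'
  AND t2.predicate = 'title'
  AND t3.predicate = 'directedBy'
  AND t4.predicate = 'name'
"""
rows = conn.execute(query).fetchall()
```

Every additional triple pattern in a query adds one more self-join over the full table.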
Property Tables
Grouping by entities
Types: clustered property tables, property class tables
Clustered Property Tables
Group together the properties that tend to occur in the same (or similar) subjects
Property Class Tables
Cluster the subjects with the same type of property into one property table
Property Tables
Advantages: fewer joins
Disadvantages: lots of NULLs; clustering is not trivial; multi-valued properties are complicated
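A rough sketch of the trade-off, assuming a hypothetical property table `film_props` with one row per subject and one column per property (in-memory SQLite, made-up data): several properties of a subject come back from one row without any join, but subjects that lack a property leave NULLs behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per subject, one column per property (hypothetical schema).
conn.execute(
    "CREATE TABLE film_props ("
    "subject TEXT PRIMARY KEY, title TEXT, director TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO film_props VALUES (?, ?, ?, ?)",
    [
        ("film1", "Example Film", "person1", 2001),
        ("film2", "Another Film", None, None),  # missing properties become NULLs
    ],
)

# Title and director of a film come from a single row: no self-join needed.
row = conn.execute(
    "SELECT title, director FROM film_props WHERE subject = 'film1'"
).fetchone()

# Sparse data shows up as NULL cells.
null_count = conn.execute(
    "SELECT COUNT(*) FROM film_props WHERE director IS NULL"
).fetchone()[0]
```

A property with several values per subject does not fit into a single column, so in a layout like this it would typically need a separate table.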
Binary Tables
Grouping by properties: for each property, build a two-column table containing both subject and object, ordered by subject
Also called the “vertically partitioned approach”
n two-column tables (n is the number of unique properties in the data)
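The partitioning can be sketched as follows, with plain Python lists standing in for the two-column tables. The helper names `vertically_partition` and `merge_join_on_subject` and the sample triples are assumptions for illustration. Because every table is ordered by subject, two of them can be joined on subject with a single merge pass.

```python
from collections import defaultdict

def vertically_partition(triples):
    """Build one (subject, object) table per property, sorted by subject."""
    tables = defaultdict(list)
    for subject, predicate, obj in triples:
        tables[predicate].append((subject, obj))
    return {predicate: sorted(rows) for predicate, rows in tables.items()}

def merge_join_on_subject(left, right):
    """Merge-join two subject-sorted tables into (subject, left_obj, right_obj) rows."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_subject, right_subject = left[i][0], right[j][0]
        if left_subject < right_subject:
            i += 1
        elif left_subject > right_subject:
            j += 1
        else:
            # Find the run of rows sharing this subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == left_subject:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_subject:
                j_end += 1
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    result.append((left_subject, left_obj, right_obj))
            i, j = i_end, j_end
    return result

# Hypothetical data; "genre" is multi-valued for film1.
triples = [
    ("film1", "title", "Example Film"),
    ("film1", "genre", "Drama"),
    ("film1", "genre", "Comedy"),
    ("film2", "title", "Another Film"),
    ("book1", "title", "Example Book"),
]
tables = vertically_partition(triples)
joined = merge_join_on_subject(tables["title"], tables["genre"])
```

Multi-valued properties simply become several rows in the same table, and a subject without a given property has no row there, so no NULLs are stored.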
Binary Tables
Advantages: supports multi-valued properties; no NULLs; no clustering; good performance for subject-subject joins
Disadvantages: not useful for subject-object joins; expensive inserts
Graph-Based Approach
Answering a SPARQL query = subgraph matching
gStore
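To make the "query answering = subgraph matching" view concrete, here is a brute-force sketch, not gStore's algorithm: a query is a list of triple patterns in which terms starting with `?` are variables, and an answer is any variable binding that maps every pattern onto a triple of the data graph. The `match` function and the data are hypothetical.

```python
def match(patterns, triples, binding=None):
    """Return every variable binding under which all patterns occur in the graph."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        new_binding = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new_binding.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            results.extend(match(rest, triples, new_binding))
    return results

# Hypothetical data graph and query graph.
data = [
    ("film1", "type", "Film"),
    ("film1", "directedBy", "person1"),
    ("person1", "name", "Example Director"),
    ("book1", "type", "Book"),
]
query = [
    ("?f", "type", "Film"),
    ("?f", "directedBy", "?d"),
    ("?d", "name", "?n"),
]
answers = match(query, data)
```

This exhaustive search tries every triple for every pattern, which is the cost that an index over the data graph is meant to avoid.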
Graph-Based Approach
Two steps need to be done:
1. For each node of Q*, get the list of nodes in G* that include that node
2. Do a multi-way join to get the candidate list
Alternatives:
Sequential scan of G*: both steps are inefficient
S-Tree: height-balanced tree over signatures; run an inclusion query for each node of Q* to get the list of nodes in G* that include that node (q & s = q)
VS-Tree: supports both steps efficiently; grouping by vertices
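The inclusion test q & s = q from the first step can be sketched with integer bit strings. The encoding below is an assumption made for illustration (each adjacent edge label sets one bit of a 64-bit signature, chosen by CRC32); gStore's actual vertex encoding is more elaborate. The filter shown is the sequential-scan alternative, not an S-Tree.

```python
import zlib

SIGNATURE_BITS = 64  # assumed signature width, chosen for illustration

def signature(edge_labels):
    """Set one bit per adjacent edge label (hypothetical encoding)."""
    sig = 0
    for label in edge_labels:
        sig |= 1 << (zlib.crc32(label.encode("utf-8")) % SIGNATURE_BITS)
    return sig

def includes(s, q):
    """True if data signature s has every bit of query signature q set."""
    return (q & s) == q

def filter_candidates(query_labels, data_vertices):
    """Sequential scan: keep each data vertex whose signature includes the query's."""
    q = signature(query_labels)
    return [
        vertex
        for vertex, labels in data_vertices.items()
        if includes(signature(labels), q)
    ]

# Hypothetical data vertices with their adjacent edge labels.
data_vertices = {
    "film1": ["type", "title", "directedBy"],
    "person1": ["type", "name"],
    "book1": ["type", "title"],
}
candidates = filter_candidates(["title", "directedBy"], data_vertices)
```

A vertex that really has all the query's edge labels always passes the test, so the filter produces no false negatives. Hash collisions can let other vertices through, which is why the candidates still have to be checked in the join step.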
S-Tree
Pruning
VS-Tree
Conclusion
RDF data seem to have considerable promise for web data management
We discussed four approaches to web data management: the naïve triple store design, property tables, binary tables, and the graph-based approach
The VS-Tree has the best performance among the graph-based approaches
gStore is reported to be more efficient than the other approaches [2, 3]
References
[1] D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach, "Scalable semantic web data management using vertical partitioning," in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp. 411-422.
[2] L. Zou, J. Mo, L. Chen, M. T. Özsu, and D. Zhao, "gStore: answering SPARQL queries via subgraph matching," Proceedings of the VLDB Endowment, vol. 4, pp. 482-493, 2011.
[3] L. Zou, M. T. Özsu, L. Chen, X. Shen, R. Huang, and D. Zhao, "gStore: a graph-based SPARQL query engine," The VLDB Journal—The International Journal on Very Large Data Bases, vol. 23, pp. 565-590, 2014.
[4] X. Shen, L. Zou, M. T. Özsu, L. Chen, Y. Li, S. Han, et al., "A Graph-based RDF Triple Store."