Validating an Access Cost Model for Wide Area Applications Louiqa Raschid University of Maryland CoopIS 2001 Co-authors V. Zadorozhny, T. Zhan and L. Bright



L. Raschid — University of Maryland, CoopIS01

Scalable Wide-Area Applications
Problems:
• The wide-area environment is dynamic (noisy)
• Latency (end-to-end delay) varies widely
• Network and server workloads are unknown
• Time-of-day and day-of-week dependencies affect latency
• The dynamic environment must be monitored constantly

Research objective: use query feedback to monitor and learn this behavior, and to predict access cost distributions that may depend on time of day and day of week.


Talk Outline
• Architecture for wide-area applications
• WebPT: a tool to predict access costs
• WebPT-based access cost catalog
• Grouping of WebSources based on observable WebSource characteristics
• Hypotheses to test the WebPT-based catalog: high versus low prediction accuracy
• Validation based on an experimental case study


Architecture for the WebPT-based Catalog


Predicting Response Times for Accessing WebSources

Problem: evaluation costs are difficult to determine
• Physical implementation details are unknown
• Load on the network and on the WebSource is unknown

Objective:
• Use query feedback to learn access costs
• Exploit time of day, day of week, etc., to predict costs
• Identify easily observable WebSource characteristics
• Determine prediction accuracy for WebSources based on those characteristics
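The objective of learning access costs from query feedback and exploiting time of day and day of week can be illustrated with a minimal Python sketch. It is not WebPT: the class name `FeedbackCostCatalog`, the bucketing by (weekday, hour), and the fallback to the overall mean for unseen buckets are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import DefaultDict, List, Tuple


class FeedbackCostCatalog:
    """Hypothetical per-WebSource catalog that learns response times from query feedback."""

    def __init__(self) -> None:
        # (weekday, hour) -> observed response times in seconds
        self._obs: DefaultDict[Tuple[int, int], List[float]] = defaultdict(list)

    def record(self, when: datetime, response_time: float) -> None:
        # Feed back each completed query with its submission time and measured cost.
        self._obs[(when.weekday(), when.hour)].append(response_time)

    def predict(self, when: datetime) -> float:
        bucket = self._obs.get((when.weekday(), when.hour))
        if bucket:
            return mean(bucket)
        # Unseen day/hour: fall back to the mean over all feedback collected so far.
        everything = [t for times in self._obs.values() for t in times]
        if not everything:
            raise ValueError("no feedback recorded yet")
        return mean(everything)


catalog = FeedbackCostCatalog()
catalog.record(datetime(2000, 3, 6, 14, 5), 4.0)   # Monday afternoon
catalog.record(datetime(2000, 3, 6, 14, 40), 6.0)  # Monday afternoon
catalog.record(datetime(2000, 3, 7, 3, 0), 1.0)    # Tuesday, early morning
print(catalog.predict(datetime(2000, 3, 13, 14, 20)))  # next Monday afternoon: 5.0
```

Bucketing by calendar position is the simplest way to let time and day influence a prediction; a real catalog would also need to decide how fine the buckets should be, which is what the cell-splitting heuristics later in the talk address.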


Metrics in the WebPT Access Cost Model

WebSource and network costs:
• Query processing at the WebSource
• Downloading data from the WebSource (extraction cost)

Wrapper statistics:
• Number of pages accessed
• Cardinality of the result

Statistics may depend on the value of the query binding. WebPT is a tool that learns from query feedback and predicts access cost based on parameters such as day, time, quantity of data, cardinality, etc.
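A feedback record combining the metrics listed above might look like the following Python sketch. The field names, units, and the assumption that total access cost is the sum of query-processing and extraction cost are hypothetical; the slides list the metrics but not how they are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryFeedback:
    """One hypothetical feedback record for a query sent to a WebSource via a wrapper."""
    source: str
    binding: str            # value bound in the query; statistics may depend on it
    day: int                # 0 = Monday ... 6 = Sunday
    hour: int               # 0..23
    processing_cost: float  # seconds spent by the WebSource answering the query
    extraction_cost: float  # seconds spent downloading the result pages
    pages_accessed: int     # wrapper statistic
    cardinality: int        # wrapper statistic: number of result tuples
    quantity_bytes: int     # amount of data downloaded

    @property
    def total_cost(self) -> float:
        # Assumption: access cost = query processing cost + extraction cost.
        return self.processing_cost + self.extraction_cost

    def features(self) -> tuple:
        # Parameters a predictor could condition on: Day, Time, Quantity, Cardinality.
        return (self.day, self.hour, self.quantity_bytes, self.cardinality)


fb = QueryFeedback(source="FAA", binding="example", day=0, hour=14,
                   processing_cost=1.5, extraction_cost=2.5, pages_accessed=3,
                   cardinality=120, quantity_bytes=40_000)
print(fb.total_cost)  # 4.0
```

Keeping the binding in the record makes it possible to separate learning on random bindings from learning on fixed bindings, a distinction the case study relies on.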


WebPT Learning


WebPT-based Prediction
WebPT is configured for a hierarchy of dimensions, e.g., Quantity, Day, Time, Cardinality.

WebPT learning algorithm:
• Cell splitting
• Smoothing
• Estimate response time and confidence
• Similar to CART (regression versus heuristics)
• Cell merging

Heuristics used in calibrating each cell:
• Dimension: min / max / scale
• Allowed deviation
• Confidence window
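The cell-splitting idea can be illustrated with a minimal Python sketch over a single dimension (for example, hour of day). It is an illustration under stated assumptions, not the WebPT algorithm: the class names, the rule of halving a cell, the confidence measure (fraction of samples within the allowed relative deviation of the cell mean), and the default thresholds are all hypothetical, and smoothing and cell merging are omitted.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Cell:
    lo: float                      # inclusive lower bound on the dimension
    hi: float                      # exclusive upper bound
    prior: Optional[float] = None  # parent's estimate, used while the cell is empty
    samples: List[Tuple[float, float]] = field(default_factory=list)  # (x, response time)

    def estimate(self) -> float:
        if self.samples:
            return mean(t for _, t in self.samples)
        if self.prior is not None:
            return self.prior
        raise ValueError("cell has no data")

    def confidence(self, allowed_dev: float) -> float:
        # Fraction of samples within the allowed relative deviation of the estimate.
        est = self.estimate()
        ok = sum(1 for _, t in self.samples if abs(t - est) <= allowed_dev * est)
        return ok / len(self.samples)


class OneDimGrid:
    """Hypothetical one-dimensional cell grid with heuristic cell splitting."""

    def __init__(self, dim_min: float, dim_max: float, scale: float,
                 allowed_dev: float = 0.2, min_conf: float = 0.8, window: int = 8):
        self.scale = scale              # smallest cell width a split may produce
        self.allowed_dev = allowed_dev  # allowed relative deviation from the estimate
        self.min_conf = min_conf        # split when confidence falls below this
        self.window = window            # samples required before a cell may split
        self.cells = [Cell(dim_min, dim_max)]

    def _find(self, x: float) -> Cell:
        for cell in self.cells:
            if cell.lo <= x < cell.hi:
                return cell
        raise ValueError(f"{x} is outside the dimension range")

    def learn(self, x: float, t: float) -> None:
        cell = self._find(x)
        cell.samples.append((x, t))
        can_split = cell.hi - cell.lo >= 2 * self.scale
        if (can_split and len(cell.samples) >= self.window
                and cell.confidence(self.allowed_dev) < self.min_conf):
            self._split(cell)

    def _split(self, cell: Cell) -> None:
        # Halve the cell and redistribute its samples; children inherit its estimate.
        mid = (cell.lo + cell.hi) / 2
        est = cell.estimate()
        left, right = Cell(cell.lo, mid, est), Cell(mid, cell.hi, est)
        for x, t in cell.samples:
            (left if x < mid else right).samples.append((x, t))
        i = self.cells.index(cell)
        self.cells[i:i + 1] = [left, right]

    def predict(self, x: float) -> float:
        return self._find(x).estimate()


grid = OneDimGrid(dim_min=0, dim_max=24, scale=1)  # dimension: hour of day
for hour in range(24):
    grid.learn(hour, 1.0 if hour < 12 else 5.0)    # slow afternoons and evenings
print([(c.lo, c.hi) for c in grid.cells])           # [(0, 12.0), (12.0, 24)]
print(grid.predict(3), grid.predict(20))            # 1.0 5.0
```

Like a regression tree, the grid refines only where the data disagree; unlike CART, the split decision here is a fixed heuristic (allowed deviation plus a confidence threshold) rather than a fitted regression criterion.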


The prediction accuracy of the WebPT-based cost model is strongly correlated with the following:

Observable WebSource characteristics:
• Significance of time and day in predicting workload at the server and on the network
• Variance (noise) in accessing the server

Quality of available statistics (cardinality):
• Random bindings: large variance in cardinality
• Fixed bindings: better estimation of cardinality
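The slides do not say how the significance of time and day, or the level of noise, is measured. One plausible way, offered purely as an assumption, is the correlation ratio (eta squared: the share of response-time variance explained by grouping on hour or day) for significance, and the coefficient of variation for noise. The function names below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Hashable, Iterable, List, Tuple


def coefficient_of_variation(times: List[float]) -> float:
    # Noise proxy: population standard deviation divided by the mean.
    return pvariance(times) ** 0.5 / mean(times)


def eta_squared(observations: Iterable[Tuple[Hashable, float]]) -> float:
    """Share of response-time variance explained by a grouping key such as hour of day.

    observations: (group_key, response_time) pairs.
    """
    obs = list(observations)
    times = [t for _, t in obs]
    grand = mean(times)
    total_ss = sum((t - grand) ** 2 for t in times)
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, t in obs:
        groups[key].append(t)
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss if total_ss else 0.0


hourly = [(9, 2.0), (9, 2.0), (21, 6.0), (21, 6.0)]
print(eta_squared(hourly))                    # 1.0: hour explains all the variance
print(coefficient_of_variation([2.0, 6.0]))   # 0.5
```

A value of eta squared near 1 means time of day alone accounts for most of the variation; a high coefficient of variation within a fixed time slot indicates a noisy source.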


Case Study: Data Gathering and Experiment
• 6 data sources in the public domain
• Data gathered for several weeks in 1999 and 2000
• Queries submitted to WebSources periodically
• Recorded TTF and TTL
• Query bindings affected result cardinality:
  • Random bindings: more than 50 bindings
  • Fixed bindings: 2 bindings each for [S, M, L]
• Mediator queries: from a simple scan to a complex 5-way join over data in 5 WebSources (not reported)


Observable WebSource Characteristics


Grouping of WebSources based on Characteristics

• G1: T (time) and D (day) significant; noise can vary
• G2: high noise
• G3: T and D not significant; low noise (empty)
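The grouping can be written as a small classification rule. In this sketch the two inputs are numeric scores for time/day significance and for noise, and the thresholds 0.3 and 0.5 are arbitrary placeholders; the slides give no thresholds, and `assign_groups` is a hypothetical name. The groups overlap, so a source can fall into both G1 and G2.

```python
from typing import Set


def assign_groups(time_day_effect: float, noise: float,
                  effect_threshold: float = 0.3, noise_threshold: float = 0.5) -> Set[str]:
    """Return the groups a WebSource falls into; G1 and G2 may both apply."""
    significant = time_day_effect >= effect_threshold  # T and D significant
    noisy = noise >= noise_threshold                   # high noise
    groups: Set[str] = set()
    if significant:
        groups.add("G1")  # noise can vary within G1
    if noisy:
        groups.add("G2")
    if not significant and not noisy:
        groups.add("G3")  # no source fell here in the case study
    return groups


print(sorted(assign_groups(0.6, 0.8)))  # ['G1', 'G2']
print(sorted(assign_groups(0.1, 0.1)))  # ['G3']
```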


Hypotheses to Test the WebPT-based Access Cost Catalog

H1: High prediction accuracy for WebSources where T and D are significant and noise is low, i.e., sources in G1 but not in G2

H2: The catalog will improve prediction accuracy for WebSources where T and D are significant, independent of noise, i.e., group G1

H3: Statistics may depend on the value of the query binding, so prediction accuracy improves with learning on fixed bindings; this applies to sources in both groups


Prediction Accuracy for WebSources

WebPT(Lo) - Random bindings


WebSource Characteristics and Correlation with Prediction Accuracy


Groupings of WebSources and Correlation with Prediction Accuracy

• G1: T and D significant
• G2: high noise
• GNIS: high prediction accuracy; in both G1 and G2
• FAA, FishBase: low prediction accuracy while in G1; noisy


Quantile Plots of Relative Error of Prediction for ACM, Aircraft


Quantile Plot of Relative Error of Prediction for FAA, GNIS
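A quantile plot of relative prediction error sorts the per-query errors and plots each one against its cumulative fraction. The sketch below assumes relative error is |predicted - actual| / actual and uses (i + 0.5) / n plotting positions; both are common conventions, not necessarily the ones used for these plots, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def relative_errors(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    # Assumption: relative error = |predicted - actual| / actual, with actual > 0.
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) / a for p, a in zip(predicted, actual)]


def quantile_points(errors: Sequence[float]) -> List[Tuple[float, float]]:
    # Pair the i-th smallest error with plotting position (i + 0.5) / n.
    ordered = sorted(errors)
    n = len(ordered)
    return [((i + 0.5) / n, e) for i, e in enumerate(ordered)]


def fraction_within(errors: Sequence[float], bound: float) -> float:
    # Share of predictions whose relative error does not exceed `bound`.
    return sum(1 for e in errors if e <= bound) / len(errors)


errs = relative_errors([1.25, 3.0, 6.0, 4.0], [1.0, 4.0, 4.0, 4.0])
print(quantile_points(errs))       # [(0.125, 0.0), (0.375, 0.25), (0.625, 0.25), (0.875, 0.5)]
print(fraction_within(errs, 0.25))  # 0.75
```

On such a plot, a curve that stays low until a high cumulative fraction indicates a source with high prediction accuracy; the value at fraction 0.5 is the median relative error.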


Summary + Impact
• Unique case study: WebPT-based access cost catalog and cost distributions
• Grouping of WebSources based on observable WebSource characteristics
• High prediction accuracy for some sources in G1 (T, D significant) with low noise
• High prediction accuracy for some sources in both G1 and G2 (high noise)
• Similar results for the mediator cost model and complex N-way joins over multiple WebSources