Scorpion: Explaining Away Outliers in Aggregate Queries
Eugene Wu and Sam Madden, MIT
http://springfieldpunx.blogspot.com/2010/11/mortal-kombat-ninjas-scorpion.html
Expenses by country: USA, China, Italy
SELECT sum(cost) FROM expenses GROUP BY country
Given: outlier and normal results
Understand why
Given: outlier and normal results
What input properties caused the outliers?
What input properties most caused the outliers?
What input properties caused outliers but didn’t affect normal outputs?
Provenance
Faceting
Dimensionality :(
Dealing with multiple outliers?
http://www.perceptualedge.com/articles/Whitepapers/Three_Blind_Men.pdf
Given: outlier and normal results
Find: predicates correlated with outliers
s.t. removing the predicate from the inputs “fixes” the outliers & maintains the normal results
Example predicate: Desc = “toilets”
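The problem statement (find a predicate whose removal "fixes" the outlier results while leaving the normal results alone) can be illustrated with a minimal Python sketch. The rows, the cost values, and the helper names `sum_by_country`, `remove_matching`, and `is_toilets` are all hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical expense rows: (country, description, cost). Made-up values.
EXPENSES = [
    ("USA", "toilets", 900.0),
    ("USA", "travel", 100.0),
    ("China", "travel", 120.0),
    ("China", "toilets", 10.0),
    ("Italy", "travel", 110.0),
]


def sum_by_country(rows):
    """Equivalent of: SELECT sum(cost) FROM expenses GROUP BY country."""
    totals = defaultdict(float)
    for country, _desc, cost in rows:
        totals[country] += cost
    return dict(totals)


def remove_matching(rows, predicate):
    """Drop every input row that satisfies the predicate."""
    return [row for row in rows if not predicate(row)]


def is_toilets(row):
    """The example predicate: Desc = "toilets"."""
    return row[1] == "toilets"


before = sum_by_country(EXPENSES)
after = sum_by_country(remove_matching(EXPENSES, is_toilets))

for country in before:
    print(country, before[country], "->", after[country])
```

In this made-up data the USA total drops sharply once the matching rows are removed, while the China and Italy totals barely move, which is the behaviour the "s.t." condition asks for.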
Influence of a predicate p over the input T, built up term by term:
ΔOutput per matched tuple, |p(T)|: $\dfrac{\Delta\text{output}}{|p(T)|}$
“High vs Low” direction V: $\dfrac{\Delta\text{output} \cdot V}{|p(T)|}$
Exponent c on the predicate size: $\dfrac{\Delta\text{output} \cdot V}{|p(T)|^{c}}$
ΔNormal: $\dfrac{\Delta\text{outlier} \cdot V}{|p(T)|^{c}} - \Delta\text{Normal}$
Multiple Outputs: $\operatorname{mean}_{\text{outlier}} \dfrac{\Delta\text{outlier} \cdot V}{|p(T)|^{c}} - \max_{\text{normal}} \Delta\text{Normal}$
With hold-out results in place of the normal results:
$$\text{influence}(p) = \operatorname{mean}_{\text{outlier}} \frac{\Delta\text{outlier} \cdot V}{|p(T)|^{c}} - \max_{\text{normal}} \Delta\text{Hold-out}$$
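The influence expression built up above (mean over outliers of Δoutlier · V / |p(T)|^c, minus the max over hold-out results of ΔHold-out) can be written as a small Python function. This is a sketch under stated assumptions, not the authors' implementation: the function name and argument layout are hypothetical, each outlier carries its own matched-tuple count, the hold-out penalty uses the absolute change (the slides write only ΔHold-out), and an outlier with no matched tuples contributes zero.

```python
def influence(outliers, holdout_deltas, direction=1, c=1.0):
    """Score a predicate from the changes it causes when its tuples are removed.

    outliers:       list of (delta, matched_count) pairs, one per outlier result,
                    where delta = original value - value after removal and
                    matched_count = |p(T)| within that result's input group.
    holdout_deltas: list of changes to the hold-out (normal) results.
    direction:      V, +1 if the outliers are too high, -1 if too low.
    c:              exponent on the predicate size.
    """
    if not outliers:
        raise ValueError("at least one outlier result is required")

    per_outlier = []
    for delta, matched_count in outliers:
        if matched_count == 0:
            # Assumption: a predicate matching nothing has no effect here.
            per_outlier.append(0.0)
        else:
            per_outlier.append(delta * direction / matched_count ** c)
    mean_outlier = sum(per_outlier) / len(per_outlier)

    # Assumption: penalise the largest absolute change to any hold-out result.
    penalty = max((abs(d) for d in holdout_deltas), default=0.0)
    return mean_outlier - penalty


# One outlier drops by 900 after removing 1 tuple; hold-outs change by 10 and 0.
score = influence([(900.0, 1)], [10.0, 0.0], direction=1, c=1.0)
print(score)
```

Raising c penalises predicates that match many tuples more heavily; with c = 0 the size term disappears and only the raw change counts.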
Goal: $p^{*} = \arg\max_{p \in \text{predicates}} \text{influence}(p)$
Number of candidate predicates: O(exponential)
Cost of evaluating influence(p): O(agg(p(T)))
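To make the two costs concrete, here is a naive exhaustive search in Python: it enumerates every conjunction of attribute = value clauses and re-runs the aggregate for each one. It is only a baseline sketch, not the search strategy from the talk. The rows, the attribute list, and all function names are hypothetical, and the scoring uses a single outlier with an absolute-value hold-out penalty, which is an assumption.

```python
from itertools import combinations, product

# Hypothetical rows with made-up values, for illustration only.
ROWS = [
    {"country": "USA", "desc": "toilets", "dept": "ops", "cost": 900.0},
    {"country": "USA", "desc": "travel", "dept": "ops", "cost": 100.0},
    {"country": "China", "desc": "travel", "dept": "ops", "cost": 120.0},
    {"country": "China", "desc": "toilets", "dept": "eng", "cost": 10.0},
    {"country": "Italy", "desc": "travel", "dept": "eng", "cost": 110.0},
]
ATTRS = ["desc", "dept"]


def candidate_predicates(rows, attrs):
    """Yield every conjunction of attr = value clauses (one value per attribute).

    The number of candidates grows exponentially with the number of attributes.
    """
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            domains = [sorted({r[a] for r in rows}) for a in subset]
            for values in product(*domains):
                yield dict(zip(subset, values))


def matches(row, pred):
    return all(row[a] == v for a, v in pred.items())


def group_sum(rows, country):
    return sum(r["cost"] for r in rows if r["country"] == country)


def influence(rows, pred, outlier, holdouts, direction=1, c=1.0):
    """Recompute the aggregate from scratch with the predicate's rows removed."""
    kept = [r for r in rows if not matches(r, pred)]
    n = sum(1 for r in rows if r["country"] == outlier and matches(r, pred))
    if n == 0:
        return float("-inf")  # assumption: ignore predicates that miss the outlier
    d_out = group_sum(rows, outlier) - group_sum(kept, outlier)
    penalty = max(abs(group_sum(rows, h) - group_sum(kept, h)) for h in holdouts)
    return d_out * direction / n ** c - penalty


def best_predicate(rows, attrs, outlier, holdouts):
    return max(candidate_predicates(rows, attrs),
               key=lambda p: influence(rows, p, outlier, holdouts))


print(best_predicate(ROWS, ATTRS, "USA", ["China", "Italy"]))
```

Even this toy table with two attributes of two values each produces 8 candidates (2 + 2 + 4), and each one triggers a full recomputation of the grouped sums.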
Operator Properties
Incrementally removable (for the O(agg(p(T))) cost)
SUM({1,2,3,4,5}) = 15
Removing the tuples matched by p: 15 - SUM({4,5}) = 6
SUM, COUNT, AVG, STDDEV
MEDIAN, MODE
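The SUM example (15 - SUM({4,5}) = 6) shows the idea behind incremental removal: the result without the predicate's tuples is derived from a small saved state rather than by re-reading the remaining input. A minimal Python sketch for SUM, COUNT, and AVG follows; the class name `SumCount` and its methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SumCount:
    """Constant-size state from which SUM, COUNT and AVG can be recovered."""
    total: float = 0.0
    count: int = 0

    @staticmethod
    def of(values):
        """Build the state with one pass over the values."""
        vals = list(values)
        return SumCount(sum(vals), len(vals))

    def remove(self, other):
        """State of the input with the tuples summarised by `other` removed."""
        return SumCount(self.total - other.total, self.count - other.count)

    def mean(self):
        return self.total / self.count if self.count else None


full = SumCount.of([1, 2, 3, 4, 5])   # computed once over all of T
matched = SumCount.of([4, 5])         # computed over p(T) only
rest = full.remove(matched)           # no pass over the remaining tuples

print(rest.total, rest.count, rest.mean())
```

Only the matched tuples are scanned for each predicate, so the per-predicate cost depends on |p(T)| rather than on the size of the whole input.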
$p^{*} = \arg\max_{p \in \text{predicates}} \text{influence}(p)$, with O(exponential) predicates and O(agg(p(T))) per evaluation
Operator properties:
Incrementally removable
Independent: Top Down
Anti-monotonic: Bottom Up
p’ ⊂ p implies influence(p’) ≤ influence(p)
Figure legend: least influence to most influence
An influence metric that is accessible to end-users, for:
Data cleaning
Data exploration
Provenance reduction
scorpion
http://springfieldpunx.blogspot.com/2010/11/mortal-kombat-ninjas-scorpion.html
eugenewu@mit.edu