STING: A Statistical Information Grid Approach to Spatial Data Mining
Presentation 2 (Group 14)
CSE 590 Data Mining
Prof. Anita Wasilewska
SUNY Stony Brook
Presented By: Tejas Somani, Nikhil Pujari
STING: A Statistical Information Grid Approach to Spatial Data Mining
Paper by:
Wei Wang, Jiong Yang, Richard Muntz
Department of Computer Science
University of California, Los Angeles
CA 90095, U.S.A.
VLDB Conference, Athens, Greece, 1997
References
• http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF
• http://www.webopedia.com/TERM/S/spatial_data.html
• Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques (Chapter 8). Morgan Kaufmann, 2002.
• Peter Grabusts and Arkady Borisov. Using Grid-clustering Methods in Data Classification. Riga Technical University.
What is Spatial Data?
Spatial data may be thought of as features located on or referenced to the Earth's surface, such as roads, streams, political boundaries, schools, land use classifications, property ownership parcels, drinking water intakes, and pollution discharge sites. In short, anything that can be mapped.
Spatial Area: The area that encompasses the locations of all the spatial data is called the spatial area.
http://www.webopedia.com/TERM/S/spatial_data.html
STING: The Overview
• STING is a grid-based method to efficiently process many common region-oriented queries on a set of points.
• A set of points satisfying some criterion defines a region.
• It is a hierarchical method. The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries can be answered without referring to the individual objects.
• We want to cluster the records in a spatial table in terms of location.
• Placement of a record in a grid cell is completely determined by its physical location.
http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF
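Because placement depends only on location, the cell of a record can be computed directly from its coordinates. A minimal sketch, assuming a rectangular spatial area split into a square grid of equal-sized cells; the function name `cell_index` and the `bounds` layout are illustrative assumptions, not taken from the paper:

```python
def cell_index(x, y, bounds, cells_per_side):
    """Return (row, col) of the grid cell containing the point (x, y).

    bounds is (x_min, y_min, x_max, y_max) of the spatial area (assumed layout).
    cells_per_side is the number of cells along each axis (assumed square grid).
    """
    x_min, y_min, x_max, y_max = bounds
    if not (x_min <= x <= x_max and y_min <= y <= y_max):
        raise ValueError("point lies outside the spatial area")
    # Scale each coordinate to [0, cells_per_side] and truncate to an index.
    col = int((x - x_min) / (x_max - x_min) * cells_per_side)
    row = int((y - y_min) / (y_max - y_min) * cells_per_side)
    # Points exactly on the upper boundary are assigned to the last cell.
    return min(row, cells_per_side - 1), min(col, cells_per_side - 1)


if __name__ == "__main__":
    area = (0.0, 0.0, 10.0, 10.0)
    print(cell_index(2.5, 7.5, area, 4))
```

No attribute value of the record is consulted, only its position, which is what makes the assignment independent of any query.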
Grid Cell Hierarchy
• The spatial area is divided into rectangular cells.
• The cells are organized in a hierarchical structure with several levels.
• Each cell at a higher level is partitioned into a number of cells at the next lower level (here 4), i.e., a cell at level i corresponds to the union of the areas of its children at level i + 1.
• The size of the leaf-level cells depends on the density of objects.
http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF
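The parent/child relation with 4 children per cell can be sketched with simple index arithmetic. This assumes that level 1 is a single root cell and that cells are addressed by (row, col) within their level; the function names are illustrative assumptions:

```python
def cells_per_side(level):
    """Number of cells along one axis at a level (level 1 = one root cell, assumed)."""
    return 2 ** (level - 1)


def children(row, col):
    """The 4 cells at level i + 1 whose union covers cell (row, col) at level i."""
    return [(2 * row + dr, 2 * col + dc) for dr in (0, 1) for dc in (0, 1)]


def parent(row, col):
    """The cell at level i - 1 that contains cell (row, col) at level i."""
    return row // 2, col // 2


if __name__ == "__main__":
    print(cells_per_side(3))
    print(children(1, 2))
    print(parent(3, 5))
```

With this addressing, no pointers between levels are needed: the children and the parent of a cell follow from its indices.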
Hierarchical Structure for STING Clustering
Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber
Statistical Parameters
• For each cell we have attribute-dependent and attribute-independent parameters.
• The attribute-independent parameter is the number of objects in a cell, n.
• For attribute-dependent parameters, it is assumed that the attributes of each object have numerical values.
• For each numerical attribute we have the following five parameters:
Statistical Parameters..
• m: the mean of all values in this cell
• s: the standard deviation of all values in this cell
• min: the minimum value of the attribute in this cell
• max: the maximum value of the attribute in this cell
• distribution: the type of distribution that the attribute values in this cell follow
Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber
Statistical Parameters..
• Statistical information regarding the attributes in each grid cell, for each layer, is pre-computed and stored beforehand.
• The statistical parameters for the cells in the lowest layer are computed directly from the values present in the table when the data are loaded into the database.
• The statistical parameters for the cells in all other levels are computed from their respective children cells in the next lower level.
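The bottom-up computation can be sketched as follows. Leaf parameters come straight from the raw values; a parent's n, m, s, min and max are obtained from its children using the standard identities for combining counts, means and variances. The `CellStats` class and function names are assumptions made for this sketch, s is taken as the population standard deviation, and the distribution parameter is left out:

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    n: int        # number of objects in the cell
    m: float      # mean of the attribute
    s: float      # (population) standard deviation of the attribute
    min: float    # minimum attribute value
    max: float    # maximum attribute value


EMPTY = CellStats(0, 0.0, 0.0, math.inf, -math.inf)


def leaf_stats(values):
    """Compute the parameters of a lowest-level cell directly from its values."""
    n = len(values)
    if n == 0:
        return EMPTY
    m = sum(values) / n
    variance = sum((v - m) ** 2 for v in values) / n
    return CellStats(n, m, math.sqrt(variance), min(values), max(values))


def merge_children(cells):
    """Compute a parent's parameters from its children, without touching raw data."""
    n = sum(c.n for c in cells)
    if n == 0:
        return EMPTY
    m = sum(c.m * c.n for c in cells) / n
    # E[x^2] of the parent is the count-weighted average of each child's s^2 + m^2.
    second_moment = sum((c.s ** 2 + c.m ** 2) * c.n for c in cells) / n
    s = math.sqrt(max(second_moment - m ** 2, 0.0))  # clamp tiny negative rounding
    return CellStats(n, m, s, min(c.min for c in cells), max(c.max for c in cells))


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]
    print(merge_children([leaf_stats(a), leaf_stats(b)]))
    print(leaf_stats(a + b))  # should agree up to floating-point rounding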
Query Types and Query Processing
1) Query Types
• An SQL-like language is used to describe queries.
• Two types of queries are common: one finds regions that satisfy certain constraints, and the other takes a region and returns some attribute of that region.
2) Query Processing
• We use a top-down approach to answer spatial data queries.
• Start from a pre-selected layer, typically one with a small number of cells.
Query Processing..
• The pre-selected layer does not have to be the topmost layer.
• For each cell in the current layer, compute the confidence interval (or estimated range of probability) reflecting the cell's relevance to the given query.
• The confidence interval is calculated using the statistical parameters of each cell.
• From the calculated interval, we label each cell as relevant or irrelevant for this query.
• Remove irrelevant cells from further consideration.
Query Processing..
• When finished with the current layer, proceed to the next lower level.
• Processing of the next lower level examines only the remaining relevant cells.
• Repeat this process until the bottom layer is reached.
• At this point, if the query specifications are met, the regions of relevant cells that satisfy the query are returned.
• Otherwise, the data that fall into the relevant cells are retrieved and further processed until they meet the requirements of the query.
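The top-down pruning loop can be sketched as below. This is a simplified illustration, not the paper's procedure: the confidence-interval computation is replaced by a plain range-overlap test on each cell's min and max, and the `levels` dictionary layout, the function names, and the 4-children addressing are assumptions made for the example:

```python
def child_cells(row, col):
    """The 4 children of cell (row, col) in the next lower level (assumed addressing)."""
    return [(2 * row + dr, 2 * col + dc) for dr in (0, 1) for dc in (0, 1)]


def is_relevant(cell_range, lo, hi):
    """Stand-in relevance test: does the cell's [min, max] overlap [lo, hi]?"""
    cell_min, cell_max = cell_range
    return cell_min <= hi and cell_max >= lo


def top_down_query(levels, start_level, lo, hi):
    """Return the bottom-level cells that may hold values in [lo, hi].

    levels maps a level number to {(row, col): (min, max)} (hypothetical layout).
    """
    bottom = max(levels)
    if start_level not in levels:
        raise ValueError("start_level is not a stored level")
    candidates = list(levels[start_level])
    for level in range(start_level, bottom + 1):
        # Label cells in the current layer and drop the irrelevant ones.
        relevant = [c for c in candidates if is_relevant(levels[level][c], lo, hi)]
        if level == bottom:
            return sorted(relevant)
        # Only children of relevant cells are examined at the next lower level.
        candidates = [k for c in relevant for k in child_cells(*c)
                      if k in levels[level + 1]]
    return []


if __name__ == "__main__":
    levels = {
        1: {(0, 0): (0, 100)},
        2: {(0, 0): (0, 10), (0, 1): (20, 30), (1, 0): (40, 60), (1, 1): (90, 100)},
    }
    print(top_down_query(levels, 1, 25, 50))
```

Cells pruned at a higher level are never expanded, so their descendants are never visited.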
Different Grid Levels during Query Processing
http://georges.gardarin.free.fr/Cours_XMLDM_Master2/Sting.PDF
Finally..
Strength and Weakness of STING
Strength:
• The grid structure facilitates parallel processing and incremental updating.
• It is very efficient: the computational cost of processing a query is O(g), where g is the total number of grid cells at the lowest level (usually much smaller than n, the total number of objects).
• It is query independent, as the statistical information stored in the cells is summary information about the data and does not depend on any particular query.
Weakness:
• All cluster boundaries are either horizontal or vertical; no diagonal boundary is detected.
Thank You
All the BEST for FINALS!!!