Silvia Nittel University of California, Los Angeles Scientific Data Mining in ESP2Net


Silvia Nittel
University of California, Los Angeles

Scientific Data Mining in ESP2Net

Silvia Nittel, GeoSKI 2000, February 24, 2000

Overview

• Motivation
• What is scientific data mining?
• Examples of scientific data mining at UCLA
• CS interests in scientific data mining
  – Tools
  – Collaboration paradigms
  – Interoperability

Motivation

• The advent of the computer has brought with it the ability to generate and store huge amounts of data.
  – Business data (DBs)
  – Scientific data
• What is it? The process of extracting useful information has become more formalized, and the term “data mining” has been coined for it.

[Figure: satellites and satellite dishes]

What is data mining?

• Definition: Data mining is the process of extracting previously unknown, comprehensible, valid and actionable information from large data stores (and using it to make crucial business decisions).
• There are two approaches:
  – verification driven, whose aim is to validate a hypothesis postulated by a user, or
  – discovery driven, which is the automatic discovery of information by the use of appropriate tools.
• The discovery driven approach depends on a more sophisticated and structured search of the data for associations, patterns, rules or functions, and then having the analyst review them for value.

Process

What is scientific data mining?

• Data mining started with “simple info” (business data), as in a DBMS; this is called OLAP (online analytical processing).
• Scientific data mining:
  – Data is more complex.
  – Data is much larger.
  – A discovery-oriented approach is often used.
• Application areas: medicine, biology, physics, weather, …
• Principle of the scientific method:
  – observation-hypothesis-experiment cycle
• Data mining for science:
  – “observation-hypothesis” is supported by discovery driven mining
  – “hypothesis-experiment” is supported by verification driven mining

Example: Farming Environment

• Goal:
  – Optimize crop yield while minimizing the resources supplied.
  – How: identify which factors affect the crop yield.
    • One analysis looked at over 64 separate items measured over a number of years to extract the items that were significant.
• Initial analysis: discovery driven mining
  – Attempt to find which parameters were significant, either by themselves or in conjunction with others.
  – Use statistical methods to determine the parameters that are significant and their relative influence.
  – Result: derive an equation of interdependence.
• Later on: verify the equation via verification driven mining against new datasets.
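
The two phases of the farming example can be sketched in a few lines of Python. This is only an illustration, not the analysis the slide refers to: the factor names, the data values, the 0.8 correlation threshold, and the choice of a single-factor linear equation are all assumptions made for the sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen_factors(factors, yields, threshold=0.8):
    """Discovery step: keep factors whose |r| with yield meets the threshold."""
    scores = {name: pearson(vals, yields) for name, vals in factors.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def mean_abs_error(a, b, xs, ys):
    """Verification step: average |predicted - observed| on new data."""
    return sum(abs(a + b * x - y) for x, y in zip(xs, ys)) / len(xs)


# Invented illustrative measurements (hypothetical factor names and values).
factors = {
    "rainfall_mm": [300, 350, 400, 450, 500],
    "soil_ph": [6.5, 6.1, 6.8, 6.2, 6.6],
}
yields = [2.0, 2.5, 3.0, 3.5, 4.0]

# Discovery driven: which factors matter, and what equation links them to yield?
significant = screen_factors(factors, yields)
a, b = fit_line(factors["rainfall_mm"], yields)

# Verification driven: test the derived equation against a new (invented) dataset.
error = mean_abs_error(a, b, [320, 480], [2.2, 3.8])
```

A real study with dozens of candidate items would also look at combinations of factors, which a one-variable screen like this cannot capture.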

Example: Global Climate Change

• Often a verification driven mining approach.
  – Climate data has been collected for many centuries.
  – It is extended into the more distant past through such activities as the analysis of ice core samples from the Antarctic.
  – At the same time, a number of different predictive models have been proposed for future climatic conditions.
• Use of predictive models:
  – Use sample data from the past.
  – Verify the predictive models by
    • running them on historical data and then comparing the results with the sample data.
  – From this, the models can then be refined further and used for another round of verification driven mining.
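
The verification loop described above (run each model on historical data, compare with the samples, refine) can be sketched as follows. The two toy models, the years, and the observed values are invented for illustration, and root-mean-square error is just one possible comparison measure assumed here.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root-mean-square error between model output and sample data."""
    return sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def verify_models(models, years, observed):
    """Run every model over the historical years and score it against observations."""
    return {
        name: rmse([model(year) for year in years], observed)
        for name, model in models.items()
    }


# Invented historical sample data (not real climate measurements).
years = [1900, 1950, 2000]
observed = [13.8, 14.0, 14.4]

# Two hypothetical predictive models to be verified.
models = {
    "constant": lambda year: 14.0,
    "linear_trend": lambda year: 13.8 + 0.006 * (year - 1900),
}

scores = verify_models(models, years, observed)
best = min(scores, key=scores.get)  # candidate for the next refinement round
```

The model with the lowest score would be the one carried into the next round of refinement and verification.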

Scientific Data Mining at UCLA

• Project scope:
  – ESP2Net: Earth Science Partners’ Private Network
    • Computer science: UCLA, HRL
    • Earth science: JPL, Scripps, U Arizona
• Scientific data mining:
  – Verification driven approach
  – Large amounts of raster satellite data

Scientific Data Mining at UCLA

Hypothesis: Coastal rainfall correlated with remote convective events in tropical Pacific

1 “Warm pool” develops in tropical Pacific ocean
2 Warm moist air rapidly rises
3 Vigorous convection produces very high cold clouds
4 Storm systems push “moisture flare” eastward
5 Heavy rainfall over Southwest U.S.

[Diagram: partner sites connected via VPN]
• UA: ISCCP DX, CL; cluster operators, matching operators
• JPL: TOVS, NVAP, MLS; tracking operators, statistical operators
• Scripps: precipitation; correlation operators, GLINT operators
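
One simple way to probe a hypothesis of this kind is a lagged correlation: compare a convection index at time t with rainfall at time t + lag, and see which lag gives the strongest relationship. The sketch below is an assumption-laden illustration, not a description of the ESP2Net correlation operators; the function names and both time series are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        n = len(driver) - lag  # number of overlapping time steps
        result[lag] = pearson(driver[:n], response[lag:])
    return result


# Invented series: a tropical convection index and coastal rainfall that
# follows it two time steps later (hypothetical values, not real data).
convection = [1, 5, 2, 8, 3, 7, 1, 6, 2, 4]
rainfall = [0, 0] + [2 * c for c in convection[:-2]]

correlations = lagged_correlation(convection, rainfall, max_lag=3)
best_lag = max(correlations, key=correlations.get)
```

With real satellite and precipitation data, the series would first have to be extracted and aligned by the cluster, tracking, and matching steps; a strong correlation at some lag would support the hypothesis but not prove it.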

Visualization

• Convective cloud cluster motion
  – ISCCP CL, March 8-21, 1993 (UA)
• Water vapor motion in the atmosphere
  – NVAP, March 1-31, 1993 (Scripps)
• Different perspective reveals new info
  – NVAP stacking and slicing (JPL)

[Cloud movie] [Water vapor movie]

Challenges of Scientific Data Mining

Challenges:
• Distributed collaboration
  – share results (passive)
  – share analysis processes (active)
• Leverage partners’ expertise and efforts
• Re-use core analysis tools (operators)
• Large datasets, decadal time spans (> ½ TB data)

Project goal:
• Build a flexible and extensible framework for scientific investigations. It should
  – be distributed and internet-based,
  – provide reusable, extensible, efficient tools,
  – address interoperability and collaboration.

UCLA Support of Scientific Data Mining

• Re-usable tools:
  – Conquest (CONcurrent Queries in Space and Time)
• Collaboration support:
  – Scientific Markup Language (SEML): XML-based Scientific Experiment Logbook
  – Conquest (distributed queries)
  – Secure collaboration (Virtual Private Networks)
• Interoperability:
  – OpenGIS standard to represent data
  – CORBA
  – Java

Summary

• Scientific data mining is a relatively new research area (first conference in 1994, KDD)

Science (hypothesis)

Statistics (methods)

Computer Science (visualization, animation)