SAN DIEGO SUPERCOMPUTER CENTER NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA: INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS Thomas Whitenack David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez


SAN DIEGO SUPERCOMPUTER CENTER

NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA:

INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS

Thomas Whitenack

David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez


USGS Instantaneous water data services

• 15-minute intervals
• 10,000+ sites (7,000+ have discharge)
• Up to 60 days of data available
• http://waterservices.usgs.gov/WOF/InstantaneousValues
• Data provided using CUAHSI WaterML
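
The service limits above bound how much data each site can return. A minimal Java sketch of that arithmetic follows, assuming one variable per site and no gaps in the record. The class and method names are illustrative and not part of any USGS or CUAHSI library.

```java
// Back-of-the-envelope sizing for the service limits listed above.
// Assumption: one variable per site, one value per interval, no missing values.
class InstantaneousDataVolume {
    static final int MINUTES_PER_DAY = 24 * 60;

    /** Number of values one site reports per day at the given interval. */
    static int valuesPerDay(int intervalMinutes) {
        return MINUTES_PER_DAY / intervalMinutes;
    }

    /** Number of values available for one site over a retention window. */
    static long valuesPerSite(int intervalMinutes, int retentionDays) {
        return (long) valuesPerDay(intervalMinutes) * retentionDays;
    }

    public static void main(String[] args) {
        int perDay = valuesPerDay(15);              // 96 values per site per day
        long perSite = valuesPerSite(15, 60);       // 5,760 values per site in 60 days
        long allDischarge = perSite * 7_000;        // 40,320,000 values for 7,000 sites
        System.out.println(perDay + " " + perSite + " " + allDischarge);
    }
}
```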


Open Source Data Turbine (Ring Buffered Network Bus)

• DataTurbine is a robust open-source streaming data middleware system, designed for sensor-based systems.
• Co-developed by our UCSD / Calit2 colleagues.
• Solution for accessing both streaming and static data, from different vendor systems, via a common interface.
• Released under the Apache 2.0 open source license
• Provides high-performance real-time data streaming: 10+ MB/sec, 1000 frames/sec
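
The "ring buffered" idea in the slide title can be illustrated with a small fixed-capacity buffer in which the newest frame overwrites the oldest once the buffer is full. This is a generic sketch of the technique only. It is not DataTurbine's implementation or API, and `FrameRingBuffer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal fixed-capacity ring buffer: once full, each new value overwrites the oldest.
// Illustrative only; not DataTurbine's implementation or API.
class FrameRingBuffer {
    private final double[] values;
    private int next = 0;   // slot the next value will be written to
    private int size = 0;   // number of values currently held

    FrameRingBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        values = new double[capacity];
    }

    void put(double value) {
        values[next] = value;
        next = (next + 1) % values.length;
        if (size < values.length) size++;
    }

    /** Returns the held values from oldest to newest. */
    List<Double> snapshot() {
        List<Double> out = new ArrayList<>(size);
        int start = (next - size + values.length) % values.length;
        for (int i = 0; i < size; i++) {
            out.add(values[(start + i) % values.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        FrameRingBuffer buffer = new FrameRingBuffer(3);
        for (int i = 1; i <= 5; i++) buffer.put(i);
        System.out.println(buffer.snapshot()); // prints [3.0, 4.0, 5.0]
    }
}
```

A reader that keeps its own position in such a buffer can replay older frames or jump to the newest one, which is the general mechanism behind time-shifted viewing.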


Open Source DataTurbine

• Supported by NASA SBIR, 15 years in development

• Supports multiple types of streams: real-time monitoring, video and multimedia, telemetry, instant messages, etc.

• Scalable: DataTurbine servers can be interconnected to handle large streams

• Can manipulate the streams: fast forward or slow motion playback (TiVo-like)


Goal of Integrating Data Turbine with CUAHSI HIS

• Get the two systems to work together.
• Maintain an up-to-date view of a large volume of near real time data, in house.
• Store data locally beyond the 60 days it is made available.
• Enable viewing of the NWIS Instantaneous data in the Realtime Data Viewer (RDV).


Challenges of Project

• Integrate CUAHSI HIS with the Data Turbine
• CUAHSI HIS perspective:
    • Consuming WaterML from a Java environment
    • Obtaining and storing NWIS 15-minute data beyond 60 days
• Data Turbine perspective:
    • CUAHSI data presented unusual challenges
        – Pulling data.
        – Time stamps have to be set for each value.
    • 7,000 “Channels” needed to be organized for the RDV client
        – Visualizing / navigating mass volumes of data.
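
Two of the challenges above, consuming WaterML from Java and setting a time stamp for each value, can be sketched with the JDK's built-in DOM parser. The sketch assumes a simplified WaterML-like fragment in which each `value` element carries a `dateTime` attribute without a UTC offset, and it treats those times as UTC purely for illustration. The class name `WaterMlValues` is hypothetical, and real service responses contain more structure than this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Extracts (timestamp, value) pairs from a simplified WaterML-like document.
// Assumption: dateTime attributes have no offset and are interpreted as UTC here.
class WaterMlValues {
    /** Returns epoch seconds mapped to values, in document order. */
    static Map<Long, Double> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Match <value> elements regardless of namespace.
            NodeList nodes = doc.getElementsByTagNameNS("*", "value");
            Map<Long, Double> out = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                LocalDateTime t = LocalDateTime.parse(e.getAttribute("dateTime"));
                out.put(t.toEpochSecond(ZoneOffset.UTC),
                        Double.parseDouble(e.getTextContent().trim()));
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse WaterML fragment", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<timeSeriesResponse><timeSeries><values>"
                + "<value dateTime=\"2010-03-01T00:00:00\">12.5</value>"
                + "<value dateTime=\"2010-03-01T00:15:00\">12.7</value>"
                + "</values></timeSeries></timeSeriesResponse>";
        System.out.println(parse(xml));
    }
}
```

Each pair produced this way carries its own explicit time stamp, which is what a pulled, historical series needs before it can be handed to a streaming system that would otherwise stamp data on arrival.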


CUAHSI –> Data Turbine


OSDT Custom Source

• Each source is a separate connection
• 7,000 sources were too many for OSDT.
• Sources can have multiple channels and sub-channels
• Sites were organized by state and county to make them navigable
• 50 GB disk cache: ~1 year of 15-minute data for 7,000 sites.
• Cycling through 7,000+ getValues requests takes ~18 hours for the first iteration, or upon restart.
• Subsequent iterations can still complete in under 8 hours.
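
One way to picture the state and county organization is as slash-separated channel paths grouped under a small number of sources. The sketch below assumes one source per state, which the slides do not state, and uses placeholder site codes. `ChannelTree` and its methods are hypothetical names, not part of the OSDT API.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Builds hierarchical channel paths and groups them under a few sources.
// Assumption: one source per state; the slides only say "organized by state and county".
class ChannelTree {
    /** Builds a path such as "Texas/Travis/SITE001". */
    static String channelPath(String state, String county, String siteCode) {
        return clean(state) + "/" + clean(county) + "/" + clean(siteCode);
    }

    // A slash inside a name would create an unintended extra level.
    private static String clean(String part) {
        return part.trim().replace('/', '_');
    }

    /** Groups channel paths by their first component (the state). */
    static Map<String, Set<String>> groupBySource(Iterable<String> paths) {
        Map<String, Set<String>> bySource = new TreeMap<>();
        for (String path : paths) {
            String source = path.substring(0, path.indexOf('/'));
            bySource.computeIfAbsent(source, k -> new TreeSet<>()).add(path);
        }
        return bySource;
    }

    public static void main(String[] args) {
        // Placeholder site codes, not real USGS identifiers.
        Map<String, Set<String>> grouped = groupBySource(Arrays.asList(
                channelPath("Texas", "Travis", "SITE001"),
                channelPath("Texas", "Hays", "SITE002"),
                channelPath("Utah", "Cache", "SITE003")));
        System.out.println(grouped.keySet()); // prints [Texas, Utah]
    }
}
```

Grouping this way turns thousands of per-site connections into a few dozen, while the path hierarchy gives a viewer a tree to browse instead of a flat list.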


Realtime Data Viewer (RDV)


OSDT Custom “Sink”

• Is essentially a custom client connection to DataTurbine (RDV is a sink process).

• Pulls data and writes it to SQL batch files for batch inserts.

• Used to update local ODM instance of NWIS instantaneous data.
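
The batch-file step might look roughly like the following, which turns a list of readings into a block of INSERT statements. The table and column names are a simplified subset chosen for illustration. A real ODM `DataValues` table has additional required columns, and the `OdmBatchWriter` class is hypothetical, not the sink used in this project.

```java
import java.util.Arrays;
import java.util.List;

// Turns readings into the text of a SQL batch file.
// Assumption: simplified column subset; inputs are program-generated, not user-supplied.
class OdmBatchWriter {
    static final class Reading {
        final int siteId;
        final int variableId;
        final String utcDateTime;
        final double value;

        Reading(int siteId, int variableId, String utcDateTime, double value) {
            this.siteId = siteId;
            this.variableId = variableId;
            this.utcDateTime = utcDateTime;
            this.value = value;
        }
    }

    /** Returns one INSERT statement per reading, each terminated by a newline. */
    static String toBatch(List<Reading> readings) {
        StringBuilder sql = new StringBuilder();
        for (Reading r : readings) {
            sql.append("INSERT INTO DataValues (DataValue, DateTimeUTC, SiteID, VariableID) VALUES (")
               .append(r.value).append(", '")
               .append(r.utcDateTime).append("', ")
               .append(r.siteId).append(", ")
               .append(r.variableId).append(");\n");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading(1, 2, "2010-03-01 00:00:00", 12.5),
                new Reading(1, 2, "2010-03-01 00:15:00", 12.7));
        System.out.print(toBatch(readings));
    }
}
```

Writing statements to a file and loading them in bulk keeps the database load separate from the streaming side, so a slow insert does not stall the sink.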


Conclusions

• CUAHSI HIS WaterML can be used successfully in Java / non-Windows environments.
• Displaying near-real-time data in RDV is very fast, and RDV is a valuable visualization tool.
• Data Turbine is designed to ingest much more data than this.
    • Capable of 10 MB/second; we're feeding it < 1K/second.
• Updating 7,000+ data channels worked, but is well beyond what the OSDT developers had in mind when designing it.
• Organizing 7,000+ channels in a viewer display presents organizational challenges.
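
The "< 1K/second" figure can be sanity-checked with a rough estimate. The sketch below assumes 16 bytes per value (an 8-byte time stamp plus an 8-byte double) and ignores protocol and channel-name overhead. Both are assumptions for illustration, not measurements from this project.

```java
// Rough estimate of the steady-state feed rate into the Data Turbine.
// Assumption: 16 bytes per value (8-byte time stamp + 8-byte double), no overhead.
class FeedRate {
    /** Average number of new values per second across all channels. */
    static double valuesPerSecond(int channels, int intervalMinutes) {
        return channels / (intervalMinutes * 60.0);
    }

    /** Average payload bytes per second across all channels. */
    static double bytesPerSecond(int channels, int intervalMinutes, int bytesPerValue) {
        return valuesPerSecond(channels, intervalMinutes) * bytesPerValue;
    }

    public static void main(String[] args) {
        double values = valuesPerSecond(7_000, 15);      // about 7.8 values per second
        double bytes = bytesPerSecond(7_000, 15, 16);    // about 124 bytes per second
        System.out.printf("%.1f values/s, %.0f bytes/s%n", values, bytes);
    }
}
```

Under these assumptions the payload is on the order of a hundred bytes per second, consistent with the slide's figure and far below the 10 MB/second the system is described as handling. The difficulty was the number of channels, not the data rate.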


Questions?

[email protected]

• http://www.dataturbine.org