SAN DIEGO SUPERCOMPUTER CENTER
NEAR REAL TIME VISUALIZATION OF USGS INSTANTANEOUS DATA:
INTEGRATION OF OPEN SOURCE DATA TURBINE IN CUAHSI HIS
Thomas Whitenack
David Ryan, David Valentine, Ilya Zaslavsky, Matt Rodriguez
USGS Instantaneous water data services
• 15-minute intervals
• 10,000+ sites (7,000+ have discharge)
• Up to 60 days of data available
• http://waterservices.usgs.gov/WOF/InstantaneousValues
• Data provided using CUAHSI WaterML
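To give a feel for the volume behind these figures, the following is a rough sizing sketch. It assumes one variable (discharge) per site, no gaps in the record, and takes 7,000 as a lower bound for the "7,000+" discharge sites; real counts will differ.

```python
# Back-of-the-envelope sizing for the USGS instantaneous values service.
# Assumptions (not from the slides): one variable per site, no missing values.
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 15      # reporting interval stated on the slide
RETENTION_DAYS = 60        # how long USGS keeps the data available
DISCHARGE_SITES = 7_000    # lower bound for "7,000+" sites with discharge

values_per_site_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
values_per_site_in_window = values_per_site_per_day * RETENTION_DAYS
values_in_window = values_per_site_in_window * DISCHARGE_SITES

print(values_per_site_per_day)    # 96
print(values_per_site_in_window)  # 5760
print(values_in_window)           # 40320000
```

Under these assumptions, a full 60-day window holds roughly 40 million discharge values.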
Open Source Data Turbine (Ring Buffered Network Bus)
• DataTurbine is a robust open-source streaming data middleware system, designed for sensor-based systems.
• Co-developed by our UCSD / Calit2 colleagues.
• Solution for accessing both streaming and static data, from different vendor systems, via a common interface.
• Released under Apache 2.0 Open Source License
• Provides real high-performance data streaming: 10+ MB/sec, 1000 frames/sec
Open Source DataTurbine
• Supported by NASA SBIR, 15 years in development
• Supports multiple types of streams: real-time monitoring, video and multimedia, telemetry, instant messages, etc.
• Scalable: DataTurbine servers can be interconnected to handle large streams
• Can manipulate the streams: fast forward or slow motion playback (TiVo-like)
Goal of Integrating Data Turbine with CUAHSI HIS
• Get the two systems to work together.
• Maintain an up-to-date view of a large volume of near-real-time data, in house.
• Store data locally beyond the 60 days it is made available.
• Enable viewing of the NWIS Instantaneous data in the Realtime Data Viewer (RDV).
Challenges of Project
• Integrate CUAHSI HIS with the data turbine
• CUAHSI HIS perspective:
  • Consuming WaterML from a Java environment
  • Obtain and store NWIS 15-minute data beyond 60 days.
• Data Turbine perspective:
  • CUAHSI data presented unusual challenges
    – Pulling data.
    – Time stamps have to be set for each value.
  • 7,000 “Channels” needed to be organized for the RDV client
    – Visualizing / navigating mass volumes of data.
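The point about setting a time stamp for each value can be illustrated with a small parsing sketch. The XML below is a simplified, hypothetical stand-in for a WaterML response (no namespaces, made-up numbers), and `parse_values` is an illustrative helper, not part of CUAHSI HIS or DataTurbine. The project itself consumed WaterML from Java; Python is used here only for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified, hypothetical fragment: real WaterML is namespaced and carries
# site, variable and qualifier metadata that is omitted here.
SAMPLE = """
<timeSeries>
  <values>
    <value dateTime="2010-07-01T00:00:00">152.0</value>
    <value dateTime="2010-07-01T00:15:00">150.5</value>
    <value dateTime="2010-07-01T00:30:00">149.0</value>
  </values>
</timeSeries>
"""

def parse_values(xml_text):
    """Return a list of (timestamp, value) pairs, one per <value> element."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for node in root.iter("value"):
        # Each observation carries its own time stamp, so every value is
        # paired with the time it was measured, not the time it was pulled.
        stamp = datetime.fromisoformat(node.attrib["dateTime"])
        pairs.append((stamp, float(node.text)))
    return pairs

pairs = parse_values(SAMPLE)
for stamp, value in pairs:
    print(stamp.isoformat(), value)
```

Because the data is pulled in batches rather than streamed as it is measured, the observation time from the response has to travel with each value.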
CUAHSI –> Data Turbine
OSDT Custom Source
• Each source is a separate connection
• 7000 sources was too many for OSDT.
• Sources can have multiple channels and sub-channels
• Sites were organized by state and county to make them navigable
• 50GB disk cache: ~1 year of 15-minute data for 7000 sites.
• Cycling through 7,000+ getValues requests takes ~18 hours for the first iteration, or upon restart.
• Subsequent iterations can still complete in under 8 hours.
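A minimal sketch of the state/county organization follows. The site records, the `channel_name` layout, and the choice of one group per state are all assumptions for illustration; the slides do not say how the actual source named its channels. The last lines work out the average time per getValues request implied by the quoted cycle times.

```python
from collections import defaultdict

# Hypothetical site records; the site codes are placeholders, not real gauges.
SITES = [
    {"state": "CA", "county": "San Diego", "site": "00000001"},
    {"state": "CA", "county": "Los Angeles", "site": "00000002"},
    {"state": "TX", "county": "Travis", "site": "00000003"},
]

def channel_name(site, variable="discharge"):
    """Build a hierarchical path: state/county/site/variable (assumed layout)."""
    county = site["county"].replace(" ", "_")
    return f'{site["state"]}/{county}/{site["site"]}/{variable}'

def group_by_state(sites):
    """Group channel paths by state, so a few groups hold many channels."""
    groups = defaultdict(list)
    for site in sites:
        groups[site["state"]].append(channel_name(site))
    return dict(groups)

groups = group_by_state(SITES)
print(groups["CA"])

# Average time per request implied by the quoted cycle times (7,000 sites).
first_pass_s = round(18 * 3600 / 7000, 1)
later_pass_s = round(8 * 3600 / 7000, 1)
print(first_pass_s)  # 9.3
print(later_pass_s)  # 4.1
```

Grouping this way keeps the number of connections small while still giving the viewer a tree it can navigate.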
Realtime Data Viewer (RDV)
OSDT Custom “Sink”
• Is essentially a custom client connection to DataTurbine (RDV is a sink process).
• Pulls data and writes it to SQL batch files for batch inserts.
• Used to update local ODM instance of NWIS instantaneous data.
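The batch-file step might look roughly like the sketch below. It is not the project's sink: the column list is a simplified subset of what an ODM `DataValues` table requires, the IDs and file name are made up, and building SQL by string formatting is only reasonable here because the inputs are numbers and timestamps produced by the sink itself.

```python
import os
import tempfile
from datetime import datetime

def to_insert(site_id, variable_id, stamp, value):
    """Format one observation as an INSERT statement (simplified column set)."""
    return (
        "INSERT INTO DataValues (SiteID, VariableID, LocalDateTime, DataValue) "
        f"VALUES ({site_id}, {variable_id}, '{stamp:%Y-%m-%d %H:%M:%S}', {value});"
    )

def write_batch(path, rows):
    """Write one statement per row to a SQL batch file; return the row count."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(to_insert(*row) + "\n")
    return len(rows)

# Hypothetical values pulled from DataTurbine: (site, variable, time, value).
rows = [
    (1, 2, datetime(2010, 7, 1, 0, 0), 152.0),
    (1, 2, datetime(2010, 7, 1, 0, 15), 150.5),
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batch_0001.sql")
    written = write_batch(path, rows)
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()

print(written)
print(lines[0])
```

Writing to a file first lets the database load many rows in one pass instead of issuing one round trip per value.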
Conclusions
• CUAHSI HIS WaterML can be used successfully in Java / non-Windows environments.
• Displaying near-real-time data in RDV is very fast and is a valuable visualization tool.
• Data turbine is designed to ingest much more data than this.
• Capable of 10MB/second; we’re feeding it < 1K/second.
• Updating 7000+ data channels worked, but is well beyond what the OSDT developers had in mind when designing it.
• Organizing 7000+ channels in a viewer display presents organizational challenges.
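The "< 1K/second" figure can be sanity-checked with a quick estimate. The 16 bytes per value (an 8-byte timestamp plus an 8-byte double) is an assumption made here and ignores any protocol or metadata overhead, so treat the result as an order-of-magnitude check only.

```python
# Rough steady-state feed rate for 7,000 channels at 15-minute intervals.
CHANNELS = 7_000
VALUES_PER_DAY = 96          # 24 hours * 4 values per hour
SECONDS_PER_DAY = 86_400
BYTES_PER_VALUE = 16         # assumption: 8-byte timestamp + 8-byte double

values_per_second = CHANNELS * VALUES_PER_DAY / SECONDS_PER_DAY
bytes_per_second = values_per_second * BYTES_PER_VALUE

print(round(values_per_second, 2))  # 7.78
print(round(bytes_per_second, 1))   # 124.4
```

Even with generous overhead on top of this payload estimate, the feed stays far below the 10 MB/second the slides quote for DataTurbine.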
Questions?
• http://www.dataturbine.org