Marine Institute of Ireland: Streams architecture and deployment deep dive
Krishna Mamidipaka, [email protected]
Roger Rea, [email protected]
Housekeeping
• We value your feedback - don't forget to complete your evaluation for each session you attend and hand it to the room monitors at the end of each session
• Overall Conference Evaluation will be provided at the General Session on Friday
• Visit the Expo Solutions Centre
• Please remember this is a 'non-smoking' venue!
• Please switch off your mobile phones
• Please remember to wear your badge at all times
Disclaimer
The Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
• Solution overview
• Why InfoSphere Streams?
• Solution details
Analytics & Sensors
Advanced Acoustical Analytics
InfoSphere Streams
Filter wind & wave noise
Model Marine Mammal environment
Correlate to Galway Bay ecosystem
Real Time Marine Mammal Position
IBM InfoSphere Streams v1.2
Development Environment
• Eclipse IDE
• StreamSight
• Stream Debugger
Runtime Environment
• RHEL v5.3 or v5.4
• x86 multicore hardware
• InfiniBand support
• Up to 125 servers
Toolkits & Adapters
• Connectors to data sources
• Operator Library
• Financial Toolkit
• Mining Toolkit
Front Office 3.0
Streams Programming Model
Streams Processing Language
Input, Process, Output
Platform optimized compilation
[Figure: compilation targets shown are an x86 box, x86 blades, a Cell blade, an FPGA blade and x86 processors]
Streams Runtime Illustrated
[Figure: Processing Elements, each in its own container, run on x86 processors and are linked by the Streams Data Fabric transport]
• Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation
• Runs on commodity hardware, from single node to blade centers to high performance multi-rack clusters
• Can adapt to changes in resources, workload, data rates
SmartBay Overview
Multi-year joint development between the Marine Institute, IBM and others
Project 1: Next generation integrated cyber-physical environment for sensors in environmental monitoring and management
Project 2: Integrated data and information environment with innovative human interface and advanced visualization capabilities supporting multidisciplinary users in environmental monitoring, management and sustainable energy
Project 3: Advanced device monitoring and management for remote sensors and data collection/aggregation platforms
Project 4: Real-time distributed stream analytical fabric for environmental monitoring and management
Project 4 – Cetacean Streaming Analytics Goals
Real-time acoustic analysis of hydrophone data
Initial goals: use echolocation clicks to identify
  • Cetacean species (and sub-species)
  • Count
  • Distance
Extended goal: individual animal detection & recognition
Hydrophone Audio
• High frequency
  • Requirement to sample at up to 300 kHz
  • Bottlenose dolphins produce directional, broadband clicks in sequence. Each click lasts about 50 to 128 microseconds. Peak frequencies of echolocation clicks are about 40 to 130 kHz.
• Medium resolution (16-bit mono) but can be higher
• Contains environmental (natural and artificial) noise
  • sound of weather on ocean surface, sea bed activity
  • artificial noise from marine traffic (propellers)
  • seismic surveys
• Not normalised (significant drift)
[Figure: just over 0.5 seconds of data]
Click periodicity varies based on activity e.g. navigation, hunting
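The sampling figures on this slide can be checked with a little arithmetic. The sketch below assumes a 300 kHz sample rate and uses the click durations and peak frequencies quoted above; the function names are illustrative only and are not part of the Streams application.

```python
SAMPLE_RATE_HZ = 300_000  # upper sampling requirement quoted on the slide


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency that can be represented without aliasing."""
    return sample_rate_hz / 2


def samples_in_click(duration_us: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Whole samples needed to cover a click (integer ceiling division)."""
    return -(-duration_us * sample_rate_hz // 1_000_000)


if __name__ == "__main__":
    # 130 kHz peak frequencies sit below the 150 kHz Nyquist limit.
    print(nyquist_limit_hz(SAMPLE_RATE_HZ))
    # A 50 to 128 microsecond click spans only a few dozen samples.
    print(samples_in_click(50), samples_in_click(128))
```

At this rate a single click covers only a few dozen samples, which is why the later stages work on short windows.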
Species Identification
In simpler terms: “Click Detection” and “Click Profiling”
Three primary stages in operation:
• Pre-click detection: establishing a click “hint”
• Clean Up / Dynamic Filtering: isolating key frequencies
• Frequency and Time Based Click Profiling / Detection: arriving at a decision over species based on matching to known characteristics
Dynamic Pre-Click Detection – a click “hint”
An algorithm to detect “potential” clicks in the acoustic data
By calculating a rolling average of the incoming sound pressure / intensity level, we can dynamically change the max / min thresholds used when checking for potential clicks. A potential click must hold above a given low threshold for a certain duration.
[Figure: signal level with the dynamic Max and Min thresholds]
Pre-Click Detection
Red – unfiltered, Green – Filtered, Blue – Click “hints”
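A minimal sketch of the click "hint" idea, assuming the input is a plain sequence of numeric samples. The window length, the threshold factors and the rule that a hint's peak must stay under the upper threshold are all illustrative assumptions; the slides do not give the actual values or the exact role of the max threshold.

```python
from collections import deque


def click_hints(samples, window=64, low_factor=3.0, high_factor=50.0, min_samples=3):
    """Return (start, end) index pairs of candidate clicks.

    The mean of |sample| over the last `window` quiet samples sets the
    thresholds: low = low_factor * mean, high = high_factor * mean.
    A hint is a run of at least `min_samples` samples above `low` whose
    peak stays below `high`. All factors are hypothetical defaults.
    """
    history = deque(maxlen=window)
    hints = []
    run_start = None
    run_peak = 0.0
    run_high = 0.0
    for i, s in enumerate(samples):
        level = abs(s)
        mean = sum(history) / len(history) if history else 0.0
        low, high = low_factor * mean, high_factor * mean
        if history and level > low:
            if run_start is None:
                # Freeze the upper threshold at the start of the run.
                run_start, run_peak, run_high = i, 0.0, high
            run_peak = max(run_peak, level)
        else:
            if run_start is not None:
                if i - run_start >= min_samples and run_peak < run_high:
                    hints.append((run_start, i))
                run_start = None
            history.append(level)  # only quiet samples update the background
    if run_start is not None:
        if len(samples) - run_start >= min_samples and run_peak < run_high:
            hints.append((run_start, len(samples)))
    return hints
```

Because only quiet samples feed the rolling average, a loud click does not raise its own threshold while it is being measured.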
Clean Up / Dynamic Filtering
A number of factors affect the intensity / sound pressure of the received acoustic signal:
• Water temperature: can change significantly above / below any thermocline; monitored via the deployment buoy
• Salinity: maps are available; can also be monitored but does not shift significantly
• Species
• Most significantly, the distance of the source from the hydrophone
It is necessary to understand the sound pressure level and the “species hint” (based on mean frequency) to apply the appropriate filter
A specific filter is applied based on an industry recognised sound pressure lookup, adjusted for distance
Click Detection & Profiling
Moving from a species “hint” to a more robust approach requires a series of checks to confirm a click and try to identify the species*
• Comparison of the relative energy in two different frequency bands
• Peak spectral frequency in the click
• The width of the main frequency peak (based on 80% of the energy)
• The duration of the click
* Credit: Marjolaine Caillat, SMRU, 2005
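The four checks can be sketched against a power spectrum that has already been computed for one click. This is only an illustration: the spectrum is assumed to be a list of non-negative bin powers with a known bin width, the band limits are caller-supplied, and the "80% of the energy" width is interpreted here as a run of bins grown greedily outward from the peak, which may differ from the method credited above.

```python
def band_energy_ratio(power, bin_hz, band_a, band_b):
    """Energy in band_a divided by energy in band_b; bands are (lo_hz, hi_hz)."""
    def energy(band):
        lo, hi = band
        return sum(p for k, p in enumerate(power) if lo <= k * bin_hz < hi)

    denominator = energy(band_b)
    return energy(band_a) / denominator if denominator else float("inf")


def peak_frequency_hz(power, bin_hz):
    """Frequency of the strongest bin."""
    return max(range(len(power)), key=power.__getitem__) * bin_hz


def peak_width_hz(power, bin_hz, fraction=0.8):
    """Width of a run of bins, grown from the peak, holding `fraction` of the energy."""
    total = sum(power)
    lo = hi = max(range(len(power)), key=power.__getitem__)
    acc = power[lo]
    while acc < fraction * total and (lo > 0 or hi < len(power) - 1):
        left = power[lo - 1] if lo > 0 else -1.0
        right = power[hi + 1] if hi < len(power) - 1 else -1.0
        if left >= right:  # extend towards the stronger neighbour
            lo -= 1
            acc += left
        else:
            hi += 1
            acc += right
    return (hi - lo + 1) * bin_hz


def click_duration_us(n_samples, sample_rate_hz):
    """Duration of a click of n_samples samples, in microseconds."""
    return n_samples * 1_000_000 / sample_rate_hz
```

A species decision would then compare these four numbers against known characteristics per species; those reference values are not given in the slides.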
FFT
Most of these techniques require both time domain and frequency domain representations of the input signal
FFT is normally computed across a “window” of samples, returning the frequency distribution of the time based signal
Window size is governed by the number of samples in a click, rounded up to the next power of 2
There is a computational overhead in running FFT on each click, so minimising the window size is of benefit
Peak spectral frequency in the click
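The window sizing rule and the peak frequency lookup can be sketched as follows. A naive DFT stands in for the FFT so that the example needs only the standard library; a production pipeline would use a real FFT. The function names and the zero-padding choice are assumptions made for illustration.

```python
import cmath
import math


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def dft_magnitudes(window):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (stand-in for an FFT)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def peak_spectral_frequency_hz(click, sample_rate_hz):
    """Zero-pad the click to a power-of-two window and return the peak frequency."""
    n = next_power_of_two(len(click))
    window = list(click) + [0.0] * (n - len(click))
    mags = dft_magnitudes(window)
    k = max(range(1, len(mags)), key=mags.__getitem__)  # skip the DC bin
    return k * sample_rate_hz / n


if __name__ == "__main__":
    # A 75 kHz tone sampled at 300 kHz repeats every 4 samples.
    tone = [math.sin(math.pi * t / 2) for t in range(64)]
    print(peak_spectral_frequency_hz(tone, 300_000))
```

Keeping the window at the smallest power of two that covers the click keeps the per-click transform cost low.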
[Figure: processing pipeline. Stages: Pre-click Detector, Dynamic Filtering, Click Detection, Click Profiling and Detection (4 criteria)]
• WAV Decoder
• High Pass Filter (remove < 10 kHz)
• FFT (4096 window)
• Calc. mean frequency
• “Species Hint” and split
  • Porpoise, Fo = 131 kHz: calc. Sound Pressure Level (inv. sq.)
    • 175 dB: low order BP filter
    • 161 dB: medium order BP filter
    • 151 dB: high order BP filter
  • Common Dolphin, Fo = 80 kHz: calc. Sound Pressure Level (inv. sq.)
    • 230 dB: low order BP filter
    • 216 dB: low order BP filter
    • 210 dB: low order BP filter
  • SPL in dB is calculated using the inverse square law: High = 1 meter, Medium = 5 meters, Low = 20 meters
• FFT (512 window)
• Mean Frequency, Band Energy, Peak Position and Width, Click Length
• Click Counter
• Dynamically change click detector parameters depending on average SPL, then assign click #
• Low Pass Filter (remove > 10 kHz): *To do, low frequency species click detection
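The inverse square law used for the sound pressure level (SPL) lookup can be sketched as below, assuming spherical spreading only and a source level referenced to 1 meter. The function names and the nearest-level classification are hypothetical; the slides do not describe the exact lookup.

```python
import math

# Reference distances named on the slide for High / Medium / Low SPL.
REFERENCE_DISTANCES_M = {"high": 1.0, "medium": 5.0, "low": 20.0}


def spl_at_distance(source_db_at_1m: float, distance_m: float) -> float:
    """Received level under inverse-square (spherical) spreading only."""
    return source_db_at_1m - 20.0 * math.log10(distance_m)


def spl_band(received_db: float, source_db_at_1m: float) -> str:
    """Name of the reference distance whose expected level is closest."""
    return min(
        REFERENCE_DISTANCES_M,
        key=lambda name: abs(
            received_db - spl_at_distance(source_db_at_1m, REFERENCE_DISTANCES_M[name])
        ),
    )


if __name__ == "__main__":
    for distance in REFERENCE_DISTANCES_M.values():
        print(distance, round(spl_at_distance(175.0, distance), 1))
```

With a 175 dB source this gives about 161 dB at 5 meters, matching the diagram. At 20 meters pure spreading gives about 149 dB, slightly below the 151 dB shown, so the lookup on the slide presumably includes adjustments not modelled here.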
Determining Counts
Animal counts are based on building click trains through correlation
Correlate from click to click based on relative energy and peak frequency
Additional rules need applying to account for “re-visits”
Credit - Josefin Starkhammar, Johan Nilsson, Mats Amundin, Tomas Jansson, Monica Almqvist, Hans W. Persson
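One way to picture click train building is a greedy pass that attaches each click to the existing train whose last click is most similar. The sketch below is an assumption-laden illustration, not the credited authors' method: clicks are (time, peak frequency, energy) tuples with positive energy, the tolerances are made-up defaults, and "re-visits" are not handled.

```python
def build_click_trains(clicks, max_freq_diff_hz=5_000.0, max_energy_ratio=2.0):
    """Greedily chain clicks into trains.

    clicks: list of (time_s, peak_freq_hz, energy) sorted by time.
    A click joins the train whose last click is closest in peak frequency,
    provided the frequency difference and the energy ratio are both within
    the (hypothetical) tolerances; otherwise it starts a new train.
    """
    trains = []
    for click in clicks:
        _, freq, energy = click
        best = None
        for train in trains:
            _, last_freq, last_energy = train[-1]
            diff = abs(freq - last_freq)
            ratio = max(energy, last_energy) / min(energy, last_energy)
            if diff <= max_freq_diff_hz and ratio <= max_energy_ratio:
                if best is None or diff < best[0]:
                    best = (diff, train)
        if best is None:
            trains.append([click])
        else:
            best[1].append(click)
    return trains


if __name__ == "__main__":
    clicks = [
        (0.00, 130e3, 1.0),
        (0.01, 80e3, 4.0),
        (0.02, 131e3, 1.1),
        (0.03, 81e3, 3.8),
        (0.04, 129e3, 0.9),
    ]
    print(len(build_click_trains(clicks)))  # number of trains as a rough animal count
```

The number of trains gives a first estimate of the animal count; the additional rules mentioned above would be needed to merge trains from an animal that leaves and returns.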
Visualisation - Spectrogram
There are a number of metrics planned for the acoustic work through the SmartBay portal:
• Species breakdown
• Species count
• Distance from hydrophone
A “basic” spectrogram (sound intensity by frequency and time) is shown below
Performance
Our initial approach was to treat each audio sample (int) as a single tuple (300k per second)
Allowed us to implement all of the Streams operators quickly to validate the end to end processing
Recently moved to an integer list of 128 samples per tuple up to the pre-click detector
  Followed by an integer list of the total click per tuple
  And an integer list of the frequency spectrum for a complete click per tuple
Moving to multiple samples per tuple has resulted in approx. 20x performance improvement (faster than real time)
Currently testing on cluster of 4 x xSeries 3950 (8GB Ram, 2xQuad Core)
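The move from one sample per tuple to a list of samples per tuple amounts to batching the stream. The following sketch shows the idea in plain Python rather than in Streams itself; the helper names are hypothetical.

```python
from itertools import islice


def batch_samples(samples, batch_size=128):
    """Yield lists of up to `batch_size` consecutive samples."""
    iterator = iter(samples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def tuples_per_second(sample_rate_hz: int, batch_size: int) -> int:
    """Tuples needed per second of audio (integer ceiling division)."""
    return -(-sample_rate_hz // batch_size)


if __name__ == "__main__":
    print(tuples_per_second(300_000, 1))    # one sample per tuple
    print(tuples_per_second(300_000, 128))  # 128 samples per tuple
```

Batching cuts the tuple rate by roughly two orders of magnitude, which reduces per-tuple overhead; the measured speed-up reported above (approx. 20x) is smaller than that ratio, as the per-sample work itself is unchanged.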
Next Steps
Implement development GUI to allow tuning of the detection values (by species)
Reference dB, salinity and temperature, in addition to intensity, to calculate distance
  Additional challenges come down to the frequency of the echolocation click as produced by each species: high frequencies don't tend to travel very far
  Variations in the echolocation click spread: a porpoise click is very narrow and directional, from the melon and out in front of the animal
Add in the ability to check for low frequency species such as Fin Whales, Hump Back Whales etc.
Futures – Large Scale Ocean Energy Impact Analysis
IBM InfoSphere Streams directions
Tools
• Streams Studio enhancements
• Video/audio analytics
• Text/unstructured analytics
• Streams Processing Language v2
• Native XML support
Runtime
• High Availability
• Security enhancements
• Unicode support
• Installation enhancements
Adapters
• Cognos Now
• WebSphere MQ
• RSS feeds
• Mashup Hub
• WebSphere Business Events
• Oracle
• SQL Server
[Figure: data in motion (millions of events per second, millisecond latency) combined with existing business information; products shown are WebSphere Business Events, InfoSphere Warehouse, IBM Mashup Hub, Cognos 8 BI and Front Office]
All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Any reliance on these statements is at the relying party's sole risk and will not create any liability or obligation for IBM.
InfoSphere Streams sessions
Time | Session | Title | Location
Thursday May 20, 10:45 AM - 11:35 AM | 3666A | InfoSphere Streams for Real Time Analytics in Financial Services Industry | Marriott Park Hotel, Room 14
Friday May 21, 09:00 AM - 09:50 AM | 3661A | InfoSphere Streams helps Stockholm build Ver 2.0 Traffic Control System | Marriott Park Hotel, Room 13
Friday May 21, 11:30 AM - 12:30 PM | 3692A | InfoSphere Streams at Marine Institute of Ireland: Deep Dive | Marriott Park Hotel, IOD Mini Theatre 3
Wednesday 10 AM - 6 PM, Thursday 10 AM - 5 PM, Friday 9 AM - 2 PM | Demo Room | InfoSphere Streams Demonstrations | Marriott Park Hotel, IOD Demo Room Station 19
Wednesday 10:30 - 11:30 | Mini Theater on Expo Floor | InfoSphere Streams in Telco | Marriott Park Hotel, InfoSphere Mini Theater Expo Floor
Thursday 12:30 - 13:00 | Mini Theater on Expo Floor | InfoSphere Streams Business Insight | Marriott Park Hotel, InfoSphere Mini Theater Expo Floor
Thursday 16:30 - 17:00 | Mini Theater on Expo Floor | Leverage Warehouse, SPSS with Streams | Marriott Park Hotel, InfoSphere Mini Theater Expo Floor