Marine Institute of Ireland: Streams architecture and deployment deep dive

Krishna Mamidipaka, krishnag@us.ibm.com

Roger Rea, rrea@us.ibm.com


Housekeeping

• We value your feedback - don't forget to complete your evaluation for each session you attend and hand it to the room monitors at the end of each session

• Overall Conference Evaluation will be provided at the General Session on Friday

• Visit the Expo Solutions Centre

• Please remember this is a 'non-smoking' venue!

• Please switch off your mobile phones

• Please remember to wear your badge at all times


Disclaimer

The Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.


Agenda

• Solution overview

• Why InfoSphere Streams?

• Solution details


IBM InfoSphere Streams v1.2

Development Environment
• Eclipse IDE
• StreamSight
• Stream Debugger

Runtime Environment
• RHEL v5.3 or v5.4
• x86 multicore hardware
• InfiniBand support
• Up to 125 servers

Toolkits & Adapters
• Connectors to data sources
• Operator Library
• Financial Toolkit
• Mining Toolkit

Front Office 3.0


Streams Programming Model

Streams Processing Language

Input, Process, Output

Platform optimized compilation


Streams Runtime Illustrated

[Figure: Processing Element Containers running on x86 processors across mixed hosts (x86 box, x86 blades, Cell blade, FPGA blade), connected by the Streams Data Fabric transport]

Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation

Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters


Streams Runtime Illustrated

[Figure: Processing Element Containers running on x86 processors, connected by the Streams Data Fabric transport]

Can adapt to changes in resources, workload, data rates


SmartBay Overview

Multi-year joint development between the Marine Institute, IBM and others

Project 1: Next generation integrated cyberphysical environment for sensors in environmental monitoring and management

Project 2: Integrated data and information environment with innovative human interface and advanced visualization capabilities supporting multidisciplinary users in environmental monitoring, management and sustainable energy

Project 3: Advanced device monitoring and management for remote sensors and data collection/aggregation platforms

Project 4: Real-time distributed stream analytical fabric for environmental monitoring and management


Project 4 – Cetacean Streaming Analytics Goals

Real-time acoustic analysis of hydrophone data

Initial goals: use echolocation clicks to identify
• Cetacean species (and sub-species)
• Count
• Distance

Extended goal: individual animal detection and recognition


Hydrophone Audio

• High frequency
  • Requirement to sample at up to 300 kHz
  • Bottlenose dolphins produce directional, broadband clicks in sequence. Each click lasts about 50 to 128 microseconds. Peak frequencies of echolocation clicks are about 40 to 130 kHz.

• Medium resolution (16-bit mono) but can be higher

• Contains environmental (natural and artificial) noise
  • Sound of weather on ocean surface, sea bed activity
  • Artificial noise from marine traffic (propellers)
  • Seismic surveys

• Not normalised (significant drift)
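The figures on this slide imply a small sample budget per click. The following Python sketch works through that arithmetic; the constant and function names are illustrative and do not come from the deck.

```python
# Figures taken from the slide; the names are illustrative only.
SAMPLE_RATE_HZ = 300_000          # upper sampling requirement
CLICK_DURATIONS_US = (50, 128)    # bottlenose click duration range, microseconds
BYTES_PER_SAMPLE = 2              # 16-bit mono


def samples_in(duration_us, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of whole samples captured during duration_us microseconds."""
    return duration_us * sample_rate_hz // 1_000_000


# Highest frequency that can be represented at this sample rate.
nyquist_khz = SAMPLE_RATE_HZ / 2 / 1000
samples_per_click = tuple(samples_in(d) for d in CLICK_DURATIONS_US)
bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

print(f"Nyquist limit: {nyquist_khz:.0f} kHz")
print(f"Samples per click: {samples_per_click[0]} to {samples_per_click[1]}")
print(f"Raw data rate: {bytes_per_second / 1000:.0f} kB/s")
```

At 300 kHz the Nyquist limit sits above the 130 kHz upper peak frequency quoted for echolocation clicks, and a single click spans only a few dozen samples.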


Just over 0.5 seconds of data

Click periodicity varies based on activity e.g. navigation, hunting


Species Identification

In simpler terms: “Click Detection” and “Click Profiling”

Three primary stages in operation:
• Pre-click detection: establishing a click “hint”
• Clean Up / Dynamic Filtering: isolating key frequencies
• Frequency and Time Based Click Profiling / Detection: arriving at a decision over species based on matching to known characteristics


Dynamic Pre-Click Detection – a click “hint”

An algorithm to detect “potential” clicks in the acoustic data

By calculating a rolling average of the incoming sound pressure / intensity level, we can dynamically change the max / min thresholds used to check for potential clicks. A potential click must hold above a given low threshold for a certain duration.

[Figure: signal trace with the Max and Min thresholds marked]
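One way to read the description above is as a hysteresis detector whose thresholds follow the background level. The Python sketch below is an illustration under that assumption, not the project's Streams operator. The window length, the two threshold factors and the minimum duration are hypothetical tuning values, and the input is assumed to be an intensity envelope rather than a raw waveform.

```python
from collections import deque


def click_hints(samples, window=64, high_factor=4.0, low_factor=2.0, min_len=3):
    """Return (start, end) index pairs of potential clicks.

    Thresholds are multiples of a rolling average of the last `window`
    background levels, so they rise and fall with the ambient noise.
    All parameter defaults are hypothetical.
    """
    history = deque(maxlen=window)   # recent background levels
    total = 0.0                      # running sum of `history`
    hints = []
    start = None                     # index where the current candidate began
    for i, x in enumerate(samples):
        level = abs(x)
        mean = total / len(history) if history else level
        high, low = mean * high_factor, mean * low_factor
        if start is None:
            if level > high:
                start = i            # crossed the max threshold
        elif level < low:
            if i - start >= min_len: # held above the min threshold long enough
                hints.append((start, i))
            start = None
        if start is None:
            # Update the background only outside a candidate click.
            if len(history) == window:
                total -= history[0]
            history.append(level)
            total += level
    if start is not None and len(samples) - start >= min_len:
        hints.append((start, len(samples)))
    return hints


quiet = [1.0] * 100 + [10.0] * 5 + [1.0] * 100
noisy = [5.0] * 100 + [10.0] * 5 + [5.0] * 100
print(click_hints(quiet))
print(click_hints(noisy))
```

The same burst is reported against the quiet background but ignored against the noisier one, which is the point of tying the thresholds to the rolling average.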


Pre-Click Detection

Red – unfiltered, Green – Filtered, Blue – Click “hints”


Clean Up / Dynamic Filtering

A number of factors affect the intensity / sound pressure of the received acoustic signal:
• Water temperature: can change significantly above / below any thermocline; monitored via the deployment buoy
• Salinity: maps are available; can also be monitored but does not shift significantly
• Species
• Most significantly, the distance of the source from the hydrophone

It is necessary to understand the sound pressure level and the “species hint” (based on mean frequency) in order to apply the appropriate filter

A specific filter is applied based on an industry-recognised sound pressure lookup, adjusted for distance
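The distance adjustment can be illustrated with the inverse square law that the processing flow later in the deck names for its SPL calculation. The Python sketch below assumes simple spherical spreading from a 1 m reference and ignores absorption, temperature and salinity. The function names are hypothetical, and the 175 dB source level is the porpoise figure from that flow, assumed here to apply at 1 m.

```python
import math


def received_spl_db(source_level_db, distance_m, ref_distance_m=1.0):
    """Received level after spherical (inverse square) spreading loss."""
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return source_level_db - 20.0 * math.log10(distance_m / ref_distance_m)


def distance_from_spl_m(source_level_db, received_db, ref_distance_m=1.0):
    """Invert the relation to estimate range from a measured level."""
    return ref_distance_m * 10 ** ((source_level_db - received_db) / 20.0)


# Assumed source level: 175 dB at 1 m.
for distance in (1, 5, 20):
    print(distance, "m ->", round(received_spl_db(175.0, distance), 1), "dB")
```

Under these assumptions the 5 m result is about 161 dB, which agrees with the medium level listed for porpoise in the flow. The lookup used in the project may apply further corrections at other ranges.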


Click Detection & Profiling

Moving from a species “hint” to a more robust approach requires a series of checks to confirm a click and try to identify the species*
• Comparison of the relative energy in two different frequency bands
• Peak spectral frequency in the click
• The width of the main frequency peak (based on 80% of the energy)
• The duration of the click

* Credit: Marjolaine Caillat, SMRU, 2005
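The four checks above can be sketched as measurements on one isolated click. This Python example is a rough illustration, not the cited method or the project's operators. It uses a naive DFT to stay dependency-free, the two frequency bands are caller-supplied assumptions, and "width based on 80% of the energy" is interpreted here as the narrowest run of bins grown outward from the peak that holds that fraction of the energy.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive DFT power for bins 0..n/2 (adequate for short click windows)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def profile_click(samples, sample_rate_hz, band_a_hz, band_b_hz,
                  energy_fraction=0.8):
    """Measure the four criteria; `samples` is assumed to hold one click."""
    power = power_spectrum(samples)
    n = len(samples)
    bin_hz = sample_rate_hz / n
    total = sum(power)

    def band_energy(band):
        lo_hz, hi_hz = band
        return sum(p for k, p in enumerate(power) if lo_hz <= k * bin_hz < hi_hz)

    peak_bin = max(range(len(power)), key=power.__getitem__)

    # Grow a run of bins around the peak until it holds energy_fraction
    # of the total energy, always taking the larger neighbour first.
    lo = hi = peak_bin
    captured = power[peak_bin]
    while captured < energy_fraction * total and (lo > 0 or hi < len(power) - 1):
        left = power[lo - 1] if lo > 0 else -1.0
        right = power[hi + 1] if hi < len(power) - 1 else -1.0
        if left >= right:
            lo -= 1
            captured += left
        else:
            hi += 1
            captured += right

    return {
        "band_energy_ratio": band_energy(band_a_hz) / max(band_energy(band_b_hz), 1e-12),
        "peak_frequency_hz": peak_bin * bin_hz,
        "peak_width_hz": (hi - lo + 1) * bin_hz,
        "duration_s": n / sample_rate_hz,
    }


# Synthetic 75 kHz tone sampled at 300 kHz, standing in for a click.
tone = [math.sin(2 * math.pi * 75_000 * t / 300_000) for t in range(64)]
result = profile_click(tone, 300_000, (60_000, 90_000), (100_000, 150_000))
print(result["peak_frequency_hz"], result["peak_width_hz"])
```

A species decision would then compare these four values against known characteristics per species, which this sketch leaves out.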


FFT

Most of these techniques require both the time domain and frequency domain representations of the input signal

The FFT is normally computed across a “window” of samples, returning the frequency distribution of the time-based signal

Window size is governed by the number of samples in a click, rounded up to the next power of 2

There is a computational overhead in running an FFT on each click, so minimising the window size is of benefit
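Rounding a click length up to a power of two, as radix-2 FFT implementations typically expect, can be done with integer bit arithmetic. A minimal Python sketch, with hypothetical helper names:

```python
def fft_window_size(click_samples):
    """Smallest power of two that is >= click_samples."""
    if click_samples < 1:
        raise ValueError("click_samples must be at least 1")
    return 1 << (click_samples - 1).bit_length()


def pad_to_window(samples):
    """Zero-pad a click so its length is a power of two."""
    size = fft_window_size(len(samples))
    return list(samples) + [0] * (size - len(samples))


for length in (15, 38, 64, 300, 512):
    print(length, "->", fft_window_size(length))
```

Because FFT cost grows with window length, choosing the smallest sufficient window per click keeps the per-click overhead down.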


Peak spectral frequency in the click


[Figure: end-to-end processing flow. Labels recovered from the diagram are grouped below; the exact connections between boxes are not recoverable.]

Stages: Pre-click Detector, Click Detection, Dynamic Filtering, Click Profiling and Detection – 4 Criteria

Input and species hint
• WAV Decoder
• High Pass Filter (remove < 10 kHz)
• FFT (4096 Window)
• Calc. mean frequency
• “Species Hint” and split

Porpoise, Fo = 131 kHz
• Calc Sound Pressure Level (inv. sq.)
• 175 dB: Low order BP Filter
• 161 dB: Medium order BP Filter
• 151 dB: High order BP Filter

Common Dolphin, Fo = 80 kHz
• Calc Sound Pressure Level (inv. sq.)
• 230 dB: Low order BP Filter
• 216 dB: Low order BP Filter
• 210 dB: Low order BP Filter

SPL in dB is calculated using the inverse square law: High = 1 meter, Medium = 5 meters, Low = 20 meters

Click profiling
• FFT (512 Window)
• Mean Frequency
• Band Energy
• Peak Position and Width
• Click Length
• Click Counter

Feedback
• Dynamically change click detector parameters
• Depending on average SPL, then assign click #

Low frequency path (*To Do)
• Low Pass Filter (remove > 10 kHz)
• Low Frequency Species Click Detection


Determining Counts

Animal counts are based on building click trains through correlation

Correlate from click to click based on relative energy and peak frequency

Additional rules need applying to account for “re-visits”

Credit - Josefin Starkhammar, Johan Nilsson, Mats Amundin, Tomas Jansson, Monica Almqvist, Hans W. Persson
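The click-to-click correlation described on this slide can be sketched as a greedy grouping: each new click joins the train whose most recent click is closest in peak frequency, provided the energy and timing are also compatible. The Python example below is an assumption-laden illustration, not the credited authors' method. The class names, tolerances and maximum gap are hypothetical, and the extra rules for “re-visits” are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Click:
    time_s: float
    energy: float
    peak_hz: float


@dataclass
class ClickTrain:
    clicks: list = field(default_factory=list)

    @property
    def last(self):
        return self.clicks[-1]


def build_click_trains(clicks, max_gap_s=0.5, max_freq_diff_hz=5_000.0,
                       max_energy_ratio=2.0):
    """Greedily group clicks into trains; all tolerances are hypothetical."""
    trains = []
    for click in sorted(clicks, key=lambda c: c.time_s):
        best, best_diff = None, None
        for train in trains:
            prev = train.last
            if click.time_s - prev.time_s > max_gap_s:
                continue  # train has gone quiet for too long
            freq_diff = abs(click.peak_hz - prev.peak_hz)
            ratio = (max(click.energy, prev.energy)
                     / max(min(click.energy, prev.energy), 1e-12))
            if freq_diff <= max_freq_diff_hz and ratio <= max_energy_ratio:
                if best is None or freq_diff < best_diff:
                    best, best_diff = train, freq_diff
        if best is None:
            best = ClickTrain()
            trains.append(best)
        best.clicks.append(click)
    return trains


# Two interleaved synthetic sources with different peak frequency and energy.
clicks = [Click(t, 1.0, 130_000.0) for t in (0.0, 0.1, 0.2)]
clicks += [Click(t, 5.0, 80_000.0) for t in (0.05, 0.15, 0.25)]
trains = build_click_trains(clicks)
print(len(trains), [len(t.clicks) for t in trains])
```

The number of trains active in a time span then serves as a rough animal count. A train that ends and later resumes would be counted twice here, which is the case the additional rules are meant to handle.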


Visualisation - Spectrogram

There are a number of metrics planned for the acoustic work through the SmartBay portal:
• Species breakdown
• Species count
• Distance from hydrophone

A “basic” spectrogram (sound intensity by frequency and time) is shown below

[Figure: spectrogram]


Performance

Our initial approach was to treat each audio sample (int) as a single tuple (300k per second)

This allowed us to implement all of the Streams operators quickly to validate the end-to-end processing

Recently moved to:
• An integer list of 128 samples per tuple, up to the pre-click detector
• Followed by an integer list of the total click per tuple
• And an integer list of the frequency spectrum for a complete click per tuple

Moving to multiple samples per tuple has resulted in an approx. 20x performance improvement (faster than real time)

Currently testing on a cluster of 4 x xSeries 3950 (8 GB RAM, 2 x quad core)
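The effect of batching on tuple volume can be shown with a small generator. This is a Python illustration of the idea only, not Streams code, and the function name is hypothetical.

```python
def batch_samples(samples, batch_size=128):
    """Yield lists of up to batch_size consecutive samples."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


# One second of audio at 300k samples per second.
one_second = range(300_000)
tuples_per_second = sum(1 for _ in batch_samples(one_second))
print(tuples_per_second)
```

With 128 samples per tuple, one second of audio becomes a few thousand tuples instead of 300,000, so per-tuple overhead is paid far less often.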


Next Steps

Implement development GUI to allow tuning of the detection values (by species)

Reference dB, salinity and temperature in addition to intensity to calculate distance
• Additional challenges are down to the frequency of the echolocation click as produced by each species: high frequencies don't tend to travel very far
• Variations in the echolocation click spread: the porpoise click is very narrow and directional, from the melon and out in front of the animal

Add in the ability to check for low frequency species such as Fin Whales, Humpback Whales etc.


Futures – Large Scale Ocean Energy Impact Analysis


IBM InfoSphere Streams directions

Tools
• Streams Studio enhancements
• Video/audio analytics
• Text/unstructured analytics
• Streams Processing Language v2
• Native XML support

Runtime
• High Availability
• Security enhancements
• Unicode support
• Installation enhancements

Adapters
• Cognos Now
• WebSphere MQ
• RSS feeds
• Mashup Hub
• WebSphere Business Events
• Oracle
• SQL Server

[Figure: Streams handling data in motion (millions of events per second, millisecond latency) alongside existing business information, with WebSphere Business Events, InfoSphere Warehouse, IBM MashupHub, BI, Cognos and Front Office]

All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice. Any reliance on these statements is at the relying party's sole risk and will not create any liability or obligation for IBM.


InfoSphere Streams sessions

Time | Session | Title | Location

Thursday May 20, 10:45 AM - 11:35 AM | 3666A | InfoSphere Streams for Real Time Analytics in Financial Services Industry | Marriott Park Hotel, Room 14

Friday May 21, 09:00 AM - 09:50 AM | 3661A | InfoSphere Streams helps Stockholm build Ver 2.0 Traffic Control System | Marriott Park Hotel, Room 13

Friday May 21, 11:30 AM - 12:30 PM | 3692A | InfoSphere Streams at Marine Institute of Ireland: Deep Dive | Marriott Park Hotel, IOD Mini Theatre 3

Wednesday 10AM - 6PM; Thursday 10AM - 5PM; Friday 9AM - 2PM | Demo Room | InfoSphere Streams Demonstrations | Marriott Park Hotel, IOD Demo Room Station 19

Wednesday 10:30 - 11:30; Thursday 12:30 - 13:00; Thursday 16:30 - 17:00 | Mini Theater on Expo Floor | InfoSphere Streams in Telco; InfoSphere Streams Business Insight; Leverage Warehouse, SPSS with Streams | Marriott Park Hotel, InfoSphere Mini Theater Expo Floor