62
Charlie Hull - Managing Director 20 th October 2015 Enterprise Search Europe [email protected] www.flax.co.uk/blog +44 (0) 8700 118334 Twitter: @FlaxSearch The Future of Search Fishing The Streams of Big Data

Enterprise Search Europe 2015: Fishing the big data streams - the future of search

Embed Size (px)

Citation preview

Page 1: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Charlie Hull - Managing Director20th October 2015Enterprise Search Europe

[email protected]/blog+44 (0) 8700 118334Twitter: @FlaxSearch

The Future of Search Fishing The Streams of Big Data

Page 2: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Building open source search applications since 2001

Independent, honest advice and analysis

Expert design & development, Apache Solr committers

Test-driven relevancy and performance tuning

Custom training & mentoring for your staff

Flexible support up to 24/7/365 with SLAs

Page 3: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Building open source search applications since 2001

Independent, honest advice and analysis

Expert design & development, Apache Solr committers

Test-driven relevancy and performance tuning

Custom training & mentoring for your staff

Flexible support up to 24/7/365 with SLAs

Come and join the open source search community (tonight?)

Page 4: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search
Page 5: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Fishing the streams...

10

@FlaxSearch

Page 6: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Where are we now?

Analytics based on search

Trends & predictions

Two new ideas

But what's this all got to do with search?

The future of search@FlaxSearch

Page 7: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

What sort of enterprise projects do we see at Flax?– Migrations – Greenfield – Speculative

Where are we now?@FlaxSearch

Page 8: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

What sort of enterprise projects do we see at Flax?– Migrations – Greenfield – Speculative

Unsurprisingly most are using open source

Where are we now?@FlaxSearch

Page 9: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

What sort of enterprise projects do we see at Flax?– Migrations – Greenfield – Speculative

Unsurprisingly most are using open source

….unless they're Sharepoint

Where are we now?@FlaxSearch

Page 10: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Two main stacks– Apache Lucene/Solr• Lucidworks (Fusion, SiLK)

– Elasticsearch• Elastic (ELK)

What's happening with open source?@FlaxSearch

Page 11: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Two main stacks– Apache Lucene/Solr• Lucidworks (Fusion, SiLK)

– Elasticsearch• Elastic (ELK)

Lots of other players (less for Elasticsearch)

What's happening with open source?@FlaxSearch

Page 12: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Two main stacks– Apache Lucene/Solr• Lucidworks (Fusion, SiLK)

– Elasticsearch• Elastic (ELK)

Lots of other players (less for Elasticsearch)

Focus on:– Log file analysis– Stability & scalability– Visualisation

What's happening with open source?@FlaxSearch

Page 13: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Analytics based on searchLog files, messages,transactions

@FlaxSearch

Page 14: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Analytics based on searchLog files, messages,transactions

@FlaxSearch

Page 15: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Analytics based on searchLog files, messages,transactions

@FlaxSearch

Page 16: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Analytics based on searchLog files, messages,transactions

@FlaxSearch

Page 17: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Analytics based on searchLog files, messages,transactions

@FlaxSearch

Page 18: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Analytics & visualisations

11

@FlaxSearch

Page 19: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Analytics & visualisations

11

@FlaxSearch

Page 20: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Big Data

Internet of Things

Cloud

Semantic Search

Mobile / Wearables

Trends @FlaxSearch

Page 21: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Big Data

Internet of Things

Cloud

Semantic Search

Mobile / Wearables

Trends @FlaxSearch

Page 22: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Charlie's All Purpose Big Data Graph

Data

Time

@FlaxSearch

Page 23: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Charlie's All Purpose Big Data Graph

Data

Time

@FlaxSearch

Page 24: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

“The IoT has the potential to connect 10X as many (28 billion) “things” to the Internet by 2020, ranging from bracelets to cars.” Goldman Sachs1

Scary scale@FlaxSearch

Page 25: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

“The IoT has the potential to connect 10X as many (28 billion) “things” to the Internet by 2020, ranging from bracelets to cars.” Goldman Sachs1

“IoT threatens to generate massive amounts of input data from sources that are globally distributed. Transferring the entirety of that data to a single location for processing will not be technically and economically viable” Gartner2

Scary scale@FlaxSearch

Page 26: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

“The IoT has the potential to connect 10X as many (28 billion) “things” to the Internet by 2020, ranging from bracelets to cars.” Goldman Sachs1

“IoT threatens to generate massive amounts of input data from sources that are globally distributed. Transferring the entirety of that data to a single location for processing will not be technically and economically viable” Gartner2

“Streaming analytics is anything but a sleepy, rearview mirror analysis of data. No, it is about knowing and acting on what’s happening in your business at this very moment — now.” Forrester3

Scary scale@FlaxSearch

Page 27: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

“We’re going to go from information in computers, like your supercomputer in your pocket, to us being surrounded by information. … The next information age will be an era of unbounded malignant complexity. Because there’s a lot of stuff. We have to get ready for that.”13

Autodesk research fellow Mickey McManus

Even more scary scale@FlaxSearch

Page 28: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

It will become increasingly difficult, if not impossible, to store data for later processing

Some predictions@FlaxSearch

Page 29: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

It will become increasingly difficult, if not impossible, to store data for later processing

It must therefore be processed as it appears, in real-time (Real Time Analytics)

Some predictions@FlaxSearch

Page 30: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

It will become increasingly difficult, if not impossible, to store data for later processing

It must therefore be processed as it appears, in real-time (Real Time Analytics)

Much of this data will be unstructured, noisy and badly formatted

Some predictions@FlaxSearch

Page 31: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

It will become increasingly difficult, if not impossible, to store data for later processing

It must therefore be processed as it appears, in real-time (Real Time Analytics)

Much of this data will be unstructured, noisy and badly formatted

There are many exciting applications - if this is done right

Some predictions@FlaxSearch

Page 32: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Unified Log

New ideas

4

5

@FlaxSearch

Page 33: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Unified Log

New ideas

4

5

“The Log: What every software engineer should know about real-time data's unifying abstraction” – Jay Kreps, LinkedIn (now Confluent)6

@FlaxSearch

Page 34: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Everything is a stream

More new ideas

“...think of a database as an always-growing collection of immutable facts. You can query it at some point in time — but that’s still old, imperative style thinking. A more fruitful approach is to take the streams of facts as they come in, and functionally process them in real-time”.- Martin Kleppman, LinkedIn 7

@FlaxSearch

Page 35: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Everything is a stream

More new ideas

8

“...think of a database as an always-growing collection of immutable facts. You can query it at some point in time — but that’s still old, imperative style thinking. A more fruitful approach is to take the streams of facts as they come in, and functionally process them in real-time”.- Martin Kleppman, LinkedIn 7

@FlaxSearch

Page 36: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Streaming data platforms

New technologies@FlaxSearch

Page 37: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Streaming data platforms

New technologies@FlaxSearch

Page 38: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Streaming data platforms

New technologies@FlaxSearch

Page 39: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Streaming data platforms

New technologies@FlaxSearch

Page 40: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Hadoop MapReduce is for batch processing, not stream processing

...although Storm etc. can run on HDFS

No elephant in the room?@FlaxSearch

Page 41: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

How can you currently process data in a stream?– SQL like– Regular Expressions– Machine Learning

So what's this got to do with Search? @FlaxSearch

Page 42: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

How can you currently process data in a stream?– SQL like– Regular Expressions– Machine Learning

Why not use full-text search?– Easier to create queries– Great with unstructured data– Handles noisy data

So what's this got to do with Search? @FlaxSearch

Page 43: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

How can you currently process data in a stream?– SQL like– Regular Expressions– Machine Learning

Why not use full-text search?– Easier to create queries– Great with unstructured data– Handles noisy data

But you can do search already!– But only near real time

So what's this got to do with Search? @FlaxSearch

Page 44: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

What's a stream?

1 2 3 4 5 6 7 8 9 10 11….. …..

append

– “..an append-only, totally ordered sequence of records (also called events or messages).” Martin Kleppman, LinkedIn 9

@FlaxSearch

Page 45: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

How do we search it?

1 2 3 4 5 6 7 8 9 10 11 12….. …..

append

?

Result

@FlaxSearch

Page 46: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

How do we search it?

1 2 3 4 5 6 7 8 9 10 11 12….. …..

append

?

Result

Oops!

@FlaxSearch

Page 47: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Search for Media Monitoring– Many complex stored profiles (searches)– High volume of news stories every day

Here's something we're doing already @FlaxSearch

Page 48: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Search for Media Monitoring– Many complex stored profiles (searches)– High volume of news stories every day

The solution:– Build an index of the stored profiles– Turn each news story into a query– Search the stored profiles to find which might match– Use these candidates to search the news story

Here's something we're doing already @FlaxSearch

Page 49: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Search for Media Monitoring– Many complex stored profiles (searches)– High volume of news stories every day

The solution:– Build an index of the stored profiles– Turn each news story into a query– Search the stored profiles to find which might match– Use these candidates to search the news story

We call it 'search turned upside down'– But it's effectively searching a stream

Here's something we're doing already @FlaxSearch

Page 50: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Like this...

(((";!MOBILE PHONE*"; OR ";PHONE MAST*"; OR ";HANDSET*"; OR ";CELL* PHONE*"; OR ";3G"; OR ";GPRS"; OR ";G.P.R.S"; OR ";!GENERAL !RADIO PACKET SERVICE*"; OR ";GSM"; OR ";G.S.M"; OR ";!GLOBAL SYSTEM FOR !MOBILE COMM*"; OR ";HSDPA"; OR ";H.S.D.P.A"; OR ";HIGH SPEED DOWNLINK !PACKET ACCESS"; OR ";HSUPA"; OR ";H.S.U.P.A"; OR ";HIGH SPEED !UPLINK !PACKET ACCESS"; OR ";UMTS"; OR ";U.M.T.S"; OR ";MVNO"; OR ";M.V.N.O"; OR ";SMS"; OR ";SHORT MESSAGE !SERVICE*"; OR ";MMS"; OR ";!MULTIMEDIA MESSAGE !SERVICE*"; OR ";!MOBILES"; OR ";!CELLPHONE*"; OR ";!TELECOM*"; OR ";!LANDLINE*"; OR ";!TELEPHONE*"; OR ";PHONE*"; OR ";!TELEKOM*"; OR ";TELCO*"; OR ";VODAFONE"; OR ";T-MOBILE"; OR ";TMOBILE"; OR ";!TELEFONICA"; OR ";BT"; OR ";!MOBILE USER*"; OR ";TEXT MESSAG*"; OR ";SMARTPHONE*"; OR ";!VIRGIN !MEDIA*"; OR ";CABLE & !WIRELESS"; OR ";CABLE AND !WIRELESS";) W/48 ((";PROFIT*"; OR ";LOSS*"; OR ";BAN"; OR ";BANNED"; OR ";PREMIUM RATE*"; OR ";FINANC*"; OR ";!REFINANC*"; OR ";OFFICE OF FAIR TRADING"; OR ";MERGER*"; OR ";!ACQUISIT*"; OR ";ACQUIR*"; OR ";TAKEOVER*"; OR ";BUYOUT*"; OR ";BUY-OUT*"; OR ";NEW PRODUCT*"; OR ";INVEST*"; OR ";SHARES"; OR ";MARKET*"; OR ";ACCOUNT*"; OR ";MONEY"; OR ";CASH*"; OR ";SECURIT*"; OR ";!ENTERPRIS*"; OR ";!BUSINESS*"; OR ";PRICE*"; OR ";JOINT*"; OR ";NEW VENTURE*"; OR ";PRICING"; OR ";COST*"; OR ";CHAIRM?N"; OR ";APPOINT*"; OR ";!EXECUTIVE"; OR ";SALE*"; OR ";SELL*"; OR ";FULL YEAR"; OR ";REGULAT*"; OR ";!DIRECTIVE*"; OR ";LAW"; OR ";LAWS"; OR ";!LEGISLAT*"; OR ";GREEN PAPER"; OR ";WHITE PAPER*"; OR ";!MEDIAWATCH"; OR ";MORAL*"; OR ";ETHIC*"; OR ";ADVERT*"; OR ";AD"; OR ";ADS"; OR ";MARKETING"; OR ";!COMPLAIN*"; OR ";MIS-SOLD"; OR ";MIS-SELL*"; OR ";SPONSOR"; OR ";COSTCUT*"; OR ";COST CUT*"; OR ";CUT* COST*"; OR ";FIBRE OPTIC*"; OR ";TAX"; OR ";TAXES"; OR ";TAXED"; OR ";EXPAND*"; OR ";!EXPANSION"; OR ";EMPLOY*"; OR ";STAFF"; OR ";WORKER*"; OR ";SPOKESM?N"; OR ";DEBUT"; OR ";BRAND*"; OR ";DIRECTOR*";) OR ((";FAIR"; OR ";UNFAIR"; OR ";%UNSCRUPULOUS"; OR ";NOT FAIR"; OR ";UNJUST*"; OR ";!PENALISE*";) W/12 (";CHARG*"; OR ";TARIFF*"; OR ";PRICE PLAN*"; OR ";GLOBAL";)))) AND NOT (";EXPRESS OFFER"; OR ";TIMES OFFER"; OR ";READER OFFER"; OR ((";CALLS COST";) W/6 (";FROM A LANDLINE"; OR ";FROM LANDLINE*"; OR ";BT LANDLINE*";))))

…and that's an easy one!

@FlaxSearch

Page 51: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Turning search upside down..

Docs

QueryQueryStoredQueries 1.

Pre

1 million profilesSome 250k longComplex rules

Doc 1 million news items a day

@FlaxSearch

QuerySubset

~200

Page 52: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Turning search upside down..

Docs

QueryQueryStoredQueries 1.

Pre

QuerySubset

Result

1 million profilesSome 250k longComplex rules

~200

2.Search

Doc 1 million news items a day

“this news item is of interest to my client”

@FlaxSearch

Page 53: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Turning search upside down..

Docs

QueryQueryStoredQueries 1.

Pre

QuerySubset

Result

1 million profilesSome 250k longComplex rules

~200

2.Search

Doc 1 million news items a day

“this news item is of interest to my client”

@FlaxSearch

Stream

Stream

Stream

Page 54: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Flax solution (Luwak) scales to 1m queries over 1m items/day

We need to be faster:– Network monitoring – up to 1m items/second– Restaurant reservations/reviews – 840m messages/day– IoT

So we built a proof of concept...

Searching streaming data at scale@FlaxSearch

Page 55: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Real-time scalable search for streams

1 2 3 4 5 6 7

…..

…..

A B C D E F G

….. …..

Which queries match?

N N N Y N….. …..

Documents

Queries

Parse

In-memory 1-doc index

Index of queries

Matches

https://github.com/romseygeek/samza-luwak

@FlaxSearch

Page 56: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Build queries using a standard search box, facets etc.

Set them to run on a stream of data - matches appear as another stream

What could you do with this?@FlaxSearch

Page 57: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Build queries using a standard search box, facets etc.

Set them to run on a stream of data - matches appear as another stream

Use cases– Monitor a network– Watch social media– Check IoT for faults, errors, trends– Look for patterns in customer interactions– Distributed web search– Combine with analytics & visualisations

What could you do with this?@FlaxSearch

Page 58: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Big Data....is about to get a lot bigger

In conclusion....@FlaxSearch

Page 59: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Big Data....is about to get a lot bigger

We'll need to react to it in real time

In conclusion....@FlaxSearch

Page 60: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Big Data....is about to get a lot bigger

We'll need to react to it in real time

Search can help!

In conclusion....@FlaxSearch

Page 61: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

Thankyou!

Any [email protected]/blog+44 (0) 8700 118334Twitter: @FlaxSearch

Page 62: Enterprise Search Europe 2015:  Fishing the big data streams - the future of search

1.Internet of Things - HorizonWatch 2015 Trend Report, Bill Chamberlin, IBM 2.Gartner Says the Internet of Things Will Transform the Data Center http://www.gartner.com/newsroom/id/2684915 3.Big Data Forrester Wave 2014 4.Unified Logging Layer: Turning Data into Action, Kiyoto Tamura, Fluentd 5.Unified Log Processing, Alexander Dean 6.The Log: What every software engineer should know about real-time data's unifying abstraction, Jay Kreps, LinkedIn/Confluent 7.Turning the database inside-out with Apache Samza, Martin Kleppman 8.Designing Data-Intensive Applications, Martin Kleppmann 9.Realtime Full-text Search with Luwak and Samza, Martin Kleppmann 10.Copyright Richard Masoner under a Creative Commons license

11. 12. Demonstrations by Mark Harwood of Elastic ()13. Trillions: Thriving in the Emerging Information Ecology (Wiley 2012), Mickey McManus

References