A Better Rich Media Experience & Video Analytics at Arkena with Apache Hadoop
Welcome to today’s webinar. We will begin shortly.
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Today’s Presenters
Reda Benzair, VP Technical Development, Arkena
John Kreisa, VP International Marketing, Hortonworks
Feb 2016
AGENDA
1. Who we are: CDN / OTT business
2. Why media experience & video analytics are so important for a CDN
3. Video analytics challenges & difficulties
4. Why we chose Hadoop technology
5. Architecture & results
6. Why we selected Hortonworks
WHO WE ARE: YOUR TRUSTED MEDIA PARTNER
At a glance
A TDF Group business unit
• CDN with 16 PoPs
• 1 Tbps connectivity
• 400 live radios & 360 live TVs
• 630 hours of on-demand video processed daily
• A team of 400 employees
• 13 offices in 9 countries: United Kingdom, Norway, USA, Finland, Denmark, Poland, France, Spain, Sweden
CUSTOMERS: REFERENCES
MEDIA COMPANIES & OPERATORS: SOLUTIONS AND SERVICES
• Cloud4Media: a SaaS/PaaS service that provides all the necessary tools for managing and exchanging media assets.
• Playout: optimized for the audiovisual industry, designed to distribute your live and on-demand content.
• OTT / CDN: a solution with modular components that enables content owners, telecom operators and broadcasters to provide video content to viewers worldwide.
• Video Platform: provides enterprises, organizations and the public sector with an all-in-one tool to publish, manage and distribute video, live or on demand, to every device.
• Play: part of the Arkena Video Platform; it handles video playback. Learn how to customize PLAY to fit your needs.
• Mobile Publisher: an add-on to the Arkena Video Platform that lets you publish live broadcasts and on-demand videos directly from your iPhone.
CDN PLATFORM (CONTENT DELIVERY NETWORK)
ARKENA OTT / CDN: A UNIQUE EUROPEAN PRESENCE, ESPECIALLY FRANCE AND NORDICS
We offer advanced CDN solutions for media:
• Live & on-demand streaming
• First-class origin / transmuxing service
• Ad insertion (audio: Triton, Radionomy, Adswizz)
• Timeshifting and catch-up services
Arkena will offer a set of new media analytics services:
• Real-time analytics
• Advanced media analytics
OTT SOLUTIONS (OVER-THE-TOP)
Content management & animation
• Metadata and catalog organization
• Offer scheduling and promotions
• Subscription, rental and purchase models
• Automatic sorts and optimized API for OTT apps display
User accounts management
• User ownership tracking
• DRM entitlements
• Device pairing and restrictions
• Multiscreen favorites and resume
Content processing & protection
• Adaptive streaming and download support
• Multiple audio tracks and subtitles support
• Smooth Streaming with PlayReady and DASH with Marlin
• Geoblocking and streaming limits
Arkena OTT / CDN Analytics Challenge
“Infrastructure capable of handling Millions of simultaneous connections/requests”.
CDN Architecture
Media-specialized CDN with a strong presence in Europe, especially France and the Nordics: a CDN dedicated to audiovisual media streaming.
• Video and audio delivery, live and on-demand services
• Multiscreen workflow expertise and broadcast / IP convergence
We deliver your content with optimal performance on all devices.
[Diagram labels: Network; Origin / Transmux Service; three PoPs, each with a Caching Server and an IP Regional Network; Cluster; Log traffic]
• More than 300 CDN customers in Europe
• 16 European PoPs, local to end users
• Capacity: 1 Tbps, very high storage capacity (~PB)
• More than 1000 streaming servers
Why Media Experience & Video Analytics Are So Important
• Customer trust
• Real-time analytics, advanced metrics
• Advanced media analytics to monetize your audience
• Billing & payment: reporting, billing
Why We Need an Efficient Analytics System
Video Analytics Challenge: data overload every second
• Daily raw log size (uncompressed, no replication): 20 GB to 200 GB per day
• Average raw log data input rate: 20 Mbps to 120 Mbps
• Peak rate of 60K events/second
• Raw logs are kept for 3 to 9 months
• Real-time statistics should be provided within 3 minutes
• We compute 15 metrics at every batch: volume, hits, session duration, concurrent sessions, unique viewers...
• All metrics are available over 15 dimensions: country, city, user agent, browser, HTTP status code...
• One CDN "edge" server generates an average of 15 to 22 million lines/day
• DASH adaptive bitrate streaming: one movie (HD, 1 hour) in DASH format with 8 video tracks and 1 audio track generates 4200 log events
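To give a feel for what these figures imply, here is a rough back-of-the-envelope sketch. It only multiplies numbers quoted on the slide; the 30-day month and the helper names are assumptions made for the illustration, and compression and HDFS replication are deliberately left out (the `replication` parameter defaults to 1, matching the "no replication" wording).

```python
SECONDS_PER_DAY = 86_400


def retention_storage_tb(daily_gb: float, retention_days: int, replication: int = 1) -> float:
    """Storage (TB, with 1 TB = 1000 GB) needed to keep `retention_days` of raw logs."""
    return daily_gb * retention_days * replication / 1000.0


def lines_per_second(lines_per_day: float) -> float:
    """Average line rate of one edge server, spread evenly over a day."""
    return lines_per_day / SECONDS_PER_DAY


# Slide figures: 20 to 200 GB/day, kept for 3 to 9 months.
# Assumption: one month is counted as 30 days.
low_tb = retention_storage_tb(20, 3 * 30)
high_tb = retention_storage_tb(200, 9 * 30)
edge_rate = lines_per_second(22_000_000)

print(f"raw logs to retain: {low_tb:.1f} TB to {high_tb:.1f} TB")
print(f"busiest edge server: about {edge_rate:.0f} lines/second on average")
```

Under these assumptions the retained raw logs range from about 1.8 TB to 54 TB before compression or replication.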
Video Analytics Challenge: Difficulties & Challenges
• TRANSPORT: Safely transport the data in real time from the different PoPs to the data cluster.
• OPERATION: Make life easy for the operations team.
• STORE DATA: Store the data safely over a long period. Compute the metrics in real time. Consolidate in batch.
• COMPUTE DATA IN REAL TIME: Compute the analytics metrics in real time.
Video Analytics Story
Timeline: 2012, 2013, 2014, 2015 (V1). Phases: home-made, open source, POC.
• Arkena Analytics was built and developed in house.
• A major problem occurred in production, causing significant downtime.
• Market analysis: make or buy.
• Project launched with the partners; team built (1 project manager, 1 developer, 1 system engineer).
• The analytics platform was released to the operations team and the service was opened to customers.
Video Analytics Challenge: Hadoop Technology
TRANSPORT: Safe Transport
Flume
• Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the HDP cluster.
• Flume is already integrated in HDP: YARN coordinates data ingest from Apache Flume and other services that deliver raw data into an HDP cluster.
Rsyslog
• Rsyslog is "the rocket-fast system for log processing". It offers high performance, strong security features and a modular design to transport data from our edge servers.
• We use RELP (the Reliable Event Logging Protocol) to provide reliable delivery of event messages.
STORE DATA: Stability & Trust
Shared data set
• In-house solution: can't query the whole data set.
• HDP: single entry point from HDFS; can query and cross-correlate everything from the beginning of time (almost).
Opportunities
• In-house solution: rigid, a nightmare for the operational teams.
• HDP: gives us new opportunities (machine learning, new metrics, …).
Hortonworks Data Platform
• In-house solution: add clusters to scale out (we had 3!).
• HDP: add nodes to scale out (storage + compute).
OPERATION: Reliability & Scalability
YARN
• View your cluster as a single data operating system
• Run multiple jobs on multiple processing engines
• High availability with a standby ResourceManager
• Easy scale-out by adding more YARN NodeManagers
Queue management
• Make sure business-critical jobs never lack resources
• Separate operation tasks from business tasks
• Validate new job versions with no production impact
OPERATION: Compute in Real Time
HDP stack
• HDP packages and incorporates the most recent Hadoop software technology in the same stack (Spark, Hive, Tez, …).
• The HDP YARN-based architecture provides the foundation that enables Spark and other applications to share a common cluster and dataset while ensuring consistent levels of service and response.
Spark Streaming
• Apache Spark is a fast, in-memory data processing engine.
• We process the data every 2 minutes.
Lambda architecture
• Real-time processing: Spark Streaming.
• Synchronize the data to HDFS.
• Consolidate the data with Hive/Tez.
• Ingest into Elasticsearch.
[Diagram labels: Events, Near Real Time, Store, Batch]
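The Lambda flow on this slide can be illustrated with a small sketch. It is plain Python, not Spark, Hive or Elasticsearch code: the in-memory lists and dictionaries stand in for HDFS and the serving index, and the event fields (`ts`, `country`, `bytes`) and class names are assumptions made for the example. Only the 2-minute interval comes from the slide.

```python
from collections import defaultdict

WINDOW_SECONDS = 120  # the slide mentions a 2-minute processing interval


def window_start(ts: int) -> int:
    """Round a Unix timestamp down to the start of its 2-minute window."""
    return ts - ts % WINDOW_SECONDS


def aggregate(events):
    """Sum hits and volume (bytes) per (window, country)."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for ev in events:
        key = (window_start(ev["ts"]), ev["country"])
        totals[key]["hits"] += 1
        totals[key]["bytes"] += ev["bytes"]
    return dict(totals)


class LambdaPipeline:
    def __init__(self):
        self.raw = []          # stand-in for raw logs stored in HDFS
        self.speed_view = {}   # near-real-time results (streaming layer)
        self.batch_view = {}   # consolidated results (batch layer)

    def on_micro_batch(self, events):
        """Streaming layer: store the raw events and update the fast view."""
        self.raw.extend(events)
        for key, metrics in aggregate(events).items():
            current = self.speed_view.setdefault(key, {"hits": 0, "bytes": 0})
            current["hits"] += metrics["hits"]
            current["bytes"] += metrics["bytes"]

    def consolidate(self):
        """Batch layer: recompute everything from raw data, then reset the fast view."""
        self.batch_view = aggregate(self.raw)
        self.speed_view = {}

    def serving_view(self):
        """What a front-end would query: batch results win over streaming ones."""
        return {**self.speed_view, **self.batch_view}


pipeline = LambdaPipeline()
pipeline.on_micro_batch([
    {"ts": 0, "country": "FR", "bytes": 100},
    {"ts": 60, "country": "FR", "bytes": 50},
    {"ts": 130, "country": "SE", "bytes": 10},
])
pipeline.on_micro_batch([{"ts": 119, "country": "FR", "bytes": 1}])
before = pipeline.serving_view()
pipeline.consolidate()
after = pipeline.serving_view()
```

The point of the design is that the fast view is disposable: the batch pass can always rebuild the metrics from the raw logs, so a bug or an outage in the streaming job does not lose data.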
OPERATION: Reduce Day-to-Day Operational Cost
Easy to use for the long run
• Easy setup and installation
• Machine provisioning and capacity planning
• Easier provisioning and faster cluster deployment
Ambari
• Expand clusters automatically as new nodes come online
• Track cluster health, job progress and KPIs with alerts, customizable views, customizable dashboards...
• REST API making deployment & configuration easy to automate with modern configuration management tools (Ansible)
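As a hint of what automating against the Ambari REST API can look like, the sketch below builds (but does not send) a request that lists the hosts of a cluster. The server address `ambari.example:8080`, the cluster name `analytics` and the credentials are placeholders invented for the example; the slides do not show how Arkena actually calls the API.

```python
import base64
import urllib.request


def ambari_hosts_request(base_url: str, cluster: str, user: str, password: str):
    """Build a GET request for the hosts resource of one Ambari-managed cluster."""
    url = f"{base_url}/api/v1/clusters/{cluster}/hosts"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")  # HTTP basic auth
    return request


# Hypothetical values, for illustration only.
req = ambari_hosts_request("http://ambari.example:8080", "analytics", "admin", "secret")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req), against a real Ambari server.
```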
ARKENA CDN: Analytics Cluster
[Diagram stages: Transport, HDP Compute, Indexing, Customer Front-End]
ARKENA CDN: HDP Cluster
[Diagram, steps numbered 1 to 5: Transport, Multiple Processing (Live Processing, Batch Processing), Archiving, Operations]
ARKENA CDN: Hardware Cluster
Requirements: a peak rate of 60K events/second; raw logs kept for 3 to 9 months.
• HDP cluster: 8 machines
• Elasticsearch cluster: 5 machines
• API cluster: 6 VMs
HDP compute cluster: we chose Dell R730 servers, configured with 16 cores, 128 GB of RAM and 14 disks of 1 TB each. We tried to follow the Hadoop rule of thumb (1 disk -> 8 GB RAM -> 1 physical core) in order to optimize I/O performance, with 10 file channels per machine, and we kept 2 disks for the system.
Elasticsearch cluster: we chose 5 M610 machines in order to have an odd number for redundancy and failover.
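The sizing described above can be checked with a few lines of arithmetic. This sketch assumes that all 8 HDP machines share the R730 configuration quoted on the slide and that every non-system disk holds data; the class and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    cores: int
    ram_gb: int
    disks: int
    disk_tb: float
    system_disks: int

    @property
    def data_disks(self) -> int:
        """Disks left for data once the system disks are set aside."""
        return self.disks - self.system_disks

    @property
    def ram_gb_per_core(self) -> float:
        return self.ram_gb / self.cores

    @property
    def data_capacity_tb(self) -> float:
        return self.data_disks * self.disk_tb


# Figures from the slide: 16 cores, 128 GB RAM, 14 disks of 1 TB, 2 system disks.
r730 = NodeSpec(cores=16, ram_gb=128, disks=14, disk_tb=1.0, system_disks=2)
HDP_NODES = 8  # assumption: all 8 HDP machines use this configuration

cluster_raw_tb = HDP_NODES * r730.data_capacity_tb
print(f"RAM per core: {r730.ram_gb_per_core:.0f} GB")
print(f"data disks per node: {r730.data_disks}")
print(f"raw data capacity of the cluster: {cluster_raw_tb:.0f} TB")
```

With these inputs the RAM-per-core ratio is exactly the 8 GB of the rule of thumb, while the disk count (12 data disks for 16 cores) is slightly below one disk per core.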
ARKENA CDN: Transport
Transport technology
• It’s not just how quickly you move data, but how safely you move it from the edge to the cluster without losing any lines.
• How to get a resilient solution: combine two pieces of software.
From edge to cluster
• Rsyslog transports the logs from the edge servers to the log aggregator component.
• Rsyslog features we rely on: the RELP protocol, native availability on Linux, disk-assisted queue buffering.
Ingest to HDFS
• Apache Flume is used to fetch the logs from Rsyslog and push them to HDFS.
ARKENA CDN: Transport
The log aggregator (with Rsyslog)
• Logs are sent from the edge servers to log aggregators. There is one aggregator per PoP.
• The log aggregator is responsible for reliably forwarding the logs to the compute cluster.
• If the compute cluster is unavailable or there is a network issue, the logs are spooled on disk and stay on the aggregators until the compute cluster comes back online.
• Log aggregators are not specific to any PoP: we could reproduce this setup on any PoP (designated here as "PoPx" or "PoPy") just by deploying generic log aggregators.
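The spool-on-failure behaviour described above can be sketched in a few lines. This is a toy model in Python, not Rsyslog or RELP: the `send` callable, the spool file path and the class name are assumptions made for the illustration, and a real disk-assisted queue also handles crashes, size limits and acknowledgements.

```python
import os
import tempfile


class SpoolingForwarder:
    """Forward log lines; spool them to disk while the destination is down."""

    def __init__(self, send, spool_path):
        self.send = send              # callable(line); raises ConnectionError on failure
        self.spool_path = spool_path

    def _spooled(self):
        if not os.path.exists(self.spool_path):
            return []
        with open(self.spool_path, encoding="utf-8") as f:
            return f.read().splitlines()

    def flush(self) -> bool:
        """Deliver spooled lines in order; return True once the spool is empty."""
        pending = self._spooled()
        sent = 0
        for line in pending:
            try:
                self.send(line)
            except ConnectionError:
                break
            sent += 1
        with open(self.spool_path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in pending[sent:])
        return sent == len(pending)

    def forward(self, line: str) -> None:
        # Drain older lines first so that ordering is preserved.
        if self.flush():
            try:
                self.send(line)
                return
            except ConnectionError:
                pass
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(line + "\n")


# Simulated compute cluster that is down at first, then comes back.
received = []
cluster_up = {"ok": False}


def send_to_cluster(line):
    if not cluster_up["ok"]:
        raise ConnectionError("compute cluster unavailable")
    received.append(line)


spool = os.path.join(tempfile.mkdtemp(), "spool.log")
forwarder = SpoolingForwarder(send_to_cluster, spool)
forwarder.forward("line 1")
forwarder.forward("line 2")
cluster_up["ok"] = True
forwarder.forward("line 3")
```

Nothing is dropped while the cluster is down, and the backlog is replayed in order before any new line is sent.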
ARKENA CDN: HDFS
Ingest into HDFS
• The logs are ingested into HDFS once the local Rsyslog on each Hadoop node receives an event.
• Apache Flume is used to fetch the logs from Rsyslog and push them to HDFS.
• The local Rsyslog forwards each event to the local Flume agent (TCP connection to `localhost`).
• The Flume agent then sends the logs to HDFS, buffering them on disk for durability.
ARKENA CDN: HDFS
Ingest into HDFS
• A SyslogSource, listening on a TCP socket, receives the incoming Rsyslog events.
• A "FileChannel" receives the incoming events from the Rsyslog TCP source and writes them locally to 10 different "datadirs" on 10 separate physical hard disk drives.
• Each datadir acts as a FIFO. Load is balanced evenly from the single Rsyslog TCP source to the 10 datadirs.
• The "FileChannel" is plugged into 4 "HDFS sinks". When enough events have been buffered in the channel, those events are sent to the 4 HDFS sinks in an evenly balanced fashion.
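The fan-out described above (one source, 10 FIFO datadirs, 4 sinks) can be modelled with a short sketch. It is a toy simulation in Python, not Flume's implementation or configuration: round-robin balancing, the `batch_size` threshold and the class name are assumptions standing in for "evenly balanced" and "when enough events have been buffered".

```python
from collections import deque
from itertools import cycle


class ToyFileChannel:
    def __init__(self, n_datadirs=10, n_sinks=4, batch_size=40):
        self.datadirs = [deque() for _ in range(n_datadirs)]  # one FIFO per disk
        self.sinks = [[] for _ in range(n_sinks)]             # events handed to each sink
        self._next_dir = cycle(range(n_datadirs))
        self._next_sink = cycle(range(n_sinks))
        self.batch_size = batch_size

    def buffered(self) -> int:
        return sum(len(d) for d in self.datadirs)

    def put(self, event) -> None:
        """Called by the source: spread events evenly over the datadirs."""
        self.datadirs[next(self._next_dir)].append(event)
        if self.buffered() >= self.batch_size:
            self.drain()

    def drain(self) -> None:
        """Hand buffered events to the sinks, oldest first, in turn."""
        while self.buffered():
            for datadir in self.datadirs:
                if datadir:
                    self.sinks[next(self._next_sink)].append(datadir.popleft())


channel = ToyFileChannel()
for i in range(20):
    channel.put(f"event-{i}")
per_dir_after_20 = [len(d) for d in channel.datadirs]
for i in range(20, 40):
    channel.put(f"event-{i}")
per_sink_after_40 = [len(s) for s in channel.sinks]
```

Spreading writes over separate physical disks is what lets the buffer keep up with the input rate, and several sinks let the HDFS writes proceed in parallel.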
ARKENA CDN: Customer Front-End
ARKENA CDN: Spark Streaming
[Diagram labels: Events, Near Real Time, Store, Batch; steps numbered 1 to 3]
Arkena chose Hadoop (Hortonworks)
Why we selected Hortonworks
Avoid vendor lock-in
• Hortonworks Data Platform is as close to the open source trunk as possible and is developed 100% in the open, so you are never locked in.
• It presents a single, tested and completely open Hadoop platform with no proprietary bolt-ons.
Transparent pricing model & unlimited support throughout our projects
“Hortonworks loves and lives open source innovation”, and Arkena does as well!
Why we selected Hortonworks
Connect with the community
• Hortonworks employs a large number of Apache project committers & innovators, so customers are represented in the open source community.
• Only Hortonworks can deliver the deepest level of support across all the components of the Hadoop platform.
Support from the experts
• They provide the highest quality of support for deploying at scale.
What Happened after the Release
We identified some improvement items after the production release.
Areas: Transport, Operation
About the Team
Reda Benzair
Project role: architect & project management
Work experience: Executive MBA, graduate of an engineering school, and Master of Advanced Studies (DEA). 15 years of experience at SmartJog SAS (which became Arkena in 2013), a TDF subsidiary. VP Technical Development since 2013, leading the technical development team located in Paris, Stockholm and Warsaw.
Erwan Queffelec
Project role: senior software engineer, Spark, system
Work experience: A passionate programmer with a strong interest in DevOps and software craftsmanship, Erwan has been working on complex distributed architectures during the last 10 years. He joined Arkena as a general-purpose analytics engineer, worked on the Hadoop data processing pipeline, developed a decent chunk
Julien Girardin
Project role: senior system administrator and Python developer
Work experience: Passionate about Linux systems, with a strong interest in DevOps and Python development. Strong experience with complex distributed architectures.
About Hortonworks
• ONLY 100% open source Apache Hadoop data platform
• Founded in 2011
• 1st Hadoop provider to go public: IPO 4Q14 (NASDAQ: HDP)
• ~850 subscription customers
• 800+ employees across 16 countries
• 1,600+ technology partners
• Fastest enterprise software company to reach $100 million in annual revenue (Barclays research, 2015)
[Figure: Hadoop use cases. Social Mapping, Payment Tracking, Factory Yields, Defect Detection, Call Analysis, Machine Data, Product Design, M & A Due Diligence, Next Product Recs, Cyber Security, Risk Modeling, Ad Placement, Proactive Repair, Disaster Mitigation, Investment Planning, Inventory Predictions, Customer Support, Sentiment Analysis, Supply Chain, Basket Analysis, Segments, Cross-Sell, Customer Retention, Vendor Scorecards, Optimize Inventories, OPEX Reduction, Mainframe Offloads, Historical Records, Data as a Service, Public Data Capture, Fraud Prevention, Device Data Ingest, Rapid Reporting, Digital Protection]
Hadoop Summit 2016 - Dublin
Date: Wednesday 13 – Thursday 14 April, 2016
Venue: Convention Centre Dublin
Website: www.hadoopsummit.org
Why Should You Attend?
• Hadoop Summit is Europe’s premier industry event for Apache Hadoop users, developers and vendors
• Two full days of practical and cutting-edge education designed by the community, for the community
• Over 90 sessions spanning 7 tracks dedicated to enabling the next generation data platform
• A Community Showcase featuring the industry’s who’s who
• Crash courses for those just beginning with Hadoop
• Community-driven meetups
• Birds of a Feather (BoFs) meetings to promote collaboration
• Comprehensive pre-event hands-on classroom training
• A social program which provides ample opportunity to network and make new industry connections
• An amazing event party at the Guinness Storehouse Brewery
Plus much, much more!
Register now to take advantage of our Early Bird rates!
Questions & Next Steps
• Attend our next webinar
• Download the sandbox
• Try Hortonworks Data Flow
Thank you for your attention!