Apache Hadoop Innovation Summit

Don’t Be Afraid of the Elephant in the Room

February 12 & 13, 2015 | Westin San Diego, San Diego, CA

#Hadoop15


Confirmed Speakers


• Enterprise Engineer, Google
• Big Data Engineer, Groupon
• Senior Director, Data Solutions, The New York Times
• Director, Consumer Science Engineering, Netflix
• Lead Research Scientist, eBay
• Director, Data Engineering, Wikia
• Data Scientist, Live Nation
• Software Architect, AOL
• Manager, Business Analytics, LinkedIn
• Enterprise Architect, Art.com
• Director, Big Data, Sears
• Data Informatics Leader, GE
• Engineering Lead, Twitter
• Engineering Manager, Etsy
• Senior Director, Data Management, Time Warner Cable
• Principal Architect, Schneider Electric
• Data Architect, Simmons Prepared Foods
• Vice President, Data Platforms, ESPN
• Architect, Salesforce.com

Past Delegates include
• Director, Analytics - Facebook
• Director, Insight - Red Bull
• Vice President - Google
• Senior Director - Coca-Cola
• Data Engineer - Blizzard Entertainment
• Senior Vice President - Samsung

Who Will You Meet?

There is no question that IE. provides the gold standard events in the industry and will connect you with decision makers within the analytics industry. You will be meeting senior-level executives from major corporations and innovative small to medium-sized companies.

Job Title Of Attendees
Chart categories: President/Principal, SVP/VP, C-Level, Snr. Director/Director, Global Head/Head, Snr. Manager/Manager, Academic (1%)
Chart values (category mapping not recoverable): 3%, 21%, 12%, 42%, 13%, 8%
78% of attendees are at Director level or above

Company Size Of Attendees
Chart categories: 1000+ Employees, 300-999 Employees, 50-299 Employees, Less than 49 Employees
Chart values (category mapping not recoverable): 8%, 11%, 25%, 56%
81% of attendees are from companies with at least 300 employees

About The Summit

In the cutting-edge market of Big Data, modern businesses are faced with the challenge of storage, management, analysis, visualization and security. New technologies, solutions and challenges are exploding outwards as Big Data continues to grow exponentially.

Hadoop, a huge piece of the puzzle, continues to present both exciting opportunities and engineering challenges. Can you become cloud native? What new alternative paradigms are available with Hadoop? What are the limitations of sole Hadoop use? How can you use it for machine learning? What about integration? Corporate accessibility? Ethics? These burning issues are what the summit looks to address.

The Apache Hadoop Innovation Summit is an industry-led event. In principle, this means that attendees are working in engineering, architectural and data science roles. In practice, this means fewer sales pitches and more in-depth discussion on what like-minded professionals are doing with their Big Data.

Confirmed Speaker Information

Sriram is an Engineering Manager on the Data Platform team at Twitter, where he leads a fantastic group of engineers building core big data processing frameworks such as Summingbird, Scalding, Spark, and Parquet. Prior to that, he was the tech lead of the Big Data Platform team at Netflix, where he built and open sourced Genie, which is Netflix’s Hadoop Platform as a Service. Sriram has a Ph.D. in Computer Science from Indiana University, and spent several years at the San Diego Supercomputer Center working on advanced cyberinfrastructures for science and engineering applications.

Data Platform at Twitter - Enabling Real-time & Batch Analytics at Scale

The data platform at Twitter supports engineers and data scientists running batch jobs on Hadoop clusters of several thousand nodes, and real-time jobs on top of systems such as Storm. In this presentation, I will discuss the overall data platform stack at Twitter. In particular, I will talk about Scalding, which is a Scala DSL for batch jobs using MapReduce; Summingbird, which is a framework for combined real-time and batch processing; and Tsar, which is a framework for real-time time-series aggregations. I will also discuss our experience with Spark, and where it fits in the overall ecosystem.

Sriram Krishnan | Big Data, Cloud, Distributed Systems Engineering Leader | Twitter

Gopal Krishnan is Director of Consumer Science Engineering at Netflix. He leads many aspects of the A/B testing innovation that helps personalize and improve the Netflix experience. Previously, he spent over a decade at Yahoo working on high-scale infrastructure, including building the first global Yahoo homepage.

Netflix is renowned for its use of big data to improve personalization for our members. Previously, our personalization depended only on explicit user inputs like star ratings, taste preferences, plays, etc. We recently incorporated additional implicit user signals such as on-device interactions like scrolling, navigation, and idle time. This session will focus on the challenges of using these new high-volume data sources with billions of events/day. What are the challenges of maintaining data quality across hundreds of device types? How do we scale efficient nearline systems to serve this data for algorithmic consumption close to real time?

Gopal Krishnan | Director, Consumer Science Engineering | Netflix

Arek Kaczmarek is responsible for the company's data platform and the implementation of a new data platform based on Big Data technologies. He previously worked at Intel as a Senior Big Data Solutions Architect in the Data Center Group. His skills include, among others, knowledge of the Big Data ecosystem, Hadoop/Hive/Pig, NoSQL, ELK (ElasticSearch/Logstash/Kibana), Lambda architecture, Oracle, data warehousing, ETL, BI analytics, systems architecture, PaaS and the cloud.

The Next Enterprise Data Warehouse is a Hadoop Data Lake

As the data volumes and data generation velocity start growing, so does the value of all the enterprise data being generated.   At the New York Times, we have moved away from the traditional Enterprise Data Warehouse based on dimensional modeling and created a data lake where the time to market for data solutions and applications is much faster and much more robust than it ever was before.  This presentation will provide an overview of the data lake approach, how to get there, and why it makes sense for companies with growing data volumes.  The discussion will focus on cost, architecture, and time to market solutions.  

Arek Kaczmarek | Senior Director, Platform & Data Solutions | The New York Times

Thanigai is an enterprise architect, technologist and innovator with over 14 years of progressive experience specializing in building large, highly scalable software systems. At Art.com, Thanigai is the lead architect responsible for defining and driving the technology roadmap initiatives for building the next generation technology vision and platform for the company. Thanigai’s interests and specialties include Hadoop/Big Data, NoSQL, Distributed Systems, Enterprise Architecture, Scalability, etc. Prior to joining Art.com, Thanigai worked in engineering roles at Sanmina and Flextronics.

Leveraging Hadoop in Polyglot Architectures

At Art.com, we have a heterogeneous web stack (Java, Node.js and .NET) to support our global brands and multiple websites. In this session, I will share our experience in leveraging the power of Hadoop to reach multiple business goals. The talk will also focus on the tools that help in addressing concerns related to polyglot architectures such as interoperability, multi-tenancy, schema evolution and standardization. I will also talk about some frameworks and packages that help in codifying best patterns and practices in integrating Hadoop with other systems such as traditional Business Intelligence systems, Web Analytics and other distributed computing technologies like Apache Spark.

Thanigai Vellore | Enterprise Architect | Art.com

Mike Lurye is Senior Director, Enterprise Data Management for Time Warner Cable. He and his team are responsible for shared data warehousing assets and functions that benefit multiple Business Intelligence (BI) teams and their customers. This includes creation of enterprise data assets, BI architecture, quality assurance, and data quality management. In addition, Mike and his team are responsible for evaluation and adoption of Big Data technologies.  Prior to joining TWC Mike held Product Management and Product Marketing positions with Amdocs, focused on decision automation, mobile content and personalization solutions. Mike’s prior experience includes senior roles at major analytical CRM & marketing services companies.

Offloading ELT Workloads to Hadoop - Time Warner Cable’s Journey

Shifting ELT workloads from the enterprise data warehouse (EDW) to Hadoop is gaining traction for reducing costs, incorporating new data faster, and freeing up EDW capacity for user-facing analytics and BI workloads. But, where do you start and what’s the best approach? This presentation outlines the framework and processes that Time Warner Cable used to:
• Evaluate potential use cases and architectural options for Hadoop
• Identify ELT offload as the first focus area
• Choose technology components for the next generation enterprise data integration solution
• Apply best practices to configure Hadoop environment for data integration

Michael Lurye | Senior Director, Enterprise Data Management | Time Warner Cable

Weidong Zhang earned his Ph.D. in computational fluid dynamics. He has a natural passion for analytics, research and data-driven decision-making. He has spent 10+ years in the data warehouse field and combines his business intelligence knowledge with Hadoop's massive data processing capability to address business needs. He currently works as a manager on the Data Analytics Infrastructure team at LinkedIn and leads the marketing and customer service data warehouse vertical.

Releasing the Power of Hadoop

As a data-driven company, LinkedIn has very strong analytical teams and many data engineers, data scientists, business analysts and business users who focus on different domains and businesses of the company. These users have different usage types and needs. Making them more productive and efficient is key to the company's success. This talk covers the ecosystem our Data Analytics Infrastructure (DAI) team built, which releases the power of Hadoop and makes it easy to use. This ecosystem contains several open-sourced products, such as Pinot, Cubert, and Gobblin (for fast computation and real-time reporting support), and some tools for automatic report generation. I will also cover the roles of our data warehouse team and our mission.

Weidong Zhang | Manager, Business Analytics | LinkedIn

Nazli Dereli is currently a data scientist on Live Nation's Userscoring team. She works on real-time classification of users and detection of abusive actors who stop users from buying tickets by holding the tickets. Before joining Live Nation, she worked in the Data Mining and Bioinformatics Lab at the University of California, Santa Barbara, focusing on mining brain activity networks to discover insights on human learning. Her interests include social and biological network analysis, and interesting problems in data and graph mining.

Detecting Abusive Actors in Hadoop Ecosystem

Live Nation is the global event ticketing leader with 400,000,000 tickets sold and 180,000 events ticketed in 19 countries. However, there is always the threat of a growing multibillion-dollar secondary market that intends to prevent users from buying primary tickets. This talk will explain how to detect such abusive actors in the Hadoop ecosystem using different approaches from offline, semi-online and online learning. We will go over the process of building our system, starting with different Hadoop-based approaches and leading to our final decision to use Apache Storm for real-time classification built on top of the Hadoop ecosystem.

Nazli Dereli | Data Scientist | Live Nation

Beena is the Data Science Informatics Leader at GE. She leads the data efforts to support data science at GE.  She works across the GE businesses to drive advanced analytics development leveraging big data technologies. She is passionate about data and analytics to aid cross functional teams to derive data insights, aid teams in articulating questions they did not know they had and help view data in more effective ways.   Beena has over 20 years’ experience in the data arena with a number of international organizations including   British Telecom, E*trade and Thomson Reuters. She holds a Masters in Computer Science and MBA in Finance.

Making Hadoop Relevant for the Industrial Internet

Data management and advanced analytics are core to GE’s recent success in delivering superior software-based services to customers across aviation, power generation, oil & gas, healthcare, and transportation. The torrent of data generated from machines, networks, devices and data centers in industry verticals provides challenges and opportunities. The challenge is to make this machine data meaningful and actionable to deliver on opportunities around operational efficiencies. I will share real-world case studies, leveraging Hadoop to demonstrate tangible operational benefits, ranging from fuel savings to improving productivity to reducing unscheduled maintenance to enhancing on-time performance, by tightly integrating machines, networked sensors, industrial-strength data, and software to enable intelligent insights and affect measurable outcomes.

Beena Ammanath | Data Informatics Leader | GE

Ranjan Sinha is a Lead Data Scientist at eBay Inc., where he has led projects that significantly enhanced consumers’ shopping experiences. Previously, Dr. Sinha was a research academic at the University of Melbourne and holds a Ph.D. in Computer Science from RMIT University, Australia. He has over 25 publications in top-tier venues such as IEEE Big Data, VLDB Journal, and ACM SIGMOD. He was awarded the Sort Benchmark medals for JouleSort and PennySort and was amongst WSJ’s Top-12 Asia-Pacific Young Inventors. He is a regular speaker on Big Data and Data Science and co-organizes the popular Bay Area Search Meetup.

Ranjan Sinha | Lead Research Scientist | eBay, Inc.

Ameya is a lead engineer on Groupon’s deal relevance and personalization system, working on big data technologies such as Hadoop and HBase. Earlier, he built a scalable message bus system that now powers Groupon's global service-oriented architecture, handling hundreds of millions of messages. Before Groupon, he was a Sr. Software Engineer at LiveOps working with distributed systems. Ameya holds a master's in Information Systems from Carnegie Mellon University and a master's in Computer Science from Pune University.

Ameya Kantikar | Big Data Infrastructure Engineer | Groupon

As an engineering leader, Ben is as comfortable with strategy documents and presentations as he is deep in the code. He uses his understanding of the bigger picture to make the best tactical choices for his team in an agile environment. Ben's specialties include: technical writing, Hadoop, SaaS applications, big data, parallel algorithms, distributed computing, and high performance computing.

Ben Jackson | Software Engineer | AOL

Valentino is a Solutions Architect with Google Cloud Platform, helping companies accelerate innovation. Valentino focuses on Big Data and Cloud Computing use cases for large Enterprises. Prior to Google, Valentino spent his time at several startups, ranging from Streaming Big Data to Cloud Monitoring and Financial Analytics, and he began his career as a trader and quant developer at an options trading firm in Chicago. 

Valentino Tereshko | Enterprise Sales Engineer | Google

The Information

For larger groups or special requests contact Bola by calling +1 415 692 5378 or email [email protected]
* Team discounts are applicable at the point of registration only.

Ways to Register

+1 415 692 5378 +1 323 446 7673

Group Discount Offers
• 3 Silver Passes: $3000 ($1000 per attendee)
• 5 Silver Passes: $4500 ($900 per attendee)
• 3 Gold Passes: $3900 ($1300 per attendee)
• 5 Gold Passes: $6000 ($1200 per attendee)
• 3 Diamond Passes: $4500 ($1500 per attendee)
• 5 Diamond Passes: $7000 ($1400 per attendee)

Registration Pricing

Apache Hadoop Innovation Summit
Date: February 12 & 13, 2015
Location: San Diego, California
Venue: Westin San Diego
Accommodation: Click here for online reservations

Silver Pass
$1495 ($1295 Early Bird Price, before Dec 12)
Access to all sessions & networking events, 7 days access to presentations from the summit via ieOnDemand

Diamond Pass
$1995 ($1795 Early Bird Price, before Dec 12)
Access to all sessions, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand

Gold Pass
$1795 ($1595 Early Bird Price, before Dec 12)
Access to all sessions, networking events & unlimited access to presentations from the summit via ieOnDemand

1 Day Pass
$795
Full access to the sessions to your chosen day of the summit, 7 days access to presentations from the summit via ieOnDemand
7 day online access to event materials

On-Demand Pass
$600
Unlimited access to presentations from the summit via ieOnDemand, including presentations, interviews & the ability to contact speakers
Unlimited access to summit presentations via ieOnDemand

Access All Areas Pass
$2295
Access to all sessions of the Apache Hadoop Innovation Summit, Data Science Innovation Summit & Predictive Analytics Innovation Summit
Annual subscription to content on the Big Data & Analytics channels via ieOnDemand

Registration Form
Apache Hadoop Innovation Summit
February 12 & 13, 2015 | Westin San Diego | San Diego, CA
For registration or more information on the program, please call Bola on +1 415 692 5378, or fax this registration form to +1 (323) 446 7673

1. Delegate Information...

NAME OF EACH ATTENDEE
TITLE OF EACH ATTENDEE | DEPARTMENT
COMPANY | INDUSTRY
ADDRESS | CITY
STATE/PROVINCE | ZIP/POSTAL CODE | COUNTRY
EMAIL OF EACH ATTENDEE | BUSINESS PHONE NUMBER

2. Pass Types...

Early Bird Pass Options until December 12, 2014
• Early Bird Silver: $1295 Attendees ____
• Early Bird Gold: $1595 Attendees ____
• Early Bird Diamond: $1795 Attendees ____
• Early Bird One Day: $795 Attendees ____

Regular Pass Options after December 12, 2014
• Silver Pass: $1495 Attendees ____
• Gold Pass: $1795 Attendees ____
• Diamond Pass: $1995 Attendees ____
• One Day: $995 Attendees ____

Group Discount Pass Options
• 3 Silver Passes $3000 ($1000 per attendee)
• 5 Silver Passes $4500 ($900 per attendee)
• 3 Gold Passes $3900 ($1300 per attendee)
• 5 Gold Passes $6000 ($1200 per attendee)
• 3 Diamond Passes $4500 ($1500 per attendee)
• 5 Diamond Passes $7000 ($1400 per attendee)

For larger groups or special requests contact Bola by calling +1 415 692 5378 or email [email protected]
Group passes only available when all participants register together.

Pass Descriptions:
• Silver Pass: Access to all sessions & networking events
• Gold Pass: Access to all sessions, networking events & unlimited access to the summit presentations via ieOnDemand
• Diamond Pass: Access to all sessions, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand
• Access All Areas Pass: Access to all sessions of the Apache Hadoop Innovation Summit, Data Science Innovation Summit & Predictive Analytics Innovation Summit, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand

3. Payment Options...

Check (Make checks payable to The Innovation Enterprise Ltd) | Invoice me
Visa | Mastercard | American Express | Diners Club | Discover
CARD NUMBER | EXPIRATION DATE | SECURITY NO.
CARDHOLDER’S NAME | CARDHOLDER’S SIGNATURE
BILLING ADDRESS -(same as above) | INDUSTRY

Prices are exclusive of VAT. Places are transferable without any charge to another Summit occurring within 12 months of the original purchase. Team discounts are applicable at the point of registration only. Any cancellations within a group registration will in turn incur an increase in registration fee for the remaining group participants. Cancellations before January 12, 2015 incur an administrative charge of 50%. If you cancel your registration after January 12, 2015 you will be charged the full fee. You must notify The Innovation Enterprise in writing of a cancellation, or you will be charged the full fee. The Innovation Enterprise reserve the right to make changes to the program without notice. NB: FULL PAYMENT MUST BE RECEIVED BEFORE THE EVENT.

Schedule

Day One | February 12
Session One 08.30 - 10.00
Coffee Break 10.00 - 10.30
Session Two 10.30 - 12.00
Lunch 12.00 - 13.30
Session Three 13.30 - 15.00
Coffee Break 15.00 - 15.30
Session Four 15.30 - 17.00
Networking Drinks 17.00 - 19.00

Day Two | February 13
Session Five 08.30 - 10.00
Coffee Break 10.00 - 10.30
Session Six 10.30 - 12.00
Lunch 12.00 - 13.30
Session Seven 13.30 - 15.00
Coffee Break 15.00 - 15.30
Session Eight 15.30 - 17.00

Partnership Opportunities: Giles Godwin-Brown | [email protected] | +1 415 692 5498
Attendee Invitation: Sean Foreman | [email protected] | +1 415 692 5514

2015 Calendar

Event categories: Women, Finance, CXO, Healthcare, Expected, Flagship, Government, High Tech, Pharma, Hadoop, Oil & Gas

November
• Big Data & Analytics for Pharma | November 4 & 5, Philadelphia
• Big Data & Marketing Innovation Summit | November 4 & 5, Miami
• Big Data for Finance | November 11 & 12, Boston
• Data Visualization Summit | November 11 & 12, London
• Chief Data Officer Summit | November 11 & 12, London
• Big Data & Analytics Innovation Summit | November 11 & 12, London
• Big Data & Analytics Innovation Summit | November 25 & 26, Beijing

December
• Big Data & Analytics in Banking Summit | December 2 & 3, New York
• Chief Data Officer Summit | December 2 & 3, New York

January
• Big Data Innovation Summit | January 22 & 23, Las Vegas
• Cloud Innovation Expo | January 22 & 23, Las Vegas

February
• Data Science Innovation Summit | February 12, San Diego
• Apache Hadoop Innovation Summit | February 12 & 13, San Diego
• The Digital Oilfield Innovation Summit | February 19 & 20, Houston
• Big Data & Analytics Innovation Summit | February 27 & 28, Singapore

March
• Big Data & Analytics Innovation Summit | March 25 & 26, Brazil

April
• Big Data Innovation Summit | April 15 & 16, Santa Clara
• DataTalent | April 15 & 16, Santa Clara
• Data Visualization Summit | April 15 & 16, Santa Clara
• Big Data Innovation Summit | April 23 & 24, Hong Kong

May
• Big Data Innovation Summit | May 13 & 14, London
• Big Data & Analytics in Healthcare | May 13 & 14, Philadelphia
• Chief Data Officer Summit | May 20 & 21, San Francisco

June
• Big Data & Analytics for Pharma | June 10 & 11, Philadelphia
• Open Data Innovation Summit | June 10 & 11, Boston
• Big Data & Analytics for Retail Summit | June 17 & 18, Chicago

August
• Big Data & Analytics Innovation | August 5 & 6, Kuala Lumpur
• Big Data & Analytics Innovation Summit | August 19 & 20, Brazil

September
• Big Data & Analytics Innovation Summit | September 17 & 18, Sydney
• Data Visualization Summit | September 23 & 24, Boston
• Big Data Innovation Summit | September 23 & 24, Boston