Apache Hadoop Innovation Summit
Don’t Be Afraid of the Elephant in the Room
February 12 & 13, 2015 Westin San Diego, San Diego, CA
#Hadoop15
Confirmed Speakers
• Enterprise Engineer, Google
• Big Data Engineer, Groupon
• Senior Director, Data Solutions, The New York Times
• Director, Consumer Science Engineering, Netflix
• Lead Research Scientist, eBay
• Director, Data Engineering, Wikia
• Data Scientist, Live Nation
• Software Architect, AOL
• Manager, Business Analytics, LinkedIn
• Enterprise Architect, Art.com
• Director, Big Data, Sears
• Data Informatics Leader, GE
• Engineering Lead, Twitter
• Engineering Manager, Etsy
• Senior Director, Data Management, Time Warner Cable
• Principal Architect, Schneider Electric
• Data Architect, Simmons Prepared Foods
• Vice President, Data Platforms, ESPN
• Architect, Salesforce.com
Past Delegates include
• Director, Analytics - Facebook
• Director, Insight - Red Bull
• Vice President - Google
• Senior Director - Coca-Cola
• Data Engineer - Blizzard Entertainment
• Senior Vice President - Samsung
Who Will You Meet?
There is no question that IE. provides the gold-standard events in the industry and will connect you with decision makers within the analytics industry. You will meet senior-level executives from major corporations and innovative small to medium-sized companies.
Job Title Of Attendees
[Chart. Categories: President/Principal, SVP/VP, C-Level, Snr. Director/Director, Global Head/Head, Snr. Manager/Manager, Academic (1%). Segment values: 3%, 21%, 12%, 42%, 13%, 8%]
78% of attendees are at Director level or above
Company Size Of Attendees
[Chart. Categories: 1000+ Employees, 300-999 Employees, 50-299 Employees, Less than 49 Employees. Segment values: 8%, 11%, 25%, 56%]
81% of attendees are from companies with at least 300 employees
About The Summit
In the cutting-edge market of Big Data, modern businesses face the challenges of storage, management, analysis, visualization and security. New technologies, solutions and challenges are exploding outwards as Big Data continues to grow exponentially.
Hadoop, a huge piece of the puzzle, continues to present both exciting opportunities and engineering challenges. Can you become cloud native? What new alternative paradigms are available with Hadoop? What are the limitations of sole Hadoop use? How can you use it for machine learning? What about integration? Corporate accessibility? Ethics? These burning issues are what the summit looks to address.
The Apache Hadoop Innovation Summit is an industry-led event. In principle, this means that attendees are working in engineering, architectural and data science roles. In practice, this means fewer sales pitches and more in-depth discussion of what like-minded professionals are doing with their Big Data.
Confirmed Speaker Information
Sriram is an Engineering Manager on the Data Platform team at Twitter, where he leads a fantastic group of engineers building core big data processing frameworks such as Summingbird, Scalding, Spark, and Parquet. Prior to that, he was the tech lead of the Big Data Platform team at Netflix, where he built and open sourced Genie, which is Netflix’s Hadoop Platform as a Service. Sriram has a Ph.D. in Computer Science from Indiana University, and spent several years at the San Diego Supercomputer Center working on advanced cyberinfrastructures for science and engineering applications.
Data Platform at Twitter - Enabling Real-time & Batch Analytics at Scale
The data platform at Twitter supports engineers and data scientists running batch jobs on Hadoop clusters of several thousand nodes, and real-time jobs on top of systems such as Storm. In this presentation, I will discuss the overall data platform stack at Twitter. In particular, I will talk about Scalding, which is a Scala DSL for batch jobs using MapReduce; Summingbird, which is a framework for combined real-time and batch processing; and Tsar, which is a framework for real-time time-series aggregations. I will also discuss our experience with Spark, and where it fits in the overall ecosystem.
Sriram Krishnan | Big Data, Cloud, Distributed Systems Engineering Leader | Twitter
Gopal Krishnan is Director of Consumer Science Engineering at Netflix. He leads many aspects of the A/B testing innovation that helps personalize and improve the Netflix experience. Previously, he spent over a decade at Yahoo working on high-scale infrastructure, including building the first global Yahoo homepage.
Netflix is renowned for its use of big data to improve personalization for our members. Previously, our personalization depended only on explicit user inputs like star ratings, taste preference, plays, etc. We recently incorporated additional implicit user signals such as on-device interactions like scrolling, navigation, and idle time. This session will focus on the challenges of using these new high-volume data sources with billions of events/day. What are the challenges of maintaining data quality across hundreds of device types? How do we scale efficient nearline systems to serve this data for algorithmic consumption close to real time?
Gopal Krishnan | Director, Consumer Science Engineering | Netflix
Arek Kaczmarek is responsible for the company's data platform and the implementation of a new data platform based on Big Data technologies. He previously worked at Intel as a Senior Big Data Solutions Architect in the Data Center Group. His skills include, among others, knowledge of the Big Data ecosystem, Hadoop/Hive/Pig, NoSQL, ELK (ElasticSearch/Logstash/Kibana), Lambda architecture, Oracle, data warehousing, ETL, BI analytics, systems architecture, PaaS and the cloud.
The Next Enterprise Data Warehouse is a Hadoop Data Lake
As the data volumes and data generation velocity start growing, so does the value of all the enterprise data being generated. At the New York Times, we have moved away from the traditional Enterprise Data Warehouse based on dimensional modeling and created a data lake where the time to market for data solutions and applications is much faster and much more robust than it ever was before. This presentation will provide an overview of the data lake approach, how to get there, and why it makes sense for companies with growing data volumes. The discussion will focus on cost, architecture, and time to market solutions.
Arek Kaczmarek | Senior Director, Platform & Data Solutions | The New York Times
Thanigai is an enterprise architect, technologist and innovator with over 14 years of progressive experience specializing in building large, highly scalable software systems. At Art.com, Thanigai is the lead architect responsible for defining and driving the technology roadmap initiatives for building the next generation technology vision and platform for the company. Thanigai’s interests and specialties include Hadoop/Big Data, NoSQL, Distributed Systems, Enterprise Architecture, Scalability, etc. Prior to joining Art.com, Thanigai worked in engineering roles at Sanmina and Flextronics.
Leveraging Hadoop in Polyglot Architectures
At Art.com, we have a heterogeneous web stack (Java, Node.js and .NET) to support our global brands and multiple websites. In this session, I will share our experience in leveraging the power of Hadoop to reach multiple business goals. The talk will also focus on the tools that help in addressing concerns related to polyglot architectures such as interoperability, multi-tenancy, schema evolution and standardization. I will also talk about some frameworks and packages that help in codifying best patterns and practices in integrating Hadoop with other systems such as traditional Business Intelligence systems, Web Analytics and other distributed computing technologies like Apache Spark.
Thanigai Vellore | Enterprise Architect | Art.com
Mike Lurye is Senior Director, Enterprise Data Management for Time Warner Cable. He and his team are responsible for shared data warehousing assets and functions that benefit multiple Business Intelligence (BI) teams and their customers. This includes creation of enterprise data assets, BI architecture, quality assurance, and data quality management. In addition, Mike and his team are responsible for evaluation and adoption of Big Data technologies. Prior to joining TWC, Mike held Product Management and Product Marketing positions with Amdocs, focused on decision automation, mobile content and personalization solutions. Mike’s prior experience includes senior roles at major analytical CRM & marketing services companies.
Offloading ELT Workloads to Hadoop - Time Warner Cable’s Journey
Shifting ELT workloads from the enterprise data warehouse (EDW) to Hadoop is gaining traction for reducing costs, incorporating new data faster, and freeing up EDW capacity for user-facing analytics and BI workloads. But where do you start, and what’s the best approach? This presentation outlines the framework and processes that Time Warner Cable used to:
• Evaluate potential use cases and architectural options for Hadoop
• Identify ELT offload as the first focus area
• Choose technology components for the next generation enterprise data integration solution
• Apply best practices to configure the Hadoop environment for data integration
Michael Lurye | Senior Director, Enterprise Data Management | Time Warner Cable
Weidong Zhang earned his Ph.D. in computational fluid dynamics. He has a natural passion for analytics, research and data-driven decision-making. He has spent 10+ years in the data warehouse field, and combines his knowledge of business intelligence with Hadoop’s massive data processing capability to address business needs. He currently works as a manager on the Data Analytics Infrastructure team at LinkedIn and leads the marketing and customer service data warehouse vertical.
Releasing the Power of Hadoop
As a data-driven company, LinkedIn has very strong analytical teams and many data engineers, data scientists, business analysts and business users, who focus on different domains and areas of the company's business. These users have different usage patterns and needs. Making them more productive and efficient is key to the company's success. This talk covers the ecosystem our Data Analytics Infrastructure (DAI) team built, which releases the power of Hadoop and makes it easy to use. This ecosystem contains several open-sourced products, such as Pinot, Cubert, and Gobblin (for fast computation and real-time reporting support), and some tools for automatic report generation. I will also cover the roles of our data warehouse team and our mission.
Weidong Zhang | Manager, Business Analytics | LinkedIn
Nazli Dereli is currently a data scientist on the Live Nation Userscoring team. She is working on real-time classification of users and detection of abusive actors that stop users from buying tickets by holding the tickets. Before joining Live Nation, she worked in the Data Mining and Bioinformatics Lab at the University of California, Santa Barbara, focusing on mining brain activity networks to discover insights on human learning. Her interests include social and biological network analysis, and interesting problems in data and graph mining.
Detecting Abusive Actors in Hadoop Ecosystem
Live Nation is the global event ticketing leader with 400,000,000 tickets sold and 180,000 events ticketed in 19 countries. However, there is always the threat of a growing multibillion-dollar secondary market that intends to prevent users from buying primary tickets. This talk will explain how to detect such abusive actors in the Hadoop ecosystem using different approaches from offline, semi-online and online learning. We will go over the process of building our system, starting with different Hadoop-based approaches and leading to our final decision to use Apache Storm for real-time classification built on top of the Hadoop ecosystem.
Nazli Dereli | Data Scientist | Live Nation
Beena is the Data Science Informatics Leader at GE. She leads the data efforts to support data science at GE. She works across the GE businesses to drive advanced analytics development leveraging big data technologies. She is passionate about using data and analytics to help cross-functional teams derive data insights, articulate questions they did not know they had, and view data in more effective ways. Beena has over 20 years’ experience in the data arena with a number of international organizations including British Telecom, E*trade and Thomson Reuters. She holds a Master’s in Computer Science and an MBA in Finance.
Making Hadoop Relevant for the Industrial Internet
Data management and advanced analytics are core to GE’s recent success in delivering superior software-based services to customers across aviation, power generation, oil & gas, healthcare, and transportation. The torrent of data generated from machines, networks, devices and data centers in industry verticals provides challenges and opportunities. The challenge is to make this machine data meaningful and actionable to deliver on opportunities around operational efficiencies. I will share real-world case studies, leveraging Hadoop to demonstrate tangible operational benefits - ranging from fuel savings to improving productivity to reducing unscheduled maintenance to enhancing on-time performance - by tightly integrating machines, networked sensors, industrial-strength data, and software to enable intelligent insights and affect measurable outcomes.
Beena Ammanath | Data Informatics Leader | GE
Ranjan Sinha is a Lead Data Scientist at eBay Inc., where he has led projects that significantly enhanced consumers’ shopping experiences. Previously, Dr. Sinha was a research academic at the University of Melbourne, and he holds a Ph.D. in Computer Science from RMIT University, Australia. He has over 25 publications in top-tier venues such as IEEE Big Data, VLDB Journal, and ACM SIGMOD. He was awarded the Sort Benchmark medals for JouleSort and PennySort and was amongst WSJ’s Top-12 Asia-Pacific Young Inventors. He is a regular speaker on Big Data and Data Science and co-organizes the popular Bay Area Search Meetup.
Ranjan Sinha | Lead Research Scientist | eBay, Inc.
Ameya is a lead engineer on Groupon’s deal relevance and personalization system, working on big data technologies such as Hadoop and HBase. Earlier, he also built a scalable message bus system that now powers Groupon's global service-oriented architecture, handling hundreds of millions of messages. Before Groupon, he was a Sr Software Engineer at LiveOps working with distributed systems. Ameya holds a master’s in Information Systems from Carnegie Mellon University and a master’s in Computer Science from Pune University.
Ameya Kantikar | Big Data Infrastructure Engineer | Groupon
As an engineering leader, Ben is as comfortable with strategy documents and presentations as he is deep in the code. He uses his understanding of the bigger picture to make the best tactical choices for his team in an agile environment. Ben’s specialties include technical writing, Hadoop, SaaS applications, big data, parallel algorithms, distributed computing and high performance computing.
Ben Jackson | Software Engineer | AOL
Valentino is a Solutions Architect with Google Cloud Platform, helping companies accelerate innovation. Valentino focuses on Big Data and Cloud Computing use cases for large Enterprises. Prior to Google, Valentino spent his time at several startups, ranging from Streaming Big Data to Cloud Monitoring and Financial Analytics, and he began his career as a trader and quant developer at an options trading firm in Chicago.
Valentino Tereshko | Enterprise Sales Engineer | Google
The Information
For larger groups or special requests contact Bola by calling +1 415 692 5378 or email [email protected]
* Team discounts are applicable at the point of registration only.
Ways to Register
+1 415 692 5378 +1 323 446 7673
Group Discount Offers
• 3 Silver Passes: $3000 ($1000 per attendee)
• 5 Silver Passes: $4500 ($900 per attendee)
• 3 Gold Passes: $3900 ($1300 per attendee)
• 5 Gold Passes: $6000 ($1200 per attendee)
• 3 Diamond Passes: $4500 ($1500 per attendee)
• 5 Diamond Passes: $7000 ($1400 per attendee)
Registration Pricing
Apache Hadoop Innovation Summit
Date: February 12 & 13, 2015
Location: San Diego, California
Venue: Westin San Diego
Accommodation: Click here for online reservations
Silver Pass: $1495
Early Bird Price (before Dec 12): $1295
Access to all sessions & networking events
7 days access to presentations from the summit via ieOnDemand

Gold Pass: $1795
Early Bird Price (before Dec 12): $1595
Access to all sessions, networking events & unlimited access to presentations from the summit via ieOnDemand

Diamond Pass: $1995
Early Bird Price (before Dec 12): $1795
Access to all sessions, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand

1 Day Pass: $795
Full access to the sessions on your chosen day of the summit, 7 days access to presentations from the summit via ieOnDemand

On-Demand Pass: $600
Unlimited access to presentations from the summit via ieOnDemand, including presentations, interviews & the ability to contact speakers

Access All Areas Pass: $2295
Access to all sessions of the Apache Hadoop Innovation Summit, Data Science Innovation Summit & Predictive Analytics Innovation Summit
Annual subscription to content on the Big Data & Analytics channels via ieOnDemand
Registration Form
Apache Hadoop Innovation Summit
February 12 & 13, 2015 | Westin San Diego | San Diego, CA
For registration or more information on the program, please call Bola on +1 415 692 5378, or fax this registration form to +1 (323) 446 7673

1. Delegate Information...
NAME OF EACH ATTENDEE
TITLE OF EACH ATTENDEE | DEPARTMENT
COMPANY | INDUSTRY
ADDRESS | CITY
STATE/PROVINCE | ZIP/POSTAL CODE | COUNTRY
EMAIL OF EACH ATTENDEE | BUSINESS PHONE NUMBER

2. Pass Types...
Early Bird Pass Options until December 12, 2014
• Early Bird Silver: $1295 Attendees ____
• Early Bird Gold: $1595 Attendees ____
• Early Bird Diamond: $1795 Attendees ____
• Early Bird One Day: $795 Attendees ____
Regular Pass Options after December 12, 2014
• Silver Pass: $1495 Attendees ____
• Gold Pass: $1795 Attendees ____
• Diamond Pass: $1995 Attendees ____
• One Day: $995 Attendees ____
Group Discount Pass Options
• 3 Silver Passes: $3000 ($1000 per attendee)
• 5 Silver Passes: $4500 ($900 per attendee)
• 3 Gold Passes: $3900 ($1300 per attendee)
• 5 Gold Passes: $6000 ($1200 per attendee)
• 3 Diamond Passes: $4500 ($1500 per attendee)
• 5 Diamond Passes: $7000 ($1400 per attendee)
For larger groups or special requests contact Bola by calling +1 415 692 5378 or email [email protected]
Group passes are only available when all participants register together.
Pass Descriptions:
• Silver Pass: Access to all sessions & networking events
• Gold Pass: Access to all sessions, networking events & unlimited access to the summit presentations via ieOnDemand
• Diamond Pass: Access to all sessions, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand
• Access All Areas Pass: Access to all sessions of the Apache Hadoop Innovation Summit, Data Science Innovation Summit & Predictive Analytics Innovation Summit, networking events, annual subscription to all content on the Big Data & Analytics channels via ieOnDemand

3. Payment Options...
Check (Make checks payable to The Innovation Enterprise Ltd) | Invoice me
Visa | Mastercard | American Express | Diners Club | Discover
CARD NUMBER | EXPIRATION DATE | SECURITY NO.
CARDHOLDER’S NAME | CARDHOLDER’S SIGNATURE
BILLING ADDRESS (same as above) | INDUSTRY
Prices are exclusive of VAT. Places are transferable without any charge to another Summit occurring within 12 months of the original purchase. Team discounts are applicable at the point of registration only. Any cancellations within a group registration will in turn incur an increase in registration fee for the remaining group participants. Cancellations before January 12, 2015 incur an administrative charge of 50%. If you cancel your registration after January 12, 2015 you will be charged the full fee. You must notify The Innovation Enterprise in writing of a cancellation, or you will be charged the full fee. The Innovation Enterprise reserve the right to make changes to the program without notice. NB: FULL PAYMENT MUST BE RECEIVED BEFORE THE EVENT.
Schedule
Day One, February 12
Session One 08.30 - 10.00
Coffee Break 10.00 - 10.30
Session Two 10.30 - 12.00
Lunch 12.00 - 13.30
Session Three 13.30 - 15.00
Coffee Break 15.00 - 15.30
Session Four 15.30 - 17.00
Networking Drinks 17.00 - 19.00
Day Two, February 13
Session Five 08.30 - 10.00
Coffee Break 10.00 - 10.30
Session Six 10.30 - 12.00
Lunch 12.00 - 13.30
Session Seven 13.30 - 15.00
Coffee Break 15.00 - 15.30
Session Eight 15.30 - 17.00
Sponsors
Platinum Sponsor
For sponsorship information contact Giles Godwin-Brown
Media Partner
Partnership Opportunities: Giles Godwin-Brown | [email protected] | +1 415 692 5498
Attendee Invitation: Sean Foreman | [email protected] | +1 415 692 5514
2015 Calendar
Calendar legend: Women, Finance, CXO, Healthcare, Expected, Flagship, Government, High Tech, Pharma, Hadoop, Oil & Gas

November
• Big Data & Analytics for Pharma, November 4 & 5, Philadelphia
• Big Data & Marketing Innovation Summit, November 4 & 5, Miami
• Big Data for Finance, November 11 & 12, Boston
• Data Visualization Summit, November 11 & 12, London
• Chief Data Officer Summit, November 11 & 12, London
• Big Data & Analytics Innovation Summit, November 11 & 12, London
• Big Data & Analytics Innovation Summit, November 25 & 26, Beijing

December
• Big Data & Analytics in Banking Summit, December 2 & 3, New York
• Chief Data Officer Summit, December 2 & 3, New York

January
• Big Data Innovation Summit, January 22 & 23, Las Vegas
• Cloud Innovation Expo, January 22 & 23, Las Vegas

February
• Data Science Innovation Summit, February 12, San Diego
• Apache Hadoop Innovation Summit, February 12 & 13, San Diego
• The Digital Oilfield Innovation Summit, February 19 & 20, Houston
• Big Data & Analytics Innovation Summit, February 27 & 28, Singapore

March
• Big Data & Analytics Innovation Summit, March 25 & 26, Brazil

April
• Big Data Innovation Summit, April 15 & 16, Santa Clara
• DataTalent, April 15 & 16, Santa Clara
• Data Visualization Summit, April 15 & 16, Santa Clara
• Big Data Innovation Summit, April 23 & 24, Hong Kong

May
• Big Data Innovation Summit, May 13 & 14, London
• Big Data & Analytics in Healthcare, May 13 & 14, Philadelphia
• Chief Data Officer Summit, May 20 & 21, San Francisco

June
• Big Data & Analytics for Pharma, June 10 & 11, Philadelphia
• Open Data Innovation Summit, June 10 & 11, Boston
• Big Data & Analytics for Retail Summit, June 17 & 18, Chicago

August
• Big Data & Analytics Innovation, August 5 & 6, Kuala Lumpur
• Big Data & Analytics Innovation Summit, August 19 & 20, Brazil

September
• Big Data & Analytics Innovation Summit, September 17 & 18, Sydney
• Data Visualization Summit, September 23 & 24, Boston
• Big Data Innovation Summit, September 23 & 24, Boston