MOVING MOUNTAINS OF PLAYER DATA
SEAN MALONEY
RIOT GAMES @SEAN_SEANNERY
SCALABLE INTERNET SERVICES
UCLA/UCSB - NOV 2015
SEAN MALONEY
BIG DATA ENGINEER
WHO IS THIS GUY?
Lead developer on Riot’s ETL tools
FUN FACT: Was a student in this class 4 years ago
Intern at Appfolio
MOVING MOUNTAINS OF DATA
1. INTRODUCTION
2. THE GAME PLATFORM: OUR MAIN DATA SOURCE
3. HOW WE INGEST AND QUERY DATA
4. HOW WE SCALE IN AWS
5. CONCLUSION - SEAN’S PRO TIPS
INTRODUCTION
WHAT IS LEAGUE OF LEGENDS?
2009 LAUNCH
ONLINE MULTIPLAYER
WINDOWS / OSX
40-50 MIN GAMES
THE TEAM
YOUR CHAMP
THE BATTLEGROUND
THE GAME PLATFORM
[Diagram, top to bottom: THE CLIENT; Load Balancers and Firewalls; multiple service instances (CHAT, STORE, AUDIT, GAME, ETC.); ORACLE COHERENCE (IN MEMORY DB); PRIMARY DB; HOT BACKUP DB; 2nd BACKUP DB / ETL]
OTHER DATA SOURCES
<REST>
DATA INGESTION
STAGES: INGESTION, STORAGE, QUERY / VIEWS, VIZ. TOOLS
PUSH-BASED: HONU - Anything pushed to it - Server logs
PULL-BASED / ETL: FuETL - OLTP game data - External Data Sources
MASTER WAREHOUSE
BATCH QUERIES
SINGLE-ROW QUERIES
AGGREGATE QUERIES
DATA AUDITING
Distributed ETL Software written in Ruby.
Scales Horizontally
Same ETL applied to multiple regions / datacenters
Self-Service UI with SQL query templating.
NA Korea Russia
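As a rough sketch of how one templated ETL definition could fan out across regions: the template syntax, table naming, and region suffixes below are illustrative assumptions, not FuETL's actual format (FuETL itself is written in Ruby; Python is used here only for brevity).

```python
from string import Template

# Hypothetical query template; $region and $since are filled in per run.
QUERY_TEMPLATE = Template(
    "SELECT * FROM games_played_$region WHERE date >= '$since'"
)

# Illustrative region suffixes, loosely based on the regions named on the slide.
REGIONS = ["na", "kr", "ru"]


def render_queries(since, regions=REGIONS):
    """Render one concrete SQL statement per region from a single template."""
    return {
        region: QUERY_TEMPLATE.substitute(region=region, since=since)
        for region in regions
    }


queries = render_queries("2015-10-25")
for region, sql in queries.items():
    print(region, "->", sql)
```

The point of the design is that the ETL author writes the query once and the tool handles the per-region substitution.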
Create an ETL
FUETL CAN CONNECT TO
Amazon S3
SQS
(S)FTP
Hive
Microsoft SQL Server
MySQL
DynamoDB
Vertica
Redshift
REST websites
Webapp
  View - backbone.js - Bootstrap CSS
  Task / Helper / Controllers
Command Line Tool
Scheduler Process
Worker Process
Core Libraries
  Task Service
    Tasks
    Task DAO
  Helper Service
    Helpers
    Helper DAO
  Environment Service
    Environment DAO
    Env. Task DAO
    Env. Helper DAO
FuETL STATISTICS
14 TB DATA MOVED DAILY
5213 ACTIVE REGIONAL ETLS
23125 DAILY ETL RUNS
FuETL SCALING
Idempotency
Idempotent - an operation that produces the same result whether it is executed once or multiple times
EXAMPLE:
Non-Idempotent:
 - x = x * 5;
 - Submitting a purchase
Idempotent:
 - abs( abs(x) ) = abs(x)
 - Cancelling a purchase
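The examples above can be checked directly. This is a minimal sketch; the `Purchase` class and the helper names are hypothetical and exist only to mirror the slide's examples.

```python
def multiply_by_five(x):
    """Non-idempotent: applying it a second time changes the result."""
    return x * 5


def apply_twice(fn, x):
    """Apply fn to x, then apply fn again to the result."""
    return fn(fn(x))


class Purchase:
    """Hypothetical purchase record used to illustrate an idempotent action."""

    def __init__(self):
        self.status = "submitted"

    def cancel(self):
        # Idempotent: cancelling an already cancelled purchase changes nothing.
        self.status = "cancelled"


purchase = Purchase()
purchase.cancel()
purchase.cancel()  # safe to repeat

print(apply_twice(abs, -3) == abs(-3))                            # idempotent
print(apply_twice(multiply_by_five, 2) == multiply_by_five(2))    # not idempotent
```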
Idempotent?
In the transactional OLTP world...
INSERT INTO games_played (SELECT * FROM games_played_na WHERE date >= '2015-10-25')
Idempotent?
In the big data / OLAP world...
INSERT INTO games_played (SELECT * FROM games_played_na WHERE date >= '2015-10-25')
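One plausible reading of these two slides is that the same INSERT behaves differently depending on whether the target enforces constraints. The sketch below uses SQLite as a stand-in for both worlds; the `game_id` column, the sample rows, and the two target table names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE games_played_na (game_id INTEGER, date TEXT);
    INSERT INTO games_played_na VALUES (1, '2015-10-25'), (2, '2015-10-26');
    -- OLTP-style target: a primary key rejects rows that already exist.
    CREATE TABLE games_played_oltp (game_id INTEGER PRIMARY KEY, date TEXT);
    -- Warehouse-style target: no constraints, so nothing stops duplicates.
    CREATE TABLE games_played_olap (game_id INTEGER, date TEXT);
    """
)

LOAD = "INSERT INTO {target} SELECT * FROM games_played_na WHERE date >= '2015-10-25'"


def run_load(target):
    """Run the load once; return False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(LOAD.format(target=target))
        return True
    except sqlite3.IntegrityError:
        return False


def row_count(table):
    """Count the rows currently stored in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


oltp_runs = [run_load("games_played_oltp") for _ in range(2)]
olap_runs = [run_load("games_played_olap") for _ in range(2)]
print("constrained target:", oltp_runs, row_count("games_played_oltp"))
print("unconstrained target:", olap_runs, row_count("games_played_olap"))
```

With the constraint, the second run fails loudly and the row count stays put; without it, the second run succeeds and silently doubles the data, which is why a re-run ETL needs its own idempotency strategy.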
KEEPING INTEGRITY
Message Queues
[Diagram: SCHEDULER aka PRODUCER -> queue of ETL1, ETL2, ETL3, ETL4, ETL5, ..., ETLN -> WORKER aka CONSUMER]
Message Queues
● REDUNDANCY
● DELIVERY GUARANTEE
● SCALABILITY
● ASYNC. COMMUNICATION
● ABSTRACTION / DECOUPLING
Message Queues
● AMAZON SIMPLE QUEUE SERVICE
● APACHE ACTIVEMQ
● RABBITMQ
● HORNETQ
● MICROSOFT MQ (MSMQ)
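The scheduler/worker split can be sketched with an in-process queue. This is only a stand-in: Python's `queue.Queue` plays the role that a service such as SQS or RabbitMQ would play in production, and the function names and ETL labels are illustrative.

```python
import queue
import threading

task_queue = queue.Queue()
completed = []
completed_lock = threading.Lock()


def scheduler(etl_names):
    """Producer: enqueue each ETL run without knowing who will execute it."""
    for name in etl_names:
        task_queue.put(name)


def worker():
    """Consumer: pull ETL runs until the queue is drained."""
    while True:
        try:
            name = task_queue.get(timeout=0.1)
        except queue.Empty:
            return
        with completed_lock:
            completed.append(name)  # stand-in for actually running the ETL
        task_queue.task_done()


scheduler([f"ETL{i}" for i in range(1, 6)])
workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(sorted(completed))
```

Because the scheduler only talks to the queue, the number of workers can change without touching the producer, which is the decoupling and scalability benefit listed above.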
Honu
Self Service, Custom HTTP Edge Service (Java)
Fronted by ELB in front of ~40 autoscaled m1.xlarge instances
Forwards JSON data indirectly to S3
Honu
Self Service, Custom HTTP Edge Service (Java API)
Custom Collector Infrastructure (Java) - Derived from Netflix Suro
Deployed in every data center worldwide and also AWS
The batches then need to be unpacked and converted into Hive tables
Honu =
Custom HTTP Edge Service (Java)
DRADIS
Fronted by ELB in front of ~40 m1.xlarge instances
Forwards data indirectly to S3 via Honu Collectors
Honu
[Diagram, three frames: JSON messages pass between the REST ENDPOINT and the COLLECTORS; the second frame labels a group of messages batchid = 20150512; the third frame shows one group as GAM1 GAM1 GAM1 GAMX GAM1 GAM1]
Idempotency
Use application logic to make idempotent
msg = queue.pop();
if (processed_games.contains(msg.game_id)) {
    return; // do nothing
} else {
    process_game(msg);
}
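A runnable version of the same idea, as a minimal sketch: the message shape (a dict with a `game_id` key), the in-memory `processed_games` set, and the sample IDs are assumptions, and a real system would need a durable store rather than process memory.

```python
from collections import deque

processed_games = set()
results = []


def process_game(msg):
    """Stand-in for the real work; records the game so it is only handled once."""
    results.append(msg["game_id"])
    processed_games.add(msg["game_id"])


def handle(msg):
    """Process a message at most once per game_id, even if delivered repeatedly."""
    if msg["game_id"] in processed_games:
        return False  # duplicate delivery: do nothing
    process_game(msg)
    return True


incoming = deque({"game_id": g} for g in ["GAM1", "GAM1", "GAMX", "GAM1"])
while incoming:
    handle(incoming.popleft())

print(results)
```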
THE DOWN SIDE
What’s in there?
Data team doesn’t know everything that is submitted
Compliance
Are we violating international data laws?
Inconsistent data structure
It’s formatted however the developer submits it
SELF SERVICE HOW?
User Documentation
No one likes doing it, but it helps a lot.
Onboard training
Get new coworkers in-the-know
Familiar Protocols
Use REST or RPC so developers are on the same page
Focus on UX
Your tools need to be easy for non-technical people to use.
AMAZON S3
s3n://datawarehouse/
  schema1/
    table1/
      env/
        dt/
          time/
    table2/
    table3/
  schema2/
s3n://telemetrydata/
  application1/
    table1/
      env/
        dt/
    table2/
  application2/
AMAZON S3 STRUCTURE / HIVE
‣ schema1
    table1
      env
        dt
          time
    table2
    table3
‣ schema2
    table1
    ...
‣ schema3
‣ schema4
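A naming standard like the one above is easy to enforce if every writer builds paths through one helper. The sketch below is an assumption about how that might look: the function name and the sample values are hypothetical, and the slides do not say whether directory names are bare values or Hive-style `key=value` pairs.

```python
import posixpath


def partition_prefix(bucket, schema, table, env, dt, time=None):
    """Build a prefix following the schema/table/env/dt/time nesting shown above."""
    parts = [schema, table, env, dt]
    if time is not None:
        parts.append(time)
    return f"s3n://{bucket}/" + posixpath.join(*parts) + "/"


# Sample values are made up for illustration.
prefix = partition_prefix("datawarehouse", "schema1", "table1", "na", "2015-10-25", "0000")
print(prefix)
```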
HOW TO AUDIT
REST micro-service built with Java and Docker.
Reports and visualizations we can use to find problems.
Source and target comparison.
[Diagram: Platform, Auditing Service, Warehouse]
VISUALIZING
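A source and target comparison can be as simple as comparing per-table row counts. This is a minimal sketch of that idea, not the auditing service's actual logic; the table names and counts are invented for illustration.

```python
def compare_counts(source_counts, target_counts):
    """Return tables whose row counts differ between source and target."""
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        tgt = target_counts.get(table)
        if src != tgt:
            mismatches[table] = {"source": src, "target": tgt}
    return mismatches


# Illustrative counts: the platform is the source, the warehouse is the target.
platform = {"games_played": 1000, "purchases": 250}
warehouse = {"games_played": 1000, "purchases": 248}
report = compare_counts(platform, warehouse)
print(report)
```

A table missing from one side shows up with a count of `None`, so dropped tables are reported alongside row-level drift.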
BATCH OLAP POINT
SCALING IN AWS
RESOURCE CONTENTION
SCALING
AWS Infrastructure Today
EMR: Data Science, Analytics / Hue, ETL, Telemetry, Platfora, DynamoDB Loading
EC2: Auditing, ETL, Telemetry collectors, Data dictionary, Rocana (real time dashboard), Solr (real time), Point Data Service, Metastore, Data Science, Fraud
Storage: RDS, DYNAMODB, ETL App DB, Point Data Store, S3 (Source of “Truth”)
Networking: VPC, AWS Direct Connect
CONCLUSION
SEAN’S PRO TIPS OF THE DAY
DO
➔ Get an auditing solution for DW accuracy
➔ Allocate time for tuning AWS infrastructure
➔ Prepare for multiple data access patterns
➔ Keep idempotency in mind and use MQ architecture
DON’T
➔ Don’t wait. Create S3 permissions and naming standards early
➔ Don’t forget to track cost. AWS bills can surprise you
➔ Don’t underestimate simple problems in big data.
➔ Don’t stop. Believing
CHAMPION MASTERY
Custom rewards for mastering different champions
Intensive query that spans every game that every player has played
Improves player engagement
PLAYER SUPPORT
Full copy of our data warehouse in DynamoDB
Hive->DynamoDB Dynamic Partition
Support can answer questions faster than ever.
OFFENSIVE CHAT DETECTION
Data science team queries all chat messages in game
Sentiment analysis and classification
Identifies negative, offensive players and mutes them automatically.