Hey, where'd my money go? Building Airbnb's Financial Data Pipeline in Spark

Airbnb - Braavos - Whered My Money Go


Page 1: Airbnb - Braavos - Whered My Money Go

Hey, where'd my money go? Building Airbnb's Financial Data Pipeline in Spark

Page 2: Airbnb - Braavos - Whered My Money Go

Mike Lewis, Jiang-Ming Yang

Page 3: Airbnb - Braavos - Whered My Money Go

191 Countries

72 Currencies

29 Languages

Page 4: Airbnb - Braavos - Whered My Money Go

Financial Data

Receipts

Payables

VAT

Revenue

TOT

Page 5: Airbnb - Braavos - Whered My Money Go

Alexandria

Ruby on Rails dashboard app

Nightly Cron job

Dynamically create MySQL queries to ETL data

Page 6: Airbnb - Braavos - Whered My Money Go

The good

Served us faithfully from 2012-2015

SQL queries are very easy to hand-test and share

Simple data can be handled with simple queries but...

The bad

...data doesn't stay simple forever

Each product is handled in a different way

Unscalable performance as data grew

Difficult to refactor complex SQL

Ever maintained a 1,000 line SQL query?

Page 7: Airbnb - Braavos - Whered My Money Go

We needed:

An actual programming language

A pipelined architecture

Distributed computing capabilities

Page 8: Airbnb - Braavos - Whered My Money Go

Goals

• Handle all products using a common flow (e.g. business trip or cleaning)

• Infer events from data or consume in realtime from production

• Easily query-able subledger output

Page 9: Airbnb - Braavos - Whered My Money Go

Braavos

Home of the Iron Bank

— Game of Thrones

Our next generation event-based financial data processing system

Page 10: Airbnb - Braavos - Whered My Money Go

Sub-ledger entries

• General accounts: receivable / payable / revenue / tax / etc.

• Debit/credit operation against a given account.

• Double-entry accounting rule
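The double-entry rule can be illustrated with a minimal sketch. The `Entry` class and `is_balanced` helper are hypothetical names chosen for illustration, not part of Braavos; the amounts are the $100 reservation used later in the talk.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    """One sub-ledger entry (hypothetical shape, for illustration only)."""
    account: str      # e.g. "Receivable", "Payable", "Revenue", "Tax"
    operation: str    # "Debit" or "Credit"
    amount: Decimal


def is_balanced(entries):
    """Double-entry rule: total debits must equal total credits."""
    debits = sum((e.amount for e in entries if e.operation == "Debit"), Decimal(0))
    credits = sum((e.amount for e in entries if e.operation == "Credit"), Decimal(0))
    return debits == credits


booking = [
    Entry("Receivable", "Debit", Decimal("100")),
    Entry("Deferred Revenue", "Credit", Decimal("10")),
    Entry("Deferred Payable", "Credit", Decimal("90")),
]
```

A set of entries that fails `is_balanced` signals a booking error before it reaches any report.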

Page 11: Airbnb - Braavos - Whered My Money Go

Sub-ledger entries

1. Book

• A reservation for $100 comes in
• Platform events generated
• Debit/credit to appropriate subledgers

Debit   Receivable (Guest)   100
Credit  Deferred Revenue     10
Credit  Deferred Payable     90

2. Payment

• Payment call made to processor
• Payment broken into payment events
• Appropriate subledger entries made

Debit   Cash                 100
Credit  Receivable (Guest)   100

3. Revenue Recognition

• Finance policy determines revenue recognition date
• Events generated to debit/credit appropriate subledgers

Debit   Deferred Revenue     10
Credit  Revenue (Guest)      10
Debit   Deferred Payable     90
Credit  Payable (Host)       90
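As a worked example, the sketch below replays the $100 reservation through the three steps on this slide (book, payment, revenue recognition) and nets out each account. The debit and credit sides are assigned so that every step balances, which is an assumption read off the slide's numbers; the tuple layout and the `balances` helper are hypothetical.

```python
from collections import defaultdict

# (step, account, operation, amount) rows for the $100 reservation
ENTRIES = [
    ("book", "Receivable (Guest)", "Debit", 100),
    ("book", "Deferred Revenue", "Credit", 10),
    ("book", "Deferred Payable", "Credit", 90),
    ("payment", "Cash", "Debit", 100),
    ("payment", "Receivable (Guest)", "Credit", 100),
    ("revenue_recognition", "Deferred Revenue", "Debit", 10),
    ("revenue_recognition", "Revenue (Guest)", "Credit", 10),
    ("revenue_recognition", "Deferred Payable", "Debit", 90),
    ("revenue_recognition", "Payable (Host)", "Credit", 90),
]


def balances(entries):
    """Net balance per account, counting debits as positive and credits as negative."""
    totals = defaultdict(int)
    for _step, account, operation, amount in entries:
        totals[account] += amount if operation == "Debit" else -amount
    return dict(totals)


final = balances(ENTRIES)
```

After all three steps the receivable and both deferred accounts net to zero, leaving 100 in cash against 10 of revenue and 90 payable to the host.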

Page 12: Airbnb - Braavos - Whered My Money Go

Event generation: Normalized platform events; normalized payment events

Accounting entries: Booking entries based on financial policy

Reports: Simpler summary queries on subledgers
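The three stages on this slide (event generation, accounting entries, reports) can be sketched as plain functions chained together. Everything here is hypothetical: the function names, the dict shapes, and the one-rule `POLICY` table stand in for whatever the real system uses.

```python
def generate_events(raw_rows):
    """Stage 1: normalize raw records into events with a type and an amount."""
    return [{"type": r["kind"], "amount": r["amount"]} for r in raw_rows]


# Hypothetical financial policy: event type -> (debit account, credit account)
POLICY = {
    "payment_received": ("Cash", "Receivable"),
}


def book_entries(events, policy=POLICY):
    """Stage 2: turn each event into one debit and one credit entry per the policy."""
    entries = []
    for ev in events:
        debit_account, credit_account = policy[ev["type"]]
        entries.append({"account": debit_account, "operation": "Debit", "amount": ev["amount"]})
        entries.append({"account": credit_account, "operation": "Credit", "amount": ev["amount"]})
    return entries


def report(entries, account):
    """Stage 3: summary over sub-ledger entries for one account (debits positive)."""
    return sum(
        e["amount"] if e["operation"] == "Debit" else -e["amount"]
        for e in entries
        if e["account"] == account
    )


raw = [
    {"kind": "payment_received", "amount": 100},
    {"kind": "payment_received", "amount": 50},
]
entries = book_entries(generate_events(raw))
cash = report(entries, "Cash")
```

Keeping the policy as data rather than code is one way to let a new product reuse the same flow.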

Page 13: Airbnb - Braavos - Whered My Money Go

Braavos pipeline

Page 14: Airbnb - Braavos - Whered My Money Go

Platform Events

Product Type

Product Id

Datetime

Guest Info

Host Info

Funding Sources

Itemized Pricing

Taxes (currency / amount / remittance currency)

Payment Events

Product Type

Product Id

Datetime

Transaction Id

Payment Info

Currency

Amount

Effective currency rate

Reconciled rate
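The two event types listed on this slide could be modeled as records along the following lines. The field names follow the slide, but every type annotation, the `event_time` name (for "Datetime"), and the sample values are assumptions; the slide does not show the actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Tax:
    currency: str
    amount: Decimal
    remittance_currency: str


@dataclass
class PlatformEvent:
    product_type: str
    product_id: str
    event_time: datetime                      # "Datetime" on the slide
    guest_info: Dict[str, str]
    host_info: Dict[str, str]
    funding_sources: List[str] = field(default_factory=list)
    itemized_pricing: Dict[str, Decimal] = field(default_factory=dict)
    taxes: List[Tax] = field(default_factory=list)


@dataclass
class PaymentEvent:
    product_type: str
    product_id: str
    event_time: datetime                      # "Datetime" on the slide
    transaction_id: str
    payment_info: Dict[str, str]
    currency: str
    amount: Decimal
    effective_currency_rate: Decimal
    reconciled_rate: Optional[Decimal] = None  # assumed unknown until reconciliation


payment = PaymentEvent(
    product_type="reservation",
    product_id="r-1",
    event_time=datetime(2015, 1, 1, 12, 0),
    transaction_id="txn-1",
    payment_info={"processor": "example"},
    currency="EUR",
    amount=Decimal("100"),
    effective_currency_rate=Decimal("1.20"),
)
```

Both event types share product type, product id, and datetime, which is what lets a platform event and its payment events be joined downstream.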

Page 15: Airbnb - Braavos - Whered My Money Go

Built in Spark / Scala

The finance system is an offline component, so latency is not a concern.

Developers can focus on the business logic rather than on scalability.

Performance (throughput): 7M events / min

--conf spark.default.parallelism=200 \

--num-executors 50 \

--executor-cores 8 \
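A quick back-of-the-envelope calculation puts these numbers in perspective. It assumes all 50 executors with 8 cores each are busy and that the quoted 7M events / min is sustained, neither of which the slide confirms.

```python
EVENTS_PER_MIN = 7_000_000
NUM_EXECUTORS = 50
EXECUTOR_CORES = 8
DEFAULT_PARALLELISM = 200

events_per_sec = EVENTS_PER_MIN / 60                     # about 116,667 events per second
total_cores = NUM_EXECUTORS * EXECUTOR_CORES             # 400 concurrent task slots
events_per_core_per_sec = events_per_sec / total_cores   # about 292 events per core per second
partitions_per_core = DEFAULT_PARALLELISM / total_cores  # 0.5
```

With 200 default partitions and 400 task slots, any stage that falls back to `spark.default.parallelism` can keep at most half of the cores busy at once.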

Page 16: Airbnb - Braavos - Whered My Money Go

Reports

Guest Receivable

SELECT sum(IF(Operation = 'Debit', amount, -amount))
FROM subledger_entries
WHERE account = 'ReceivableAccount'
  AND meta['Guest'] IS NOT NULL
  AND event_date < '2015-01-01';

Future Host Payout

SELECT sum(IF(Operation = 'Credit', amount, -amount))
FROM subledger_entries
WHERE account = 'PayableAccount'
  AND meta['Host'] IS NOT NULL
  AND event_date < '2015-01-01';

Deferred Revenue

SELECT sum(IF(Operation = 'Credit', amount, -amount))
FROM subledger_entries
WHERE account = 'DeferredRevenueAccount'
  AND event_date < '2015-01-01';
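The signed-sum pattern behind these reports can be tried locally with SQLite. This is only a sketch: SQLite has no `IF(...)` function or map-typed `meta` column, so the example substitutes `CASE WHEN` and a plain `guest` column, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subledger_entries "
    "(account TEXT, operation TEXT, amount INTEGER, guest TEXT, event_date TEXT)"
)
rows = [
    ("ReceivableAccount", "Debit", 100, "guest-1", "2014-12-01"),
    ("ReceivableAccount", "Credit", 100, "guest-1", "2014-12-02"),  # paid in full
    ("ReceivableAccount", "Debit", 250, "guest-2", "2014-12-20"),   # still owed
    ("ReceivableAccount", "Debit", 80, "guest-3", "2015-01-05"),    # after the cutoff
]
conn.executemany("INSERT INTO subledger_entries VALUES (?, ?, ?, ?, ?)", rows)

# Debits add to the receivable, credits reduce it; entries on or after the cutoff are ignored.
guest_receivable = conn.execute(
    """
    SELECT SUM(CASE WHEN operation = 'Debit' THEN amount ELSE -amount END)
    FROM subledger_entries
    WHERE account = 'ReceivableAccount'
      AND guest IS NOT NULL
      AND event_date < '2015-01-01'
    """
).fetchone()[0]
```

Only guest-2's unpaid 250 remains outstanding at the cutoff date.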

Page 17: Airbnb - Braavos - Whered My Money Go

Migration Process

1. Generate all platform events and payment events from the existing database account audit records.

2. Build reports on Braavos that match the existing reports.

3. Change the upstream components to generate real events, and compare the results with the existing ones.

4. Switch to the real upstream events.
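The comparison between new and existing reports in steps 2 and 3 could be automated along these lines. The `diff_reports` helper, the report keys, and the figures are all hypothetical; the talk does not describe how the comparison was actually done.

```python
def diff_reports(legacy, braavos, tolerance=0):
    """Return {key: (legacy_value, braavos_value)} for every key that disagrees."""
    mismatches = {}
    for key in sorted(set(legacy) | set(braavos)):
        old, new = legacy.get(key), braavos.get(key)
        # A key missing on either side counts as a mismatch.
        if old is None or new is None or abs(old - new) > tolerance:
            mismatches[key] = (old, new)
    return mismatches


legacy = {"guest_receivable": 250, "host_payable": 900, "deferred_revenue": 100}
braavos = {"guest_receivable": 250, "host_payable": 905, "deferred_revenue": 100}
mismatches = diff_reports(legacy, braavos)
```

An empty result over a full reporting period is a reasonable gate before switching to the real upstream events.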

Page 18: Airbnb - Braavos - Whered My Money Go

Intercompany report

Airbnb transactions involve four entities (Airbnb Inc., Airbnb Payment Inc., Airbnb UK, and Airbnb Ireland), and the number is increasing.

The guest and the host may belong to different entities, and their payment entities may differ as well.

We need to report intercompany money movement across entities.
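One way to picture an intercompany report is to net the money moved between each pair of entities. The sketch below uses placeholder entity names and amounts, and the `intercompany_movements` helper is hypothetical; it is not the report Braavos produces.

```python
from collections import defaultdict


def intercompany_movements(transfers):
    """Net amount moved between each pair of distinct entities.

    `transfers` holds (source_entity, destination_entity, amount) tuples.
    The result maps a sorted pair (a, b) to the net amount flowing from a to b;
    a negative value means the net flow runs from b to a.
    """
    net = defaultdict(int)
    for src, dst, amount in transfers:
        if src == dst:
            continue  # same entity on both sides: not an intercompany movement
        a, b = sorted((src, dst))
        net[(a, b)] += amount if (src, dst) == (a, b) else -amount
    return dict(net)


transfers = [
    ("A", "B", 150),
    ("B", "A", 40),
    ("C", "C", 75),   # ignored: guest and host sit in the same entity
    ("A", "C", 20),
]
net = intercompany_movements(transfers)
```

Storing each pair in sorted order lets transfers in opposite directions offset each other.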

Page 19: Airbnb - Braavos - Whered My Money Go

[Diagram: intercompany money movement among Entity A, Entity B, Entity C, and Entity D, with inbound and outbound sides. Labeled entries: Debit $150, Credit $200, Credit $120, Debit $170.]

Page 20: Airbnb - Braavos - Whered My Money Go

Future Work

• Cash Reconciliation: Tie out internal data with processor data

• Automate Treasury Rebalancing: Robo-trade currency and hedge against market fluctuation

• Automate Everything: Build out financial back-office tools to give better insight into our business

• Improve Monitoring/Alerting: We should catch data issues in minutes, not days

• Stream-Based Processing: Consume events in realtime from our production apps

Page 21: Airbnb - Braavos - Whered My Money Go

Questions?