Functional architectural patterns

Lars Albertsson


Who’s talking?

Swedish Institute of Comp. Sc. (test tools)
Sun Microsystems (very large machines)
Google (Hangouts, productivity)
Recorded Future (NLP startup)
Cinnober Financial Tech. (trading systems)
Spotify (data processing & modelling)
Schibsted (data processing & modelling)

Why functional?

Verbs

... has made ... expanding ...

... flourishes ... merged ... has been unable to escape lingering ... built ...

... are ... placed ... say ... are ... to explode ...

... are considering ... to reopen ... to recall ...

Or object-oriented?

Nouns, pronouns

... bankruptcy ... government bailout ... automaker Chrysler ... comeback ... sales ... Jeep sport utility vehicles.

... Chrysler ... part ... Fiat Chrysler Automobiles, it ... concerns ... the safety ... Jeeps ...

... Jeeps ... gas tanks ... regulators ... safety advocates ... rear-end crash.

... regulators ... an investigation ... those Jeeps ... Fiat Chrysler’s agreement ... models.


Functional benefits? My version.

Matches a few problems
  Data processing

Matches a few computer properties
  Consistency through immutability
  Deterministic - replay for resilience

Local vs distributed properties

Local
  Hardware provides strong consistency
  Faults -> death

Distributed
  Eventual consistency
  Faults must be survived

Architectural functional patterns

Personal anti-pattern experiences

Strive to look for
  Immutability
  Reexecution

MapReduce

Discovered pattern, not invention
Well known, enough said
Succeeded by Spark RDD paradigm

Data flows

[Diagram: raw datasets and the datasets derived from them. Labels: Users, Pageviews, Sales, Sales reports, Views with demographics, Sales with demographics, Conversion analytics. Legend: Raw, Derived.]

Dataset artifacts, typically files with date parameter.

Anti-pattern - isolated batch jobs

Get data (more on that later)
Cron an ETL batch job (function)

Output solidifies. Mostly.
Steps in isolation - often different teams

What to do on ETL code changes?

[Diagram: Sales with demographics, Views with demographics]

Pattern: data pipeline

End-to-end sequences/DAG of jobs
  Not only exist, but treated end-to-end

Input is raw, original data
  Separate raw data from generated

[Diagram: Users, Pageviews, Views with demographics, Sales with demographics, Conversion analytics]
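The pipeline pattern above can be sketched as a small DAG of pure functions over dated datasets. This is only an illustration under stated assumptions: the dataset names follow the diagram, while the record fields (`user_id`, `age_group`, `url`), the function names and the in-memory `RAW` store are hypothetical, and the "conversion analytics" job is a stand-in that merely counts views per age group.

```python
from datetime import date

# Raw, original datasets keyed by date. Jobs read them but never overwrite them.
RAW = {
    "users": {date(2015, 6, 1): [{"user_id": 1, "age_group": "18-25"},
                                 {"user_id": 2, "age_group": "26-35"}]},
    "pageviews": {date(2015, 6, 1): [{"user_id": 1, "url": "/a"},
                                     {"user_id": 2, "url": "/b"},
                                     {"user_id": 1, "url": "/b"}]},
}

def views_with_demographics(users, pageviews):
    """Pure job: join each pageview with its user's demographics."""
    by_id = {u["user_id"]: u for u in users}
    return [{**pv, "age_group": by_id[pv["user_id"]]["age_group"]}
            for pv in pageviews]

def conversion_analytics(views):
    """Pure job (stand-in): count views per age group."""
    counts = {}
    for v in views:
        counts[v["age_group"]] = counts.get(v["age_group"], 0) + 1
    return counts

def run_pipeline(day):
    """Rebuild every derived dataset for one day from raw input only."""
    views = views_with_demographics(RAW["users"][day], RAW["pageviews"][day])
    return {"views_with_demographics": views,
            "conversion_analytics": conversion_analytics(views)}
```

Because every derived dataset is a function of raw data and a date, rerunning `run_pipeline` after a code change regenerates all downstream output instead of leaving solidified results behind.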

Lambda architecture, part 1

Save all collected data without preprocessing
  But timestamp on generation, register, arrival
Rerun everything downstream on code change
  Human fault tolerance

In conflict with privacy management?

Pipeline workflow orchestration

Ideally: Good old make + cluster + IDE + xUnit
  Test end-to-end
  Rebuild on upstream changes (but not all)

State of practice: Luigi, Pinball, Azkaban
  Don’t take you all the way :-(
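The "good old make" idea of rebuilding on upstream changes can be sketched as a staleness check over the dataset DAG. This is a minimal sketch, not the API of Luigi, Pinball or Azkaban: `stale_targets`, the dataset names and the logical build times are all assumptions made for illustration.

```python
def stale_targets(deps, mtimes):
    """Return derived datasets that need rebuilding, upstream first.

    deps:   dataset name -> list of upstream dataset names
    mtimes: dataset name -> last build time (missing = never built)
    """
    stale = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        upstream = deps.get(name, [])
        for up in upstream:
            visit(up)
        if not upstream:
            return  # raw input: collected, never rebuilt
        newest_up = max(mtimes.get(u, 0) for u in upstream)
        if (name not in mtimes
                or mtimes[name] < newest_up
                or any(u in stale for u in upstream)):
            stale.append(name)

    for target in deps:
        visit(target)
    return stale
```

A dataset is rebuilt when it is missing, older than any input, or downstream of something that is itself being rebuilt.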

Lambda architecture, part 2

Parallel batch and real-time pipelines
Batch more accurate, overrides
Real-time for window of recent data
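How the two pipelines combine at query time can be sketched as a merge where batch results override real-time results. The per-hour count dictionaries, the `batch_horizon` parameter and the function name are assumptions for illustration only.

```python
def merged_view(batch_view, realtime_view, batch_horizon):
    """Combine per-hour counts from both pipelines.

    Batch results win for every hour up to and including batch_horizon;
    real-time results only fill in the hours after it.
    """
    merged = {hour: count for hour, count in realtime_view.items()
              if hour > batch_horizon}
    merged.update(batch_view)
    return merged
```

The real-time numbers for hours already covered by batch are simply discarded, so any inaccuracy in the streaming path is bounded by the window of recent data.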

Obtaining data

Log things. Conceptually stable, but collection is challenging at scale.

Have legacy code and master data in databases? Let us have a look.


Anti-pattern: direct dump

Database dimensioned for online traffic
Hadoop = herd of elephants

Load spike
  Height = #mapper nodes
  Area = #users

Direct dumps in the trenches

Company successful - #users increasing
More Sqoop mappers - higher DB load
Daily dump jobs went to 25h

Devops firewalled off Hadoop to recover

Anti-pattern: dump through API

SOA/microservice culture
DB protected by throttling

API not used to elephants
Query area is still large

Herd of elephants through gate - 1-2 weeks

Anti-pattern: slave dump

Protect live service by mirroring to a dump slave
No online service risk, good!
Why anti-pattern?

All dumps are non-deterministic

HDFS down? Dump later.
  State is gone - dump not accurate

Slave replication down?
  Dump not accurate

Anti-pattern: deterministic mirror

Replay commit log until full day/hour
Discovered through archaeology :-)

Not scalable, point of failure
Hourly dump took 45 minutes, increasing...

(Anti-)pattern: better dumping

Netflix Aegisthus
  Snapshot Cassandra (fast, atomic, reliable)
  Transfer SSTables to HDFS
  Replicate compaction in MapReduce

Other DBs? Depends on atomic snapshot.

All dumps are anti-patterns?

Typical use: Join activity events with user info
  Event time != dump time

Aggregation discards information
  Which users enabled X, tried, and disabled?
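The "enabled X, tried, and disabled" question shows what a dump loses. The sketch below answers it from an event list and contrasts that with the state a dump would capture. The event schema `(timestamp, user, action)`, the action names and both functions are hypothetical.

```python
# (timestamp, user, action): hypothetical event schema
events = [
    (1, "ann", "enable_x"),
    (2, "bob", "enable_x"),
    (3, "ann", "use_x"),
    (4, "ann", "disable_x"),
    (5, "bob", "use_x"),
    (6, "cat", "enable_x"),
    (7, "cat", "disable_x"),
]

STEPS = ["enable_x", "use_x", "disable_x"]

def enabled_tried_disabled(events):
    """Users who enabled X, then used it, then disabled it, in that order."""
    progress = {}  # user -> index of the next expected step
    for _, user, action in sorted(events):
        i = progress.get(user, 0)
        if i < len(STEPS) and action == STEPS[i]:
            progress[user] = i + 1
    return {u for u, i in progress.items() if i == len(STEPS)}

def dump_snapshot(events):
    """What a dump taken after the last event holds: current flag per user."""
    state = {}
    for _, user, action in sorted(events):
        if action == "enable_x":
            state[user] = True
        elif action == "disable_x":
            state[user] = False
    return state
```

In the snapshot, `ann` (who tried the feature) and `cat` (who never did) look identical, so the question cannot be answered from the dump alone.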

Pattern: Event source

All facts are events. Immutable, timestamped
Event stream is source of truth
No explicit “current state”

The functional data architecture?
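With no explicit current state, state becomes a fold over the event stream. The sketch below shows this with immutable event records and a pure step function. The account/amount domain and the names `Event`, `apply_event` and `state_from` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)  # frozen: events cannot be modified after creation
class Event:
    timestamp: int
    account: str
    amount: int  # positive = deposit, negative = withdrawal

def apply_event(balances, event):
    """Pure step: returns a new state and leaves the old one untouched."""
    new = dict(balances)
    new[event.account] = new.get(event.account, 0) + event.amount
    return new

def state_from(events):
    """Derive state by folding over all events in timestamp order."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return reduce(apply_event, ordered, {})
```

Since `state_from` is deterministic, the same events always yield the same state, which is what makes replay after a fault or a code fix safe.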

Event source incarnated: unified log

Pour events into pub/sub bus, with long history.
  Kafka de-facto standard.

Tap from bus to HDFS/S3 in time buckets.
  Camus/Secor

Stream processing pipelines to dest topics
  Replay on code changes
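Tapping the bus into time buckets can be sketched as grouping events by the hour of their own timestamp. This does not reproduce the actual path layout or configuration of Camus or Secor: the `topic/YYYY/MM/DD/HH` layout, the `ts` field and both function names are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

def bucket_path(topic, epoch_seconds):
    """Hourly bucket for an event, derived from its own timestamp (UTC)."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{topic}/{t:%Y/%m/%d/%H}"

def tap(topic, events):
    """Group events into hourly buckets, as a stand-in for files on HDFS/S3."""
    buckets = defaultdict(list)
    for event in events:
        buckets[bucket_path(topic, event["ts"])].append(event)
    return dict(buckets)
```

Bucketing by event timestamp rather than by arrival time keeps the output deterministic when the tap is rerun over the same stretch of the log.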

Unified log, practical considerations

Long history necessary
  Must have time to fix stream process bugs
  Use 3+ months and use stream as temp DB
Unified log also useful for meta and control

Tweak Kafka for low latency

Event source + views

View = snapshot of aggregated state @ time
  For ETL, choice of hourly/daily aggregates or exact views

[Diagram: Logs, View, View]
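The choice between exact views and hourly aggregates can be sketched over the same event list. The `(timestamp, key)` event shape and the function names are hypothetical, and timestamps are assumed to be seconds.

```python
def view_at(events, t):
    """Exact view: aggregate every event with timestamp <= t."""
    counts = {}
    for ts, key in events:
        if ts <= t:
            counts[key] = counts.get(key, 0) + 1
    return counts

def hourly_aggregates(events):
    """Coarser alternative: counts per (hour, key), where hour = ts // 3600."""
    agg = {}
    for ts, key in events:
        bucket = (ts // 3600, key)
        agg[bucket] = agg.get(bucket, 0) + 1
    return agg
```

An exact view can be taken at any instant but needs the events themselves; hourly aggregates are cheaper to keep but can only answer questions at hour boundaries.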

Event source + database

Business logic may demand “current state”
Event stream is truth, keep DB in sync

Event source, synced database

A. Service interface generates events and DB transactions

B. Generate stream from DB commit log. Postgres, MySQL -> Kafka

C. Build DB with stream processing
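Option C can be sketched by replaying an event stream into a fresh database. The example uses Python's built-in `sqlite3` with an in-memory database. The event kinds, the `users` table and `build_db` are assumptions for illustration, and a plain list stands in for the stream.

```python
import sqlite3

def build_db(events):
    """Replay events in order into a fresh in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    for kind, user_id, email in events:
        if kind == "user_created":
            db.execute("INSERT INTO users VALUES (?, ?)", (user_id, email))
        elif kind == "email_changed":
            db.execute("UPDATE users SET email = ? WHERE id = ?",
                       (email, user_id))
        elif kind == "user_deleted":
            db.execute("DELETE FROM users WHERE id = ?", (user_id,))
    db.commit()
    return db
```

The database is disposable here: it can be dropped and rebuilt from the stream at any time, for example after a schema or logic change.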

Deployment & orchestration

System = many machines
Desired system state = code + config
Actual state = Orchestrator(current, desired)

Anti-pattern: stateful orchestration

Orchestrator = Puppet|Chef|Ansible {
  current.changeSomeProperties(desired)
  return current
  // current.otherProperties unchanged
}

Stateful orchestration in the trench

Desired = { case roleA: install(x,y) case roleB: install(z) }
Current = x installed on roleB. Old x. Zombie woke up when B load decreased.
Puppet+apt = No simple way to remove undesired state

Pattern: artifacts from source

Orchestrator = Docker|Packer {
  delete current
  return Image(desired)
}

No state leak from existing state. Sort of.
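The difference between the two orchestrator styles can be sketched with machine state modelled as a set of installed packages. This is a toy model, not how Puppet, Chef, Ansible, Docker or Packer are implemented: the function names and the package names are assumptions.

```python
def stateful_orchestrate(current, desired):
    """Stateful style: apply desired on top of whatever is already there."""
    result = set(current)
    result |= set(desired)  # installs what is missing
    return result           # anything not mentioned in desired stays behind

def image_orchestrate(current, desired):
    """Artifact style: discard current and build from desired only."""
    return set(desired)
```

With `current = {"x-old"}` and `desired = {"z"}`, the stateful version keeps the leftover `x-old` while the image version does not, which mirrors the zombie package from the trench story.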

Deterministic, predictable?

Image building leaky on purpose
  E.g. “apt-get update && apt-get install”
  Imports external state

Ephemeral databases preserve state
  Ability to rebuild from unified log is valuable

Jay Kreps, Confluent: Unified log
Martin Kleppmann: Unified log, Bottled Water
Nathan Marz: Lambda
Sander Mak @ Jfokus: Event sourcing
Datomic

Questions?

More?
