AGILE DATA PROCESSING PIPELINES Ken Collier, PhD Director, Agile Analytics @theagilist #thoughtworks

Agile Data Processing Pipelines - Software Engineering Radio


Conventional Architectures

Pull-based Batch Loads

Enterprise Data Models

Complex ETL Logic

Poorly Suited to Non-Relational Data

Emergent design is difficult


DESIGN PRINCIPLES

Enable cheap/easy data ingestion

Enable inexpensive scaling

Enable emergent design

Enable easy recreation of information

Drive logic closer to the application

Enable near real time presentation

Support polyglot persistence


DATA CORE

RAW FACTUAL DATA

HISTORIZED EVENTS

RETAIN BUSINESS KEYS

DATA LINEAGE
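
The data core ideas on this slide can be illustrated with a small append-only store. This is only a sketch under stated assumptions, not the speaker's implementation: the names Event, DataCore, append, history, and current are hypothetical, and an in-memory list stands in for whatever storage a real data core would use. Each record keeps the source system's business key, the raw payload, and a lineage field naming where it came from; nothing is ever updated in place.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One historized fact (hypothetical shape, for illustration only)."""
    business_key: str          # key exactly as the source system knows it
    payload: Dict[str, Any]    # raw, untransformed attributes
    source: str                # lineage: which system produced the record
    recorded_at: datetime      # when the core received it


class DataCore:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, business_key: str, payload: Dict[str, Any], source: str,
               recorded_at: Optional[datetime] = None) -> Event:
        event = Event(
            business_key=business_key,
            payload=dict(payload),  # copy so later caller changes do not leak in
            source=source,
            recorded_at=recorded_at or datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def history(self, business_key: str) -> List[Event]:
        """Every event recorded for a key, in arrival order."""
        return [e for e in self._events if e.business_key == business_key]

    def current(self, business_key: str) -> Dict[str, Any]:
        """Latest known state, recreated by folding the history."""
        state: Dict[str, Any] = {}
        for event in self.history(business_key):
            state.update(event.payload)
        return state
```

Because the current state is derived from the history rather than stored, it can be rebuilt at any time, which is one way to read the "easy recreation of information" principle.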



DATA INGESTION

EVENT DRIVEN

MESSAGE QUEUE

TRICKLE FEED
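
As a rough illustration of event-driven ingestion through a message queue, the sketch below pushes each record onto a queue as soon as it exists and lets a consumer drain it continuously, in contrast to a pull-based batch load. It uses only Python's standard queue and threading modules as a stand-in for a real message broker; the function names produce, consume, and trickle_feed are hypothetical.

```python
import queue
import threading
from typing import Any, Iterable, List

_DONE = None  # sentinel telling the consumer that the feed has ended


def produce(records: Iterable[Any], q: "queue.Queue") -> None:
    """Source side: push each record the moment it is available."""
    for record in records:
        q.put(record)
    q.put(_DONE)


def consume(q: "queue.Queue", sink: List[Any]) -> None:
    """Ingestion side: react to each message as it arrives."""
    while True:
        record = q.get()
        if record is _DONE:
            break
        sink.append(record)


def trickle_feed(records: Iterable[Any]) -> List[Any]:
    """Run producer and consumer concurrently; return what was ingested."""
    q: "queue.Queue" = queue.Queue()
    sink: List[Any] = []
    consumer = threading.Thread(target=consume, args=(q, sink))
    consumer.start()
    produce(records, q)
    consumer.join()
    return sink
```

The source never waits for a scheduled extract window: it only needs to know how to put a message on the queue, which keeps ingestion cheap on the producing side.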


INFORMATION PUBLISHING

TOPICAL QUEUES

MDM CONCERNS

DATA GOVERNANCE

POST PROCESSING
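
One possible reading of "topical queues" is topic-based publish/subscribe: each consumer subscribes to the topics it cares about and receives its own copy of every message published there. The sketch below assumes that reading; TopicPublisher, subscribe, and publish are hypothetical names, and in-memory deques stand in for real broker queues.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List


class TopicPublisher:
    """Minimal in-memory topic router (illustrative only)."""

    def __init__(self) -> None:
        # topic name -> one queue per subscriber
        self._queues: Dict[str, List[Deque[Any]]] = defaultdict(list)

    def subscribe(self, topic: str) -> Deque[Any]:
        """Register a new subscriber and return its private queue."""
        q: Deque[Any] = deque()
        self._queues[topic].append(q)
        return q

    def publish(self, topic: str, message: Any) -> int:
        """Copy the message to every subscriber of the topic.

        Returns the number of queues the message was delivered to.
        """
        delivered = 0
        for q in self._queues.get(topic, []):
            q.append(message)
            delivered += 1
        return delivered
```

Subscribers to one topic never see traffic from another, so a new consumer can be added without touching the existing ones.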


INFORMATION TIER

PURPOSE BUILT DATA SUBSETS

TRANSFORMATION

POST PROCESSING
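
A purpose-built subset can be thought of as a pure function of the raw events: filter to what one application needs, transform, and aggregate. The sketch below assumes a hypothetical event shape (dictionaries with "type", "region", and "amount" keys) and a hypothetical function name, build_sales_by_region; neither comes from the slides.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def build_sales_by_region(events: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Derive one application-specific view from the raw event stream.

    Only "sale" events are relevant to this view; everything else is ignored.
    """
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        if event["type"] != "sale":
            continue
        totals[event["region"]] += event["amount"]
    return dict(totals)
```

Since the view holds no state of its own, dropping it and replaying the raw events reproduces it exactly, and a differently shaped view for another application is just another function over the same events.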


PRESENTATION TIER

BUSINESS VALUE APPLICATIONS

DATA SERVICES

AD HOC QUERYING

WRITE BACK?


Transformation Logic

Data Post Processing

Near Real Time Feed

Emergent Design & Agile Delivery


Apache Kafka

Apache Storm


For questions or suggestions:

Ken Collier [email protected]

Follow @theagilist @thoughtworks

THANK YOU