AGILE DATA PROCESSING PIPELINES
Ken Collier, PhD
Director, Agile Analytics
@theagilist #thoughtworks
Conventional Architectures
Pull-based Batch Loads
Enterprise Data Models
Complex ETL Logic
Poorly Suited to Non-Relational Data
Emergent design is difficult
DESIGN PRINCIPLES
Enable cheap/easy data ingestion
Enable inexpensive scaling
Enable emergent design
Enable easy recreation of information
Drive logic closer to the application
Enable near real time presentation
Support polyglot persistence
DATA CORE
Raw factual data
Historized events
Retain business keys
Data lineage
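One way to picture the data core is as an append-only log of raw events. The sketch below is illustrative only and is not taken from the talk: the `RawEvent` fields, the `DataCore` class, and the sample keys are all hypothetical names chosen to mirror the four bullets (raw data, historized events, business keys, lineage).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class RawEvent:
    """One immutable fact, stored as received (hypothetical schema)."""
    business_key: str          # key from the source system, kept unchanged
    source: str                # lineage: which system produced the event
    payload: dict[str, Any]    # raw, untransformed data
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class DataCore:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[RawEvent] = []

    def append(self, event: RawEvent) -> None:
        self._events.append(event)

    def history(self, business_key: str) -> list[RawEvent]:
        """Return every recorded event for one business key, oldest first."""
        return [e for e in self._events if e.business_key == business_key]


core = DataCore()
core.append(RawEvent("CUST-42", "crm", {"status": "prospect"}))
core.append(RawEvent("CUST-42", "billing", {"status": "customer"}))
history = core.history("CUST-42")
```

Because nothing is overwritten, the earlier `prospect` state stays available next to the later `customer` state, and each event still names the system it came from.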
DATA INGESTION
Event driven
Message queue
Trickle feed
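As a rough illustration of event-driven, trickle-feed ingestion, the sketch below uses Python's standard-library `queue.Queue` as a stand-in for a real message broker. The event shape, the `consume` function, and the `_STOP` sentinel are assumptions made for this example, not part of the original deck.

```python
import queue
import threading

events: queue.Queue = queue.Queue()
ingested: list[dict] = []
_STOP = object()   # sentinel that tells the consumer to shut down


def consume() -> None:
    """Append each event to the store as soon as it arrives."""
    while True:
        item = events.get()
        if item is _STOP:
            break
        ingested.append(item)


worker = threading.Thread(target=consume)
worker.start()

# Producers publish events one at a time instead of waiting for a batch window.
for order_id in ("A-1", "A-2", "A-3"):
    events.put({"order_id": order_id, "type": "order_placed"})

events.put(_STOP)
worker.join()
```

The producer pushes events as they occur and the consumer reacts to each one, which is the opposite of the pull-based batch loads listed under Conventional Architectures.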
INFORMATION PUBLISHING
Topical queues
MDM concerns
Data governance
Post processing
INFORMATION TIER
Purpose built data subsets
Transformation
Post processing
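A purpose-built subset can be thought of as a pure function of the raw events, which is what makes it cheap to throw away and recreate. The following is a minimal sketch under that assumption; the event fields and the `build_revenue_view` function are hypothetical and only serve to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical raw events as they might sit in the data core.
raw_events = [
    {"business_key": "CUST-1", "type": "order_placed", "amount": 40.0},
    {"business_key": "CUST-2", "type": "order_placed", "amount": 15.0},
    {"business_key": "CUST-1", "type": "order_refunded", "amount": 10.0},
    {"business_key": "CUST-1", "type": "order_placed", "amount": 5.0},
]


def build_revenue_view(events):
    """Derive net revenue per customer from the raw events."""
    totals = defaultdict(float)
    for event in events:
        if event["type"] == "order_placed":
            totals[event["business_key"]] += event["amount"]
        elif event["type"] == "order_refunded":
            totals[event["business_key"]] -= event["amount"]
    return dict(totals)


revenue_view = build_revenue_view(raw_events)
```

If the business definition of revenue changes, only `build_revenue_view` changes and the view is rebuilt from the same raw events, so the transformation logic sits close to the application that consumes it.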
PRESENTATION TIER
Business value applications
Data services
Ad hoc querying
Write back?
Transformation Logic
Data Post Processing
Near Real Time Feed
Emergent Design & Agile Delivery
Apache Kafka
Apache Storm
For questions or suggestions:
Ken Collier [email protected]
Follow @theagilist @thoughtworks
THANK YOU