Fabric: Real-time stream processing framework
Shashank Gautam, Sathish Kumar KS
What is Fabric?
Fabric is a scalable, practical and reliable real-time stream processing framework designed for easy operability and extension.
Fabric is proven to work very well for:
● High velocity multi-destination event ingestion with guaranteed persistence.
● Rules/filter-based real-time triggers for advertising/broadcast
● Online fraud detection
● Real-time pattern matching
● Streaming analytics
The Problem
● Primary motivation
  ○ Streaming millions of messages per second
  ○ Connectivity to different sources - Kafka, MySQL, etc.
  ○ Writing to different targets - DB, queue, API - or publishing to other high-level applications
  ○ Near real-time processing
● Desirable properties of the framework
  ○ High throughput - support for batching of events
  ○ Data sanity - avoiding datasets that make no sense
  ○ Making data available for other applications to consume
  ○ Scalability and data reliability
  ○ Easy development and deployment
  ○ Resource efficiency
Fabric Core Components
Fabric Compute and Executor
Fabric Compute Framework
● Computation pipeline setup
● Batch event processing
● Event passing among components
● Acknowledgements
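One way to picture the acknowledgement step: a source keeps every emitted batch until downstream processing confirms it, so unconfirmed batches can be replayed. The sketch below is plain Java and only illustrates that idea. `AckTracker` and its methods are hypothetical names, not part of Fabric's API, and Fabric's real bookkeeping may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: remembers emitted batches until they are acknowledged. */
final class AckTracker {
    // Batches that were emitted but not yet acknowledged, keyed by transaction id.
    private final Map<Long, List<String>> inFlight = new LinkedHashMap<>();
    private long nextTransactionId = 0;

    /** Registers a batch as in flight and returns the transaction id assigned to it. */
    long emit(List<String> batch) {
        long id = nextTransactionId++;
        inFlight.put(id, new ArrayList<>(batch));
        return id;
    }

    /** Marks a batch as fully processed; returns false if the id was unknown or already acked. */
    boolean ack(long transactionId) {
        return inFlight.remove(transactionId) != null;
    }

    /** Batches that would need to be replayed if the pipeline failed right now. */
    List<List<String>> pendingForReplay() {
        return new ArrayList<>(inFlight.values());
    }
}
```

Anything still returned by `pendingForReplay()` after a failure is a candidate for re-emission, which gives at-least-once behaviour in this simplified model.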
Fabric-executor
Responsible for:
● Launching, monitoring and managing deployed computations
● 1:1 relation between a computation instance and a Fabric executor process
● The Fabric executor is a single JVM process within a Docker container
Fabric Terminologies
● Compute Framework
  ○ Real-time event processing framework
  ○ Core event orchestration
  ○ Performs user-defined operations
● EventSet
  ○ Collection (of configurable size) of events
  ○ Basic transmission unit within the computation
● Computation/Topology
  ○ Pipeline for data flow, created by the user from Fabric components
  ○ Components can be of two types: Source and Processor
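To make "collection of configurable size" concrete, here is a minimal plain-Java sketch that groups a list of events into batches of at most a given size. `EventSetBatcher` and `toEventSets` are hypothetical names used for illustration. Fabric's own `EventSet` class carries more than a list of events (for example a partition id), which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups events into batches of a fixed maximum size. */
final class EventSetBatcher {
    private EventSetBatcher() { }

    /** Splits events into consecutive batches; only the last batch may be smaller. */
    static <T> List<List<T>> toEventSets(List<T> events, int eventSetSize) {
        if (eventSetSize <= 0) {
            throw new IllegalArgumentException("eventSetSize must be positive");
        }
        List<List<T>> eventSets = new ArrayList<>();
        for (int start = 0; start < events.size(); start += eventSetSize) {
            int end = Math.min(start + eventSetSize, events.size());
            // Copy the slice so each batch is independent of the input list.
            eventSets.add(new ArrayList<>(events.subList(start, end)));
        }
        return eventSets;
    }
}
```

With five events and a size of two, this yields three batches of sizes 2, 2 and 1.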
● Source
  ○ Feeds event sets into the computation
  ○ Manages the QoS of the events ingested into the computation
● Processor
  ○ Performs computation on an incoming event set
  ○ Emits an outgoing event set
  ○ Types:
    ■ Streaming Processor: triggered whenever an event set is sent to the processor.
    ■ Scheduled Processor: triggered periodically, each time a fixed period of time elapses.
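The difference between the two trigger styles can be sketched in plain Java. Both classes below are hypothetical and do not use Fabric's `StreamingProcessor` or scheduled processor base classes. Buffering input between ticks is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical streaming-style processor: does its work each time an event set arrives. */
final class StreamingCounter {
    private int eventsSeen = 0;

    void onEventSet(List<String> eventSet) {
        eventsSeen += eventSet.size();
    }

    int eventsSeen() {
        return eventsSeen;
    }
}

/** Hypothetical scheduled-style processor: buffers input and does its work on each timer tick. */
final class ScheduledFlusher {
    private final List<String> buffer = new ArrayList<>();
    private final List<Integer> flushedSizes = new ArrayList<>();

    synchronized void onEventSet(List<String> eventSet) {
        buffer.addAll(eventSet);
    }

    /** Invoked once per period: records how many events were buffered, then clears the buffer. */
    synchronized void onTick() {
        flushedSizes.add(buffer.size());
        buffer.clear();
    }

    synchronized List<Integer> flushedSizes() {
        return new ArrayList<>(flushedSizes);
    }

    /** Wires onTick to a fixed-rate timer; the caller is responsible for shutting it down. */
    ScheduledExecutorService startTimer(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::onTick, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

The streaming variant's work is driven by data arrival, while the scheduled variant's work is driven by the clock, regardless of how many event sets arrived in the interval.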
Management And Deployment
Fabric Manager
● A Dropwizard web service that runs inside a Docker container
● Provides APIs to register components - sources and processors
● Provides APIs to perform CRUD on computations
● Management APIs to deploy, scale, get and delete computations
● The Application resource exposes APIs for deployment-related operations on computations
● Deployment environment: Marathon and Mesos
Sample Resources
● Components, e.g. POST /v1/components
  ○ Other APIs - get, search, register, etc.
● Computation, e.g. POST /v1/computations/{tenant}
  ○ Other APIs - get, search, update, deactivate, etc.
● Application, e.g. POST /v1/applications/{tenant}/{computation_name}
  ○ Other APIs - get, scale, suspend, etc.
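As a rough illustration of calling these resources from Java 11+, the sketch below builds (but does not send) POST requests for the first two paths using the standard `java.net.http` API. The class name, the base URL and the JSON bodies are assumptions. The slides do not show the request schema, so the body is passed in as an opaque string.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds requests against the resource paths listed above. */
final class FabricManagerRequests {
    private FabricManagerRequests() { }

    /** POST /v1/components with a caller-supplied JSON body (schema not shown in the slides). */
    static HttpRequest registerComponent(String baseUrl, String jsonBody) {
        return post(baseUrl + "/v1/components", jsonBody);
    }

    /** POST /v1/computations/{tenant} with a caller-supplied JSON body. */
    static HttpRequest createComputation(String baseUrl, String tenant, String jsonBody) {
        return post(baseUrl + "/v1/computations/" + tenant, jsonBody);
    }

    private static HttpRequest post(String url, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` once the real host, port and payload format are known.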
Fabric Sample
Fabric Components
Create Components using maven archetype
Maven archetype command -
mvn archetype:generate -DarchetypeGroupId=com.olacabs.fabric -DarchetypeArtifactId=fabric-processor-archetype -DarchetypeVersion=1.0.0-SNAPSHOT -DartifactId=<artifact_id_of_your_project> -DgroupId=<group_id_of_your_project> -DinteractiveMode=true
Example -
mvn archetype:generate -DarchetypeGroupId=com.olacabs.fabric -DarchetypeArtifactId=fabric-processor-archetype -DarchetypeVersion=1.0.0-SNAPSHOT -DartifactId=fabric-my-processor -DgroupId=com.olacabs.fabric -DinteractiveMode=true
What it does -
Creates the POM project for the processor with up-to-date versions of the compute jar and other related jars.
Creates boilerplate code, with examples, for scheduled and streaming processors. You can modify the example Java file to suit your needs.
Sample Fabric Source
/**
 * A sample source implementation which generates
 * random sentences.
 */
@Source(
    namespace = "global",
    name = "random-sentence-source",
    version = "0.1",
    description = "Sample source",
    cpu = 0.1,
    memory = 64,
    requiredProperties = {},
    optionalProperties = {"randomGeneratorSeed"})
public class RandomSentenceSource implements PipelineSource {

    Random random;
    String[] sentences = {
        "A quick brown fox jumped over the lazy dog",
        "Life is what happens to you when you are busy making other plans"
        // . . . . . .
    };

    @Override
    public void initialize(final String instanceName,
                           final Properties global,
                           final Properties local,
                           final ProcessingContext processingContext,
                           final ComponentMetadata componentMetadata) throws Exception {
        int seed = ComponentPropertyReader.readInteger(local, global, "randomGeneratorSeed", instanceName, componentMetadata, 42);
        random = new Random(seed);
    }

    @Override
    public RawEventBundle getNewEvents() {
        return RawEventBundle.builder()
            .events(getSentences(5).stream()
                .map(sentence -> Event.builder()
                    .id(random.nextInt())
                    .data(sentence.toLowerCase())
                    .build())
                .collect(Collectors.toCollection(ArrayList::new)))
            .meta(Collections.emptyMap())
            .partitionId(Integer.MAX_VALUE)
            .transactionId(Integer.MAX_VALUE)
            .build();
    }

    private List<String> getSentences(int n) {
        List<String> listOfSentences = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            listOfSentences.add(sentences[random.nextInt(sentences.length)]);
        }
        return listOfSentences;
    }
}
Sample Fabric Processor
/**
 * A sample processor implementation which
 * gets the data (sentences) and splits it on a delimiter.
 */
@Processor(
    namespace = "global",
    name = "splitter-processor",
    version = "0.1",
    cpu = 0.1,
    memory = 32,
    description = "A processor that splits sentences by a given delimiter",
    processorType = ProcessorType.EVENT_DRIVEN,
    requiredProperties = {},
    optionalProperties = {"delimiter"})
public class SplitterProcessor extends StreamingProcessor {
    private String delimiter;

    @Override
    public void initialize(final String instanceName,
                           final Properties global,
                           final Properties local,
                           final ComponentMetadata componentMetadata) throws InitializationException {
        delimiter = ComponentPropertyReader.readString(local, global, "delimiter", instanceName, componentMetadata, ",");
    }

    @Override
    protected EventSet consume(final ProcessingContext processingContext,
                               final EventSet eventSet) throws ProcessingException {
        List<Event> events = new ArrayList<>();
        eventSet.getEvents().stream()
            .forEach(event -> {
                String sentence = (String) event.getData();
                String[] words = sentence.split(delimiter);
                events.add(Event.builder()
                    .data(words)
                    .id(Integer.MAX_VALUE)
                    .properties(Collections.emptyMap())
                    .build());
            });
        return EventSet.eventFromEventBuilder()
            .partitionId(eventSet.getPartitionId())
            .events(events)
            .build();
    }

    @Override
    public void destroy() {
        // do some cleanup if necessary
    }
}
Sample Computation / Topology
A sample topology -
● Select a random sentence from an in-memory list
● Split the sentence based on a delimiter
● Count the words
● Print the counts on the console
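Stripped of the framework, the four steps above amount to the following data flow. This is a plain-Java sketch without any Fabric classes. `WordCountSketch` and its method names are made up for illustration, and the space delimiter is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Pattern;

/** Hypothetical stand-alone version of the sample topology's data flow. */
final class WordCountSketch {
    private static final String[] SENTENCES = {
        "A quick brown fox jumped over the lazy dog",
        "Life is what happens to you when you are busy making other plans"
    };

    private WordCountSketch() { }

    /** Step 1: pick a random sentence from the in-memory list. */
    static String randomSentence(Random random) {
        return SENTENCES[random.nextInt(SENTENCES.length)];
    }

    /** Step 2: split the lower-cased sentence on a literal delimiter. */
    static List<String> split(String sentence, String delimiter) {
        return Arrays.asList(sentence.toLowerCase().split(Pattern.quote(delimiter)));
    }

    /** Step 3: count occurrences of each word, keeping first-seen order. */
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Step 4: print the counts on the console. */
    public static void main(String[] args) {
        Random random = new Random(42);
        Map<String, Integer> counts = count(split(randomSentence(random), " "));
        counts.forEach((word, n) -> System.out.println(word + " = " + n));
    }
}
```

In the topology, each of these methods corresponds to one component, and the framework passes event sets between them instead of direct method calls.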
Sample Computation / Topology Spec
{
  "name": "word-count-print-topology",
  "sources": [
    {
      "id": "random-sentence-source",
      "meta": {
        // … meta for source
      },
      "properties": {
        // … properties for source
      }
    }
  ],
  "processors": [
    {
      "id": "splitter-processor",
      "meta": {
        // … meta for processor
      },
      "properties": {
        // … properties for processor
      }
    },
    {
      "id": "word-count-processor",
      "meta": {
        // … meta for processor
      },
      "properties": {
        // … properties for processor
      }
    },
    {
      "id": "console-printer-processor",
      "meta": {
        // … meta for processor
      },
      "properties": {
        // … properties for processor
      }
    }
  ],
  "connections": [
    { "fromType": "SOURCE", "from": "random-sentence-source", "to": "splitter-processor" },
    { "fromType": "PROCESSOR", "from": "splitter-processor", "to": "word-count-processor" },
    { "fromType": "PROCESSOR", "from": "word-count-processor", "to": "console-printer-processor" }
  ],
  "properties": {
    // … global properties
  }
}
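Every "from" and "to" in the connections list has to name an id declared under "sources" or "processors". A small pre-submission check along these lines can catch mismatched ids early. `TopologyConnectionCheck` is a hypothetical helper, and whether Fabric Manager performs the same validation is not stated in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical check that connection endpoints refer to declared component ids. */
final class TopologyConnectionCheck {
    private TopologyConnectionCheck() { }

    /**
     * Returns every endpoint used in a connection that is not a declared id.
     * Each connection is a two-element array: {from, to}.
     */
    static List<String> undeclaredEndpoints(Set<String> declaredIds, List<String[]> connections) {
        List<String> missing = new ArrayList<>();
        for (String[] connection : connections) {
            for (String endpoint : connection) {
                if (!declaredIds.contains(endpoint) && !missing.contains(endpoint)) {
                    missing.add(endpoint);
                }
            }
        }
        return missing;
    }
}
```

An empty result means every connection endpoint is declared. Any returned name points at a typo or a missing component.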
Steps for Action
Fabric Implementation at Ola
Fabric At Ola
● Artifact Registration View
● Topology Creation View
● Created Topology View
● One click deployment
● Marathon App
Fabric Numbers
Fabric At Ola Stats
Ola currently receives ~2.5 million events per second from its end users (driver and customer apps) as well as from internally generated events. Multiple real-time use cases stem from these events, including:
● Fraud detection and prevention
● Just-in-time notifications
● Security alerts
● Real-time reporting
● Generating user-specific offers
Fabric has been in production at Ola for 10 months and powers these applications, in addition to acting as a raw event ingestion and pub-sub system.
Key Stats -
● Event Streams Handled : 375+
● No of topologies live : 160+
● Ingestion rate : ~2.5 million per second on 10 nodes
● Node config : c4.8xlarge machines
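For a sense of scale, the ingestion figure can be turned into per-node and per-day numbers. This is back-of-the-envelope arithmetic only. It assumes the load is spread evenly over the 10 nodes and that the quoted rate is sustained all day, neither of which the slides state. `ThroughputEstimate` is a hypothetical helper.

```java
/** Hypothetical helper for rough throughput arithmetic. */
final class ThroughputEstimate {
    private static final long SECONDS_PER_DAY = 86_400L;

    private ThroughputEstimate() { }

    /** Events per second per node, assuming an even spread across nodes. */
    static long perNodePerSecond(long eventsPerSecond, int nodes) {
        return eventsPerSecond / nodes;
    }

    /** Events per day, assuming the per-second rate is sustained for 24 hours. */
    static long perDay(long eventsPerSecond) {
        return eventsPerSecond * SECONDS_PER_DAY;
    }
}
```

Under those assumptions, 2.5 million events per second on 10 nodes is about 250,000 events per second per node, or roughly 216 billion events per day overall.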
Fabric Summary Points
1. Developed in Java.
2. Highly scalable and guaranteed availability
3. Reliable - Framework level guarantees against message loss, support for replay, multiple sources and complex tuple trees
4. Event batching is supported at the core level.
5. Source level event partitioning used as unit for scalability.
6. Uses capabilities provided by docker to ensure strong application
7. On the fly topology creation and deployment by dynamically assembling topologies using components directly from artifactory
8. Inbuilt support for custom metrics and custom code level healthchecks to catch application failures right when they happen
9. Easy development and deployment
And many more...
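Point 5 above treats source partitions as the unit of scalability. One simple way to picture that is a round-robin spread of partitions over running instances, so that adding instances reduces the partitions each one handles. The sketch below is an assumption made for illustration. The slides do not describe how Fabric actually maps partitions to instances, and `PartitionAssignment` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical round-robin mapping of source partitions to computation instances. */
final class PartitionAssignment {
    private PartitionAssignment() { }

    /** Spreads partition ids 0..partitions-1 over instance ids 0..instances-1. */
    static Map<Integer, List<Integer>> assign(int partitions, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        Map<Integer, List<Integer>> byInstance = new TreeMap<>();
        for (int instance = 0; instance < instances; instance++) {
            byInstance.put(instance, new ArrayList<>());
        }
        for (int partition = 0; partition < partitions; partition++) {
            byInstance.get(partition % instances).add(partition);
        }
        return byInstance;
    }
}
```

With six partitions, two instances each take three partitions. Scaling to three instances leaves two partitions per instance.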
Links
Fabric was recently open-sourced on GitHub.
● GitHub link: https://github.com/olacabs/fabric
● Documentation: https://github.com/olacabs/fabric/blob/develop/README.md
Please Contribute…!
Thank You!
Shashank Gautam, Sathish Kumar KS