Streaming Transformations Using Oracle Data Integration Michael Rainey | BIWA Summit 2017

Introduction

• Michael Rainey - Technical Advisor
• Spreading the good word about Gluent products with the world
• Oracle Data Integration expertise
• Oracle ACE Director
• mRainey.co

we liberate enterprise data

What is “Streaming”

• The processing and analysis of structured or “unstructured” data in real-time
• Why Streaming?
  • When speed (velocity) of data is key
  • Streaming data is processed in “time windows”, in memory, across a cluster of servers
• Examples:
  • Calculating a retail buying opportunity
  • Real-time cost calculations
  • IoT data analysis
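The “time windows” idea can be sketched in plain Python. This is a minimal, single-process illustration only, not how any particular streaming engine implements windowing. The event tuples, the `tumbling_window_totals` helper, and the 10-second window size are all assumptions made up for the example.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_seconds):
    """Group (timestamp, key, value) events into fixed, non-overlapping
    time windows and sum the values per key within each window."""
    totals = defaultdict(lambda: defaultdict(float))
    for timestamp, key, value in events:
        # Every event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start][key] += value
    # Convert to plain dicts, ordered by window start.
    return {start: dict(totals[start]) for start in sorted(totals)}

# Hypothetical sensor readings: (epoch seconds, sensor id, reading)
events = [
    (100, "sensor-a", 1.0),
    (103, "sensor-b", 2.0),
    (109, "sensor-a", 3.0),
    (112, "sensor-a", 5.0),
]

windows = tumbling_window_totals(events, window_seconds=10)
for start, per_key in windows.items():
    print(start, per_key)
```

In a real deployment the windows would be held in memory across a cluster rather than in one dictionary, but the grouping rule (assign each event to a window by its timestamp, then aggregate per window) is the same.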

Streaming data - Apache Kafka

“Publish-subscribe messaging rethought as a distributed commit log”

Image source: kafka.apache.org/
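To make the “commit log” phrasing concrete, here is a toy, in-memory sketch in plain Python. It is not Kafka's API and leaves out partitions, replication, and persistence. The `CommitLog` and `Subscriber` classes are hypothetical names used only for illustration. The point it shows is that records are only ever appended, and each subscriber keeps its own read position.

```python
class CommitLog:
    """Append-only log: records are never modified, only appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        return self._records[offset:]


class Subscriber:
    """Each subscriber tracks its own offset, so readers are independent."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        records = self._log.read_from(self.offset)
        self.offset += len(records)
        return records


log = CommitLog()
fast, slow = Subscriber(log), Subscriber(log)

log.append("order-1")
log.append("order-2")
first_poll = fast.poll()    # fast reader consumes both records

log.append("order-3")
second_poll = fast.poll()   # only the record appended since the last poll
catch_up = slow.poll()      # slow reader still sees everything from offset 0
```

Because the log itself is never consumed, a slow or restarted subscriber can catch up from its last offset without affecting other readers.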

Enterprise Data Bus

Stream processing - Apache Spark

• Scalable, fault-tolerant, high-throughput stream processing
• Spark Streaming receives live input data streams from various sources
• Continuous stream of data is known as a discretized stream or DStream
• Data is divided into mini-batches and processed by the Spark engine
• Operations such as join, filter, map, count, windowed computations, etc are used to transform data in-flight
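The mini-batch model can be illustrated without a cluster. The sketch below is plain Python, not the Spark API. It cuts a stream into batches by record count, whereas Spark Streaming batches by time interval, so treat that as a simplification. The `mini_batches` and `process` helpers and the order records are hypothetical.

```python
from collections import Counter, deque
from itertools import islice

def mini_batches(stream, batch_size):
    """Cut a (possibly unbounded) iterator into consecutive lists of records."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def process(stream, batch_size, window_batches):
    """Per batch: filter, map, count. Also keep a sliding window over the
    most recent `window_batches` batch counts."""
    recent = deque(maxlen=window_batches)
    results = []
    for batch in mini_batches(stream, batch_size):
        valid = [r for r in batch if r["amount"] > 0]     # filter
        regions = [r["region"].upper() for r in valid]    # map
        batch_counts = Counter(regions)                   # count
        recent.append(batch_counts)
        window_counts = sum(recent, Counter())            # windowed count
        results.append((dict(batch_counts), dict(window_counts)))
    return results

# Hypothetical order records
orders = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 0},
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 3},
]
results = process(orders, batch_size=2, window_batches=2)
for batch_counts, window_counts in results:
    print(batch_counts, window_counts)
```

Each result pairs the counts for one mini-batch with the counts over the sliding window of the last two batches, which is the shape of a windowed computation applied to data in flight.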

Why Oracle Data Integration?

• Enterprise has invested heavily in ODI and/or GoldenGate
• Getting started with development languages (Python/pySpark, Java, etc)
• Centralized metadata management
  • Integrate with other data sources using a single interface
• Realized cost savings
  • According to Gartner, 200% increase in maintenance costs when custom coding (https://www.gartner.com/doc/3432617/does-customcoded-data-integration-stack)

Streaming with Oracle Data Integration

Real-time data replication

Streaming integration: OGG -> Kafka

Streaming integration: Kafka -> Spark Streaming

Relational database transactions to Kafka

Why GoldenGate with Kafka?

• GoldenGate
  • …is non-invasive
  • …has checkpoints for recovery
  • …moves data quickly
  • …is easy to set up

Why Oracle Data Integrator with Spark Streaming?

• Heterogeneous sources and targets
  • Built to integrate all data
• Flexibility
  • Reusable code templates (Knowledge Modules)
• Reusable Mappings
  • ODI can adapt to your data warehouse - and not the other way around
• Flow-based mappings

Getting started with streaming using Oracle Data Integration

Oracle GoldenGate for Big Data - Kafka Handler Setup

• Standard GoldenGate Extract/Pump processes to capture RDBMS data
• Replicat for Java parameter file & process group created and set up
• Kafka Producer properties and Kafka Handler configuration set up

GoldenGate for Kafka setup

• Kafka handler properties
  • Set properties for how GoldenGate interacts with Kafka
  • Format, transaction vs operation mode, etc
• Kafka producer configuration

http://mrainey.co/ogg-kafka-oow
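As a rough sketch of what the two configuration files might contain, the Python below renders them from dictionaries. The handler property names are modeled on the sample configuration in the GoldenGate for Big Data 12.2 documentation and should be verified against the installed release. The topic name, file name, and broker address are placeholders, and the `to_properties` helper is hypothetical.

```python
def to_properties(settings):
    """Render a dict as Java-style `key=value` properties text."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"

# Kafka Handler properties (names assumed from the 12.2-era sample;
# check them against the documentation for your release).
handler_settings = {
    "gg.handlerlist": "kafkahandler",
    "gg.handler.kafkahandler.type": "kafka",
    "gg.handler.kafkahandler.KafkaProducerConfigFile": "custom_kafka_producer.properties",
    "gg.handler.kafkahandler.TopicName": "oggtopic",   # placeholder topic name
    "gg.handler.kafkahandler.format": "delimitedtext",  # output format
    "gg.handler.kafkahandler.mode": "tx",               # "tx" = transaction, "op" = operation
}

# Standard Kafka producer settings; the broker address is a placeholder.
producer_settings = {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
    "value.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
}

handler_text = to_properties(handler_settings)
producer_text = to_properties(producer_settings)
print(handler_text)
print(producer_text)
```

The handler file covers how GoldenGate formats and groups the messages it sends, while the producer file is passed through to the Kafka client, which is why the two sets of properties are kept separate.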

Kafka and Oracle Data Integrator setup

Kafka and Oracle Data Integrator

• Create Model using Kafka Logical Schema
• Create Datastore
  • Similar to standard “File” datastore, define file format and set up columns
• Only support for CSV
  • Future formats may include JSON, Avro, etc
• Add Datastore to mapping
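Defining a file format and columns for a CSV topic amounts to telling the consumer how to split and type each message. The plain-Python sketch below illustrates that idea only. It is not code generated by ODI, and the `COLUMNS` layout, the `parse_message` helper, and the sample message are hypothetical.

```python
import csv
from io import StringIO

# Hypothetical column layout, standing in for the columns defined on the datastore.
COLUMNS = [("order_id", int), ("customer", str), ("amount", float)]

def parse_message(message, delimiter=","):
    """Parse one delimited message into a dict keyed by column name."""
    fields = next(csv.reader(StringIO(message), delimiter=delimiter))
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    # Apply each column's type conversion to the matching field.
    return {name: cast(value) for (name, cast), value in zip(COLUMNS, fields)}

row = parse_message('42,"Smith, Jane",19.99')
print(row)
```

Using a CSV parser rather than a plain `split(",")` matters as soon as a quoted field contains the delimiter, as the customer name does here.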

Spark Streaming and Oracle Data Integrator

• Create Spark Data Server, Physical/Logical Schema
  • Set Hadoop Data Server
  • Add properties, such as checkpointing, asynchronous execution mode, etc
  • Additional properties can be added: http://spark.apache.org/docs/latest/configuration.html
• Spark Server is set up as Staging location
  • Source Datastore from Kafka, Oracle DB, etc
  • Target Datastore is Cassandra, Oracle DB, etc
• Code generated by KM is pySpark
  • pySpark code can be added to filters, joins, other components for transformations
  • Additional languages (Scala, Java) may be coming soon

Spark Streaming and Oracle Data Integrator

Enable the Streaming flag in the Physical design of a mapping.

To generate Spark code, set the Execute On Hint option to use the Spark data server as the staging location for your mapping.

Target IKM should not be set. Spark generated code will handle integration and load into target.

Tracking the process

When executing, the process will run continuously in the ODI Operator.

If the connection between the ODI Agent and Spark Agent is lost, it will reestablish itself after recovery.

Recap

• Streaming is the “velocity” in data. AKA “Fast Data”
• Oracle Data Integrator and Oracle GoldenGate provide a framework for development and management of data streaming processes
• Big Data add-ons continue to support new technologies
• Build a streaming architecture using GoldenGate and ODI:
  • Metadata management
  • Integration of RDBMS data with “schema on read” data
  • Build upon the skills in-house

we liberate enterprise data

thank you!