Apache NiFi and MiNiFi: Edge to Core
Andy LoPresto - @yolopey
Apache NiFi PMC
DataWorks Summit 2017 - Sydney
19 Sep 2017
© Hortonworks Inc. 2011–2016. All Rights Reserved

Agenda
⬢ What is dataflow and what are the challenges?
⬢ Apache NiFi
⬢ IoT Challenges
⬢ Apache MiNiFi
⬢ Exploration
⬢ Community

Gauging Audience Familiarity With NiFi
“What’s a NeeFee?”
No experience with dataflow. No experience with NiFi.
“I can pick this up pretty quickly”
Some experience with dataflow. Some experience with NiFi.
“I refactored the Ambari integration endpoint to allow for mutual authentication TLS during my coffee break”
Forgotten more about NiFi than most of us will ever know

Let’s Connect A to B
Producers, a.k.a. Things: Anything AND Everything
Internet!
Consumers:
• User
• Storage
• System
• … More Things

Moving data effectively is hard
Standards: http://xkcd.com/927/

Why is moving data effectively hard?
⬢ Standards
⬢ Formats
⬢ “Exactly Once” Delivery
⬢ Protocols
⬢ Veracity of Information
⬢ Validity of Information
⬢ Ensuring Security
⬢ Overcoming Security
⬢ Compliance
⬢ Schemas
⬢ Consumers Change
⬢ Credential Management
⬢ “That [person|team|group]”
⬢ Network*
⬢ “Exactly Once” Delivery

Connecting A to B to C
Easy enough with Bash scripts, Ruby/Python/Groovy, etc.
Log files
SQL
Big Data

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let’s consider the needs of a courier service
[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, On Delivery Routes, Trucks, Deliverers]
Icon credits: Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/; Deliverer: Rigo Peter, https://thenounproject.com/rigo/; Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/; Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack
[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Core Data Center at HQ, Server Cluster, Others, Storm/Spark/Flink/Apex, Kafka, Storm/Spark/Flink/Apex, On Delivery Routes, Trucks, Deliverers]

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Raise your hand if you want to maintain Python scripts for the rest of your life

NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new component simply by composition of its components.
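
The FBP vocabulary above can be made concrete with a toy model. The following is a minimal Python sketch, not NiFi's implementation: the class and function names (`FlowFile`, `Connection`, `uppercase`, `run_once`) are illustrative assumptions that only mirror the table's terms.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional


@dataclass
class FlowFile:
    """Information packet: opaque content plus a map of attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Bounded buffer: a queue between two processors with a fixed capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._queue: Deque[FlowFile] = deque()

    def offer(self, flow_file: FlowFile) -> bool:
        """Enqueue if there is room and report whether it was accepted."""
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(flow_file)
        return True

    def poll(self) -> Optional[FlowFile]:
        """Dequeue the oldest FlowFile, or return None when empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


def uppercase(flow_file: FlowFile) -> FlowFile:
    """Black box: transforms the content and copies the attributes."""
    return FlowFile(flow_file.content.upper(), dict(flow_file.attributes))


def run_once(source: Connection,
             processor: Callable[[FlowFile], FlowFile],
             sink: Connection) -> int:
    """Scheduler step: move FlowFiles until the source is empty or the sink is full."""
    moved = 0
    while len(source) > 0 and len(sink) < sink.capacity:
        sink.offer(processor(source.poll()))
        moved += 1
    return moved
```

Because the sink is bounded, a slow consumer leaves FlowFiles waiting in the source connection rather than dropping them, which is the property that lets processes "interact at differing rates".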

Apache NiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery / recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable, multi-tenant security
• Designed for extension
• Clustering
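
Two of the features listed above, backpressure and prioritized queuing, are easy to picture with a small queue. This is a hedged Python sketch under assumed names (`PrioritizedQueue`, `backpressure_threshold`); it illustrates the idea only and says nothing about how NiFi implements its queues.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class PrioritizedQueue:
    """Queue that serves the lowest priority number first and refuses input when full."""

    def __init__(self, backpressure_threshold: int) -> None:
        self.backpressure_threshold = backpressure_threshold
        self._heap: List[Tuple[int, int, Any]] = []
        # The counter breaks ties so equal priorities keep arrival order.
        self._counter = itertools.count()

    def is_full(self) -> bool:
        """True when upstream producers should stop sending (backpressure)."""
        return len(self._heap) >= self.backpressure_threshold

    def put(self, item: Any, priority: int) -> bool:
        """Accept the item unless the threshold has been reached."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def get(self) -> Any:
        """Remove and return the highest-priority item (raises IndexError if empty)."""
        _priority, _order, item = heapq.heappop(self._heap)
        return item
```

A producer that sees `put` return `False` has to hold or slow its output, which is the essence of backpressure; a "pressure release" policy would instead discard or expire queued items, and is not shown here.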

FlowFiles are like HTTP data

HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!

FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content*
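
The header/content analogy can be sketched in a few lines of Python. The snippet below is illustrative only: `FlowFileLike`, `from_http_response`, and the `http.status.line` attribute name are assumptions made for this example, not NiFi types or attribute names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FlowFileLike:
    """Attributes play the role of HTTP headers; content plays the role of the body."""
    attributes: Dict[str, str]
    content: bytes


def from_http_response(raw: bytes) -> FlowFileLike:
    """Split a raw HTTP response into an attribute map and untouched content bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attributes = {"http.status.line": lines[0]}  # hypothetical attribute name
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip().lower()] = value.strip()
    return FlowFileLike(attributes, body)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 13\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"Hello world!\n")
flow_file = from_http_response(raw)
print(flow_file.attributes["content-type"], len(flow_file.content))
```

The point of the split is that routing decisions can be made on the small attribute map without ever reading the (possibly large, binary) content.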

Data Provenance
▪ Constrained
▪ High-latency
▪ Localized context

▪ Hybrid – cloud/on-premises
▪ Low-latency
▪ Global context

Origin – attribution
Replay – recovery
Evolution of topologies
Long retention
Types of Lineage
• Event
• Configuration

Deeper Ecosystem Integration: 220+ Processors
Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, RouteContent, RouteContext, RouteText, ControlRate, DistributeLoad, GenerateTableFetch, JoltTransformJSON, PrioritizedDelivery, Encrypt, Tail, Evaluate, Execute, Fetch
HTTP, Syslog, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket
All Apache project logos are trademarks of the ASF and the respective projects.

Edge Challenges
⬢ Limited computing capability
⬢ Limited power/network
⬢ Restricted software library/platform availability
⬢ No UI
⬢ Physically inaccessible
⬢ Not frequently updated
⬢ Competing standards/protocols
⬢ Scalability
⬢ Privacy & Security

Recent Examples
⬢ When the Mirai attack has its own Wikipedia page, that’s not good

NiFi Solves Everything*
⬢ Runs on JVM
⬢ Provides UI for flow design & monitoring
⬢ Security built-in
  ⬢ TLS, authn/authz, encrypted data
⬢ Handles practically any format/protocol

NiFi for IoT
⬢ NiFi supports AMQP, MQTT, UDP, TCP, HTTP(S), CEF, JMS, (S)FTP, AWS IoT
⬢ With a little pruning, NiFi can run on a Raspberry Pi

Example: Sensor Readings via RP3B
⬢ Tim Spann
⬢ Sense Hat sensor attachment
  ⬢ Temp, humidity, pressure
  ⬢ 8x8 LED display
⬢ Python Flask server reading sensor and pushing to MQTT
⬢ NiFi consuming MQTT
https://community.hortonworks.com/articles/55839/reading-sensor-data-from-remote-sensors-on-raspber.html
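
The "read sensor, push to MQTT" step can be sketched without any hardware or broker. This is a minimal Python illustration, not the code from the linked article: `read_sensor` returns fixed stand-in values, the topic name is hypothetical, and `publish` is an injected callable where a real deployment would pass an MQTT client library's publish function.

```python
import json
import time
from typing import Callable, Dict


def read_sensor() -> Dict[str, float]:
    """Stand-in for the Sense Hat reads; fixed values so the sketch runs anywhere."""
    return {"temperature_c": 21.5, "humidity_pct": 40.0, "pressure_mbar": 1013.2}


def publish_reading(publish: Callable[[str, bytes], None],
                    topic: str = "sensors/rpi3/sensehat",  # hypothetical topic name
                    now: Callable[[], float] = time.time) -> bytes:
    """Serialize one reading as JSON and hand it to the supplied publish function."""
    reading = dict(read_sensor())
    reading["ts"] = now()
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    publish(topic, payload)
    return payload


if __name__ == "__main__":
    # Print instead of publishing; swap in a real client's publish call as needed.
    publish_reading(lambda topic, payload: print(topic, payload.decode("utf-8")))
```

On the NiFi side, a flow consuming the same topic would receive each JSON payload as the content of one FlowFile.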

So Why Do We Need A Different Solution?
⬢ NiFi is designed to “own the box”
⬢ NiFi 0.7.x started up in about 10-15 minutes on RP3 (593 MB)
⬢ NiFi 1.x started up in about 30 minutes on RP3 (760 MB)
  ⬢ 33 new processors
  ⬢ Rewrite for multitenant authorization
  ⬢ Complete UI overhaul

Apache NiFi Subproject: MiNiFi
⬢ Get the key parts of NiFi close to where data begins and provide bidirectional communication
⬢ NiFi lives in the data center; give it an enterprise server or a cluster of them
⬢ MiNiFi lives as close as possible to where data is born and is a guest on that device or system
  ⬢ IoT
  ⬢ Connected car
  ⬢ Legacy hardware

Why build MiNiFi?
⬢ NiFi is big
  ⬢ 1.3.0 release is 933 MB compressed
  ⬢ Can be modified to run in restricted environments, but requires manual surgery
  ⬢ Provides UI, provenance query, etc.
  ⬢ Runs on dedicated machines/clusters: “owns the box”
⬢ MiNiFi lives at the edge
  ⬢ No UI
  ⬢ 0.1.0 Java binary is 45 MB, C++ binary is 746 KB
  ⬢ “Good guest”

How Does MiNiFi Interact With NiFi?
⬢ NiFi
  ⬢ Design flows
  ⬢ Aggregate data from many sources
  ⬢ Perform routing/analysis/SEP
⬢ MiNiFi
  ⬢ Receive flows
  ⬢ Collect data
  ⬢ Send for processing

Let’s Add Dimensionality
⬢ We’ve been imagining EDGE to CORE as a bi-directional linear system
⬢ Let’s expand that to the real world

Flavors of MiNiFi
⬢ MiNiFi Java (v0.2.0)
  ⬢ Modified version of NiFi
  ⬢ No UI
  ⬢ YAML configuration
  ⬢ Reduced processor count
    ⬢ 110 by default, more available with additional NARs
⬢ MiNiFi C++ (v0.2.0)
  ⬢ Written from scratch
  ⬢ 10 processors by default
  ⬢ Bi-directional site-to-site & provenance data

NiFi vs MiNiFi Java Processes
MiNiFi: NiFi Framework, Components
NiFi: NiFi Framework, User Interface, Components

NiFi Java Processes
[Diagram: Bootstrap reads bootstrap.conf and starts NiFi (with its UI); NiFi reads nifi.properties and reads & modifies flow.xml.gz]

MiNiFi Java Processes
[Diagram: Bootstrap, with Configuration Change Notifier(s), reads bootstrap.conf and config.yml, transforms config.yml into nifi.properties and flow.xml.gz, and starts MiNiFi; MiNiFi reads nifi.properties and flow.xml.gz]
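
The "config.yml is transformed into flow.xml.gz" step can be pictured with a toy converter. The Python sketch below is an assumption-heavy illustration: the dictionary stands in for an already-parsed config.yml, and neither its keys nor the XML element names are claimed to match MiNiFi's actual schema.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Any, Dict


def transform(config: Dict[str, Any]) -> bytes:
    """Turn a parsed, YAML-like config into gzipped flow XML (hypothetical schema)."""
    root = ET.Element("flowController")
    group = ET.SubElement(root, "rootGroup")
    ET.SubElement(group, "name").text = config["Flow Controller"]["name"]
    for proc in config.get("Processors", []):
        node = ET.SubElement(group, "processor")
        ET.SubElement(node, "name").text = proc["name"]
        ET.SubElement(node, "class").text = proc["class"]
    return gzip.compress(ET.tostring(root, encoding="utf-8"))


# Hypothetical configuration, shaped like what a YAML parser might return.
example_config = {
    "Flow Controller": {"name": "edge-flow"},
    "Processors": [
        {"name": "TailAppLog", "class": "org.apache.nifi.processors.standard.TailFile"},
    ],
}

if __name__ == "__main__":
    print(gzip.decompress(transform(example_config)).decode("utf-8"))
```

Keeping the human-edited file small and generating the framework's native files from it is what lets an edge agent be reconfigured by shipping a single YAML document.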

What does MiNiFi provide?
⬢ Data tagging/provenance
⬢ Governance from edge (geopolitical restrictions)
⬢ Security (encryption, certificate-based authentication)
⬢ Low latency (immediate reactions & decision-making)
[Image: Connected Car Reference Platform Box, with Tuner + DSRC Card and Connectivity Card]

MiNiFi on a Connected Car
[Diagram labels]
Comprehension, Collection
CAN Bus, Gateway, MCU (x3), Ethernet/Ethernet AVB, Local Interconnect Network, Yet to be established protocol
ListenEthernet, ListenLIN, ListenCAN, Listen<>
ParseCAN, ParseEthernet, ParseLIN, Parse<>
Processing/Synthesis: Route, Transmit, Execute, Prioritize, Filter

MiNiFi Exfil
⬢ Site-to-Site
  ⬢ NiFi protocol
  ⬢ Two implementations
    ⬢ Raw socket
    ⬢ HTTP(S) (Java only)
  ⬢ Secured with mutual authentication TLS
⬢ HTTP(S), (S)FTP, JMS, Syslog, File, Email, Process (Java only)
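
"Mutual authentication TLS" means both ends present a certificate and both ends verify the peer. The sketch below shows that idea with Python's standard `ssl` module; it is not the Site-to-Site implementation, and `mutual_tls_context` plus the certificate file parameters are names assumed for this example.

```python
import ssl
from typing import Optional


def mutual_tls_context(server_side: bool,
                       certfile: Optional[str] = None,
                       keyfile: Optional[str] = None,
                       cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a context that always demands and verifies a certificate from the peer."""
    purpose = ssl.Purpose.CLIENT_AUTH if server_side else ssl.Purpose.SERVER_AUTH
    context = ssl.create_default_context(purpose, cafile=cafile)
    # A server context verifies nothing by default; requiring a client
    # certificate is what turns one-way TLS into mutual TLS.
    context.verify_mode = ssl.CERT_REQUIRED
    if certfile is not None:
        # Our own identity, presented to the peer during the handshake.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context
```

In practice each side would be given its own certificate and key plus the CA that signed the other side's certificate; the file paths are left to the caller here because they are deployment specific.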

Advanced Topics
⬢ New features in Apache NiFi 1.2.0 & 1.3.0
⬢ New features in Apache MiNiFi Java 0.2.0 & C++ 0.2.0
⬢ New subproject Apache NiFi Registry

New in 1.2.0/1.3.0
⬢ Record Parsing
⬢ Encrypted Provenance Repository

Record Parsing
⬢ Previously, data had to be divided into individual flowfiles to perform work
⬢ CSV output with 50k lines would need to be split, operated on, re-merged
⬢ 1 + 50k + 50k + 1 flowfiles ≈ 100k flowfiles

Record Parsing
⬢ Now flowfile content can contain many “record” elements
⬢ Read and write with *Reader and *Writer Controller Services
⬢ Perform lookups, routing, conversion, SQL queries, validation, and more…
⬢ 1 + 1 flowfiles = 2 flowfiles
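
The flowfile arithmetic on the two Record Parsing slides can be checked directly, and the record-oriented style is easy to mimic with a reader/writer pair. This Python sketch uses the standard `csv` module as a stand-in for the *Reader and *Writer Controller Services; the function names and the `name` column are assumptions for illustration.

```python
import csv
import io


def flowfiles_split_merge(num_records: int) -> int:
    """Old style: 1 original + N splits + N processed splits + 1 merged result."""
    return 1 + num_records + num_records + 1


def flowfiles_record_oriented() -> int:
    """Record style: 1 input flowfile + 1 output flowfile, whatever the record count."""
    return 1 + 1


def uppercase_names(csv_text: str) -> str:
    """Process every record of one input and write a single output."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in reader:
        record["name"] = record["name"].upper()
        writer.writerow(record)
    return out.getvalue()


print(flowfiles_split_merge(50_000), flowfiles_record_oriented())
```

For 50k records the split-and-merge count is exactly 100,002, which the slide rounds to 100k; the record-oriented count stays at 2.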

Encrypted Provenance Repository
⬢ Every provenance event record is encrypted with AES G/CM before being persisted to disk
⬢ Decrypted on deserialization for retrieval/query
⬢ Random access via offset seek
⬢ Handles key migration & rotation
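
"Random access via offset seek" works because each event record is stored as an independent unit, so one record can be read without touching its neighbours. The Python sketch below shows only that part, using length-prefixed records and an offset index; encryption is deliberately left out (the standard library has no AES-GCM), and `RecordLog` is a hypothetical name, not the repository's actual format.

```python
import io
import struct
from typing import Dict


class RecordLog:
    """Append-only log of length-prefixed records with an event-id to offset index."""

    def __init__(self) -> None:
        self._buffer = io.BytesIO()  # stands in for a file on disk
        self._offsets: Dict[int, int] = {}
        self._next_id = 0

    def append(self, record: bytes) -> int:
        """Write one record at the end and remember where it starts."""
        event_id = self._next_id
        self._next_id += 1
        self._buffer.seek(0, io.SEEK_END)
        self._offsets[event_id] = self._buffer.tell()
        self._buffer.write(struct.pack(">I", len(record)))  # 4-byte big-endian length
        self._buffer.write(record)
        return event_id

    def read(self, event_id: int) -> bytes:
        """Seek straight to the record and read exactly its bytes."""
        self._buffer.seek(self._offsets[event_id])
        (length,) = struct.unpack(">I", self._buffer.read(4))
        return self._buffer.read(length)
```

In an encrypted variant each stored record would be ciphertext, and `read` would decrypt just the bytes it fetched, which matches the "decrypted on deserialization" bullet.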

MiNiFi Java 0.2.0
⬢ Upgrading of core component dependencies to NiFi 1.2.0
⬢ Initial command and control server capabilities
⬢ Increased support for NiFi features in configuration YAML inclusive of:
  ⬢ Support for HTTP Site to Site Proxy Properties
  ⬢ Controller Services
  ⬢ Binding site to site to a specific network interface

MiNiFi C++ 0.2.0
⬢ Incorporation of Catch testing framework and Google linting for code quality and enhanced test coverage
⬢ Providing support for reporting tasks and an initial implementation of Site to Site Provenance reporting
⬢ New Processors inclusive of PutFile, ListenHTTP

MiNiFi Feature Proposals
⬢ Flow Versioning
⬢ Develop flows for class of MiNiFi instances
⬢ Command & Control (C2) API (in Java master)
⬢ File Change Ingestor
⬢ Rest API Ingestor
⬢ Pull HTTP Ingestor

Apache NiFi Registry
⬢ “… complementary application that provides a central location for storage and management of shared resources across one or more instances of NiFi and/or MiNiFi.”

Apache NiFi Registry - Flow Registry
⬢ Flow registry stores & manages versioned flow definitions
⬢ Integrated with NiFi to allow save/retrieve/upgrade operations from canvas
⬢ Admin of users, groups, and policies
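
Storing "versioned flow definitions" boils down to keeping every saved snapshot and handing back any of them on request. The following Python sketch illustrates that idea only; `FlowRegistrySketch`, its methods, and the dictionary-shaped flow definition are assumptions for this example and do not reflect the NiFi Registry API.

```python
import copy
from typing import Any, Dict, List, Optional


class FlowRegistrySketch:
    """In-memory store that keeps every saved version of each flow definition."""

    def __init__(self) -> None:
        self._flows: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, flow_id: str, definition: Dict[str, Any]) -> int:
        """Snapshot the definition and return its version number (starting at 1)."""
        versions = self._flows.setdefault(flow_id, [])
        versions.append(copy.deepcopy(definition))
        return len(versions)

    def retrieve(self, flow_id: str, version: Optional[int] = None) -> Dict[str, Any]:
        """Return a copy of the requested version, or of the latest one if omitted."""
        versions = self._flows[flow_id]
        index = len(versions) if version is None else version
        if not 1 <= index <= len(versions):
            raise KeyError(f"{flow_id} has no version {index}")
        return copy.deepcopy(versions[index - 1])
```

Deep copies on both save and retrieve keep stored versions immutable, so an "upgrade" is simply retrieving a newer version while older ones remain available for rollback.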

Why NiFi & MiNiFi?
⬢ Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes
  – Inter vs intra, domestically, internationally
⬢ Provide common tooling and extensions that are needed but be flexible for extension
  – Leverage existing libraries and expansive Java ecosystem for functionality
  – Allow organizations to integrate with their existing infrastructure
⬢ Empower folks managing your infrastructure to make changes and reason about issues that are occurring
  – Data Provenance to show context and data’s journey
  – User Interface/Experience a key component

Learn more and join us
Apache NiFi site https://nifi.apache.org
Subproject MiNiFi site https://nifi.apache.org/minifi/
Subscribe to and collaborate at [email protected] [email protected]
Submit Ideas or Issues https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter @apachenifi

Learn and share at Birds of a Feather
IOT, STREAMING & DATA FLOW
Thursday September 21, 6:00pm, C4.6

Thank You
I’m sticking around for discussions/questions
@yolopey / @[email protected]:70ECB3E598A65A3FD3C4BACE3C6EF65B2F7DEF69