Data Integration with Server Side Mashups
Juergen Brendel, Principal Software Engineer
OSDC 2007, Brisbane


The open source SnapLogic data integration framework. Overview, examples, screenshots.



Agenda

• The SnapLogic project
• Client-side mashups
• Problems and solutions
• Data integration with SnapLogic


The SnapLogic project

• Founded 2005, data integration background
• Vision:
  – Reusable data integration resources
  – REST
  – Web-based GUI
  – Programmatic interface
  – Open Source
• Python... Why not?
• www.snaplogic.com


What's a mashup?

• A 'Web 2.0 kind of thing'
• Combine, aggregate, visualise
  – Multiple sources
  – Multiple dimensions
• Typically on the client side
  – Browser
  – Ajax
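To make "combine, aggregate" concrete, here is a minimal sketch of the data side of a mashup, independent of whether it runs in a browser or on a server. The two sources and their fields (`customers`, `sales`, `city`) are made up for illustration; a real mashup would fetch them from web services or feeds.

```python
from collections import defaultdict

# Hypothetical source 1: a customer list (e.g. from an internal database).
customers = [
    {"name": "Acme", "city": "Brisbane"},
    {"name": "Globex", "city": "Sydney"},
    {"name": "Initech", "city": "Brisbane"},
]

# Hypothetical source 2: sales figures keyed by customer name (e.g. from a feed).
sales = {"Acme": 1200.0, "Globex": 800.0, "Initech": 300.0}


def mash_up(customers, sales):
    """Combine both sources, then aggregate total sales per city."""
    totals = defaultdict(float)
    for customer in customers:
        # Join on the customer name; customers without sales contribute 0.
        totals[customer["city"]] += sales.get(customer["name"], 0.0)
    return dict(totals)


print(mash_up(customers, sales))
```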


Self-made mashups

• Hand-coded
• Mashup editors
  – GUI mashup-logic editor
  – Wiki-style
  – Hosted


Benefits for the enterprise?

Yeah, right...

• Enable knowledge workers!!!
• Situational applications!
• Avoid the IT bottleneck!!


Problems with client-side mashups

• Skill
• Internal data often not web-friendly
• Maintenance
• Security
• Performance


Solution: Server-side mashups

• Flexible access
• Security
• Performance


SnapLogic data integration philosophy

• Clearly defined REST resources
• Data reuse and integration
• Pipelines
• Framework for resource-specific scripting
• Open source and community


Example: Resources

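[Diagram: a client sends an HTTP request for the resource HTTP://server1.example.com/customer_list to the SnapLogic Server and receives the response as JSON. On the server, a component configured by a resource definition reads the data from databases, files, applications or Atom / RSS feeds.]

A resource definition holds:
• Resource name, e.g. HTTP://server1.example.com/customer_list
• SQL query or filename
• Credentials
• Parameters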
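The resource idea can be sketched in a few lines of Python. This is not SnapLogic's API: `ResourceDefinition` and its `run` method are hypothetical, an in-memory SQLite database stands in for the customer database, and the rows are serialised as JSON the way the diagram's client would receive them (the HTTP layer is left out).

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ResourceDefinition:
    """Hypothetical model of a resource definition (not SnapLogic's class)."""
    name: str    # resource name, e.g. the resource URI
    query: str   # SQL query (a filename would be the alternative)
    credentials: dict = field(default_factory=dict)  # unused by SQLite; illustration only
    parameters: dict = field(default_factory=dict)   # values bound into the query

    def run(self, conn: sqlite3.Connection) -> str:
        """Execute the query and serialise the rows as a JSON list of objects."""
        cursor = conn.execute(self.query, self.parameters)
        columns = [col[0] for col in cursor.description]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return json.dumps(rows)


# Stand-in data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Acme", "Brisbane"), ("Globex", "Sydney")])

customer_list = ResourceDefinition(
    name="HTTP://server1.example.com/customer_list",
    query="SELECT name, city FROM customers WHERE city = :city",
    parameters={"city": "Brisbane"},
)
print(customer_list.run(conn))
```

Keeping the query, credentials and parameters in one named definition is what lets a client ask for the resource by name alone.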


Example: Pipelines

[Diagram: a client sends an HTTP request for HTTP://server1.example.com/processed_customer_list to the SnapLogic Server and receives the response as JSON. That resource is a pipeline of three components, Read → Geocode → Sort, each configured by its own resource definition; the data comes from databases, files, applications or Atom / RSS feeds.]
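A rough Python sketch of the Read → Geocode → Sort pipeline, assuming components are plain generator functions that pass records (dicts) downstream. The `GEOCODES` table with its approximate coordinates is a hypothetical stand-in for a real geocoding service.

```python
from typing import Iterable, Iterator

# Hypothetical lookup table standing in for a geocoding service.
GEOCODES = {"Brisbane": (-27.47, 153.03), "Sydney": (-33.87, 151.21)}


def read(rows: Iterable[dict]) -> Iterator[dict]:
    """Read component: emit records from a source (here, an in-memory list)."""
    for row in rows:
        yield dict(row)


def geocode(records: Iterable[dict]) -> Iterator[dict]:
    """Geocode component: add lat/lon fields looked up from the city name."""
    for record in records:
        lat, lon = GEOCODES.get(record["city"], (None, None))
        record["lat"], record["lon"] = lat, lon
        yield record


def sort_by(records: Iterable[dict], key: str) -> list:
    """Sort component: has to see every record before emitting any."""
    return sorted(records, key=lambda r: r[key])


source = [{"name": "Globex", "city": "Sydney"},
          {"name": "Acme", "city": "Brisbane"}]
processed_customer_list = sort_by(geocode(read(source)), key="name")
for rec in processed_customer_list:
    print(rec)
```

Read and Geocode stream one record at a time, while Sort must buffer its whole input before it can emit anything.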


A simple pipeline: Filtering leads


Linking fields in a pipeline


Reusing a pipeline as a resource
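Reusing a pipeline as a resource means a pipeline can stand wherever a single component can. A hypothetical sketch of that idea (the `Pipeline` class here is illustrative, not SnapLogic's class of the same name), using a lead filter in the spirit of the "Filtering leads" example; the `score` field and the threshold of 50 are assumptions.

```python
from typing import Callable, Iterable, Iterator

# A component takes a stream of records and yields a stream of records.
Component = Callable[[Iterable[dict]], Iterator[dict]]


class Pipeline:
    """Hypothetical sketch: chains components and is itself callable like one."""

    def __init__(self, *components: Component):
        self.components = components

    def __call__(self, records: Iterable[dict]) -> Iterator[dict]:
        # Feed each component's output into the next one.
        for component in self.components:
            records = component(records)
        yield from records


def only_hot_leads(records):
    """Filter component: keep leads scoring 50 or more (assumed threshold)."""
    for record in records:
        if record["score"] >= 50:
            yield record


def add_label(records):
    """Transform component: tag each surviving lead."""
    for record in records:
        yield {**record, "label": "hot"}


def exclude(name):
    """Build a filter component that drops one lead by name."""
    def component(records):
        return (r for r in records if r["name"] != name)
    return component


filter_leads = Pipeline(only_hot_leads, add_label)
# The inner pipeline is used exactly like a single component.
outer = Pipeline(filter_leads, exclude("Initech"))

leads = [
    {"name": "Acme", "score": 80},
    {"name": "Globex", "score": 20},
    {"name": "Initech", "score": 60},
]
print(list(outer(leads)))
```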



Adding new components

• For access logic
• For data transformations
• Independent of data format
• Currently written in Python


A simple processing component

    class IncreaseSalary(DataComponent):

        def init(self):
            '''Called when the component is started.'''
            self.increase = float(self.moduleProperties['percent_increase'])

        def processRecord(self, record):
            '''Called for every record.'''
            record.fields['salary'] *= (1 + self.increase/100)
            self.writeRecord(record)
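To try the component contract in isolation, the framework side can be mocked. `DataComponent`, `Record` and `run` below are hypothetical stand-ins written for this sketch, not SnapLogic code; they assume only what the slide shows: `init()` is called once, `processRecord()` once per record, and `writeRecord()` passes a record downstream.

```python
class Record:
    """Hypothetical stand-in for a framework record: a dict of fields."""
    def __init__(self, fields):
        self.fields = dict(fields)


class DataComponent:
    """Hypothetical stand-in for the framework base class."""
    def __init__(self, module_properties):
        self.moduleProperties = module_properties
        self.output = []  # records written downstream

    def writeRecord(self, record):
        self.output.append(record)

    def run(self, records):
        """Drive the component: init once, then processRecord per record."""
        self.init()
        for record in records:
            self.processRecord(record)
        return self.output


class IncreaseSalary(DataComponent):
    """The component from the slide, unchanged apart from spacing."""
    def init(self):
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        record.fields['salary'] *= (1 + self.increase / 100)
        self.writeRecord(record)


out = IncreaseSalary({'percent_increase': '10'}).run([Record({'salary': 50000.0})])
print(out[0].fields['salary'])
```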


An Apache log file reader

    class LogReader(DataComponent):

        def startReading(self):
            '''Called when component does not have input stream.'''
            format = self.moduleProperties['log_format']

            if format == 'COMMON':
                p = apachelog.parser(apachelog.formats['common'])
            elif ...

            # Read all lines in the logfile; 'with' closes the file afterwards
            with open(self._filename, 'rbU') as logfile:
                for line in logfile:
                    out_rec = Record(self.getSingleOutputView())
                    raw_rec = p.parse(line)
                    out_rec.fields['remote_host'] = raw_rec['%h']
                    out_rec.fields['client_id'] = raw_rec['%l']
                    out_rec.fields['user'] = raw_rec['%u']
                    out_rec.fields['server_status'] = int(raw_rec['%>s'])
                    out_rec.fields['bytes'] = int(raw_rec['%b'])
                    ...

                    self.writeRecord(out_rec)
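The slide relies on the third-party `apachelog` module. As an alternative sketch using only the standard library, the Common Log Format (`%h %l %u %t "%r" %>s %b`) can be parsed with a regular expression; the field names mirror those `LogReader` sets, and the sample line is the one used in the Apache HTTP Server documentation.

```python
import re

# One named group per Common Log Format field.
CLF = re.compile(
    r'(?P<remote_host>\S+) (?P<client_id>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<server_status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict of fields."""
    m = CLF.match(line)
    if m is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    fields = m.groupdict()
    fields["server_status"] = int(fields["server_status"])
    # %b is "-" when no bytes were sent; treat that as 0.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_clf(sample))
```

One difference worth noting: Apache logs `-` for `%b` when no body was sent, which a plain `int(raw_rec['%b'])` would reject; the sketch maps it to 0.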


Programmatic access

• GUI is nice, but still limiting
• SnapScript: An API library
• Python, PHP, more to come


Creating a resource

    # Create a new resource
    staff_res_def = Resource(component='SnapLogic.Components.CsvRead')
    staff_res_def.props.URI = '/SnapLogic/Resources/Staff'
    staff_res_def.props.description = 'Read from the employee file'
    staff_res_def.props.title = 'Staff'
    staff_res_def.props.delimiter = '$?{DELIMITER}'
    staff_res_def.props.filename = '$?{INPUTFILE}'
    staff_res_def.props.parameters = (
        ('INPUTFILE', Param.Required, ''),
        ('DELIMITER', Param.Optional, ',')
    )

    # Define the output view of the resource
    staff_res_def.props.outputview.output1 = (
        ('Last_Name', 'string', 'Employee last name'),
        ('First_Name', 'string', 'Employee first name'),
        ('Salary', 'number', 'Annual income')
    )
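The `$?{NAME}` placeholders and the `(name, required/optional, default)` parameter tuples suggest a substitution step when the resource runs. The sketch below is a guess at those semantics, not SnapLogic's implementation: required parameters must be supplied, optional ones fall back to their default. `REQUIRED`, `OPTIONAL`, `PARAMETERS` and `resolve` are hypothetical names.

```python
import re

# Hypothetical mirror of the slide's parameter tuples: (name, required?, default).
REQUIRED, OPTIONAL = True, False
PARAMETERS = (
    ("INPUTFILE", REQUIRED, ""),
    ("DELIMITER", OPTIONAL, ","),
)
PLACEHOLDER = re.compile(r"\$\?\{(\w+)\}")


def resolve(template, supplied, parameters=PARAMETERS):
    """Substitute every $?{NAME} in template with a supplied or default value."""
    values = {}
    for name, required, default in parameters:
        if name in supplied:
            values[name] = supplied[name]
        elif required:
            raise KeyError(f"missing required parameter {name}")
        else:
            values[name] = default
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


print(resolve("$?{DELIMITER}", {"INPUTFILE": "staff.csv"}))
```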


Creating a pipeline

    # Create a new pipeline
    p = Pipeline()
    p.props.URI = '/SnapLogic/Pipelines/empl_salary_inc'
    p.props.title = 'Employee_Salary_Increase'

    # Select the resources in the pipeline
    p.resources.Staff = staff_res_def.instance()
    p.resources.PayRaise = increase_salary_res_def.instance()

    # Link the resources in the pipeline: each pair maps a field of
    # Staff's output view to a field of PayRaise's input view
    link = (
        ('Last_Name', 'last'),
        ('First_Name', 'first'),
        ('Salary', 'salary')
    )
    p.linkViews('Staff', 'output1', 'PayRaise', 'input1', link)
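The `link` tuples pair a field of the upstream output view with a field of the downstream input view. A hypothetical sketch of what such a link does to each record, assuming fields that are not linked are simply dropped (`link_views` is an illustrative name, not the real `linkViews`):

```python
def link_views(records, link):
    """Rename upstream output-view fields to downstream input-view fields."""
    for record in records:
        yield {dst: record[src] for src, dst in link}


link = (('Last_Name', 'last'), ('First_Name', 'first'), ('Salary', 'salary'))
staff = [{'Last_Name': 'Smith', 'First_Name': 'Jane', 'Salary': 50000.0}]
print(list(link_views(staff, link)))
```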


Pipeline parameters

    # Define the user-visible parameters of the pipeline
    p.props.parameters = (
        ('INCREASE', Param.Required, ''),
    )

    # Map values to the parameters of the pipeline's resources
    p.props.parammap = (
        (Param.Parameter, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
        (Param.Constant, 'file://foo/staff.csv', 'Staff', 'INPUTFILE')
    )

    # Confirm correctness and publish as a new resource
    p.check()
    p.saveToServer(connection)
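How `parammap` might be resolved at run time can be sketched as follows, assuming a `Param.Parameter` entry forwards the value the caller supplied for the named pipeline parameter and a `Param.Constant` entry passes its literal value through. `PARAMETER`, `CONSTANT` and `map_params` are hypothetical names.

```python
PARAMETER, CONSTANT = "parameter", "constant"

# Same shape as the slide: (kind, source, resource, resource parameter).
parammap = (
    (PARAMETER, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (CONSTANT, 'file://foo/staff.csv', 'Staff', 'INPUTFILE'),
)


def map_params(pipeline_args, parammap):
    """Work out each resource's parameter values from the pipeline's arguments."""
    per_resource = {}
    for kind, source, resource, resource_param in parammap:
        # Parameters are looked up in the caller's arguments; constants pass through.
        value = pipeline_args[source] if kind == PARAMETER else source
        per_resource.setdefault(resource, {})[resource_param] = value
    return per_resource


print(map_params({'INCREASE': '5'}, parammap))
```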


The end

Any questions?
