77
Spring Batch Workshop Spring Batch Workshop Arnaud Cogolu` egnes Consultant with Zenika, Co-author Spring Batch in Action March 20, 2012

Spring Batch Workshop

  • Upload
    lyonjug

  • View
    5.931

  • Download
    1

Embed Size (px)

DESCRIPTION

Arnaud vous propose de découvrir le framework Spring Batch: du Hello World! jusqu'à l'exécution multi-threadée de batch, en passant par la lecture de fichiers CSV et la reprise sur erreur. Les techniques qu'utilise le framework pour lire et écrire efficacement de grands volumes de données e vous seront pas non plus épargnées ! La présentation se base sur une approche problème/solution, avec de nombreux exemples de code et des démos. A la suite de cette présentation, vous saurez si Spring Batch convient à vos problématiques et aurez toutes les cartes en mains pour l'intégrer à vos applications batch.

Citation preview

Page 1: Spring Batch Workshop

Spring Batch Workshop

Spring Batch Workshop

Arnaud Cogoluegnes

Consultant with Zenika, Co-author Spring Batch in Action

March 20, 2012

Page 2: Spring Batch Workshop

Spring Batch Workshop

OutlineOverview

IDE set up

Spring support in IDE

Spring Batch overview

Hello World

Chunk processing

Flat file reading

Skip

Dynamic job parameters

JDBC paging

Execution metadata

Scheduling

Item processor

Logging skipped items

Page 3: Spring Batch Workshop

Spring Batch Workshop

Overview

Overview

I This workshop highlights Spring Batch featuresI Problem/solution approach

I A few slides to cover the featureI A project to start from, just follow the TODOs

I PrerequisitesI Basics about Java and Java EEI Spring: dependency injection, enterprise support

I https://github.com/acogoluegnes/Spring-Batch-Workshop

Page 4: Spring Batch Workshop

Spring Batch Workshop

Overview

License

This work is licensed under the Creative CommonsAttribution-NonCommercial-ShareAlike 3.0 Unported License.To view a copy of this license, visithttp://creativecommons.org/licenses/by-nc-sa/3.0/ or send aletter to Creative Commons, 444 Castro Street, Suite 900,Mountain View, California, 94041, USA.

Page 5: Spring Batch Workshop

Spring Batch Workshop

IDE set up

Follow the TODOs

I Track the TODO in the *-start projects!

I It’s easier with support from the IDE

Page 6: Spring Batch Workshop

Spring Batch Workshop

IDE set up

TODO with EclipseI Window > Preferences > “tasks tag” in filter

Page 7: Spring Batch Workshop

Spring Batch Workshop

IDE set up

TODO with Eclipse

I Open the “Tasks” view

I click on the down arrow on the right

I “configure contents”

Page 8: Spring Batch Workshop

Spring Batch Workshop

IDE set up

TODO with Eclipse

I Check “TODOs” on the left

I Check “On any element in the same project” on the right(scope)

Page 9: Spring Batch Workshop

Spring Batch Workshop

Spring support in IDE

Spring support in IDE is a +

I e.g. code completion in SpringSource Tool Suite

?

Page 10: Spring Batch Workshop

Spring Batch Workshop

Spring support in IDE

Shortened package names

I Be careful, a package name can sometimes be shortened

<bean class="c.z.workshop.springbatch.HelloWorldTasklet" />

I Type the name of the class, code completion will do the rest...

<bean class="com.zenika.workshop.springbatch.HelloWorldTasklet" />

Page 11: Spring Batch Workshop

Spring Batch Workshop

Spring Batch overview

Basic features for batch applications

I Read – process – write large amounts of data, efficientlyI Ready-to-use components to read from/write to

I Flat/XML filesI Databases (JDBC, Hibernate, JPA, iBatis)I JMS queuesI Emails

I Numerous extension points/hooks

Page 12: Spring Batch Workshop

Spring Batch Workshop

Spring Batch overview

Advanced features for batch applications

I Configuration to skip/retry itemsI Execution metadata

I MonitoringI Restart after failure

I Scaling strategiesI Local/remoteI Partitioning, remote processing

Page 13: Spring Batch Workshop

Spring Batch Workshop

Hello World

I Problem: getting started with Spring Batch

I Solution: writing a simple “Hello World” job

Page 14: Spring Batch Workshop

Spring Batch Workshop

Hello World

Structure of a job

I A Spring Batch job is made of steps

I The Hello World job has one step

I The processing is implemented in a Tasklet

Page 15: Spring Batch Workshop

Spring Batch Workshop

Hello World

The Hello World Tasklet

public class HelloWorldTasklet implements Tasklet {

@Override

public RepeatStatus execute(

StepContribution contribution,

ChunkContext chunkContext) throws Exception {

System.out.println("Hello world!");

return RepeatStatus.FINISHED;

}

}

Page 16: Spring Batch Workshop

Spring Batch Workshop

Hello World

The configuration of the Hello World job

<batch:job id="helloWorldJob">

<batch:step id="helloWorldStep">

<batch:tasklet>

<bean class="c.z.workshop.springbatch.HelloWorldTasklet" />

</batch:tasklet>

</batch:step>

</batch:job>

I Notice the batch namespace

Page 17: Spring Batch Workshop

Spring Batch Workshop

Hello World

Spring Batch needs some infrastructure beans

I Let’s use the typical test configuration

<bean id="transactionManager"

class="o.s.b.support.transaction.ResourcelessTransactionManager"

/>

<bean id="jobRepository"

class="o.s.b.core.repository.support.MapJobRepositoryFactoryBean"

/>

<bean id="jobLauncher"

class="o.s.b.core.launch.support.SimpleJobLauncher">

<property name="jobRepository" ref="jobRepository" />

</bean>

Page 18: Spring Batch Workshop

Spring Batch Workshop

Hello World

Running the test in a JUnit test

@RunWith(SpringJUnit4ClassRunner.class)

@ContextConfiguration("/hello-world-job.xml")

public class HelloWorldJobTest {

@Autowired

private Job job;

@Autowired

private JobLauncher jobLauncher;

@Test public void helloWorld() throws Exception {

JobExecution execution = jobLauncher.run(job, new JobParameters());

assertEquals(ExitStatus.COMPLETED, execution.getExitStatus());

}

}

Page 19: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

I Problem: processing large amounts of data efficiently

I Solution: using chunk processing

Page 20: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

What is chunk processing?

I Batch jobs often read, process, and write itemsI e.g.

I Reading items from a fileI Then processing (converting) itemsI Writing items to a database

I Spring Batch calls this “chunk processing”

I a chunk = a set of items

Page 21: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

Chunk processing with Spring Batch

I Spring BatchI handles the iteration logicI uses a transaction for each chunkI lets you choose the chunk sizeI defines interfaces for each part of the processing

Page 22: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

The reading phase

I Spring Batch creates chunks of items by calling read()

I Reading ends when read() returns null

public interface ItemReader<T> {

T read() throws Exception, UnexpectedInputException,

ParseException, NonTransientResourceException;

}

Page 23: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

The processing phase

I Once a chunk is created, items are sent to the processor

I Optional

public interface ItemProcessor<I, O> {

O process(I item) throws Exception;

}

Page 24: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

The writing phase

I Receives all the items of the chunk

I Allows for batch update (more efficient)

public interface ItemWriter<T> {

void write(List<? extends T> items) throws Exception;

}

Page 25: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

An example

I Let’s implement a (too?) simple chunk-oriented step!

Page 26: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

The ItemReader

public class StringItemReader implements ItemReader<String> {

private List<String> list;

public StringItemReader() {

this.list = new ArrayList<String>(Arrays.asList(

"1","2","3","4","5","6","7")

);

}

@Override

public String read() throws Exception, UnexpectedInputException,

ParseException, NonTransientResourceException {

return !list.isEmpty() ? list.remove(0) : null;

}

}

Page 27: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

The ItemProcessor

public class StringItemProcessor

implements ItemProcessor<String, String> {

@Override

public String process(String item) throws Exception {

return "*** "+item+" ***";

}

}

Page 28: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

The ItemWriter

public class StringItemWriter implements ItemWriter<String> {

private static final Logger LOGGER =

LoggerFactory.getLogger(StringItemWriter.class);

@Override

public void write(List<? extends String> items) throws Exception {

for(String item : items) {

LOGGER.info("writing "+item);

}

}

}

Page 29: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

Configuring the job

<batch:job id="chunkProcessingJob">

<batch:step id="chunkProcessingStep">

<batch:tasklet>

<batch:chunk reader="reader" processor="processor"

writer="writer" commit-interval="3"

/>

</batch:tasklet>

</batch:step>

</batch:job>

<bean id="reader"

class="com.zenika.workshop.springbatch.StringItemReader" />

<bean id="processor"

class="com.zenika.workshop.springbatch.StringItemProcessor" />

<bean id="writer"

class="com.zenika.workshop.springbatch.StringItemWriter" />

Page 30: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

Considerations

I Do I always need to write myItemReader/Processor/Writer?

I No, Spring Batch provides ready-to-use components forcommon datastores

I Flat/XML files, databases, JMS, etc.

I As an application developer, youI Configure these componentsI Provides some logic (e.g. mapping a line with a domain object)

Page 31: Spring Batch Workshop

Spring Batch Workshop

Chunk processing

Going further...

I Reader/writer implementation for flat/XML files, database,JMS

I Skipping items when something goes wrong

I Listeners to react to the chunk processing

Page 32: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

I Problem: reading lines from a flat file and sending them toanother source (e.g. database)

I Solution: using the FlatFileItemReader

Page 33: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

Spring Batch’s support for flat file reading

I Spring Batch has built-in support for flat filesI Through the FlatFileItemReader for reading

I The FlatFileItemReader handles I/OI 2 main steps:

I Configuring the FlatFileItemReaderI Providing a line-to-object mapping strategy

Page 34: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

The usual suspects

Susy,Hauerstock,2010-03-04

De Anna,Raghunath,2010-03-04

Kiam,Whitehurst,2010-03-04

Alecia,Van Holst,2010-03-04

Hing,Senecal,2010-03-04

?

public class Contact {

private Long id;

private String firstname,lastname;

private Date birth;

(...)

}

Page 35: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

What do we need to read a flat file?

I How to tokenize a line

I How to map the line with a Java object

I Where to find the file to read

Page 36: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

The FlatFileItemReader configuration

<bean id="reader"

class="o.s.b.item.file.FlatFileItemReader">

<property name="lineMapper">

<bean class="o.s.b.item.file.mapping.DefaultLineMapper">

<property name="lineTokenizer">

<bean class="o.s.b.item.file.transform.DelimitedLineTokenizer">

<property name="names" value="firstname,lastname,birth" />

</bean>

</property>

<property name="fieldSetMapper">

<bean class="c.z.workshop.springbatch.ContactFieldSetMapper" />

</property>

</bean>

</property>

<property name="resource" value="classpath:contacts.txt" />

</bean>

Page 37: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

The FlatFileItemReader declaration

<bean id="reader"

class="o.s.b.item.file.FlatFileItemReader">

</bean>

Page 38: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

How to tokenize a line

<bean id="reader"

class="o.s.b.item.file.FlatFileItemReader">

<property name="lineMapper">

<bean class="o.s.b.item.file.mapping.DefaultLineMapper">

<property name="lineTokenizer">

<bean class="o.s.b.item.file.transform.DelimitedLineTokenizer">

<property name="names" value="firstname,lastname,birth" />

</bean>

</property>

</bean>

</property>

</bean>

Page 39: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

How to map the line with a Java object

<bean id="reader"

class="o.s.b.item.file.FlatFileItemReader">

<property name="lineMapper">

<bean class="o.s.b.item.file.mapping.DefaultLineMapper">

<property name="fieldSetMapper">

<bean class="c.z.workshop.springbatch.ContactFieldSetMapper" />

</property>

</bean>

</property>

</bean>

Page 40: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

Where to find the file to read

<bean id="reader"

class="o.s.b.item.file.FlatFileItemReader">

<property name="resource" value="classpath:contacts.txt" />

</bean>

Page 41: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

The FlatFileItemReader configuration

<bean id="reader"

class="o.s.b.item.file.FlatFileItemReader">

<property name="lineMapper">

<bean class="o.s.b.item.file.mapping.DefaultLineMapper">

<property name="lineTokenizer">

<bean class="o.s.b.item.file.transform.DelimitedLineTokenizer">

<property name="names" value="firstname,lastname,birth" />

</bean>

</property>

<property name="fieldSetMapper">

<bean class="c.z.workshop.springbatch.ContactFieldSetMapper" />

</property>

</bean>

</property>

<property name="resource" value="classpath:contacts.txt" />

</bean>

Page 42: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

The line-to-object mapping strategy

I A FieldSetMapper to map a line with an object

I More about business logic, so typically implemented bydeveloper

I Spring Batch provides straightforward implementations

Page 43: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

Custom FieldSetMapper implementation

package com.zenika.workshop.springbatch;

import org.springframework.batch.item.file.mapping.FieldSetMapper;

import org.springframework.batch.item.file.transform.FieldSet;

import org.springframework.validation.BindException;

public class ContactFieldSetMapper implements FieldSetMapper<Contact> {

@Override

public Contact mapFieldSet(FieldSet fieldSet) throws BindException {

return new Contact(

fieldSet.readString("firstname"),

fieldSet.readString("lastname"),

fieldSet.readDate("birth","yyyy-MM-dd")

);

}

}

Page 44: Spring Batch Workshop

Spring Batch Workshop

Flat file reading

Going further...

I FlatFileItemWriter to write flat file

I Fixed-length format (different tokenizer)

I Skipping badly formatted lines

Page 45: Spring Batch Workshop

Spring Batch Workshop

Skip

I Problem: my job fails miserably because of a tiny error in myinput file

I Solution: skipping lines without failing the whole execution

Page 46: Spring Batch Workshop

Spring Batch Workshop

Skip

A CSV file with a badly formatted line

I Alicia’s birth date doesn’t use the correct format:

Susy,Hauerstock,2010-03-04

De-Anna,Raghunath,2010-03-04

Kiam,Whitehurst,2010-03-04

Alecia,Van Holst,09-23-2010

Hing,Senecal,2010-03-04

Kannan,Pirkle,2010-03-04

Row,Maudrie,2010-03-04

Voort,Philbeck,2010-03-04

Page 47: Spring Batch Workshop

Spring Batch Workshop

Skip

Skip configuration

I Choose the exceptions to skip

I Set the max number of items to skip

<batch:job id="skipJob">

<batch:step id="skipStep">

<batch:tasklet>

<batch:chunk reader="reader" writer="writer" commit-interval="3"

skip-limit="10">

<batch:skippable-exception-classes>

<batch:include

class="o.s.b.item.file.FlatFileParseException"/>

</batch:skippable-exception-classes>

</batch:chunk>

</batch:tasklet>

</batch:step>

</batch:job>

Page 48: Spring Batch Workshop

Spring Batch Workshop

Skip

Going further...

I Logging skipped items with a SkipListener

I Setting a custom SkipPolicy

Page 49: Spring Batch Workshop

Spring Batch Workshop

Dynamic job parameters

I Problem: passing values to the configuration when launchinga job

I Solution: using job parameters and late binding

Page 50: Spring Batch Workshop

Spring Batch Workshop

Dynamic job parameters

Dynamically providing an input file to the item reader

I Launching the job with an input.file parameter:

JobParameters jobParameters = new JobParametersBuilder()

.addString("input.file", "file:./input/contacts-01.txt")

.toJobParameters();

JobExecution execution = jobLauncher.run(job, jobParameters);

I Referring to the input.file parameter in the configuration:

<bean id="reader"

class="org.springframework.batch.item.file.FlatFileItemReader"

scope="step">

<property name="resource" value="#{jobParameters[’input.file’]}" />

(...)

</bean>

Page 51: Spring Batch Workshop

Spring Batch Workshop

Dynamic job parameters

Going further...

I Spring Expression Language (SpEL)

I Step scope for partitioning

Page 52: Spring Batch Workshop

Spring Batch Workshop

JDBC paging

I Problem: reading large result sets from the database with astable memory footprint

I Solution: using the JdbcPagingItemReader, which usespaging to handle large result sets

Page 53: Spring Batch Workshop

Spring Batch Workshop

JDBC paging

JdbcPagingItemReader configuration

<bean id="reader"

class="o.s.b.item.database.JdbcPagingItemReader">

<property name="dataSource" ref="dataSource" />

<property name="pageSize" value="10" />

<property name="queryProvider">

<bean class="o.s.b.i.d.support.SqlPagingQueryProviderFactoryBean">

<property name="dataSource" ref="dataSource" />

<property name="selectClause"

value="select id,firstname,lastname,birth" />

<property name="fromClause" value="from contact" />

<property name="sortKey" value="id" />

</bean>

</property>

<property name="rowMapper">

<bean class="com.zenika.workshop.springbatch.ContactRowMapper" />

</property>

</bean>

Page 54: Spring Batch Workshop

Spring Batch Workshop

JDBC paging

Paging or cursors?

I By paging, you send multiple queries to the databaseI Alternative: cursor-based item reader

I Spring Batch “streams” the result set from the DBI Only one query

I Paging always works, cursor-based reader depends on driverimplementation and database

Page 55: Spring Batch Workshop

Spring Batch Workshop

JDBC paging

Going further...

I Paging readers for Hibernate, JPA, iBatis

I Cursor-based readers

Page 56: Spring Batch Workshop

Spring Batch Workshop

Execution metadata

I Problem: monitoring the execution of batch jobs

I Solution: letting Spring Batch storing execution metadata in adatabase

Page 57: Spring Batch Workshop

Spring Batch Workshop

Execution metadata

Why storing execution metadata?

I Spring Batch keeps track of batch executionI Enables:

I Monitoring by querying metadata tablesI Restarting after a failure

Page 58: Spring Batch Workshop

Spring Batch Workshop

Execution metadata

Job, job instance, and job execution

Page 59: Spring Batch Workshop

Spring Batch Workshop

Execution metadata

Job instance

I How to define a job instance?

I Thanks to job parameters

I Job parameters define the identity of the job instance

Page 60: Spring Batch Workshop

Spring Batch Workshop

Execution metadata

Where is the metadata stored?

I Metadata are stored in a databaseI In-memory implementation for test/development

I Monitoring tools can query metadata tables

Page 61: Spring Batch Workshop

Spring Batch Workshop

Execution metadata

Going further...

I Spring Batch Admin, the web console for Spring Batch

I JobExplorer and JobOperator interfaces

I Spring JMX support

Page 62: Spring Batch Workshop

Spring Batch Workshop

Scheduling

I Problem: scheduling a job to execute periodically

I Solution: using the scheduling support in Spring

Page 63: Spring Batch Workshop

Spring Batch Workshop

Scheduling

A class to launch the job

public class ImportLauncher {

public void launch() throws Exception {

JobExecution exec = jobLauncher.run(

job,

new JobParametersBuilder()

.addLong("time", System.currentTimeMillis())

.toJobParameters()

);

}

}

Page 64: Spring Batch Workshop

Spring Batch Workshop

Scheduling

Spring scheduling configuration

<bean id="importLauncher"

class="com.zenika.workshop.springbatch.ImportLauncher" />

<task:scheduled-tasks>

<task:scheduled ref="importLauncher" method="launch"

fixed-delay="1000" />

</task:scheduled-tasks>

I cron attribute available

Page 65: Spring Batch Workshop

Spring Batch Workshop

Scheduling

Going further...

I Threading settings in Spring Scheduler

I Spring support for Quartz

Page 66: Spring Batch Workshop

Spring Batch Workshop

Item processor

I Problem: I want to add some business logic before writing theitems I just read

I Solution: use an ItemProcessor to process/convert readitems before sending them to the ItemWriter

Page 67: Spring Batch Workshop

Spring Batch Workshop

Item processor

Use case

I Reading contacts from a flat fileI Registering them into the system

I This is the business logic

I Writing the registration confirmations to the database

Page 68: Spring Batch Workshop

Spring Batch Workshop

Item processor

The ItemProcessor interface

public interface ItemProcessor<I, O> {

O process(I item) throws Exception;

}

Page 69: Spring Batch Workshop

Spring Batch Workshop

Item processor

How to implement an ItemProcessor

I An ItemProcessor usually delegates to existing business code

public class ContactItemProcessor implements

ItemProcessor<Contact, RegistrationConfirmation> {

private RegistrationService registrationService;

@Override

public RegistrationConfirmation process(Contact item)

throws Exception {

return registrationService.process(item);

}

}

Page 70: Spring Batch Workshop

Spring Batch Workshop

Item processor

Registering the ItemProcessor

<batch:job id="itemProcessorJob">

<batch:step id="itemProcessorStep">

<batch:tasklet>

<batch:chunk reader="reader" processor="processor"

writer="writer" commit-interval="3"/>

</batch:tasklet>

</batch:step>

</batch:job>

<bean id="registrationService"

class="com.zenika.workshop.springbatch.RegistrationService" />

<bean id="processor"

class="com.zenika.workshop.springbatch.ContactItemProcessor">

<property name="registrationService" ref="registrationService" />

</bean>

Page 71: Spring Batch Workshop

Spring Batch Workshop

Item processor

Going further...

I Available ItemProcessor implementationsI Adapter, validator

I The ItemProcessor can filter items

Page 72: Spring Batch Workshop

Spring Batch Workshop

Logging skipped items

I Problem: logging skipped items

I Solution: using a SkipListener

Page 73: Spring Batch Workshop

Spring Batch Workshop

Logging skipped items

2 steps to log skipped items

I Writing the SkipListener (and the logging code)

I Registering the listener on the step

Page 74: Spring Batch Workshop

Spring Batch Workshop

Logging skipped items

Writing the SkipListener

package com.zenika.workshop.springbatch;

import org.slf4j.Logger;

import org.slf4j.LoggerFactory;

import org.springframework.batch.core.listener.SkipListenerSupport;

public class Slf4jSkipListener<T,S> extends SkipListenerSupport<T, S> {

private static final Logger LOG = LoggerFactory.getLogger(

Slf4jSkipListener.class);

@Override

public void onSkipInRead(Throwable t) {

LOG.warn("skipped item: {}",t.toString());

}

}

Page 75: Spring Batch Workshop

Spring Batch Workshop

Logging skipped items

Registering the SkipListener

<batch:job id="loggingSkippedItemsJob">

<batch:step id="loggingSkippedItemsStep">

<batch:tasklet>

<batch:chunk reader="reader" writer="writer" commit-interval="3"

skip-limit="10">

<batch:skippable-exception-classes>

<batch:include

class="o.s.b.item.file.FlatFileParseException"/>

</batch:skippable-exception-classes>

</batch:chunk>

<batch:listeners>

<batch:listener ref="skipListener" />

</batch:listeners>

</batch:tasklet>

</batch:step>

</batch:job>

<bean id="skipListener" class="c.z.w.springbatch.Slf4jSkipListener" />

Page 76: Spring Batch Workshop

Spring Batch Workshop

Logging skipped items

Going further...

I Other listeners in Spring BatchI ChunkListener, Item(Read/Process/Write)Listener,

ItemStream, StepExecutionListener,JobExecutionListener

Page 77: Spring Batch Workshop

Spring Batch Workshop

Logging skipped items

This is the end... or not!

I Check out the advanced workshop!

I See you and thanks!