Vinicius Carvalho - Advisory Platform Architect @Pivotal @vccarvalho
http://github.com/viniciusccarvalho/

Schema Evolution for Resilient Data Microservices


The way we build software

Servlet → JSP → Struts → JSF → Spring → GWT → Angular (time →)

Customer → DB

Java Serialization / XML / JSON

Format evolution

What happens when data evolves?

How do we handle versioning?

Monolithic architectures

Account

User

Product

Order

Common jar

<dependency> <groupId>com.acme</groupId>

<artifactId>common-domain</artifactId> <version>1.1.3</version>

</dependency>

RECOMMENDATION

SEARCH CATALOG

Updates?

1.1.3 1.1.1

0.9.5

Enterprise Service Bus

RECOMMENDATION

SEARCH CATALOG

Canonical Message Format

Haven’t we solved this already?

Yes, But …

• The majority of ESB systems use XML as the canonical model

• XML is good for structure, but it has no notion of evolution

• It’s heavy

And then there’s this µService thing

Bounded Contexts

Context Maps

Aggregates

Value Objects

Anti-corruption Layer

Ownership

Sam Newman’s Building Microservices

Who owns this?

MSDN CQRS Journey ebook

CQRS?

Data evolution…

• Data evolution is a hard problem to grasp

• Even in well-known territory such as a traditional RDBMS, it is a hard problem to tackle

Schema Evolution

“The problem of evolving a data schema to adapt it to a change in modeled reality”

Services Evolution

behavioral: new functions are added to the system

structural: the information model changes over time

Backward compatibility

• Newer version can read old version

• Challenges:

Field renaming

V1

V2

Forward compatibility

• Older version can read new version

• Challenges:

Field renaming

Field removal

V1

V2
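To make the two challenge lists concrete, here is a sketch in Avro schema syntax (the format the deck settles on later); the acme.User record and its fields are hypothetical. V1:

```json
{ "type": "record", "name": "User", "namespace": "acme",
  "fields": [
    { "name": "id", "type": "string" }
  ] }
```

V2 adds a field with a default:

```json
{ "type": "record", "name": "User", "namespace": "acme",
  "fields": [
    { "name": "id",    "type": "string" },
    { "name": "email", "type": "string", "default": "" }
  ] }
```

Because email carries a default, a V2 reader can fill it in when decoding V1 data (backward compatible), and a V1 reader simply ignores it in V2 data (forward compatible). A rename, by contrast, looks like a removal plus an addition, which is why it shows up under both challenge lists (Avro aliases can paper over renames).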

Request / Response

User → RECOMMENDATION: GET /v1/…

User → RECOMMENDATION: GET /v2/…

Data Streaming

▪@EnableBinding(Source.class) ▪one output

▪@EnableBinding(Sink.class) ▪one input

▪@EnableBinding(Processor.class) ▪one input and one output

▪@EnableBinding(MyOrderHandler.class) ▪custom interfaces with as many inputs and outputs as needed

▪@EnableRxJavaProcessor ▪OOTB support for RxJava with one input and one output
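As a sketch of the custom-interface case above (assuming the Spring Cloud Stream 1.x annotation API; the OrderChannels interface, its channel names, and OrderAuditor are made-up stand-ins for MyOrderHandler):

```java
// Hypothetical binding interface: one input plus one output channel.
public interface OrderChannels {

    @Input("orders")
    SubscribableChannel orders();

    @Output("audited-orders")
    MessageChannel auditedOrders();
}

// Activating the interface binds both channels to the configured middleware.
@EnableBinding(OrderChannels.class)
public class OrderAuditor {

    @StreamListener("orders")
    @SendTo("audited-orders")
    public Order audit(Order order) {
        // enrich / validate here, then forward downstream
        return order;
    }
}
```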

@Enable All the things

Binder SPI

Formats: Choosing the right one

Structure: guarantees a contract between users of the model

Adaptability: how flexible the format is to changes in its structure

Benchmarking: Because … we love it

Format | Structure | Adaptability
CSV  | positional, no type definition | possible if appending new columns
XML  | flexible, strongly typed | append; remove only via version (no standard); supports defaults
JSON | flexible, untyped | append and remove handled by the parser; no support for defaults
Avro | flexible, strongly typed | append and remove, supports defaults; version is built in

Payload

public class Sensor {
    private String id;
    private float temperature;
    private float velocity;
    private float acceleration;
    private float[] accelerometer;
    private float[] magneticField;
    private float[] orientation;
}

How much do you weigh?

Payload size (bytes): Avro 93, JSON 237, XML 514

How fast?

Read / write throughput: Avro 3,483 / 3,433; JSON 1,300 / 1,333

Features

✓Compact

✓Strongly typed

✓Adaptable

✓Versioned

✓Centralized

Implementation: Finally we get to see something concrete

Spring Cloud Stream

Source

spring:
  cloud:
    stream:
      bindings:
        output:
          destination: sensor-topic
          contentType: "avro/binary"

WORK IN PROGRESS

Activates the converter

Avro Converter
• Scans the classpath for *.avsc files and registers them

• During writes, infers the schema from the payload (SpecificDatum, GenericDatum, Reflection)

• During reads, uses message headers to discover the schema being used

Source Sink

Content-Type: avro/binary
X-Schema-Name: acme.User

Headers

Avro Converter

- Demo -

Avro Converter

Good, but …

Avro Converter
• Each component still needs the avsc file

• Avro versioning only works if both writer and reader schemas are available

• Transmitting the schema with the message is an overhead

Schema registry
• Centralized store for schemas

• Idempotent registration (the same schema payload always returns the same id)

• Compatibility test

• Schema utilization

Idempotent registration

[Diagram: three identical "user" schema registrations sent to the Schema Registry all resolve to version 1; a changed "user" schema registers as version 2]
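The idempotency above can be sketched with a toy in-memory registry (plain Java; this is not the actual Spring Cloud registry code, and the names are illustrative): registering the same schema payload any number of times yields the same version, while a changed payload gets a new one.

```java
import java.util.*;

// Minimal sketch of idempotent schema registration. A real registry would
// parse the schema and canonicalize it properly; here we just strip whitespace.
public class SchemaRegistrySketch {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> schemas = new ArrayList<>();

    // Returns the existing id when the (subject, schema) pair was seen before,
    // otherwise assigns and returns the next id.
    public synchronized int register(String subject, String schemaText) {
        String key = subject + ":" + canonicalize(schemaText);
        return ids.computeIfAbsent(key, k -> {
            schemas.add(schemaText);
            return schemas.size(); // ids start at 1
        });
    }

    // Naive canonical form so that formatting differences do not create new ids.
    private static String canonicalize(String text) {
        return text.replaceAll("\\s+", "");
    }
}
```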

Schema registry
• Allows developers to check whether a new schema would break existing ones in the registry

• BACKWARD: new schema can read old versions

• FORWARD: old schema can read new versions

• FULL: BACKWARD + FORWARD
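The three modes can be sketched with a toy check (plain Java; a real registry applies Avro's full schema-resolution rules). A schema is reduced here to a map of field name → has-default, and the only rule applied is the Avro one relevant to these modes: every reader field missing from the writer must carry a default.

```java
import java.util.*;

// Toy compatibility check over simplified schemas (field name -> has default),
// mirroring the registry's BACKWARD / FORWARD / FULL modes.
public class CompatSketch {

    // Can a reader with these fields decode data written with the writer's fields?
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            if (!writer.containsKey(field.getKey()) && !field.getValue()) {
                return false; // reader field absent from writer, and no default to fall back on
            }
        }
        return true;
    }

    static boolean backward(Map<String, Boolean> newSchema, Map<String, Boolean> oldSchema) {
        return canRead(newSchema, oldSchema); // new schema reads old data
    }

    static boolean forward(Map<String, Boolean> newSchema, Map<String, Boolean> oldSchema) {
        return canRead(oldSchema, newSchema); // old schema reads new data
    }

    static boolean full(Map<String, Boolean> newSchema, Map<String, Boolean> oldSchema) {
        return backward(newSchema, oldSchema) && forward(newSchema, oldSchema);
    }
}
```

Under this rule, adding a defaulted field is FULL-compatible, while adding a field without a default breaks BACKWARD but not FORWARD.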

Schema utilization

{
  "registrations": [
    { "application-name": "user-producer", "type": "source" },
    { "application-name": "user-enricher", "type": "processor" },
    { "application-name": "user-filter", "type": "processor" }
  ]
}

GET /schemas/user/{version}

Schema registry

- Demo -

Sink

Content-Type: avro/binary
X-Schema-Id: 17

Headers

Writer’s schema

spring:
  cloud:
    stream:
      bindings:
        input:
          destination: sensor-topic
          schema: "org.acme.Sensor"

Reader’s schema

Source → Processor → Sink

1. The source registers the schema with the Schema Registry and obtains its id before sending the payload

2. The receiver reads the headers and fetches the writer's schema from the Schema Registry

Stream

Content-Type: avro/binary
X-Schema-Id: 17

Headers

References
• Martin Kleppmann, Schema evolution in Avro, Protocol Buffers and Thrift: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html

• Designing Data-Intensive Applications by Martin Kleppmann: http://dataintensive.net/

• The CQRS Journey: https://msdn.microsoft.com/en-us/library/jj554200.aspx

• Oracle NoSQL Database schema evolution: https://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/schemaevolution.html

• Building Microservices by Sam Newman: http://samnewman.io/books/building_microservices/

• Apache Avro: https://avro.apache.org/docs/1.7.7/gettingstartedjava.html

• https://github.com/viniciusccarvalho/schema-evolution-samples