Architecture of Wemlin Hub 22 December 2013 – Ognen Ivanovski & Goran Cvetkoski

Presentation about the architecture of Wemlin Hub, part of the Wemlin product.

Page 1: Architecture of Wemlin Hub

Architecture of Wemlin Hub

22 December 2013 – Ognen Ivanovski & Goran Cvetkoski

Page 2: Architecture of Wemlin Hub

Netcetera | 2

What’s Wemlin?

Wemlin provides access to public transport information – easy, fast and independent of time and place

iOS, Android, Windows Phone, Web

Page 3: Architecture of Wemlin Hub

Wemlin Hub in a nutshell

Page 4: Architecture of Wemlin Hub

Wemlin Hub – non-functional requirements

- Wemlin Hub shall be a high-performance, parallelized message processing system
  - low latency – good processing speed
  - throughput – number of messages we can process
  - available – zero downtime!
  - good disaster recovery
  - scalability (horizontal, vertical)
  - modular – component based
  - extensible – 80% usage pattern
  - flexible – adapt to any infrastructure with minimal effort

Page 5: Architecture of Wemlin Hub

What is Software Architecture?

?

Page 6: Architecture of Wemlin Hub

Is this Software Architecture?

Page 7: Architecture of Wemlin Hub

Is this Software Architecture?

Page 8: Architecture of Wemlin Hub

Software Architecture is…

The decisions about software that are hard to change

E.g.
- use of Joda-Time vs. java.util.Date
- what kind of database you will use
- GWT vs. AngularJS

Encapsulation

Page 9: Architecture of Wemlin Hub

Wemlin Hub Architecture

Model
- pure Java (no dependencies)
- well-defined, extensible classes
- immutable (like Joda-Time, every modification produces a new object)
- algebraic
- inverse references

Pipeline
- compositional (all components are wired together using a fixed set of well-defined interfaces)
  - Filter (stateless, function)
  - Transformer (stateless, function)
  - Aggregator (stateful, function)
  - Sink (consumer)
  - Tap (producer)

Page 10: Architecture of Wemlin Hub

Pure Model

- no external dependency
- design not influenced by any technology, e.g. Hibernate, RDBMS, MVC

Page 11: Architecture of Wemlin Hub

Algebraic, Immutable Model

- algebraic: each object's identity is defined by its contents.

- immutable: each object, once created, cannot be modified. Each part of the pipeline must copy-and-modify each object to perform its processing. This characteristic enables easier reasoning about concurrency.

- metadata: classes support arbitrary metadata expressed as key-value pairs. Metadata does not take part in the definition of the object's identity. This allows encoding of format specific information in the model, which can be used in the pipeline.
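
The three properties above can be sketched in plain Java. This is only an illustration under assumptions: the class `Stop`, its fields and the `with...` methods are hypothetical and are not taken from the Wemlin Hub model.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model class; the real Wemlin Hub classes are not shown in the slides.
final class Stop {
    private final String id;
    private final String name;
    private final Map<String, String> metadata;

    Stop(String id, String name, Map<String, String> metadata) {
        this.id = Objects.requireNonNull(id);
        this.name = Objects.requireNonNull(name);
        // Defensive copy, so later changes to the caller's map cannot leak in.
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }

    String id() { return id; }
    String name() { return name; }
    Map<String, String> metadata() { return metadata; }

    // Copy-and-modify: returns a new object and leaves this one untouched.
    Stop withName(String newName) {
        return new Stop(id, newName, metadata);
    }

    Stop withMetadata(String key, String value) {
        Map<String, String> copy = new HashMap<>(metadata);
        copy.put(key, value);
        return new Stop(id, name, copy);
    }

    // Algebraic identity: defined by contents only; metadata is deliberately excluded.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Stop)) return false;
        Stop that = (Stop) other;
        return id.equals(that.id) && name.equals(that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }
}
```

Two `Stop` objects with the same id and name compare equal even if one carries format-specific metadata, which is the behaviour the metadata bullet describes.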

Page 12: Architecture of Wemlin Hub

Compositional

pipeElement = PipelineBuilder.from(gtfsInputJunction())
    .transform(new StoppingPlaceResolver())
    .filter(new InvalidStopsFilter())
    .transform(new UnresolvedLineVehicleTypeAdder())
    .transform(new UnresolvedLineNameAdjuster())
    .transform(new LineResolver())
    .transform(new LineColorsEnricher(…))
    .aggregate(new CacheAggregator(cache()))
    .to(nullSink());

Page 13: Architecture of Wemlin Hub

Pipeline: Stateless, Functions

boolean accept(Object obj);

Object transform(Object original);

Optional<?> aggregate(Object obj);

filter, transform are referentially transparent
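
A minimal sketch of how these three signatures could be chained. Only the three method signatures come from the slide. The interface names are taken from the component list earlier in the deck, while `Sink.consume`, `MiniPipeline` and the rule that an empty `Optional` stops a message are assumptions made for illustration, not the actual `PipelineBuilder`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

interface Filter { boolean accept(Object obj); }
interface Transformer { Object transform(Object original); }
interface Aggregator { Optional<?> aggregate(Object obj); }
interface Sink { void consume(Object obj); } // method name is an assumption

// Hypothetical mini-pipeline; not the real PipelineBuilder.
final class MiniPipeline {
    // Every stage either passes a value on or drops the message (empty Optional).
    private interface Stage { Optional<?> apply(Object obj); }

    private final List<Stage> stages = new ArrayList<>();

    MiniPipeline filter(Filter f) {
        stages.add(obj -> {
            if (f.accept(obj)) return Optional.of(obj);
            return Optional.empty();
        });
        return this;
    }

    MiniPipeline transform(Transformer t) {
        // Assumes transformers never return null.
        stages.add(obj -> Optional.of(t.transform(obj)));
        return this;
    }

    MiniPipeline aggregate(Aggregator a) {
        // Assumption: an aggregator returns empty until it has something to emit.
        stages.add(a::aggregate);
        return this;
    }

    void push(Object message, Sink sink) {
        Object current = message;
        for (Stage stage : stages) {
            Optional<?> next = stage.apply(current);
            if (!next.isPresent()) return; // message dropped or still being aggregated
            current = next.get();
        }
        sink.consume(current);
    }
}
```

Because filters and transformers are plain functions without state, the same instances can be shared across threads; only aggregators need to care about synchronization.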

Page 14: Architecture of Wemlin Hub

Pipeline: Implementation

Page 15: Architecture of Wemlin Hub

Pipeline: Typical Hub

Page 16: Architecture of Wemlin Hub

Memoization

Immutable Model: each object, once created, cannot be modified.

Problem: High memory consumption, because many objects are created with the same contents

Solution: Memoization

When a factory method is executed, for example:

Station.get("1", "St. Gallen, Bahnhof");

a global cache of objects is searched to check whether an object with the specified data already exists.

If the object exists, it is returned; if not, a new object is created and stored in the cache for future use.
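
The lookup described above can be sketched without any framework. The two-argument `Station.get` matches the call shown on this slide; the cache field, the key layout and the use of `ConcurrentHashMap` are assumptions for illustration, not the Wemlin Hub implementation (which uses an aspect, as the next slides show).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class Station {
    // Hypothetical global cache, keyed by the factory arguments.
    private static final Map<List<String>, Station> CACHE = new ConcurrentHashMap<>();

    private final String referenceId;
    private final String name;

    // Private constructor: instances can only be obtained through get(...).
    private Station(String referenceId, String name) {
        this.referenceId = referenceId;
        this.name = name;
    }

    // Returns the cached instance for these arguments, creating it on first use.
    static Station get(String referenceId, String name) {
        return CACHE.computeIfAbsent(
                Arrays.asList(referenceId, name),
                key -> new Station(referenceId, name));
    }

    String referenceId() { return referenceId; }
    String name() { return name; }
}
```

Memoization is safe here only because the objects are immutable: a shared instance can never be changed behind another holder's back. This sketch never evicts entries; whether and how the real cache is cleaned up is not stated in the slides.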

Page 17: Architecture of Wemlin Hub

Memoization (2)

Implementation:

Constructors are made private and replaced by annotated factory methods:

@DesignatedFactoryMethod
public static Station get(Map<String, ? extends Serializable> attributes, String referenceId,
        String name, String localName, String place, GeoPoint location) {
    return new Station(attributes, referenceId, name, localName, place, location);
}

Page 18: Architecture of Wemlin Hub

Memoization (3)

@Pointcut("execution(@com.wemlin.hub.memoization.annotations.DesignatedFactoryMethod * *(..))")
public void designatedFactoryMethodPointcut() {}

@Around("designatedFactoryMethodPointcut()")
public Object handleMemoize(final ProceedingJoinPoint pjp) throws Exception {
    // search for object with the specified factory parameters in cache;
    // if it exists, return it; if not, let the factory method create it
    // and store the new object in the cache for future use
}

Impact: 500 to 1000 times fewer objects created (depending on how much data is processed)

Page 19: Architecture of Wemlin Hub

Modularization – components architectural constraints

- The following architectural constraints define a module in Wemlin Hub:
  - it is a Maven module, a jar or web fragment
  - it is not allowed to use Spring annotations for injection, i.e. all injection is done via constructors
  - a module provides components, only a few (up to 4), and facades for the 80% usage pattern
  - each component is allowed to hook into the Wemlin pipeline only through the predefined pipeline interfaces: Filter, Transformer, Aggregator, input component, output component
  - components are Spring independent, as far as possible. They may implement some Spring interfaces, but as few as possible, and provide means to achieve the same functionality without Spring.
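
A sketch of a component that follows these constraints: no Spring imports, collaborators passed through the constructor, a pipeline interface as the only hook, and a small facade for the common case. The names `AllowedAgencyFilter` and `AgencyModule` are hypothetical; only the `Filter.accept` signature comes from the "Pipeline: Stateless, Functions" slide.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Pipeline interface with the signature shown in the deck.
interface Filter {
    boolean accept(Object obj);
}

// Hypothetical component: everything it needs arrives via the constructor.
final class AllowedAgencyFilter implements Filter {
    private final Set<String> allowedAgencies;

    AllowedAgencyFilter(Set<String> allowedAgencies) {
        this.allowedAgencies = Collections.unmodifiableSet(new HashSet<>(allowedAgencies));
    }

    @Override
    public boolean accept(Object obj) {
        return obj instanceof String && allowedAgencies.contains(obj);
    }
}

// Hypothetical facade for the 80% usage pattern: one call, sensible wiring.
final class AgencyModule {
    private AgencyModule() {}

    static Filter filterFor(String... agencies) {
        return new AllowedAgencyFilter(new HashSet<>(Arrays.asList(agencies)));
    }
}
```

Since the component is wired through its constructor, it can be registered as a Spring bean in a configuration class or instantiated directly in a unit test, with no container involved.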

Page 20: Architecture of Wemlin Hub

Station resolver module (1)

- The Wemlin Hub Station resolver module resolves stations with the help of the reference station list
- The reference station list contains all CH stations listed in the “Stationsnamen Fahrplan und Antragsformular für Mutationen“ http://www.bav.admin.ch/dokumentation/publikationen/00475/01497/index.html
- We use the reference station list primarily to match references to stations in incoming data (HAFAS, VDV, GTFS) to known stations for which we fully control the names, have the coordinates and other meta-data that can be associated with them.
- The list is a well-defined JSON file that lists
  - attributes of the stations (full name, local name, place, coordinates, agency etc.)
  - their respective referenceId
  - a set of rules that may be used to match the station in incoming data
  - connection areas

Page 21: Architecture of Wemlin Hub

Station Resolver (2)

- Every station is resolved by a set of rules:
  - General rules
  - Station specific rules
- General rules
  - station id (optimal)
  - station similarity – we use Apache Lucene to search for similar names, in combination with a coordinate distance tolerance
- Station specific rules
  - matchByRegex – when the station has a different id from the one we have, and the name we get is also slightly different

Example: Bahnhof, Esslingen:

Esslingen Bhf, Esslingen Bhf., Esslingen etc
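
A sketch of what a station-specific regex rule might look like for the Esslingen example. The class `RegexStationRule`, the reference id `esslingen-ref` and the pattern itself are assumptions; the actual rule format in the reference station list is not shown in the slides.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical station-specific rule: maps name variants to one referenceId.
final class RegexStationRule {
    private final String referenceId;
    private final Pattern pattern;

    RegexStationRule(String referenceId, String regex) {
        this.referenceId = referenceId;
        this.pattern = Pattern.compile(regex);
    }

    // Returns the referenceId if the whole incoming name matches the rule.
    Optional<String> match(String incomingName) {
        if (pattern.matcher(incomingName.trim()).matches()) {
            return Optional.of(referenceId);
        }
        return Optional.empty();
    }

    // Example rule covering "Esslingen", "Esslingen Bhf" and "Esslingen Bhf.".
    static RegexStationRule esslingenExample() {
        return new RegexStationRule("esslingen-ref", "Esslingen( Bhf\\.?)?");
    }
}
```

Using `matches()` rather than `find()` keeps the rule strict: a name such as "Esslingen Dorf" would not be resolved to the station by accident.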

Page 22: Architecture of Wemlin Hub

Cache

- we don’t use a DB:
  - all data is time bounded, i.e. all data we keep is temporary (daily)
- our choice was an in-memory cache
  - very simple cache based on Java maps
  - no third party cache libraries are involved
- the cache is easy to browse via the cache browser component
- a few implementations are available:
  - standard cache browser
  - forwarding cache browser
  - lazy loading cache browser
- all cache browsers can define filters
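
A sketch of a map-based cache with a filterable browse operation, roughly in the spirit of the bullets above. `InMemoryCache`, `browse` and `clear` are hypothetical names; the slides do not show the cache or cache browser interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical cache built on a plain Java map, with no third party library.
final class InMemoryCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    void put(K key, V value) {
        entries.put(key, value);
    }

    V get(K key) {
        return entries.get(key);
    }

    // Browsing with a filter: returns a snapshot of the values that pass it.
    List<V> browse(Predicate<? super V> filter) {
        List<V> result = new ArrayList<>();
        for (V value : entries.values()) {
            if (filter.test(value)) {
                result.add(value);
            }
        }
        return result;
    }

    // Since the data is temporary, the whole cache can simply be dropped,
    // for example once a day (the schedule is an assumption).
    void clear() {
        entries.clear();
    }
}
```

Because the cached model objects are immutable, handing them out to browsers or REST handlers needs no copying or locking beyond what the map already provides.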

Page 23: Architecture of Wemlin Hub

REST

- all data that is in the cache is available via the REST API
- there are two versions of the API available:
  - legacy API (V0): Wemlin clients still operate with this one
  - V2 API according to the new transport.opendata.ch specification – some of the customers have started using it
- both APIs support pretty much the same things:
  - location listings – cities, stations
  - line listings – all lines that operate within a network
  - trips – for a given period of time
  - departures – with realtime prognosis

Page 24: Architecture of Wemlin Hub

Microkernel architectural style (illustration)

Page 25: Architecture of Wemlin Hub

Wemlin Hub - Blue-Green deployment

- Requirement: ensure zero downtime of the system.
- Decision: blue-green deployment
  - two identical production environments
  - the reverse proxy in front of the machines resolves to one of them, depending on which one is configured as active
  - test on the “idle” server before go-live
  - switch the proxy to the tested instance – the other one is now “idle”
- Advantages:
  - ensure zero downtime of the system
  - easy rollback if anything goes wrong

Page 26: Architecture of Wemlin Hub

Wemlin Hub today

- currently 4 customers (all in Switzerland)
  - they cover nearly 40% of the transport in all of Switzerland, including Liechtenstein
  - 17 agencies, 14 with realtime data
- daily processing load
  - around 33’000 trips, 571’000 stops
  - 2’500 projections per second in peak hours (at the moment; not the actual capacity of the system)
- offline transport data conversion – contract with Google for Switzerland
  - we convert the Swiss yearly transport schedule (over 400 agencies, ~1GB data) to GTFS (General Transit Feed Specification) format for Google Maps usage
  - conversion takes ~20min

Page 27: Architecture of Wemlin Hub

Contact

Ognen Ivanovski, chief architect, Netcetera
[email protected]

Goran Cvetkoski, senior software engineer, Netcetera
[email protected]

Page 28: Architecture of Wemlin Hub

Q&A?