IDML Deep Dive (Strata 2015)


IDML Deep Dive
Data preparation without the pain
Jon Davey
Strata 2015

1. Background

Remember 2011?

๏ That McKinsey whitepaper

๏ The world discovered Hadoop

๏ The first Strata

Also in 2011

๏ DataSift launched

๏ Twitter firehose re-syndication

๏ Plus a handful of other data sources

๏ 500-1,000 lines of data preparation code per source

Now

๏ More data sources - more to build and maintain

๏ Many people with an interest in how data is prepared - Support, Product, Solutions

๏ Lots of problems to solve - scaling, stability, training customers and new staff

Many stakeholders in data ingestion

๏ Support - “Why can’t customer X see field Y?”

๏ Data Science - “Is field A populated enough to be statistically significant?”

๏ Documentation - “What is the purpose of field A and how does it relate to field B?”

๏ Test - “How do we measure the entropy in random IDs so we can be sure we aren’t losing data during de-duplication after redundancy?”

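That last question can be made concrete. As a minimal sketch (illustrative Python, not DataSift's actual test code), estimate the Shannon entropy per character over a sample of IDs before and after de-duplication; a sharp drop suggests distinct records are being collapsed together:

```python
import math
from collections import Counter

def shannon_entropy(ids):
    """Bits of entropy per character across a sample of ID strings."""
    counts = Counter("".join(ids))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Random hex IDs should score far higher than near-constant ones.
random_ids = ["a9f3", "7c1e", "d402", "05bb"]
skewed_ids = ["aaaa", "aaab", "aaaa", "aaab"]
assert shannon_entropy(random_ids) > shannon_entropy(skewed_ids)
```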

Engineering challenges

๏ Detecting upstream schema changes

๏ Supporting multiple data versions

๏ Reducing boilerplate code

๏ Software reusability

IDML (Ingestion Data Mapping Language)

๏ Cleaner than a general-purpose programming language

๏ Readable by people who aren’t writing code every day

๏ Wide range of features, extensible

2. What it does

A sample preparation task: Sanitize scraped content


Data preparation can be verbose…

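The code screenshots from these slides aren't reproduced here, but the shape of the problem is familiar. A hand-written version of "sanitize scraped content" in a general-purpose language tends to be dominated by defensive guards; this Python sketch uses hypothetical field names, not DataSift's actual schema:

```python
import html
import re

def sanitize(record):
    """Hand-written preparation: every field access needs its own guard."""
    out = {}
    interaction = record.get("interaction")
    if interaction is not None:
        content = interaction.get("content")
        if content is not None and isinstance(content, str):
            # Strip tags and decode entities from scraped HTML.
            text = re.sub(r"<[^>]+>", "", content)
            text = html.unescape(text).strip()
            if text:
                out["text"] = text
        author = interaction.get("author")
        if author is not None:
            name = author.get("name") or author.get("username")
            if name:
                out["author"] = name
    return out
```

Every nesting level adds another `if`, and every new source multiplies the boilerplate; that is the cost the next slides contrast against.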

It’s simpler if you use something designed for it

IDML is designed for data preparation

3. Closer look at features

Deeply nested structures (without NPEs)

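The IDML syntax from this slide isn't reproduced, but the semantics are easy to state: navigating a deep path into a document yields "missing" rather than a NullPointerException. A rough Python equivalent (a hypothetical helper, not IDML's implementation):

```python
def dig(obj, *path):
    """Walk a nested dict/list path; return None instead of raising."""
    for key in path:
        if isinstance(obj, dict):
            obj = obj.get(key)
        elif isinstance(obj, list) and isinstance(key, int) and -len(obj) <= key < len(obj):
            obj = obj[key]
        else:
            return None
    return obj

doc = {"interaction": {"author": {"links": [{"href": "https://example.com"}]}}}
assert dig(doc, "interaction", "author", "links", 0, "href") == "https://example.com"
assert dig(doc, "interaction", "geo", "lat") is None  # missing mid-path, no exception
```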

Aliasing with coalesce

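Aliasing with coalesce means an output field takes the first source field that is actually present, which absorbs naming differences between upstream sources. Sketched in Python (illustrative, not IDML syntax):

```python
def coalesce(record, *names):
    """Return the first present, non-None value among several field names."""
    for name in names:
        value = record.get(name)
        if value is not None:
            return value
    return None

# Different upstream sources name the same thing differently.
assert coalesce({"body": "hi"}, "text", "body", "content") == "hi"
assert coalesce({}, "text", "body") is None
```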

Wide range of validation and transform functions


It’s there or it’s not - No try..catch


Lenient but consistent


The runtime figures things out


Arrays are easy to work with


Filter things


In-place validation

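The three slides above share one idea: arrays, filters, and validators compose into a pipeline in which invalid or missing elements simply drop out instead of raising. A Python approximation (the validator is a toy; IDML ships its own function library):

```python
def valid_url(value):
    """Toy validator: accept only http(s) URL strings."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))

links = ["https://a.example", "ftp://b.example", None, "http://c.example"]

# Map + filter + validate in one pass; bad entries vanish rather than erroring.
clean = [url.strip() for url in links if valid_url(url)]
assert clean == ["https://a.example", "http://c.example"]
```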

Other features

๏ Detects fields that have not been mapped, making it easy to find data that’s not understood

๏ Generates metrics about why a rule failed

๏ A uniform interface allows the same syntax for JSON and XML
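The first feature, detecting unmapped fields, can be approximated by enumerating every leaf path in an incoming record and diffing against the paths a mapping consumes. This is hypothetical Python for the idea, not how IDML actually implements it:

```python
def paths(obj, prefix=""):
    """Enumerate every leaf path in a nested dict as dotted strings."""
    if not isinstance(obj, dict):
        return {prefix}
    found = set()
    for key, value in obj.items():
        found |= paths(value, f"{prefix}.{key}" if prefix else key)
    return found

record = {"interaction": {"content": "hi", "author": {"name": "jd"}}, "geo": {"lat": 1.0}}
mapped = {"interaction.content", "interaction.author.name"}

# Anything left over is upstream data the mapping never touched.
assert paths(record) - mapped == {"geo.lat"}
```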

4. Where it fits

Multiple deployment patterns

๏ Deployable as a standalone service

๏ Usable as a library:

  ๏ Kafka consumer

  ๏ MapReduce mapper

  ๏ NSQ consumer

  ๏ Amazon SQS consumer

๏ Command line, including a REPL

Performance

๏ It’s an interpreter, so it’s noticeably slower than hand-written code in contrived benchmarks

๏ In real cases, IO has usually been the bottleneck

๏ Unstructured data is inherently suboptimal - dynamic structures like JsonNode are backed by HashMaps and trees

๏ One day it might be faster. Runtimes can often be optimized in smarter ways: consider why Java can be faster than C++ at virtual method calls

Open sourcing it soon

๏ May be rebranded as Ptolemy

๏ Support for JSON and XML (and SGML - don’t ask)

๏ May improve any of these areas, depending on interest:

  ๏ Performance

  ๏ More input and output types

  ๏ More integrations: Spark, Kinesis

๏ Would you use it on your own projects? Would you help?

QUESTIONS?

THANK YOU!
