Bubbles – Virtual Data Objects


Description

Bubbles is a data framework for creating data processing and monitoring pipelines.



June 2013, Stefan Urbanek

data brewery

Contents

■ Data Objects

■ Operations

■ Context

■ Stores

■ Pipeline

Brewery 1 Issues

■ based on streaming data by records: buffering in Python lists as Python objects

■ stream networks were using threads: hard to debug, performance penalty (GIL)

■ no use of native data operations

■ difficult to extend

About

Python framework for data processing and quality probing

Python 3.3

Objective

focus on the process, not the data technology

Data

■ keep data in their original form

■ use native operations if possible

■ performance provided by technology

■ have other options

for categorical data*

* you can do numerical too, but there are plenty of other, better tools for that

Data Objects

a data object represents structured data

Data do not have to be in their final form, nor do they even have to exist yet. A promise of providing the data in the future is just fine. Data are virtual.

[Diagram: a virtual data object is described by its fields (id, product, category, amount, unit price) and backed by one or more representations of the virtual data, such as an SQL statement or an iterator.]

Data Object

■ is defined by fields

■ has one or more representations

■ might be consumable: one-time use objects such as streamed data

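To make the idea concrete, here is a minimal sketch of such an object; the class and method names are illustrative, not the actual Bubbles API:

# Minimal sketch of a "virtual" data object; names are illustrative,
# not the real Bubbles classes.
class SQLTableObject:
    """Data object backed by a table in a relational database."""

    def __init__(self, connection, table, fields):
        self.connection = connection    # live DB connection: a promise of data
        self.table = table
        self.fields = fields            # field metadata only, no data loaded

    def representations(self):
        # ordered from most natural/efficient to most generic
        return ["sql", "rows"]

    def sql_statement(self):
        # composable representation: no rows are fetched here
        return "SELECT {} FROM {}".format(", ".join(self.fields), self.table)

    def rows(self):
        # generic, potentially expensive representation: actually fetch rows
        return iter(self.connection.execute(self.sql_statement()))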

Fields

■ define structure of data object

■ storage metadata: generalized storage type, concrete storage type

■ usage metadata: purpose (analytical point of view), missing values, ...

name:             id        product       category  amount    unit price  year     shipped
storage type:     integer   string        string    integer   float       integer  string
analytical type:  typeless  nominal       nominal   discrete  measure     ordinal  flag
sample row:       100       Atari 1040ST  computer  10        400.0       1985     no
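
Pictured in code, the per-field metadata amounts to a small record; the sketch below uses a plain dataclass as an illustrative stand-in for the framework's field metadata:

from dataclasses import dataclass

# Illustrative stand-in for per-field metadata; not the actual Bubbles Field class.
@dataclass
class Field:
    name: str              # e.g. "unit price"
    storage_type: str      # generalized storage type, e.g. "float"
    analytical_type: str   # purpose / analytical point of view, e.g. "measure"

fields = [
    Field("id", "integer", "typeless"),
    Field("product", "string", "nominal"),
    Field("category", "string", "nominal"),
    Field("amount", "integer", "discrete"),
    Field("unit price", "float", "measure"),
    Field("year", "integer", "ordinal"),
    Field("shipped", "string", "flag"),
]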

Field List

[Diagram: the sample table above annotated with field name, storage type and analytical type (purpose); the values shown are sample metadata.]

Representations

■ SQL statement (can be composed): SELECT * FROM products WHERE price < 100

■ iterator (actual rows fetched from the database): engine.execute(statement)

Representations

■ represent actual data in some way: SQL statement, CSV file, API query, iterator, ...

■ decided at runtime: the list might be dynamic, based on metadata, availability, ...

■ used for data object operations: filtering, composition, transformation, ...

Representations

■ SQL statement: natural, most efficient for operations

■ iterator: default, all-purpose, might be very expensive

Representations

>>> object.representations()
["sql_table", "postgres+sql", "sql", "rows"]

the data might have been cached in a table ("sql_table"), we might use PostgreSQL dialect-specific features ("postgres+sql"), or fall back to generic SQL ("sql") or plain rows ("rows") for all other operations
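
A consumer typically walks this list in order and takes the first representation it can handle; a minimal illustrative helper (not Bubbles code):

# Illustrative helper: pick the first representation of an object that the
# caller can handle, preferring the object's own (most natural) ordering.
def best_representation(obj, supported):
    for rep in obj.representations():
        if rep in supported:
            return rep
    raise TypeError("no common representation, expected one of {}".format(supported))

# e.g. best_representation(object, {"sql", "rows"}) -> "sql"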

Data Object Role

■ source: provides data; various source representations such as rows()

■ target: consumes data; append(row), append_from(object), ...

target.append_from(source)

for row in source.rows():
    print(row)

the implementation of append_from() might depend on the source

Append From ...

Iterator → SQL:

for row in source.rows():
    INSERT INTO target (...)

SQL → SQL (same engine):

INSERT INTO target SELECT ... FROM source
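
A sketch of how a target's append_from() might choose between the two strategies, reusing the shape of the earlier illustrative SQL object; the same-connection check and attribute names are assumptions:

# Illustrative append_from(): stay inside the database when possible,
# otherwise fall back to streaming rows through Python.
def append_from(target, source):
    if "sql" in source.representations() and target.connection is source.connection:
        # same engine: compose one INSERT ... SELECT statement, no rows in Python
        statement = "INSERT INTO {} {}".format(target.table, source.sql_statement())
        target.connection.execute(statement)
    else:
        # generic fallback: stream rows one by one
        for row in source.rows():
            target.append(row)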

Operations

Operation

an operation does something useful with a data object and produces another data object, or something else that is also useful

Signature

@operation("sql")
def sample(context, object, limit):
    ...

the decorator argument is the operation's signature: the accepted representation of the operand

@operation

unary:

@operation("sql")
def sample(context, object, limit):
    ...

binary:

@operation("sql", "sql")
def new_rows(context, target, source):
    ...

binary with the same name but a different signature:

@operation("sql", "rows", name="new_rows")
def new_rows_iter(context, target, source):
    ...
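
To make the registration pattern concrete, here is a self-contained sketch: a stand-in operation decorator (the real framework provides its own) plus a sample operation implemented once per representation; the bodies and the clone_with_statement helper are illustrative assumptions, not the library's code:

from itertools import islice

# Stand-in registry so this sketch is self-contained; the real framework
# provides its own @operation decorator and registry.
_OPERATIONS = {}

def operation(*signature, name=None):
    def register(func):
        _OPERATIONS[(name or func.__name__, signature)] = func
        return func
    return register

@operation("sql")
def sample(context, obj, limit):
    # compose a new SQL statement; no rows are fetched here
    statement = "SELECT * FROM ({}) AS s LIMIT {}".format(obj.sql_statement(), limit)
    return obj.clone_with_statement(statement)   # hypothetical helper

@operation("rows", name="sample")
def sample_rows(context, obj, limit):
    # generic fallback: take the first `limit` rows from the iterator
    return list(islice(obj.rows(), limit))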

List of Objects

@operation("sql[]")
def append(context, objects):
    ...

@operation("rows[]")
def append(context, objects):
    ...

matches one of the common representations of all objects in the list

Any / Default

@operation("*")
def do_something(context, object):
    ...

default operation: used if no signature matches

Context

Context

a context is a collection of operations

[Diagram: one context holding operation variants registered for signatures such as SQL, iterator, and Mongo ✽.]

Operation Call

context = Context()
context.operation("sample")(source, 10)

context.operation("sample") returns a callable reference; the concrete sample variant (iterator ⇢ iterator, SQL ⇢ SQL, ...) is chosen by runtime dispatch when it is called

Simplified Call

context.operation("sample")(source, 10)

context.o.sample(source, 10)

Dispatch

the operation is chosen based on signature. Example: we do not have this kind of operation for MongoDB, so we use the default iterator variant instead.

Dispatch

dynamic dispatch of operations is based on the representations of the argument objects

Priority: the order of representations matters and might be decided at runtime; the same representations in a different order can select a different operation variant
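
A minimal sketch of such representation-based dispatch for a unary operation, using a (name, signature) registry like the one in the earlier sketch; the real dispatcher is more general (multiple arguments, runtime priorities):

# Illustrative unary dispatch: walk the object's representations in priority
# order and pick the first registered variant whose signature matches.
def dispatch(name, obj, operations):
    """operations: mapping of (name, signature) -> function."""
    for rep in obj.representations():
        func = operations.get((name, (rep,)))
        if func is not None:
            return func
    # fall back to a default variant registered with the "*" signature
    default = operations.get((name, ("*",)))
    if default is not None:
        return default
    raise LookupError("no operation {!r} for representations {}".format(
        name, obj.representations()))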

Incapable?

[Diagram: join detail over two SQL objects. When both objects use the same connection (A and A), the SQL variant can compose a single statement; with objects from different connections (A and B), the same composition fails.]

Retry!

if objects are not compose-able as expected, the operation might gently fail and request a retry with another signature:

raise RetryOperation("rows", "rows")

[Diagram: the failed SQL/SQL join detail of objects A and B from different connections is retried with the (rows, rows) signature, i.e. through iterators.]
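
Sketched in code, an operation that gives up and asks for a retry might look like this; only raise RetryOperation("rows", "rows") is taken from the slide, while the same_engine() check and the compose/clone helpers are hypothetical:

# Sketch of an operation requesting a retry with another signature.
# RetryOperation and @operation come from the framework; the rest is illustrative.
@operation("sql", "sql")
def join_detail(context, master, detail, master_key, detail_key):
    if not master.same_engine(detail):          # hypothetical check
        # cannot compose a single SQL statement across connections:
        # gently fail and ask for the ("rows", "rows") variant instead
        raise RetryOperation("rows", "rows")
    statement = compose_join(master, detail, master_key, detail_key)  # hypothetical
    return master.clone_with_statement(statement)                     # hypothetical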

Retry when...

■ not able to compose objects: because of different connections or other reasons

■ not able to use a representation as expected

■ any other reason

Modules

a module is a collection of operations

[Diagram: operation modules for SQL, Iterator and MongoDB (just an example), each providing variants for its own representations.]

Extend Context

context.add_operations_from(obj)

obj can be any object that has operations as attributes, such as a module
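
For example, a sketch of extending a context from a module of operations; my_sql_ops is a hypothetical module name, while Context() and add_operations_from() appear on the slides:

# Sketch: register every @operation-decorated function found in a module.
import my_sql_ops          # hypothetical module of operations

context = Context()
context.add_operations_from(my_sql_ops)

# its operations are now available for dispatch:
context.o.sample(source, 10)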

Stores

Object Store

■ contains objects: tables, files, collections, ...

■ objects are named: get_object(name)

■ might create objects: create(name, replace, ...)

Object Store

store = open_store("sql", "postgres://localhost/data")

the first argument is the store factory; available factories: sql, csv (directory), memory, ...

Stores and Objects

source = open_store("sql", "postgres://localhost/data")
target = open_store("csv", "./data/")

source_obj = source.get_object("products")
target_obj = target.create("products", fields=source_obj.fields)

for row in source_obj.rows():
    target_obj.append(row)

target_obj.flush()

copy data from SQL table to CSV

Pipeline

Pipeline

[Diagram: a pipeline "trunk" passing through a sequence of SQL objects and ending in an iterator.]

a pipeline is a sequence of operations on the trunk

Pipeline Operations

stores = {
    "source": open_store("sql", "postgres://localhost/data"),
    "target": open_store("csv", "./data/")
}

p = Pipeline(stores=stores)
p.source("source", "products")
p.distinct("color")
p.create("target", "product_colors")

for the operations, the first argument is the result from the previous step

extract product colors to CSV

Pipeline

p.source(store, object_name, ...) → store.get_object(...)

p.create(store, object_name, ...) → store.create(...) followed by store.append_from(...)
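
Putting the two slides together, the color-extraction pipeline above roughly expands to store and context calls like these; this is a sketch of the equivalence, not the framework internals, and the fields= argument mirrors the earlier copy example:

# Rough expansion of the pipeline example above (illustrative only).
source_obj = stores["source"].get_object("products")      # p.source("source", "products")
colors = context.o.distinct(source_obj, "color")           # p.distinct("color")
target_obj = stores["target"].create("product_colors",     # p.create("target", "product_colors")
                                     fields=colors.fields)
target_obj.append_from(colors)                             # fill the created object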

Operation Library

Filtering

■ row filters: filter_by_value, filter_by_set, filter_by_range

■ field_filter (ctx, obj, keep=[], drop=[], rename={})

keep, drop, rename fields

■ sample (ctx, obj, value, mode)

first N, every Nth, random, …
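
Called through a context, the filtering operations above might be used like this; it is a usage sketch, and the row-filter arguments and the sample mode value are assumptions beyond what the slides list:

# Usage sketch for the filtering operations above; argument values are illustrative,
# and `products` is a hypothetical source data object.
cheap = context.o.filter_by_range(products, "unit price", 0, 100)       # assumed arguments
slim = context.o.field_filter(cheap, keep=["product", "category", "unit price"])
preview = context.o.sample(slim, 10, "first")                           # first N rows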

Uniqueness

■ distinct (ctx, obj, key)

distinct values for key

■ distinct_rows (ctx, obj, key)

distinct whole rows (first occurrence of a row) for key

■ count_duplicates (ctx, obj, key)

count number of duplicates for key

Master-detail

■ join_detail(ctx, master, detail, master_key, detail_key)

Joins detail table, such as a dimension, on a specified key. Detail key field will be dropped from the result.

Note: other join-based operations will be implemented later, as they need some usability decisions to be made

Dimension Loading

■ added_keys (ctx, dim, source, dim_key, source_key)

which keys in the source are new?

■ added_rows (ctx, dim, source, dim_key, source_key)

which rows in the source are new?

■ changed_rows (ctx, target, source, dim_key, source_key, fields, version_field)

which rows in the source have changed?

more to come…
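
As a usage sketch of these dimension-loading helpers: dim_product and staging are hypothetical objects, and the calls follow the signatures listed above with the context passed implicitly through context.o:

# Sketch: incremental dimension load using the operations listed above.
new_keys = context.o.added_keys(dim_product, staging, "product_id", "product_id")
new_rows = context.o.added_rows(dim_product, staging, "product_id", "product_id")
dim_product.append_from(new_rows)    # load only the rows that are new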

Conclusion

To Do

■ consolidate representations API

■ define basic set of operations

■ temporaries and garbage collection

■ sequence objects for surrogate keys

Version 0.2

■ processing graph: connected nodes, like in Brewery

■ more basic backends: at least Mongo

■ bubbles command line tool

already in progress

Future

■ separate operation dispatcher: will allow custom dispatch policies

Contact: @Stiivi

stefan.urbanek@gmail.com

databrewery.org
