DATA SERIALIZATION WITH GOOGLE PROTOCOL BUFFERS By: William Kibira

Data Serialization Using Google Protocol Buffers


DESCRIPTION

An easier, more flexible way to structure data for platform- and language-independent transport and storage.


Page 1: Data Serialization Using Google Protocol Buffers

DATA SERIALIZATION WITH GOOGLE PROTOCOL BUFFERS

By: William Kibira

Page 2: Data Serialization Using Google Protocol Buffers

What is Data Serialization

● The process of translating a data structure or object state into a format that can be stored in a memory buffer or file, or transmitted over a network.

● The end goal is that it can be reconstructed later, possibly in a different computer environment.
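As a minimal sketch of that round trip, the Python below turns an in-memory record into bytes and rebuilds it. It uses JSON from the standard library purely for illustration; the record and the function names are hypothetical and do not come from the slides.

```python
import json


def serialize(record):
    """Turn an in-memory dict into bytes that can be stored or sent."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Rebuild the dict from the bytes, possibly on another machine."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical record, used only to demonstrate the round trip.
account = {"owner": "alice", "deposit_money": 12345678}

payload = serialize(account)     # bytes: safe to write to a file or socket
restored = deserialize(payload)  # an equal dict, rebuilt from the bytes

print(restored == account)
```

The same two steps apply to any format covered in these slides; only the byte layout changes.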

Page 3: Data Serialization Using Google Protocol Buffers

Why We Do This

● Persist objects [store and later retrieve them]

● Perform remote procedure calls

● Create distributed objects [CORBA, Java RMI, ICE]

Page 4: Data Serialization Using Google Protocol Buffers

Key Words

● Computer Environment

- Programming Languages

- Operating Systems

- Architectures and processors

● Platform Independent Solutions

Page 5: Data Serialization Using Google Protocol Buffers

Popular Platform Independent Solutions

● JSON and XML

● BSON and Binary XML

● Google Protocol Buffers, Thrift, Avro

Ref

http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

Page 6: Data Serialization Using Google Protocol Buffers

JSON AND XML

● Most popular

● Human-readable, to some extent

● Most web-based APIs use them by default

● Lots of generators and parsers are available

Page 7: Data Serialization Using Google Protocol Buffers

How It Works

● You write a schema in an IDL [Interface Description Language]. It is much like a CORBA IDL, but cleaner and more flexible.

● Pass it through a C++ based code generator

● Get your boilerplate code in the language you specified

Page 8: Data Serialization Using Google Protocol Buffers

GOOGLE PROTOCOL BUFFERS

● A platform-independent, language-independent data serialization solution, similar to XML in structure but much smaller in size and easier to structure.

● In use at Google since 2001, open-sourced in 2008

Page 9: Data Serialization Using Google Protocol Buffers

JSON VS BINARY FORMATS

● JSON is darn easy to read. If you can read binary, you definitely need to see a doctor.

● JSON gets fat even on little data; binary is really compact. Example: {"deposit_money": "12345678"}

JSON: '0x6d', '0x6f', '0x6e', '0x65', '0x79', '0x31', '0x32', '0x33', '0x34', '0x35', '0x36', '0x37', '0x38' [13 bytes, the text "money12345678"]

BINARY: '0x01', '0xBC614E' [presumably a field identifier, then 12345678 as a 3-byte integer]
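The bytes on the slide are schematic. As a rough sketch of how the real Protocol Buffers wire format keeps integers compact, the Python below hand-encodes one integer field as a tag followed by a base-128 varint and compares its size with the JSON text. Field number 1 and the helper names are assumptions made for this example; real code would use the classes generated by protoc rather than encoding by hand.

```python
import json


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number, value):
    """Encode one integer field: tag (field number, wire type 0), then the value."""
    tag = (field_number << 3) | 0    # wire type 0 means varint
    return encode_varint(tag) + encode_varint(value)


as_json = json.dumps({"deposit_money": "12345678"}).encode("utf-8")
as_binary = encode_int_field(1, 12345678)  # field number 1 is an assumption

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as binary:", as_binary.hex())
```

The field name never travels in the binary form; both sides learn it from the schema, which is where most of the saving comes from.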

Page 10: Data Serialization Using Google Protocol Buffers

SPEED AT PARSING

● JSON is fairly fast to parse, but binary formats are usually faster because the bytes can be read with little or no text parsing.

Page 11: Data Serialization Using Google Protocol Buffers

FLOW

Schema / IDL

↓

C++ code generator

↓

Generated code: C++, Java, Python, JavaScript

↓

Server / client application code bases

Page 12: Data Serialization Using Google Protocol Buffers

What does a Schema Look Like ?
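The schema on this slide was an image and is not reproduced here. As a stand-in, the sketch below holds a hypothetical proto2-style schema for a chat message in a Python string and lists its fields with a small regular expression. The message name, field names and field numbers are invented for illustration; they are not the author's schema.

```python
import re

# Hypothetical schema text; in practice this would live in a .proto file.
SCHEMA = """
syntax = "proto2";

message ChatMessage {
  required string sender = 1;
  required string text = 2;
  optional int64 sent_at = 3;
}
"""

# Each field line has the shape: <rule> <type> <name> = <number>;
FIELD = re.compile(
    r"^\s*(required|optional|repeated)\s+(\w+)\s+(\w+)\s*=\s*(\d+);", re.M
)


def list_fields(schema):
    """Return (number, name, type, rule) tuples for every field in the schema text."""
    return [
        (int(number), name, ftype, rule)
        for rule, ftype, name, number in FIELD.findall(schema)
    ]


for field in list_fields(SCHEMA):
    print(field)
```

The numbers after each "=" are what identify a field on the wire, which is why they must stay stable once a schema is in use.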

Page 13: Data Serialization Using Google Protocol Buffers

How to Generate the Code

● Run the Protocol Buffers compiler [protoc], specifying the output language, the output folder and the .proto file

● protoc -I=DIR_TO_SCHEMA/ --LANGUAGE_out=OUTPUT_FOLDER/ DIR_TO_SCHEMA/file.proto [LANGUAGE is cpp, java, python, and so on, for example --java_out]
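If the generation step is scripted, the command can be assembled as in the sketch below. The helper names and the directory names in the usage line are hypothetical; the -I flag and the --cpp_out, --java_out and --python_out flags are real protoc options, and protoc is assumed to be installed and on the PATH when generate is called.

```python
import subprocess

# Output flags this sketch knows about; protoc supports other languages too.
KNOWN_LANGUAGES = {"cpp", "java", "python"}


def build_protoc_command(schema_dir, out_dir, proto_file, language="java"):
    """Return the protoc argument list for one .proto file."""
    if language not in KNOWN_LANGUAGES:
        raise ValueError(f"language not covered by this sketch: {language}")
    return [
        "protoc",
        f"-I={schema_dir}",             # where to look for .proto files
        f"--{language}_out={out_dir}",  # where to write the generated code
        f"{schema_dir}/{proto_file}",   # the schema to compile
    ]


def generate(schema_dir, out_dir, proto_file, language="java"):
    """Run protoc; raises CalledProcessError if the compiler reports an error."""
    command = build_protoc_command(schema_dir, out_dir, proto_file, language)
    subprocess.run(command, check=True)


# Only build and show the command here; call generate(...) to actually run it.
print(" ".join(build_protoc_command("schemas", "gen", "chat.proto", "python")))
```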

Page 14: Data Serialization Using Google Protocol Buffers

A Look in my Terminal

Page 15: Data Serialization Using Google Protocol Buffers

What is Inside My XX.java

Page 16: Data Serialization Using Google Protocol Buffers

SIZE COMPARISON

[Bar chart: serialized size per format, axis from 0 to 1000, units not given]

Format   Size
RMI      905
GPB      250
JSON     559
XML      836

Page 17: Data Serialization Using Google Protocol Buffers

Runtime Performance

Format     Server CPU avg   Client CPU avg   Time
Protobuf   30.0%            37.75%           01:19:48
JSON       20.0%            75.00%           04:44:83
XML        12.00%           80.75%           05:27:45

Page 18: Data Serialization Using Google Protocol Buffers

Versioning

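● Versioning is about compatibility between older and newer versions of a Protocol Buffers schema

● Old server with new client, and vice versa

● Even if fields have been added or removed, the data will still be parsed: unknown fields are skipped and missing ones fall back to defaults, as long as existing field numbers keep their meaning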
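The sketch below illustrates why old and new peers can still talk to each other. It hand-rolls a tiny tag-plus-varint encoding in the style of the Protocol Buffers wire format and a decoder that skips field numbers it does not know. The field names and numbers are hypothetical, only integer fields are handled, and real applications would rely on the generated classes instead.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data, pos):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fields(fields):
    """Encode {field_number: non-negative int}, each as a tag plus a varint."""
    out = b""
    for number, value in sorted(fields.items()):
        out += encode_varint(number << 3) + encode_varint(value)  # wire type 0
    return out


def decode_known(data, known):
    """Decode the fields named in known {number: name}; skip all others."""
    result = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        if number in known:          # unknown field numbers are ignored
            result[known[number]] = value
    return result


OLD_SCHEMA = {1: "user_id", 2: "room_id"}                 # hypothetical
NEW_SCHEMA = {1: "user_id", 2: "room_id", 3: "priority"}  # adds field 3

new_message = encode_fields({1: 42, 2: 7, 3: 99})  # sent by a new client
old_message = encode_fields({1: 42, 2: 7})         # sent by an old client

print(decode_known(new_message, OLD_SCHEMA))  # old server ignores field 3
print(decode_known(old_message, NEW_SCHEMA))  # new server sees no priority
```

Because every value is prefixed with its field number, a reader never needs the sender's exact schema, only a compatible one.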

Page 19: Data Serialization Using Google Protocol Buffers

Alternatives to Protocol Buffers

● MessagePack [many languages, including .NET]

● Thrift [originally Facebook, now Apache]

● Avro [Apache]

Page 20: Data Serialization Using Google Protocol Buffers

Reasons to Use Protocol Buffers

● They are smaller to push around over networks

● Easier [if not easiest] to structure

● Give a sense of object-oriented structuring

Page 21: Data Serialization Using Google Protocol Buffers

Reasons Not to Use Them

● Well, you will have to maintain both the server and the clients.

● They may not be easy to learn in most cases

● They are not an industry standard.

● I am just trying to be fair here :)

Page 22: Data Serialization Using Google Protocol Buffers

SIMPLE DEMO CHAT APPS

● Simple chat application that runs on desktops and laptops, and across different operating systems

● Partial Inspiration from the Fifth Estate

Page 23: Data Serialization Using Google Protocol Buffers

THE END

Links to check out:

● Google Protocol Buffers main page

https://developers.google.com/protocol-buffers/

● Apache Thrift

https://thrift.apache.org/