Google Protocol Buffers (Overview) Sergey Podolsky [email protected]


Page 1: Google Protocol Buffers

Google Protocol Buffers (Overview)

Sergey Podolsky [email protected]

Page 2: Google Protocol Buffers

History

Page 3: Google Protocol Buffers

Binary serialization - sucks
• Language-specific
• Not safe (see “Effective Java”)
• Not extensible

Page 4: Google Protocol Buffers

XML, JSON – back to 1999
• Too verbose
• Need to parse
• Slow performance
• Huge size if not compressed
• No strong types (int vs float)
• Need to store field names

Page 5: Google Protocol Buffers

Google Protocol Buffers
• Cross-language
• Schema evolution
• Compact
• Strongly typed

Page 6: Google Protocol Buffers

Message

Person.json

    {
      "userName": "Martin",
      "favouriteNumber": 1337,
      "interests": ["daydreaming", "hacking"]
    }

Person.proto

    message Person {
      required string user_name = 1;
      optional int64 favourite_number = 2;
      repeated string interests = 3;
    }
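To make "compact" and "schema evolution" concrete, here is a minimal sketch of how the Person message above could be laid out on the wire. It is hand-rolled for illustration and is not the code that protoc generates; the function names (encode_person, decode_fields) are hypothetical, and it assumes only non-negative integers and the two wire types this message needs (varint and length-delimited).

```python
import json

VARINT, LEN = 0, 2  # the two wire types used by this sketch


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def key(field_number: int, wire_type: int) -> bytes:
    """Each field starts with (field_number << 3) | wire_type; no field names are stored."""
    return encode_varint((field_number << 3) | wire_type)


def encode_person(user_name, favourite_number, interests) -> bytes:
    out = bytearray()
    name = user_name.encode("utf-8")
    out += key(1, LEN) + encode_varint(len(name)) + name
    if favourite_number is not None:  # optional field: simply omitted when absent
        out += key(2, VARINT) + encode_varint(favourite_number)
    for item in interests:  # repeated field: the same field number appears once per item
        raw = item.encode("utf-8")
        out += key(3, LEN) + encode_varint(len(raw)) + raw
    return bytes(out)


def decode_fields(buf: bytes, known: set) -> dict:
    """Collect raw values for known field numbers and skip everything else."""
    fields, pos = {}, 0
    while pos < len(buf):
        k, pos = decode_varint(buf, pos)
        number, wire_type = k >> 3, k & 7
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields.setdefault(number, []).append(value)
    return fields


encoded = encode_person("Martin", 1337, ["daydreaming", "hacking"])
as_json = json.dumps(
    {"userName": "Martin", "favouriteNumber": 1337,
     "interests": ["daydreaming", "hacking"]},
    separators=(",", ":"),
).encode("utf-8")
print(len(encoded), "bytes as protobuf vs", len(as_json), "bytes as compact JSON")

# A reader built against an older schema that lacks field 2 still decodes the message.
old_reader = decode_fields(encoded, known={1, 3})
print(old_reader)
```

Because every value carries its field number, a reader can step over fields it does not know, which is what allows old and new schema versions to coexist.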

Page 7: Google Protocol Buffers

Options
• Outer class
• Default value
• Deprecated
• Speed / Size
• Custom options
• descriptor.proto

Page 8: Google Protocol Buffers

Our protobuf use cases:
• Java, C++, C#
• Payload for ZeroC ICE
• IBM MQ / Solace messages
• DB raw data
• Log messages to disk
• Compress using TAR.GZIP
• Show as XML / JSON
• exe utility associated with protobuf files

Page 9: Google Protocol Buffers

Disadvantages
• No Map<K, V> / Dictionary<K, V>
• No Set<T>
• No short / int16 / uint16
• No interning
• Generated classes are immutable
• Compiler and library versions are not backwards compatible with each other
• descriptor.proto is not backwards compatible
• Few officially supported languages
• Enum is not extensible (unknown value resets to 0)

Page 10: Google Protocol Buffers
Page 11: Google Protocol Buffers

Apache Avro
• No tags
• Schema is required
• The entire record is tagged by schema ID
• Fields are matched by name
• No optional values: union { null, long } is used instead
• Resolution rules are used for server vs client schemas

Page 12: Google Protocol Buffers

Apache Avro

JSON Notation

    {
      "type": "record",
      "name": "Person",
      "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favouriteNumber", "type": ["null", "long"]},
        {"name": "interests", "type": {"type": "array", "items": "string"}}
      ]
    }

IDL

    record Person {
      string userName;
      union { null, long } favouriteNumber;
      array<string> interests;
    }
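The "no tags" point can be illustrated with a minimal sketch of Avro's binary encoding for the Person record above. It is hand-written, not the Avro library API; the helper names are hypothetical, and it assumes the writer emits fields strictly in schema order with the union branches ordered as declared (null = 0, long = 1).

```python
def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-map the signed value, then write it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0, -1, 1, -2 become 0, 1, 2, 3
    out = bytearray()
    while True:
        low = z & 0x7F
        z >>= 7
        if z:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Strings are a long byte count followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_person(user_name, favourite_number, interests) -> bytes:
    out = bytearray()
    # Field 1: userName. No tag or name precedes it; position in the schema identifies it.
    out += encode_string(user_name)
    # Field 2: union { null, long }. Write the branch index, then the value (if any).
    if favourite_number is None:
        out += encode_long(0)
    else:
        out += encode_long(1) + encode_long(favourite_number)
    # Field 3: array<string>. One block (item count, then items), closed by a zero count.
    if interests:
        out += encode_long(len(interests))
        for item in interests:
            out += encode_string(item)
    out += encode_long(0)
    return bytes(out)


encoded = encode_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(encoded), "bytes:", encoded.hex(" "))
```

Nothing in these bytes says which field is which, so a reader cannot decode them without the exact schema the writer used. That is why the schema is required and why resolution rules are needed when reader and writer schemas differ.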

Page 13: Google Protocol Buffers

Apache Thrift
• “One-stop shop”
• RPC framework
• Different serialization formats (“protocols”)

Page 14: Google Protocol Buffers

Apache Thrift

    struct Person {
      1: string userName,
      2: optional i64 favouriteNumber,
      3: list<string> interests
    }
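For comparison, here is a minimal sketch of how this struct might be serialized with one of Thrift's protocols, the plain binary protocol. It is written by hand from my understanding of that protocol and does not use the Thrift library; the helper names are hypothetical, and the type codes (10 = i64, 11 = string, 15 = list, 0 = stop) are stated as assumptions about the binary protocol rather than taken from the slides.

```python
import struct

# Assumed type codes of Thrift's binary protocol.
T_STOP, T_I64, T_STRING, T_LIST = 0, 10, 11, 15


def field_header(ttype: int, field_id: int) -> bytes:
    """One type byte followed by a 16-bit big-endian field id."""
    return struct.pack(">bh", ttype, field_id)


def encode_string(s: str) -> bytes:
    """Strings are a fixed 32-bit big-endian length followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return struct.pack(">i", len(raw)) + raw


def encode_person(user_name, favourite_number, interests) -> bytes:
    out = bytearray()
    out += field_header(T_STRING, 1) + encode_string(user_name)
    if favourite_number is not None:  # optional field: omitted when unset
        # i64 is always 8 bytes here; there is no variable-length integer encoding.
        out += field_header(T_I64, 2) + struct.pack(">q", favourite_number)
    # A list carries its element type and a 32-bit item count before the items.
    out += field_header(T_LIST, 3) + struct.pack(">bi", T_STRING, len(interests))
    for item in interests:
        out += encode_string(item)
    out.append(T_STOP)  # a single zero byte ends the struct
    return bytes(out)


encoded = encode_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(encoded), "bytes:", encoded.hex(" "))
```

Like Protobuf, each field is identified by its numeric id rather than its name, but the fixed-width lengths and integers make this particular protocol larger for the same data. Thrift's other protocols make different trade-offs.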

Page 15: Google Protocol Buffers

Comparison: Thrift vs. Protobuf

Language Bindings
• Thrift: Java, C++, Python, C#, Cocoa, Erlang, Haskell, OCaml, Perl, PHP, Ruby, Smalltalk
• Protobuf: Java, C++, Python

Primitive Types
• Thrift: bool, byte, 16/32/64-bit integers, double, string, byte sequence, map<t1,t2>, list<t>, set<t>
• Protobuf: bool, 32/64-bit integers, float, double, string, byte sequence, “repeated” properties act like lists

    Feature                      Thrift     Protobuf
    Enumerations                 Yes        Yes
    Constants                    Yes        No
    Composite Type               Struct     Message
    Exception Handling           Yes        No
    Documentation                Lacking    Good
    License                      Apache     BSD-style
    Compiler                     C++        C++
    RPC Interfaces               Yes        Yes
    RPC Implementation           Yes        No
    Composite Type Extensions    No         Yes
    Data Versioning              Yes        Yes

Pros (Thrift)
- More languages supported out of the box
- Richer data structures than Protobuf (e.g. Map and Set)
- Includes RPC implementation for services

Pros (Protobuf)
- Slightly faster than Thrift when using "optimize_for = SPEED"
- Serialized objects slightly smaller than Thrift due to more aggressive data compression
- Better documentation
- API a bit cleaner than Thrift

Cons (Thrift)
- Good examples are hard to find
- Missing/incomplete documentation

Cons (Protobuf)
- .proto can define services, but no RPC implementation is defined (although stubs are generated for you).