A FLEXIBLE PLUGIN-LIKE DATA LAYER
Decouple your application logic from your data
Juan Soprano - [email protected] of Software Development
Pixable Inc.
What is Pixable?
Pixable in numbers
• ~5 million users
• 9 billion photos (~5 terabytes)
• 35 million new photos a day
• 80 million categories
• 16 million writes/hour (~40 GB/hour)
• 30 million reads/hour (~120 GB/hour)
• Logging and profiling: 15k inserts/sec
Where can you find us?
• Pixable.com
• iPhone/iPod/iPad
• Android
Presentation
At Pixable we have migrated between several data storage solutions.
To accomplish this, we built a plugin-like data layer that completely separates application code from data storage. In fact, our whole migration from MySQL to MongoDB was performed over this layer, which let us move chunks of data little by little while learning how the system behaved under the new configuration. During the process, we kept duplicate copies in MySQL and MongoDB until the transition was complete. All of this was almost transparent to the application code and required very few changes to it.
During this talk, we are going to show how we built this architecture and how easy it is to integrate other data stores (memcached, S3, etc.) into it. We will also share some tips that we learned along the way and the pros and cons of working under this scheme.
Initial infrastructure
LAMMP (Linux-Apache-Memcache-MySQL-PHP)
[Diagram: Frontend, Backend and API use the User class, which talks to the MySQL DB]
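class user {
    public $id;
    public $first_name;
    public $last_name;
    private $db; // MySQL connection handle; how it is set is not shown on the slide

    public function getUser($id) {
        // The SQL lives inside the application class: this is the coupling to remove.
        // Casting to int keeps the concatenated id from injecting SQL.
        $sql = 'SELECT * FROM users WHERE id = ' . (int) $id;
        $userRS = $this->db->fetchArray($sql);
        $user = $this->buildUser($userRS);
        return $user;
    }
}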
Issues encountered
• Limit on the DB connections to the master.
• Not able to hit the DB hard without generating lag on the slave servers.
• Adding a field to an existing table with billions of records would mean downtime of the app.
• Adding new DB servers was slow (in some cases it required downtime of the app) and expensive in server costs.
…so we needed a DB engine that was easy to grow, schema-less and low in server cost.
Solution found: MongoDB
• Has built-in sharding.
• ReplicaSet features automatic data cloning, synchronization and PRIMARY failover.
• Our data fits perfectly in the MongoDB document paradigm.
• Schema-less.
• Easier to have many small machines; failures and maintenance are less traumatic.
• Background index creation.
…now we needed a way to start the migration without downtime or data loss.
Implementing solution
Migrating code from classes/functions with SQL queries all around the project code to the new flexible plugin-like data layer.
[Diagram, before: Frontend, Backend, API; User; MySQL DB]
[Diagram, after: Frontend, Backend, API; User; User Data Source; MySQL User DS with MySQL DB; MongoDB User DS with Mongo DB]
Implementing solution – Step 1
User Data Source (Plugin manager)
• Gets the call from the backend for the user data source.
• Evaluates the conditions we have defined to decide which data source to return, and looks for the user in the correct data source.
• If the migration conditions are active, migrates the user to the new DB engine if the user has not been migrated already.
• Returns the data source selected by the conditions above.
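The four steps above can be sketched roughly as follows. This is a minimal illustration, not Pixable's actual code: it assumes PHP 8, uses in-memory arrays in place of MySQL and MongoDB, and the names ArrayUserStore, UserDataSourceManager and the $migrationEnabled flag are hypothetical. The slides call the manager statically (UserDataSource::getUserDS); an instance is used here only to keep the sketch self-contained.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a storage plugin; a real one would talk to MySQL or MongoDB.
final class ArrayUserStore
{
    /** @var array<int, array<string, mixed>> */
    private array $rows = [];

    public function __construct(private string $name) {}

    public function name(): string { return $this->name; }
    public function has(int $id): bool { return isset($this->rows[$id]); }
    public function get(int $id): ?array { return $this->rows[$id] ?? null; }
    public function save(int $id, array $row): void { $this->rows[$id] = $row; }
}

// Hypothetical plugin manager: picks the data source for one user.
final class UserDataSourceManager
{
    public function __construct(
        private ArrayUserStore $old,
        private ArrayUserStore $new,
        private bool $migrationEnabled // assumed to be a simple on/off condition
    ) {}

    public function getUserDS(int $id): ArrayUserStore
    {
        if ($this->new->has($id)) {
            return $this->new; // already migrated
        }
        if ($this->migrationEnabled && $this->old->has($id)) {
            // Migrate on first access, then serve from the new engine.
            $this->new->save($id, $this->old->get($id));
            return $this->new;
        }
        return $this->old; // migration off, or user unknown to the old store
    }
}

$mysql = new ArrayUserStore('mysql');
$mongo = new ArrayUserStore('mongodb');
$mysql->save(7, ['id' => 7, 'first_name' => 'Ada', 'last_name' => 'Lovelace']);

$manager = new UserDataSourceManager($mysql, $mongo, false);
echo $manager->getUserDS(7)->name(), "\n"; // mysql

$manager = new UserDataSourceManager($mysql, $mongo, true);
echo $manager->getUserDS(7)->name(), "\n"; // mongodb
```

Migrating on first access is one way to move users "little by little": only users who are actually requested get copied, and the old copy stays in place.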
Implementing solution – Step 2
Building each DB engine plugin
[Diagram: User Data Source; Memcached Plugin, MySQL Plugin, MongoDB Plugin, … Plugin]
Requirements:
• All plugins have to implement the same set of public methods/functions.
• All have to reply in the exact same data structure and format.
• All plugin constructors may accept another plugin as a parameter, so we can chain them together if needed.
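One way to express these three requirements in PHP is a shared interface plus an optional "next" plugin in each constructor. The sketch below assumes PHP 8; the interface UserPlugin, the classes InMemoryDbPlugin and InMemoryCachePlugin, and the method names are all hypothetical, and arrays stand in for the real Memcached and database clients.

```php
<?php
declare(strict_types=1);

// Hypothetical contract: same public methods, same array format for every plugin.
interface UserPlugin
{
    public function getUser(int $id): ?array;
    public function saveUser(array $user): void;
}

// Stand-in for a database-backed plugin (MySQL, MongoDB, ...).
final class InMemoryDbPlugin implements UserPlugin
{
    private array $rows = [];
    public int $reads = 0; // counts lookups so the chaining effect is visible

    public function __construct(private ?UserPlugin $next = null) {}

    public function getUser(int $id): ?array
    {
        $this->reads++;
        // Not found here: ask the chained plugin, if there is one.
        return $this->rows[$id] ?? $this->next?->getUser($id);
    }

    public function saveUser(array $user): void
    {
        $this->rows[$user['id']] = $user;
        $this->next?->saveUser($user);
    }
}

// Stand-in for a Memcached plugin: answers from its cache, else asks the chained plugin.
final class InMemoryCachePlugin implements UserPlugin
{
    private array $cache = [];

    public function __construct(private ?UserPlugin $next = null) {}

    public function getUser(int $id): ?array
    {
        if (isset($this->cache[$id])) {
            return $this->cache[$id];
        }
        $user = $this->next?->getUser($id);
        if ($user !== null) {
            $this->cache[$id] = $user; // remember for the next read
        }
        return $user;
    }

    public function saveUser(array $user): void
    {
        $this->cache[$user['id']] = $user;
        $this->next?->saveUser($user);
    }
}

$db = new InMemoryDbPlugin();
$db->saveUser(['id' => 7, 'first_name' => 'Ada', 'last_name' => 'Lovelace']);

$chain = new InMemoryCachePlugin($db); // cache in front of the database plugin
$first = $chain->getUser(7);           // cache miss, one database read
$second = $chain->getUser(7);          // served from the cache
echo $db->reads, "\n";                 // 1
```

Because both classes satisfy the same interface, the calling code cannot tell whether it holds a single plugin or a chain of them.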
Implementing solution – Step 3
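Moving all SQL queries from the different classes' methods/functions to the new Data Source infrastructure:
Old class code:
class user {
    public $id;
    public $first_name;
    public $last_name;
    private $db; // MySQL connection handle; how it is set is not shown on the slide

    public function getUser($id) {
        // Casting to int keeps the concatenated id from injecting SQL.
        $sql = 'SELECT * FROM users WHERE id = ' . (int) $id;
        $userRS = $this->db->fetchArray($sql);
        $user = $this->buildUser($userRS);
        return $user;
    }
}
New class code:
class user {
    public $id;
    public $first_name;
    public $last_name;

    public function getUser($id) {
        // Ask the plugin manager which data source holds this user; no SQL here.
        $uDS = UserDataSource::getUserDS($id);
        $userRS = $uDS->getUser();
        $user = $this->buildUser($userRS);
        return $user;
    }
}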
Example 1
Condition:
• Read operation: the user is found in Memcached.
• Write operation: writes go to both MySQL and MongoDB.
[Diagram: Backend; User Data Source; Memcached Plugin, MySQL Plugin, MongoDB Plugin; MySQL, MongoDB; arrows mark the read operation and the write operation]
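A rough sketch of this configuration, assuming PHP 8 and in-memory stand-ins for the three plugins. The names UserStore, ArrayStore and Example1DataSource are hypothetical. The slide only describes the case where the read is found in Memcached; the cache-miss path below (fall through to the databases and refill the cache) is an assumption added to make the example complete.

```php
<?php
declare(strict_types=1);

interface UserStore
{
    public function getUser(int $id): ?array;
    public function saveUser(array $user): void;
}

// In-memory stand-in for the Memcached, MySQL and MongoDB plugins.
final class ArrayStore implements UserStore
{
    public array $rows = [];

    public function getUser(int $id): ?array { return $this->rows[$id] ?? null; }
    public function saveUser(array $user): void { $this->rows[$user['id']] = $user; }
}

// Reads try the cache first; writes go to every database and refresh the cache.
final class Example1DataSource implements UserStore
{
    /** @param UserStore[] $writeTargets */
    public function __construct(private UserStore $cache, private array $writeTargets) {}

    public function getUser(int $id): ?array
    {
        $user = $this->cache->getUser($id);
        if ($user !== null) {
            return $user; // the case shown on the slide: found in the cache
        }
        // Assumed miss path: ask each database in order and refill the cache.
        foreach ($this->writeTargets as $store) {
            $user = $store->getUser($id);
            if ($user !== null) {
                $this->cache->saveUser($user);
                return $user;
            }
        }
        return null;
    }

    public function saveUser(array $user): void
    {
        foreach ($this->writeTargets as $store) {
            $store->saveUser($user); // duplicate copy in every database
        }
        $this->cache->saveUser($user);
    }
}

$cache = new ArrayStore();
$mysql = new ArrayStore();
$mongo = new ArrayStore();

$ds = new Example1DataSource($cache, [$mysql, $mongo]);
$ds->saveUser(['id' => 1, 'first_name' => 'Ada', 'last_name' => 'Lovelace']);

echo count($mysql->rows), ' ', count($mongo->rows), "\n"; // 1 1
```

Writing to both engines is what keeps the duplicate copies in MySQL and MongoDB during the transition.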
Example 2
Condition:
• Read and write to MySQL, but use MongoDB as backup.
[Diagram: Backend; User Data Source; Memcached Plugin, MySQL Plugin, MongoDB Plugin; MySQL, MongoDB; arrows mark the read operation and the write operation]
Example 3
Condition:
• Only new users should be migrated, but use MongoDB as backup for all read operations from existing users.
[Diagram: Backend; User Data Source; Memcached Plugin, MySQL Plugin, MongoDB Plugin; MySQL, MongoDB; arrows mark the read operation and the write operation]
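The condition in Example 3 can be read in more than one way; the sketch below shows one possible reading, assuming PHP 8. New users live only in MongoDB, existing users stay in MySQL with a copy in MongoDB, and a failed MySQL read falls back to that copy. The classes UserTable and Example3Router, and the rule that "new" means an id at or above $firstNewUserId, are hypothetical.

```php
<?php
declare(strict_types=1);

// In-memory stand-in for a database plugin, with a switch to simulate an outage.
final class UserTable
{
    public array $rows = [];
    public bool $available = true;

    public function find(int $id): ?array
    {
        if (!$this->available) {
            throw new RuntimeException('store unavailable');
        }
        return $this->rows[$id] ?? null;
    }

    public function put(array $user): void { $this->rows[$user['id']] = $user; }
}

final class Example3Router
{
    public function __construct(
        private UserTable $mysql,
        private UserTable $mongo,
        private int $firstNewUserId // assumed rule: ids >= this value are "new users"
    ) {}

    public function getUser(int $id): ?array
    {
        if ($id >= $this->firstNewUserId) {
            return $this->mongo->find($id); // new users are only in MongoDB
        }
        try {
            return $this->mysql->find($id); // existing users: MySQL first
        } catch (RuntimeException $e) {
            return $this->mongo->find($id); // backup read from MongoDB
        }
    }

    public function saveUser(array $user): void
    {
        if ($user['id'] >= $this->firstNewUserId) {
            $this->mongo->put($user);
            return;
        }
        $this->mysql->put($user);
        $this->mongo->put($user); // keep the backup copy current
    }
}

$mysql = new UserTable();
$mongo = new UserTable();
$router = new Example3Router($mysql, $mongo, 1000);

$router->saveUser(['id' => 5, 'first_name' => 'Ada', 'last_name' => 'Lovelace']);
$router->saveUser(['id' => 1000, 'first_name' => 'Alan', 'last_name' => 'Turing']);

$mysql->available = false;      // simulate a MySQL outage
$user = $router->getUser(5);    // served from the MongoDB backup
echo $user['first_name'], "\n"; // Ada
```

All of the routing lives in one class, so changing the condition (for example, migrating existing users too) would not touch the application code that calls getUser.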
Conclusion
Pros:
• Separates your app's code from the data storage engines' languages.
• Makes it easy to add new data engines.
• Lets you balance the load sent to each data engine.
• As the company grows, one team can be dedicated to developing and optimizing the data plugins while another team develops the application itself.
Cons:
• Your app will generate more queries to the data engines.
• You will have to write more lines of code when implementing these plugins than when using only one data engine.
Final Recap
[Diagram: App; User Data Source; Memcached Plugin, MySQL Plugin, MongoDB Plugin, … Plugin]
Thanks for listening!
Questions?
Want to work in NY?
We’re hiring: pixable.com/jobs