Multithreaded XML Import (San Francisco Magento Meetup)


Author: Fabrizio Branca Date: 2013-10-23


XML Import Multithreaded

…for Magento

San Francisco Magento Meetup Group - October 23, 2013

Fabrizio Branca, Lead System Developer at AOE media

E-Commerce: Magento

CMS: TYPO3

Portals: ZF, FLOW,…

Mobile Searchperience: SOLR

120 people in 7 offices world-wide

High Performance/Scale

Global Enterprise Projects

Aoe_Import: github.com/AOEmedia/Aoe_Import

git clone --recursive …

Will Aoe_Import be the fastest product importer around?

YES, of course!

Well, maybe…

Actually, Aoe_Import is only an XML importer “framework”. It’s up to you to decide how to handle the XML snippets…

Aoe_Import

XML! Not CSV.

For large XML files

Full flexibility in processor implementation

Stream processing (XMLReader)

“Event” driven: subscribe your “processors” to XPaths

Multi-thread support!
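The idea of subscribing processors to paths while streaming can be sketched with PHP’s XMLReader. This is a minimal illustration, not Aoe_Import’s actual API: the class StreamDispatcher and its methods are hypothetical, and it only matches simple absolute element paths such as /catalog/products/product rather than full XPath. Only the matched snippet is ever held in memory; everything else is streamed past.

```php
<?php
// Hypothetical sketch of path-subscribed stream processing with XMLReader.
// Not Aoe_Import's API; supports absolute element paths only, not full XPath.

final class StreamDispatcher
{
    /** @var array<string, callable[]> path => processors receiving the XML snippet */
    private array $processors = [];

    public function subscribe(string $path, callable $processor): void
    {
        $this->processors[$path][] = $processor;
    }

    public function process(XMLReader $reader): void
    {
        $stack = []; // names of the currently open ancestor elements
        $ok = $reader->read();
        while ($ok) {
            if ($reader->nodeType === XMLReader::ELEMENT) {
                $path = '/' . implode('/', array_merge($stack, [$reader->name]));
                if (isset($this->processors[$path])) {
                    // Materialize only this element (with its children) as a string.
                    $snippet = $reader->readOuterXml();
                    foreach ($this->processors[$path] as $processor) {
                        $processor($snippet);
                    }
                    // Skip the subtree we just handed out; next() lands on the following node.
                    $ok = $reader->next();
                    continue;
                }
                if (!$reader->isEmptyElement) {
                    $stack[] = $reader->name; // empty elements produce no END_ELEMENT
                }
            } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
                array_pop($stack);
            }
            $ok = $reader->read();
        }
    }
}
```

For a real import file you would call $reader->open($path) instead of passing an XML string; the processor then decides what to do with each product snippet.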

Problem

[Figure: memory over time; each single product import adds to memory usage until the memory limit is reached]

Trivial Solution

[Figure: the memory-over-time chart shown twice, each run with its own memory limit]

Beat the memory leak by forking

[Figure: timeline of forked processes, with segments labeled “Process import”, “Threading overhead” and “Waiting for other thread to terminate”]

Forking? In PHP?

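$pid = pcntl_fork();
if ($pid === -1) {
    die("could not fork\n");
} elseif ($pid) {
    // parent process runs what is here
    echo "parent\n";
    pcntl_wait($status); // reap the child so it does not linger as a zombie
} else {
    // child process runs what is here
    echo "child\n";
}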

Threadi: github.com/AOEmedia/Threadi

Threadi

A clean OOP interface for forking and process management in PHP

Batch Processor: collect a bunch of imports, fork, and process them in a child process.
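The batch processor idea can be sketched with plain pcntl calls. This is an illustration under assumptions, not Threadi’s API: the function processInForkedBatches and the $processItem callback are hypothetical, and pcntl is only available in the PHP CLI on Unix-like systems. The parent only collects items; every batch runs in a fresh child, and the parent waits for that child to terminate before forking the next one.

```php
<?php
// Hypothetical sketch: collect a batch, fork, process it in the child.
// Requires ext-pcntl (CLI, Unix). Not Threadi's actual API.

/** @return int number of batches whose child did not exit cleanly */
function processInForkedBatches(iterable $items, int $batchSize, callable $processItem): int
{
    $failed = 0;
    $batch = [];

    $runBatch = function (array $batch) use ($processItem, &$failed): void {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork() failed');
        }
        if ($pid === 0) {
            // Child: whatever memory leaks here is freed when the child exits.
            try {
                foreach ($batch as $item) {
                    $processItem($item);
                }
                exit(0);
            } catch (Throwable $e) {
                exit(1); // never let an exception climb back into the parent's code path
            }
        }
        // Parent: wait for this child before starting the next batch.
        pcntl_waitpid($pid, $status);
        if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
            $failed++;
        }
    };

    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) === $batchSize) {
            $runBatch($batch);
            $batch = [];
        }
    }
    if ($batch !== []) {
        $runBatch($batch); // leftover partial batch
    }
    return $failed;
}
```

Because each child works on its own copy of the parent’s memory, side effects on PHP variables in the child are invisible to the parent; results have to go to the database, files or another shared channel.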

[Figure: timeline with segments labeled “Create process collection”, “Process imports in process collection”, “Threading overhead” and “Waiting for other thread to terminate”]

[Figure: memory over time for the main thread and its forks, with the memory limit marked]

No imports are processed in the main thread, so there’s no memory leak happening here.

Every fork starts with the low memory footprint of the main thread.

Find the number of imports that can be processed at a time without hitting the memory limit.
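A back-of-the-envelope way to pick that number, assuming memory grows roughly linearly per imported item: take the per-process memory limit, keep some headroom, subtract the main thread’s footprint (which every fork starts with) and divide by the measured growth per item. The helper names iniBytes and maxBatchSize, and the 0.8 headroom factor, are hypothetical choices for illustration; you still have to measure the per-item growth yourself, for example with memory_get_usage().

```php
<?php
// Hypothetical helpers for estimating a safe batch size per fork.
// Assumes roughly linear memory growth per imported item.

/** Converts php.ini shorthand such as "128M" to bytes; "-1" means unlimited. */
function iniBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value; // leading digits only, e.g. "128M" -> 128
    switch (strtolower(substr($value, -1))) {
        case 'g': return $number * 1024 ** 3;
        case 'm': return $number * 1024 ** 2;
        case 'k': return $number * 1024;
        default:  return $number;
    }
}

/** Items one fork can import before reaching $headroom of the memory limit. */
function maxBatchSize(int $limitBytes, int $baselineBytes, int $bytesPerItem, float $headroom = 0.8): int
{
    if ($limitBytes < 0) {
        throw new InvalidArgumentException('memory_limit is unlimited; choose the batch size another way');
    }
    $budget = (int) floor($limitBytes * $headroom) - $baselineBytes;
    return max(1, intdiv($budget, max(1, $bytesPerItem)));
}
```

In the main thread this might be called as maxBatchSize(iniBytes(ini_get('memory_limit')), memory_get_usage(), $measuredBytesPerItem), where $measuredBytesPerItem comes from a test run.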

Multi-threading? Sure!

[Figure: imports grouped into batches and processed by parallel forks; axes labeled “Number of items in a batch” and “Number of threads processed in parallel”]
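Running several batches at once only changes the parent’s loop: instead of waiting for each child immediately, it keeps up to N children alive and waits for any one of them to finish before forking the next. This is again a hypothetical sketch (processInParallel is not part of Threadi or Aoe_Import) built only on array_chunk, pcntl_fork and pcntl_wait; the two tuning knobs are the batch size and the number of parallel processes.

```php
<?php
// Hypothetical sketch: up to $maxChildren forks, each importing one batch.
// Requires ext-pcntl (CLI, Unix).

/** @return int number of batches whose child did not exit cleanly */
function processInParallel(array $items, int $batchSize, int $maxChildren, callable $processItem): int
{
    $running = []; // pid => true for children that have not been reaped yet
    $failed = 0;

    $reapOne = function () use (&$running, &$failed): void {
        $pid = pcntl_wait($status); // blocks until any child exits
        if ($pid === -1) {
            throw new RuntimeException('pcntl_wait() failed');
        }
        unset($running[$pid]);
        if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
            $failed++;
        }
    };

    foreach (array_chunk($items, $batchSize) as $batch) {
        while (count($running) >= $maxChildren) {
            $reapOne(); // wait for a free slot
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork() failed');
        }
        if ($pid === 0) {
            try {
                foreach ($batch as $item) {
                    $processItem($item);
                }
                exit(0);
            } catch (Throwable $e) {
                exit(1);
            }
        }
        $running[$pid] = true;
    }
    while ($running !== []) {
        $reapOne(); // wait for the remaining children
    }
    return $failed;
}
```

The batch size bounds memory per process, while the number of parallel children bounds the load on CPU and database; both are usually tuned empirically.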

Problems? Database Connection

Mage::getSingleton('core/resource')
    ->getConnection('core_write')
    ->closeConnection();

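The database connection doesn’t like to be cloned! Close it before forking, so that each process opens its own connection the next time it queries the database.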
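The pattern behind closing the connection is “close before fork, reconnect lazily”. The sketch below uses a hypothetical LazyConnection wrapper with a factory closure standing in for a real PDO or mysqli connection; Zend_Db adapters, which Magento builds on, also connect on first use, which is why closing the Magento connection is enough. With a shared socket, one commonly reported symptom is that a child exiting closes the inherited connection and the parent then fails with “MySQL server has gone away”.

```php
<?php
// Hypothetical sketch: close the shared connection before forking and let
// each process reconnect on first use. The factory stands in for new PDO(...).

final class LazyConnection
{
    /** @var callable */
    private $factory;
    private ?object $connection = null;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    public function get(): object
    {
        // Connect on first use: after close(), every process gets its own handle.
        return $this->connection ??= ($this->factory)();
    }

    public function close(): void
    {
        $this->connection = null; // drop this process's handle
    }

    public function isOpen(): bool
    {
        return $this->connection !== null;
    }
}

/** Forks after closing the connection; returns the child's pid to the parent. */
function forkWithFreshConnection(LazyConnection $db, callable $childWork): int
{
    $db->close(); // never let parent and child share one socket
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork() failed');
    }
    if ($pid === 0) {
        try {
            $childWork($db); // the child's first get() opens its own connection
            exit(0);
        } catch (Throwable $e) {
            exit(1);
        }
    }
    return $pid;
}
```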

Problems? Thread Safety

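Example: the Magento Enterprise category/product index refresh creates its tmp index table as a regular table, so parallel processes drop and recreate each other’s table. Creating it with createTemporaryTable() instead makes it a MySQL temporary table, which is visible only to the connection that created it: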

--- a/app/code/core/Enterprise/Catalog/Model/Index/Action/Catalog/Category/Product/Refresh.php
+++ b/app/code/core/Enterprise/Catalog/Model/Index/Action/Catalog/Category/Product/Refresh.php
@@ -326,7 +326,7 @@ class Enterprise_Catalog_Model_Index_Action_Catalog_Category_Product_Refresh
             ->setComment('Catalog Category Product Index Tmp');
 
         $this->_connection->dropTable($this->_getMainTmpTable());
-        $this->_connection->createTable($table);
+        $this->_connection->createTemporaryTable($table);
     }
 
     /**

Other Use-Cases?

Queue processing

Scheduler

Indexes

Everything that’s batchable

Thank you! Any questions?

http://www.aoemedia.com

My blog: http://www.fabrizio-branca.de

Follow me on Twitter: @fbrnc