Magento Code Audit
Magento Expert Consultant Group Oleksandr Zarichnyi, Vitaliy Stepanenko
Will talk about
• Issues detected in code
• How we conduct code audit
• Value code audit brings to the table
What is code audit?
Projects
Health Check
Upgrade Analysis
Before Launch Check
Crash Investigation
Experience
50+ projects
6670474 LOC
74396 classes
290594 methods
45860 issues
Issues
Issue 1
throw new Exception("Cannot find product " + $this->getSku());
Issue 1
throw new Exception("Cannot find product " . $this->getSku());
Issue 2
Expression is Always True
protected function _revertById($id, $amount = 0)
{
    $giftCard = Mage::getModel('giftcard/giftcard')
        ->load($id);
    if ($giftCard) {
        $giftCard->revert($amount)
            ->unsOrder()
            ->save();
    }
    return $this;
}
Issue 2
protected function _revertById($id, $amount = 0)
{
    $giftCard = Mage::getModel('giftcard/giftcard')
        ->load($id);
    if ($giftCard->getId()) {
        $giftCard->revert($amount)
            ->unsOrder()
            ->save();
    }
    return $this;
}
Issue 3
for ($i = 0; $i < count($data); $i++) {
    //..
}
Issue 3
$count = count($data);
for ($i = 0; $i < $count; $i++) {
    //..
}
Issue 4
Fetching More Than Necessary
public function getRandomProduct()
{
    // Keep the collection itself: getSelect() returns the select object,
    // which has no getFirstItem().
    $collection = Mage::getModel('catalog/product')
        ->getCollection()
        ->addStoreFilter();
    $collection->getSelect()
        ->order('RAND()');
    return $collection->getFirstItem();
}
Issue 4
public function getRandomProduct()
{
    $collection = Mage::getModel('catalog/product')
        ->getCollection()
        ->addStoreFilter();
    $collection->getSelect()
        ->limit(1)
        ->order('RAND()');
    return $collection->getFirstItem();
}
Code Smell
FIXME
TODO
HACK
Axe Effect
cwe.mitre.org
• Template for issue description
• Catalog of 400 entries applicable for PHP and Magento code
ECG
• 250 internally mined common entries + 200 entries from other sources
Describing Issues
Name
Description
Recommendation
Level of Effort
Priority
Relationships
Time of Introduction
• Architecture and Design
• Implementation
• Installation and Upgrade
• Configuration
Impact Accessibility Accountability Adaptability Administrability Affordability Agility Availability Capability Composability Configurability Compatibility Demonstrability Deployability Durability
Executability Extensibility Evolvability Fidelity Flexibility Functionality Integratability Interoperability Interpretability Maintainability Manageability Mobility Modifiability Operability
Performability Portability Practicability Practicality Predictability Producibility Recoverability Reliability Repeatability Responsibility Reusability Scalability Serviceability Stability
Supportability Suitability Survivability Tailorability Testability Traceability Trainability Transportability Trustability Understandability Upgradability Usability Verifiability Vulnerability
Product Quality Model
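The template fields listed under Describing Issues could be captured as a small record type. What follows is only a minimal sketch, assuming PHP 7.1 or later: the class name `IssueDescription`, the property names, and the sample values for level of effort, priority, and impact are illustrative assumptions, not the actual format of the ECG catalog.

```php
<?php
// Hypothetical record mirroring the template fields from the slides.
final class IssueDescription
{
    // The four stages listed under "Time of Introduction".
    const TIMES_OF_INTRODUCTION = [
        'Architecture and Design',
        'Implementation',
        'Installation and Upgrade',
        'Configuration',
    ];

    public $name;
    public $description;
    public $recommendation;
    public $levelOfEffort;      // scale is not given in the talk; free-form here
    public $priority;           // scale is not given in the talk; free-form here
    public $relationships = []; // names of related catalog entries
    public $timeOfIntroduction;
    public $impact = [];        // quality attributes, e.g. "Reliability"

    public function __construct(array $fields)
    {
        foreach ($fields as $key => $value) {
            if (!property_exists($this, $key)) {
                throw new InvalidArgumentException("Unknown field: $key");
            }
            $this->$key = $value;
        }
        if (!in_array($this->timeOfIntroduction, self::TIMES_OF_INTRODUCTION, true)) {
            throw new InvalidArgumentException('Unknown time of introduction');
        }
    }
}

$issue = new IssueDescription([
    'name' => 'Expression is Always True',
    'description' => 'load() returns the model object, so the if-check never fails.',
    'recommendation' => 'Check $giftCard->getId() instead.',
    'levelOfEffort' => 'low',   // assumed value
    'priority' => 'high',       // assumed value
    'timeOfIntroduction' => 'Implementation',
    'impact' => ['Reliability'],
]);

echo $issue->name, ' (', $issue->timeOfIntroduction, ")\n";
```

Validating the time of introduction against a fixed list keeps report breakdowns by that field consistent across auditors.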
Deliverable: Report
• Trends
• Most popular issues
• Issues breakdown by location, impact, time of introduction
• Overall code quality
• Better understanding of the nature of the issues
How to Survive?
A lot of routine tasks
A lot of data
A lot of formal stuff
ECG Toolkit
• reVu IDE plugin
• Automated code analyzers
• Report generators
• Data refine tools
Oleksandr Zarichnyi
Code Audit Automation
Vitaliy Stepanenko
Software Audit Tools
1. Static code analyzers
2. Dynamic code analyzers
3. Utilities
Workflow
• Sniffing
• Collecting & merging results
• Exporting data to reVu
• Manual review in reVu
• Generating final report
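The "collecting & merging results" step of the workflow can be pictured as follows. This is a minimal sketch, not the ECG tooling: the finding shape (`file`, `line`, `rule`, `tool`), the rule identifiers, and the file paths are all assumptions, and it presumes rule names have already been normalized across tools.

```php
<?php
/**
 * Merge reports from several sniffers, collapsing findings that point at the
 * same file, line, and rule. The finding shape is an assumption of this sketch.
 */
function mergeFindings(array ...$reports): array
{
    $merged = [];
    foreach ($reports as $report) {
        foreach ($report as $finding) {
            $key = $finding['file'] . ':' . $finding['line'] . ':' . $finding['rule'];
            if (!isset($merged[$key])) {
                $merged[$key] = [
                    'file' => $finding['file'],
                    'line' => $finding['line'],
                    'rule' => $finding['rule'],
                    'tools' => [],
                ];
            }
            // Remember every tool that reported this finding.
            $merged[$key]['tools'][] = $finding['tool'];
        }
    }
    return array_values($merged);
}

// Hypothetical reports from two tools with one overlapping finding.
$phpcs = [
    ['file' => 'Model/Giftcard.php', 'line' => 12, 'rule' => 'ModelLoadInLoop', 'tool' => 'phpcs'],
    ['file' => 'Block/List.php', 'line' => 40, 'rule' => 'CountInLoop', 'tool' => 'phpcs'],
];
$phpmd = [
    ['file' => 'Model/Giftcard.php', 'line' => 12, 'rule' => 'ModelLoadInLoop', 'tool' => 'phpmd'],
];

$merged = mergeFindings($phpcs, $phpmd);
foreach ($merged as $f) {
    printf("%s:%d %s [%s]\n", $f['file'], $f['line'], $f['rule'], implode(', ', $f['tools']));
}
```

Keeping the list of reporting tools on each merged finding gives the manual reviewer a hint about how many analyzers agree.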
Code Sniffers
PHPMD (PHP Mess Detector)
PHP_CodeSniffer
How to sniff?
• Reflection
• Parsing
• Tokenization
• RegExp?
<?php
/**@var $a bool */
$a = 2 <> 1;
Token             Lexeme                  Line
T_OPEN_TAG        <?php                   1
T_COMMENT         /**@var $a bool */      2
T_VARIABLE        $a                      3
T_EQUAL           =                       3
T_LNUMBER         2                       3
T_IS_NOT_EQUAL    <>                      3
T_LNUMBER         1                       3
T_SEMICOLON       ;                       3
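A token table like the one above can be reproduced approximately with PHP's built-in tokenizer. The sketch below is an illustration rather than how the ECG sniffers work internally. Note that native `token_get_all()` returns single-character tokens such as `=` and `;` as plain strings; names like T_EQUAL and T_SEMICOLON are added by PHP_CodeSniffer, so this sketch labels them `(char)` instead.

```php
<?php
// The three-line source from the slide.
$source = "<?php\n/**@var \$a bool */\n\$a = 2 <> 1;";

$rows = [];
$line = 1; // line on which the next token starts
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        list($id, $lexeme, $startLine) = $token;
        // Advance the tracked line past any newlines inside this token.
        $line = $startLine + substr_count($lexeme, "\n");
        if ($id === T_WHITESPACE) {
            continue;
        }
        $rows[] = [token_name($id), trim($lexeme), $startLine];
    } else {
        // Single-character tokens carry no line number, so use the tracked one.
        $rows[] = ['(char)', $token, $line];
    }
}

foreach ($rows as list($name, $lexeme, $tokenLine)) {
    printf("%-16s %-22s %d\n", $name, $lexeme, $tokenLine);
}
```

Whitespace tokens are skipped only to keep the output close to the slide; a real sniffer usually needs them for formatting checks.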
Issues outside PHP code
XML files (configuration & layout updates)
DB schema (indexes, non-optimal field types)
Wrong file placement & naming
JavaScript, CSS & HTML issues
Working on compound sniffers
Difficulties
1. Many different approaches that should be used together
2. Redundant calculations: each sniffer tokenizes the code again and again. A typical Magento application has over 8,000 files consisting of code, templates, JavaScript and CSS
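The redundancy problem can be illustrated with a shared token cache: every sniffer asks the cache for a file's tokens, and the file is tokenized only once. This is a simple stand-in to show the idea, not the software graph the talk proposes as its solution; the class `TokenCache`, the helper `countTokens`, and the file path are hypothetical.

```php
<?php
// Tokenize each file at most once and share the result between sniffers.
final class TokenCache
{
    private $tokens = [];
    public $tokenizeCalls = 0;

    public function tokensFor(string $path, string $source): array
    {
        if (!isset($this->tokens[$path])) {
            $this->tokens[$path] = token_get_all($source);
            $this->tokenizeCalls++;
        }
        return $this->tokens[$path];
    }
}

// Toy "sniffer": count tokens of one kind in a token stream.
function countTokens(array $tokens, int $id): int
{
    $count = 0;
    foreach ($tokens as $token) {
        if (is_array($token) && $token[0] === $id) {
            $count++;
        }
    }
    return $count;
}

$cache = new TokenCache();
$path = 'Model/Demo.php'; // hypothetical file name
$source = '<?php for ($i = 0; $i < count($data); $i++) { echo $i; }';

// Two sniffers look at the same file; only the first request tokenizes it.
$loops = countTokens($cache->tokensFor($path, $source), T_FOR);
$variables = countTokens($cache->tokensFor($path, $source), T_VARIABLE);

printf("loops=%d variables=%d tokenize calls=%d\n", $loops, $variables, $cache->tokenizeCalls);
```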
Solutions: software graph
1. File system as part of graph
2. PHP Reflection as part of graph (TokenReflection)
3. PHP lexical tree inside methods & functions as part of graph (PHP_Parser)
Software graph
1. Back links, circular links (parent class, overridden method)
2. Typed connections, polymorphism
Semantic relations:
• Holonymy & meronymy
• Hyponymy & hyperonymy
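Typed connections with back links can be sketched as a node class whose edges are grouped by relation name. This is a minimal illustration assuming PHP 7.1 or later: the class `Node`, the relation names (`contains`/`partOf`, `extends`/`extendedBy`, `overrides`/`overriddenBy`), and the sample class and file names are all assumptions, not the ECG implementation.

```php
<?php
final class Node
{
    public $type;
    public $name;
    /** @var array<string, Node[]> outgoing edges grouped by relation type */
    private $edges = [];

    public function __construct(string $type, string $name)
    {
        $this->type = $type;
        $this->name = $name;
    }

    // Add a typed edge and its inverse, so the graph can be walked both ways.
    public function link(string $relation, Node $target, string $inverse): void
    {
        $this->edges[$relation][] = $target;
        $target->edges[$inverse][] = $this;
    }

    /** @return Node[] */
    public function related(string $relation): array
    {
        return $this->edges[$relation] ?? [];
    }
}

$file = new Node('File', 'Model/Giftcard.php');
$parent = new Node('Class', 'Base_Giftcard');
$child = new Node('Class', 'Custom_Giftcard');
$parentMethod = new Node('Method', 'Base_Giftcard::revert');
$childMethod = new Node('Method', 'Custom_Giftcard::revert');

// Whole/part relations (holonymy & meronymy).
$file->link('contains', $child, 'partOf');
$child->link('contains', $childMethod, 'partOf');
$parent->link('contains', $parentMethod, 'partOf');
// Kind-of relations (hyponymy & hyperonymy).
$child->link('extends', $parent, 'extendedBy');
$childMethod->link('overrides', $parentMethod, 'overriddenBy');

// Follow "partOf" back links upwards to build an issue location.
$path = [];
for ($node = $childMethod; $node !== null; $node = $node->related('partOf')[0] ?? null) {
    array_unshift($path, $node->name);
}
echo implode(' > ', $path), "\n";
```

Because every edge stores its inverse, the location of an issue falls out of the graph by walking "partOf" links from the offending node up to its file.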
Node families & extensibility
1. File system
2. PHP
   • Reflection (classes, methods, namespaces, etc.)
   • PHP_Depend (metrics for reflection objects)
   • Lexical tree (inside PHP functions)
3. Magento
   • Directory-based: Magento application, code pools, namespaces, modules
   • Class-based: models, controllers, blocks, helpers
   • File-based: install & upgrade scripts, configuration files, layout updates extends files
4. Other programming languages?
5. Git, SVN?
6. Virtual nodes
   • Magento functional scopes
   • Specific code (e.g. performing DB queries)
Software Graph’s API
• Visitor
• Direct querying: search methods, fluent interface, state monad
• Query language: just syntactic sugar
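The first two API styles can be contrasted on a toy tree. The sketch below assumes PHP 7.1 or later and uses invented names (`TreeNode`, `NodeVisitor`, `TypeCounter`, `Query`, `Demo_Model_Giftcard`); it shows a visitor and a fluent query only, and does not attempt the state monad mentioned on the slide.

```php
<?php
final class TreeNode
{
    public $type;
    public $name;
    public $children;

    public function __construct(string $type, string $name, array $children = [])
    {
        $this->type = $type;
        $this->name = $name;
        $this->children = $children;
    }
}

interface NodeVisitor
{
    public function visit(TreeNode $node): void;
}

// Visitor style: the traversal pushes every node into the visitor.
function walk(TreeNode $node, NodeVisitor $visitor): void
{
    $visitor->visit($node);
    foreach ($node->children as $child) {
        walk($child, $visitor);
    }
}

final class TypeCounter implements NodeVisitor
{
    public $counts = [];

    public function visit(TreeNode $node): void
    {
        $this->counts[$node->type] = ($this->counts[$node->type] ?? 0) + 1;
    }
}

// Direct querying style: each call narrows the node set and returns a new query.
final class Query
{
    private $nodes;

    private function __construct(array $nodes)
    {
        $this->nodes = $nodes;
    }

    public static function from(TreeNode $root): self
    {
        $collector = new class implements NodeVisitor {
            public $nodes = [];
            public function visit(TreeNode $node): void
            {
                $this->nodes[] = $node;
            }
        };
        walk($root, $collector);
        return new self($collector->nodes);
    }

    public function where(string $property, string $value): self
    {
        return new self(array_values(array_filter(
            $this->nodes,
            function (TreeNode $node) use ($property, $value) {
                return $node->$property === $value;
            }
        )));
    }

    public function get(): array
    {
        return $this->nodes;
    }
}

$class = new TreeNode('Class', 'Demo_Model_Giftcard', [
    new TreeNode('Method', '_revertById'),
    new TreeNode('Method', 'load'),
]);

$counter = new TypeCounter();
walk($class, $counter);

$found = Query::from($class)->where('type', 'Method')->where('name', 'load')->get();
echo count($found), " method(s) named load\n";
```

A query language on top of this would only translate text into the same chain of `where()` calls, which is what "syntactic sugar" suggests.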
Software graph: additional benefits
1. Query caching, lazy loading
2. Intelligent node search, traversal algorithms based on relation types
3. Easy way to get the path (issue location): file, class name, method name, line numbers
Query Language Implementation
Parser: built with Loco, a parser combinator library for PHP
Interpreter: state monad wrapper for the graph traversal API, plus
1. Simple boolean operators
2. Tunneling to native PHP functions
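To give a feel for what the parser has to produce, here is a deliberately simplified stand-in that splits a query into steps with a regular expression instead of Loco. The grammar it accepts (`\Type`, an optional `.property`, an optional `[attr = "value"]` filter) is guessed from the examples in this talk, and the function name `parseQuery` and the step format are assumptions. It covers neither boolean operators nor tunneling to PHP functions, and there is no interpreter.

```php
<?php
/**
 * Split a query such as \LoopStatement.body\MethodCall[name = "load"]
 * into a list of steps. The grammar is a guess based on the talk's examples.
 */
function parseQuery(string $query): array
{
    // \Type, optional .property, optional [attr = "value"]
    $pattern = '/\\\\(\w+)(?:\.(\w+))?(?:\[\s*(\w+)\s*=\s*"([^"]*)"\s*\])?/';
    preg_match_all($pattern, $query, $matches, PREG_SET_ORDER);

    $steps = [];
    foreach ($matches as $m) {
        $property = $m[2] ?? '';
        $steps[] = [
            'type' => $m[1],
            'property' => $property !== '' ? $property : null,
            'filter' => isset($m[3]) ? [$m[3] => $m[4]] : null,
        ];
    }
    return $steps;
}

$steps = parseQuery('\LoopStatement.body\MethodCall[name = "load"]');
foreach ($steps as $step) {
    echo json_encode($step), "\n";
}
```

A parser combinator scales better than this once nesting and boolean operators appear, which is presumably why the authors chose one.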
Examples
Example 1
Find model load in loops
\LoopStatement.body\MethodCall[name = "load"]
The same check as a PHP_CodeSniffer sniff:
class Ecg_Sniffs_Performance_LoopModelLoadSniff implements PHP_CodeSniffer_Sniff
{
    public function register()
    {
        // Run on every kind of loop.
        return array(T_WHILE, T_FOR, T_FOREACH, T_DO);
    }

    public function process(PHP_CodeSniffer_File $phpcsFile, $stackPtr)
    {
        $tokens = $phpcsFile->getTokens();
        $opener = $tokens[$stackPtr]['scope_opener'];
        $closer = $tokens[$stackPtr]['scope_closer'];
        // Scan the loop body for a "load" identifier.
        for ($ptr = $opener + 1; $ptr < $closer; $ptr++) {
            $content = $tokens[$ptr]['content'];
            if ($tokens[$ptr]['code'] === T_STRING && $content == 'load') {
                $phpcsFile->addError('Model load in loop detected', $ptr, 'ModelLoad', array($content));
            }
        }
    }
}
And as an XPath query over an XML dump of the syntax tree:
//*[
    name()="node:Stmt_Foreach" or name()="node:Stmt_Do"
    or name()="node:Stmt_For" or name()="node:Stmt_While"
]//node:Expr_MethodCall/subNode:name[ scalar:string = "load" ]
Example 2
Find all methods whose docBlock return annotation is inconsistent with the value actually returned
Method [
    \DocBlock.returnAnnotation.types as $types,
    \Statement [ name="return", !(expression.returnedType in $types) ]
]
Example 3
Find direct output in models \(MageModel or MageResourceModel)\OutputStatement
Rule Examples
1. A DB query outside a resource model or install/upgrade script is perhaps an issue
2. A DB query inside a block or controller is definitely an issue
Next concept: confidence
Perhaps? Definitely?
Two types of confidence
1. Confidence based on accuracy of sniffs: any rule has exceptions
2. Confidence based on accuracy of observations: the technologies used are not ideal
Code Bases
1. Target codebase: concrete module, local code pool
2. Auxiliary codebase: PEAR libs, whole Magento application
Example: Analyzed class inside target code base, parent class inside auxiliary codebase. We search for copy-pasted code in overridden methods without parent’s method call.
Vitaliy Stepanenko
References
https://github.com/magento-ecg/coding-standard – ECG CodeSniffer coding standard
http://cwe.mitre.org – Common Weakness Enumeration
https://github.com/syllant/idea-plugin-revu – reVu code review plugin
https://github.com/nikic/PHP-Parser – PHP Parser
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Epic answer about parsing HTML with regular expressions
http://phpmd.org/ – PHP Mess Detector
https://github.com/Andrewsville/PHP-Token-Reflection – PHP Token Reflection
Questions