Click here to load reader
Upload
pmjones88
View
1.222
Download
27
Tags:
Embed Size (px)
DESCRIPTION
Citation preview
Modernizing LegacyApplications In PHP
Paul M. JonesZendCon 2014
1
Read These
2
About Me
• 8 years USAF Intelligence
• BASIC in 1983, PHP since 1999
• Jr. Developer, VP Engineering
• Aura project, Zend_DB, Zend_View
• PHP-FIG, PSR-1, PSR-2, PSR-4
• MLAPHP (http://mlaphp.com)
3
Overview• Legacy Applications
• Prerequisites
• Implement an Autoloader
• Consolidate Classes& Functions
• Replace globals with DI
• Replace new with DI
• Write Tests
• Extract SQL Statements
• Extract Domain Logic (Model)
• Extract Presentation Logic (View)
• Extract Action Logic (Controller)
• Replace remaining include
• Separate public resources
• Decouple URLs from page scripts
• Remove repeated logic in page scripts
• Replace page scripts with DI container
4
Legacy Applications
5
What Does Legacy Mean?• Page scripts directly in the document root
• Special index files to prevent directory access
• Special logic at the top of some files to `die()` or `exit()`
• Include-oriented instead of class-oriented or object-oriented
• Relatively few classes
• Poor class structure
• Relies more heavily on functions than on class methods
• Page scripts, classes, and functions combine MVC concerns
• One or more incomplete attempts at a rewrite
• No automated test suite6
Typical PHP Application Structurebin/ # command-line tools cache/ # cache files common/ # commonly-used include files classes/ # custom classes Image.php # Template.php # functions/ # custom functions db.php # log.php # cache.php # setup.php # configuration and setup css/ # stylesheets img/ # images index.php # home page script
7
Typical PHP Application Structurejs/ # JavaScript lib/ # third-party libraries log/ # log files page1.php # other page scripts page2.php # page3.php # sql/ # schema migrations sub/ # sub-page scripts index.php # subpage1.php # subpage2.php # theme/ # site theme files header.php # a header template footer.php # a footer template nav.php # a navigation template 8
Typical PHP Application Code001.php
9
Maintenance Headaches• include statements to execute logic
• inline function definitions
• global variables
• MVC logic combined in a single script
• trusting user input
• possible SQL injection vulnerabilities
• possible XSS vulnerabilities
• unquoted array keys generating notices
• if blocks not wrapped in braces
• copy-and-paste repetition
10
Technical Debt
• A metaphor referring to the eventual consequences of poor or evolving software architecture and software development within a codebase.
• As a change is started on a codebase, there is often the need to make other coordinated changes at the same time in other parts of the codebase.
• http://en.wikipedia.org/wiki/Technical_debt
11
Rewrite From Scratch?• Sexy! Tempting! Do it right the first time!
• Expend effort while not earning revenue
• Old devs on new project? New devs on new project?
• Takes longer than you think
• End up with different bad architecture12
Incremental Refactoring• Pay off smallest debt first (build inertia and raise spirits)
• Small changes across codebase
• Build on previous small changes
• Goal is not “perfection”
• Goal is to “improve over time”13
Refactor To A Framework?
• Essentially a rewrite. If system could be swapped to a framework …
• … we would not actually need to.
• At the end of this process, swapping out is more possible …
• … but might not want to at that point.
14
Review
• What is legacy code?
• Why to refactor instead of rewrite
15
Prerequisites
16
Basic Tools
• Revision control system
• PHP 5.0 or later
• Editor/IDE with multi-file search and replace
• Style guide
• Test suite
• Attitude and approach
17
Revision Control
• SVN, Git, Hg, etc
• Change logs, commits, reverts, push/share changes
• Even if you are the only one on the project
• Install and initialize before continuing
18
PHP 5.0 Or Later
• Yes, this is ancient, but it’s legacy code
• 4.x is a non-starter (autoloading)
• Later is better (5.4/5/6)
• Upgrade to 5.x before continuing (itself a detailed process)
19
Editor/IDE With Multi-File Search And Replace
• Emacs
• PHPStorm
• SublimeText
• TextMate
• Vim
• Zend Studio
20
Style Guide• “This code looks fine” said no developer ever
• Mixed or idiosyncratic styles lead to cognitive friction
• If project has a consistent style, keep it (ugly or not)
• If not, pick a *public* standard (PSR-1 and PSR-2)
• Resist the urge to reformat in toto — single files only21
Test Suite
• If we had unit tests already, we’d be a lot better off
• Characterization tests best
• QA testing is OK-ish
• “Spot check” is worst
• Short-term incentives lean toward “spot check” at first
• Improved discipline with time
22
Attitude And Approach
• Make one small change
• “Spot check” it
• Commit it
• Push it
• Notify QA
• Make the next small change
23
Attitude And Approach
• No big changes
• Revert and redo as needed
• Add as little as possible
• Keep the application running
• Each incremental change is a quality improvement
• Stop at any point and start again later
24
Review
• Basic prerequisites for refactoring
• Attitude and approach to refactoring
25
Implement an Autoloader
26
Why Autoload?• Instead of “include ClassName.php” …
• … PHP will load the ClassName on request
• Remove “definition” includes/require statements
• Reveal the execution structure of the application
• Fewer includes means less to wade through27
Autoloading Standards
• PSR-0 class-to-file mapping
• “\Foo_Bar\Baz\Dib_Zim” maps to “…/Foo_Bar/Baz/Dib/Zim.php”
• PSR-4 only if no underscores in classes, and full class-to-file mapping
28
Single Central Class Location
• Need a base, known directory for the autoloader
• Create a directory in the application for *all* classes (lib, src, classes)
• May already exist: only class files? mixed structures?
• Later, we will move all classes to this location
29
Register Autoloader In Setup
• Create an autoloader function or class
• Register it with SPL early in setup code
• Must be before any definitional “include” statements
• 002.php
30
Followup
• Spot check
• Commit
• Push
• Notify QA
31
What If I Already Have An Autoloader?
• Use it as-is if already PSR-0
• Modify to add PSR-0 behavior
• Register added PSR-0 autoloader
• Composer:$loader = require '/path/to/app/vendor/autoload.php';$loader->add('', '/path/to/app/classes');
32
Performance Implications?
• Some reason to think autoloading is slower than include
• Optimize on the right resources
• Any drag from autoloading likely overshadowed by other issues
• If this is the main bottleneck, you’re in great shape
33
Review
• What is autoloading?
• Why use autoloading?
• Basic implementation
34
Consolidate ClassesAnd Functions
35
Why Consolidate?
• Remove include calls in favor of autoloading
• Remaining includes reveal logic paths
• Provide structure for class hierarchy
36
Process (1/2)
• Find an include statement that pulls in a class definition file.
• Move that class definition file to our central class directory location, making sure that it is placed in a sub-path matching the PSR-0 rules.
• In the original file *and in all other files in the codebase* where an `include` pulls in that class definition, remove that `include` statement.
37
Process (2/2)
• Spot check to make sure that all the files now autoload that class by browsing to them or otherwise running them.
• Commit, push, and notify QA.
• Repeat until there are no more `include` calls that pull in class definitions.
38
Initial Class Locationclasses/ # our central class directory location Mlaphp/ Autoloader.php # A hypothetical autoloader class foo/ bar/ baz.php # a page script includes/ # a common "includes" directory setup.php # setup code index.php # a page script lib/ # a directory with some classes in it sub/ Auth.php # class Auth { ... } Role.php # class Role { ... } User.php # class User { ... }
39
Find A Candidate Include
<?php require 'includes/setup.php'; require_once 'lib/sub/User.php';
// ... $user = new User(); // ... ?>
40
Move The Class Fileclasses/ # our central class directory location Mlaphp/ Autoloader.php # A hypothetical autoloader class User.php # class User { ... } foo/ bar/ baz.php # a page script includes/ # a common "includes" directory setup.php # setup code db_functions.php # a function definition file index.php # a page script lib/ # a directory with some classes in it sub/ Auth.php # class Auth { ... } Role.php # class Role { ... }
41
Remove First Include Call
<?php require 'includes/setup.php';
// ... // the User class is now autoloaded $user = new User(); // ... ?>
42
Remove Other Include Calls
• The class file is likely included in other locations
• Use project-wide search facility to find other includes of the class
• Remove those include calls as well
• Keep a list of changed files
^[ \t]*(include|include_once|require|require_once).*User\.php
43
Beware Mismatches
• Inspect “by eye” the search results when removing
• Make sure the include call is to the same class file
include_once("/usr/local/php/lib/User.php");
44
Spot Check
• “Spot check” changed files
• Best to work with error reporting at E_ALL or E_WARNING
• Look for “file not found” and “class not defined” errors
45
Followup
• Commit, push, notify QA
• Continue to next class
• Done when there are no more class-related includes
46
Consolidate FunctionsInto Classes
• Might be more procedural than class- or object-oriented
• Want to leverage autoloading, but only works with classes
• Convert function files into classes and call them as methods
47
Process (1/3)• Find an `include` statement that pulls in a function definition file.
• Convert that function definition file into a class file of static methods; we need to pick a unique name for the class, and we may need to rename the functions to more suitable method names.
• In the original file *and in all other files in the codebase* where any functions from that file are used, change calls to those functions into static method calls.
48
Process (2/3)
• Spot check to see if the new static method calls work by browsing to or otherwise invoking the affected files.
• Move the class file to the central class directory location.
• In the original file *and in all other files in the codebase* where an `include` pulls in that class definition, remove the relevant `include` statement.
49
Process (3/3)
• Spot check again to make sure that all the files now autoload that class by browsing to them or otherwise running them.
• Commit, push, and notify QA.
• Repeat until there are no more `include` calls that pull in function definition files.
50
Find A Candidate Include
<?php require 'includes/setup.php'; require_once 'includes/db_functions.php';
// ... $result = db_query('SELECT * FROM table_name'); // ... ?>
51
Convert Function FileTo Class File
• 003.php
• The include call stays in place
• The file also stays in place
52
Change Function Callsto Static Method Calls
• Find old call, replace with new call
• Do so for each old function converted to a static methodFind:
db_query\s*\(
Replace:
Db::query(53
Spot Check #1
• Check to make sure the new static calls work
• The file has not moved yet
• The include has not been removed yet
54
Move The Class File classes/ # our central class directory location Db.php # class Db { ... } Mlaphp/ Autoloader.php # A hypothetical autoloader class User.php # class User { ... } foo/ bar/ baz.php # a page script includes/ # a common "includes" directory setup.php # setup code index.php # a page script lib/ # a directory with some classes in it sub/ Auth.php # class Auth { ... } Role.php # class Role { ... }
55
Followup
• Remove all include calls to the moved file
• Spot check the files with changes
• Commit, push, notify QA
• Continue until all function files are converted to class files
56
How To Pick Candidate Include Files?
• Manually traverse the entire codebase and work file-by-file.
• Generate a list of class and function definition files, and then generate a list of files that `include` those files.
• Search for every `include` call and look at the related file to see if it has class or function definitions.
57
What If An Include Defines More Than One Class?
• Split the single file into multiple files.
• Create one file per class, and name each file for the class it contains per the PSR-0 naming and autoloading expectations.
58
Isn’t “One Class Per File” Wasteful?
• Optimize on the right resources
• Any performance drag is small compared to other issues
• Consistency above all: if some have one, and others have 2+ … ?
• Reasonable to group subclasses in subdirectories
59
What About Inline Definitions?
• Remove the inline definition to its own file.
60
What If Definition File Executes Logic?
• Move the definition and leave the logic
• Do not remove the include call, so the logic is executed
• 004.php
61
What If Two Classes Have The Same Name?
• Rename one or both of the classes when moving them.
62
What About Third-Party Libraries?• Don’t want to move 3rd-party libs (makes upgrading harder)
• If it uses an autoloader, add to stack; if not:
• Modify our autoloader with the 3rd-party lib in mind
• Write an additional autoloader for the 3rd-party lib
• Consolidate to their own location if possible (move entire package)63
What About System-Wide Libraries?
• Installed outside the legacy application for system-wide use
• If Horde/PEAR, modify PSR-0 autoloader to look in multiple dirs
• 005.php
64
Use Instance Methods Instead Of Static Methods For Functions?
• Instance methods are better in the long run
• More trouble to convert at first, but do so if you can
• Easier to convert to static first, then to instance, but more time
• 006.php
65
Can This Be Automated?
• Not that I know of.
66
Review• Autoloading all classes from central location
• Autoloading all functions as classes from central location
• Definitional includes now removed
• Remaining includes execute logic
• More class-oriented, less include-oriented67
Replace Globals With Dependency Injection
68
Why Replace Globals?
• Classes still have globals in them
• “Spooky action at a distance” — change “here” affects “there”
• 007.php
69
Inject Dependencies Instead
• Create dependency outside the class
• Pass it into the class at construction time
• Then we can see dependencies from the outside
70
Process (1/2)
• Find a global variable in one of our classes.
• Move all global variables in that class to the constructor and retain their values as properties, and use the properties instead of the globals.
• Spot check that the class still works.
71
Process (2/2)
• Convert the global calls in the constructor to constructor parameters.
• Convert all instantiations of the class to pass the dependencies.
• Spot check, commit, push, and notify QA.
• Repeat with the next global call in our class files, until none remain.
72
First Steps
• Search for “global $” in the central class directory
• Move all globals in class to constructor (008.php)
• Spot check
• Convert globals to properties and constructor params (009.php)
73
CHANGE INSTANTIATIONS
• Find instantiations of the class (imperfect but pretty good):
• new\s+ClassName\W
• use\s+ClassName\W
• Change instantiations to use constructor params (010.php)
74
Followup
• Spot check, commit, push, notify QA
• Continue until there are no more uses of global keyword
75
What About Globals In Static Methods?
• Convert to an instance method and pass dependency
• Pass dependency as a method parameter
76
What About Superglobals?• Special case of globals: automatically global, like it or not
• Must replace them since they are global
• Cannot find with search for “global”
• Often need 2-3 superglobals at a time
• Could pass copies, but $_SESSION may not work that way77
Superglobal Solution: Pass A Request Object• Use the Request data structure from http://mlaphp.com/code
• Inject into all classes using superglobals
• Classes then use the Request for copies of superglobal data
• Request handles the special case of $_SESSION
• 011.php78
What About $GLOBALS?
• Remove all uses of $GLOBALS the same as with the global keyword.
79
Review
• Removed all global keywords and replaced with injection
• Removed all superglobals and replaced with injection
80
Replace new with Dependency Injection
81
Why Replace New?
• Makes dependencies explicit
• Decouples object *creation* from object *use*
• Allows more control for object creation from outside
• 012.php
82
Parallel Solutions
• Replace “early” creation with dependency injection
• Replace “late” creation by injecting a Factory object
• Factory’s only purpose is to create and return objects
83
Process (1/3)• Find a non-Factory class with the new keyword in it.
• For each “early” creation in the class ...
• Extract each instantiation to a constructor parameter.
• Assign the constructor parameter to a property.
• Remove any constructor parameters and class properties that are used only for the `new` call.
84
Process (2/3)
• For each “late” creation in the class ...
• Extract each block of creation code to a new Factory class.
• Create a constructor parameter for each Factory and assign it to a property.
• Modify the previous creation logic in the class to use the Factory.
85
Process (3/3)
• Change all instantiation calls for the modified class to inject dependencies.
• Spot check, commit, push, and notify QA.
• Repeat with the next new keyword that in a non-Factory object.
86
Find A Use Of New
• Search the project for “new” instantiations
• If in constructor method, inject directly (013.php)
• If in a non-constructor method, extract and inject Factory (014.php)
87
Change Instantiations
• Search for new\s+ItemsGateway\(
• Create dependencies and pass to the instantiation (015.php)
88
Followup
• Spot check, commit, push, notify QA
• Repeat with next new keyword in non-Factory class
89
Isn’t This A Lot Of Code?
• Yes, it’s true — there are more lines of code to suit the same end.
• The real issue is not “more code”.
• The real issues are “more testable, more clear, less coupled”
• 016.php
90
What About SPL ClassesAnd Exception Classes?
• These are reasonable exceptions to the “no new outside of Factory” rule, since they are provided by PHP directly and relatively simple.
• Includes ArrayObject, ArrayIterator, Exceptions of all sorts
• Be careful about other native PHP classes like PDO with complex setups, outside configuration, etc.
• Err on the side of dependency injection instead of inline creation.91
What About Intermediary Dependencies?
• Sometimes dependencies have dependencies
• These are injected into one class so that another class can use them
• Construct completed object and pass that instead
• 017.php
92
Can A Factory Create Collections?
• Factories are for object-creation logic
• Object-creation logic may be simple or complex
• As long as it is limited to object creation and return, all is well
• 018.php
93
Can We Automate All These Injections?
• “Manual” injection so far — create dependencies “by hand” and inject
• Tedious and repetitive, even with Factory objects
• DI (aka IOC) Container is tempting, but resist for now
• Too radical a change at this point; later will be more natural fit
94
Review
• Removed all in-class instantiations
• Direct injection and Factory classes
• Page scripts might be longer but dependencies now explicit
95
Write Tests
96
Why Tests?
• Previously, changes to existing code filled us with dread
• Tests help to assure us that changes do not break functionality
97
Test Resistance
• Feels like make-work after such great progress
• No feeling of joy from writing code that just tests other code
• Does not itself feel like an improvement to the codebase
98
Testing Dogma
• "Do not interact with the file system; build a virtual file system instead.”
• "Do not interact with the database; build a set of data fixtures instead.”
• "Rewrite your classes to use interfaces instead of concrete classes, and write test doubles for all the dependencies.”
• Sounds insurmountable so skip it and move on … ?
99
Overcoming Test Resistance
• Without tests, condemning ourselves to more suffering later
• Enabling a continued awfulness to our work life
• Uncertainty that a change does not break something
• Writing tests sucks, but having written tests is awesome
100
The Way of Testivus• http://www.agitar.com/downloads/TheWayOfTestivus.pdf
• "More testing karma, less testing dogma.”
• "The best time to test is when the code is fresh.”
• "Write the test that needs to be written.”
• "An imperfect test today is better than a perfect test someday.”
• "Write the test you can today."101
Write The Best Test You Can
• Characterization that checks output
• Functional/integration that interacts with network/database/files
• Loose unit test that combines concrete classes
• Strict unit tests that isolate the unit with test doubles
102
Build Inertia Towards Testing• Imperfection is irrelevant; the test is more important than the unit.
• Imperfect tests can be perfected; nonexistent tests cannot.
• Writing *no* tests leads to *never* writing tests
• Writing *any* tests leads to writing *more* tests.
• Writing better tests leads to writing better units.103
Caveats
• Testing is relatively specialized
• The full scope of testing is to much for this talk
• Read “The Grumpy Programmer’s PHPUnit Cookook” (Hartjes)
104
Process (1/2)
• Install PHPUnit
• Create a tests directory and test config files
105
Process (2/2)• Pick a public class method
• Write a test for it
• Run the test; rejoice if it passes, rejoice if it fails
• Commit/push when the test passes
• Pick another public method and write a test for it
• Continue until all public methods are tested106
Install PHPUnit
• composer global require phpunit/phpunit=4.*
• pear config-set auto_discover 1pear install pear.phpunit.de/PHPUnit
• Download phpunit.phar
107
Create A Tests DirectoryAnd Config Files
• Create a top-level tests directory
• Create subdirectory for test classes in tests directory
• Add phpunit.xml and bootstrap.php files
• 019.php
108
Pick A Public Class Method
• Pick something simple at the start
• A class that is easy to construct (no dependencies)
• A public method that returns a result (no echo/print)
• Want an easy win; small wins build confidence and motivation
109
Run The Empty Test
• Watch it fail, and rejoice!
• The suite works, there’s just no testing code in it.
tests $ phpunit PHPUnit 3.7.30 by Sebastian Bergmann.
Configuration read from tests/phpunit.xml
F
Time: 45 ms, Memory: 2.75Mb
There was 1 failure:
1) Warning No tests found in class "FooTest".
FAILURES! Tests: 1, Assertions: 0, Failures: 1. tests $
110
Add A Test Method And Run Again• 021.php
• Rejoice when it passes!
• If it ever fails again, something is broken.
• Rejoice when it fails! You know you need to fix it.
tests $ phpunit PHPUnit 3.7.30 by Sebastian Bergmann.
Configuration read from tests/phpunit.xml
.
Time: 30 ms, Memory: 2.75Mb
OK (1 test, 1 assertion) tests $
111
Followup
• Commit the passing test, push
• Probably no need to notify QA *unless* code changed
• Pick another public method to test
• Continue until all public methods have tests
112
Can We Skip This And Do It Later?
• No.
113
Come On, Really, Can We Do This Later?
• You still have high test resistance, which is understandable.
• If you won’t write all your tests at once, at least write them in stages.
114
Options For Writing Tests In Stages
• One complete test class per day.
• Each time you use a method, check for a test, and write it if needed.
• Each feature or bug, keep list of classes or methods, and write tests for them as part of of the completion process.
• Each new method, write a test for it.
115
Delegate Writing Of Tests
• Junior dev!
• But soon the junior will know more about the system than you.
• Empower the junior to berate the authors to write testable code.
116
Testing Is Not Optional
• Writing a test suite is a non-negotiable aspect of modernizing
• If you don’t start now, you never will
• Write them now, or write them as we go, sooner rather than later
117
What About Hard-To-Test Classes?
• Working Effectively With Legacy Code (Feathers)
• Refactoring (Fowler et al.)
• https://github.com/adamculp/refactoring101
118
What About Our Earlier Characterization Tests?
• Keep them
• Basis of acceptance tests for QA team
119
Should We Test ProtectedAnd Private Methods?
• Tests that inspect the inner workings of a class are too brittle
• Public methods expose the results of the inner workings
120
Can We Change A Test?
• If method behavior changes on purpose, test may have to change too
• Run the entire suite, not just the one test
121
Do We Need To Test Third-Party Libraries?
• If they already have tests, no.
• Otherwise, our priorities will dictate.
• Dependency on the library acting a certain way? Test just that.
• Remember that our legacy application is the main concern.
122
What About Code Coverage?
• A report from PHPUnit telling us how many lines were tested
• Uncovered code is untested code; more coverage is better
• 100% is a good goal but likely unfeasible early on
123
Review
• We have at least started writing tests in The Way Of Testivus
• We are committed to adding new tests as part of normal work
• Each new test means more warnings about breaking changes
• Rejoice when they pass! Rejoice when they fail!
124
Extract SQL Statements
125
Why Extract SQL?
• Begin to separate the various concerns of the application
• Leads to easier maintainability and testability
• Data source work becomes separate from data manipulation work
• “SQL” is a stand-in for any data source
126
Embedded SQL
<?php $db = new Db($db_host, $db_user, $db_pass); $post_id = $_GET['post_id']; $stm = "SELECT * FROM comments WHERE post_id = $post_id"; $rows = $db->query($stm); foreach ($rows as $row) { // output each row } ?>
127
Goals For Extracting SQL
• Test the SQL interactions in isolation from the rest of the code
• Reduce the number of repeated SQL strings throughout the codebase
• Collect related SQL commands for generalization and reuse
• Isolate and remove security flaws such as SQL injection
128
Extract And Replace With Gateway Classes
• Only responsibility is to move data to and from data sources
• Probably “Table Data Gateways”
• Use a Gateway appropriate to the data source
129
Process (1/2)
• Search entire codebase for SQL statements.
• For each statement not already in a Gateway, move the statement and relevant logic to a related Gateway class method.
• Write a test for the new Gateway method.
130
Process (2/2)
• Replace the statement and relevant logic in the original file with a call to the Gateway class method.
• Spot check, commit, push, and notify QA.
• Repeat with the next SQL statement that is outside a Gateway class.
131
Search For SQL Statements
• Search for (SELECT|INSERT|UPDATE|DELETE)
• Easier if statement case is consistent (either upper or lower)
• Mixed-case leads to false positives
<?php $stm = "SELECT * FROM comments WHERE post_id = $post_id"; $rows = $db->query($stm); ?>
132
Move SQL To Gateways• Tedious and detail-oriented, even in simple cases
• What to name the class and method?
• How to deal with query parameters?
• How to avoid security flaws?
• What is the proper return value?133
Namespace And Class Names• Organize by layer, or by entity?
• By layer : Gateway\* or DataSource\Gateway\*
• By entity: Comments\* or Domain\Comments\*
• If organization in place, follow it
• Otherwise, by entity: Domain\Comments\CommentsGateway134
Method Names• Look to existing code for a convention; stay consistent if possible
• If no consistent pattern, try select*(), insert*(), etc.
• Try indicating the result and criteria: selectOneById(), etc.
• Use good professional judgment; take hints from the existing code
• Stay consistent above all135
Initial Gateway Class And Method• 022.php
• Essentially exact copy of existing logic
• Wrapped in class with dependency injection and method param
• No changes to logic, including security problems
• Need to fix SQL injection when we see it136
SQL Injection
137
SQL Injection
• “Little Bobby Tables” http://xkcd.com/327/
• Never assume parameter values are safe, even if you “know” they are
• Old “safe” values become unsafe with changes to codebase
• Each Gateway method should prevent SQL injection *itself*
138
Defeating SQL Injection• Best: Prepared statements and parameter binding, or cast to int/float
• OK: Quote-and-escape each parameter before interpolating into the query string
• Better than nothing: Escape each input parameter before interpolating into the query string (still susceptible to charset problems)
• 023.php139
Write A Test• Remember The Way Of Testivus!
• Write the test you can, when you can, while the code is fresh
• It will be imperfect, but it can be perfected
• It will likely touch the database, so have a test database
• 024.php140
Replace The Original SQL Code
• 025.php
• No changes to operational logic in original location
141
Followup
• Spot check, commit, push, notify QA
• Find the next SQL statement in a non-Gateway location
142
What About INSERT, UPDATE, DELETE ?• Example showed only SELECT
• Other query types also go in Gateways
• INSERT/UPDATE/DELETE harder because of params
• Consider using a data array for these
• 026.php143
What About Repetitive SQL Strings?
• We are likely to see lots of repetition with variation
• SELECT * FROM comments WHERE post_id = $post_id LIMIT 10
• Professional judgment: combine where simple, avoid where complex
• Don’t over-imagine possible scenarios! Combine only what you see.
• Always runs tests after changes144
What About Complex Query Strings?
• Simple queries as examples so far, likely to see more complexity
• Extract query-building logic to Gateway from surrounding logic
• Requires care, caution, and high attention to detail
• 027.php vs 028.php
145
What About QueriesIn Non-Gateway Classes?
• Examples have all been page scripts
• Other classes likely to have SQL embedded as well
• Follow same extraction process
• Inject Gateway into originating class
• 029.php146
Can We Extend From A Base Gateway Class?
• Once you have enough extractions to see patterns, yes.
• Be careful not to preemptively create functionality
• 030.php
147
What About Complex QueriesAnd Complex Result Structures?
• Examples have been with single tables
• Likely to need multiple tables with joins
• Likely to have nested-array result structures
148
How To Extract Complex QueriesAnd Result Strctures
• Identify how to split up the queries into separate Gateways (if at all)
• Determine which Gateway should receive the “main” logic, even if across multiple tables
• 031.php
• Note we still have an N+1 problem
149
What If There Is NoDatabase Abstraction Class?
• Examples assume a “Db” or “Pdo” type of object
• Legacy codebases often rely on mysql_*() functions direction
• Upgrade to PDO if you can, but this can be difficult
• Could move mysql calls to Gateways, but global state (link identifier)
150
Option: Proxy Calls Through MysqlDatabase
• http://mlaphp.com/code
• Class itself contains the link identifier, removing global state
• Connects only on queries, not on creation
151
Replacing mysql_*() with MysqlDatabase• Search the entire codebase for the mysql_ prefix on function calls.
• Where there are function calls with the mysql_ function prefix …
• Create or inject an instance of MysqlDatabase.
• Replace each mysql_ function prefix with the MysqlDatabase object variable and a single arrow operator (->)
• Spot check, commit, push, and notify QA.
• 032.php152
After Replacing mysql_*()
• The plan a migration away from the deprecated mysql_*() to PDO
• Easier now that all mysql_*() calls are in the Gateways
153
Review
• We have extracted all SQL statement to Gateways
• Now have tests for all Gateways, and thus all SQL interactions
• Possibly moved away from mysql_*() to PDO or MysqlDatabase
154
Extract Domain Logic (Model)
155
Why Extract Domain Logic?
• Data source work extracted to encapsulate database interactions
• Domain logic co-mingled with presentation and other logic
• Validation, modification, calculation, etc.
• Extract so we can reuse and test independent of other concerns
156
Domain Logic Patterns (1/2)
• Transaction Script "... organizes [domain] logic primarily as a single procedure, making calls directly to the database or through a thin database wrapper. Each transaction will have its own Transaction Script, although common subtasks can be broken into subprocedures."
• Domain Model "... creates a web of interconnected objects, where each object represents some meaningful individual, whether as large as a corporation or as small as a single line on an order form."
157
Domain Logic Patterns (2/2)
• Table Module "... organizes domain logic with one class per table in the database, and a single instance of a class contains the various procedures that will act on the data. ... if you have many orders, a Domain Model will have one order object per order while a Table Module will have one object to handle all orders.”
• Service Layer "... defines an application's boundary and its set of available operations from the perspective of interfacing client layers. It encapsulates the application's business logic, controlling transactions and coordinating responses in the implementation of its operations."
158
Which To Use?
• Service Layer implies sophistication that is not likely present
• Domain Model implies well-defined set of business entity objects
• Table Module seems OK; probably not working with single tables
• If one of these already implemented, use it! Otherwise …
159
Transaction Script
• Dump existing logic into a single class method and call it
• Simple, boring, lowest-common-denominator, but works anywhere
• Remember: refactor as-it-is, not rewrite as-we-want-it-to-be
• After extracting we can think about more sophisticated patterns
160
WARNING• This is the most difficult part of modernization
• Domain logic is the true core of our application
• Difficult to give examples as domain logic is different in each app
• Success depends on familiarity and deep knowledge of application
• Care, attention to detail, time consuming, very demanding161
Process (1/2)
• Search entire codebase for Gateway uses outside Transactions classes
• Examine code surrounding Gateway operations for domain logic
• Extract domain logic to one or more Transactions classes
• Modify the original code to use the Transactions class
162
Process (2/2)
• Spot check to make sure the original code still works properly
• Write tests for the extracted Transactions logic
• Commit, push, notify QA
• Search again for uses of Gateways outside of Transactions
163
Search For Gateway Uses
• Search for new .*Gateway and track through code
• Might be injected into a class that uses it for domain logic
• Might be passed along into other methods after initial use
• Code surrounding the usage is primary candidate for examination
164
Discover And Extract Domain Logic• WARNING: Follow all previous lessons when extracting!
• No use of globals or superglobals (use Request object)
• No use of new keyword outside Factory objects
• Use dependency injection
• Consistency above all165
Candidate Logic For Extraction• normalizing, filtering, sanitizing, and validating of data
• calculation, modification, creation, and manipulation of data
• sequential or concurrent operations and actions using the data
• retention of messages from those operations and actions
• retention of values and variables for later inputs and outputs166
Extraction Activities
• Break up or reorganize the extracted domain logic into support methods
• Break up or reorganize the original code to wrap the new Transaction calls
• Retain, return, or report data needed by the original code
• Add, change, or remove variables in original code related to extracted logic
• Create and inject dependencies for the Transactions classes and methods
167
This Is A Learning Exercise• Picking apart the app is a process of discovery
• Do not be afraid to make multiple attempts at extraction
• Going to fail at it over and over
• Treat these failures as knowledge-building exercises
• Revision control is your best friend168
Example Extraction
• 033.php and 034.php
169
Extraction Notes (1/3)• Two separate transactions: submit new article, update existing article
• Each operated on the user's credit counts in the database
• Various data normalizing and support operations
• Extracted to an ArticleTransactions class and two separate methods
• Named the methods for the domain logic being performed, not for the underlying technical operations
170
Extraction Notes (2/3)• Input filtering encapsulated as a support method in the
ArticleTransactions class for reuse
• ArticleTransactions receives ArticlesGateway and UsersGateway dependencies
• Variables related only to domain logic removed from the page script
• Placed into the Transactions class as properties171
Extraction Notes (3/3)
• Code in the original page script has been greatly reduced
• Is now essentially an object creation and injection mechanism
• Passing user inputs to the domain layer and getting back data to output later
• Original code can no longer see the $failure variable
• Must now get failure information from the object for later presentation
172
Followup• Spot check original code, modify/revert as needed
• Write Transactions tests, modify/revert as needed
• Spot check *again*, modify/revert/retest as needed
• Commit, push, notify QA
• Search again for uses of Gateways outside Transactions173
Are These SQL Transactions?• No, “Transaction Script” pattern not same as SQL transaction
• Does not have to wrap an actual SQL transaction
• However, thinking in terms of what a transaction should be may help define what the Transaction Script should contain
• The singular purpose of the Transaction Script may help determine boundaries of the domain logic to be extracted
174
What About Repeated Domain Logic?• As with Gateway extractions, we may find repeated domain logic
• Use professional judgment: if identical, or nearly, then combine
• If there are recognizable support elements, extract them separately
• Err on the side of caution: small steps are better than big ones
• Test, test, test175
Are Printing And EchoingPart Of Domain Logic?
• No; those are part of presentation logic
• Transactions classes should not contain print or echo
• Extract so it remains outside of domain logic
176
Can A Transaction Be A Class By Itself?
• Some logic complex enough to warrant a single class
• Or standardize of on single classes for Transactions by name
• Use __invoke() method or another method indicating single purpose
177
What About Domain Logic In Gateways?
• Possible that we moved domain logic with data source logic
• Extract from Gateway into Transaction
• Update relevant tests
178
What About Domain LogicIn Non-Domain Classes?
• Examples showed domain logic in page scripts
• Might be in other classes as well
• Follow same extraction process, then inject Transaction into class
• Make calls to the injected Transaction class
179
Review• Extracted core of our application (domain) to Transactions classes
• Mostly shuffled logic around to a new layer
• Domain logic independently testable
• Page scripts now much shorter
• Basis for further refactoring: Service Layer, Domain Model, DDD, etc180
Extract Presentation Logic (View)
181
Why Extract Presentation Logic?
• We have extracted a Model system (domain logic)
• Other concerns combined in the codebase (View, Controller)
• Separate the presentation logic into a View layer
• Independently testable from controller/action logic
182
What Is Presentation Logic?
• Generally, anything that sends output to the client
• Specifically, everything in the HTTP response (headers included!)
• echo, print(), printf(), sprintf()
• header(), setcookie(), setrawcookie()
183
The Response Is The Presentation
• Extract presentation logic to a View file
• The View file will be executed inside a Response object
• http://mlaphp.com/code/
184
Process (1/2)
• Find a page script that contains presentation logic mixed in
• Rearrange the code to collect all presentation logic into a single block *after* all the other logic, and spot check
• Extract the block to a view file to be delivered via a _Response_, and spot check again
185
Process (2/2)
• Add proper escaping logic and spot check again
• Write a test for the extracted logic
• Commit, push, notify QA
• Look for more presentation code mixed in with other code
186
Search For Presentation Logic
• At this point it should be easy; page scripts greatly reduced
• echo, print(), printf(), sprintf()
• header(), setcookie(), setrawcookie()
187
Rearrange Page Script And Spot Check
• Collect all presentation logic after a big noticeable comment
• Remember previous rules: no globals/superglobals, no new, etc.
• Spot check to make sure the page script still works
• 035.php and 036.php
188
Rearrangement Notes
• Moved variables not used by the business logic down to presentation block
• Moved header.php include down to presentation block
• Moved logic acting only on presentation variables to presentation block
• Replaced $_SERVER['PHP_SELF'] with an $action variable
• Replaced $_GET['id'] with an $id variable
189
Extract Presentation To A View File
• Create a top-level views directory
• Pick a view file name; match to page script path
• /foo/bar/baz.php maps to /views/foo/bar/baz.php
• Append with .html.php, .json.php, etc. if desired
190
Move Presentation Block To View File• Cut from page script, paste into view file
• Replace with a Response object that references the view file
• Always use the same variable name for the Response!
• Spot check to find missing variables (E_ALL will help)
• 036.php (recap) and 037.php191
Add Proper Escaping
• Escaping helps prevent Cross-Site Scripting vulnerabilities
• Always escape all variables all the time
• Escape for the output context: HTML body, HTML attribute, CSS, JS
• Requires diligence and attention to detail
192
htmlspecialchars()<form action="<?php echo $request->server['PHP_SELF']; ?>" method="POST">
<form action="<?php echo htmlspecialchars( $request->server['PHP_SELF'], ENT_QUOTES, 'UTF-8' ); ?>" method="POST">
193
Response::esc()
<form action="<?php echo $this->esc($request->server['PHP_SELF']); ?>" method="POST">
194
Escape For Proper Context
• Zend\Escaper for CSS, JS, HTML attributes
• Aura\Html for escapers and other methods
195
Write A View File Test
• Presents unique challenges
• A file, not a class
• Slightly different testing structure
196
Testing Structure
• Add tests/views/ directory
• Modify phpunit.xml to add views/ directory
<phpunit bootstrap="./bootstrap.php"> <testsuites> <testsuite> <directory>./classes</directory> <directory>./views</directory> </testsuite> </testsuites> </phpunit>
197
The View File Test• Even though testing a file, the test itself is a class
• Name for the view file being tested
• /views/foo/bar/baz.php => /tests/view/foo/bar/BazTest.php
• Mimic the Response-calling code in page script
• 038.php198
Rejoice When It Fails
• The expectation is empty output ($expect == ‘’);
• If it passes, we have a problem ;-)
• Now we need to assert correctness of output
199
Asserting Correctness Of Output
• Probably not testing entirety of output
• Naive check for key or critical strings with strpos()
• 039.php
200
More Robust Assertions
• PHPUnit assertSelect*() methods (CSS selectors)
• Zend\Dom\Query for finer examination of DOM tree
• Start simple, move toward more complex as needed
201
Followup
• Commit, push, notify QA
• Continue looking for mixed-in presentation logic
202
What About Headers And Cookies?
• Presentation work includes headers and cookies
• header() and setcookie() send output right away, regardless of ob_*()
• Difficult to test when unbuffered
203
Response Object Buffering Methods• Use Response::header(), setCookie(), and setRawcookie()
• Captures calls into array structure properties
• Inspect using Response::getHeaders()
• Response::send() calls the real header() etc. methods
• 040.php204
What If We Already HaveA Template System?
• Might use template in page script instead of Response
• If there are headers being sent, mixing logic in page script
• Future proof by moving template calls to view file
• Have page script invoke the view file via the Response
• 041.php205
What About Streaming Content?• Don’t want to read large files into memory
• Instead, want to stream file a few kb at a time
• Use Response::setLastCall() to invoke a callable after send()
• The callable can then stream out content
• 042.php206
What If We Have Lots Of Presentation Variables?
• Examples show only a few variables; likely to have 10, 20, more
• Presentation may have several include files needing those vars
• Collect commonly-used presentation vars to their own object
• Pass that object to the presentation
• 043.php207
What About Class MethodsThat Generate Output?
• Classes will often use echo, print, etc. instead of returning values
• Convert them to return the output instead of echoing it
• Remember to escape properly!
• Then echo the method call
• 044.php208
What About Business LogicMixed Into The Presentation?
• May discover Gateway or Transaction calls inside presentation code
• Move the calls outside the presentation code and retain in variable
• Pass variable to presentation
• May result in extra loops, but goal is clean separation
• 045.php209
What If A Page ContainsOnly Presentation Logic?
• Page script may be only presentation with no other code
• Use a Response object and view file anyway
• Need consistency in page scripts for later steps
210
Review
• All presentation logic extracted to view files
• All view files being executed through a Response object
• Views are testable independent of model and controller logic
• Page scripts getting shorter and shorter
211
Extract Action Logic (Controller)
212
Why Extract Action Logic?• Page scripts have only two kinds of logic now:
• Dependency-building logic uses app settings to create objects
• Action logic uses the created objects to generate Response
• Move action logic to its own Controller layer to separate concerns
• Independent testability213
Process (1/2)
• Find a page script where action logic is still mixed in
• Collect action logic to its own central block and spot check
• Extract the action logic to a new Controller class
• Modify the page script to use the new Controller and spot check
214
Process (2/2)
• Write a unit test for the new Controller class and spot check again.
• Commit, push, notify QA
• Find another page script with embedded action logic
215
Search For Embedded Action Logic
• Every page script ;-)
216
Collect Action Logic And Spot Check• Move all setup and dependency-creation code to the top
• Move all code using the created objects to the end
• Leave $response->send() as very last line
• Spot check when finished
• 046.php and 047.php217
Extract To Controller Class
• Pick a class name
• Likely in the Controller\* namespace
• Should reflect page script hierarchy
• /foo/bar/baz.php => Controller\Foo\Bar\BazPage.php
218
Move The Action Logic To Controller Class• Create a skeleton Controller class
• Use consistent invocation method name; e.g., __invoke()
• Cut the central action logic block, paste into the Controller
• Spot check and modify Controller to pass dependencies as params
• 048.php219
Convert Controller to DI and Spot Check• Move method params to constructor and retain as properties
• Modify method to use properties instead of params
• Modify page script to inject dependencies in setup and spot check
• 049.php
• Consistency in calling the controller will help in later steps220
Write A Controller Test• Name the test for the controller class
• Build all the controller dependencies in setup()
• If we can use test doubles (fakes, mocks, etc) so much the better
• Make assertions against the returned Response object
• getView(), getVars(), getHeaders(), getLastCall()221
Followup
• Commit, push, notify QA
• Move on to next page script with embedded action logic
222
Can We Pass Parameters To The Controller Method?
• Better not to do so, but only for reasons of consistency
• Most consistent is to pass nothing at all
• In later steps we will need Controller objects to be constructed ready-to-go with no added information needed
• All “extra” information should go in via constructor
223
Can A Controller Have Multiple Actions?
• Better not to, but sometimes easier when converting complex page scripts
• If necessary, split into separate action methods within the class
• Put switching logic in __invoke()
• Do not call them from the page script (consistency)
• Keep testing!
224
What If The Controller Contains Include Calls?
• Page scripts might still have include calls in them
• Artifact of include-oriented architecture
• These need to be extracted, but not yet
• Leave them in place for now
• The next step will remove them225
Review
• All action logic now extracted from page scripts
• Controllers are now independently testable
• Page scripts are now very short: create controller dependencies, create and invoke controller, get back Response and send it
• Complete Model-View-Controller system
226
Replace Remaining Includes
227
Why Replace Includes? (1/2)• Achieved MVC separation, but classes may still have includes
• Artifact of legacy “include-oriented” architecture
• Executes logic at include-time
• Mixes scopes and obscures logic flow; difficult to test
• 050.php228
Why Replace Includes? (2/2)• Method code and include code tightly coupled in non-obvious ways
• $user variable must pre-exist in scope
• Introduces $user_messages and $user_is_valid into scope
• May overwrite pre-existing variables in scope
• Solution: convert to a class method, then call that method229
WARNING
• If only a few includes, not so tough
• If many/complex/interdependent, very tedious and time consuming
• Possibly more difficult than Domain extraction due to spaghetti nature of include files
230
Process (1/5)
• Search the classes/ directory for an include call in a class.
• Search entire codebase to find how many times the file is included.
• Only one time? Or many times?
231
Process (2/5)
• If the included file is used only once (3 steps) …
• Copy included file code as-is directly over the `include` call.
• Test the modified class, and delete the include file.
• Refactor the copied code so that it follows all our existing rules (globals, new, DI, return not output, no includes).
232
Process (3/5)
• If used more than once (7 steps) …
• Copy contents of included file as-is to new class method.
• Replace include with inline instantiation and invocation.
• Spot check to find missing variables; add to new method by reference.
233
Process (4/5)• Search entire codebase for include calls to same file; replace each with inline
instantiation and invocation; spot check and and unit test.
• Delete original include file; unit test and spot check the entire legacy application.
• Unit test new class; refactor to follow all our existing rules (globals, new, DI, return not output, no includes).
• Replace each inline instantiation of the new class in each of our class files with dependency injection, testing along the way.
234
Process (5/5)
• Final unit test and spot check
• Commit, push, notify QA
• Look for the next include call in classes
235
Search For Includes (1/2)• Search only classes for
^[ \t]*(include|include_once|require|require_once)
• Returns list of candidate include calls; find easiest one
• Then search again for that specific include in the entire codebase
• Found require ‘foo/bar/baz.php’; …
• ^[ \t]*(include|include_once|require|require_once).*baz\.php236
Search For Includes (2/2)
• Found require ‘foo/bar/baz.php’; …
• ^[ \t]*(include|include_once|require|require_once).*baz\.php
• Use only file name; relative paths (examine each for false positives)
• Count number of true positives leading to that include file
• Used once, or used more than once?237
Replace Single Include
• Copy include code directly over the include call in the class
• Run the unit tests for the class
• Delete the original include file and run *all* unit tests
• Refactor and retest: no globals, no new, use DI, use return not output, no includes
238
Replace Multiple Include
• Copy include code to new method (new or existing class)
• Pick an appropriate name for the method (and class if needed)
• If only method in new class, __invoke() is a good choice
239
Replace Original Include Call
• Find original include call
• Replace with inline instantiation of class and invoke the new method
• Violates rule about “new” keyword; we will address this later
• 051.php
240
Discover Missing Variables• Scope now separated so code breaks: variables not present
• Spot-check or unit test to find missing variables (E_ALL)
• Add to method by reference (exact same vars, reduces changes)
• Pass to the method in the calling code
• Let the tests guide you!
• 052.php241
Replace Other Includes
• Now replace other includes of the same file (use search results)
• Replace in the same way: inline instantiation and invocation
• Might be in classes or non-class files (e.g. view files)
• Spot check and unit test each replacement
242
Delete Include File Itself
• After replacing all calls to the file, delete the file
• Spot check and unit test again
243
Write Test For New Class MethodAnd Refactor
• Write unit test for include code extracted to class method
• Refactor: globals, new, DI, output not return, no includes
• Continuously run unit tests to ensure no breaks
244
Convert Inline Instantiations To Injection
• Find each inline instantiation that replaced the include
• Replace with dependency injection
• Unit test and spot check
245
Followup
• Final spot check and unit test
• Commit, push, notify QA
• Find next include call in classes
246
Can We Extract Many IncludesTo The Same Class In Different Methods?
• Example shows extraction to a single method in a new class
• Reasonable to extract related includes to the same class
• Be sure to use appropriate method names
247
What About Includes In Non-Class Files?
• Not so concerned about includes outside of classes
• View files might have lots of includes, but that’s their nature
248
Review
• Extracted all includes from class files
• Removed one of the last artifacts of legacy architecture
• The internal architecture now looks very good
• Time to look at overall end-to-end architecture elements
249
Separate Public Resources
250
Why Separate?• MVC separation, but entire application still in document rood
• Need special protections on private files
• Errors in web server configuration can expose the application
• Separate public and private resources
• Sets up structure for later refactoring to front controller251
Degenerate Front Controller
• Web server itself acting as router/dispatcher
• Page scripts are mapped directly onto file system
• Web server document root is base of mapping
• Fine for public resources, not for private ones (like config files)
252
Process• Coordinate with operations personnel to communicate intentions
• Create new document root directory in app with temp index file
• Reconfigure server to point to new document root and spot check
• Move public resources to new document root and spot check
• Commit, push, coordinate with operations for QA testing253
Coordinate With Operations Personnel
• The single most important part of the process
• Never make changes that affect server config without doing so
• Tell them our intent (i.e., “changing the document root”)
• Feedback from ops will tell us what we need to do
254
Create A Document Root Directory
• Create it in the existing application directory structure
• Call it something sensible; e.g. “public” or “docroot”
• Add an “index.html” file for spot checking
• 053.php
255
Reconfigure Server• Probably local development server or virtual instance server
• Apache-related .conf file uses DocumentRoot directive
• From DocumentRoot "/path/to/app" …
• … to DocumentRoot "/path/to/app/docroot"
• Reload or restart the server and browse to index.html256
Move Public Resources• Move public resources to docroot/ (page scripts, HTML, CSS, JS, etc)
• Do not move config files, class files, view files, CLI scripts, etc.
• If you should not be able to browse to it, do not make it public
• Modify include paths as necessary as this will change relative paths
• 054.php257
Followup
• Spot check as you move files, and complete spot check at end
• Commit
• Push
• Coordinate with operations for QA deployment and testing
258
Is This Really Necessary?
• Foundational element for the next step in modernization (front controller).
259
Review
• Beginning to refactor the overall architecture
• Created a separation between public and private resources
• New public document root that keeps the application private
260
Decouple URLs From Page Scripts
261
Why Decouple URLs?• Web server acting as combined router and dispatcher to pages
• Routes to page scripts are the document root file system
• Changes to file system mean changes to URLs, and vice versa
• Cannot intercept incoming request before dispatch to page script
• Add a Front Controller with Router as single point of entry262
Process• Coordinate with operations to communicate our intentions.
• Create Front Controller script in document root.
• Create pages/ directory for page scripts, and “not found” controller.
• Reconfigure the web server for URL rewriting and spot check.
• Move page scripts from docroot/ to pages/ and spot check.
• Commit, push, coordinate with operations for QA testing.263
Coordinate With Operations
• The single most important part of the process
• Never make changes that affect server config without doing so
• Tell them our intent (i.e., “enable URL rewriting”)
• Feedback from ops will tell us what we need to do
264
Add A Front Controller Script
• Add script as docroot/front.php
• Uses a “Router” to auto-map public URLs to page script paths
• Router is at http://mlaphp.com/code
• 055.php
265
Create Pages DirectoryAnd Page-Not-Found Script
• Front Controller refers to $pages_dir
• All page scripts will go there instead of document root
• Create top-level pages/ directory to receive the page scripts
• Create pages/not-found.php with related Controller and View
• 056.php266
Reconfigure The Server• Enable URL rewriting via appropriate config file for your web server
• In Apache on Ubuntu, this might be sudo 2enmod rewrite
• Instruct server to point all incoming requests to front controller
• 057.conf
• Reload or restart the server and spot check (not found!)267
Move Page Scripts• Move each page script from docroot/ to pages/ (works in either)
• Move only the page scripts, not other public resources
• Spot check after each move
• May need to change include-path values again as a result
• 058.txt268
Followup
• Commit
• Push
• Coordinate with operations for QA deployment and testing
269
Did We Really Decouple From The File System?
• Just moved from docroot/ to pages/
• URL path still maps to subdirectory path
• Even so, yes, now decoupled
• Intercessory Router can point any URL path to any script
• 059.php270
Review
• Moved page scripts out of document root
• Use front controller to route URL paths to page scripts
• Only 2 steps remain!
271
Remove Repeated Logic In Page Scripts
272
Why Remove Repetition?
• Each of our page scripts now highly repetitive
• Load setup, instantiate dependencies for page controller, invoke page controller, send response (060.php)
• Only part that is different is the individual dependencies
• Move surrounding repeated logic to Front Controller
273
Process
• Modify Front Controller to add setup, controller invocation, and response sending work.
• Modify each page script to remove the setup, controller invocation, and response sending work.
• Spot check, commit, push, and notify QA.
274
Modify Front Controller
• 061.php
• Replaced the “require Router” line with the setup line(s)
• Added lines to invoke controller and send the response
• Having used the same variable names everywhere makes this easy
275
Modify Each Page Script
• Remove the setup, controller invocation, and response sending
• Must apply to all page scripts
• 062.php
276
Followup
• Spot check
• Commit
• Push
• Notify QA
277
What If Setup Is Inconsistent?
• Some page scripts might have different or varying setup work
• Setup has to be identical for extraction to Front Controller
• *Force* the setup work to be the same if needed
• If cannot force it, move all shared setup to Front Controller and leave special case setup in the page script (degenerate but necessary)
278
What If Variable Names Are Inconsistent?
• We have emphasized consistent naming of variables throughout
• However, we all make mistakes and oversights
• Change names as needed to make them consistent with Front Controller expectations
279
Review
• Page scripts now reduced to bare core of dependency creation
• Front Controller handles all surrounding logic
280
Replace Page Scripts With DI Container
281
Why A DI Container?• Page scripts are at a bare core of dependency creation functionality
• Still including page scripts into Front Controller Scope
• Dependency creation functionality is itself somewhat repetitive
• Move all dependency creation logic to its own layer (Container)
• Custom container from http://mlaphp.com/code 282
Process (1/3)
• Add a services.php file to create the Container and its services
• Define a “router” service in the Container
• Modify the Front Controller to include the services.php file and use the “router” service
283
Process (2/3)
• Extract creation logic from each page script to the Container.
• Define a service named for the page script.
• Copy the page script logic into that service, renaming variables as needed.
• Route the URL path for the page script to the container service.
• Spot check, commit, and move on to the next page script.
284
Process (3/3)
• Remove the empty pages/ directory.
• Spot check and unit test
• Commit, push, notify QA
285
Add A DI Container File
• Add a file named “services.php” in the setup files location
• Yes, this is another include, but having a separate file will be organizationally clean
• Populates the container with all existing global variables as properties
$di = new \Mlaphp\Di($GLOBALS);286
Add A Router Service
• 063.php
287
Creating Service Definitions• Convention: use all lower-case names for shared services,
use fully-qualified class names for new instance services
• Service definitions are callables (typically closures) to create objects
• Create *and return* the object from the service definition
• Services are lazy-loaded (created only when called)
• $di->get() for a shared object, $di->newInstance() for a new instance288
Modify The Front ControllerTo Use The Router Service
• 064.php
• If the Container has a service that matches the returned Route name, use the service object
• Otherwise treat the Route name as a pages/ path
• Either way, we get a $controller variable
• Spot check and commit289
Extract Each Page ScriptTo A Container Service
• In services.php, add a $di->set() line with the fully-qualified class name of the page script controller, and an empty closure
• Copy the page script logic into the closure for the service definition
• Change global variables to $di properties
• 065.php
290
Route URL Path To Container Service
• 066.php
• Returns a Container service name now instead of a pages path
291
Spot Check• Browse to the converted URL path and see if it works
• Common errors:
• Failure to replace $var with $di->var
• Failure to return the object from the service definition
• Mismatch between service name and mapped route name292
Followup• Commit
• Replace next page script
• When all are deleted, remove pages/ dir
• Spot-check overall
• Push, notify QA293
How Can We Refine Our Service Definitions?
• When extraction is complete, service definitions long and repetitive
• Can further reduce creation logic by extracting shared services and separating class instance creation logic
• 067.php
294
What If There Are IncludesIn The Page Script?
• Hold your nose and copy them over to the container service
• Look for commonalities in setup and extract them to classes
• The classes themselves can become services if needed
295
How Can We Reduce The Size Of The Services File?
• Container definitions may reach tens or hundreds of definitions
• No way to reduce volume in total, only to organize more effectively
• Reasonable to split definitions across multiple files, and make services.php include those files
296
How Can We Reduce The Size Of The Router Service
• The router contains one entry for each URL in the application
• Reasonable to create a routes.php file that returns an array for the router definition
• 068.php
• Again, no way to reduce volume, only to organize more effectively
297
Review
• Added a Container for creation logic and removed page scripts
• No longer page-based; modernization is complete
• Celebrate!
• But new work still awaits :-)
298
Conclusion
299
From Legacy …
• Page-based, include-oriented, spaghetti mess of globals
• No clear separation of concerns (SQL, MVC, etc)
• No tests
• All resources in the public document root
300
… To Modern• Autoloaded (no more includes)
• Dependency Injected (no more globals or new)
• Unit Tested
• Layer Separated (SQL, MVC)
• Front Controlled (private resources, router, container)301
Still Many Opportunities For Improvement
• Just because it’s modern, doesn’t mean we’re done
• The application will always have need for greater perfection
• The modernization process has revealed many opportunities for us
302
Data Source Layer
• Currently consists of Gateways
• Consider converting to Data Mappers for cleaner object interaction
303
Models
• Currently Transaction Scripts
• Consider Domain Model and/or Service Layer
• Consider DDD to further refine the logic and structure
304
Views
• Still relatively monolithic and include-oriented
• Consider Two Step View for layouts separated from central elements
305
Controllers
• May be handling unrelated actions
• May contain too much logic
• Consider extracting to Model or View layers
306
Model-View-Controlleror Action-Domain-Responder?
• Model-View-Controller itself might not be sufficient
• Consider refactoring further to Action-Domain-Responder
• http://pmjones.io/adr
307
Routing System
• Currently a transitional/temporary state
• Prefer path-param-based URLs instead of query-string-based
• Consider replacing with more powerful router
• http://github.com/auraphp/Aura.Router
308
Dispatcher System
• Front Controller handles dispatching of routes
• Consider separating dispatch from routing
• http://github.com/auraphp/Aura.Dispatcher
309
Dependency Injection Container
• DI Container is still very “manual” in nature
• Consider more powerful Container system
• http://github.com/auraphp/Aura.Di
310
A Better Class Of Problems• We still have problems, but they are *high quality* problems
• We can address them in a much better way than in a legacy system
• We can use Refactoring (Fowler et al) more effectively
• Better separation of code allows us to address independent areas
• Unit tests will keep us from introducing breaks311
Conversion To Framework?
• With improved architecture, can consider moving to a framework
• Easier to determine which parts can go, and which can stay
• Consider the tradeoffs (public support vs higher discipline)
• Cannot judge for you; you will have to determine for yourself
312
Our Own Growth
• We now have a better quality of professional life
• No longer dread working with the codebase
• More productive time, less time frustrated and demoralized
• Our programming practices have improved as well
313
Go Tell Others!
• Continue improving our code and our practices
• Tell others that they need not suffer with legacy codebases
• As code gets better, the more lives get better
• Move on to more interesting and challenging issues
314
Thanks!
• http://leanpub.com/mlaphp/c/zendcon2014
• http://paul-m-jones.com/
• @pmjones
315