marcus-deglos
Faster, Smaller
A core feature proposal for improving file synchronization between Drupal environments
Wednesday, 17 February 16
How it works now
Stream wrappers
• public://
• private://
• temporary://
Locations stored in variables:
• file_public_path
• file_private_path
• file_temporary_path
Core save functions [D7]
file_save_data($data, 'public://foo');
file_unmanaged_save_data($data, 'public://bar');
Configuring Color [D7]
$id = $theme . '-' . substr(hash('sha256', serialize($palette) . microtime()), 0, 8);
$paths['color'] = 'public://color';
$paths['target'] = $paths['color'] . '/' . $id;
foreach ($paths as $path) {
  file_prepare_directory($path, FILE_CREATE_DIRECTORY);
}
Aggregated CSS [D7]
$filename = 'css_' . drupal_hash_base64($data) . '.css';
// Create the css/ within the files folder.
$csspath = 'public://css';
$uri = $csspath . '/' . $filename;
// Create the CSS file.
file_prepare_directory($csspath, FILE_CREATE_DIRECTORY);
if (!file_exists($uri) && !file_unmanaged_save_data($data, $uri, FILE_EXISTS_REPLACE)) {
  return FALSE;
}
Hoarder's Paradise
Like a hoarder who keeps everything, it all ends up in one big bucket of stuff:
sites/default/files
Sync all the things!
rsync -rltp live.example.com:/var/www/sites/default/files/ stage.example.com:/var/www/sites/default/files/
[Diagram: the big bucket of stuff on Live is copied to the big bucket of stuff on Stage]
Several hours later…
Big bucket of stuff
Here is the 'stuff' on my blog site:
• css
• ctools
• document_uploads
• .htaccess
• js
• static
• xmlsitemap
Big bucket of stuff
Here is the 'stuff' that I actually need to sync:
• css
• ctools
• document_uploads
• .htaccess
• js
• static
• xmlsitemap
Excluding caches
Some files are auto-generated caches, such as:
• Aggregated CSS/JS
• Image-style thumbnails
• Sitemaps
Cost
For sites that are image-heavy, and/or have a large number of image-styles, the 'regenerable content' can be many times the size of the original source.
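To see how this plays out on a given site, the regenerable directories can be measured against the whole files directory. The following is a small sketch using `du`; the directory names are the ones that appear in these slides, and the function names `regenerable_kb` and `total_kb` are hypothetical.

```shell
#!/bin/sh
# Directory names taken from the slides; adjust them for your own site.
REGENERABLE_DIRS="css ctools js styles xmlsitemap"

# Print the combined size, in kilobytes, of the regenerable directories
# that exist under the given files directory.
regenerable_kb() {
  files_dir=$1
  total=0
  for name in $REGENERABLE_DIRS; do
    if [ -d "$files_dir/$name" ]; then
      kb=$(du -sk "$files_dir/$name" | awk '{print $1}')
      total=$((total + kb))
    fi
  done
  echo "$total"
}

# Print the size, in kilobytes, of the whole files directory.
total_kb() {
  du -sk "$1" | awk '{print $1}'
}
```

For example, `regenerable_kb /var/www/sites/default/files` next to `total_kb /var/www/sites/default/files` shows roughly how much of a sync is spent on files that could simply be rebuilt.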
More efficient rsync
rsync -rltp --exclude css --exclude ctools --exclude js --exclude styles --exclude xmlsitemap live.example.com:/var/www/sites/default/files/ stage.example.com:/var/www/sites/default/files/
This becomes confusing and needs maintenance.
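One way to make the exclusions easier to maintain is to keep them in a file and pass it to rsync with `--exclude-from`. The following is a minimal sketch, assuming it runs on the stage host and pulls from live, since rsync cannot copy directly between two remote hosts. The file name `files-sync.exclude` and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical exclude list for regenerable directories under sites/default/files.
EXCLUDE_FILE="${EXCLUDE_FILE:-files-sync.exclude}"

write_exclude_file() {
  # One pattern per line. A leading slash anchors the pattern to the root
  # of the transfer; a trailing slash matches directories only.
  cat > "$EXCLUDE_FILE" <<'EOF'
/css/
/ctools/
/js/
/styles/
/xmlsitemap/
EOF
}

sync_files() {
  # Assumed to run on the stage host, pulling from live.
  rsync -rltp --exclude-from="$EXCLUDE_FILE" \
    live.example.com:/var/www/sites/default/files/ \
    /var/www/sites/default/files/
}

write_exclude_file
# sync_files   # uncomment to run the transfer
```

The list still has to be updated whenever a module starts writing regenerable files to a new directory, which is the maintenance burden this proposal aims to remove.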
What if there were TWO buckets?
[Diagram, Live and Stage: "Smaller bucket of stuff", "Big bucket of stuff I can rebuild", "Smaller bucket of stuff"]
Additional stream-wrappers?
Stream wrappers
• public://
• private://
• temporary://
• cache-public://
• cache-private://
Locations stored in variables:
• file_public_path
• file_private_path
• file_temporary_path
• file_cache_public_path
• file_cache_private_path
Precedents
• Drupal data-cache API.
• By default, uses DB tables
• Abstracted via cache-bins
• Cache tables identified via hook_flush_caches()
DX: where is safe?
When I first started Drupalling, I had a client who requested the ability to add custom CSS. So I created a quick UI in the admin area, thought about how to store the data, and decided that it would be sensible to reuse the sites/default/files/css path.
It was a shock a couple of days after launch, when the client asked "Where has my custom CSS gone?"
DX: where is safe?
function drupal_clear_css_cache() {
  file_scan_directory(file_create_path('css'), '.*', array('.', '..', 'CVS'), 'file_delete', TRUE);
  // Clear the page cache, so cached pages do not reference nonexistent CSS.
  cache_clear_all();
}
DX: where is safe?
Yes, everything beneath sites/default/files/css was deleted.
This was back in D5, and it is a little better in D8: by default, it only deletes files that haven't been modified in 30 days.
Be careful where you put your assets!
DX
Separating persistent storage from regenerable cache storage will make it easier for developers to recognise and adopt good directory-structure habits, and will flag dangerous locations (e.g. cache-public://css is more obviously a risky place to store persistent files than public://css).
Backward Compatibility
If the variable for cache-public:// doesn't exist, it could inherit the setting used by public://.
Reusing the same location as public:// would mean that for most users, there wouldn't be any noticeable change, or any break in their configuration.
Risky synchronization?
In some cases, running rsync on the entirety of sites/default/files can be harmful.
Some autogenerated content - such as XML sitemaps - may be specific to an environment: for example, the base URL is often different between stage and live.
This could cause all sorts of unwanted side-effects: duplicate notifications and inaccurate test results are just two that immediately spring to mind.
Edge-cases
There may be custom or contrib code expecting assets such as image thumbnails to belong under public:// - e.g. looking up information such as the size of the image.
If the site were upgraded, and the developer also moved the location of cache-public://, this could cause failures such as recursive lookups, and the cause may not be immediately apparent to the developer.
Edge-cases
On the whole, I think the edge-cases are minimal, and can be addressed by good communication of the implications of the change.
Potential use-cases
• Synchronization of files between environments (e.g. live to staging)
• Backups
• Proxies/CDN delivery
• Garbage collection: scanning for orphaned/removable files
Goals of the change
1. All data in public:// should be persistent and necessary.
2. All data in cache-public:// should be disposable, and regenerable from other sources.
3. All data in public:// should be tracked in the file-usage API; untracked files indicate orphaned/deletable content.
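Goal 3 suggests a simple audit: any file on disk that the managed-file records do not know about is a candidate for removal. The following is a rough sketch, assuming the managed paths have already been exported to a text file, one path per line and relative to the files directory (for example from the `uri` column of Drupal 7's `file_managed` table, with the `public://` prefix stripped). The function name `list_untracked` and the export step are assumptions, not part of core.

```shell
#!/bin/sh
# List files under a directory that do not appear in a list of managed paths.
# $1: files directory
# $2: text file of managed paths, relative to $1, one per line
list_untracked() {
  files_dir=$1
  managed_list=$2
  on_disk=$(mktemp)
  managed=$(mktemp)
  # Build a sorted list of every regular file, relative to the files directory.
  (cd "$files_dir" && find . -type f | sed 's|^\./||' | LC_ALL=C sort) > "$on_disk"
  LC_ALL=C sort "$managed_list" > "$managed"
  # comm -23 prints the lines that appear only in the first input.
  LC_ALL=C comm -23 "$on_disk" "$managed"
  rm -f "$on_disk" "$managed"
}
```

Housekeeping files such as .htaccess will also show up as untracked, so the output is a list to review rather than a list to delete.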
Summary
Adding two stream-wrappers to core would allow regenerable content to be stored separately from persistent content, simplifying a number of tasks such as backups and synchronization between environments.
This change would be backwards-compatible, would not affect existing sites without action from the site-owner, and would improve developers' understanding of directory structures created by modules.