Computer Science in WordPress Taylor Lovett

What You Missed in Computer Science

DESCRIPTION

This presentation explains what Computer Science actually entails. It covers ways to describe code performance using Big-Oh notation, comparing different post meta and taxonomy queries, and it discusses concurrency as it applies to WordPress, specifically data races and how they can occur while counting post views.

Page 1: What You Missed in Computer Science

Computer Science in WordPress

Taylor Lovett

Page 2: What You Missed in Computer Science

My name is Taylor Lovett

- Senior Strategic Web Engineer at 10up

- Core Contributor

- Plugin Author (Safe Redirect Manager)

- Plugin Contributor

- BS in Computer Science from the University of Maryland, College Park

Page 3: What You Missed in Computer Science

What is Computer Science?

- It can mean a lot of things. It is really the study of computational theory, computer software, and hardware.

Page 4: What You Missed in Computer Science

Theory of Computation

- General Mathematics (calculus, linear algebra, general computational theory, statistics)

- Algorithms (a method to solve a problem)

- Data structures (which data structure will allow us to access our data the quickest?)

- Graph theory

Page 5: What You Missed in Computer Science

Computer Software

- Programming techniques and design patterns (e.g., a singleton class)

- Concurrent design patterns (data races)

- Mobile software development

- Operating system software

- Web development

- Databases

- Networking

- Benchmarking

Page 6: What You Missed in Computer Science

Computer Hardware

- Motherboards

- Memory types (solid state, RAM, etc.)

- Benchmarking (processor execution time)

- Pipelining

- Processors

Page 7: What You Missed in Computer Science

Big-Oh Notation

- "Big O notation is used to classify algorithms by how they respond (e.g., in their processing time or working space requirements) to changes in input size." -- Wikipedia

- Very useful to describe how performant your code may or may not be

- Big-Oh usually describes the upper bound of a function's growth (the worst case)

Page 8: What You Missed in Computer Science

Big-Oh Notation (cont.)

- Big-Oh notation is concerned with measuring the rate of growth of the amount of processing that your code might do on an unknown input size

- In Big-Oh we are only concerned with how our code performs as the input size approaches infinity. Mathematically speaking, this means we only care about the highest-order term:

e.g. O(3n^2 + 5n) = O(n^2), since as n approaches infinity the only term that matters is n^2
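A short worked bound shows why the lower-order term drops out. This uses the standard definition of Big-Oh; the constants c = 8 and n_0 = 1 are chosen here for illustration and are not from the slides:

```latex
\[
3n^2 + 5n \;\le\; 3n^2 + 5n^2 \;=\; 8n^2 \quad \text{for all } n \ge 1,
\]
\[
\text{so } 3n^2 + 5n \in O(n^2) \text{ with } c = 8 \text{ and } n_0 = 1 .
\]
```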

Page 9: What You Missed in Computer Science

Let's look at some examples!

Page 10: What You Missed in Computer Science

// $fruits contains a non-empty, zero-indexed array of strings
function contains_orange( $fruits = array() ) {
    $count = count( $fruits ); // Count once instead of on every iteration
    for ( $i = 0; $i < $count; $i++ ) {
        if ( 'orange' === $fruits[ $i ] ) {
            return true;
        }
    }
    return false;
}

Best Case Scenario: Loop executes once, orange is found, and it returns.

Worst Case Scenario: Loop executes n times (where n is the number of elements in $fruits)

Performance: contains_orange() is in O(n)
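To make the best and worst cases concrete, the sketch below counts how many comparisons the same linear search performs. The helper name count_orange_comparisons() is hypothetical and only exists for this illustration:

```php
<?php
// Hypothetical helper (not from the slides): the same linear search as
// contains_orange(), but it returns how many elements were compared.
function count_orange_comparisons( array $fruits ): int {
    $comparisons = 0;
    foreach ( $fruits as $fruit ) {
        $comparisons++;
        if ( 'orange' === $fruit ) {
            break; // Found it; stop scanning.
        }
    }
    return $comparisons;
}

// Best case: 'orange' is the first element, so one comparison.
echo count_orange_comparisons( array( 'orange', 'apple', 'pear', 'kiwi' ) ), "\n"; // 1

// Worst case: 'orange' is last (or absent), so n comparisons for n elements.
echo count_orange_comparisons( array( 'apple', 'pear', 'kiwi', 'orange' ) ), "\n"; // 4
```

The count grows in step with the array size, which is what O(n) expresses.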

Page 11: What You Missed in Computer Science

Remember!

- With Big-Oh we are only concerned with what happens in the worst case. Sometimes knowing what happens in the best case is useful, but we are mostly worried about the performance hit our code could take in the worst possible situation.

Page 12: What You Missed in Computer Science

// $fruits contains a non-empty, zero-indexed array of strings. For educational
// purposes, $fruits is guaranteed to have at least one duplicate.
function contains_duplicate_fruit( $fruits = array() ) {
    $count = count( $fruits ); // Count once instead of on every iteration
    for ( $i = 0; $i < $count; $i++ ) {
        for ( $z = 0; $z < $count; $z++ ) {
            if ( $i !== $z && $fruits[ $z ] === $fruits[ $i ] ) {
                return true;
            }
        }
    }
    return false;
}

What does everyone think?

Page 13: What You Missed in Computer Science

Best Case Scenario: Outer loop executes once, inner loop executes twice, duplicate is found, function returns

Worst Case Scenario: Outer loop executes n - 1 times (where n is the size of $fruits), inner loop executes n times for each outer loop execution... n * (n - 1) = n^2 - n

Performance: contains_duplicate_fruit() is in O(n^2 - n) = O(n^2)
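The n * (n - 1) figure can be checked with an instrumented copy of the function. This is a sketch; count_duplicate_checks() is a hypothetical name, and it returns the number of inner-loop iterations instead of true/false:

```php
<?php
// Hypothetical instrumented copy of contains_duplicate_fruit(): it returns
// how many times the inner loop body ran before the function would return.
function count_duplicate_checks( array $fruits ): int {
    $checks = 0;
    $n      = count( $fruits );
    for ( $i = 0; $i < $n; $i++ ) {
        for ( $z = 0; $z < $n; $z++ ) {
            $checks++;
            if ( $i !== $z && $fruits[ $z ] === $fruits[ $i ] ) {
                return $checks;
            }
        }
    }
    return $checks;
}

// Best case: the duplicate pair sits at positions 0 and 1.
echo count_duplicate_checks( array( 'kiwi', 'kiwi', 'apple', 'pear' ) ), "\n"; // 2

// Worst case: the only duplicate pair sits at the very end.
echo count_duplicate_checks( array( 'apple', 'pear', 'kiwi', 'kiwi' ) ), "\n"; // 12 = 4 * (4 - 1)
```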

Page 14: What You Missed in Computer Science

An important reminder

- We dropped the (-n) from our final Big-Oh evaluation because, as n approaches infinity, n^2 dominates and (-n) becomes insignificant.

Page 15: What You Missed in Computer Science

But seriously... How is this useful?

Page 16: What You Missed in Computer Science

Big-Oh Notation and Databases

- Big-Oh notation is used a lot in conjunction with SQL operations.

- We've all heard that indexing a column in MySQL makes searches on that column faster.

- But why? What does that actually mean?

Page 17: What You Missed in Computer Science

MySQL Indexes

- An index is a data structure that speeds up search time for information.

- Without an index, searching for a specific column value is O(n) because in the worst case scenario every single row in the table must be examined.

Page 18: What You Missed in Computer Science

MySQL Indexes

- When a column is indexed, MySQL takes the data across all of the rows in that column and stores references to that data in a B-tree (this structure is used for the majority of index types).

- A B-tree is just what it sounds like: a tree of data that speeds up search time. In the worst case, the number of items that must be processed to find a value in a B-tree is O(log n). The logarithm grows far more slowly than n; for large n:

n^2 > n > log n

http://en.wikipedia.org/wiki/B-tree
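As a rough analogy for the difference between O(n) and O(log n), the sketch below compares a linear scan with a binary search over a sorted PHP array. This is not how MySQL implements a B-tree; it is a simplified stand-in, and both function names are hypothetical:

```php
<?php
// Linear scan: examine values one by one, as an unindexed lookup must.
function linear_search_steps( array $sorted, int $target ): int {
    $steps = 0;
    foreach ( $sorted as $value ) {
        $steps++;
        if ( $value === $target ) {
            break;
        }
    }
    return $steps;
}

// Binary search: halve the candidate range on every step, which is the
// same idea that makes tree-based lookups logarithmic.
function binary_search_steps( array $sorted, int $target ): int {
    $steps = 0;
    $low   = 0;
    $high  = count( $sorted ) - 1;
    while ( $low <= $high ) {
        $steps++;
        $mid = intdiv( $low + $high, 2 );
        if ( $sorted[ $mid ] === $target ) {
            break;
        }
        if ( $sorted[ $mid ] < $target ) {
            $low = $mid + 1;
        } else {
            $high = $mid - 1;
        }
    }
    return $steps;
}

$rows = range( 1, 1024 ); // 1024 sorted "column values"
echo linear_search_steps( $rows, 1024 ), "\n"; // 1024
echo binary_search_steps( $rows, 1024 ), "\n"; // 11, roughly log2(1024) + 1
```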

Page 19: What You Missed in Computer Science

Post Meta Queries

- The full Big-Oh analysis of a post meta query is pretty complex because of the join operation and therefore is outside the scope of this talk.

- For our purposes, searching for posts based on a meta key is O(n) where n is the number of posts that have that key.

- Let's frame this in terms of featured posts. Featured posts refers to the situation where a website needs to mark select posts as featured and query for them.

Page 20: What You Missed in Computer Science

Featured Posts Solution #1

On post update:

if ( isset( $_POST['meta_box_feature'] ) ) {
    update_post_meta( $post_id, 'featured', 1 );
} else {
    update_post_meta( $post_id, 'featured', 0 );
}

Query:

$args = array(
    'meta_key'   => 'featured',
    'meta_value' => 1,
);
$featured_posts = new WP_Query( $args );

Page 21: What You Missed in Computer Science

Solution #1 Analysis

- Using this code, every time a post is saved, it will have post meta attached to it such that 'featured' = 1 or 0. This will create a ton of unnecessary post meta rows.

- Remember searching for posts based on a meta key is O(n) where n is the number of posts that have that key. Therefore saving meta when a post is not featured is not only unnecessary but will really slow us down. This would result in O(m) performance where m is the number of posts!

Page 22: What You Missed in Computer Science

Featured Posts Solution #2

On post update:

if ( isset( $_POST['meta_box_feature'] ) ) {
    update_post_meta( $post_id, 'featured', 1 );
} else {
    delete_post_meta( $post_id, 'featured' );
}

Query:

$args = array(
    'meta_key'   => 'featured',
    'meta_value' => 1,
);
$featured_posts = new WP_Query( $args );

Page 23: What You Missed in Computer Science

Solution #2 Analysis

- This solution is a major improvement over our first one. This will result in O(n) search time where n is the number of featured posts.

- However, we can still do better.
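The gap between the two meta approaches can be sketched with an in-memory stand-in for the postmeta table. This is a simplified model under stated assumptions: real MySQL query planning is more involved, the post counts are made up, and build_postmeta() and rows_examined() are hypothetical helpers:

```php
<?php
// Builds simplified postmeta rows. When $store_zeroes is true, a 'featured'
// row is written for every post (Solution #1); otherwise only featured
// posts get a row (Solution #2).
function build_postmeta( int $total_posts, int $featured_posts, bool $store_zeroes ): array {
    $rows = array();
    for ( $post_id = 1; $post_id <= $total_posts; $post_id++ ) {
        if ( $post_id <= $featured_posts ) {
            $rows[] = array( 'post_id' => $post_id, 'meta_key' => 'featured', 'meta_value' => 1 );
        } elseif ( $store_zeroes ) {
            $rows[] = array( 'post_id' => $post_id, 'meta_key' => 'featured', 'meta_value' => 0 );
        }
    }
    return $rows;
}

// Counts the rows carrying the 'featured' key, i.e. the rows whose value
// must be inspected to find the featured posts.
function rows_examined( array $rows ): int {
    $examined = 0;
    foreach ( $rows as $row ) {
        if ( 'featured' === $row['meta_key'] ) {
            $examined++;
        }
    }
    return $examined;
}

echo rows_examined( build_postmeta( 10000, 25, true ) ), "\n";  // 10000: one row per post
echo rows_examined( build_postmeta( 10000, 25, false ) ), "\n"; // 25: one row per featured post
```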

Page 24: What You Missed in Computer Science

Featured Posts Solution #3

Let's create a tag called 'featured' and attach it to all our featured posts:

On init:

$args = array( ... );
register_taxonomy( 'featured', 'post', $args );

Query:

$args = array(
    'tag' => 'featured', // WP_Query matches tags by slug through the 'tag' argument
);
$featured_posts = new WP_Query( $args );

Page 25: What You Missed in Computer Science

Solution #3 Analysis

- For our purposes, searching for posts based on a tag is O(log n) since there is an index on the tag id column.

- The full Big-Oh analysis of our tag solution is pretty complex due to SQL join operations and therefore is beyond the scope of this talk.

Page 26: What You Missed in Computer Science

Concurrency

- In Computer Science, concurrency is a property describing the event where multiple computations are executed simultaneously, sometimes interacting with each other.

Page 27: What You Missed in Computer Science

Concurrency

- With concurrent programming we can, among other things, force each core in a computer to process a piece of a larger problem or handle separate tasks. This is extremely powerful.

- When not properly accounted for, concurrency can sometimes result in unexpected bugs that are difficult to reproduce.

Page 28: What You Missed in Computer Science

Concurrency in WordPress

- Concurrency takes a slightly different form in WordPress. We don't solve problems by starting new threads/processes. However, since behind the scenes servers can run multiple processes at the same time and thus multiple users can execute the same code simultaneously, issues surrounding concurrency can arise.

Page 29: What You Missed in Computer Science

Tracking Postviews in WordPress

- A common request in WordPress is to display the number of views for each post on the frontend.

- There are many different ways to approach this problem; the most common is to increment an integer stored in post meta each time a post is viewed, then to display this number for each post.

- This implementation can lead to data races.

Page 30: What You Missed in Computer Science

Here is the code that executes on each post request

$views = get_post_meta( $id, 'views', true );
$views++;
update_post_meta( $id, 'views', $views );

Page 31: What You Missed in Computer Science

Data Races

- A data race is the situation where two or more threads access a shared memory location, at least one of those accesses is a write, and the order of the accesses is unknown (meaning there are no explicit locking mechanisms used).

- Think of each page request as a thread on the server. If two users request a post at the same time, a data race for pageviews occurs since both accesses are writing to the postmeta table.

Page 32: What You Missed in Computer Science

A Possible Ordering of Events

Each line is marked with the user (A or B) whose request executes it

$views = get_post_meta( $id, 'views', true ); // User A: $views = 0
$views++; // User A: $views = 1
update_post_meta( $id, 'views', $views ); // User A: stored views = 1
$views = get_post_meta( $id, 'views', true ); // User B: $views = 1
$views++; // User B: $views = 2
update_post_meta( $id, 'views', $views ); // User B: stored views = 2

In this ordering of events, the stored views value ends up as 2, which is what we want. However, these events could occur in any order...

Page 33: What You Missed in Computer Science

Another Ordering of Events

$views = get_post_meta( $id, 'views', true ); // User A: $views = 0
$views = get_post_meta( $id, 'views', true ); // User B: $views = 0
$views++; // User A: $views = 1
$views++; // User B: $views = 1
update_post_meta( $id, 'views', $views ); // User A: stored views = 1
update_post_meta( $id, 'views', $views ); // User B: stored views = 1

In this ordering of events, the stored views value ends up as 1, which is NOT what we want: two views happened, but one update was lost.
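The two orderings can be replayed deterministically with a plain variable standing in for the stored meta value. This is a sketch only: it does not call WordPress or use real concurrency, and both function names are hypothetical:

```php
<?php
// Sequential ordering: User A reads, increments, and writes before
// User B starts. $stored stands in for the 'views' post meta value.
function simulate_sequential_views(): int {
    $stored = 0;

    $views_a = $stored;      // User A reads 0
    $views_a++;              // User A computes 1
    $stored  = $views_a;     // User A writes 1

    $views_b = $stored;      // User B reads 1
    $views_b++;              // User B computes 2
    $stored  = $views_b;     // User B writes 2

    return $stored;
}

// Interleaved ordering: both users read before either one writes,
// so the second write overwrites the first and one view is lost.
function simulate_interleaved_views(): int {
    $stored = 0;

    $views_a = $stored;      // User A reads 0
    $views_b = $stored;      // User B reads 0
    $views_a++;              // User A computes 1
    $views_b++;              // User B computes 1
    $stored  = $views_a;     // User A writes 1
    $stored  = $views_b;     // User B writes 1 again

    return $stored;
}

echo simulate_sequential_views(), "\n";  // 2
echo simulate_interleaved_views(), "\n"; // 1
```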

Page 34: What You Missed in Computer Science

Conclusion:

This algorithm won't work!

Page 35: What You Missed in Computer Science

Solution to Pageview Problem?

Solution 1: Jetpack plugin. We can install Jetpack and leverage its stats API to query information on specific posts.

Solution 2: Google Analytics. Using a website's Google Analytics account, we can set custom variables on a post-to-post basis and query the API based on those variables.

Page 36: What You Missed in Computer Science

Questions?