20 date-times

Today we will use the ggplot2 and lubridate packages

install.packages("lubridate")
library(ggplot2) # load first!
library(lubridate) # load second!

Garrett Grolemund, PhD Student / Rice University

Department of Statistics

Dates and Times

1. four more table rules

2. manipulating date-times

3. manipulating time spans (i.e., math with date-times)

Table style guidelines

1. When to use a table

Use a table instead of a graphic if:

a) there is only a small amount of data, or

b) precision is important

2. Significant digits

Pick an appropriate number of significant digits (at most 4 or 5)

Use signif( ) to round data to that amount

3. Align decimals

Ensure that decimal points line up so that differences in order of magnitude are easy to spot.

4. Include captions

Always include a caption.

Less enthusiastic readers will only look at your figures, so try to summarize the whole story there.

Captions should explain what data the figure shows and highlight the important finding.
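As a small illustration of guideline 2, the sketch below rounds a few values to three significant digits with signif(). The numbers are made up for the example; only signif() itself comes from the slides.

```r
# Hypothetical measurements spanning several orders of magnitude
x <- c(123456.789, 0.00123456, 42.4242)

# signif() keeps a fixed number of significant digits,
# unlike round(), which keeps a fixed number of decimal places
rounded <- signif(x, 3)
print(rounded)
```

Three significant digits keep each value readable while preserving its order of magnitude, which is what a table reader usually needs.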

date-times in R

Time is a measurement system similar to the number system (which measures quantity). Just as numbers can be arranged on a number line, times can be arranged on a time line.

[Figure: a monotonically increasing time line centered on 0 CE, with BC to the left and AD to the right]

A date-time is a specific instant of time. It refers to an exact point on the time line. For example,

January 1st, 2000 12:34:00, or

right now

[Figure: time line marking 0 CE, 2000-01-01 12:34:00, and right now]

Identifying instants

[Figure: an instant on the time line labeled as x seconds since 0 CE]

Instants of time are commonly identified in two ways:

1) as the number of seconds since a reference time

2) by a unique combination of year, month, day, hour, minute, second, and time zone values

R stores date-times as either POSIXct or POSIXlt objects.

POSIXct objects are stored as the number of seconds since a reference time (by default 1970-01-01 00:00:00 UTC)

unclass(now())

POSIXlt objects are stored as a unique combination of year, month, day, hour, minute, second, and time zone values

unclass(as.POSIXlt(now()))
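To make the two storage formats concrete, here is a minimal sketch that uses a fixed instant instead of now(), so the values are reproducible. It relies only on base R; the chosen instant (one day after the default reference time) is an assumption made for the example.

```r
# A fixed instant: exactly one day after 1970-01-01 00:00:00 UTC
x <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")

# POSIXct: a single number of seconds since the reference time
secs <- as.numeric(x)
secs          # 86400

# POSIXlt: a list of calendar components
parts <- unclass(as.POSIXlt(x))
parts$year    # 70 (years are counted from 1900)
parts$mon     # 0  (months are zero-based, so 0 is January)
parts$mday    # 2
```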

Parsing dates

Parse character strings into date-times with the ymd() type functions in lubridate

e.g., ymd("2010-11-01")*

*note: this is also a quick way to create new dates

Parsing dates

Use the function whose name matches the order of the elements in the date

ymd("2010-11-02")

dmy("02/11/2010")

mdy("11.02.10")

ymd_hms("2010-11-02 04:22:58")
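The sketch below checks that the three date parsers above all recover the same calendar date even though the strings order their elements differently. The variable names are chosen for the example.

```r
library(lubridate)

d1 <- ymd("2010-11-02")   # year, month, day
d2 <- dmy("02/11/2010")   # day, month, year
d3 <- mdy("11.02.10")     # month, day, two-digit year

# All three should be the same Date
d1 == d2
d2 == d3

# ymd_hms() also keeps the time of day and returns a POSIXct object
dt <- ymd_hms("2010-11-02 04:22:58")
class(dt)
```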

Your turn

Parse the column of dates in emails.csv into POSIXct objects.

Access a specific element of a date-time with the function that has the element’s name

now()
year(now())
hour(now())
day(now())
yday(now())
wday(now())
wday(now(), label = TRUE)
wday(now(), label = TRUE, abbr = FALSE)
month(now(), label = TRUE, abbr = FALSE)
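Because now() changes on every call, a fixed date-time makes the accessors easier to follow. This sketch applies them to an arbitrary example instant; the expected values in the comments assume lubridate's default settings.

```r
library(lubridate)

x <- ymd_hms("2010-11-02 04:22:58")

year(x)    # 2010
month(x)   # 11
day(x)     # 2
hour(x)    # 4
yday(x)    # 306, the day of the year
wday(x)    # 3 when weeks start on Sunday (the default); this date is a Tuesday

# Labels are locale-dependent, e.g. "Tuesday" in an English locale
wday(x, label = TRUE, abbr = FALSE)
```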

Determine which day of the week you were born on

Your turn

Use the email data to determine at which hour of the day Professor Wickham is most likely to respond to your email.

em <- read.csv("emails.csv")

em$hour <- hour(em$time)

qplot(hour, data = em, binwidth = 1)

Accessor functions can also be used to change the elements of a date-time

x <- now() # setters modify an object stored in a variable
year(x) <- 1999
hour(x) <- 23
day(x) <- 45
tz(x) <- "UTC"
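A minimal sketch of the setter forms on a fixed date-time follows. The setters need an object stored in a variable, so the example assigns one first. The starting instant is arbitrary, and the out-of-range day is left without an expected value.

```r
library(lubridate)

x <- ymd_hms("2010-11-02 04:22:58")

year(x) <- 1999
hour(x) <- 23
x   # 1999-11-02 23:22:58 UTC

# An out-of-range value such as day 45 is also accepted;
# inspect the result to see how it is resolved
y <- x
day(y) <- 45
y
```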

Determine what day of the week your birthday will be on next year

Time zones

Different clock times. All refer to the same instant of time

2010-11-01 03:53:06 PDT

2010-11-01 04:53:06 MDT

2010-11-01 05:53:06 CDT

2010-11-01 06:53:06 EDT

Switch between time zones with with_tz()

with_tz(now(), "UTC")

with_tz(now(), "America/New_York")

Same clock times. All refer to different instants of time

2010-11-01 03:53:06 PDT

2010-11-01 03:53:06 MDT

2010-11-01 03:53:06 CDT

2010-11-01 03:53:06 EDT

Alter time zone with force_tz()

Recall that force_tz() returns a new instant of time

force_tz(now(), "UTC")

force_tz(now(), "America/New_York")
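The contrast between with_tz() and force_tz() can be checked on a fixed instant. The sketch below is an illustration with an arbitrary UTC time: with_tz() changes only the displayed clock time, while force_tz() keeps the clock time and therefore moves to a different instant.

```r
library(lubridate)

x <- ymd_hms("2010-11-01 10:53:06", tz = "UTC")

# Same instant, different clock time (06:53:06 EDT)
same <- with_tz(x, "America/New_York")
same == x    # TRUE
hour(same)   # 6

# Same clock time, different instant (10:53:06 EDT)
moved <- force_tz(x, "America/New_York")
hour(moved)                           # 10
difftime(moved, x, units = "hours")   # 4 hours later than x
```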

What time is it now where Hadley is (London)? What time was it here when London clocks displayed our current time?

rounding instants

We can round an instant to a specified unit using floor_date(), ceiling_date(), or round_date()

e.g.

round_date(now(), "day")

round_date(now(), "month")
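To compare the three rounding functions side by side, this sketch applies them to one fixed afternoon instant (an arbitrary choice for the example).

```r
library(lubridate)

x <- ymd_hms("2010-11-02 14:22:58")

down    <- floor_date(x, "day")     # 2010-11-02 00:00:00, start of the day
up      <- ceiling_date(x, "day")   # 2010-11-03 00:00:00, start of the next day
nearest <- round_date(x, "day")     # 2010-11-03 00:00:00, since x is past noon

# Early in the month, rounding to the month goes back to the 1st
round_date(x, "month")              # 2010-11-01 00:00:00
```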

Your turn

Use ddply to calculate how many emails Hadley received per day since September 1st. How will you subset? How will you pair up emails sent on the same day?

Plot the number of emails per day over time. Compare the number of emails sent by day of the week.

# just the recent emails

recent <- subset(em, time > ymd("2010-09-01"))

# binning into days

recent$day <- lubridate::floor_date(recent$time, "day")

# calculating daily totals

daily <- ddply(recent, "day", summarise, words = sum(words), emails = length(day))

qplot(day, emails, data = daily, geom = "line")

qplot(wday(day, label = TRUE), emails, data = daily, geom = "boxplot")

Time spans / math with date-times

A time span is a period of time. It refers to an interval on the time line. For example,

19 months, or

one century

[Figure: time line starting at 0 CE with spans of one century and 19 months marked]

What does time measure?

- one half of something called space-time?

- position of the sun?

- tilt of the Earth’s axis?

- number of days left until the weekend?

- All of the above (and poorly at that)?

The month suggests the tilt of the Earth’s axis (but requires a leap day to get back in sync)

The hour suggests where the sun is in the sky (but requires time zones and daylight savings)

The Earth’s movement is decelerating, but space-time is constant (which requires random leap seconds)

as.period(diff(.leap.seconds))

Consider

Suppose we wish to record the opening value of the S&P 500 every day for a month. Since the stock market opens at 8:30 CST, we could calculate:

force_tz(ymd_hms("2010-01-01_08:30:00"), "") + ddays(0:30)

how about

force_tz(ymd_hms("2010-03-01_08:30:00"), "") + ddays(0:30)

What went wrong?

Why does this matter?

What do we mean by exactly one month from now?

How long is an hour?

How long will an hour be at 2:00 this Sunday morning?

According to clock times, the time line looks more like this

[Figure: time line with irregularities marked at daylight savings, a leap day, and daylight savings]

3 types of time spans

We can still do math on the time line, as long as we’re specific about how we want to measure time.

We can use 3 types of time spans, each of which measures time differently: durations, periods, and intervals.

DST <- force_tz(ymd_hms("2010-11-7 01:06:39"), "")

durations

Durations measure the exact number of seconds that pass between two time points.

DST + ddays(1)

Use new_duration() or a helper function to create a duration object. Helper functions are named d + the plural of the unit you are trying to create.

new_duration(3601)
new_duration(minute = 5)
dminutes(5)
dhours(278)
dmonths(4) # no dmonths()

Durations are appropriate when you wish to measure a time span exactly, or compare two time spans. For example,

- the radioactive half life of an atom

- the lifespan of two brands of lightbulb

- the speed of a baseball pitch
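Since a duration is just an exact count of seconds, two durations can always be compared directly. The sketch below uses only the d-prefixed helpers; the lightbulb lifespans are hypothetical numbers invented for the example.

```r
library(lubridate)

# Durations are stored as a plain number of seconds
five_min <- dminutes(5)
one_day  <- ddays(1)
as.numeric(five_min)   # 300
as.numeric(one_day)    # 86400

# Hypothetical lifespans for two brands of lightbulb
brand_a <- dhours(1000)
brand_b <- ddays(40)
as.numeric(brand_a) > as.numeric(brand_b)   # TRUE: 1000 hours exceeds 40 days
```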

periods

Periods measure time spans in units larger than seconds. Periods pay no attention to how many sub-units occur during the unit of measurement.

DST + days(1)

Why use periods?

No surprises.

2010-11-01 00:00:00 + months(1) will always equal 2010-12-01 00:00:00 no matter how many leap seconds, leap days or changes in DST occur in between.

Why not use periods?

We cannot accurately compare two periods unless we know when they occur.

1 month = 31 days

January = 31 days

February = 31 days?

Use new_period() or a helper function to create a period object. Helper functions are simply the plural of the unit you are trying to create.

new_period(3601)
new_period(minute = 5)
minutes(5)
hours(278)
months(4) # months are not a problem

Periods are appropriate when you wish to model events that depend on the clock time. For example,

- the opening bell of a stock market

- quarterly earnings reports

- reoccurring deadlines
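The difference between durations and periods shows up when a span crosses a daylight saving change. The sketch below is an illustration with assumed values: it starts at noon on the day before US clocks fell back in 2010 and uses the America/Chicago zone as an example.

```r
library(lubridate)

# Noon on the day before daylight saving time ended (2010-11-07)
x <- ymd_hms("2010-11-06 12:00:00", tz = "America/Chicago")

# Duration: exactly 86400 seconds later, which lands at 11:00
# because that calendar day lasted 25 hours
by_duration <- x + ddays(1)
hour(by_duration)   # 11

# Period: the same clock time on the next calendar day
by_period <- x + days(1)
hour(by_period)     # 12
```

A recurring clock-time event such as an opening bell is better modeled by the period, while an elapsed-time measurement is better modeled by the duration.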

parsing time spans

A time span that contains only hours, minutes, and seconds information can be parsed as a period with hms(), hm(), ms(), hours(), minutes(), or seconds()

e.g., ms("11:45")
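As a quick check of what the parsers return, this sketch parses two strings and converts the resulting periods to seconds with period_to_seconds(). The second string is an invented example.

```r
library(lubridate)

p1 <- ms("11:45")       # 11 minutes 45 seconds
p2 <- hms("01:30:15")   # 1 hour 30 minutes 15 seconds

period_to_seconds(p1)   # 705
period_to_seconds(p2)   # 5415
```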

intervals

Intervals measure a time span by recording its endpoints. Since we know when the time span occurs, we can calculate the lengths of all the units involved.

Intervals retain all of the information available about a time span, but cannot be generalized to other spots on the time line.

Intervals can be accurately converted to either periods or durations with as.period() and as.duration()
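The sketch below shows why the endpoints matter when converting. It assumes a recent lubridate release, where the constructor is called interval() rather than new_interval(). Both intervals span one calendar year, but only one contains a leap day.

```r
library(lubridate)

# interval() is the constructor name in recent lubridate releases
regular <- interval(ymd("2009-01-01"), ymd("2010-01-01"))
leap    <- interval(ymd("2012-01-01"), ymd("2013-01-01"))

# As periods, both are one year
as.period(regular)
as.period(leap)

# As durations, the leap year is one day (86400 seconds) longer
as.numeric(as.duration(regular))   # 31536000 (365 days)
as.numeric(as.duration(leap))      # 31622400 (366 days)
```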

Create an interval with new_interval() or by subtracting two dates.

int <- ymd("2010-01-01") - ymd("2009-01-01")

Access and set the endpoints with start() and end(). Note that setting preserves length (in seconds).

start(int)

end(int) <- ymd("2010-03-14")

Intervals are always positive

converting between time spans

Periods can be converted to durations by using the most common lengths (in seconds) of each time unit.

These are just estimates. For accuracy, convert a period to an interval first and then convert the interval to a duration.

arithmetic with date-times

Time spans are meant to be added to and subtracted from both each other and instants

e.g., now() + days(1) - minutes(25:30)

Challenge

Write a function that takes a date and returns the last day of the month the date occurs in.
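One possible approach to the challenge, offered as a sketch and not as the intended solution: go to the first day of the month, step forward one month, then step back one day. The function name last_day is made up for the example.

```r
library(lubridate)

# Hypothetical helper: last day of the month that contains `date`
last_day <- function(date) {
  first <- floor_date(date, "month")   # first day of the month
  first + months(1) - days(1)          # first day of next month, minus one day
}

last_day(ymd("2010-02-11"))   # 2010-02-28
last_day(ymd("2012-02-11"))   # 2012-02-29 (leap year)
last_day(ymd("2010-12-31"))   # 2010-12-31
```

Starting from the first of the month avoids adding a month to a day such as the 31st, which does not exist in every month.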

multiplication/division

Multiplication and division of time spans work as expected.

Dividing periods by durations or other periods can only provide an estimate. Convert to intervals for accuracy.

Dividing intervals by durations creates an exact answer.

Dividing intervals by periods performs integer division.

modulo arithmetic and integer division

integer division
128 %/% 5 = 25

modulo
128 %% 5 = 3
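The two operators fit together: the quotient times the divisor, plus the remainder, gives back the original number. A short base R sketch, with a seconds-to-hours example added for illustration:

```r
a <- 128
b <- 5

quotient  <- a %/% b   # 25
remainder <- a %% b    # 3
quotient * b + remainder == a   # TRUE

# Splitting 3601 seconds into whole hours and leftover seconds
total <- 3601
whole_hours  <- total %/% 3600   # 1
left_seconds <- total %% 3600    # 1
```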

Your turn

Calculate your age in minutes. In hours. In days. How old is someone who was born on June 4th, 1958?

lifetime <- now() - ymd("1981-01-22")

lifetime / dminutes(1)

lifetime / dhours(1)

lifetime / ddays(1)

(now() - ymd("1981-01-22")) / years(1)
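For the second part of the exercise, here is a sketch that uses a fixed reference date so the result is reproducible. It assumes a recent lubridate release, where interval() builds the interval explicitly; the reference date 2010-11-01 is an arbitrary choice.

```r
library(lubridate)

born  <- ymd("1958-06-04")
as_of <- ymd("2010-11-01")   # hypothetical "today" for the example

life <- interval(born, as_of)

age_days    <- life / ddays(1)      # exact age in days
age_hours   <- life / dhours(1)     # exact age in hours
age_minutes <- life / dminutes(1)   # exact age in minutes

# Integer division by a period counts completed years
age_years <- life %/% years(1)      # 52
```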

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.