
pandas4xr

Alan G. Isaac

January 29, 2014

1 Introduction to Exchange Rate Data with Python

As a preliminary, we import some modules we will be using. We adopt conventional short names.

In [2]:

#preliminaries
import datetime
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd

Some of these libraries change rapidly, so it is good to know what version you are using.

In [68]:

print("numpy version {}".format(np.__version__))
print("matplotlib version {}".format(mpl.__version__))
print("pandas version {}".format(pd.__version__))

numpy version 1.8.0
matplotlib version 1.3.1
pandas version 0.12.0

1.1 Learning Outcomes

• understand the Series and DataFrame data types of Pandas
• access and analyze online data
• access and analyze CSV files
• use Pandas and Matplotlib to make basic time-series charts
• smooth data with a moving average

2 Pandas: Basic Data Analysis

Pandas background:

• website: http://pandas.pydata.org
• Wes McKinney's intro: http://vimeo.com/59324550


2.1 Data Types

Core information: http://pandas.pydata.org/pandas-docs/dev/dsintro.html

Two important data types defined by pandas: Series and DataFrame.

A DataFrame is like a spreadsheet; a Series is like a spreadsheet column. We often think of a series as observations on a single variable. Note: all the data in a Series has a common type. Extensive documentation is available online at the address above.

Series

In [3]:

s1 = pd.Series([10,11,12,13], name='my series')
print(s1)
print("the first element is {}".format(s1[0]))
s1[0] = 20
print("Now the first element is {}".format(s1[0]))
print(s1.describe())

0    10
1    11
2    12
3    13
Name: my series, dtype: int64
the first element is 10
Now the first element is 20
count     4.000000
mean     14.000000
std       4.082483
min      11.000000
25%      11.750000
50%      12.500000
75%      14.750000
max      20.000000
dtype: float64

You can index and slice a series. See http://pandas.pydata.org/pandas-docs/dev/indexing.html

In [31]:

print(s1[0])
print("")
print(s1[0::2])

20

0    20
2    12
Name: my series, dtype: int64

A Series has an associated plot method, which depends on Matplotlib.

In [26]:

ax = s1.plot()
ax.legend()

Out [26]:

<matplotlib.legend.Legend at 0xa5a4908>


Note that the Series is plotted against its index.

In [29]:

s1.index

Out [29]:

Int64Index([0, 1, 2, 3], dtype=int64)

DataFrame

We can create a DataFrame many ways. One way is to combine multiple Series objects.

In [4]:

s2 = pd.Series([30,31,32,33],name="series two")
#first approach: concatenation of series
df1 = pd.concat([s1,s2], axis=1)
df1

Out [4]:

   my series  series two
0         20          30
1         11          31
2         12          32
3         13          33

In [9]:

#another approach, giving new names
df1 = pd.DataFrame(dict(s1=s1,s2=s2))
df1

Out [9]:

   s1  s2
0  20  30
1  11  31
2  12  32
3  13  33

You can easily extract a series: its name is an attribute of your DataFrame.

In [16]:

print(df1.s1)
print("")
print(type(df1.s1))

0    20
1    11
2    12
3    13
Name: s1, dtype: int64

<class 'pandas.core.series.Series'>
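Attribute access works only when the column name is a valid Python identifier. A minimal sketch, assuming a small frame built from the values used above (`df_demo` is an illustrative name, and the label `my series` is borrowed from the first approach): bracket indexing retrieves either column, while a name containing a space cannot be written as an attribute.

```python
import pandas as pd

# Illustrative frame (df_demo is a hypothetical name) using the values from above.
df_demo = pd.DataFrame({"s1": [20, 11, 12, 13],
                        "my series": [30, 31, 32, 33]})

# Bracket indexing and attribute access return the same Series
# when the column name is a valid identifier.
same = df_demo["s1"].equals(df_demo.s1)
print(same)

# A name containing a space has to be looked up with brackets.
spaced = df_demo["my series"]
print(type(spaced))
```

Bracket indexing is therefore the safer habit when column names come from an external source.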

You can index and slice a DataFrame.

See: http://pandas.pydata.org/pandas-docs/dev/indexing.html

In [15]:

print(df1.ix[0]) #get the first row
print("")
print(df1[::2]) #get every second row

s1    20
s2    30
Name: 0, dtype: int64

   s1  s2
0  20  30
2  12  32

<class 'pandas.core.series.Series'>
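The `.ix` indexer belongs to the pandas version listed at the start of these notes; it was later deprecated and removed from pandas. A hedged sketch of the same selections with `.iloc` (position-based) and `.loc` (label-based), assuming a recent pandas release and an illustrative frame `df_demo` holding the same values:

```python
import pandas as pd

# Illustrative frame (df_demo is a hypothetical name) with the values used above.
df_demo = pd.DataFrame({"s1": [20, 11, 12, 13], "s2": [30, 31, 32, 33]})

first_row = df_demo.iloc[0]        # first row by position, returned as a Series
first_by_label = df_demo.loc[0]    # row whose index label is 0
every_second = df_demo.iloc[::2]   # every second row, returned as a DataFrame

print(first_row)
print(every_second)
```

With the default integer index the positional and label-based lookups coincide; they differ once the index holds dates or other labels.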

Often we want our index to be named.

In [5]:

df1.index.name="period"
df1

Out [5]:

        s1  s2
period
0       20  30
1       11  31
2       12  32
3       13  33

A DataFrame also has a plot method.

In [6]:

df1.plot();


We can create DataFrames with time-series data from files or from the web. We are particularly interested in data from FRED (Federal Reserve Economic Data), the database maintained by the Federal Reserve Bank of St. Louis.

2.2 Get Data from FRED

In [17]:

import pandas.io.data as webdata
tstart = datetime.datetime(2010, 1, 1)
tend = datetime.datetime(2013, 12, 15)
#get the EURUSD exchange rate
exuseu = webdata.DataReader("EXUSEU", "fred", tstart, tend);
print(type(exuseu))
print(exuseu.head())

<class 'pandas.core.frame.DataFrame'>
            EXUSEU
DATE
2010-01-01  1.4266
2010-02-01  1.3680
2010-03-01  1.3570
2010-04-01  1.3417
2010-05-01  1.2563
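`pandas.io.data` is tied to the pandas version used here; in later releases the remote-data readers were moved out of pandas into the separate pandas-datareader package. For working offline, a stand-in can be built by hand. The sketch below is a substitute under stated assumptions, not FRED output: it copies only the five values printed above into a frame with the same layout (`exuseu_demo` is an illustrative name).

```python
import pandas as pd

# Hypothetical offline stand-in: only the five observations printed above.
dates = pd.to_datetime(["2010-01-01", "2010-02-01", "2010-03-01",
                        "2010-04-01", "2010-05-01"])
exuseu_demo = pd.DataFrame(
    {"EXUSEU": [1.4266, 1.3680, 1.3570, 1.3417, 1.2563]},
    index=pd.DatetimeIndex(dates, name="DATE"))

print(type(exuseu_demo.index))
print(exuseu_demo)
```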

In [11]:

print(exuseu.index) #this is a DatetimeIndex
print("")
print(exuseu.describe())
ax1 = exuseu.plot()

<class 'pandas.tseries.index.DatetimeIndex'>
[2010-01-01 00:00:00, ..., 2013-12-01 00:00:00]
Length: 48, Freq: None, Timezone: None

          EXUSEU
count  48.000000
mean    1.333623
std     0.055664
min     1.222300
25%     1.296875
50%     1.327100
75%     1.366200
max     1.446000

Smooth Data with Moving Average

In [52]:

exuseu['mavg'] = pd.rolling_mean(exuseu, 4) #create a new column
print(exuseu.head()) #we lose 3 observations
exuseu[3:].plot()

            EXUSEU      mavg
DATE
2010-01-01  1.4266       NaN
2010-02-01  1.3680       NaN
2010-03-01  1.3570       NaN
2010-04-01  1.3417  1.373325
2010-05-01  1.2563  1.330750

Out [52]:

<matplotlib.axes.AxesSubplot at 0xdc3c5f8>
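A four-period moving average is the mean of the current observation and the three before it, which is why the first three entries are NaN. `pd.rolling_mean` belongs to the pandas version used here; later releases express the same computation through the `rolling` method. A minimal sketch, assuming a recent pandas and using only the five EXUSEU values printed above (`rates` is an illustrative name):

```python
import pandas as pd

rates = pd.Series([1.4266, 1.3680, 1.3570, 1.3417, 1.2563], name="EXUSEU")

# Mean over a window of the current and the three preceding observations.
mavg = rates.rolling(window=4).mean()
print(mavg)

# The first full window, computed by hand for comparison.
by_hand = (1.4266 + 1.3680 + 1.3570 + 1.3417) / 4
print(by_hand)
```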


In [36]:

exusuk = webdata.DataReader("EXUSUK", "fred", tstart, tend);
exjpus = webdata.DataReader("EXJPUS", "fred", tstart, tend);
print(type(exusuk))
print(exusuk.head())

<class 'pandas.core.frame.DataFrame'>
            EXUSUK
DATE
2010-01-01  1.6158
2010-02-01  1.5618
2010-03-01  1.5058
2010-04-01  1.5332
2010-05-01  1.4669

In [18]:

#get multiple series (requires pandas 0.13+)
#erates = webdata.DataReader(["EXUSEU", "EXUSUK", "EXJPUS"], "fred", tstart, tend)

In [57]:

erates = pd.DataFrame(dict(
    eurusd=exuseu.EXUSEU,
    gbpusd=exusuk.EXUSUK,
    jpyusd100=100/exjpus.EXJPUS))
erates.plot();


2.3 Currency Returns

Define raw foreign currency return:

$$\mathit{dls}_t = \frac{S_t - S_{t-1}}{S_{t-1}} = \frac{S_t}{S_{t-1}} - 1$$

In [39]:

dls = erates.pct_change()
print(dls.head())
dls = dls[1:] #slice off the NaN
print(dls.corr())
dls.plot();

              eurusd    gbpusd    jpyusd
DATE
2010-01-01       NaN       NaN       NaN
2010-02-01 -0.041077 -0.033420  0.010668
2010-03-01 -0.008041 -0.035856 -0.006356
2010-04-01 -0.011275  0.018196 -0.029283
2010-05-01 -0.063651 -0.043243  0.016088

          eurusd    gbpusd    jpyusd
eurusd  1.000000  0.724467  0.077517
gbpusd  0.724467  1.000000  0.186015
jpyusd  0.077517  0.186015  1.000000
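As a check on the definition of the raw return, the sketch below applies the formula directly to the first three EXUSEU observations shown earlier and compares the result with `pct_change` (the names `s`, `by_formula`, and `by_method` are illustrative):

```python
import pandas as pd

s = pd.Series([1.4266, 1.3680, 1.3570])  # first three EXUSEU values

by_formula = s / s.shift(1) - 1   # S_t / S_{t-1} - 1
by_method = s.pct_change()        # same quantity via the built-in method

print(by_formula)
print(by_method)
```

The first entry is NaN in both cases because there is no earlier observation to compare with.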


In [40]:

pd.scatter_matrix(dls, diagonal='kde');

In [50]:

fig, ax = plt.subplots(1,1)
corr = dls.corr()
plt.imshow(corr, cmap='Blues', interpolation='none')
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns);

Riskiness of Currency Returns

In [51]:

# Code from Thomas Wiecki's Financial Analysis in Python
plt.scatter(dls.mean(), dls.std())
plt.xlabel('Mean Currency Returns')
plt.ylabel('Std Currency Returns')
for label, x, y in zip(dls.columns, dls.mean(), dls.std()):
    plt.annotate(
        label,
        xy = (x, y), xytext = (20, -20),
        textcoords = 'offset points', ha = 'right', va = 'bottom',
        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))


2.4 Writing and Reading CSV

In [18]:

#get quarterly real GDP data from FRED
gdpc1 = webdata.DataReader("GDPC1", "fred", tstart, tend)
print(gdpc1.describe())
gdpc1['ly'] = np.log(gdpc1['GDPC1']) #create new column
print(gdpc1.describe())

              GDPC1
count     15.000000
mean   15220.846667
std      379.293620
min    14597.700000
25%    14918.200000
50%    15242.100000
75%    15536.800000
max    15839.300000

              GDPC1         ly
count     15.000000  15.000000
mean   15220.846667   9.630131
std      379.293620   0.024938
min    14597.700000   9.588619
25%    14918.200000   9.610336
50%    15242.100000   9.631817
75%    15536.800000   9.650967
max    15839.300000   9.670249

We can save our data set as a .csv file.


In [9]:

gdpc1.to_csv('data/temp.csv') #`data` directory must already exist

Take a look at the file you created. (Use type on Windows or cat on other systems.)

In [13]:

!type data\temp.csv

DATE,GDPC1,ly
2010-01-01,14597.7,9.58861926104003
2010-04-01,14738.0,9.598184471326945
2010-07-01,14839.3,9.60503434579752
2010-10-01,14942.4,9.611958088355454
2011-01-01,14894.0,9.60871372627059
2011-04-01,15011.3,9.616558529804554
2011-07-01,15062.1,9.619936933863796
2011-10-01,15242.1,9.63181661502333
2012-01-01,15381.6,9.640927268858126
2012-04-01,15427.7,9.643919873974424
2012-07-01,15534.0,9.650786448979593
2012-10-01,15539.6,9.651146883564618
2013-01-01,15583.9,9.653993609023946
2013-04-01,15679.7,9.660122161068307
2013-07-01,15839.3,9.670249472472733

Naturally you can read CSV data from disk.

In [14]:

df2 = pd.read_csv('data/temp.csv', index_col='DATE', parse_dates=True)
df2.head()

Out [14]:

              GDPC1        ly
DATE
2010-01-01  14597.7  9.588619
2010-04-01  14738.0  9.598184
2010-07-01  14839.3  9.605034
2010-10-01  14942.4  9.611958
2011-01-01  14894.0  9.608714
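The write-then-read round trip can also be tried without touching the disk. A minimal sketch, assuming an in-memory `io.StringIO` buffer in place of `data/temp.csv` and a two-row illustrative frame (`demo`, `buf`, and `back` are hypothetical names); `index_col` and `parse_dates` restore the date index on reading:

```python
import io
import pandas as pd

# Two-row illustrative frame with a named date index, as in the file above.
demo = pd.DataFrame(
    {"GDPC1": [14597.7, 14738.0]},
    index=pd.DatetimeIndex(["2010-01-01", "2010-04-01"], name="DATE"))

buf = io.StringIO()
demo.to_csv(buf)   # write CSV text into the buffer
buf.seek(0)        # rewind before reading

back = pd.read_csv(buf, index_col="DATE", parse_dates=True)
print(back)
print(type(back.index))
```

Without `parse_dates=True` the index would come back as plain strings rather than dates.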

In [19]:

dly = gdpc1['ly'].diff()
print(dly) #note the NaN; lose an obs to diff
print("")
dly=dly[1:] # slice it off
print(dly)
#plot it
dly.plot();

DATE
2010-01-01         NaN
2010-04-01    0.009565
2010-07-01    0.006850
2010-10-01    0.006924
2011-01-01   -0.003244
2011-04-01    0.007845
2011-07-01    0.003378
2011-10-01    0.011880
2012-01-01    0.009111
2012-04-01    0.002993
2012-07-01    0.006867
2012-10-01    0.000360
2013-01-01    0.002847
2013-04-01    0.006129
2013-07-01    0.010127
Name: ly, dtype: float64

DATE
2010-04-01    0.009565
2010-07-01    0.006850
2010-10-01    0.006924
2011-01-01   -0.003244
2011-04-01    0.007845
2011-07-01    0.003378
2011-10-01    0.011880
2012-01-01    0.009111
2012-04-01    0.002993
2012-07-01    0.006867
2012-10-01    0.000360
2013-01-01    0.002847
2013-04-01    0.006129
2013-07-01    0.010127
Name: ly, dtype: float64
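The difference of logs approximates the quarterly growth rate, since log(Y_t) - log(Y_{t-1}) = log(1 + g_t), which is close to g_t when g_t is small. A small sketch using the first two GDPC1 values from the CSV listing above (the variable names are illustrative):

```python
import numpy as np

y0, y1 = 14597.7, 14738.0            # first two GDPC1 observations listed above

log_diff = np.log(y1) - np.log(y0)   # what diff() of the log series computes
growth = y1 / y0 - 1                 # exact proportional growth

print(log_diff)
print(growth)
```

The two numbers agree to about four decimal places here; the gap widens as growth rates get larger.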

2.5 Use Matplotlib with Pandas Data Types

Here we want to plot two series with different frequencies in one chart, so we turn to direct use of Matplotlib.

In [23]:

fig, ax1 = plt.subplots(1,1)
ax1.plot(dly.index, dly, 'b-')
ax1.set_ylabel('GDP growth', color='b')
ticks = ax1.get_xticks()
ax1.set_xticks((ticks[0],ticks[-1]))
ax2 = ax1.twinx() #use same x-axis with separate y-scale
ax2.plot(exuseu.index, exuseu, 'r-');
ax2.set_ylabel('EUR-USD', color='r');
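An alternative to plotting mixed frequencies is to bring both series to a common frequency first. The sketch below is our assumption rather than the approach taken above: it averages a monthly series to quarterly frequency with `resample`, using made-up values (`monthly` and `quarterly` are illustrative names) and assuming a recent pandas release.

```python
import pandas as pd

# Made-up monthly data for one year, indexed by month start.
idx = pd.date_range("2010-01-01", periods=12, freq="MS")
monthly = pd.Series(range(1, 13), index=idx, dtype=float)

# Average each calendar quarter; "QS" labels a quarter by its first day.
quarterly = monthly.resample("QS").mean()
print(quarterly)
```

Labeling each quarter by its first day matches the dates used for the quarterly GDP series above.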


3 END