Week 1
The computer environment and Python
Introduction to Coding and Objective Analysis
for the Atmospheric and Oceanic Sciences
Statistics and data analysis resources for AOS
We have also created a Dropbox folder containing some of the material below, and that can be accessed here. Please send suggestions on how best to make these resources available.
Published books
• Daniel S. Wilks. Statistical methods in the atmospheric sciences, volume 100 of International geophysics series. Academic Press, 2011
• Hans von Storch and Francis W. Zwiers. Statistical analysis in climate research. Cambridge University Press, 1999
Course lecture notes available online
Some of the most valuable material lives in the lecture notes of courses at other universities. Note that the first two below use MATLAB code for most examples, though the syntax is very similar to Python, and we are happy to help translate.
• Dennis Hartmann’s (U. Washington) course notes for “Objective analysis”
• Chris Bretherton’s (U. Washington) course notes for “Computational methods for data analysis”
• Julien Emile-Geay’s (USC) course notes for “Data analysis in the earth & environmental sciences”
Other good links
Websites
• The folks at NCAR have written a great Climate Data Guide (intro to common tools and methods)
• Johnny Lin, a professor and Python enthusiast, maintains his own Python for the Atmospheric and Oceanic Sciences blog
Documents
• Silvia A. Venegas’ “Statistical methods for signal detection in climate”
• Abdel Hannachi’s “Primer for EOF analysis of climate data”
• Charles E. Grinstead and J. Laurie Snell’s “Introduction to probability”
Green Tea Press
Allen B. Downey, an engineering professor at Olin in Massachusetts, also writes great (and free!) books on learning Python, as well as introductory and advanced statistics with a Python bent. A few of his books are linked below, though more can be found at www.greenteapress.com.
• Think Python → a book on how to ‘think like a computer scientist’ using Python (introductory chapters are great for beginners; later chapters are heavy on object oriented programming)
• Think Stats → a book on probability and statistics using Python
• Think Bayes → a book on Bayesian statistics using Python
UNIX/LINUX
An operating system that is widely used for scientific programming
Interface is the command line in a terminal shell
Synoptic lab computers, professor clusters, Apple products (OSX), Google products (Android), and supercomputers use it
Top level directory (root, home directory, /home/neil/)
    Subdirectory #1 (/home/neil/scripts/)
        hello_world.py
        fortran_is_old.f90
        matlab_is_my_friend.m
    Subdirectory #2 (/home/neil/figures/)
        america.eps
        china.png
        chile.jpg
20 useful commands
1. “ls” – list the contents of a directory (-l, -h, -r, -t)
2. “cd” – change directories
3. “pwd” – print current working directory
4. “cp” – copy a file to a new filename or copy a file to a new directory (-r)
5. “mv” – change a filename or move a file to a new directory
6. “rm” – remove files (-f, -r)
7. “mkdir” and “rmdir” – make and remove a directory
8. “sudo” – run a command as the superuser (requires an administrator or root password)
9. “man” – print the manual for the following command
10. “which” – show location of shell commands
11. “top” – print details of CPU processes (users and operations going on now)
12. “ssh” – secure shell login to remote machines (ssh -Y or ssh -X for X11 forwarding)
13. “scp”, “sftp”, “rsync”, “ftp” – (securely) transfer files from one machine to another
14. “bg” – resume a suspended job in the background (“jobs” prints what jobs are running in the background)
15. “.” or “./” – refer to the current directory (e.g. to put files in it) or run a script in the current directory
16. “screen” – run a long script in a screen session, which allows you to log off without killing the script
17. “ctrl-c” or “ctrl-z” – kill (ctrl-c) or suspend (ctrl-z) a script
18. “ctrl-a”, “ctrl-e” – move to the beginning or end of the command line
19. “cat” – print the contents of a file on the console without opening an editor
20. “grep” – a magical find-all command (prints the lines that match a search pattern)
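Several of the file commands above have close counterparts in Python's standard library, which becomes handy once scripts start managing their own output. The sketch below is only an illustration under a few assumptions: it needs Python 3 (for pathlib), it works inside a throwaway temporary directory, and the file name hello_backup.py is made up for the example.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing real is touched.
workdir = Path(tempfile.mkdtemp())

# mkdir: create scripts/ and figures/ under the working directory
scripts = workdir / "scripts"
figures = workdir / "figures"
scripts.mkdir()
figures.mkdir()

# Create a small file to play with (like saving it from a text editor)
hello = scripts / "hello_world.py"
hello.write_text("print('Hello World!')\n")

# cp: copy a file to a new filename (hello_backup.py is a hypothetical name)
backup = scripts / "hello_backup.py"
shutil.copy(str(hello), str(backup))

# mv: move the copy into another directory
moved = figures / "hello_backup.py"
shutil.move(str(backup), str(moved))

# ls: list directory contents, sorted by name
scripts_listing = sorted(p.name for p in scripts.iterdir())
figures_listing = sorted(p.name for p in figures.iterdir())
print(scripts_listing)
print(figures_listing)

# cat: print a file's contents without opening an editor
print(hello.read_text(), end="")

# rm -r: remove the whole working directory tree
shutil.rmtree(str(workdir))
```

On the command line these steps would be mkdir, cp, mv, ls, cat, and rm -r, in that order.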
Package managers
A package manager is a collection of software and tools that helps you install things cleanly and easily on your computer
This is the best way to keep installations clean and manageable. If you have to install software you will use on the command line, USE A PACKAGE MANAGER IF POSSIBLE.
• For Mac – homebrew (http://brew.sh/)
• For LINUX – yum (http://yum.baseurl.org/) – also apt, aptitude, pacman
• For Windows – chocolatey (https://chocolatey.org/)
• Anaconda – use for Python! (all operating systems)
Text Editors
• Simply a platform to type code, read text files, or write love notes to your operating system
• There are so many!
– Some are great for coding on your laptop, like
TextWrangler (OSX), SublimeText, etc.
• But there are two “classics” that unix systems usually have pre‐installed
– vi (or vim)
– emacs
– the learning curve is steep for each, but the payoff is that you can code on any unix machine without any editor compatibility issues
– “A 2009 survey of Linux Journal readers found that vi was the most widely used text editor…” (Wikipedia aka FACT).
Your bash profile
• Scripts that are executed when bash (i.e. terminal) is used
• vi ~/.bash_profile (edit the file)
• source ~/.bash_profile (apply the changes in the current terminal)
home directory “hidden” files
Let’s get a nice prompt for the command line (\u = user name, \h = host name, \w = working directory):
export PS1="\u@\h\w: "
neilberg@whiz~: [type commands here]
Let’s get some color highlighting (machine dependent):
export LSCOLORS=ExFxBxDxCxegedabagacad
final_paper.pdf
coding_class/
Let’s define some shortcuts:
alias cheddar="ssh -Y [email protected]"
alias ls="ls -l"
>>> cheddar (executes ssh -Y [email protected])
>>> ls (executes ls -l)
Typing on the command line
Let’s define some paths:
export NCARG_ROOT=/usr/local/ncarg
>>> echo $NCARG_ROOT
/usr/local/ncarg
Or prepend a path to your command search path (directories listed first are searched first):
export PATH=$NCARG_ROOT/bin:$PATH
>>> echo $PATH
/usr/local/ncarg/bin:/Users/neilberg/anaconda/bin
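The same search path is visible from inside Python, which can help when a script cannot find a command that works in the terminal. This is a rough sketch, not part of the original slides: the directory /usr/local/ncarg/bin is borrowed from the example above purely for illustration, and the change made here affects only the running Python process, not your shell.

```python
import os
import shutil

# PATH is one string of directories separated by os.pathsep (":" on UNIX).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

# Directories are searched in order, so earlier entries win.
for position, directory in enumerate(path_dirs):
    print(position, directory)

# Directory used only for illustration (taken from the NCARG example).
new_dir = "/usr/local/ncarg/bin"

# Rough equivalent of `export PATH=/usr/local/ncarg/bin:$PATH`, but only for
# this Python process and any child processes it starts.
os.environ["PATH"] = new_dir + os.pathsep + os.environ.get("PATH", "")
updated_dirs = os.environ["PATH"].split(os.pathsep)

# shutil.which plays the role of the shell's `which` command; it returns
# None when the command is not found on PATH.
print(shutil.which("python3"))
```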
Why should I code in Python?
• Free – completely free. 100% free.
• Open source – no company or person owns Python
– coding geniuses are constantly refining, improving, and adding to the language
• Readability – the “code” reads like speech; forced indents make the code clean and organized
• High level – a ton of the real computer science is done behind the scenes, e.g. no compiling
• Mathematical, scientific, plotting libraries – powerful, well documented, and only a few keystrokes away
• Jobs after you graduate – NASA/NOAA/NCAR/NWS, Los Alamos National Laboratory, LLNL, and ESRI all use Python…oh, and same with Google, Reddit, Yahoo, Walt Disney Animation, Pinterest, Dropbox, YouTube, and Yelp.
The Zen of Python (https://www.python.org/dev/peps/pep-0020/)
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Readability counts.
In the face of ambiguity, refuse the temptation to guess.
Now is better than never.
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
Coding in Python ‐ resources
• “Think Python” by Allen Downey (free and awesome) – http://www.greenteapress.com/thinkpython/thinkpython.pdf
• The official documentation (it’s very well written, always has examples) – http://docs.scipy.org/doc/numpy/reference/
• Stackoverflow (stackoverflow.com) – Smarter people have already coded your problem (or parts of it)
– Search previously submitted questions or submit your own
– Pure programmers can be egotistical and obnoxious. Don’t take offense, they’re just jealous they aren’t atmospheric, oceanic, and space scientists.
• Google anything and everything – copy and paste error messages
– search for help through message boards, listservs, blog posts, etc.
– seriously, how did people code before Google?
Coding in Python – installation
• Xcode – Apple software development tools (https://developer.apple.com/xcode/downloads/) -- the App Store
• XQuartz if Mac: http://xquartz.macosforge.org/landing/
• Python
– Pre-installed on all Macs, but not all packages you will need
    • To install missing packages, you can use Homebrew: http://brew.sh/
– Anaconda is the easiest way to get everything: http://continuum.io/downloads
    • Available for Windows, Mac, and Linux platforms
– Other options:
    • Enthought Canopy - https://www.enthought.com/products/canopy/
Python 2 vs Python 3
• I (Neil) code with Python version 2.7, but if I started today, I’d go with version 3.4.
– 2.7 is most commonly used today because of its robust libraries and historical usage
– 3.4 is newer and actively being developed, but may not have every library that is available in 2.7
• There are some incompatible differences, like “print”, which is a statement in 2.x and a function in 3.x. Aside from that, you probably won’t notice a difference.
• The following slides will use version 3 syntax, but this syntax was back ported (allowed) in version 2.x. So you will not need to change anything if you’re using 2.x.
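A short way to see which version you are running, and to write lines that behave the same in both, is sketched below. It is an illustration rather than part of the original slides; the division example is an addition here, chosen because it is one of the other differences you may eventually run into.

```python
import sys

# sys.version_info reports the interpreter version, e.g. (2, 7, ...) or (3, 4, ...)
major, minor = sys.version_info[0], sys.version_info[1]
print("Running Python {}.{}".format(major, minor))

# Calling print with parentheses and a single argument works in 2.7 and 3.x
print("Hello World!")

# "//" is floor (integer) division in both versions, whereas "/" between two
# integers floors in 2.x and returns a float in 3.x.
floor_half = 5 // 2
# Making one operand a float gives true division in both versions
true_half = 5 / 2.0
print("{} {}".format(floor_half, true_half))
```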
The essential modules
• NumPy – Numerical Python (http://www.numpy.org/)
– arrays (matrices), linear algebra, Fourier transform
– make the switch now MATLAB users: http://wiki.scipy.org/NumPy_for_Matlab_Users
• SciPy – Scientific Python (http://www.scipy.org/)
– advanced mathematical algorithms
• Matplotlib – Plotting (http://matplotlib.org/)
– MATLAB style plotting, but in Python. Winning!
• netCDF4 – Network Common Data Form version 4 (https://code.google.com/p/netcdf4-python/)
– the format that many data sets are stored in
– atmospheric/climate models, reanalyses, observations
Let’s code
Guido van Rossum, creator of Python
Check installations
• Open an interactive python shell
>>> python
Python 2.7.9 |Anaconda 2.0.1 (x86_64)| (default, Dec 15 2014, 10:37:34) [GCC 4.2.1 (Apple Inc. build 5577)] on darwin
• Import the modules you need
>>> import numpy
>>> [blank line if installed correctly]
>>> import numpyy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named numpyy
>>> import scipy
>>> import matplotlib
>>> import netCDF4
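Instead of importing the modules one at a time, the check can be done in a single pass. The following is a minimal sketch, assuming Python 3.4 or newer (where importlib.util.find_spec is available); the helper name find_missing is made up for this example, and the misspelled "numpyy" is included on purpose to show what a missing module looks like.

```python
import importlib.util


def find_missing(module_names):
    """Return the names in module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when no top-level module of that name exists
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# The four modules used in this course, plus a deliberate typo.
wanted = ["numpy", "scipy", "matplotlib", "netCDF4", "numpyy"]
missing = find_missing(wanted)
print("Missing modules:", missing)
```

Anything in the printed list besides "numpyy" still needs to be installed, for example through Anaconda.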
Code on!
• Make the computer say “Hello World!”
>>> print('Hello World!')
Hello World!
[Note: Python accepts single or double quotes for strings]
• For loops – iterating over a set of something X times
>>> for course in ['AOS1', 'AOS2', 'AOS3']:
...     print('I have taken ' + course)
...
I have taken AOS1
I have taken AOS2
I have taken AOS3
Notes:
1. anything in square brackets is a “list”
2. the “for” line must end with a colon
3. all code inside a for loop must be indented by the same amount (4 spaces is the convention)
4. concatenating (joining) strings can be done through the + sign
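Two small extensions of the loop above often come up right away: counting the items as you go, and mixing numbers into the printed text (which + cannot do without converting them to strings first). The sketch below shows one way to do both using the built-in enumerate and str.format; the message wording is invented for the example.

```python
courses = ['AOS1', 'AOS2', 'AOS3']

messages = []
# enumerate hands back a counter alongside each item; start=1 counts from one
for number, course in enumerate(courses, start=1):
    # str.format fills each {} in order and accepts numbers as well as strings
    message = 'Course {} of {}: {}'.format(number, len(courses), course)
    messages.append(message)
    print(message)
```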
Callable scripts
• The interactive shell is great for simple code and debugging, but we’re writing 100+ line scripts that need to be saved.
• All Python scripts end with .py
• Run the code in the current directory with
>>> python my_script_name.py
or, if the script is executable (chmod +x) and its first line is a “#!” line naming the Python interpreter, with
>>> ./my_script_name.py
You can also add “.” to the $PATH environment variable in your .bash_profile to avoid having to type ./ every time you call a Python script:
export PATH=.:$PATH
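For reference, a saved script might be laid out as in the sketch below. This is one common layout rather than the only one: the file name my_script_name.py, the function name main, and the greeting behavior are all placeholders chosen for illustration.

```python
#!/usr/bin/env python
"""Minimal callable script; save as my_script_name.py (placeholder name)."""
import sys


def main(argv):
    """Greet each name given on the command line, or the world if none."""
    # argv[0] is the script name, so the real arguments start at index 1
    names = argv[1:] or ["World"]
    greetings = ["Hello {}!".format(name) for name in names]
    for greeting in greetings:
        print(greeting)
    return greetings


# This block runs only when the file is executed as a script,
# not when it is imported from another Python file.
if __name__ == "__main__":
    main(sys.argv)
```

Running `python my_script_name.py Neil` would print a greeting for Neil, while running it with no arguments greets the world.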
NumPy – simple stats
>>> import numpy as np
>>> a = np.array([1,2,3,4,5], dtype='int')
>>> a
array([1, 2, 3, 4, 5])
>>> a + 5
array([ 6, 7, 8, 9, 10])
>>> np.mean(a)
3.0
>>> np.std(a)
1.41421356237
>>> np.std(a, ddof=1)
1.58113883008
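The two standard deviations above differ only in the divisor: ddof ("delta degrees of freedom") is subtracted from the number of values before dividing. The sketch below spells out that arithmetic in plain Python, without NumPy, so each step is visible; the variable names are chosen here for illustration.

```python
import math

values = [1, 2, 3, 4, 5]
n = len(values)
mean = sum(values) / float(n)

# Sum of squared deviations from the mean
squared_deviations = sum((v - mean) ** 2 for v in values)

# ddof=0 (the np.std default): divide by n, the "population" standard deviation
population_std = math.sqrt(squared_deviations / n)

# ddof=1: divide by n - 1, the "sample" standard deviation
sample_std = math.sqrt(squared_deviations / (n - 1))

print(population_std)
print(sample_std)
```

The two printed values agree with np.std(a) and np.std(a, ddof=1) above.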
NumPy – multi-dimensional arrays
>>> b = np.array([[5,6,7,8], [9,10,11,12]])
>>> b
array([[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> first_column = b[:,0]
>>> first_row = b[0,:]
>>> first_column
array([5, 9])
>>> first_row
array([5, 6, 7, 8])
Python indexing starts at ZERO!!!
>>> for value in first_row:
... if value > 6:
... print(value)
...
7
8
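The loop-and-if pattern above can also be written without a loop, using what NumPy calls boolean (mask) indexing. A minimal sketch, assuming NumPy is installed and reusing the array b from the slides; the names mask, large_values, and all_large are made up here.

```python
import numpy as np

b = np.array([[5, 6, 7, 8], [9, 10, 11, 12]])
first_row = b[0, :]

# Comparing an array with a number gives an array of True/False values
mask = first_row > 6

# Indexing with that boolean array keeps only the elements where it is True
large_values = first_row[mask]
print(large_values)

# Applied to the whole 2-D array, the result comes back as a flat 1-D array
all_large = b[b > 6]
print(all_large)
```

For large arrays this form is usually much faster than an explicit Python loop, since the comparison runs inside NumPy.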
NumPy – indexing and slicing
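• As we’ve seen, Python indexing starts at ZERO!
• Moreover, a “range” of numbers in Python does NOT include the end value.
>>> list(range(5))
[0, 1, 2, 3, 4]
>>> list(range(1,10))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
(In version 2.x, range already returns a list; wrapping it in list() gives the same output in 2.x and 3.x.)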
What value results from this expression?
>>> range(1,10)[5]
6
Why not 5?
Because Python indexing starts at zero, so index 5 refers to the 6th value in the range.
>>> list(range(1,10))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
indices: [0th,1st,2nd,3rd,4th,5th,6th,7th,8th]
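Leaving out the end value can feel odd at first, but it makes a few bookkeeping rules very simple. The sketch below checks three of them in plain Python; the variable names are illustrative only.

```python
start, stop = 1, 10
values = list(range(start, stop))

# Because the end value is excluded, the length is simply stop - start
count = len(values)

# The last value is stop - 1; index -1 is shorthand for "the last element"
last = values[-1]

# Ranges that share an endpoint join with no gap and no repeated value
joined = list(range(0, 5)) + list(range(5, 10))
no_gap = (joined == list(range(10)))

print("{} {} {}".format(count, last, no_gap))
```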
Next level slicing
>>> numbers = list(range(10))
>>> numbers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
• Reverse
>>> numbers[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
• Every second element
>>> numbers[::2]
[0, 2, 4, 6, 8]
• Second half (using “len” function; “//” is integer division, so the index stays an integer in version 3 as well)
>>> numbers[len(numbers)//2:]
[5, 6, 7, 8, 9]
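All of these are special cases of one pattern, numbers[start:stop:step], where any part that is left out takes a default. A few more cases are sketched below as an illustration; the variable names are invented for the example.

```python
numbers = list(range(10))

# start at index 2, stop before index 8
middle = numbers[2:8]

# same span, but every second element
middle_by_two = numbers[2:8:2]

# negative indices count from the end, so this is the last three elements
last_three = numbers[-3:]

# slice() builds the same selection as an object that can be stored and reused
every_third = slice(None, None, 3)
thirds = numbers[every_third]

print(middle)
print(middle_by_two)
print(last_three)
print(thirds)
```

The same start:stop:step notation works on each axis of a NumPy array.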
NumPy – missing values
• “nan” is handled very well in NumPy
– recommend setting missing values (e.g. 1e20, -999, m, --) to “np.nan”
>>> precip = np.array([25.3, 26.2, 24.5, -999])
>>> precip[precip==-999] = np.nan
or
>>> precip[precip < 0] = np.nan
>>> precip
array([ 25.3, 26.2, 24.5, nan])
• Average with missing (nan) values
>>> np.nanmean(precip)
25.333333333333332
• Sum with missing (nan) values
>>> np.nansum(precip)
76.0
• Max/min with missing (nan) values
>>> np.nanmax(precip)
26.199999 (this is 26.2 but is carried out to many decimal places because it’s a float64 type)
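When a data set uses more than one missing-value flag, they can all be replaced in one step, and it is worth seeing why the nan-aware functions are needed afterwards. The sketch below assumes NumPy 1.13 or newer (for np.isin) and invents a fifth data value, 1e20, to stand in for a second flag.

```python
import numpy as np

# The last two entries are assumed missing-value flags, not real rainfall
precip = np.array([25.3, 26.2, 24.5, -999, 1e20])

# Replace every value that matches one of the flags with nan
missing_flags = [-999, 1e20]
precip[np.isin(precip, missing_flags)] = np.nan

# nan never compares equal to anything, itself included, so use np.isnan
is_missing = np.isnan(precip)
n_missing = int(is_missing.sum())

# A plain mean is "poisoned" by nan, whereas nanmean skips the missing entries
plain_mean = np.mean(precip)
clean_mean = np.nanmean(precip)

print(n_missing)
print(plain_mean)
print(clean_mean)
```

Counting the missing entries before averaging is a quick sanity check that the flags were matched as intended.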