30
Data Science Made Easy in ArcGIS Using Python and R James Jones, Lu Zhang and Lain Graham [email protected] ; [email protected] and [email protected]

Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

  • Upload
    others

  • View
    17

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Data Science Made Easy in ArcGIS Using Python and R

James Jones, Lu Zhang and Lain [email protected]; [email protected] and [email protected]

Page 2: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Workshop Agenda

• Introduction• Python • R• Conclusion / Q&A

Page 3: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Data Science

• Core analytics in ArcGIS- Maximize performance and utility- E.g. Spatial Statistics, Geostatistics, Spatial Analyst- E.g. GeoAnalytics, Insights, ArcGIS Python SDK

• The interoperability of the ArcGIS platform makes workflows more efficient - Techniques and methodologies continue to develop- Data availability continues to increase

• The data science community is vast and evolving- ArcGIS extends directly via scripting APIs

- E.g. Python, R, Java

From Core to Community

Page 4: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Data Science CommunityPython

Page 5: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Data Science Community

• Well over 12,000 packages to enhance core• Most widely used statistical software in the world• Diverse and powerful

- Universities, Government, Industry- Finance, Ecology, Statistics- Machine learning, predictive analytics

R

Page 6: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Battle of BandsWhich one is best?

General Programming Language Functionality Tailored towards statistics and data Analysis

Yes (PyPI and Conda) Package Management Yes (CRAN)

Individual machines to distributed computing environments Scalability Standalone computer or individual

servers

Large number of libraries for graphical display of data Display of Data Numerous libraries for making

incredible graphics

General Programming, Data Science, Web Development Use Cases Lingua franca of data science

YES! Integrates with ArcGIS YES!

What are you most comfortable with?What is the best tool for the job?

Page 7: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Data Science Process

Ask a question

Get the data

Explore the data

Model the data

CommunicateDownloadScrape

ETLEnrich

ChartSummary & Descriptive

StatisticsDevelop Hypotheses

Identify Patterns

RegressionMachine Learning

“Big Data”

Jupyter NotebooksWritten/Textual Reports

Web AppsStory MapsPowerPoint

Page 8: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Getting the Data

Non-GIS Data• Large number of libraries to download,

transform, condition, and prepare data• Generic web libraries (Requests, urllib)• API Specific libraries (Tweepy)• Ability to scrape and parse existing web

sites (Scrapy and BeautifulSoup)

GIS Data• ArcPy allows native access to all Esri

Data formats and the full functionality of ArcGIS Desktop

• ArcGIS API for Python allows for access and interaction with content within your WebGIS

Page 9: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Exploring Data

• Jupyter Notebooks are your friend• ArcGIS API for Python was designed to work

natively with the Jupyter Notebook• Ability to create Spatially Enabled DataFrame• SEDF requires ArcGIS API for Python version 1.5• Core Pandas DataFrame, allows for rapid

exploration of data• Can create maps, charts and a variety of other

objects quickly using a common syntaz

Page 10: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Modeling the Data

• Many native tools to enable modeling of data (Space-Time Pattern Mining, Density Based Clustering, etc…)

• Integration of popular third-party Machine Learning/Deep Learning libraries

- Scikit-Learn- Tensorflow- PyTorch- NLTK

Apps

Desktop APIs

Page 11: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Machine Learning Tools in ArcGIS

• Maximum Likelihood Classification

• Random Trees• Support Vector Machine

• Empirical Bayesian Kriging• Areal Interpolation• EBK Regression Prediction• Ordinary Least Squares

Regression and Exploratory Regression

• Geographically Weighted Regression

Classification Prediction

Clustering• Spatially Constrained

Multivariate Clustering• Multivariate Clustering• Density-based Clustering• Image Segmentation• Hot Spot Analysis• Cluster and Outlier Analysis• Space Time Pattern Mining

Page 12: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Jupyter Notebooks and data integration

Python Demo

Page 13: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

An IntroductionThe R-ArcGIS Bridge

Page 14: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

R?

• A widely used statistical programming language • More than 10,000 Packages• Powerful and Open-Sourced

- Universities, Government, Industry- Finance, Ecology, Statistics- Machine learning, predictive analytics

Page 15: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

ArcGIS? • A GIS platform• Open Standards / OGC Compliant

- Universities, Government, Industry- Finance, Ecology, Statistics

Page 16: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

• The R-ArcGIS bridge provides the ability to integrate R and ArcGIS functionality.

The R-ArcGIS Bridge

Page 17: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

ArcGIS R script R

ArcGIS R

ArcGIS R

GIS Analyst

Data Scientist

Developers

Who Can Use the R-ArcGIS Bridge?

Page 18: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

ArcGIS Pro

R

1.1 (or later)

3.2.2 (or later)

ArcMap

10.3.1 (or later)

Recommended !

RStudio

Version Requirements for the R-ArcGIS Bridge

Page 19: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Installing the Bridge / Getting Started

ArcGIS Pro Project Tab RStudio

• arc.open()• Reads GIS data that is currently stored in a shapefile, geodatabase, layer, or table into a R

environment (containing spatial information and attribute information)

• arc.select()• Obtain your data set in an R data frame object. Offers the ability to work with specific attributes

from your data (dplyr / tidyr) and construct SQL queries

• arc.write()• Allows you to easily save your data to shapefile, geodatabase, or table to an ArcGIS environment

Top 3 Most Useful R-ArcGIS Bridge Functions:

Page 20: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Vector Support

• Ability to read and write vector data

• Support for key R objects and spatial packages- R data frame object- Compatibility with sp- Compatibility with sf

• Customize data manipulations- Craft SQL queries / Subset by specific columns- Reproject data as needed

• Maintain spatial geometries when working with dplyr

Points

Lines

Polygons

Page 21: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Raster Support

• Ability to read and write raster data- Handle big data raster data with the ability to read in chunks by bands- Compatibility with CRF format and Mosaic Datasets

• Customize selections and subsets- Create subsets by bands or pixel rows and columns- Resample options available- Select desired pixel format for specific analyses

Page 22: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

How is Microsoft R useful?

Raster data can become a big data problem, quickly

Mosaics

Time

Mosaics: Data structure to store/process rasters in space and time

Time-Series Rasters/Mosaic

Microsoft R

Page 23: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Working with Big Data?

• Microsoft Open R is a publicly available R-version for big data

• Contains almost all CRAN libraries

• Window-based operations and image operators speed up drastically

• Set-up and Usage from Pro is exactly the same as traditional R

Page 24: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Predict Home Vacancy Rates in Washington DC based on Vacant Home Locations in Baltimore (XY Coordinates)

R-ArcGIS Bridge Demo

Page 25: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Ask a question

Get the data

Explore the data

Model the data

Communicate

ArcGIS

ArcGIS / RStudio

RStudio

ArcGIS

Page 26: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

ResourcesLearn More on Using the R-ArcGIS Bridge

Resources from UC 2018:- https://github.com/R-ArcGIS

Getting Started:- Analyzing Crime Using Statistics and the R-ArcGIS Bridge Learn Lesson- Using the R-ArcGIS Bridge Introductory Web Course

Creating R Script Tools:- Integrating R Scripts into Geoprocessing Tools Web Course- arcgisbinding Package Vignette

Powerful, In-depth Workflows in ArcGIS and R- Identify an Ecological Niche for African Buffalo

Page 27: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Upcoming Sessions at Developers Summit (1/31/2019)

• ArcGIS API for Python: Introduction to Scripting your Web GIS- Thursday, January 31st @ 10:30 am

• ArcGIS Pro: Scripting with Python- Thursday, January 31st @ 10:30 am

• ArcGIS API for Python: Advanced Scripting- Thursday, January 31st @ 1:30 pm

• Expand your Analysis with ArcGIS GeoAnalytics Server- Thursday, January 31st @ 2:30 pm

• Maps & Charts: Making Visual Interfaces Accessible- Thursday, January 31st @ 3:30 pm

Page 28: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Print Your Certificate of AttendancePrint Stations Located at L Street Bridge

Wednesday

6:30 pm – 9:00 pm Networking Reception

National Building Museum

Page 29: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g

Please Take Our Survey on the AppDownload the Esri Events app and find your event

Select the session you attended

Scroll down to find the feedback section

Complete answersand select “Submit”

Page 30: Data Science Made Easy in ArcGIS Using Python and R · Data Science • Core analytics in ArcGIS-Maximize performance and utility-E.g. Spatial Statistics, Geostatistics, Spatial Analyst-E.g