Everyone can do data science with the help of tools such as:
- import.io for visually scraping data from the web
- Pandas to wrangle data in Python
- BigML to apply machine learning to data
In this presentation, I introduce what machine learning is before moving on to a case study where I show how to build a real estate pricing model. Check out import.io's webinar for the whole thing: http://blog.import.io/post/become-a-data-scientist-in-an-hour
Everyone can do data science
import.io webinar 23/9/14
Louis Dorard (@louisdorard)
US real estate portals:
- Realtor
- Zillow
- Trulia
- …
Bedrooms Bathrooms Surface (foot²) Year built Type Price ($)
3 1 860 1950 house 565,000
3 1 1012 1951 house
2 1.5 968 1976 townhouse 447,000
4 1315 1950 house 648,000
3 2 1599 1964 house
3 2 987 1951 townhouse 790,000
1 1 530 2007 condo 122,000
4 2 1574 1964 house 835,000
4 2001 house 855,000
3 2.5 1472 2005 house
4 3.5 1714 2005 townhouse
2 2 1113 1999 condo
1 769 1999 condo 315,000
Let’s create a real estate pricing model
Fabien Durand (@thefabiendurand)
www.louisdorard.com/guest/everyone-can-do-data-science-importio
Data Science:
- domain knowledge
- hacking abilities
- machine learning
What the @#?~% is ML?
“Which type of email is this?” (spam or ham) -> Classification
“How much is this house worth?” (X $) -> Regression
ML is a set of AI techniques where “intelligence” is built by referring to examples
(McKinsey & Co.)
“A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.”
(Bret Victor)
Making ML effortless
HTML / CSS / JavaScript
squarespace.com
The two phases of machine learning:
• TRAIN a model
• PREDICT with a model
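The two phases can be sketched in plain Python. This is only a toy illustration, not the method used in the webinar: the `train` and `predict` functions are hypothetical names, the "model" is a simple nearest-neighbour lookup, and the examples are the fully specified rows of the table above (bedrooms, bathrooms, surface, price).

```python
# Toy illustration of the two phases; an assumption, not the webinar's method.
# Each example: (bedrooms, bathrooms, surface in square feet, price in $).
examples = [
    (3, 1, 860, 565000),
    (2, 1.5, 968, 447000),
    (3, 2, 987, 790000),
    (1, 1, 530, 122000),
    (4, 2, 1574, 835000),
]


def train(examples):
    """TRAIN: here the 'model' is simply a copy of the stored examples."""
    return list(examples)


def predict(model, new_input):
    """PREDICT: return the price of the stored example closest to new_input."""
    def squared_distance(example):
        features = example[:3]  # drop the price, keep the three input features
        return sum((a - b) ** 2 for a, b in zip(features, new_input))

    nearest = min(model, key=squared_distance)
    return nearest[3]


model = train(examples)
# A house from the table whose price is missing: 3 bedrooms, 2 bathrooms, 1599 square feet.
predicted_price = predict(model, (3, 2, 1599))
print(predicted_price)
```

Because the features are not rescaled, surface dominates the distance; a real model would handle this more carefully.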
The two methods of prediction APIs:
• TRAIN a model
• PREDICT with a model
The two methods of prediction APIs:
• model = create_model(dataset)
• predicted_output = create_prediction(model, new_input)
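from bigml.api import BigML

# create a model
# (BigML() reads credentials from the BIGML_USERNAME and BIGML_API_KEY environment variables)
api = BigML()
source = api.create_source('training_data.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# make a prediction
# (new_input is a dict mapping field names to values, defined elsewhere)
prediction = api.create_prediction(model, new_input)
print("Predicted output value: {}".format(prediction['object']['output']))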
http://bit.ly/bigml_wakari
Recap
• Classification and regression
• 2 phases in ML: train and predict
• Prediction APIs make it easy to build models
• Let’s use them on real estate data to predict price from house characteristics
• Encoding domain knowledge
• Making our life easier: restricting data to only 1 city
BigML
• Look at data
• Split into training and test
• Build model from training
• Evaluate model on test
• Errors: mean absolute error (or percentage?)
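The steps above can be sketched without any library. This is a minimal sketch under stated assumptions: the price-per-square-foot "model" is a stand-in chosen for brevity (BigML builds its own models), the fixed 3/2 split is arbitrary, and the data are the fully specified rows with a known price from the table earlier in the deck.

```python
# Illustrative sketch; the price-per-square-foot model is an assumption.
# Each row: (surface in square feet, price in $).
rows = [
    (860, 565000),
    (968, 447000),
    (987, 790000),
    (530, 122000),
    (1574, 835000),
]

# Split into training and test (fixed split here; real workflows usually shuffle first).
training, test = rows[:3], rows[3:]

# Build model from training: a single number, the overall price per square foot.
price_per_sqft = sum(price for _, price in training) / sum(surface for surface, _ in training)

# Evaluate model on test: compare predictions with the known prices.
predictions = [surface * price_per_sqft for surface, _ in test]
actuals = [price for _, price in test]
errors = [abs(predicted - actual) for predicted, actual in zip(predictions, actuals)]

# Mean absolute error, in dollars.
mean_absolute_error = sum(errors) / len(errors)
# Mean absolute percentage error, relative to each actual price.
mean_absolute_percentage_error = 100 * sum(e / a for e, a in zip(errors, actuals)) / len(actuals)

print("MAE: ${:,.0f}".format(mean_absolute_error))
print("MAPE: {:.1f}%".format(mean_absolute_percentage_error))
```

The absolute error is easier to read in dollars, while the percentage error is easier to compare across cheap and expensive properties, which is the trade-off the last bullet hints at.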
Other import.io + BigML use cases:
- Predict ebook rating from description
- Predict sales of Etsy stores
Talk at #APIconUK tomorrow in London
ML Algorithm API
Automated Pred. API
Text Classification API
Vertical Pred. API
Fixed-model Pred. API
ABSTRACTION
www.louisdorard.com/machine-learning-book 50% off for 24 hours with code “importio”
@louisdorard