REAL ESTATE - CLASSIFICATION OF PLACES/STATES By Shyam Anadkat

Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Shyam Anadkat


Page 1

REAL ESTATE - CLASSIFICATION OF PLACES/STATES

By Shyam Anadkat

Page 2

RECOMMENDATION ON REAL ESTATE INVESTMENT

Clustering techniques were used to segment all places by return on investment (Yield), Rent/Income, and the percentage of the population living in houses in that area.

Investment strategy based on the clusters:

Cluster 1: Return on investment takes time. Densely populated areas.

Cluster 2: Sparsely populated, with affordable property rates and low rental income.

Cluster 3: Risky investment, as people have low average income compared to the gross rent to be paid.

Cluster 4: Densely populated with low rental income. Potential for long-term investment due to strong industrial presence.

Cluster 5: Premium investment. Not attractive for investment.

Cluster 6: Stability, long-term perspective, and low investment with good return on property.

Cluster 3: High return on rental income. Cluster 4: Median household income.

Cluster 6: High return on rental income and median household income.

Page 3

Recommendation on Real Estate Investment

A predictive model built with a decision tree identifies customer needs and suggests the best clusters/states.

Predictive Model for Property Investment

STEP 1: Auto Evaluation of Clusters

Enter Investment Amount: 500000

Place Type: borough

Average Population: 0.98

Cluster best suitable: 4

STEP 2: Summary Stats of Cluster

Expected Yield: 5.168568984

Expected Gross Revenue: 936.5032258

Expected Rent/Income: 18.87493886

Expected Income of People in Area: 60234.78194

Conclusion: Future Growth Likely

State feasible: Cluster 4 States

Sample Model Based on Automation

Automated model for investment selection based on amount, location, and population

User Input

It will recommend all states similar to your requirement.

Guidelines for using Excel:
1) Enter only the fields that have an orange background.
2) Enter Investment Amount, ranging from 63,285 to 100,00,000.
3) Select Place Type from the drop-down box only.
4) Tell us whether you want to live in an area where the majority of the population lives in houses.

High: 0.99 (most of the population lives in houses). Low: 0.87 (fewer people live in houses relative to the total population).

Automatic Cluster Identification

Decision Tree

Page 4

Recommendation on Real Estate Investment

Predictive Model for Cluster Detection.

Left box plot: time frame for the return on investment, by cluster. Right box plot: amount of investment, by cluster.

Page 5

Recommendation on Real Estate Investment

Tableau visualization: each pie chart shows how the clusters are distributed in a state, and the size of the pie represents the average return (yield) that can be expected.

Cluster-wise returns (i.e. yields) for each state

Cities in the North East have a good return on investment, but the Western region, with its lower penetration, could offer excellent long-term opportunities.

Page 6

Recommendation on Real Estate Investment

Tableau visualization: each pie chart shows how the clusters are distributed in a state, and the size of the pie represents the average cost of houses that can be expected.

Cluster-wise investment cost for each state

Cities near the seashore have higher investment requirements, relative to the return on property, than those away from the seashore.

Page 7

Recommendation on Real Estate Investment

Left graph: value of household by state. Right graph: yield for each state.

Tableau Visualization – Interesting Way to Represent

Tableau visualization: each bubble represents the average investment cost for a particular area, and the color of the bubble represents the cluster it falls in.

Tableau visualization: the size of the text represents return on investment, and the color represents the cluster it falls in.

Page 8

SYNOPSIS - OBJECTIVE

Objective of the Business Case Study:

The main objective of the case study was to help the client recommend good places to invest in property.

The client is mainly interested in gaining the maximum return from the property. Return can be estimated as Yield = Revenue Earned / Total Property Value. Hence our main aim is to cluster the dataset and find places that can give a high yield.

The client might also be interested in the likely future growth of the property. Future growth is gauged by Rent/Income: the lower the ratio, the better the chances of future growth.
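The two ratios can be sketched in a few lines of Python. The slides give only Yield = Revenue Earned / Total Property Value and Rent/Income; treating rent as a monthly figure, annualizing it by 12 and expressing both ratios as percentages are assumptions here. They are roughly consistent with the Cluster 5 means reported later (12 x 676.5 / 35,900 is about 22.6). The function names and the sample place are hypothetical.

```python
def gross_yield_pct(monthly_rent, property_value):
    """Annual gross rent as a percentage of property value (assumed definition)."""
    return 12 * monthly_rent / property_value * 100


def rent_to_income_pct(monthly_rent, annual_income):
    """Annual gross rent as a percentage of household income (assumed definition)."""
    return 12 * monthly_rent / annual_income * 100


# Made-up place for illustration: 900 monthly rent, 150,000 value, 48,000 income.
y = gross_yield_pct(900, 150_000)
r = rent_to_income_pct(900, 48_000)
print(round(y, 2), round(r, 2))
```

A higher yield and a lower Rent/Income are the two directions the objective treats as favourable.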

We will then formulate rules that let the client decide which cluster an investor is looking for, and then report the overall performance of that cluster.

Page 9

Approach to the problem:

The method chosen to classify the dataset into high yield and low Rent/Income was clustering.

Clustering: Clustering is the task of grouping similar observations. In our case it groups observations by yield and Rent/Income, so places in the same cluster behave similarly.

Decision Tree: We have also formulated rules that identify which cluster a property belongs to, based on “Rent”, “Household Income” and “Value of Property”.

Steps Involved:
1) Data Preparation
2) Data Exploration
3) Clustering
4) Decision Trees

SYNOPSIS – APPROACH TO THE PROBLEM

Page 10

Data Preparation:

Cleaning Data: The dataset contained many duplicates, which were removed first.

Identifying Missing Values: There were no missing values in the dataset.

Outlier Detection: Clustering itself identified the outliers in the dataset and segregated them into separate clusters whose yield is very high or very low compared with the general population.
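The first two preparation steps can be sketched as below. This is a minimal illustration in plain Python, not the SAS or Excel steps actually used; the helper names, the column names and the three records are made up for the example.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def count_missing(rows, columns):
    """Count None or empty-string values per column."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}


# Made-up records for illustration only.
records = [
    {"place": "A", "median_value": 150000, "median_gross_rent": 900},
    {"place": "A", "median_value": 150000, "median_gross_rent": 900},   # duplicate
    {"place": "B", "median_value": 210000, "median_gross_rent": None},  # missing rent
]
clean = drop_duplicates(records)
missing = count_missing(clean, ["median_value", "median_gross_rent"])
print(len(clean), missing)
```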

SYNOPSIS – STEPS TAKEN IN ANALYSIS

Page 11

Data Exploration

Note: Both are approximately normally distributed, although yield is right-skewed with heavy tails (skewness 1.72, kurtosis 8.76).

SYNOPSIS - STEPS TAKEN IN ANALYSIS

Statistic             yield           Rent_Income
Mean                  6.330619731     21.25694459
Median                6.030484895     20.59609055
Mode                  7.387210119     29.70666365
Standard Deviation    2.129441335     4.563985385
Kurtosis              8.760989073     1.355759761
Skewness              1.716207976     0.947727722
Range                 21.10265131     29.2111742
Minimum               1.943220252     11.38720519
Maximum               23.04587156     40.59837939
Count                 1598            1598
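A summary of this kind can be reproduced with the Python standard library. The sketch below assumes sample-adjusted skewness and excess kurtosis, the formulas commonly used by spreadsheet descriptive-statistics tools; the slides do not say which formulas produced the figures above, and the function name and input data are illustrative.

```python
import statistics


def describe(values):
    """Summary statistics with sample-adjusted skewness and excess kurtosis.

    Needs at least four values and a non-zero standard deviation.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    z3 = sum(((v - mean) / sd) ** 3 for v in values)
    z4 = sum(((v - mean) / sd) ** 4 for v in values)
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": sd,
        "kurtosis": kurtosis,
        "skewness": skewness,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "count": n,
    }


# Made-up, perfectly symmetric sample: skewness should be zero.
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```

Under these formulas a normal sample has skewness and kurtosis near zero, which is the yardstick for reading the two columns above.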

[Histogram of Rent_Income: x-axis Bin (11.38720519 to 38.35136599, then "More"), y-axis Frequency (0 to 120)]

[Histogram of yield: x-axis Bin (1.943220252 to 21.42259069, then "More"), y-axis Frequency (0 to 250)]

Page 12

SCATTER PLOT:

NOTE: The points in this graph are very closely grouped.

SYNOPSIS - STEPS TAKEN IN ANALYSIS

Page 13

CORRELATION MATRIX:

As you can see, Yield is negatively correlated with “median house hold income” (-0.46).

Yield is also correlated with “median value” (-0.69) and “median gross rent” (-0.39), as it is derived from those two variables.

Apart from these, no variable is strongly correlated with yield.

SYNOPSIS - STEPS TAKEN IN ANALYSIS

 

                              (1)        (2)        (3)        (4)        (5)        (6)        (7)
(1) population_Occ_Hsg_Unt    1
(2) population                0.999949   1
(3) median value              0.153892   0.153961   1
(4) median gross rent         0.129204   0.128252   0.826901   1
(5) yield                     -0.09738   -0.09786   -0.68942   -0.39435   1
(6) median house hold income  0.034971   0.033172   0.672362   0.762791   -0.46071   1
(7) Rent_Income               0.100158   0.101542   0.056529   0.162922   0.222758   -0.47077   1
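A matrix like this holds pairwise Pearson correlation coefficients. The sketch below computes them in plain Python; the four data points are made up, and the yield column is derived with the assumed formula 12 x rent / value x 100, which is why it moves against median value.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_matrix(columns):
    """Full symmetric matrix as a dict of dicts keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up data for illustration only.
value = [100000, 150000, 200000, 250000]
rent = [800, 900, 1000, 1100]
data = {
    "median_value": value,
    "median_gross_rent": rent,
    "yield": [12 * r / v * 100 for r, v in zip(rent, value)],
}
corr = correlation_matrix(data)
print(round(corr["median_value"]["yield"], 3))
```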

Page 14

BOX PLOTS:

Yields:

Rent_Income:

SYNOPSIS - STEPS TAKEN IN ANALYSIS

Page 15

CLUSTERING:

Variables used for clustering the dataset:
1) Yield (Highest Importance)
2) Rent/Income
3) % of Population Occupied

K-Means Algorithm: This non-hierarchical method initially takes a number of components of the population equal to the final required number of clusters. These initial components are chosen so that the points are mutually farthest apart. Next, it examines each component in the population and assigns it to the cluster at minimum distance. The centroid's position is recalculated every time a component is added to a cluster, and this continues until all the components are grouped into the final required number of clusters.
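The description above corresponds to a single-pass (sequential) form of k-means with farthest-point seeding. The sketch below follows that description in plain Python; it is not the SAS procedure used for the analysis, the function names are hypothetical, the points are made-up (yield, Rent/Income) pairs, and no variable scaling or importance weighting is applied.

```python
import math


def farthest_first_seeds(points, k):
    """Return indices of k seeds: start with point 0, then repeatedly add the
    point whose distance to its nearest chosen seed is largest."""
    chosen = [0]
    while len(chosen) < k:
        candidates = (i for i in range(len(points)) if i not in chosen)
        nxt = max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[s]) for s in chosen),
        )
        chosen.append(nxt)
    return chosen


def sequential_kmeans(points, k):
    """Assign each point once to the nearest centroid, updating that centroid
    as a running mean every time a point is added."""
    seeds = farthest_first_seeds(points, k)
    centroids = [list(points[s]) for s in seeds]
    counts = [1] * k  # each seed is the first member of its cluster
    labels = [None] * len(points)
    for cluster, s in enumerate(seeds):
        labels[s] = cluster
    for i, p in enumerate(points):
        if labels[i] is not None:
            continue  # seeds are already assigned
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        counts[j] += 1
        centroids[j] = [m + (x - m) / counts[j] for m, x in zip(centroids[j], p)]
        labels[i] = j
    return labels, centroids


# Made-up (yield, Rent/Income) pairs for illustration only.
pts = [(5.0, 19.0), (5.2, 18.5), (11.9, 28.8), (12.1, 29.0), (8.0, 21.4), (8.1, 21.0)]
labels, centroids = sequential_kmeans(pts, 3)
print(labels)
```

A single pass depends on the order of the points; iterating the assignment until the centroids stop moving is the more common batch variant.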

SYNOPSIS - STEPS TAKEN IN ANALYSIS

Page 16

Clusters Visualization:

SYNOPSIS - STEPS TAKEN IN ANALYSIS

Page 17

CLUSTER SUMMARY:

- High Yield - Low Rent/Income

CLUSTER 1: High average population living in houses, with low yield and high Rent_Income. Property value is high and rent is also high, but median income is low relative to the US average. In this group property rates are higher than the value of the property, and rents are high compared to personal income. Population is very high. These are high-end metropolitan cities.

States: Arizona, California, etc.

CLUSTER     _FREQ_  _STAT_  yield    Rent_Income  population_Occ_Hsg_Unt  population    median_value  median_gross_rent  median_house_hold_income
1           163     MEAN    5.1303   27.7676      392,472.4049            402,303.4233  259,647.8528  1,014.4540         44,473.0429
2           6       MEAN    7.4138   26.0448      70,405.6667             87,884.8333   145,400.0000  862.3333           40,721.1667
3           36      MEAN    11.9399  28.7895      164,571.8889            167,572.7778  87,525.0000   861.5556           36,465.7778
4           775     MEAN    5.1686   18.8749      241,798.1961            247,666.7432  237,816.5187  936.5032           60,234.7819
5           2       MEAN    22.6483  32.3112      393,070.5000            401,019.5000  35,900.0000   676.5000           25,374.5000
6           483     MEAN    8.0238   21.3971      234,554.1863            239,866.4348  122,181.9876  804.6894           45,718.0124
Population          MEAN    6.3306   21.2569      254,158.6514            260,473.7716  198,858.1364  900.3974           52,539.0438
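Each row of the summary is a per-cluster frequency and mean. A minimal sketch of that aggregation in plain Python follows; the function name and the three rows are made up, and only two of the columns are shown.

```python
from collections import defaultdict


def cluster_means(rows, cluster_key, columns):
    """Frequency and column means per cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)
    summary = {}
    for cluster, members in sorted(groups.items()):
        summary[cluster] = {"_FREQ_": len(members)}
        for col in columns:
            summary[cluster][col] = sum(m[col] for m in members) / len(members)
    return summary


# Made-up rows for illustration only.
rows = [
    {"cluster": 4, "yield": 5.0, "Rent_Income": 19.0},
    {"cluster": 4, "yield": 5.4, "Rent_Income": 18.0},
    {"cluster": 6, "yield": 8.0, "Rent_Income": 21.0},
]
summary = cluster_means(rows, "cluster", ["yield", "Rent_Income"])
print(summary)
```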

SYNOPSIS – OVERALL SUMMARY

Page 18

SYNOPSIS – OVERALL SUMMARY

CLUSTER 2: It has the lowest average population residing in houses, with average yield and average Rent_Income. Property rates, income, and rent are all below average.

Places: Bloomington City, Raleigh City, Tompkins City, Columbia City, Coryell City, Walker County

Cluster 3: Lowest property rates, average rent, and average income of the people residing in these places.

E.g.: Cobb County, Lakeland City, Miami Garden City, etc.

Cluster 4: This cluster contains places with high population, low yield, and low Rent_Income. These are the major cities where people prefer to live, as they may be close to the city or industries and may be less expensive in rent.

E.g.: Houston County, Boulder County/City, San Francisco County/City, etc.

Page 19

SYNOPSIS – OVERALL SUMMARY

Cluster 5: This cluster contains the highest population, with high yield and a high Rent_Income ratio. Its members are outliers in the dataset: it contains only 2 places, Detroit City and Flint City of Michigan. Property rates are highest, rent is not above average, and people's income is also not high.

Cluster 6: This cluster contains cities with high yield, average Rent_Income, and an average population living in houses. House values are low compared to the US average.

E.g.: Pittsburg City, Bay County, Hollywood City, etc.

Page 20

SYNOPSIS – SOME VISUALIZATIONS - TABLEAU

Bubble size/colour shows the number of places in each state

Page 21

Heat Map: Each box represents the count of places in that cluster, and color represents the count of states

SYNOPSIS – SOME VISUALIZATIONS - TABLEAU

Page 22

We generated rules to identify whether a property belongs to Cluster 1, 4 or 6. As Clusters 2, 3 and 5 have few observations, we left them out. The dataset was partitioned into training and testing sets in a 70%/30% proportion.

Confusion Matrix:

As seen from the confusion matrix:
Cluster 1: 20/38 = 0.5263 success rate
Cluster 4: 196/197 = 0.9949 success rate
Cluster 6: 109/120 = 0.9083 success rate

SYNOPSIS – DECISION TREES

Decision Tree Rules

            Predicted
Actual      1     2     3     4     5     6
1           20    0     0     18    0     0
2           0     0     0     2     0     0
3           8     0     0     2     0     0
4           1     0     0     196   0     0
5           0     0     0     0     0     0
6           11    0     0     109   0     0
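The per-cluster success rate is the diagonal count divided by the row total. The sketch below restricts the matrix to Clusters 1, 4 and 6 and assumes the 109 correct Cluster 6 predictions sit on the diagonal, as the quoted rate of 109/120 implies; the placement of the misclassified counts is read from the table and is likewise an assumption. The function name is hypothetical.

```python
def per_class_success(matrix, labels):
    """Diagonal count over row total for each actual class."""
    rates = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        rates[label] = matrix[i][i] / total if total else None
    return rates


labels = [1, 4, 6]
# Rows are actual clusters, columns are predicted clusters (assumed layout).
matrix = [
    [20, 18, 0],    # actual 1: 20 correct out of 38
    [1, 196, 0],    # actual 4: 196 correct out of 197
    [11, 0, 109],   # actual 6: 109 correct out of 120
]
rates = per_class_success(matrix, labels)
overall = sum(matrix[i][i] for i in range(len(labels))) / sum(map(sum, matrix))
print({k: round(v, 4) for k, v in rates.items()}, round(overall, 4))
```

Cluster 1 is the weak spot: almost half of its test cases are assigned to Cluster 4.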

Page 23

Based on Cluster:
Cluster 1 – Recovery time of property is high
Cluster 2* – Good investment option
Cluster 3 – Risky investment (income low, rent high)
Cluster 4 – Future growth likely
Cluster 5 – Premium investment
Cluster 6 – Stable investment
* Cluster with low average population living in houses

ANOVA on the clusters was done to check whether the cluster group means differ from each other statistically.
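A one-way ANOVA compares the spread between cluster means with the spread inside each cluster. The sketch below computes the F statistic in plain Python for three made-up groups of yields; it stops short of the p-value, which needs an F distribution, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up yields for three clusters, for illustration only.
yields_by_cluster = [[5.0, 5.2, 5.4], [7.8, 8.0, 8.2], [11.8, 12.0, 12.2]]
f_stat, df_b, df_w = one_way_anova_f(yields_by_cluster)
print(round(f_stat, 2), df_b, df_w)
```

A large F relative to the degrees of freedom indicates that the cluster means are well separated.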

Scope Further:
1) We can also cluster zip codes to go further down the hierarchy.
2) We can also collect more data (Land Size, Land_Housing_Unit, Land_by_Industry/Company, US citizen population, median_ppl_each_house, etc.) and then do regression.

CLIENT RECOMMENDATION

Page 24

SAS Codes:

Tableau Screenshots:

References:
http://www.tableausoftware.com/learn/tutorials/on-demand/advanced-mapping-techniques
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#cluster_toc.htm

APPENDIX

SAS Codes

Tableau Graphs

Microsoft Excel Worksheet

F:\VOZAG PROJECT\Project Revised Final Su