Revenue & Employment Analysis of International Students in USA

Team Members: Priyanka Kale, Apekshit Bhingardive, Aditya Verma

Guide: Dr. Jongwook Woo

24th Annual Student Symposium, CSULA, 26th February 2016

Revenue & Employment Analysis of International Students in USA using PyHive



What is Big Data?

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.

It's not the amount of data that's important. It's what we do with the data that matters.

Machine Learning: big data often doesn't ask why and simply detects patterns.

Digital footprint: big data is often a cost-free byproduct of digital interaction.


Purpose of Analysis

To develop a system that helps determine the revenue generated by international students.

To examine the relationship between new international enrollments and institutional income at public colleges, universities and professional organizations in the US.


Continued...

To understand the effects of increased international student enrollment on net revenue generation in the US

To find out universities' income from international students

To predict the impact of international students on revenue generation

To predict employment opportunities for international students in the US


• Basic formula for calculating economic benefit
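The formula itself was an image on the original slide and did not survive. As a hedged sketch only, the snippet below assumes one common way to estimate the economic benefit: for each student, tuition and fees plus living expenses, minus any financial support that comes from U.S. sources, summed over all students. The class, field names, and dollar amounts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudentCosts:
    """Hypothetical per-student annual amounts in US dollars."""
    tuition_and_fees: float
    living_expenses: float
    us_support: float  # assumed field: aid/scholarships funded from U.S. sources


def economic_benefit(students: List[StudentCosts]) -> float:
    """Assumed formula: sum over students of (tuition + living expenses - U.S. support)."""
    return sum(
        s.tuition_and_fees + s.living_expenses - s.us_support for s in students
    )


if __name__ == "__main__":
    # Made-up figures purely for illustration
    cohort = [
        StudentCosts(tuition_and_fees=25_000, living_expenses=15_000, us_support=5_000),
        StudentCosts(tuition_and_fees=30_000, living_expenses=12_000, us_support=0),
    ]
    print(economic_benefit(cohort))  # 77000
```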


The analysis was done using:

The Hadoop Distributed File System (HDFS) for storing the large data set

A Hadoop environment using the Hortonworks Sandbox on Microsoft Azure

Python and Hive (PyHive) in an IPython Notebook

HUE

Google Fusion Tables

WEKA Framework


Loading data into HDFS: the file was uploaded using the Hadoop command-line interface.
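The screenshot of the upload is not recoverable. A minimal sketch of the same step, assuming the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put` shell commands wrapped in Python; the file name and HDFS directory are hypothetical, and `dry_run=True` only prints the commands instead of running them.

```python
import shlex
import subprocess
from typing import List


def hdfs_upload_commands(local_path: str, hdfs_dir: str) -> List[List[str]]:
    """Build the HDFS shell commands that create the target directory and copy a file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # -f overwrites the destination file if it already exists
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],
    ]


def upload_to_hdfs(local_path: str, hdfs_dir: str, dry_run: bool = True) -> None:
    for cmd in hdfs_upload_commands(local_path, hdfs_dir):
        print(shlex.join(cmd))
        if not dry_run:
            # Needs the `hdfs` client on PATH, e.g. inside the sandbox VM
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical local file and HDFS directory
    upload_to_hdfs("intl_students.csv", "/data/intl_students")
```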


Hortonworks Sandbox configuration

Number of nodes: 3; VM size: Basic A4 (8 cores, 14 GB memory)


Creating tables in HUE from existing data
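HUE's table wizard generates the Hive table definition behind the scenes. A hedged approximation of equivalent HiveQL for a comma-separated file already in HDFS is built below; the table name, location, and columns are assumptions, since the original schema is not shown.

```python
from typing import List, Tuple


def create_table_ddl(table: str, hdfs_location: str, columns: List[Tuple[str, str]]) -> str:
    """Return HiveQL defining an external table over CSV files already stored in HDFS."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_location}'\n"
        # Skip the CSV header row when reading
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


if __name__ == "__main__":
    # Hypothetical schema for the international-student data set
    print(create_table_ddl(
        "intl_students",
        "/data/intl_students",
        [("state", "STRING"), ("institution", "STRING"),
         ("international_students", "INT"), ("tuition_fees", "DOUBLE")],
    ))
```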


Connecting to Hive through Python: an IPython Notebook is used for writing the Python code.

Embedding HiveQL inside Python code.
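A minimal sketch of the connection and an embedded query, assuming PyHive's `hive.Connection` talking to HiveServer2 on its default port 10000; the host, the table `intl_students`, and the column `tuition_fees` are hypothetical. The query helper accepts any DB-API cursor, which keeps the logic testable without a live cluster.

```python
from typing import Any, Dict, Optional


def connect(host: str = "localhost", port: int = 10000, username: Optional[str] = None):
    """Open a HiveServer2 connection with PyHive.

    PyHive is imported lazily so the query helper below can be used
    (and tested) without it installed.
    """
    from pyhive import hive  # pip install 'pyhive[hive]'

    return hive.Connection(host=host, port=port, username=username)


# HiveQL embedded in Python as a plain string, sent without a trailing ';'.
# Table and column names are hypothetical.
TOTAL_FEES_BY_STATE = """
SELECT state, SUM(tuition_fees) AS total_fees
FROM intl_students
GROUP BY state
ORDER BY total_fees DESC
"""


def total_fees_by_state(cursor: Any) -> Dict[str, float]:
    """Run the embedded query on a DB-API cursor and return {state: total_fees}."""
    cursor.execute(TOTAL_FEES_BY_STATE)
    return {state: total for state, total in cursor.fetchall()}
```

In the notebook the call would look like `total_fees_by_state(connect("<sandbox-host>").cursor())`, with the host and user name taken from the sandbox setup.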


Executing the Hive script from Python code:
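The original script is not shown. One hedged way to run a whole `.hql` file through a DB-API cursor (such as PyHive's) is to split it into statements and execute them one at a time; the splitting here is deliberately naive and is an assumption, not the team's code.

```python
from typing import Any, List


def split_hiveql(script: str) -> List[str]:
    """Split a HiveQL script into individual statements.

    Naive: drops full-line '--' comments and splits on ';', so it does not
    handle semicolons inside string literals.
    """
    lines = [ln for ln in script.splitlines() if not ln.strip().startswith("--")]
    return [stmt.strip() for stmt in "\n".join(lines).split(";") if stmt.strip()]


def run_hive_script(cursor: Any, path: str) -> List[tuple]:
    """Execute every statement in a .hql file; return the rows of the last one."""
    with open(path, encoding="utf-8") as f:
        statements = split_hiveql(f.read())
    rows: List[tuple] = []
    for stmt in statements:
        cursor.execute(stmt)  # each statement is sent without its ';'
        # DB-API: description is None for statements that return no rows (e.g. DDL)
        rows = cursor.fetchall() if cursor.description else []
    return rows
```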


Visualizing data with Graphs

[Figure: bar chart "TOTAL EARNING FROM FEES" showing total earnings from fees for each US state and territory, Alabama through Texas; y-axis from $0 to $25,000,000,000]


Major earning states

Percentage of total income:

California: 9.55%

New York: 10.84%

Pennsylvania: 7.36%
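Each percentage is a state's total divided by the national total, times 100. A small sketch, assuming per-state totals (for example the output of a Hive `GROUP BY state` query) are available as a dictionary; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def income_share(totals: Dict[str, float]) -> Dict[str, float]:
    """Percentage of the grand total contributed by each state, rounded to 2 decimals."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("grand total is zero")
    return {state: round(100 * amount / grand_total, 2) for state, amount in totals.items()}


def top_states(totals: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """The n states with the largest share of total income, highest first."""
    shares = income_share(totals)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]
```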


Visualizing Data in Google Fusion Tables
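Google Fusion Tables (shut down by Google in 2019) imported data from CSV files. A hedged sketch of exporting per-state totals to CSV for such an upload; the column headers and file path are hypothetical.

```python
import csv
import io
from typing import Dict


def totals_to_csv(totals: Dict[str, float]) -> str:
    """Serialize {state: total_fees} as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["State", "TotalFees"])
    for state, amount in totals.items():
        writer.writerow([state, f"{amount:.2f}"])
    return buf.getvalue()


def write_csv(totals: Dict[str, float], path: str) -> None:
    """Write the CSV text to a file for upload."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(totals_to_csv(totals))
```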


Supervised Learning using Classification:

The WEKA framework was used to classify the states according to their total earnings.

The UserClassifier algorithm provided by WEKA was used to generate the classification graph.

The final output of the Hive script executed in Python was processed using this algorithm.
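WEKA reads ARFF files natively. As a sketch under stated assumptions, the snippet converts per-state totals into ARFF with a nominal `revenue_class` attribute; the class names and dollar thresholds are hypothetical, since the slides do not give the real boundaries. The `state` string attribute is kept only for identification and would normally be removed (for example with WEKA's `Remove` filter) before training; UserClassifier then lets the user build the decision tree interactively.

```python
from typing import Dict, List, Tuple

# Hypothetical class boundaries in US dollars, checked from highest to lowest.
THRESHOLDS: List[Tuple[float, str]] = [(1e9, "high"), (2.5e8, "medium"), (0.0, "low")]


def revenue_class(total: float) -> str:
    """Return the label of the first threshold the total meets."""
    for lower_bound, label in THRESHOLDS:
        if total >= lower_bound:
            return label
    return "low"


def to_arff(totals: Dict[str, float], relation: str = "state_earnings") -> str:
    """Render {state: total} as WEKA ARFF text with a nominal class attribute."""
    labels = ",".join(label for _, label in THRESHOLDS)
    lines = [
        f"@RELATION {relation}",
        "",
        "@ATTRIBUTE state STRING",
        "@ATTRIBUTE total_earnings NUMERIC",
        f"@ATTRIBUTE revenue_class {{{labels}}}",
        "",
        "@DATA",
    ]
    for state, total in totals.items():
        # Quote state names, which may contain spaces (e.g. 'New York')
        escaped = state.replace("'", "\\'")
        lines.append(f"'{escaped}',{total},{revenue_class(total)}")
    return "\n".join(lines) + "\n"
```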


Continued... The class colors divide the states into categories. For instance, New York lies in the orange zone, being among the top revenue-generating states.


Value Proposition:

International student mobility trends: By 2017, the global middle class is projected to increase its spending on educational products and services by nearly 50 percent.

Institutions can take this growth into consideration!

Making the United States a more welcoming nation!


Predictive Modelling:
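The model on this slide did not survive extraction. As an assumption only, and not the team's actual model, a simple ordinary least squares line relating international enrollment (x) to revenue (y) illustrates the kind of prediction described in the purpose of the analysis; the function names are hypothetical.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, x_new: float) -> float:
    """Predicted revenue for a new enrollment figure."""
    return slope * x_new + intercept
```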


Employment Analysis: How? Finding data on where international students work after graduation:

Based on the number of students employed in current and past years

Number of employers hiring international students in every field of graduate study [job positions]
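A hedged sketch of the two counts listed above, assuming hiring records are available as (field, employer) pairs and hiring years as integers; the record layout and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def employers_per_field(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct employers per field of study from (field, employer) records."""
    employers: Dict[str, Set[str]] = defaultdict(set)
    for field, employer in records:
        employers[field].add(employer)
    return {field: len(names) for field, names in employers.items()}


def hires_per_year(years: Iterable[int]) -> Counter:
    """Count employed international students per year."""
    return Counter(years)
```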


Thank You!