Short presentation on my final project at Zipfian Academy: quantifying data scientist profiles using LinkedIn data. The prototype web app is available at bit.ly/cybads
Could You Be a Data Scientist?
Carlo Torniai, Ph.D. (@carlotorniai)
Goal
• Quantify the features of data scientist profiles
• Analyze the profiles of aspiring data scientists
• Provide useful feedback
Why is this relevant?
• A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters
Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
Data Collection | Data Analysis | Feature Extraction | Model Testing | Data Product
• LinkedIn API:
  – General information
  – Past work history
  – Education
• Web scraping:
  – Skills
• 1500 profiles:
  – Data Scientists
  – Software Engineers
  – Business Analysts
  – Mathematicians
  – Statisticians
[Figure panels, one per profile category: Business Analysts, Data Scientists, Software Engineers, Statisticians, Mathematicians]
[Figure: Number of PhDs by topic and profile. Topics: Bioinformatics, Biology, Computer Science, Economics, Electronics, Astronomy, Math, Neuroscience, Other, Physics, Psychology, Stats, Engineering]
For this project I trained the following models on skills and education features:
• Random Forest
  – Classify the profile
• Naïve Bayes
  – Multi-class probabilities to assess the background components of a profile
• K-means
  – Suggest similar and relevant profiles
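One way to turn skills and education into model inputs is a binary "bag of skills" encoding. The sketch below is only an illustration under that assumption: the slides do not say how the features were actually encoded, and the profile dictionaries, field names, and the helpers `build_vocabulary` and `encode` are hypothetical.

```python
# Hypothetical toy profiles; the field names are illustrative,
# not the LinkedIn API schema and not the project's data.
profiles = [
    {"skills": ["Python", "Machine Learning", "SQL"], "education": "Physics"},
    {"skills": ["Java", "SQL"], "education": "Computer Science"},
    {"skills": ["R", "Statistics", "Machine Learning"], "education": "Stats"},
]


def profile_tokens(profile):
    """Return the set of prefixed, lower-cased tokens describing one profile."""
    tokens = {"skill:" + skill.lower() for skill in profile["skills"]}
    tokens.add("edu:" + profile["education"].lower())
    return tokens


def build_vocabulary(profiles):
    """Collect every distinct token, sorted so the column order is stable."""
    vocabulary = set()
    for profile in profiles:
        vocabulary |= profile_tokens(profile)
    return sorted(vocabulary)


def encode(profile, vocabulary):
    """Return a 0/1 vector with one entry per vocabulary token."""
    present = profile_tokens(profile)
    return [1 if token in present else 0 for token in vocabulary]


vocabulary = build_vocabulary(profiles)
matrix = [encode(profile, vocabulary) for profile in profiles]
```

Each row of `matrix` can then be fed to a classifier or a clustering algorithm; prefixing tokens with `skill:` and `edu:` keeps a skill and a degree with the same name in separate columns.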
For this project I trained the following models on skills and education features:
Model | Training set | Purpose
Random Forest | All 5 categories | Classify the profile
Naïve Bayes | 4 classic categories: SE, BA, MT, ST | Assess the background components of a profile with multi-class probabilities
K-means | All 5 categories | Identify similar profiles
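The split in the table (Random Forest on all five categories, Naïve Bayes on the four classic ones) can be sketched with scikit-learn as follows. This is a minimal sketch on made-up data, not the project's code: the feature columns, the "DS" label for data scientists, the hyperparameters, and the choice of `BernoulliNB` as the Naïve Bayes variant are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB

# Toy binary feature matrix, invented for illustration.
# Hypothetical columns: python, java, sql, statistics, excel, proofs
X = np.array([
    [1, 1, 1, 0, 0, 0],  # SE
    [0, 1, 1, 0, 0, 0],  # SE
    [0, 0, 1, 0, 1, 0],  # BA
    [0, 0, 1, 1, 1, 0],  # BA
    [0, 0, 0, 1, 0, 1],  # MT
    [1, 0, 0, 0, 0, 1],  # MT
    [1, 0, 0, 1, 0, 0],  # ST
    [0, 0, 1, 1, 0, 0],  # ST
    [1, 0, 1, 1, 0, 0],  # DS (assumed label for data scientists)
    [1, 1, 1, 1, 0, 0],  # DS
])
y = np.array(["SE", "SE", "BA", "BA", "MT", "MT", "ST", "ST", "DS", "DS"])

# Random Forest: trained on all five categories to classify a profile.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Naive Bayes: trained only on the four classic categories, so its class
# probabilities describe a profile as a mix of those backgrounds.
classic = y != "DS"
bayes = BernoulliNB().fit(X[classic], y[classic])

candidate = np.array([[1, 0, 1, 1, 0, 0]])
predicted_label = forest.predict(candidate)[0]
background = dict(zip(bayes.classes_, bayes.predict_proba(candidate)[0]))
```

`predicted_label` is the forest's single best category, while `background` maps each of SE, BA, MT, and ST to a probability, which is what allows a profile to be presented as a blend of backgrounds rather than a single label.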
bit.ly/cybads
Naïve Bayes: multi-class probabilities
Random Forest
K-means clustering
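To illustrate how K-means clusters can drive similar-profile suggestions, here is a small pure-Python sketch of Lloyd's algorithm plus a lookup of same-cluster neighbours. The project itself lists scikit-learn for its K-means models; this hand-written version, the toy vectors, and the helper names `kmeans` and `similar_profiles` are assumptions made only to show the idea.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm.

    The first k points seed the centroids: an arbitrary but
    deterministic choice made for this sketch.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def similar_profiles(index, points, labels):
    """Indices of other profiles in the same cluster, nearest first."""
    same = [i for i, label in enumerate(labels)
            if label == labels[index] and i != index]
    return sorted(same, key=lambda i: squared_distance(points[index], points[i]))


# Toy binary skill vectors (hypothetical columns: python, sql, statistics, java).
points = [
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
labels, centroids = kmeans(points, k=2)
suggestions = similar_profiles(0, points, labels)
```

Restricting suggestions to the query profile's own cluster keeps the lookup cheap; ranking by distance inside the cluster then orders the candidates by how closely their features match.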
Next Steps
• Get more data:
  – Other websites
  – Indeed
  – User input on the web app
• Extend feature set and test more models:
  – Fine-grained parsing of education
  – Experiment with additional features (industry, years of experience)
  – Fuzzy C-means
• Add interactive data collection
• Personalized links for skills
• Explanation of similarity results
• Close the loop by analyzing job offers and suggesting matching profiles
Thank you!
Technologies
Web App: Flask, jQuery, Vega, MongoDB
NMF, HC, RF, DT, NB, K-means models: scikit-learn
Visualizations: Vincent, Vega, NetworkX, Gephi
Acknowledgements
yatish27: Ruby LinkedIn public profile web scraper
ozgut: LinkedIn API Python wrapper