An internship report by Decisionstats.com intern and IIT student Chandan R.
INTERNSHIP REPORT
At Decision Stats Consultancy 18th June 2014 - Present
Chandan Kumar Routray
Second Year Undergraduate Student
IIT Kharagpur, West Bengal
Table of Contents
I. Summary
II. An Overview: Things that I learned
III. Blog Posts: During the Internship
IV. Appendix (Day wise Work)
Day 1
Day 2
Day 3
Day 4-5
Day 6-7
Day 8-9
Day 9-11
Day 12-13
Day 14-18
Day 19-20
Day 20-26
Day 40+
Summary
I have completed almost a month of this internship with Decision Stats
Consultancy. The internship has been a roller coaster ride from the very
beginning, and the past four weeks were the most productive period of my
life in terms of learning. It has helped me change into a more disciplined
and professional person and acquire a wide set of skills such as analytics,
web development and technical blog writing. The daily update calls with
Mr. Ajay Ohri, my guide for this internship, were scheduled at around 9 pm
over Skype. In these calls he reviews my daily assignments, gives me useful
tips on how to manage my work and make it more presentable, and then sets
the assignment for the next day. After every call I have a new target to
achieve within a given period of time, and I plan my next day accordingly.
A typical assignment consists of learning something new, such as coding or
understanding a package in R or Python, performing an exercise on it, and
writing both an informative blog post on what I learned that day and a blog
post on my experience. In the course of the internship I learned a lot of
things: coding in Python, R and JavaScript; using packages like Shiny, rpy2
and ggvis in R and Python; writing database queries with MySQL; protecting
a website from SQL injection; making my own website using Bootstrap;
automated extraction of data from the web and from networks; working in the
cloud; and many other things. Earlier I used to run away from writing, but
now I have become a blogger and I am enjoying it too. I have also learned
how to write an informative blog post: readers have started asking
questions on my posts, and a few of the posts have been re-blogged by other
bloggers. So far I have written 14 tech blog posts about the various things
I have learned, and I have made all of them reader-friendly.
I have been very excited about this internship from the very beginning, and
Mr. Ajay Ohri has now offered to let me continue it for some more time, for
which I am very grateful to him.
An Overview: Things that I learned
Programming
• Python (www.codecademy.com)
• JavaScript
• R (Swirl package and www.datacamp.com)
Web Development
• Bootstrap (www.jetstrap.com)
• SQL and SQL Injection
• JavaScript (D3.js)
• Hosting via Dropbox
Writing
• Technical Blog Writing: www.python4analytics.wordpress.com
• Report Making
Software
• Virtualization Software: Oracle VirtualBox, VMware Player
• Database Management: MySQL Workbench
Analytics
• Data Extraction: Wireshark, iMacros
• Analysing Data: RStudio, Python (Pandas, rpy2)
• Result Presentation: RStudio (Shiny, ggvis, slidify), d3.js
Working on Cloud
• AWS EC2: Starting an instance, accessing the instance, installing RStudio Server & IPython on it etc.
Big Data
• Apache Hadoop: On Hortonworks Sandbox
• Hue: Hive, Pig, HCatalog
Other
• Git
• Infographics: Infogr.am
Blog Posts: During the Internship
Topics              Links
Python              https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
Installing Ipython  https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
Introduction to R   https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
Pandas Library      https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
D3.js (JavaScript)  https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
Datacamp vs. Swirl  https://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
Shiny               https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
Git                 https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
SAS                 https://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
iMacros             https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
SQL                 http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
EC2                 https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
Infogr.am           https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
Wireshark           https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
Bootstrap           https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/
Apache Hadoop       https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop
Appendix (Day wise Work)
Day 1
Task Given
1) Create a blog on http://blogger.com and http://wordpress.com
2) Start an account on Codecademy and send a screenshot of the initial page. You will be learning Python
3) Download and install R from www.r-project.org
4) Write a blog post on your experience on Day 1 of the internship
Work Update
1) Codecademy account started. Completed 23% of the beginner’s course on the very first day. My progress can be checked at www.codecademy.com/imeckr
2) R downloaded and installed on my system.
3) Created a blog on Blogger.com and posted my first blog post on it: http://analyticsinternship.blogspot.com/2014/06/day-1.html
References used
None
Remarks
1) Edit the Day 1 blog properly and write more information-oriented posts
2) The URL of a web page should be the answer to a question (SEO)
3) How to select a good theme for your blog
Day 2
Task Given
1) Create a new blog on Wordpress.com with a catchier name, the same content and better editing. Its title URL should be the answer to a question on Google Search. Please send me screenshots. What should be the reason for choosing an appropriate theme?
2) Tags should be used, and then you should share it on your Facebook, LinkedIn, Twitter and Google Plus profiles. Please send me screenshots of this.
3) Please earn at least 5 badges in Python on Codecademy for tomorrow’s submission
4) Read this page please: http://pandas.pydata.org/. Download and install Pandas
5) Download and install IPython: http://ipython.org/
6) Blog on Day 2 (besides your existing edited and refined Day 1 blog)
Work Update
1) Created Wordpress blog https://python4analytics.wordpress.com/
Link to Day 1 blog: https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
Link to Day 2 blog: https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
2) Earned 6 badges on Day 2 on Codecademy: http://www.codecademy.com/imeckr
3) IPython and Pandas downloaded and installed on my system
4) Shared the blog on my Facebook, Google+ and LinkedIn accounts.
References used
www.pandas.pydata.org and www.ipython.org, and their respective documentation.
Remarks
1) Maintain two different blogs: one on Blogger.com for work experience and one on Wordpress.com for tech blogging on things I learn daily
Screenshots
Day 3
Task Given
1) Create accounts on Topcoder, Kaggle and GitHub. Write a one-paragraph summary of what these websites are and what advantages an account on them gives you
2) Install the swirl package in R (use Google to find out how). Do one exercise. Show a screenshot
3) Go to Datacamp.com and create an account. Do one exercise and show a screenshot.
4) Get 4 badges in Python and 2 badges in JavaScript on Codecademy
5) Blog on this. Show screenshots of the analytics of each blog and answer this question: which metrics should I track for my blog if I want to make it better?
Work Update
1) Codecademy status: Python 50% completed, JavaScript 21%, with 23 badges, 187 points and a 4-day streak.
2) Blogged about R on Wordpress: https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
3) Swirl package installed on my system. Did a few exercises.
4) Created an account on Datacamp. Did a few exercises there too.
5) Creating accounts on sites like Topcoder, Kaggle and GitHub helps a user in many ways. As these sites already have a lot of registered users from across the world, they act as online communities of coders, designers, analysts, innovators etc. where users can discuss their problems and ideas among themselves. They also help users see where exactly they stand and how they can develop their talent in their respective fields. A user can also take up various courses and projects, and even compete with other users.
6) Keeping track of the following will make one's blog better:
(i) Referrers: Where are my visitors being redirected from, and where should I share my blog more often?
(ii) Region of visitors: Which region do most of my visitors belong to?
(iii) Tags and Categories: Shows which topics are trending on search engines.
References used
www.swirlstats.com and its documentation.
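The three metrics in point 6 above can be tallied with a few lines of Python. This is only a minimal sketch: the visit records and their field names (referrer, region, tag) are made up for illustration and do not come from the actual blog statistics.

```python
from collections import Counter

# Hypothetical visit records; a real blog dashboard would supply these.
visits = [
    {"referrer": "facebook.com", "region": "India", "tag": "python"},
    {"referrer": "facebook.com", "region": "India", "tag": "r"},
    {"referrer": "linkedin.com", "region": "USA", "tag": "python"},
    {"referrer": "facebook.com", "region": "UK", "tag": "python"},
    {"referrer": "google.com", "region": "India", "tag": "d3js"},
    {"referrer": "linkedin.com", "region": "India", "tag": "python"},
]


def top_counts(records, field, n=3):
    """Return the n most common values of `field` with their counts."""
    return Counter(record[field] for record in records).most_common(n)


if __name__ == "__main__":
    # One ranking per metric: referrers, regions, and tags.
    for field in ("referrer", "region", "tag"):
        print(field, top_counts(visits, field))
```

The top referrer suggests where to share posts more often, and the top tag suggests which topic draws the most readers.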
Screenshots
Day 4-5
Task Given
1) CODING: Get to 60% in Python and 40% in JavaScript on Codecademy
2) STATISTICAL PYTHON: Go to http://pandas.pydata.org/pandas-docs/stable/10min.html#min and blog on the experience
3) PRESENTATION OF RESULTS: Go to http://d3js.org/. Read it and blog on it. (Part 2 is the shiny package in R from http://shiny.rstudio.com/tutorial/, Part 3 will be the slidify package in R from http://slidify.org/)
4) CODING: Do one module in Swirl. Write a tech blog on what you have learnt
5) CODING: Go to Datacamp.com. Do one exercise and show screenshots.
Work Update
1) Completed 60% in Python and 40% in JavaScript
2) Blog on Statistical Python: https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
Blog on D3.js: https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
3) One module completed in Swirl
4) Completed one exercise on Datacamp.com
5) Blog on experience:
Day 3: http://analyticsinternship.blogspot.in/2014/06/day-3.html
Day 4-5: http://analyticsinternship.blogspot.in/2014/06/day-4-5.html
References used
www.d3js.org and its documentation.
Screenshots
Day 6-7
Task Given
1) Do one more module in Swirl
2) Do one exercise in Datacamp
3) Write a technical blog post on how the two are different, including the pluses and minuses of both (Swirl vs. Datacamp)
4) Read about using JS within R here: http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
5) Complete 4 badges each in Python and JS
Work Update
1) Codecademy status: Python 70% and JavaScript 50%
2) One module completed in the Swirl package.
3) One exercise completed on Datacamp
4) Read the article at the link you had given
5) Blogged on Swirl vs. Datacamp: http://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
References used
http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
Screenshots
Day 8-9
Task Given
1) Make a demo app on Shiny. How is the population of India and China changing over time? How is the per capita GDP changing over time? Google for datasets. Send me an initial draft.
2) Install and load SAS University Edition: http://www.sas.com/en_us/software/university-edition.html
3) Complete the exercises at https://try.github.io/
4) 3 tech blog posts on Shiny, Git and SAS
Work Update
1) Made a demo app on Shiny which can show one plot at a time. Made a data frame, which I have used in the app.
2) Completed the Git exercise.
3) Blog on Git: https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
Blog on Shiny: https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
References used
www.ggvis.rstudio.com and shiny.rstudio.com, and their documentation.
Screenshots
Shiny App
Day 9-11
Task Given
1) Use http://shiny.rstudio.com/gallery/ for troubleshooting your Shiny app
2) Use the ggvis package somehow in your app (http://ggvis.rstudio.com/) and also use d3.js (hint: read this http://www.xavierdupre.fr/blog/2013-11-30_nojs.html)
3) Create one small demo showing data flow and calls between Python and R using rpy2. For example, load some JSON data using Python and then call an R package.
4) Complete all pending blog posts
5) Make a small demo website using https://jetstrap.com/
6) Create an infographic for the same dataset that you are using in the Shiny app, using http://infogr.am/
7) Try to download and install this. It will help check VMware and also start off big data efforts: http://hortonworks.com/products/hortonworks-sandbox/
Work Update
1) Built the Shiny app. Used ggvis but not D3.js so far
2) Used rpy2 in Python to import a built-in dataset from R and plot a graph of it.
3) Tech blog post on SAS: http://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
4) Made an infographic
5) Demo website: I tried to re-create my blog at http://jetstrap.io/share/cfcd9bc36a
References used
www.shiny.rstudio.com, www.xavierdupre.fr/blog/2013-11-30_nojs.html
Screenshots
Shiny App
Day 12-13
Task Given
1) Create Demo Website: Read this and try to create a website for Decision Stats Consulting. Take content from the image in the post and from the http://decisionstats.com/about-decisionstats/ page.
2) For tomorrow, read about Bootstrap (http://getbootstrap.com/) and blog on it
3) Install MySQL on your system (full installation). Learn SQL. Create a table of all players of the teams remaining in the round of 16. It should have player name, player surname, football club, position he plays, and one more additional column at your discretion. Then answer the following questions programmatically using SQL queries: Which World Cup team is now the tallest? Which is the oldest? Which is the shortest? Which is the youngest? Which striker is the fattest/youngest?
4) Python: Make it to 90% by Wednesday
5) R: Finish swirl (all modules) by Wednesday
Work Update
1) Completed 90% of the Python course on Codecademy.
2) Learned about Bootstrap and revised HTML & CSS
3) Made an "About" page for Decision Stats by editing existing templates and adding some new elements (hosted using dropbox.com at http://imeckrdemo.kissr.com/)
4) Blogged on Bootstrap (https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/)
5) Completed all modules of R programming in Swirl
6) Installed MySQL on my system. Read about MySQL and am currently learning it; I will do the assignment after clearing my doubts with you.
References used
www.getbootstrap.com and its documentation
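The World Cup questions in task 3 above can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the column names (team, height_cm, age) are assumed, the players and teams are placeholder rows rather than real squad data, and "tallest team" is interpreted here as the team with the highest average height.

```python
import sqlite3

# Hypothetical placeholder rows:
# (name, surname, team, club, position, height_cm, age)
DEMO_PLAYERS = [
    ("Player", "One", "Team A", "Club X", "Striker", 185, 24),
    ("Player", "Two", "Team A", "Club Y", "Defender", 180, 30),
    ("Player", "Three", "Team B", "Club X", "Striker", 190, 22),
    ("Player", "Four", "Team B", "Club Z", "Goalkeeper", 178, 26),
]


def build_demo_db(rows=DEMO_PLAYERS):
    """Create an in-memory players table and fill it with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE players ("
        "name TEXT, surname TEXT, team TEXT, club TEXT, "
        "position TEXT, height_cm REAL, age INTEGER)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def extreme_team(conn, column, highest=True):
    """Return (team, average) for the team with the highest or lowest average."""
    # The column name is interpolated into the SQL, so only allow known columns.
    if column not in ("height_cm", "age"):
        raise ValueError("unsupported column: " + column)
    order = "DESC" if highest else "ASC"
    query = (
        f"SELECT team, AVG({column}) AS avg_value FROM players "
        f"GROUP BY team ORDER BY avg_value {order} LIMIT 1"
    )
    return conn.execute(query).fetchone()


def youngest_striker(conn):
    """Return (name, surname, age) of the youngest player listed as a striker."""
    return conn.execute(
        "SELECT name, surname, age FROM players "
        "WHERE position = 'Striker' ORDER BY age ASC LIMIT 1"
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print("Tallest team:", extreme_team(conn, "height_cm"))
    print("Youngest team:", extreme_team(conn, "age", highest=False))
    print("Youngest striker:", youngest_striker(conn))
```

The same GROUP BY and ORDER BY queries should carry over to MySQL Workbench with little change once the real player data is loaded.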
Screenshots
Made This Demo Website
Day 14-18
Task Given
1) Read about SQL and SQL injection at http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ and try the demos at http://sqlzoo.net/hack/
2) What was the problem with the SAS installation? Blog on this AFTER you have successfully installed it and shown screenshots
3) Compile everything you have learnt in a 1-page essay, with an appendix of the day-wise submissions that you did.
4) Edit all the blogs
Work Update
1) Learned web scraping using iMacros; still facing some problems in extracting data from some sites. Also wrote a blog post on it: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
2) Created a basic table (in a database) using MySQL Workbench and practiced some basic queries on it.
3) Learned about SQL injection from the resources provided by you. Also blogged on it: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
References used
http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ http://sqlzoo.net/hack/
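The idea behind SQL injection, and the usual defence against it, can be shown in a short Python sketch. It uses the built-in sqlite3 module rather than MySQL, and the users table, the credentials and the function names are all hypothetical, chosen only for this illustration.

```python
import sqlite3

# A classic injection payload: it closes the quoted string and adds a
# condition that is always true.
INJECTION_PAYLOAD = "' OR '1'='1"


def make_users_db():
    """Create an in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'secret')")
    return conn


def login_unsafe(conn, username, password):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = (
        "SELECT COUNT(*) FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, username, password):
    """Safer: placeholders keep user input as data, never as SQL."""
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_users_db()
    print("Unsafe login with payload:", login_unsafe(conn, "alice", INJECTION_PAYLOAD))
    print("Safe login with payload:", login_safe(conn, "alice", INJECTION_PAYLOAD))
```

With the payload as the password, the unsafe query's WHERE clause ends in OR '1'='1', which matches every row, so the login succeeds without the real password. The parameterized version compares the payload as a literal string and rejects it.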
Screenshots
Day 19-20
Task Given
1) Read these papers: http://www.slideshare.net/ajayohri/using-r-for-cyber-security-part-1 and http://www.sis.pitt.edu/jjoshi/courses/IS2621/Spring2014/Lab3.pdf
2) Use Wireshark and/or Silk to capture some dummy data from a network (Wi-Fi or wherever)
3) Use paper 1 to import the data into R and visualize it
4) Additionally, download and install Wireshark and use the instructions from http://www.ict.kth.se/courses/II2202/II2202-quantitative-chip-R-20110918.pdf to help you with the analysis
Work Update
1) Learned web scraping using iMacros; still facing some problems in extracting data from some sites. Also wrote a blog post on it: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
2) Created a basic table (in a database) using MySQL Workbench and practiced some basic queries on it.
3) Learned about SQL injection from the resources provided by you. Also blogged on it: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
References used
http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ http://sqlzoo.net/hack/
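As a rough illustration of the capture-and-analyse task above, the sketch below summarises a packet list that has been exported from Wireshark as CSV. It is written in Python rather than the R workflow named in the task, and it assumes the export has "Protocol" and "Length" columns; the sample rows are invented, not taken from a real capture.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of a Wireshark packet-list CSV export (invented rows).
SAMPLE_EXPORT = '''\
"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.168.1.5","192.168.1.1","DNS","74","Standard query"
"2","0.020000","192.168.1.1","192.168.1.5","DNS","90","Standard query response"
"3","0.050000","192.168.1.5","10.0.0.8","TCP","66","SYN"
"4","0.070000","10.0.0.8","192.168.1.5","TCP","54","ACK"
"5","0.090000","192.168.1.5","10.0.0.8","HTTP","512","GET /"
"6","0.150000","10.0.0.8","192.168.1.5","TCP","1514","Segment"
'''


def summarize_capture(csv_text):
    """Return (packets per protocol, bytes per protocol) as two Counters."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        packets[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return packets, total_bytes


if __name__ == "__main__":
    packets, total_bytes = summarize_capture(SAMPLE_EXPORT)
    for protocol, count in packets.most_common():
        print(protocol, count, "packets,", total_bytes[protocol], "bytes")
```

Per-protocol counts like these are a natural first table to plot before moving on to the more detailed analysis described in the papers.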
Screenshots
Day 20-26
Task Given
1) Complete Python on Codecademy.
2) Set up RStudio Server on AWS and blog on it
3) Giving user rights to you, choosing the appropriate user rights.
4) Set up IPython on AWS
Work Update
1) Completed Python on Codecademy
2) Created an AWS account and set up RStudio Server on it.
3) Blogged on it: https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
References used
http://www.s-anand.net/blog/ssh-tunneling-through-web-filters/
http://www.r-bloggers.com/instructions-for-installing-using-r-on-amazon-ec2/
Screenshots
Day 40+
Task Given
1) Read these:
http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
http://www.slideshare.net/ajayohri/big-data-big-analytics
2) Explore Hadoop
3) Complete the tutorials on Hortonworks Sandbox
Work Update
1) Read the two papers provided by you:
http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
http://www.slideshare.net/ajayohri/big-data-big-analytics
2) Explored Hadoop: what HDFS, MapReduce, Pig, Hive etc. are.
3) Resolved the problem I was having with Pig and the Sandbox
4) Completed the first two tutorials on Hortonworks Sandbox and learned the following: basic commands in Pig (Grunt shell); downloaded a sample dataset and performed basic Hive queries on it
Blog Written
https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop/
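The MapReduce idea explored above can be illustrated with a tiny word count in plain Python. This is only a single-machine sketch of the map, shuffle and reduce steps, not Hadoop code; the function names and the sample lines are made up for the example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order on a list of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["pig hive pig", "hive hdfs pig"]))
```

In Hadoop the same three steps run in parallel across many machines, with the input read from HDFS; Pig and Hive let the user express such jobs as scripts and queries without writing the phases by hand.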
Screenshots
Thank You