Data Science Bootcamp, Day 1 Presented By: Chetan Khatri, Volunteer Teaching assistant, Data Science Lab. Guidance By: Prof. Devji D. Chhanga, University of Kachchh

Data science bootcamp day1

Page 2: Data science bootcamp day1

Agenda

An Introduction to Data Science from an Industry Perspective

An Introduction to Distributed Systems

CAP Theorem

Collections Framework in Java

Page 3: Data science bootcamp day1

Querying in Distributed MySQL

Example

●Facebook CEO Mark Zuckerberg wants to analyze data: how many daily users are there, and how many daily messages are sent?

●Assume that Facebook’s data centers are located in California, Germany, Japan, Bangalore, and Kenya.

●Assume that Facebook uses MySQL as its RDBMS for data storage. How can he get these numbers?

●SQL?

Page 4: Data science bootcamp day1

Querying in Distributed MySQL (continued)

Let’s think! Start by writing the query for a single node.

Mark wants analytics charts showing ratios of customer behaviour, such as how many users are churning (leaving) his platform, which includes Facebook and WhatsApp.

Assume the table structures are as below.

user_master

●user_id (PK)

●created_on (DATE)

●last_updated_on (DATE)

transaction_master

●trans_id (PK)

●user_id (FK)

●timespan (DATE)

Page 5: Data science bootcamp day1

Querying in Distributed MySQL (continued)

1) Daily Users

Desired Output:

Date         Daily Users
12-05-2016   32,00,000
13-05-2016   21,00,854
14-05-2016   22,54,246
15-05-2016   32,51,230

Query:

SELECT last_updated_on AS "Date", COUNT(user_id) AS "Daily Users" FROM user_master GROUP BY last_updated_on ORDER BY last_updated_on;
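The query above runs against a single node. One way to think about the distributed case is to run it on every data center and then merge the per-date counts. The following is only a minimal sketch of that merge step, not the approach prescribed by the slides: the class name `DailyUserMerge`, the sample dates, and the sample counts are hypothetical, and simple addition is correct only under the assumption that each user's row lives in exactly one data center. It needs Java 9 or later for `Map.of` and `List.of`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: combines the "Date -> Daily Users" result of each node.
final class DailyUserMerge {

    // Sums the per-date counts returned by every data center.
    // Assumption: a user is stored on exactly one node, so counts are additive.
    static Map<String, Long> merge(List<Map<String, Long>> perNodeCounts) {
        Map<String, Long> total = new TreeMap<>(); // TreeMap keeps dates sorted by key
        for (Map<String, Long> nodeResult : perNodeCounts) {
            for (Map.Entry<String, Long> entry : nodeResult.entrySet()) {
                total.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up results, as if returned by the single-node query on two nodes.
        Map<String, Long> california = Map.of("2016-05-12", 1200L, "2016-05-13", 900L);
        Map<String, Long> bangalore = Map.of("2016-05-12", 800L);
        System.out.println(merge(List.of(california, bangalore)));
        // prints {2016-05-12=2000, 2016-05-13=900}
    }
}
```

If the same user could appear on several nodes, the counts would no longer be additive and the merge would have to work on user ids rather than on counts.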

Page 6: Data science bootcamp day1

Querying in Distributed MySQL (continued)

2) Daily Messages by User

Desired Output:

User            Date         Messages
Drew Houston    12-08-2016   700
Satya Nadella   12-08-2016   652
Sundar Pichai   12-08-2016   352
Tim Cook        12-08-2016   154

Query:

Homework!

Page 7: Data science bootcamp day1

Collections Framework in Java

How would you think about HashSet in Java?

How would you think about ArrayList in Java?
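As a starting point for both questions, the sketch below contrasts the two collections on the same input: an `ArrayList` keeps insertion order and duplicates, while a `HashSet` keeps each element once and makes no ordering guarantee. The class name `CollectionsDemo` and the sample user ids are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class comparing ArrayList and HashSet on the same ids.
final class CollectionsDemo {

    // ArrayList: ordered, index-based access, duplicates are kept.
    static List<Integer> toList(int[] userIds) {
        List<Integer> list = new ArrayList<>();
        for (int id : userIds) {
            list.add(id);
        }
        return list;
    }

    // HashSet: no duplicates, no ordering guarantee, fast contains() on average.
    static Set<Integer> toSet(int[] userIds) {
        Set<Integer> set = new HashSet<>();
        for (int id : userIds) {
            set.add(id); // adding an id that is already present changes nothing
        }
        return set;
    }

    public static void main(String[] args) {
        int[] ids = {101, 102, 101, 103}; // user 101 was active twice
        System.out.println(toList(ids).size()); // prints 4
        System.out.println(toSet(ids).size());  // prints 3
    }
}
```

Seen this way, a list suits a log of events in arrival order, while a set suits questions such as "how many distinct users were active?".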

Page 8: Data science bootcamp day1

Concurrency in Java

Why threading?

How can it help you optimize performance / throughput?
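One possible answer is that independent pieces of work can run at the same time on several cores. The sketch below splits an array of message counts into chunks and sums each chunk on its own thread using the standard `ExecutorService`. The class name `ParallelSum`, the chunking strategy, and the sample numbers are assumptions made for illustration; for input this small the thread overhead outweighs any gain, so a speed-up should only be expected on large workloads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo: sums an array by giving each thread one contiguous chunk.
final class ParallelSum {

    // threads must be at least 1.
    static long sum(long[] values, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (values.length + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < values.length; start += chunk) {
                final int from = start;
                final int to = Math.min(values.length, start + chunk);
                // Each task only reads its own slice, so no locking is needed.
                parts.add(pool.submit(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += values[i];
                    }
                    return partial;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits until that chunk is finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker task failed", e);
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        long[] messages = {700, 652, 352, 154};
        System.out.println(sum(messages, 2)); // prints 1858
    }
}
```

The tasks share no mutable state, which is what makes this kind of split safe without synchronization.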

Page 9: Data science bootcamp day1

Q & A Session

Questions, please!

Page 10: Data science bootcamp day1

Thank you

Chetan Khatri, Volunteer Teaching Assistant, Data Science Lab, University of Kachchh.

Email: [email protected]

Github Data Science Lab: https://github.com/dskskv

CCCS936 Repository: https://github.com/dskskv/CCCS936