
Ds01 data science

Page 1: Ds01   data science

Previously known as

Think Big. Move Fast.

Page 3: Ds01   data science

SolidQ

• Founded in 2002 in the USA and Spain

• Established in 2007 in Italy

• More than 1000 customers and more than 200 consultants worldwide

• Dedicated to Data Management on the Microsoft Platform

• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors

• www.solidq.com

Page 4: Ds01   data science

Davide Mauri

• 18 years of experience on the SQL Server Platform

• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence

• Microsoft SQL Server MVP

• President of UGISS (Italian SQL Server UG)

• Mentor @ SolidQ

• Video, Book & Article Author

• Regular Speaker @ SQL Server events

• Projects, Consulting, Mentoring & Training

Page 5: Ds01   data science

Data Science

Renaissance 2.0

Page 6: Ds01   data science

“Companies are collecting mountains of information about you, to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so”

Page 7: Ds01   data science

Data Science

• Extraction of knowledge from data

• So, what’s new?

• Nothing. Except that it’s now economical and fast.

• It’s now applicable to everything. And a lot of data is produced every day that can be used to extract knowledge

Page 8: Ds01   data science

Data Science

Data → Information → Knowledge → Decisions

Page 9: Ds01   data science

Data Science

• A sum of:
  • Statistics
  • Mathematics
  • Machine Learning
  • Data Mining
  • Computer Programming
  • Data Engineering
  • Visualization
  • Data Warehousing
  • High Performance Computing

• To support (informed) decision making
  • Data-Driven Decisions

Page 10: Ds01   data science

Data Scientist

• IBM
  • A data scientist represents an evolution from the business or data analyst role.
  • The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math.
  • What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.
  • It's almost like a Renaissance individual who really wants to learn and bring change to an organization.

Page 11: Ds01   data science

• Algorithms are the new gatekeepers

• They decide:
  • What we find
  • What we see
  • What we buy

Page 12: Ds01   data science

Modern Data Environment

[Diagram labels: Structured Data, Unstructured Data, Master Data, EDW, Data Mart, Big Data, BI Environment, Analytics Environment]

Page 13: Ds01   data science

Big Data

The 3 Vs

No, the 4 Vs!!!

No, no, the 5 Vs!!!!!

Page 14: Ds01   data science

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Page 15: Ds01   data science

Big Data

• Volume, Velocity, Variety, Veracity….V<your-v-here>

• Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time

• Grid computing and parallel computing are needed to:
  • keep processing time reasonable
  • provide scalability
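
The idea of splitting work across workers can be sketched on a single machine. The example below is only an illustration, assuming a word-count task and Python's standard `concurrent.futures` module; the names `count_words` and `parallel_word_count` are hypothetical. It shows the partition-then-merge structure only: threads in CPython do not speed up CPU-bound work, and a real cluster framework would spread the partitions across machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Map step: count the words in one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=4):
    """Split the lines into partitions, count each one, then merge (reduce)."""
    partitions = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_words, partitions)
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical sample input, for illustration only.
sample_lines = [
    "big data needs parallel computing",
    "parallel computing provides scalability",
    "data is the new resource",
]
word_totals = parallel_word_count(sample_lines, workers=2)
print(word_totals["data"], word_totals["parallel"])
```

Adding workers only changes how many partitions are produced; the merge step stays the same, which is what makes this pattern scale out.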

Page 16: Ds01   data science

Big Data Data

• Paradigm: “Store Now, Figure Out Later”
  • Data is the new resource. Never throw it away!

• Unstructured Data
  • Text Files
  • Images
  • Sounds

• Structured/Semi-Structured Data
  • Sensors
  • Transactions
  • Logs

Page 17: Ds01   data science

Data Storage

• RDBMS
  • SQL Server

• Hadoop
  • HDInsight
  • Hortonworks Data Platform

• Distributed File (Eco)System
  • CSV
  • JSON
  • *.*

Page 18: Ds01   data science

Data Storage

• Hadoop Ecosystem

http://hortonworks.com/hadoop-modern-data-architecture/

Page 19: Ds01   data science

Data Science & Big Data

• Data Science != Big Data

• Data Science is not only for Big Data

• Data Science can be applied to Big Data

• Data Science starts from Small Data
  • 1) find the algorithm that extracts knowledge
  • 2) measure the algorithm's results in terms of probability
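
The two steps above can be sketched on a tiny, made-up data set. This is an illustration only: the data, the single-threshold rule, and the names `fit_threshold` and `accuracy` are assumptions chosen for brevity, not a method taken from the slides.

```python
def fit_threshold(samples):
    """Step 1: pick the cut-off that best separates the two labels on the training data."""
    candidates = sorted(value for value, _ in samples)
    best_cut, best_hits = candidates[0], -1
    for cut in candidates:
        hits = sum((value >= cut) == label for value, label in samples)
        if hits > best_hits:
            best_cut, best_hits = cut, hits
    return best_cut


def accuracy(cut, samples):
    """Step 2: estimate the probability that the rule predicts a label correctly."""
    hits = sum((value >= cut) == label for value, label in samples)
    return hits / len(samples)


# Hypothetical small data: (monthly purchases, became a repeat customer?)
train = [(1, False), (2, False), (3, False), (6, True), (7, True), (9, True)]
test = [(2, False), (4, False), (5, True), (8, True)]

cut = fit_threshold(train)
print(cut, accuracy(cut, test))
```

On this toy data the learned cut-off is 6 and the held-out accuracy is 0.75. Measuring on data the rule has not seen is what makes the figure a usable probability estimate rather than a description of the training set.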

Page 20: Ds01   data science

Machine Learning

• Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. (Wikipedia)
  • For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.

• Flavors
  • Supervised
  • Unsupervised
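
The spam example above is a supervised learning task, and a minimal version can be sketched with the standard library alone. The sketch assumes a naive Bayes classifier with add-one smoothing, which is one common choice and not something the slides prescribe; the training messages, the "ham" label for non-spam, and the names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Prior: share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled messages, for illustration only ("ham" = non-spam).
training_messages = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training_messages)
print(classify("free prize money", *model))
print(classify("agenda for the team meeting", *model))
```

With these four training messages, the first new message is classified as spam and the second as ham. An unsupervised approach would instead receive the messages without labels and group similar ones together.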

Page 21: Ds01   data science

Data Analysis

• Common Data Scientist Tools
  • R
  • Weka
  • Octave
  • Scikit-Learn

• Common Data Scientist Languages
  • Python
  • Scala
  • F#

Page 23: Ds01   data science

Resources

• https://www.coursera.org/
  • Data Scientist Specialization

• https://www.khanacademy.org/
  • Math

• http://www.osservatori.net/business_intelligence
  • Italian Big Data Market Analysis Resources

• http://www.solidq.com/consulting/
  • Data Science Services
  • Big Data / Business Intelligence / Data Warehousing
