dotnetcampus
Previously known as
Think Big. Move Fast.
SolidQ
• Born in 2002 in USA and Spain
• Established in 2007 in Italy
• More than 1000 customers and more than 200 consultants worldwide
• Dedicated to Data Management on the Microsoft Platform
• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors
• www.solidq.com
Davide Mauri
• 18 Years of experience on the SQL Server Platform
• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence
• Microsoft SQL Server MVP
• President of UGISS (Italian SQL Server UG)
• Mentor @ SolidQ
• Video, Book & Article Author
• Regular Speaker @ SQL Server events
• Projects, Consulting, Mentoring & Training
Data Science: Renaissance 2.0
“Companies are collecting mountains of information about
you, to predict how likely you are to buy a product,
and using that knowledge to craft a marketing message
precisely calibrated to get you to do so”
Data Science
• Extraction of knowledge from data
• So, what’s new?
• Nothing. Except that it’s now economical and fast.
• It’s now applicable to everything, and a lot of data is produced every day that can be used to extract knowledge
Data Science
Data → Information → Knowledge → Decisions
Data Science
• A Sum Of
  • Statistics
  • Mathematics
  • Machine Learning
  • Data Mining
  • Computer Programming
  • Data Engineering
  • Visualization
  • Data Warehousing
  • High Performance Computing
• To support (Informed) Decision Making
  • Data-Driven Decisions
Data Scientist
• IBM:
  • A data scientist represents an evolution from the business or data analyst role.
  • The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math.
  • What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.
  • It's almost like a Renaissance individual who really wants to learn and bring change to an organization.
• Algorithms are the new gatekeepers
• They decide
  • What we find
  • What we see
  • What we buy
Modern Data Environment
[Diagram: Structured Data, Unstructured Data, Master Data, EDW / Data Mart, Big Data, BI Environment, Analytics Environment]
Big Data
The 3 V
No, the 4 V!!!
No, no, the 5 V!!!!!
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Big Data
• Volume, Velocity, Variety, Veracity ... V<your-v-here>
• Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time
• Grid Computing, Parallel Computing needed to
  • keep processing time reasonable
  • provide scalability
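As a rough illustration of why splitting the work helps, here is a minimal map/reduce-style sketch in Python. It is not taken from the slides: the function names and the sample log lines are hypothetical, and a thread pool stands in for the cluster or grid a real Big Data platform would use (in CPython, threads do not speed up CPU-bound work, so this only shows the shape of the computation).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(lines, n_chunks):
    """Split a list of lines into roughly equal, contiguous chunks."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, n_workers=4):
    """Run the map step on each chunk concurrently, then merge (reduce)."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Hypothetical sample data, for illustration only.
log_lines = [
    "error disk full",
    "info job started",
    "error network down",
    "info job finished",
]
totals = parallel_word_count(log_lines, n_workers=2)
print(totals["error"], totals["info"], totals["job"])
```

Because each chunk is counted independently, adding workers (or machines) scales the map step, and only the small partial results need to be merged.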
Big Data Data
• Paradigm: “Store Now, Figure Out Later”
  • Data is the new resource. Never throw it away!
• Unstructured Data
  • Text Files
  • Images
  • Sounds
• Structured/Semi Structured Data
  • Sensors
  • Transactions
  • Logs
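One way to read “Store Now, Figure Out Later” is that raw records are kept exactly as they arrive and a structure is imposed only when someone reads them. The sketch below illustrates that idea under stated assumptions: the records, field names, and the `read_with_schema` helper are all hypothetical and not part of the original deck.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (one document per line).
raw_store = [
    '{"sensor": "t1", "temp_c": 21.5, "ts": "2014-03-01T10:00:00"}',
    '{"sensor": "t2", "temp_c": 19.0}',
    '{"user": "anna", "action": "login"}',
    'not json at all',
]


def read_with_schema(raw_lines, required_fields):
    """Parse raw lines on read and keep only records with the required fields."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines stay in the store; this reader skips them
        if isinstance(record, dict) and all(f in record for f in required_fields):
            yield {f: record[f] for f in required_fields}


# Two different questions asked of the same untouched raw store.
temperatures = list(read_with_schema(raw_store, ["sensor", "temp_c"]))
logins = list(read_with_schema(raw_store, ["user", "action"]))
print(temperatures)
print(logins)
```

Nothing is discarded at write time, so a question nobody thought of when the data was collected can still be answered later with a new reader.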
Data Storage
• RDBMS
  • SQL Server
• Hadoop
  • HDInsight
  • Hortonworks Data Platform
• Distributed File (Eco)System
  • CSV
  • JSON
  • *.*
Data Storage
• Hadoop Ecosystem
http://hortonworks.com/hadoop-modern-data-architecture/
Data Science & Big Data
• Data Science != Big Data
• Data Science is not only for Big Data
• Data Science can be applied to Big Data
• Data Science starts from Small Data
  • 1) Find the algorithm that extracts knowledge
  • 2) Measure the algorithm results in terms of probability
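The two steps above can be tried on a handful of rows before any Big Data infrastructure is involved. The following is a minimal sketch, assuming a made-up customer data set and a deliberately naive threshold rule as the candidate "algorithm"; accuracy, precision and recall are used here as simple estimated probabilities, which is one possible reading of "in terms of probability".

```python
def evaluate(predictions, labels):
    """Return accuracy, precision and recall as estimated probabilities."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p and y)          # predicted buy, did buy
    fp = sum(1 for p, y in pairs if p and not y)      # predicted buy, did not
    fn = sum(1 for p, y in pairs if not p and y)      # missed a buyer
    tn = sum(1 for p, y in pairs if not p and not y)  # correctly ruled out
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical small data set: (monthly_visits, bought_product)
customers = [(1, False), (2, False), (3, False), (4, True),
             (5, False), (6, True), (7, True), (8, True)]


def will_buy(monthly_visits, threshold=4):
    """Candidate rule: predict a purchase at or above a visit threshold."""
    return monthly_visits >= threshold


predictions = [will_buy(visits) for visits, _ in customers]
labels = [bought for _, bought in customers]
scores = evaluate(predictions, labels)
print(scores)
```

On this toy data the rule reaches an accuracy of 0.875, a precision of 0.8 and a recall of 1.0. In practice the measurement would be made on data the rule was not tuned on.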
Machine Learning
• Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. (Wikipedia)
  • For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.
• Flavors
  • Supervised
  • Unsupervised
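The spam example is a case of supervised learning: the system is trained on messages that already carry a label. A minimal sketch follows, using a naive Bayes classifier written in plain Python. The choice of naive Bayes, the four training messages, and the "spam"/"ham" label names are illustrative assumptions and do not come from the slides.

```python
import math
from collections import Counter


def train(messages):
    """Learn per-class word counts from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Prior: share of training messages in this class (assumed non-zero).
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training set, for illustration only.
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training)
print(classify("claim your free prize", model))
print(classify("team meeting on monday", model))
```

An unsupervised method would receive the same messages without the labels and would have to group them by similarity on its own.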
Data Analysis
• Common Data Scientist Tools
  • R
  • Weka
  • Octave
  • Scikit-Learn
• Common Data Scientist Languages
  • Python
  • Scala
  • F#
Resources
• https://www.coursera.org/
  • Data Scientist Specialization
• https://www.khanacademy.org/
  • Math
• http://www.osservatori.net/business_intelligence
  • Italian Big Data Market Analysis Resources
• http://www.solidq.com/consulting/
  • Data Science Services
  • Big Data / Business Intelligence / Data Warehousing