Presentation for Lawrence Chris: Do you know how to paste Gilberto’s sample presentation format into this Google Presentation? Steve: See if this works. (GV) ESIP Federation Earth Science Data Analytics (ESDA) and Data Scientists February 20, 2014

ESIP Federation

Earth Science Data Analytics (ESDA) and Data Scientists

February 20, 2014

Introductory Material

1. Analytics and Data Scientist...in the Federation (what we can contribute to the field)

Flesh out what expertise we have in the Federation on Analytics Techniques and Data Science. This can result in a collection of text summarizing our experience/expertise. Regarding Data Scientists: what would you like to see a Data Scientist do to help you in your work?

2.  Collaboration with the RDA Big Data Analytics Interest Group (Infrastructure Working Group) - Rahul will provide us a briefing

3.  NIST Big Data Program - Wo Chang will provide us a briefing

RDA and NIST Big Data initiatives have jointly focused their interests on the Big Data Analytics Interest Group, and in particular, the Infrastructure Working Group ‘to establish best practices implementation guidelines for how to deploy and manage big data applications using NIST Big Data Reference Architecture (NBD-RA) and other big data architectures along with best technologies available today to meet the ever challenging big data application demands’

As a group, is this something we should collaborate on? How can we contribute?

4.  Data Scientist as a data user

Data Scientists are, obviously, data users. Data scientists: What are your needs with regard to accessing and using Earth science data? We are looking for people with experience.

Agenda

Relevant Links

Education for Data Scientists

http://www.mastersindatascience.org 

RDA Big Data Analytics Interest Group Charter

https://rd-alliance.org/groups/big-data-analytics-ig/wiki/big-data-analytics-interest-group-charter.html

NIST Big Data Program

http://bigdatawg.nist.gov/home.php

ESDA Interest Group Concept

Federation Partners are forward thinking by nature, and are industry leaders positioned to apply smart, innovative ideas to conceptualizing and developing (implementing?) analytics tools and techniques that are specific to serving large heterogeneous datasets and that facilitate their use.

It truly appears that analytics, with data scientists to usher it in, is the logical next step in our quest up the pyramid to 'knowledge'.

We are not necessarily interested in repurposing existing tools and techniques, but understanding innovative tools and techniques that can be uniquely applied to large heterogeneous datasets.

The Ultimate Long Term Goal

On the continuum of ever evolving data management systems, we need to understand and develop ways that allow for data relationships to be examined, and information to be manipulated, such that knowledge can be enhanced, to facilitate science.

In short, we have a lot of data that we really have not provided opportunity for users to holistically ‘mine’.

Starting Point

Gartner’s big data definition –

“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (source: http://www.gartner.com/it-glossary/big-data/)

The V’s

Consider the “3 V’s”:

Data Management: Controlling Data Volume, Velocity and Variety. Current business conditions and mediums are pushing traditional data management principles to their limits, giving rise to novel and more formalized approaches

Big data spans four dimensions: Volume, Velocity, Variety, Veracity

o Volume: Enterprises are awash with ever-growing data of all types

o Velocity: Big data must be made available and used in a reasonable time period

o Variety: Big data is any type of data - structured and non-structured

o Veracity: Trusting the information to make decisions.

(Source: http://www-01.ibm.com/software/data/bigdata/)

Thus…

Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times

The next frontier is learning how to manage Big Data throughout its entire lifecycle.

(Source: http://www.forbes.com/sites/ciocentral/2012/07/05/best-practices-for-managing-big-data/)

Big data is only as good as your analytics

(Source: http://www.theserverside.com/news/2240178385/Big-data-trends-Big-things-in-store-for-2013)

Defining Terms: A Note

January 14:

At our kickoff session in January, it was noted that ‘large data’ is a relative term and it is something Earth sciences information management has been able to grow with.

Bringing ‘heterogeneous data’ together is the real challenge that faces Earth science research and applications

Analytics

Analytics vs Analysis

Analytics is a two-sided coin. On one side, it uses descriptive and predictive models to gain valuable knowledge from data - data analysis. On the other, analytics uses this insight to recommend action or to guide decision making - communication. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology. (Source: http://en.wikipedia.org/wiki/Analytics)

Analytics (Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/)

Data Scientist in the context of analytics

Data Scientist

A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. Perhaps the most important skill a data scientist possesses, however, is the ability to explain the significance of data in a way that can be easily understood by others. (Source: http://searchbusinessanalytics.techtarget.com/definition/Data-scientist)

Rising alongside the relatively new technology of big data is the new job title data scientist. While not tied exclusively to big data projects, the data scientist role does complement them because of the increased breadth and depth of data being examined, as compared to traditional roles. (Source: http://www-01.ibm.com/software/data/infosphere/data-scientist/)

Analytics Master's Degrees Programs

What Defines a Data Scientist?

Organizational Leadership, Statistics, Data analysis, Probability, Algorithms for Data Science, Data Engineering, Machine Learning, Exploratory Data Analysis, HealthCare Analytics, Business Analytics

- Woodstock, AGU IN43A-1638: The role of the Data Scientist is a hybrid one, of not quite belonging and yet highly valued. With the skills to support domain scientists with data and computational needs and communicate across domains, yet not quite able to do the domain science itself. Role of the data scientist: Provides access to unified technology (open standards); know seamless access, training scientists in IT, build community (with scientists)

- Evans, AGU IN43A-1641: Data Scientist, who has a greater capacity in mathematical, numerical modeling, statistics, computational skills, software engineering and spatial skills and the ability to integrate data across multiple domains.

Some Noteworthy Comments From Jan. 14

• The purpose of the analytics is not to improve on the product but to use it to extract new information from datasets that are not typically analyzed together, holistically.

• How do we create enough metadata, so that there is one tool that everyone can use? We need better metadata.

• Discovery vs. modeling: Discovery issues are being addressed in other venues and that might not be something we need to worry about here.  Instead, we ask: How to cut across one or multiple data sets and come up with an answer that transcends these datasets?

• Analytics can provide the ability for the data to tell stories that have not been thought about.

• How do we make sure that bad analytics manipulations get handled as well?

• Data scientists with appropriate domain expertise would know techniques that scientifically, not just mathematically, relate heterogeneous datasets.

Take away messages from Jan. 14:

• Address the ‘heterogeneous’ components of data/information, not the ‘big’ part.

• We have all this data, how do we make it more usable?

• Provide a generalized framework, albeit different from today’s tools and services framework structure, that will lead to advances in converting information to knowledge

• Examine use cases (see action) to flesh out what is needed to architect an analytics ‘framework’.

1. Analytics and Data Scientist...in the Federation (what we can contribute to the field)

• What expertise do we have in the Federation on Analytics Techniques and Data Science? 

This can result in an inventory of our experience/expertise and areas of interest to further pursue

• What would you like to see a Data Scientist do to help you  in your work?

I seriously do not think we can answer these questions in a telecon, but I would like folks to ‘sign up’ for one or both of these questions, according to their interest, to brainstorm together (at a later date) for answers.

Hmmm… OK, then e-mail me: [email protected]

2. RDA, and 3. NIST, Activities

2.  Collaboration with the RDA Big Data Analytics Interest Group (Infrastructure Working Group) - Rahul will provide us a briefing

3.  NIST Big Data Program - Wo Chang will provide us a briefing

RDA and NIST Big Data initiatives have jointly focused their interests on the Big Data Analytics Interest Group, and in particular, the Infrastructure Working Group ‘to establish best practices implementation guidelines for how to deploy and manage big data applications using NIST Big Data Reference Architecture (NBD-RA) and other big data architectures along with best technologies available today to meet the ever challenging big data application demands’

Another look at Analytics (Source: http://steinvox.com/blog/big-data-and-analytics-the-analytics-value-chain/)

4.  Data Scientist as a Data User

Here we address the Data Scientists in the house:

Data Scientists are, obviously, data users. 

Data scientists: What are your needs with regard to accessing and using Earth science data?

This information is very important for data providers, so that they know how best to satisfy the needs of one group of their users (see the User Needs Analysis work, Lynnes/Wolf).

Again, we won’t answer it here, but I am looking for Data Scientists for contributions. Please let me know who you are.

Thank you for your attendance and attention

• I am sure I was not able to note everybody who attended this telecon… so please e-mail me to say that you did:

[email protected]

• Also, please e-mail your interest in applying your expertise to answering the earlier questions. Do it now, while you are thinking about it. (If you don’t respond, we’ll be losing an opportunity)

• Notes will be posted on the Wiki

• Thanks again