1) DATAVERSE INITIATIVE; THE HARVARD DATAVERSE 2) DATA SCIENCE

Dr. Thadthong Bhrammanee

[email protected]

Apr 2, 2015

Workshop seminar "Open Data Repositories: Open Data, Dataverse, Data Science", 2-3 April 2015 (B.E. 2558)

Sukhothai Thammathirat Open University, Nonthaburi




PART 1: DATAVERSE INITIATIVE; THE HARVARD DATAVERSE

http://dataverse.org

https://thedata.harvard.edu/dvn/

“A Dataverse is a container for research data studies, customized and managed by its owner.”

[http://dataverse.org]


The Harvard Dataverse Network: a collaboration with the Institute for Quantitative Social Science, the Harvard Library, Harvard University Information Technology, and the Harvard-Smithsonian Center for Astrophysics (CfA)

Source: http://commons.wikimedia.org/wiki/File:Dataverse_Network_Diagram.jpg


DDI : Data Documentation Initiative

OAI-PMH : The Open Archives Initiative Protocol for Metadata Harvesting

Source: http://www.ddialliance.org/

Source: http://www.openarchives.org/pmh/

Source: https://www.oaforum.org/tutorial/image/DP-architecture.gif
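Example (a minimal sketch, not taken from the slides): OAI-PMH harvesting is an HTTP request carrying a `verb` argument, answered with an XML document. The endpoint `https://example.org/oai`, the helper names, and the sample response below are hypothetical; only the verb name, the `oai_dc` metadata prefix, and the XML namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


def record_identifiers(response_xml):
    """Collect the identifier of every record header in a response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://example.org/oai"
url = build_request(BASE_URL, "ListIdentifiers", metadataPrefix="oai_dc")

# Hand-written sample of what a harvester might receive (not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:study-1</identifier>
      <datestamp>2015-04-02</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:study-2</identifier>
      <datestamp>2015-04-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

identifiers = record_identifiers(SAMPLE_RESPONSE)
```

A real harvester would fetch `url` over HTTP and follow any resumption token in the response; both steps are left out here so the sketch runs offline.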


Add data

Find data

Get recognition

For researchers

For journals

Search data across Dataverse Repositories through Harvard Dataverse.

Data citation

Search Scope drop-down list:

1. Title - Title field of studies’ Cataloging Information.

2. Author - Author fields of studies’ Cataloging Information.

3. (Study) Global ID - ID assigned to studies.

4. Other ID - A different ID previously given to the study by another archive.

5. Abstract - Any words in the abstract of the study.

6. Keyword - A term that defines the nature or scope of a study. For example, elections.

7. Keyword Vocabulary - Reference to the standard used to define the keywords.

8. Topic Classification - One or more words that help to categorize the study.

9. Topic Classification Vocabulary - Reference used to define the Topic Classifications.

10. Producer - Institution, group, or person who produced the study.

11. Distributor - Institution that is responsible for distributing the study.

12. Funding Agency - Agency that funded the study.

13. Production Date - Date on which the study was created or completed.

14. Distribution Date - Date on which the study was distributed to the public.

15. Date of Deposit - Date on which the study was uploaded to the Network.

16. Time Period Cover Start - The beginning of the period covered by the study.

17. Time Period Cover End - The end of the period covered by the study.

18. Country/Nation - The country or countries where the study took place.

19. Geographic Coverage - The geographical area covered by the study. For example, North America.

20. Geographic Unit - The smallest geographic unit in which the study took place, such as state.

21. Universe - Universe of interest, population of interest, or target population.

22. Kind of Data - The type of data included in the file, such as survey data, census/enumeration data, or aggregate data.

23. Variable Information - The variable name and description in the studies’ data files, given that the data file is subsettable and contains tabular data. It returns the studies that contain the file and the variable name where the search term was found.

Source: http://thedata.harvard.edu/guides/dataverse-user-main.html#harvesting-section
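Example (an illustrative sketch only): the Search Scope list can be read as a field-scoped search, where the chosen scope decides which cataloging field is matched against the search term. The record layout, field names, and the `search_studies` helper below are assumptions made for illustration and are not the Dataverse implementation.

```python
def search_studies(studies, scope, term):
    """Return the studies whose `scope` field contains `term` (case-insensitive)."""
    needle = term.lower()
    hits = []
    for study in studies:
        value = study.get(scope, "")
        # A field may hold one string or several (for example, many keywords).
        values = value if isinstance(value, list) else [value]
        if any(needle in v.lower() for v in values):
            hits.append(study)
    return hits


# Hypothetical cataloging records, invented for this example.
STUDIES = [
    {
        "title": "Voter Turnout Survey",
        "author": "Doe, Jane",
        "keyword": ["elections", "turnout"],
        "country": "Thailand",
    },
    {
        "title": "Household Income Census",
        "author": "Roe, Richard",
        "keyword": ["income"],
        "country": "United States",
    },
]

by_keyword = search_studies(STUDIES, "keyword", "elections")
by_title = search_studies(STUDIES, "title", "census")
```

Searching a scope that a record does not carry simply returns no hit for that record, which mirrors choosing a scope that a study left empty.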

Data citation

Source: http://best-practices.dataverse.org/data-citation/

Joint Declaration of Data Citation Principles

Human-readable vs. machine-readable components

DATASET VERSIONING

Source: http://best-practices.dataverse.org/data-citation/
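Example (a hedged sketch, assuming a citation is built from author, year, title, persistent identifier, publisher, and version): the same components can be rendered once for people and once for machines. The component order, the `V1` version style, and every value below are assumptions for illustration; the exact format used by a given repository may differ.

```python
import json


def human_readable_citation(c):
    """Join the citation components into one line meant for readers."""
    authors = "; ".join(c["authors"])
    return (
        f'{authors}, {c["year"]}, "{c["title"]}", '
        f'{c["identifier"]}, {c["publisher"]}, V{c["version"]}'
    )


def machine_readable_citation(c):
    """Serialize the same components as JSON for software to parse."""
    return json.dumps(c, sort_keys=True)


# Placeholder values; the identifier is not a real DOI.
citation = {
    "authors": ["Doe, Jane"],
    "year": 2015,
    "title": "Example Survey Data",
    "identifier": "doi:10.0000/EXAMPLE",
    "publisher": "Example Dataverse",
    "version": 1,
}

text = human_readable_citation(citation)
payload = machine_readable_citation(citation)
```

Keeping the version as its own component means a citation can point at one specific release of a dataset rather than at whatever is current.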

Source: http://datascience.iq.harvard.edu/files/datascience/files/rda2015-data-publishing-workflows-ecastro.pdf


Example:


Example:


Example:

Example:

Future:

Events:


PART 2: DATA SCIENCE

• Variety (structured -> unstructured)
• Volume (terabytes -> zettabytes)
• Velocity (batch -> streaming data)

Information

Information overload

Filtering

• Linking
• Tagging (metadata)
• Taxonomies, folksonomies
• Collective intelligence
• Social tagging

• Social filter
• Who, What, When, Where, Why

• Gather/Harvest
• Extract/Analyze
• Explore/Visualize


Source: http://projects.iq.harvard.edu/files/iqss-harvard/files/data-science_jan2015.png

Source: https://www.coursera.org/course/datasci

Course Syllabus

Part 0: Introduction

• Examples, data science articulated, history and context, technology landscape

Part 1: Data Manipulation at Scale

• Databases and the relational algebra

• Parallel databases, parallel query processing, in-database analytics

• MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages

• Key-value stores and NoSQL; tradeoffs of SQL and NoSQL

Part 2: Analytics

• Topics in statistical modeling: basic concepts, experiment design, pitfalls

• Topics in machine learning: supervised learning (rules, trees, forests, nearest neighbor, regression), optimization (gradient descent and variants), unsupervised learning

Part 3: Communicating Results

• Visualization, data products, visual data analytics

• Provenance, privacy, ethics, governance

Part 4: Special Topics

• Graph Analytics: structure, traversals, analytics, PageRank, community detection, recursive queries, semantic web

• Guest Lectures
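Example (not part of the course material; a single-process sketch): the MapReduce item in Part 1 of the syllabus can be illustrated with a word count. The function names below are hypothetical, and a real framework such as Hadoop would run the map and reduce phases in parallel across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["open data", "data science", "open data repository"])
```

The three phases only communicate through key-value pairs, which is what lets a framework distribute them.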


Source: http://i2.wp.com/sciencereview.berkeley.edu/wp-content/uploads/2014/04/spring_2014_azam_01.jpg

The Berkeley Institute for Data Science (BIDS)


Source: http://i2.wp.com/sciencereview.berkeley.edu/wp-content/uploads/2014/04/ds_staticmap.png

Data science at UC Berkeley


Thank you :)
