1) DATAVERSE INITIATIVE; THE HARVARD DATAVERSE
2) DATA SCIENCE
Dr. Thadthong Bhrammanee
Apr 2, 2015
Workshop “Open Data Repositories: Open Data, Dataverse, Data Science”, 2–3 April 2015 (B.E. 2558)
Sukhothai Thammathirat Open University, Nonthaburi Province
PART 1: DATAVERSE INITIATIVE; THE HARVARD DATAVERSE
http://dataverse.org
https://thedata.harvard.edu/dvn/
“A Dataverse is a container for research data studies, customized and managed by its owner.”
[http://dataverse.org]
The Harvard Dataverse Network: a collaboration among the Institute for Quantitative Social Science, the Harvard Library, Harvard University Information Technology, and the Harvard-Smithsonian Center for Astrophysics (CfA)
Source: http://commons.wikimedia.org/wiki/File:Dataverse_Network_Diagram.jpg
DDI : Data Documentation Initiative
OAI-PMH : The Open Archives Initiative Protocol for Metadata Harvesting
Source: http://www.ddialliance.org/
Source: http://www.openarchives.org/pmh/
Source: https://www.oaforum.org/tutorial/image/DP-architecture.gif
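OAI-PMH runs over plain HTTP: a harvester sends a GET request carrying a `verb` parameter (for example `Identify` or `ListRecords`) and receives XML in the `http://www.openarchives.org/OAI/2.0/` namespace. The minimal Python sketch below builds a `ListRecords` request with the Dublin Core prefix `oai_dc` and pulls record identifiers and the `resumptionToken` out of a response. The endpoint `https://example.org/oai` and the sample response are hypothetical placeholders; no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def build_oai_request(base_url, verb, **params):
    """Return an OAI-PMH GET URL such as ...?verb=ListRecords&metadataPrefix=oai_dc."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"

def parse_list_records(xml_text):
    """Extract record identifiers and the resumptionToken (None when absent or empty)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

# Hypothetical endpoint; a real repository publishes its own OAI-PMH base URL.
url = build_oai_request("https://example.org/oai", "ListRecords", metadataPrefix="oai_dc")

# Made-up response fragment, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>doi:10.1234/EXAMPLE/A</identifier></header></record>
    <record><header><identifier>doi:10.1234/EXAMPLE/B</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
print(url)
print(ids, token)
```

To harvest a complete set, a client repeats the request as `verb=ListRecords&resumptionToken=<token>` until a response comes back without a token.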
Add data
Find data
Get recognition
For researchers
For journals
Search data across Dataverse repositories through Harvard Dataverse.
Data citation
Search Scope drop-down list:
1. Title - Title field of studies’ Cataloging Information.
2. Author - Author fields of studies’ Cataloging Information.
3. (Study) Global ID - ID assigned to studies.
4. Other ID - A different ID previously given to the study by another archive.
5. Abstract - Any words in the abstract of the study.
6. Keyword - A term that defines the nature or scope of a study. For example, elections.
7. Keyword Vocabulary - Reference to the standard used to define the keywords.
8. Topic Classification - One or more words that help to categorize the study.
9. Topic Classification Vocabulary - Reference used to define the Topic Classifications.
10. Producer - Institution, group, or person who produced the study.
11. Distributor - Institution that is responsible for distributing the study.
12. Funding Agency - Agency that funded the study.
13. Production Date - Date on which the study was created or completed.
14. Distribution Date - Date on which the study was distributed to the public.
15. Date of Deposit - Date on which the study was uploaded to the Network.
16. Time Period Cover Start - The beginning of the period covered by the study.
17. Time Period Cover End - The end of the period covered by the study.
18. Country/Nation - The country or countries where the study took place.
19. Geographic Coverage - The geographical area covered by the study. For example, North America.
20. Geographic Unit - The smallest geographic unit in which the study took place, such as state.
21. Universe - Universe of interest, population of interest, or target population.
22. Kind of Data - The type of data included in the file, such as survey data, census/enumeration data, or aggregate data.
23. Variable Information - The variable name and description in the studies’ data files, provided the data file is subsettable and contains tabular data. Returns the studies containing the file and the variable name in which the search term was found.
Source: http://thedata.harvard.edu/guides/dataverse-user-main.html#harvesting-section
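Choosing a search scope restricts matching to a single metadata field instead of all fields. The toy Python sketch below illustrates the idea on a small in-memory list of studies; the field names (`title`, `author`, `keyword`) and the `hdl:TEST/...` IDs are hypothetical stand-ins for the Cataloging Information fields listed above, not the Dataverse search implementation.

```python
# Hypothetical study records standing in for Cataloging Information.
STUDIES = [
    {"global_id": "hdl:TEST/001", "title": "Election Survey 2012", "author": "Smith", "keyword": ["elections"]},
    {"global_id": "hdl:TEST/002", "title": "Rainfall in North America", "author": "Lee", "keyword": ["climate"]},
    {"global_id": "hdl:TEST/003", "title": "Census Summary", "author": "Election Lab", "keyword": ["census"]},
]

def search(studies, term, scope=None):
    """Return global IDs of studies whose field(s) contain term; scope=None searches all fields."""
    term = term.lower()
    hits = []
    for study in studies:
        fields = [scope] if scope else [k for k in study if k != "global_id"]
        for field in fields:
            value = study.get(field, "")
            values = value if isinstance(value, list) else [value]
            if any(term in str(v).lower() for v in values):
                hits.append(study["global_id"])
                break  # one matching field is enough for this study
    return hits

print(search(STUDIES, "election"))                # every field
print(search(STUDIES, "election", scope="title"))  # Title scope only
```

Narrowing the scope trades recall for precision: the unrestricted query also matches a study whose author name happens to contain the term.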
Data citation
Source: http://best-practices.dataverse.org/data-citation/
Joint Declaration of Data Citation Principles
Human-readable vs. machine-readable components
Source: http://datascience.iq.harvard.edu/files/datascience/files/rda2015-data-publishing-workflows-ecastro.pdf
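The split between human-readable and machine-readable citation can be sketched by storing the citation components as structured data and rendering the display string from them. This Python sketch assumes a component set (authors, year, title, persistent identifier, repository, version) and an ordering loosely modeled on how Dataverse displays citations; the field names and the DOI are placeholders, not a normative citation format.

```python
import json

def render_citation(c):
    """Build a human-readable citation string from machine-readable components."""
    authors = "; ".join(c["authors"])
    return f'{authors}, {c["year"]}, "{c["title"]}", {c["pid"]}, {c["repository"]}, V{c["version"]}'

# Hypothetical components for an example dataset.
components = {
    "authors": ["Doe, Jane", "Roe, Richard"],
    "year": 2015,
    "title": "Example Survey Data",
    "pid": "doi:10.1234/EXAMPLE",  # placeholder persistent identifier
    "repository": "Harvard Dataverse",
    "version": 1,
}

print(render_citation(components))       # for human readers
print(json.dumps(components, indent=2))  # for indexers and harvesters
```

Keeping the structured form as the source of truth means the display string can be regenerated in any citation style without re-parsing text.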
PART 2: DATA SCIENCE
• Variety (structured -> unstructured)
• Volume (terabytes -> zettabytes)
• Velocity (batch -> streaming data)
Information
Information overload
Filtering
• Linking
• Tagging (metadata)
• Taxonomies, folksonomies
• Collective intelligence
• Social tagging
• Social filter
• Who, What, When, Where, Why
• Gather/Harvest
• Extract/Analyze
• Explore/Visualize
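Tagging and social filtering counter information overload by letting many users label items, so a reader can filter by label and see which labels the crowd applies most. A minimal Python sketch of a folksonomy follows, assuming hypothetical (user, item, tag) triples; it builds an inverted index for filtering and a tag count as the simplest form of collective intelligence.

```python
from collections import Counter, defaultdict

# Hypothetical social-tagging data: (user, item, tag)
TAGGINGS = [
    ("ann", "dataset-1", "elections"),
    ("bob", "dataset-1", "survey"),
    ("cat", "dataset-2", "climate"),
    ("ann", "dataset-3", "elections"),
    ("bob", "dataset-3", "elections"),
]

def build_index(taggings):
    """Inverted index: tag -> set of items carrying that tag."""
    index = defaultdict(set)
    for _user, item, tag in taggings:
        index[tag].add(item)
    return index

def filter_by_tag(index, tag):
    """Items labelled with tag, sorted; .get avoids inserting unknown tags."""
    return sorted(index.get(tag, set()))

# How often each tag was applied across all users.
tag_popularity = Counter(tag for _user, _item, tag in TAGGINGS)

index = build_index(TAGGINGS)
print(filter_by_tag(index, "elections"))
print(tag_popularity.most_common(1))
```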
Source: https://www.coursera.org/course/datasci
Course Syllabus
Part 0: Introduction
• Examples, data science articulated, history and context, technology landscape
Part 1: Data Manipulation at Scale
• Databases and the relational algebra
• Parallel databases, parallel query processing, in-database analytics
• MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages
• Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
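The MapReduce model listed in Part 1 splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that aggregates each group. The single-process Python sketch below runs the classic word-count job through those three phases; a framework such as Hadoop executes the same phases in parallel across machines, which this toy does not attempt.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all counts for one word."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

docs = ["open data open science", "data science"]
print(word_count(docs))
```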
Part 2: Analytics
• Topics in statistical modeling: basic concepts, experiment design, pitfalls
• Topics in machine learning: supervised learning (rules, trees, forests, nearest neighbor, regression), optimization (gradient descent and variants), unsupervised learning
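Gradient descent, listed under optimization in Part 2, repeatedly moves the parameters a small step against the gradient of a loss. The minimal Python sketch below fits a one-variable linear regression y ≈ w·x + b by batch gradient descent on mean squared error, using the gradients (2/n)·Σ(w·xᵢ + b − yᵢ)·xᵢ for w and (2/n)·Σ(w·xᵢ + b − yᵢ) for b. The learning rate and step count are arbitrary illustrative choices, and the data are made up to lie exactly on y = 2x + 1.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up points on y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

If the learning rate is too large the updates overshoot and diverge; variants such as stochastic or mini-batch gradient descent change how much data each step uses.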
Part 3: Communicating Results
• Visualization, data products, visual data analytics
• Provenance, privacy, ethics, governance
Part 4: Special Topics
• Graph Analytics: structure, traversals, analytics, PageRank, community detection, recursive queries, semantic web
• Guest Lectures
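PageRank, listed under graph analytics in Part 4, scores each node by the long-run probability that a random surfer is there, where the surfer follows an out-link with probability d and otherwise jumps to a random node. The Python sketch below computes it by power iteration on a small hypothetical graph. The damping factor 0.85 is the commonly cited default, and spreading the rank of dangling nodes (no out-links) uniformly is one convention among several.

```python
def pagerank(graph, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration. graph maps node -> list of nodes it links to."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = d * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Hypothetical link graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
print(max(scores, key=scores.get))
```

Node C ranks highest because three of the four nodes link to it, while D, which nothing links to, keeps only the random-jump share (1 − d)/n.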
Source: http://i2.wp.com/sciencereview.berkeley.edu/wp-content/uploads/2014/04/spring_2014_azam_01.jpg
The Berkeley Institute for Data Science (BIDS)
Source: http://i2.wp.com/sciencereview.berkeley.edu/wp-content/uploads/2014/04/ds_staticmap.png
Data science at UC Berkeley