Report on Data Activities in China
Vice-President of CODATA-China; General Director of CNIC, CAS
Dr. Yan Baoping [email protected]
CODATA-DSAO, Bangkok, Thailand, Jan.12-12,2006
Outline
• Requirements on Scientific Data
• Main Data Activities
• National Programs on Data Activities
• Summary
Requirements on Scientific Data
• Scientific discovery & innovation
– Data form the foundation of scientific discovery
– In the past, scientific data explained the observable world
  • Extraction of essence
  • Explanation of the complex
  • Prediction from data
– Today, we have exciting new capabilities to observe nature
  • Requires and generates large amounts of quality data
  • The information revolution arrives with computer and Internet technology
  • Data are at the very heart of the revolution
Requirements on Scientific Data
• Data sharing and exchange
– Valuable national strategic resource
– Full and open access for the public in a timely and equitable manner
– Closer cooperation and communication among scientists
– The frontiers of science and big-science plans need large-scale, wide-scope data support
Main Data Activities
Scientific Data Activities
• Data sharing and exchange
• Data sharing policies
• Data and metadata specification
• Database construction and integration
• Data quality control and assessment
• Information systems and platforms
• Applications and services
• Network
• Supercomputer
• Storage
• Support and consulting
National Programs on Data Activities
1. Scientific Database and Information System: founded by CAS in 1982
2. Platform Construction for National S&T Infrastructure: founded by MOST in 1999; supported by MOST, MOE, MOF and NDRC since 2004
3. Scientific Basic Resource Platform of MOE: founded by MOE in 2004
4. Land and Resource Data Center: founded by MLR in 2003
5. NSFC Scientific Data Collection and Sharing
Scientific Database and Information System (SDB)
• Founded by the Chinese Academy of Sciences in 1982
• 1986-2000
– 725 GB
– 180 databases
– 19 member institutes involved
• 2001-2005
– CAS initiated the Informatization Project
– SDB is one of the infrastructures of CAS informatization
Scientific Database and Information System (SDB)
• 2002: three sub-projects of SDB initiated
– Scientific Data Resource Construction
– Scientific Data Standard and Specification Making
– System Platform Construction (Scientific Data Grid, SDG)
• 2003: Prof. Jiang Mianheng, Deputy President of CAS, signed off to formally initiate the “Scientific Database and Information System Project”
• 2000-2005: CAS provided US$ 7.50 million in support
Objectives of SDB
• Objectives
– Expand and strengthen data accumulation, integration and sharing
– Improve the digital environment for S&T research
– Set up an information and data service system for S&T research and social development
– Promote the transformation of data into knowledge
Organization of SDB
• Expert Committee (EC)
• Executive Office (EO)
• Scientific Database Center (SDC)
• Member institutes
[Organization chart: EC; EO; SDC (CNIC, CAS); member institutes: Inst. of Geography, Inst. of Process Engineering, Inst. of Microbiology, ...]
Technical Training and Exchange
• Biennial SDB Technical Symposium
– During the Tenth Five-Year Plan: three symposiums and three publications
• Annual technical training
– Five technical training sessions
Achievements and Activities of SDB
1. Metadata Specification
2. Databases
3. Quality Control and Assessment
4. Storage
5. Portal
6. Supercomputer
7. Application
8. High-speed network
9. Supports
10. e-Science
[Figure: Website service availability rate, monthly, July 2004 to December 2005; vertical axis 0% to 100%]
Growth of SDB Data Volume
• By 2005
– 45 institutes of CAS
– 503 databases
– A gross volume of 16.6 TB
– 10 TB available on the Internet
SDB Standards & Specifications
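SDB Metadata Specification Framework
• SDB Metadata Standard Framework (versions 1.0, 2.0)
SDB Metadata Specifications
• SDB Core Metadata Standard (versions 1.0, 1.1, 2.0)
• Atmospheric Science Data Metadata Specification (version 1.0)
• Ecological Research Scientific Data Metadata Specification (versions 1.0, 1.1)
• SDB Plant Image Metadata Specification (version 1.0)
• Yangbajing Cosmic Ray Scientific Data Metadata Specification (version 1.0)
• Biological Species Cataloguing Data Metadata Specification (versions 1.0, 2.0)
• Gas Hydrate Scientific Data Markup Language (GHML) (draft for comment)
• Avian Influenza Scientific Data Metadata Specification (version 1.0)
• General Metadata Specification for Image Scientific Data (version 1.0)
SDB Data Quality Control & Assessment Specifications
• Data Quality Research Report
• SDB Data Quality Control and Assessment Framework
• SDB Data Quality Evaluation Process
• SDB Data Quality Management Measures (trial)
• Data Quality Standard System for the Geosciences
SDB Data Sharing Policies
• SDB Sharing Policy Research Report
• SDB Data Sharing Measures (trial)
• Database Sharing Guidelines and Database Construction Specifications for Chemistry, Biology and Earth Sciences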
SDB Metadata Registry System
Registry Search
Explore
Mapping
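The three registry functions named on this slide (search, explore, mapping) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and field names (`MetadataRecord`, `MetadataRegistry`, `identifier`, `title`, `keywords`), the crosswalk table, and the demo records are hypothetical and are not taken from the SDB specifications or the actual registry system.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical core record; the real SDB core metadata fields are not shown in the slides."""
    identifier: str
    title: str
    keywords: list = field(default_factory=list)


class MetadataRegistry:
    """Toy in-memory registry supporting registration, search and browsing."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.identifier] = record

    def search(self, term):
        # Case-insensitive substring match on the title or any keyword.
        term = term.lower()
        return [
            r for r in self._records.values()
            if term in r.title.lower() or any(term in k.lower() for k in r.keywords)
        ]

    def explore(self):
        # Browse all registered titles in alphabetical order.
        return sorted(r.title for r in self._records.values())


def map_to_core(domain_record, crosswalk):
    """Rename the fields of a domain-specific record to the core field names."""
    core = {core_name: domain_record[domain_name]
            for domain_name, core_name in crosswalk.items()}
    return MetadataRecord(**core)


# Assumed crosswalk from an invented domain schema to the hypothetical core schema.
CROSSWALK = {"id": "identifier", "dataset_name": "title", "subject_terms": "keywords"}

registry = MetadataRegistry()
registry.register(MetadataRecord("demo-001", "Cosmic ray observations (demo)",
                                 ["cosmic ray", "physics"]))
registry.register(map_to_core(
    {"id": "demo-002", "dataset_name": "Plant images (demo)",
     "subject_terms": ["botany", "image"]},
    CROSSWALK,
))

hits = registry.search("image")
print([r.identifier for r in hits])
print(registry.explore())
```

The mapping step is kept as a plain dictionary so that each domain specification could carry its own crosswalk to a shared core record; this is a design choice of the sketch, not a description of how the SDB registry is built.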
SDB System Platform
• Software and Hardware
– Super server
– TB-scale storage
– Visualization system
– High-speed network
– System software
Super Server: Lenovo 6800
• 59 nodes (4-way)
• SAN: 20 TB; tape: 50 TB
• 2 Gbps network bandwidth
High Speed Network
CAS e-Infrastructure
Infrastructure item      By 2000     2001 to now
Networking
  Core                   1 Gbps      2.5 Gbps
  Backbone               2 Mbps      N*155M + 2.5G
  Overseas link          55 Mbps     620M + 12G
HPC
  Peak TFLOPS            0.13        5.5
  Linpack TFLOPS         0.05        4.3
Storage                  2.1 TB      182 TB
Scientific Database
  Member institutes      21          >45
  Databases              180         503
  Data volume            725 GB      16.6 TB
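To make the scale of change in the CAS e-Infrastructure figures easier to read, the growth factors can be computed directly from them. This is a small sketch: the numbers are copied from the table, while the conversion of 1 TB = 1000 GB is an assumption, since the slides do not say whether decimal or binary units are meant.

```python
# Values copied from the CAS e-Infrastructure table: (by 2000, 2001 to now).
GB_PER_TB = 1000  # assumption: decimal units; the slides do not specify

rows = {
    "Peak TFLOPS": (0.13, 5.5),
    "Linpack TFLOPS": (0.05, 4.3),
    "Storage (TB)": (2.1, 182.0),
    "Databases": (180, 503),
    "Data volume (TB)": (725 / GB_PER_TB, 16.6),
}


def growth_factor(before, after):
    """Return how many times larger the later value is than the earlier one."""
    return after / before


factors = {name: growth_factor(before, after) for name, (before, after) in rows.items()}

for name, factor in factors.items():
    print(f"{name}: x{factor:.1f}")
```

Under that unit assumption, the data volume grew roughly 23-fold while the number of databases grew a little under 3-fold, so the average database became considerably larger over the period.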
SDB Portal
http://www.csdb.cn
• 7x24 service
• Log in once, access all databases
SDB On-line Data Serving
• August 2003 to October 2005
– Number of visits: 2.50 million
– Number of page views: 18.50 million
[Figure: Statistics of the SDB portal]
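The portal totals can be turned into rough averages with a short calculation. This sketch assumes the reporting period covers both end months in full (August 2003 and October 2005), which the slide does not state explicitly.

```python
def months_inclusive(start, end):
    """Count calendar months from start to end, counting both end months."""
    (start_year, start_month), (end_year, end_month) = start, end
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


# Totals reported for the SDB portal.
visits = 2_500_000
page_views = 18_500_000

# Assumption: the period runs from August 2003 through October 2005 inclusive.
months = months_inclusive((2003, 8), (2005, 10))

visits_per_month = visits / months
pages_per_visit = page_views / visits

print(f"Months covered: {months}")
print(f"Average visits per month: {visits_per_month:,.0f}")
print(f"Average pages per visit: {pages_per_visit:.1f}")
```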
SDB Typical Applications
• 2000-2005: more than 100 application cases
• Scientific research
– Space environment data are used to ensure the safety of manned spaceflight
– Chemical data are used in anti-SARS research
– Avian flu is identified using data in the Virus Database
– Natural resource data are used to study the evolution of ecology and the environment, and the sustainable development of soil and water in West China
– Genetic data are used in SARS gene studies
– …
• Social and economic development
Applications Based on SDB
• International cosmic ray data processing system
– IHEP of CAS, CNIC of CAS
• Avian Flu Information Platform and Alarm System
– IMB of CAS, IV of CAS, IZ of CAS, CNIC of CAS
[Figure: Yangbajing Cosmic Ray System]
[Figure: Avian Flu Prediction System]
Platform Construction for National S&T Infrastructure (MOST)
Founded by MOST in 1999
• Experimental base and large-scale scientific apparatus sharing platform
• Natural scientific resource sharing platform
• Scientific Data Sharing Platform (SDSP)
• Scientific literature sharing platform
• Achievement-transfer and public service platform
• Science research network environment
Scientific Data Sharing Platform (SDSP)
• More than US$ 12 million
• The whole project is divided into three sub-systems
– 1 portal system
– 20 scientific data centers or scientific data networks
– Over 300 main databases
• By 2005
– 12 trial data-sharing scientific data centers have been established
– Data sharing policies have been made
– Metadata specifications have been made
– The portal system has been developed
Portal of SDSP
http://www.sciencedata.cn
12 Science Data Centers (SDC) of SDSP
• Basic SDC
• Earthquake SDC
• Oceanic SDC
• Sustainable Development SDC
• Hydrological SDC
• Rural S&T SDC
• Medicine SDC
• Earth System SDC
• Forest SDC
• Survey & Mapping SDC
• Meteorological SDC
• Agriculture SDC
SDC Website List
National Science Data Sharing Platform http://www.sciencedata.cn
Data-sharing Network of meteorological research http://cdc.cma.gov.cn/
Data-sharing Network of Chinese Earth System Science http://www.geodata.cn
Data-sharing Network for the Science of Surveying and Mapping http://sms.webmap.cn/
Data-sharing Network of Chinese information on sustainable development http://www.sdinfo.net.cn/
Chinese research net for Forestry http://www.caf.ac.cn/
Data-sharing Center of national Earthquake Sciences http://www.csdi.ac.cn/
China Hydro-information Network http://www.hydroinfo.gov.cn/
China fundamental database for Agricultural Sciences http://casdb.caas.net.cn/
China Rural Scientific Data Center http://www.crst.cn/
China Oceanic Data Center http://mds.coi.gov.cn/
China Medicine and Health Data Center http://www.bmi.ac.cn/sjgx/
Basic Science Center (CODATA-China involved) http://www.nsdc.cn/
Scientific Basic Resource Platform of MOE
• Founded by the Ministry of Education in 2004
• Integrates distributed scientific base resources across universities using advanced information and communication technology
• 17 subsystems; about 100 universities are involved
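List of the Universities Involved
Columns: No.; project number; S&T infrastructure platform name; lead institution; participating institutions
1; 505001; Standardized Collation, Integration and Sharing of Livestock and Poultry Germplasm Resources; China Agricultural University; Nanjing Agricultural University, Huazhong Agricultural University, Northwest A&F University
2; 505002; Standardized Collation, Integration and Sharing of Forest Tree and Flower Germplasm Resources; Beijing Forestry University; Northeast Forestry University, Nanjing Forestry University
3; 505003; Marine Geological Environment Data Integration and Sharing Information Platform; Tongji University; Nanjing University, China University of Geosciences (Beijing)
4; 505004; Chinese University Digital Museum Sharing Platform and Standards; Nanjing University; Beihang University, Sun Yat-sen University, Fudan University, China University of Geosciences (Wuhan), Peking University
5; 505005; Standardized Collation, Integration and Sharing of Special Crop Genetic Resources; Nanjing Agricultural University; Huazhong Agricultural University, China Agricultural University, Zhejiang University
6; 505006; Standardized Collation, Integration and Sharing Information Platform for Industrial Microbial Resources; Jiangnan University; Shandong University
7; 505007; MOE Sharing Platform for the Endangered Wildlife Gene Resource Bank; Zhejiang University; Northwest University, Northeast Forestry University
8; 505008; Marine Data and Information Sharing Platform; Ocean University of China; Sun Yat-sen University, Xiamen University, Nanjing University
9; 505009; Standardized Collation, Integration and Information Sharing Platform for University Microbial Resources; Wuhan University; Shandong University, Nankai University, China Agricultural University, Yunnan University
10; 505010; Human Genetic Information Data Integration and Sharing Information Platform; Huazhong University of Science and Technology; Xi'an Jiaotong University, Fudan University, Southeast University, Northwest University
11; 505011; Chinese University Geoscience Innovation Infrastructure Platform; China University of Geosciences (Wuhan); China University of Geosciences (Beijing), China University of Mining and Technology (Beijing), Peking University, Nanjing University, Northwest University
12; 505012; Chinese Human Genetic Disease Resource Bank and Sharing Information Platform; Central South University; Xi'an Jiaotong University, Fudan University
13; 505013; Biological Specimen Collation, Integration and Sharing Information Platform; Sun Yat-sen University; Sichuan University, Lanzhou University, Inner Mongolia Agricultural University, Northeast Normal University, Nanjing University
14; 505014; Chinese Women and Children Disease Surveillance System and Biological Resource Platform; Sichuan University; Shandong University
15; 505015; Chinese Ethnic Population Genetic Resources Data Integration and Sharing Platform; Xi'an Jiaotong University; Fudan University, Sun Yat-sen University, Harbin Medical University, Sichuan University, Lanzhou University, Xinjiang Medical University, Tibet University, etc.
16; 505016; Data Platform for Characteristic Plant Germplasm Resources of Western China; Lanzhou University; Northwest A&F University, Qinghai University, Inner Mongolia Agricultural University, Ningxia University, Xinjiang University, Tibet University
17; 505017; National University S&T Achievement Promotion and Transfer Information Platform; Science and Technology Development Center; (none listed)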
Land and Resource Data Center (LRDC)
• Founded by the Ministry of Land and Resources in 2003
– Information Center of the Ministry of Land and Resources (leading organization)
– Bureau of Geological Survey
– Chinese Academy of Geological Sciences
– Chinese Land Surveying and Planning Institute under the Ministry of Land and Resources
• 37 databases
Activities and Achievements of LRDC
Land and Resource Data Center (LRDC), founded by MLR in 2003
• Land and resource data standardization and integration
• Information service system platform
• Land and resource data sharing policies and specifications
• Public service system, such as data retrieval based on metadata
System Platform of LRDC
http://www.mlr.gov.cn/pub/sjgx/index.htm
NSFC Scientific Data Collection and Sharing
• In recent years, all projects supported by NSFC have been required to provide their complete scientific data
• NSFC has also issued a series of regulations for managing and sharing scientific data
• Scientific data are playing a key role in the national S&T innovation
Summary
• The main data activities in China include metadata specification, data integration, information systems, applications and services, networks and supercomputers.
• The Scientific Data Sharing Platform (SDSP) spent over US$ 12 million to support data activities in the past 5 years.
• In 2005, the Ministry of Science and Technology, Ministry of Finance, Ministry of Education and National Development and Reform Commission issued the “Suggestion on Implementing the Platform Construction for National S&T Infrastructure during the 11th Five-Year Plan”, which will strengthen the construction of the national science research foundation. In particular, the budget for scientific data sharing over the next 5 years reaches US$ 375 million.
Summary
• In the near future, the Chinese government will organize a leading committee under the State Council to guide and supervise scientific data activities. Meanwhile, databases such as SDB, LRDC and the Data Platform of MOE will be gradually integrated into SDSP, which will serve as the support platform for S&T data in the national S&T innovation system.
• SDB has achieved a lot over more than 20 years: we produced the SDB metadata specifications, established the supercomputing and high-speed network environment, accumulated and integrated 16.6 TB of databases, and developed a unified information platform.
Summary
• SDB now serves as the support system of S&T data for CAS S&T innovation.
• In the next five years, CAS will promote scientific research and S&T innovation by establishing the e-Science environment, and we believe the whole of society will benefit from e-Science.
• We have gained experience and learned lessons in the course of our scientific data activities, and we hope to facilitate cooperation with other countries in scientific data activities as well as in scientific research.
Comments are welcome!
Thank you very much.