[IEEE 2013 IEEE 9th International Conference on eScience (eScience) - Beijing, China (2013.10.22-2013.10.25)] 2013 IEEE 9th International Conference on e-Science - Keynote abstracts

Keynote Speakers

The Data Bonanza: Mining the Benefits while Escaping the Pitfalls

Malcolm Atkinson

School of Informatics, University of Edinburgh

Thanks to the rapid global digital revolution mankind has an ever-growing wealth of data. This burgeoning data wealth delivers a goldmine of opportunities and we see the frenzy of a new digital gold rush. Those societies, organizations and individuals who are agile and smart in their exploitation of data are leaping ahead. Today’s pressing global challenges demand even better use of data. The talk will illustrate the potential and then pose the question: “Are we going about this in the right way?” We will see a mixed story – wonderful achievements and worrying disappointments. A strategy will be presented to improve our performance in exploiting the data bonanza. Its key ingredients include:

Partitioning the conceptual space
Balancing investment
Establishing professional behavior
Transforming education

Bio

Malcolm Atkinson PhD, FRSE, FBCS, CITP is Professor of e-Science in the School of Informatics at the University of Edinburgh. He is also Data-Intensive Research group leader. He is currently involved in six projects advancing the use of data in biology, medicine, seismology and rock physics. He was the driving force behind: “The DATA Bonanza: Improving Knowledge Discovery in Science, Engineering, and Business”, published by John Wiley & Sons, April 2013. His principal goal is to stimulate a successful and professional approach to understanding and exploiting data.

He began his career in computing in 1966. He has worked at seven universities: Glasgow, Pennsylvania, Edinburgh, UEA, Cambridge, Rangoon and Lancaster; and for two companies: Sun Microsystems (at SunLabs in California) and O2 (an Object-Oriented DB company in its early years in Versailles). He led the development of the Department of Computing Science in Glasgow and returned to Edinburgh in 2001. Until recently, he was Director of the National e-Science Centre, Director of the e-Science Institute and the UK e-Science Envoy. He has more than 150 publications and has taken leading roles in national strategic research and infrastructure committees and been a member of many international advisory boards.

Computing in High Energy Physics, beyond the Higgs

Ian Bird

CERN

It is more than 10 years since the first ideas of using a computing grid to solve the LHC computing problem were suggested. In this talk I will highlight some of the work that went into taking those computer science ideas and turning them into a real production infrastructure that played such a key role in helping LHC physicists produce such exciting results on the Higgs boson in such a short time. Of course, during that time technology has advanced, and together with some of the lessons learned during the building of the global computing infrastructure for LHC we can suggest pointers to what a future computing infrastructure for high energy physics may look like. At the same time many other sciences now start to have the capability of producing their own “big data” problems. I will discuss how High Energy Physics can work together with those other sciences to help build the next generation of e-infrastructure to manage data intensive science and research.

Bio

Dr. Ian Bird is currently the CERN Large Hadron Collider Computing Grid Project Leader, and also has responsibility in the CERN IT Department for all Physics Computing activities. He joined CERN in 2002 to participate in the LCG project to set up and deploy the worldwide grid in support of LHC computing. Prior to joining CERN he spent 6 years at Jefferson Lab in Virginia, USA, where he was head of the computing group and responsible for all aspects of computing for the laboratory. His background and Ph.D. are in particle physics, and he has many years of experience in software and computing for High Energy Physics experiments as well as in data analysis. Current research interests are in the areas of applying modern technologies to the management of distributed computing and distributed multi-petabyte scale data volumes for particle physics.

Strengthen Science and Technology Platform Construction, Promote S&T Resources Opening

Jing Su

National Science and Technology Infrastructure Centre

Scientific and technological (S&T) resources, which include S&T instruments, S&T data, S&T literature and S&T network collaborative environments, are important national strategic resources. They provide the materials and information that S&T activities require, and they are the foundation for promoting scientific and technological progress and innovation.

Promoting the opening and sharing of scientific and technological resources generated by nationally funded research is an important basis and method for supporting the development of national science and technology, and is of great significance to the effort to build science and technology innovation capacity.

The Chinese government attaches great importance to the opening and sharing of S&T resources, and considers it a major strategic project for enhancing our technological competitiveness. The Ministry of Science and Technology and the Ministry of Finance launched the NSTIP (National S&T Infrastructure Programme), which aims to promote the opening and sharing of S&T resources and to enhance S&T innovation capability.

The presentation will introduce the status, progress and effectiveness of NSTIP. Against the background of informatization, the opening and sharing of China’s science and technology resources face a new situation and new requirements. We will analyze them and propose an overall approach and key tasks for the next steps of platform construction and resource opening and sharing, to promote the latter and the efficient use of S&T resources. We will also present our views and suggestions on the international opening and sharing of science and technology resources.

Bio

Dr. Su Jing, born in 1968, is the deputy director of the National Science and Technology Infrastructure Center (NSTIC) of the Ministry of Science and Technology (MOST) of China. Before joining the NSTIC, Dr. Su worked in the Department of Policy, Regulations and Reform of the MOST, where he specialized in the study and formulation of national S&T development policies and regulations. In that role, Dr. Su was involved in drafting the National Outline for Medium and Long Term S&T Development, formulating major national science and technology policies, and implementing the national technology innovation program and related initiatives, such as industrial technology innovation strategic alliances and industrial technology innovation service platforms.

At the NSTIC, Dr. Su is mainly engaged in promoting the construction of the S&T infrastructure platform to advance the opening and sharing of S&T resources, and in directing studies on the informatization and standardization of S&T resources, including large-scale scientific instruments and equipment, scientific data, S&T literature, and other information. He is also committed to establishing related evaluation and incentive mechanisms to give impetus to the opening and sharing of S&T resources.

From Genes to Stars

Alexander Szalay

Johns Hopkins University

The talk will describe how science is changing as a result of the amazing amounts of data we are collecting from gene sequencers to telescopes and supercomputers. This “Fourth Paradigm of Science”, predicted by Jim Gray, is moving at full speed, and is transforming one scientific area after another. The talk will present various examples of the similarities among the emerging challenges and of how Jim Gray’s vision is being realized by the scientific community.

Scientists are increasingly limited by their ability to analyze the large amounts of complex data available. These data sets are generated not only by instruments but also by computational experiments; the sizes of the largest numerical simulations are on par with data collected by instruments, crossing the petabyte threshold this year. Large synthetic data sets are increasingly important, as scientists compare their experiments to reference simulations. All disciplines need a new “instrument for data” that can deal not only with large data sets but also with the cross product of large and diverse data sets.

While the largest data sets have captured most of the public attention, they only represent the tip of the iceberg. What is often missed is that scientific data sets have a power law distribution. At one end are the very large data collections compiled by hundreds of scientists collaborating over multiple years. These projects typically have coherent data management plans and organization to ensure that the data products are accessible to a wide community. Nevertheless, the long-term curation of the data is still an unsolved problem.

At the other end of the distribution, in the “long tail,” are the very large numbers of small data sets, such as the images, spreadsheets and tables collected in laboratories and field studies. While the individual files are small, their numbers add up; in fact, there is as much data aggregated in these small items as in the biggest collections. On the other hand, these data sets are often not as well documented as their bigger counterparts. For most scientists there is little reward in becoming a data management expert and devoting the time required to documenting the data for later reuse. In fact, the process of manually cleaning data sets has been called the strip mining of big data: an ugly and resource intensive effort that leaves big scars.

Our group at JHU currently maintains and hosts the SDSS SkyServer database, which has become the world’s most used astronomy facility. The SDSS3 project is still under way; Data Release 9 is soon to be released. The system is quite extensive, and operating it is less than trivial. There is a real-time monitoring system in place to detect errors as they occur.

The talk will also describe how we are turning several of our simulations into publicly accessible numerical laboratories. These projects span several types of data sets, from turbulence to cosmology, and will soon include the output of ocean circulation models and atmospheric dynamics. These laboratories are being housed within our data-intensive instrument, the Data-Scope, supported by a recent MRI. There is an ongoing Teragrid project on MHD that will generate datasets of several hundred TB. There are several multi-faceted challenges related to this conversion, e.g. how to move, visualize, analyze and in general interact with petabytes of data.

A prime example of data complexity comes from our environmental sensing deployments. Over the past five years we deployed sensor networks at Baltimore Ecosystem Study sites, the Atlantic rainforest of Brazil, Ecuador, the Atacama Desert, and multiple locations around the Chesapeake Bay. Even though these deployments largely shared purpose and equipment, it took considerable effort to plan, deploy, and manage every new network.

The talk will present a general collaborative framework that provides the simplicity of DropBox, offers free and safe storage for users’ data, and allows us to harvest the metadata from the files and derive their broader context in a fairly automated fashion. We are focusing initially on astronomy and environmental science, but will later explore a broader set of relevant disciplines. This technology will accomplish multiple goals: (a) make it easy for scientists to save their data into a common framework, (b) make it easy to extract and organize the related metadata, (c) enable joint analyses among the many data sets, small and large.

Bio

Alexander Szalay is the Alumni Centennial Professor of Astronomy at the Johns Hopkins University, and Professor in the Department of Computer Science. He is the Director of the Institute for Data Intensive Science. He is a cosmologist, working on the statistical measures of the spatial distribution of galaxies and galaxy formation. He is a Corresponding Member of the Hungarian Academy of Sciences, and a Fellow of the American Academy of Arts and Sciences. In 2004 he received an Alexander Von Humboldt Award in Physical Sciences, in 2007 the Microsoft Jim Gray Award. In 2008 he became Doctor Honoris Causa of the Eotvos University, Budapest. He enjoys playing with Big Data.
