Mingfang Wu, Stefanie Kethers, Andrew Treloar
Getting from managed to reused: Making it easier for researchers to do
something useful with data
Slide 2
What is ANDS?
ANDS is supported by the Australian Government
Began in 2009, currently funded to mid 2015
Collaboration between Monash University, CSIRO and the Australian National University
Staff in 6 cities across the country
Funded 200+ projects across 68 institutions
ANDS aims to make data more valuable to researchers, research institutions and the nation
Slide 3
How Do We Make Data More Valuable?
So that researchers can easily publish, discover, access and use
research data through the Australian Research Data Commons.
Slide 4
ANDS Programs
Underpinning infrastructure for discovery and citation (ARDC Core)
Enable rich metadata about data to be managed and accessible (Metadata Stores)
Make new data and associated metadata available from a range of instruments (Data Capture)
Make a selection of existing data and associated metadata available from Australia's research-producing universities (Seeding the Commons)
Make data and associated metadata available from government departments (Public Sector Data)
Provide the overall policy and practice frameworks to support better data management and re-use (Frameworks and Capabilities)
Demonstrate the value of doing all these (Applications)
Slide 5
Tools for Data Reuse
[Diagram labels: Data Collections; Metadata; Data; Computing; Workflow.
Research Activities: Form Hypothesis; Design & Run Experiment; Look Up
Data; Analyse Data/Results; Publish Paper, Data, Software.
Tools: Discover Data; Extract Data; Transform Data; Integrate Data;
Analyse Data; Visualise Data; Register Data.]
Slide 6
The ANDS Applications Program
Funded through EIF (Education Investment Fund)
Focus on software infrastructure to enable research
Goal of the Applications program: to produce compelling demonstrations
of the value of having data available for re-use (i.e. enabling
research across many sources of data that was not previously possible).
Slide 7
Developed software might:
empower researchers to solve important problems
build new connections
enable important problems to be solved
enable new questions to be answered
simplify problems
accelerate solving problems, or analysing data
Slide 8
What has been funded under the Apps program?
7 projects in bio/characterisation
8 projects in climate change adaptation
10 others (urban planning, marine research, public health, humanities)
For a complete list of the Apps projects and their profiles, please
visit the ANDS project registry:
https://projects.ands.org.au/getAllProjects.php?start=app
Slide 9
What kinds of tools have been developed?
Data transformation
Data linkage and integration
Data service
Data analysis and modelling
Data visualisation
Data manipulation workflow
Slide 10
Example Applications
Climate Model Downscaling Data for Impacts Research
Cancer Genomics Linkage Application
Brain Mapping National Resource
POSITIVE PLACES: Spatial Analysis of Public Open Space
Slide 11
Climate Model Downscaling Data for Impacts Research
Regional Climate Model Data Collection
Very big!
High spatial and temporal resolution
Large region
Many climate variables
Many atmospheric layers
Multiple simulations
Data on an irregular model grid
Stored in netCDF
Slide 12
Regional Climate Model Downscaling Data
Agricultural Impacts Researchers
Hydrological Impacts Researchers
Health Impacts Researchers
Ecological Impacts Group
Slide 13
Climate Change Impact Researchers: I see some problems!
What is a Regional Climate Model?
I don't have enough disk space for this dataset on my computer
I can't find data for the sites I'm interested in
My software tools can't handle this irregular grid
I can't read this netCDF data format
This data set doesn't contain data for my site
This data gives me strange results for the current climate
This dataset is great! How can I share my work on it with others?
Regional Climate Model Downscaling Data: impacts-relevant, high
resolution, but very big (high spatial and temporal resolution, large
region, many climate variables, many atmospheric layers, multiple
simulations), on an irregular model grid, and stored in netCDF
Slide 14
Data service
Climate Model Downscaling Data for Impacts Research (CliMDDIR) (AP04, UNSW)
http://www.climddir.org/node/33
Provide open source software to transform RCM data:
Extract subsets of data (e.g. variables, regions)
Regrid or interpolate data to sites
Reformat data (e.g. GIS, ASCII, CSV)
Calculate derived variables (e.g. pan evaporation)
Apply statistical corrections (if necessary)
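Two of the transformations listed above, interpolating to a site and reformatting to CSV, can be sketched in a few lines of Python. This is only an illustration and not the CliMDDIR software itself: the function names, the tiny 2x2 grid, the variable name "tas" and the site coordinates are all made up, and nearest-neighbour selection stands in for whatever regridding method the project actually uses.

```python
import csv
import io
import math


def nearest_cell(lats, lons, site_lat, site_lon):
    """Return (i, j) of the grid cell whose centre is closest to the site.

    lats and lons are 2-D lists holding the latitude and longitude of every
    cell, a common way to describe an irregular model grid (assumption).
    """
    best, best_dist = None, math.inf
    for i, (lat_row, lon_row) in enumerate(zip(lats, lons)):
        for j, (lat, lon) in enumerate(zip(lat_row, lon_row)):
            # Squared offset in degrees, longitude scaled by cos(latitude).
            # Good enough to rank neighbours, not to measure real distance.
            dlat = lat - site_lat
            dlon = (lon - site_lon) * math.cos(math.radians(site_lat))
            dist = dlat * dlat + dlon * dlon
            if dist < best_dist:
                best, best_dist = (i, j), dist
    return best


def site_series_to_csv(times, field, cell, variable):
    """Return the time series at one grid cell as CSV text."""
    i, j = cell
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", variable])
    for t, grid in zip(times, field):
        writer.writerow([t, grid[i][j]])
    return out.getvalue()


# Hypothetical data: a 2x2 irregular grid and two time steps of "tas".
lats = [[-33.0, -33.1], [-34.0, -34.1]]
lons = [[150.0, 151.0], [150.1, 151.1]]
times = ["2000-01-01", "2000-01-02"]
tas = [[[20.0, 21.0], [22.0, 23.0]], [[24.0, 25.0], [26.0, 27.0]]]

cell = nearest_cell(lats, lons, site_lat=-33.9, site_lon=151.2)
csv_text = site_series_to_csv(times, tas, cell, "tas")
print(csv_text)
```

A real collection would be read from netCDF files rather than from in-memory lists, and a site extraction like this is what lets a researcher avoid downloading the whole dataset.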
Slide 15
CliMDDIR Service
Collection description at RDA
Service description at RDA
Slide 16
CliMDDIR Service Portal
Climate impact researchers can:
select region
select time coverage
select variables
select simulation models
select output format
share (subsets of) data with other researchers
Slide 17
Agricultural Impact Researchers
Assess how climate change impacts wheat cropping in NSW using the
APSIM agriculture model
Climate Modellers
IT Specialists
Slide 18
Workflow - Cancer Genome Linkage Project
Challenges faced by biologists and clinicians:
The manual process required to integrate their research data with other data sets
No availability of standardised analytical processes
The delay in transitioning from analysis to publication-ready results
http://ap27-cgla.blogspot.com.au/
Raw data: tttctgaaga ccatggacta tgagacctct
Derived data (i.e. mutation info) is released through the ICGC Data Portal
Slide 19
Workflow - Cancer Genome Linkage Project
Variant detection pipeline in Galaxy
Provide software/infrastructure to enable integration/transformation of multiple datasets within the GVL environment
Software development by QFAB (Queensland Facility for Advanced Bioinformatics, UQ)
Development aligned with that of the NeCTAR GVL
Inclusion of the very large raw ICGC Pancreatic Dataset into the NeCTAR GVL
Development of (reusable) Galaxy workflows for easier mutation searching
Slide 20
Workflow - Cancer Genome Linkage Project
Screenshots of output data
Slide 21
Workflow - Cancer Genome Linkage Project
Slide 22
Data Visualisation
Brain Mapping National Resource
Funded at QCIF and the Centre for Advanced Imaging, UQ
Developed TissueStack, which can link to specific parts of the data and
lets users rapidly view and collaboratively annotate very large 3D
datasets via a web browser.
For details, please go to Dr. Andrew Janke's presentation on
Wed. 12:05-12:25, Room: P1
Slide 23
POSITIVE PLACES: spatial analysis of public open space
Are the current provisions of public open space (POS) and parks adequate for the projected urban densification and population growth?
Will there be enough POS? (i.e. will it still meet the 10% land provision?)
Will the provision of different park types and facilities that encourage use by different population demographics (e.g. small pocket parks with play equipment for young children) or for different uses (e.g. active or passive recreation) be adequate?
What more / less will be needed?
Is there sufficient large open space for active recreation and sporting needs?
What type of POS can promote increased social connectedness within communities?
Challenge: lack of a comprehensive and consistent digital dataset of public open space
Slide 24
http://positiveplaces.blogspot.com.au/
Data integration and interrogation: Public Open Space (POS) Tool developed at UWA
With advanced features, users can:
define an area of interest directly on screen
upload a user-defined region as a GIS shapefile
scenario test the relationship between changes in population structure for a user-defined area and the provision of POS
POS statistics of a searched suburb or LGA can be downloaded as an Excel spreadsheet
7624 areas of POS
3813 parks (up to 43 different facilities and amenities per park)
820 school grounds/playing fields
1860 natural and conservation or bushland areas
771 areas of residual green space
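The kind of scenario test described above can be illustrated with a small calculation. The sketch below is not the POS Tool's code: the function names, the suburb figures and the population scenarios are invented for illustration, and only the 10% land provision threshold comes from the slides.

```python
def pos_provision(total_area_ha, pos_area_ha, population):
    """Return (POS share of land, POS hectares per 1000 residents)."""
    share = pos_area_ha / total_area_ha
    per_1000 = pos_area_ha * 1000 / population
    return share, per_1000


def scenario_table(total_area_ha, pos_area_ha, populations, min_share=0.10):
    """Evaluate POS provision for each named population scenario.

    min_share defaults to the 10% land provision mentioned in the slides.
    """
    rows = []
    for label, population in populations.items():
        share, per_1000 = pos_provision(total_area_ha, pos_area_ha, population)
        rows.append({
            "scenario": label,
            "pos_share": share,
            "pos_ha_per_1000": per_1000,
            "meets_min_share": share >= min_share,
        })
    return rows


# Hypothetical suburb: 500 ha in total, 45 ha of POS, two population scenarios.
rows = scenario_table(500.0, 45.0, {"current": 10_000, "projected": 15_000})
for row in rows:
    print(row)
```

With a fixed amount of POS, the land share stays the same across scenarios while the area available per resident falls as the population grows, which is why both measures are reported.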
Slide 25
Who benefits from the applications projects?
Researchers
Conduct existing research more efficiently
Enable new research
Increase research collaboration opportunities
Strengthen relationships with government agencies and industries
Connect science to the public
Government agencies, urban planners and infrastructure planners
The public
Prof. Charles Watson, from Curtin University and Neuroscience Research
Australia, commented: "The ability to share data from cloud, access it
through TissueStack, would make a huge difference to the way we are
able to interact, the ability for all participates to access the same
dataset, to annotate it and to have a discussion on the way forward."
Max De Antoni Migliorati (PhD candidate from QUT) on Semaphore:
monitoring and Modelling Australian Gas Emissions: "It is much more
time effective, it is much more easier to get our result with
Semaphore. Now I can run 5 simulation today, while a previous method,
it took me one day to get one simulation done."
Slide 26
Summary
Substantial data infrastructures have been built to enable data sharing and data reuse
The ANDS Applications program has demonstrated the value of data sharing and data reuse
Slide 27
Information
ANDS project registry:
https://projects.ands.org.au/getAllProjects.php?start=all
Project blogs: http://andsapps.blogspot.com.au/p/project-feed.html
Demonstrations of value: http://andsapps.blogspot.com.au/p/resources.html
Slide 28
Thanks
To Ian Macadam (from UNSW) for providing some slides about the CliMDDIR project
To all who have participated in and contributed to the program