john-wilbanks
DESCRIPTION
Talk given to the meeting of the CENDI group in early November 2013. CENDI is a volunteer-powered membership organization that serves the federal information community - that is, all those who create, manage, aggregate, organize, and provide access to federally-funded data and publications resulting from the nation’s $150 billion annual investment in federal R&D. Member organizations represent a cross-section of federal data and publication providers, including libraries, data centers, aggregators, information technology developers, and content management providers.
1. the policy environment. it is not sufficient.
http://www.systemswiki.org/images/8/8a/Wisdom.png
“is it open?” is perhaps not the right frame.
accessibility
adaptability
leverage
ease of mastery
accessibility
adaptability
leverage
ease of mastery
EASY TO USE; NO OPEN LICENSE
accessibility
adaptability
leverage
ease of mastery
NO OPEN LICENSE; DOWNLOAD AVAILABLE; DOCUMENTATION IN PDF
2. doing research in the open: early returns. it is not sufficient.
“how accurately can we predict if a female breast cancer survivor will develop a second tumor?”
may the best (statistical) model win
code sharing a prerequisite.
accuracy of model jumped three orders of magnitude in nine days.
76% accurate.
(not a biologist)
21 february 2013
17 april 2013
ongoing...
SHOW ME THE CODE!
if we don’t have the article in machine-readable form, with rights to transform it? doesn’t happen.
can we predict clinical utility from genetics of arthritis?
can we predict scores on alzheimer’s cognitive tests from existing data?
accessibility
adaptability
leverage
ease of mastery
THREE OPTIONS TO DOWNLOAD; NO CLEAR LICENSE; PRIVACY RESTRICTIONS; METADATA
accessibility
adaptability
leverage
ease of mastery
IMPACT OF PRIVATE INTERVENTION
68 core projects
248 researchers
28 institutions
1070 datasets
1723 results
Omberg, et al. Nature Genetics
colorectal cancer subtyping
[figure: datasets, analysis groups, and subtypes; labels A to G ... and 1 to 6]
3. research and culture are on a collision course, driven by data.
tension between anonymity and utility.
“more like plutonium than gold”
tension between expectation and reuse.
68% want their data shared for science
tension between value of individual and value of aggregate.
$.50 to $2.50 for SSN, birthdate, etc.
$5 to $15 for credit, background checks.
~40 records for $2100
tension between “research” data and “consumer” data.
https://www.scienceexchange.com/
it’s likely that we will end up with a data network effect of some sort.
a. the incremental institution.
b. the walled garden.
c. big networks of small things.