This is not the final version of the paper. The final version is available here: https://journals.sagepub.com/doi/10.1177/0539018419895456#_i10. Please cite as: Pinel, C. 2020. When more data means better results: Abundance and scarcity in research collaborations in epigenetics. Social Science Information [online first: doi.org/10.1177/0539018419895456]
When more data means better results: Abundance and scarcity in research
collaborations in epigenetics
Clémence Pinel
Centre for Medical Science and Technology Studies, University of Copenhagen, Denmark
Correspondence to
Dr Clémence Pinel
Centre for Medical Science and Technology Studies
Department of Public Health
University of Copenhagen
Øster Farimagsgade 5
1014 Copenhagen K
Denmark
Abstract
Drawing upon ethnographic findings from an epigenetics research laboratory in the United Kingdom,
this paper explores practices of research collaborations in the field of epigenetics, and epigenomics
research consortia in particular. I demonstrate that research consortia are key scientific infrastructures
that enable the aggregation of masses of data deemed necessary for the production of results and the
fostering of epistemic value. Building on STS scholarship on value production, and the concept of
asset, I show that the production of valuable research within epigenomics research consortia rests on
the active organisation and management of abundance and scarcity. It involves shaping and
standardising the masses of data gathered in consortia, while it also entails research teams enclosing
their data within their laboratories’ walls. As they do so, research teams construct data into scarce and
monopolised assets, which they can put to productive use in collaborative endeavours in return for
revenue. In addition to contributing empirical and critical insights into the ways epigenetics
knowledge is formed and negotiated in specific research contexts, this paper offers conceptual tools to
examine and problematise knowledge production practices in data-intensive research more broadly. In
particular, it points out that while contemporary big biology is marked by the generalised imperative
to ‘share’ data and ‘open’ science, collaborative endeavours within research consortia are built around
forms of exclusion.
Keywords: epigenetics, scientific collaborations, research consortium, data, value, rent, asset
Introduction
Epigenetics research refers to the study of the processes that control gene expression but do not entail
a change in DNA sequence (Armstrong, 2014). Research in this field studies the genome in relation to
an extensive and complex developmental machinery. More specifically, epigenetics research
underlines the reactivity of the genome to environmental signals, and explores how environmental
factors, ranging from child abuse to smoking or exposure to chemicals, can impact gene regulation by
leaving marks on the epigenome (Pinel et al., 2018). In other words, in epigenetics, the point of focus
is not genes per se, but what surrounds the genes – the ‘epi’ to the genes.
Epigenetics has been a rapidly growing field in the world of bioscience (Haig, 2012). This field of
research builds on specific technologies, such as next generation sequencing technologies through
which large-scale epigenomic maps and datasets can be obtained. For example, a common approach
used to detect DNA methylation – the most studied epigenetic change – and enable the production of
large datasets is methylated DNA immunoprecipitation sequencing (MeDIP-seq) (Pinel et al., 2018).
Research collaborations, which enable the pooling of resources (Hackett, 2005b), and in
particular, the ‘sharing’ of data, are also central to the growth of the epigenetics field. National and
international consortia¹ specifically dedicated to epigenetics research enable teams across continents
to pool their data (some of these consortia are formed to collect new data, while some organise
the sharing of existing data) and conduct meta-analyses. The purpose of such studies is to examine
data across a number of independent cohorts and determine overall trends. Teams explore associations
between variables and ensure that their results are statistically significant, that is, that the
relationship observed between variables is likely to be caused by something other than chance.
1 The term consortium (a Latin word meaning “partnership” or “association”) is commonly used in academic research to refer to the coming together of research groups with an interest in one topic area. Well-known examples of this type of research collaboration in epigenetics include the International Human Epigenome Consortium (IHEC), the NIH Roadmap Epigenomics Mapping Consortium, the BLUEPRINT project, and the Genetics of DNA Methylation Consortium (GoDMC).
In this paper, I study epigenetics research by taking a close look at research collaborations and data
sharing practices together, specifically unpacking how knowledge is produced within epigenomics
research consortia. Collaborative endeavours within research consortia are often based on the
assumption that “more data means better results”. That is, a common way to ensure statistically
significant results and detect associations between, for example, DNA methylation and rare or
complex traits is to increase the sample size of the study by aggregating more data. Research consortia
enable teams to do just this by reaching out to others who have similar data and working together to
create “bigger” studies with larger sample sizes. Put differently, research consortia enable the
abundance of data across the epigenetics research community and the production of valuable
knowledge. By valuable knowledge I mean knowledge which can be fostered to provide different
forms of value², according to the multiple evaluative frameworks that coexist within 21st-century
universities³, and in particular economic value through the production of wealth, health value which
entails the enhancement of health, and epistemic value through the pursuit of truth. As will become
apparent in the later sections of this paper, epigenetic scientists recurrently engage in research
consortia driven by concerns for epistemic value, as they hope to produce scientific results that are
worthy of publication in high-impact-factor journals. In this paper, I concentrate on the processes
through which research consortia lead to the production of epistemic value, and the ways in which
epigenomic data are mobilised, worked on and managed in this process. Specifically, I focus on the
ways the abundance of data is fostered by research consortia, examining what happens to data once it
is ‘shared’. I analyse how the masses of data are shaped into standardised research resources and
foreground the organisational arrangements that enable research teams to manage the use of their data
in the collaborative endeavour. I show that while research consortia are built to create an abundance
of data, the production of epistemic value in these collaborative endeavours rests on the construction
of scarcity.
While this case study is concerned with epigenomic research data, and epigenetics research more
broadly, the observations I make in this field are relevant to other contexts where masses of research
data are created and mobilised towards the production of knowledge. There are, however, some
characteristics in the field of epigenetics research that make the data and research collaborations
practices described in this paper particularly salient. I address these points throughout the paper and in
the discussion.
2 Rose and Novas (2004) discuss three types of biovalue: economic value through the production of wealth, health value which entails the enhancement of health, and ethical value.
3 As Pinel (2019) pointed out, academic science is characterised by the coexistence of different understandings of what constitutes valuable knowledge, and a plurality of criteria for evaluating research. For example, academic research is increasingly understood as an activity that can be fostered to seek economic profit through the commercialisation of research (Mirowski, 2011, Slaughter and Leslie, 1997). This represents an evaluative framework based on values of competition and production. It coexists with the conception of research as a collective endeavour in search for intellectual debate, based on the value of communalism, education and truth (Etzkowitz, 2011). It also coexists with an evaluative framework centred around productivity, which is operationalised in terms of publications or recorded citation.
I begin this paper by briefly discussing how science policy and Science and Technology Studies
(STS) literatures have attended to the topic of research collaborations and data sharing. Next, I discuss
the analytical framework I employ in this paper to study knowledge production practices in research
consortia, emphasising what the lens of value production and the concept of asset enables us to do
when thinking about data-intensive research collaborations. This is followed by a brief discussion of
the research methods and the laboratory in which the study was carried out. I then turn to an analysis
of the practices of research collaborations within epigenomics research consortia.
Large-scale research collaborations in policy and STS literature
At the science policy level, research collaborations carry positive connotations: they are thought to enhance
the quality of research, leading to more relevant and efficient research (The Royal Society, 2011),
while they are also understood as opportunities to strengthen the competitiveness of the economy by
bringing together different researchers and bodies of knowledge in order to foster innovation
(European Commission, 2012). Such policies thus assume a relationship between scientific
collaborations and valuable research. Science policy interest in research collaborations is reflected in
funding schemes. For example, the European Union’s Research Framework Programmes require
applicants to establish a consortium with at least three partners in three eligible countries. Data
sharing practices have also been an area of interest for science policy, and this is in part linked to the
understanding that efforts to put together and share data are central to contemporary large-scale
collaborations. A number of policies have been developed to promote data sharing and open access
(Organization for Economic Cooperation and Development, 2009, General Secretariat of the Council,
2016). They have focused on the removal of so-called ‘barriers to access’ by ensuring principles of
data sharing or examining potential ethical issues relating to data sharing such as the protection of
privacy. These policies, which encourage, control or restrict data flows as part of collaborative
efforts, understand data as the “oil wells” of the future (e.g. Anonymous, 2017, Puschmann and
Burgess, 2014), that is, as highly valuable resources which research teams, commercial
firms or data registries should harvest and work on to create different forms of value, be they economic,
health or scientific.
In the STS literature, a common way to define research collaborations is to see them as a process that
starts with collaborators sharing their respective resources. Hackett defines a collaboration as “a family
of purposeful working relationship between two or more people, groups, or organisations.
Collaborations form to share expertise, credibility, material and technical resources, symbolic and
social capital” (2005b: 671). In another paper, Hackett (2005a) unpacks what collaborations within a
research group entail. He stresses that “members of a group work together using an arrangement of
materials, techniques, instruments, ideas, and enabling theories that I call an ensemble of research
technologies.” (ibid.: 788) For him, a research collaboration is based on tangible or intangible
resources, which are shared among different actors. However, what Hackett does not tell us is the
modalities according to which these resources are shared, nor does he describe how they are
mobilised between researchers to produce valuable research.
Contemporary collaborative arrangements are also often discussed in terms of size or growth, for
example framing them as ‘supersize science’ (Vermeulen, 2009) or more generally as ‘big science’
(Price, 1963, Borgman et al., 2008) while pointing to the increasing manpower and expensive
equipment required to achieve research goals. Beyond such accounts, STS scholarship has provided
more nuanced and critical discussions of large-scale and collaborative biological research, studying,
for example, how contemporary ‘big science’ differs from historical manifestations of large-scale
science (Vermeulen and Penders, 2010), while also exploring the normative assumptions underlying
large-scale biology (Davies et al., 2013). Others have examined what it takes to produce knowledge
within large-scale collaborations by exploring their forms of governance. For example, Hilgartner
(2013) studied the Human Genome Project (HGP) – an international research effort that aimed to
sequence and map a full human genome – and showed that sequence data was central to the
management and coordination of research centres spread out over several geographical locations.
Each participating institution contributing to the collaborative effort competed in the production and
dissemination of sequence data. Collaborative research was thus turned into a matter of incremental
production with the aim of aggregating commensurable and quantifiable masses of data across space,
while speed became a key factor for evaluating the success of contributions by each centre (Davies et
al., 2013).
Authors have paid particular attention to networked structures in contemporary collaborative biology
(Baker and Millerand, 2010, Vermeulen et al., 2013, Vermeulen and Penders, 2010, Zimmerman and
Nardi, 2010). These networks loosely connect a high number of individuals and groups of scientists to
work on a shared goal: they exchange materials, divide work, while still remaining independent and in
control of their own project within the network. Central to many of these networks are data
repositories that allow multiple researchers, laboratories or institutions to collaborate on the creation,
use and reuse of data. Leonelli (2016) greatly contributed to the study of data repositories by
specifically pointing to the human labour involved in making a database. She discussed the work of
‘curators’ who mobilise their expertise and skills to select, prepare and classify data to make it
available in databases. She argued that what is central to the work of curators is to enable the global
circulation of data and its widespread use in a diversity of projects, at the same time that they enable
the local adoption of data in specific research contexts (Leonelli, 2013). For data to circulate across a
wide network of data users, they are increasingly standardised in the ways they are produced and
‘packaged’ with detailed metadata describing their origins (Baker and Millerand, 2010, Edwards et al.,
2011). Contributing to this debate, Decker (2018), in this journal, discussed what is at stake when
constructing a database in historical climatology for individual researchers, their teams and research
field. When data producers deposit ‘their’ data to feed the collective repository and make it freely
accessible, they are concerned with how their data will be ‘treated’ by others. Decker argues that this
does not simply stem from researchers’ concern over academic credit and recognition for their work
producing the data, but also from researchers’ “immense personal involvement” with data as they help
produce it. Or as he puts it, the data are “somehow part of them (…) given the arduous work invested
in it.” (Decker, 2018: 22)
This critical body of literature emphasises that efforts to put together and share data as a research
resource are central to contemporary large-scale collaborations. Researchers, laboratories and
institutions come together and make their data accessible within a network. Such collaborative efforts
enable the abundance of data often deemed necessary for the production of valuable research results.
At the same time, we learn from this body of work that sharing data in a research collaboration entails
work on the part of researchers and curators, who mobilise their skills, expertise and affect to produce,
process and package data. In this process, data become highly valued objects for researchers because
of the labour and personal involvement they invest in creating them, at the same time that data gain
evidential value. This suggests a tension between the individual value data hold for those producing them
and their collective value when data are shared and enable the production of knowledge within the
collaboration. However, this body of work does not tell us how this tension is negotiated in practice,
nor does it analyse the modalities according to which teams share and mobilise their respective data as
part of the collaborative endeavour to produce valuable knowledge. That is, how do research teams
cooperate and share their data with others to produce valuable knowledge? What happens to data once
it is shared in a research consortium? How is the abundance of data managed towards the production
of valuable knowledge? In the rest of this paper, I offer answers to help close this research gap.
This paper thus speaks to broader questions about how we produce valuable knowledge within large-
scale collaborations based on a data sharing ethos. To answer these questions, I draw upon analytical
tools from STS scholarship on value production in the bioeconomy, and in particular the concept of
asset.
Accumulation, assets and value production
STS scholars (Birch, 2017, Birch and Tyfield, 2013, Martin, 2015, Chiappetta and Birch, 2018) have
used the term bioeconomy⁴ to refer to the set of economic activities derived from biotechnology and
biosciences. These authors stress that capitalist logics are at play in contemporary academic research,
by underlining that different forms of capital (e.g. economic, academic credibility) are accumulated
through scientific work. For example, Fochler (2016) describes a capitalist cycle whereby researchers
mobilise their expertise, skills and technologies as resources to produce results, which can be
converted into publications and serve as assets in the competition for grant funding, a form of
economic capital, which could then be reinvested in the laboratory. He terms this cycle of
accumulation ‘epistemic capitalism’.
For these authors, value in the biosciences is a social process that involves a network of actors, objects
and processes, rather than something that is inherent to biological materials and processes, which is
what, for example, Waldby (2002) discusses with the concept of “biovalue”. They specifically
emphasise that value is linked to a series of assets, which are mobilised as resources towards the
production of knowledge and governed according to specific organisational and managerial
arrangements (Birch, 2017, Birch and Tyfield, 2013, Chiappetta and Birch, 2018, Martin, 2015, Birch,
2020). An asset is a tradable resource which can be tangible (e.g. a sequencing technology) or
intangible (e.g. expertise in epigenetics; networks). There are two main ways in which assets can be
mobilised as resources towards the production and accumulation of value. First, assets can be used by
actors to produce commodities, which are then sold to create value. Second, assets have value as
properties, and actors can produce and accumulate value through the ownership of valuable assets,
which they rent to others in exchange for revenue.
Birch (2017) argues that value in the bioeconomy mainly results from processes of assetization by
which knowledge is turned into a property that yields an income stream. For example, a firm turns its
technoscientific knowledge into Intellectual Property Rights (IPRs). This means that knowledge
becomes an intangible and scarce asset from which the firm can earn royalties. In such a scenario, value is
specifically derived from the exclusion of other people from accessing this knowledge: while
knowledge is enclosed, those who own IPRs distribute rights to use this knowledge in return for revenue.
Authors note that the production of value in the bioeconomy is mainly asset-based, rather than
commodity-based: value is constituted predominantly by ownership and control of assets, such as
IPRs, datasets or brand equity, that underpin potential new products. These observations lead Birch
and Tyfield (2013) to argue that the bioeconomy functions as a “rentier regime of accumulation”
whereby knowledge is made into scarce assets from which actors extract rent to produce value.
4 For these authors, the term bioeconomy refers to the increasing importance of biotechnology research, its applications and commercialisation in different sectors of the economy. There are however several understandings of the term bioeconomy (Bugge et al., 2016, Pavone and Goven, 2017). For example, Pavone and Goven (2017) identify three main understandings of the term: the biotechnology vision, which emphasises the importance of biotechnology research and its commercialisation in different sectors of the economy; the biomass economy vision, which stresses the substitution of biomass for fossil sources of energy, thus presenting the biomass economy as a more sustainable economy; and finally bioeconomy as a form of capitalism, with authors exploring the relations between life sciences and capitalism in specific areas like pharmaceuticals (Rajan, 2006), or in the life sciences in general (Birch, 2017, Birch and Tyfield, 2013).
To understand the origins of value in the bioeconomy, STS scholars therefore not only encourage us
to pay attention to the set of resources and assets mobilised in productive systems, but also to the
ways these assets and resources are controlled and governed. Or, as Hilgartner (2017; for a summary
see also Pinel, 2018) reminds us in his study of the HGP, epistemic concerns of securing knowledge
come together with socio-political concerns of securing control. He suggests unpacking the set of
“control-relationships”, defined as “regimes and practices that allocate entitlements and burdens among
agents” (p. 7), which are built onto ‘knowledge-objects’, such as data, biomaterials or results.
What we learn from such accounts is that academic science is marked by strategies of accumulation as
teams seek to gain academic credibility, funding or expand their networks. Towards that end, they
build resources, such as technologies or data, and turn them into assets towards the production of
value. This body of literature reminds us that there are different ways of producing value from assets
within the biosciences: either as resources in the production of commodities (e.g. data are used within
the laboratory to produce knowledge and lead to publications⁵), or as properties that can yield rents
(e.g. data are rented out to other laboratories in return for revenue). While the production of valuable
knowledge requires an array of resources, in a rentier regime of accumulation, an actor does not need
to own all the necessary resources. Instead, actors can collaborate and rent one another the necessary
assets. A collaboration thus enables teams to gather remote assets, and once value is produced, it is
redistributed to the different collaborators, often in the form of authorship on publications. In
addition, while an array of resources needs to be aggregated to produce value, in a rentier regime of
accumulation, value is captured when assets are made scarce, and authors encourage us to consider
the specific organisational arrangements and processes through which scarcity is constructed and
control over assets and resources managed. This body of work can therefore help us unpack the
process through which, within epigenomics research consortia, more data means better research, and
specifically, it can help us explain the modalities according to which data are mobilised and shared in
a collaboration to produce valuable knowledge.
Drawing on this body of work, I conceptualise research and clinical data produced within data-
intensive research as resources research teams turn into assets, while I study how collaborators
mobilise them within research consortia to produce value. I pay particular attention to the
collaborative arrangements through which they are accumulated, thus enabling an abundance of data
to be created in the research consortia, at the same time that I explore the processes of control through
which the supply of data is managed.
5 In a citation regime, publications can also function as assets, as they may deliver rent in the form of citations.
The laboratory and research methods
This article draws upon findings from an ethnographic study conducted in a UK-based laboratory
carrying out genetics and epigenetics research on twins. For the purpose of this article, I call the
laboratory Twinomics⁶. Research in the laboratory explored complex diseases with a particular
interest in age-related diseases, including osteoporosis, diabetes and cardiovascular diseases.
Twinomics was located within a highly-ranked, research-led university in the UK and held a number
of competitive public and private grants to support its research and infrastructures.
Work at Twinomics was centred around a large database of clinical and research data gathered over
several decades. The database included data from thousands of twins, with clinical, physiological and
lifestyle data, as well as hundreds of phenotypes related to common diseases. Twinomics was mostly
a ‘dry-lab’, which meant that scientists conducted computational or applied mathematical analyses of
twins’ data, and using an epidemiological approach, looked for the incidence and distribution of
specific traits in populations.
I carried out ethnographic fieldwork at Twinomics between January and June 2016. I observed and
participated in the daily life of the lab, spending time with researchers at their desks as they ‘ran’
computational analysis of their datasets, observing the production of data and its processing by staff,
sitting in the weekly lab meetings and journal clubs, or sharing lunch and coffee breaks with staff. The
ethnographic data I draw upon in this paper consist of fieldnotes from participant observation,
numerous informal conversations with staff, as well as in-depth semi-structured interviews with
members of Twinomics, from laboratory lead through to junior researchers (17 in total). All interviews
and observations from fieldwork were transcribed, coded and thematically analysed (Attride-Stirling,
2001). The theme of research collaborations, and research consortia in particular, appeared significant
early on during the study, which guided further investigation into the set of concerns motivating such
collaborative endeavours, as well as the practices enabling them.
In what follows, I draw upon this ethnographic data to discuss how researchers at Twinomics
collaborate with others and ‘share’ their epigenetics data in research consortia. I describe three stages
in the process of working together in a research consortium, analysing what this process entails for the
laboratory’s data and how it leads to the production of valuable knowledge.
6 Names and identifying details of places and individuals have been changed.
Aggregating masses of data
The epigenetics team at Twinomics focuses on the study of DNA methylation datasets. These are
comprehensive genome-wide profiles of human DNA methylation, which are produced by analysing
blood, skin and adipose samples from twins through a methylation array. Using an epidemiological
approach, researchers explore the causes and consequences of changes in DNA methylation at
population level. There are two broad areas the epigenetics team conducts research in: on the one
hand, research concerned with characterising the extent to which genetic variation shapes epigenetics,
and on the other hand, research exploring the influence of the environment on epigenetics, with the aim of
finding associations between DNA methylation levels and environmental factors. Mark, the Principal
Investigator (PI) of the epigenetics team, explains that they have been focusing on specific
environmental factors:
we're taking things we know have an impact, like smoking, and we're trying to explore that in
more depth. … A number of people in my group are trying to identify other environmental
factors that may have a big signature on the epigenome. And it's been difficult to find things
that strongly influence the epigenome. … we've collaborated with other groups to increase
power, to detect these effects, so alcohol is one. Exercise I'm also very keen to work on that.
Diet also. … I mean there are lots of things you can look at there, but I suppose when you boil
it down to, I'm looking at genetics and environment using previously, sort of, clearly defined
environmental factors.
They focus on “strong environmental factors”, that is, factors which can be defined and measured
relatively well, and which show significant effects on DNA methylation markers. They do so to
ensure their research will lead to the production of results which could be easily interpreted, to then be
converted into publications. For example, smoking is considered a “strong environmental factor”
because it is rather simple for researchers to distinguish smokers from non-smokers using a questionnaire
and a “Yes” or “No” type of question. Smoking data is considered trustworthy and researchers have
been able to identify significant associations between smoking status and DNA methylation. In
contrast, diet is considered a difficult environmental factor to work on because, first, it is challenging
to capture the content of participants’ diet over a long period of time, which leads to imprecise data,
and second, epigenetics researchers have been struggling to find significant associations between diet
and epigenetic markers. Researchers often speak of such data as “noisy” data, because it is difficult to
identify a signal amid the imprecisions in the data. For these projects studying associations between
DNA methylation and environmental factors that can prove difficult for researchers to precisely
define and capture with clinical or research data, the epigenetics team at Twinomics has been working
in collaboration with other teams, and in particular, through research consortia. As Mark put it,
working in consortia provides research teams with “power to find things.” That is, it enables teams to
gain access to a higher number of samples, which can balance out the ‘noise’ associated with the
imprecise factor studied.
Four or five years ago, we had study designs to look at smoking, alcohol, diet, exercise. … So
we did all these things, and out of all of them, smoking is the only one that gave us positive
answers. The others didn't. Now, obviously smoking has a bigger impact on epigenetics, the
other option is that they all have an impact, it's just that numbers were too low to detect the
signals. And as a result of this, we're now contributing towards bigger studies trying to
identify, as I said, alcohol, there is also diet.
Research consortia enable Twinomics to have “bigger” studies, whereby a number of research teams
bring together their data, thus increasing the overall sample size of their research projects. At the
laboratory, the phrase “bigger is better” was often used by researchers when discussing their work. As
Mark explained, a study becomes bigger when it is based on samples from a higher number of
individuals, and it is assumed to be of better quality “because the bigger the number [of individuals in
the study], then the more likely you are to pick up the individual change [on the epigenome]”. For
researchers in environmental epigenetics dealing with hard-to-define environmental factors, research
consortia are thus particularly valuable because they can help them accumulate more data, which means
enhanced possibilities of identifying research results, and therefore higher chances of creating
epistemic value.
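The reasoning behind “bigger is better” can be made concrete with a rough power calculation. The sketch below is illustrative only and is not drawn from the analyses conducted at Twinomics: it assumes a two-sided z-test on a single standardised effect, and the effect size (0.1), the significance threshold (1e-7) and the function name are hypothetical choices made for the example.

```python
from statistics import NormalDist


def approx_power(effect_size, n, alpha):
    """Approximate power of a two-sided z-test.

    effect_size: standardised effect per individual (hypothetical value)
    n: number of individuals in the study
    alpha: significance threshold (hypothetical value)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the test statistic is centred on effect_size * sqrt(n).
    shift = effect_size * n ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Compare a single cohort with a consortium-sized study (sizes are illustrative).
for n in (500, 4000):
    print(n, round(approx_power(0.1, n, 1e-7), 3))
```

Under these assumptions, the same small effect that is almost never detected in a few hundred individuals is detected in most studies of several thousand, which is the sense in which consortia provide “power to find things.”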
David’s experience of research consortia is illustrative. David is a PhD student in his final year, who
worked closely with Mark on a research project exploring associations between DNA methylation and
women’s age at menarche [7] and menopause. They first started examining this association on their own
twins’ data:
When I started with the project looking at if there were DNA methylation changes linked to
the age at menopause and the age at menarche, and the difference of time between those two
points … we used around 500 samples that we had in our cohort. We found things that
seemed to be interesting, but we were not reaching that genome-wide significance threshold.
So we thought that increasing the sample size could give us a better view to see if there is
really a DNA methylation change associated with this. (David)
[7] Menarche refers to the first menstrual period. The period between menarche and menopause is understood as women’s reproductive time. For David, this time window represents a proxy for studying women’s time of exposure to estrogen. This is considered an interesting research area because a number of studies have shown associations between estrogen and breast cancer (Key et al., 2001).
For this project, Twinomics’ own data was insufficient to provide answers and create epistemic value.
While it pointed to an interesting “region” on the epigenome, it could not be used to produce
statistically significant results that could then be published in scientific journals. As this example
suggests, research consortia are used by research teams as a scientific infrastructure to aggregate vast
quantities of data in order to keep creating and accumulating epistemic value. Or as Katherine,
another member of the epigenetics team, puts it, “One day they just realised that a single cohort can’t
publish very good papers. So they developed this kind of consortium.”
This model of accumulation based on the aggregation of vast quantities of data to identify research
results in epigenetics is influenced by Genome-Wide Association Studies (GWAS). This approach
consists of identifying associations between genetic variations and particular diseases or phenotypes.
It involves scanning the genome of a vast number of individuals and searching for genetic markers or
single nucleotide polymorphisms (SNPs) (Bush and Moore, 2012). In epigenetics research, such
studies are called Epigenome-Wide Association Studies (EWAS): the data being used concern the
epigenome rather than the genome, with researchers looking to identify epigenetic markers
associating with specific traits, such as diet or age at menopause and menarche. Mark suggests that
such studies, which are commonly undertaken in research consortia, are based on a standardised
model, and what changes from project to project is the scale of data mobilised to answer the research
questions:
What we're doing now, we’re just doing it on a bigger scale, which is a little bit boring. …
Interviewer: Why do you say that those studies are boring?
They are just boring to me! [Laughs] I suppose because five years ago we designed them all
on a small scale, and now it's just the same thing, it's just “bigger and better”, “bigger and
better”. When does it end? It's a bit like the GWAS.
In data-intensive research, the stream of data must keep flowing and growing, and it is through data
accumulation that new knowledge is constructed, and value created. Such processes of value
production resonate with data capitalism, which, as Sadowski (2019) notes, is characterised by logics
of accumulation whereby actors look for new ways to produce value through the gathering of
more and more data.
Researchers at Twinomics prefer collaborating with other teams and using their data as part of a
consortium rather than producing more data on their own. This is linked to the fact that data
production is an expensive and time-consuming endeavour as it entails PIs securing funding, research
nurses and PhD students collecting samples from twins, and researchers processing and organising data
to transform it into a valuable research resource that can be used to foster results (for a detailed
description of the different forms of work underlying data production within data-intensive research,
see Pinel et al., In Press). As David puts it, research consortia are a convenient and efficient way to
reach data that is already “out there”:
If we want to increase our power to discover new associations, we could do that by just
increasing our sample sizes. And if there are people out there that have a similar cohort and
have interrogated DNA methylation with the same technology, it is convenient to pull our
samples together and we don't have to spend more resources on collecting more samples and
then doing more arrays.
At the same time, the scientific practices within research consortia reflect normative assumptions
whereby science should be “Bigger and faster” (Calvert, 2013, Davies et al., 2013). Underlying much
of contemporary science is the expectation that expanding the scale and speed of scientific enquiry
will lead to improvements in quality and efficiency, and thus lead to more valuable research. Research
consortia represent a convenient way for teams to expand the scale of their research by bringing
together collaborators who can share their data, to create bigger studies, and divide the work among
themselves, to enable faster research.
When results plateaued, and research teams were not able to foster evidential value from their own
resources and data, they found new ways to create epistemic value by aggregating data through
research consortia. Enabling an abundance of data, research consortia became key research
infrastructures of data-intensive research that scientific research teams regularly use to circumvent
limitations in their own resources, thus facilitating the production of value. While this observation is
not specific to epigenetics research and can be made in other data-intensive research fields, I argue
that the type of data environmental epigenetics deals with, namely broad and hard-to-define
environmental factors, makes research consortia and data accumulation practices particularly salient.
Shaping the abundance of data
Data are the key resources mobilised within research consortia to produce valuable knowledge.
However, for data to be effectively used towards the production and accumulation of value in
epigenetics, it first needs to be turned into assets. In this section, I discuss in more detail what happens
to the masses of data in a research consortium, and specifically analyse the different forms of labour
required to transform data into valuable assets that can be mobilised to produce knowledge. I show
that while research consortia enable the abundance of data, this abundance is carefully shaped as staff
work with other research teams to make their data valuable and produce scientific results.
The process of value production and accumulation in research consortia starts with relational labour
as teams connect with one another and exchange ideas. Once David and Mark had identified the
association between DNA methylation and age at menopause and menarche as an interesting research
area to explore, they approached a consortium bringing together research teams carrying out work in
epigenetics. They looked through the consortium’s research portfolio and observed that teams had
jointly conducted association studies based on DNA methylation data. Mark and David formally
reached out to the consortium by putting forward a proposal summarising their project’s aims and the
type of data it would be based on. David explains:
We proposed [the project] to the consortium. Around eight different cohorts showed interest.
There was another group that was already working in a similar project, well, they were only
focusing on age at menarche. So we decided to work together, linking the projects together – I
would focus more on the menopause part and they would focus on the menarche part. In the
end, we are co-leading this project.
The research consortium described by David functions as a loose network structure, whereby teams
affiliated to the consortium can volunteer to participate in collaborative projects. The consortium is
understood as an inclusive infrastructure in that it welcomes any research team to take part in
collaborative endeavours, no matter how big their datasets are. In particular, researchers value
consortia because, as Juliette, a PhD student in the lab, puts it, they allow “small teams” with “small
datasets” to take part in collaborative projects and gain “better projects” than the ones they would
have been able to conduct on their datasets alone. However, to participate in a consortium, teams must
have some data to share, however little this might be. The inclusive and sharing ethos of the
consortium therefore comes with forms of exclusion (Lezaun and Montgomery, 2015): to become part
of it, one must have something to offer.
In the above example, David and Mark found a number of partners who agreed to take part in their
project. Some laboratories who had relevant data to share became ‘participating cohorts’: their main
role was to provide this data for the project so as to increase the sample size, and in exchange, they
gained secondary authorship on the publication. Another laboratory, who had an aligned research
interest and carried out preliminary work in the area, became a closer partner: they proposed an
analytical focus and mobilised their scientific expertise, as well as their data, towards the production
of knowledge. David and this other team decided to share leadership in this project, which meant that
they worked together to design the project and identify the research question, while they both gained
primary authorship on the publication.
Leadership in a research consortium also means writing an analysis plan. It details how to go about
analysing large datasets computationally, with steps and sub-steps and the necessary “scripts” to use
in each step. The analysis plan starts by providing definitions of the key factors and processes studied.
In David’s project as part of the consortium, partners worked towards agreeing on a common
definition of menopause and menarche, as well as ways to capture it as data. While menarche can be
defined easily as it refers to a clear point in time (age of the first menstrual cycle), menopause is more
difficult to capture. It is often defined as the time when women have had no menstrual periods for 12
consecutive months; however, changes in a woman’s menstrual cycle can begin about six years before
the actual menopause. As such, it can prove difficult to capture the exact date of menopause for
women, and this may result in inconsistent data across teams participating in a consortium. When
writing up the analysis plan, David and his partner discussed different ways to define menopause. A
telephone meeting was scheduled to specifically address this issue. Both David and Mark took part in
the call, and on the other end of the line, their partner was represented by the PI and a postdoc.
David shared how they, at Twinomics, had been defining menopause in their work, arguing that they
adopted this definition because the “data are there” and it aligned with work on the topic that had been
published in the scientific community. Or as Mark put it, “we don’t want to reinvent the wheel.” Their
collaborator seemed happy with that definition but insisted that they check whether this would fit
the data that the participating cohorts already had. The rest of the meeting was spent agreeing on
detailed instructions about how to format and ‘clean’ the data.
Formatting the data means turning the “raw” data that are produced by the sequencing array into a set
of numbers and processing those in ways that will make possible the use of statistical tests and
computational analysis. During my time at Twinomics, every researcher I spent time with was, at one
point or another, involved in cleaning and formatting data. This was the case of Olivia, a postdoc in
the epigenetics team, who was taking part in an epigenomics research consortium bringing together
six different research teams. One afternoon, I observed Olivia as she worked at her desk. She
alternated between enthusiastically typing on her keyboard and nervously staring at her screen. A
number of programmes were running on her computer including ‘R’, the software used to “play with
the data” and “run analyses” and ‘Github’, the online platform used by the consortium to share the
analysis plan. As she explained, she was referring to the consortium’s analysis plan to process and
clean her data in a standardised fashion, which had been agreed upon by members of the consortium.
More specifically, Olivia was copying scripts from the analysis plan onto ‘R’, adding a few lines of
code to make the scripts specific to her data, and then making the programme run. Some seconds
later, a plot appeared on her screen, which, she explained, showed the distribution of the data. She
specifically put her finger on two points on the plot that stood out and explained that these were
“inconsistencies” in her data that needed to be “optimised”. Referring once more to the consortium’s
analysis plan, Olivia then typed in a few additional lines of code. After letting the programme run, a
new plot came up on the screen, and, this time, the two inconsistencies were gone. According to
Olivia, in order to best format and optimise the data for this analysis, she had to delete these data
entries that were inconsistent with the rest of the data. As this example suggests, cleaning and
formatting data therefore entails detecting incomplete, inaccurate or irrelevant parts of the data and
replacing, modifying or deleting them. Inconsistencies in the data may be caused, for example, by
errors in data entry or the use of multiple definitions of what relevant data are.
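The kind of cleaning step Olivia performed can be sketched in a few lines. The source does not say how the consortium’s scripts identify inconsistent entries, and the scripts themselves were run in ‘R’; the example below is written in Python purely for illustration and assumes a simple rule based on the median absolute deviation (MAD). The function name, the threshold and the values are all hypothetical.

```python
from statistics import median


def drop_outliers(values, threshold=5.0):
    """Keep values lying within `threshold` MADs of the median (illustrative rule)."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return list(values)
    return [v for v in values if abs(v - centre) / mad <= threshold]


# Invented methylation-like values with two entries that stand out.
raw = [0.50, 0.52, 0.48, 0.51, 0.49, 0.95, 0.02]
print(drop_outliers(raw))
```

Whatever rule is used, the point made in the text holds: the threshold and the definition of an “inconsistency” are choices, and agreeing on them across teams is part of the standardisation work.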
These different forms of labour allow standardised data to be aggregated in the consortium, regardless
of their origin. The definitional and standardisation work is particularly important in environmental
epigenetics, where the environmental factors studied are not clear cut and can be defined in a number
of ways. For research teams in environmental epigenetics to come together and study the impact of
the environment on the epigenome, they first need to agree on what the environment is and how to
measure it in order to produce results of significance.
The analysis plans used by consortia to format, clean and analyse datasets function like
‘standardised packages’ (Fujimura 1978, 1996) that can be used by non-specialists in the laboratory.
For example, in the case of David’s project, it does not require scientists to be familiar with the
biological mechanisms of DNA methylation in reproductive health. However, it requires
computational expertise and skills working with large datasets to be able to manipulate data, run
statistical tests and detect anomalies in the data. In addition, what is required of scientists is
knowledge of their data and its specificities. Researchers in each cohort develop in-depth knowledge
of their data, as they help create it by cleaning and organising it, and work with it to produce
knowledge. As I demonstrated elsewhere (Pinel et al., In Press), researchers form caring relationships
with their data, and this personal involvement in the data is a way of knowing the data and how to
work with it in order to produce valuable knowledge. Researchers “know” their datasets, that is, they
have contextual and practical knowledge about what a dataset seeks to represent, its strengths and
weaknesses. When taking part in a research consortium and putting to use their data, researchers in
each team thus mobilise this in-depth knowledge of their data to shape it.
Through this labour standardising the masses of data accumulated throughout the different research
teams, the raw data are turned into valuable research assets that can be mobilised by the groups to
produce scientific results and valuable knowledge. As Juliette suggests, it is thanks to this work
preparing the data that the results obtained later on during the analysis can be “trusted” as a true
reflection of a biological phenomenon, otherwise “[you] might make a wrong interpretation”.
These insights indicate that the creation of valuable knowledge within a research consortium does not
simply originate from an abundance of data, but this abundance is enabled by specific organisational
structures and shaped by different forms of labour. Researchers first need to connect with others and
agree to put together their resources and data for a specific project. The masses of data aggregated
then need to be shaped and processed by each team following detailed protocols. Researchers put
together analysis plans and mobilise their in-depth knowledge of the data to format and process it. It is
through these different forms of labour that researchers in a consortium transform the data
accumulated into valuable research assets that can be used to produce knowledge and create epistemic
value. While a research consortium enables an abundance of data, this abundance is shaped by
researchers mobilising their skills, expertise and know-how to produce value.
Organising scarcity
Once data are processed and formatted in a standardised fashion across the participating cohorts,
analysis takes place. A specific organisational arrangement oversees how data are analysed, while it
allows collaborators to govern and manage value in the consortium by tightly controlling the use of
their assets. David explains:
So after we wrote the analysis plan and sent it to the other cohorts, we waited for them to
collect the samples, do the analysis, and then upload these results. And now I am in the stage
of the meta-analysis.
In this particular arrangement, the participating cohorts use the analysis plan provided by David and
his partner to analyse their data. This means that the labs participating in the research consortium keep
their datasets separate and conduct the analysis on their own, using their intimate knowledge of their
data to apply the ‘analysis scripts’ and produce results, which they then send to David and his partner.
What teams share across the consortium is therefore not data per se, but the results they obtain from
analysing their data. In an interview, I asked David about this particular arrangement, whereby each
team conducts the analysis on their own dataset, instead of bringing those datasets together to then
conduct the analysis in one go:
Every cohort knows their datasets better. So if they have batch effects or things that they
already know can be tricky in their datasets, they can work it out. But yeah of course, it would
be probably better to pull all the samples together and get a result out of that. But it's also
more efficient if you divide the work in different groups, and probably save some time.
This collaborative arrangement serves several functions. First, as David observes, keeping datasets
separate and analysis divided among collaborators is a way to foster each cohort’s in-depth
knowledge of their data. The research consortium recognises that researchers in each team are
personally involved with their data – that is, they know what it can and can’t do, how it “behaves” or
how it prefers to be “treated” – and seeks to foster that knowledge towards the creation of value.
Second, this way of working is also deemed efficient and motivated by a desire to go faster
(Vermeulen et al., 2013), as it divides the work of analysing data among the different teams involved.
Finally, this way of working enables collaborators to tightly control their assets, and in particular, to
organise limits and exclusion on the use of data as a resource. Teams own valuable data, which they
put to productive use by taking part in the collaborative research effort. However, they enclose their
data within their laboratories’ walls as they each conduct the analysis on their own datasets. The
research consortium enables teams to exclude others from accessing their data, thus constructing their
data as scarce. It is from this constructed scarcity that the teams derive value from their data (Birch,
2020). By enforcing exclusion rights over their data, teams construct their data as monopoly assets
only they have access to, which then allows them to capture monopoly rents.
I use here the term monopoly to denote a specific form of monopoly. There are two main forms of
monopoly assets and monopoly rents. First, a position of monopoly can be derived from the unique
quality of an asset. When an actor owns a unique asset, they have exclusive control over it on the
market, and this comes with privileges, such as the opportunity to influence prices and extract
monopoly rents from anyone wishing to use their unique asset. For example, in the data-intensive
research community, if a research team is the only one in possession of a rare dataset or sequencing
technology, that team can be said to be in a position of monopoly. Second, a position of monopoly can
be derived from constructing an asset as scarce and organising limits and exclusion rights over its use.
In this paper, the DNA methylation data shared in consortia conform with this second type of
monopoly assets. Teams coming together in a consortium have similar data, which means they do not
gain monopoly rents from the unique quality of their asset. Instead, the teams gain monopoly rents by
enclosing their data within their laboratories’ walls and restricting access to it.
Monopoly rents are captured once the meta-analysis is conducted and valuable knowledge produced.
In the example above, the participating cohorts sent results originating from their datasets to David
and his partner leading the collaboration. They combined these individual results into a large study,
thus increasing the sample size and improving the estimates of the size of the effect. David and his
partner were able to gather over 4,000 samples and reach a statistically significant result, which they
wrote up into a manuscript for publication. As such, the consortium enabled the production of
scientific results and the creation of value. The peer-reviewed paper in this configuration functioned
as a token of credibility for the teams involved, while it came to recognise the epistemic value
fostered in the consortium. It is through authorship on the publication that the value created from the
meta-analysis was distributed to the participating cohorts: David’s and his partner’s team were
granted first and senior authorship for their work initiating the project, putting together the analysis
plan, managing the collaboration and conducting the meta-analysis, while the participating cohorts
gained middle authorship. By providing their data to the consortium, the participating cohorts thus
receive a revenue, in the form of authorship on publications. This represents a form of rent in that it is an
income derived from the ownership of valuable and scarce assets.
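The combination step in which cohort-level results, rather than raw data, are pooled can be sketched as follows. The source does not state which meta-analytic model David and his partner used; the example assumes a fixed-effect, inverse-variance weighted model, and the function name, cohort estimates and standard errors are invented for illustration.

```python
from math import sqrt


def fixed_effect_meta(estimates, std_errors):
    """Pool per-cohort effect estimates with inverse-variance weights.

    Each cohort shares only an estimate and its standard error,
    never the underlying individual-level data.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical results uploaded by four participating cohorts.
cohort_estimates = [0.021, 0.015, 0.030, 0.018]
cohort_std_errors = [0.010, 0.012, 0.015, 0.009]

pooled, pooled_se = fixed_effect_meta(cohort_estimates, cohort_std_errors)
print(round(pooled, 4), round(pooled_se, 4))
```

In this model the pooled standard error is always smaller than that of any single cohort, which is how the meta-analysis improves the estimate of the effect while each dataset stays within its own laboratory.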
This collaborative arrangement within research consortia resembles a rentier regime of accumulation:
research teams participating in the consortium own valuable assets – their large-scale datasets which
they formatted and processed – and construct them as monopoly assets by restricting their use; they
put these assets to productive use in the consortium for the conduct of research and the creation of
valuable research; finally they extract monopoly rents from their assets through authorship on
publications. Rentiership in research consortia thus rests on the active and ongoing organisation and
management of value through the construction of scarcity. This observation complements
existing work on assetization and rentiership in the bioeconomy. STS scholars have mostly applied
the concept of rent to examine specific resources and assets in technoscience such as intellectual
property, with for example discussions about the use of IPRs (Birch, 2017, Birch, 2020, Birch and
Tyfield, 2013). Here, I broaden the conceptual applicability of rentiership to resources like
epigenomic data. I show that a similar regime of accumulation is in place in academic science and
epigenetics research, with research consortia enabling assetization and monopolisation practices.
The particular collaborative arrangements in place within research consortia shed light on an
interesting tension. To enhance the epistemic value of their research, laboratories require an
abundance of data. But in contemporary technoscience, laboratories turn their data into scarce and
monopoly assets by organising limits and exclusion rights on the use of their data. Data are therefore
accumulated, at the same time that they are made into scarce assets through practices of
monopolisation. Data are not actually shared between collaborators, but they are autonomously put to
productive use with the aim of maximising the value that can be extracted from them.
Discussion
Over the years, epigenetics has received high public and scientific attention and some have even gone
so far as to argue that “epigenetics is now the hottest thing in bioscience” (Jirtle, 2012). Social
scientists and humanities scholars, including scholars in the STS field, have taken an interest in
epigenetics. For them, epigenetics is interesting because it represents a new ‘style of reasoning’
(Hacking, 2002) according to which the body, health and illness are more open to the social, thus
symbolising a move away from gene-centric approaches (Lock, 2013, Mukherjee, 2016). However,
few studies have yet empirically examined epigenetics research in practice (Lloyd and Müller, 2018).
This means that little is known about how epigenetics knowledge is formed, negotiated and
interpreted in specific research contexts. This paper, together with other papers in this special issue,
helps to close this gap by providing an empirical and critical account of epigenetics research through
an analysis of scientists’ collaboration practices in the field.
I took a close look at the ways teams produce epigenetics knowledge within research consortia and
showed that what is at stake in these collaborative arrangements is the aggregation of masses of data.
Epigenetics research teams, no longer able to foster scientific results and value from their data alone,
turn to research consortia. These scientific infrastructures loosely connect research teams, who can put
together their datasets to investigate associations between DNA methylation markers and specific
traits or environmental factors. Enabling an abundance of data within the field of epigenetics, research
consortia are key scientific infrastructures that facilitate the production of epistemic value. For data to
become a valuable research resource, it needs to be worked on in a standardised fashion by staff
across the different teams participating in the collaborative endeavour. The masses of data are shaped
by researchers mobilising their computational expertise and in-depth knowledge of their data to
format, process and clean the data before analysis can take place. Data are also turned into scarce
assets, with teams enclosing their respective data within their laboratories’ walls and enacting limits
and exclusions on the use of their resource. Participating cohorts thus turn their data into monopolised
assets only they have control over and can decide to put them to productive use against a revenue, in
the form of authorship on publications. This is a form of rent in that it is derived from the ownership
of valuable and scarce assets. The creation of value within research consortia is therefore carefully
organised, managed and governed, and rests on a set of practices, human labour and knowledges that
render data into valuable and scarce assets.
While discussing epigenetics research practices in research consortia, I underlined that scientists in
this field focus their studies on specific research questions, in particular exploring associations
between DNA methylation and a number of environmental factors such as smoking, diet or alcohol.
They do so because these specific articulations of the notion of environment can be defined and
studied well thanks to the availability of vast amounts of data throughout the epigenetics research
community. As such, the sort of research undertaken in epigenetics and the ways the notion of
environment is defined in this field are influenced by the availability of data and what is likely to
foster epistemic value. These insights resonate with ongoing debates about the social production of
ignorance, with scholars interrogating the social context shaping what we know and don’t (Kleinman
and Suryanarayanan, 2013).
These findings provide a ‘de-romanticised’ picture of epigenetics research, which stands in stark contrast
with some of the social science and humanities accounts of epigenetics that tell us about the
revolutionary potential of this research field for the opportunity it represents to examine the body in
its environmental, historical and sociocultural context (Lock, 2015, Landecker and Panofsky, 2013).
By exploring epigenetics research in the making as part of research consortia, I point to what it takes
to produce valuable knowledge in this field of research in terms of practices and knowledges. I show
that the questions being asked in epigenetics are shaped by knowledge infrastructures like research
consortia, and unpack how exactly these knowledge infrastructures enable the production of valuable
knowledge.
A question that emerges is what is particular about epigenetics and the collaboration and valuation
practices discussed in this paper? One could argue that there is nothing specific about epigenetics.
Data accumulation is a common thread of contemporary postgenomics research (Richardson and
Stevens, 2015), with scientists looking to study genetics in their wholeness, while this is made
possible by the use of new sequencing technologies enabling the production of large-scale genomic or
epigenomic maps (Ankeny and Leonelli, 2015). In any field, data need to be processed and cleaned in
order to be made valuable. Scholars in critical data studies (e.g. Leonelli, 2016, Neff et al., 2017,
Ribes and Jackson, 2013, Gitelman, 2013) have contributed to this debate by pointing to the series of
practices, expertise and tacit knowledge required to curate data. In addition, the production of value is
a social process that involves a network of actors and objects. While, in epigenomics research
consortia, value is linked to specific assets (e.g. epigenomic data; computational expertise), similar
processes of value production could be observed in neighbouring fields such as gene expression or
genomics.
In some respects, however, epigenetics lends itself well to the practices described in this paper, for
a number of reasons. First, epigenetics has received high public policy and media attention, as
well as high levels of public and private funding. This means that an important number of research
teams have invested in epigenetics, acquiring and developing the necessary resources (e.g. datasets;
expertise; skills) to produce scientific results in this field. These are the research teams that, having
something to share, turn to consortia for the production of valuable research. Second, research teams’
move to epigenetics was facilitated by the fact that epigenetics is a flexible concept with fluid
boundaries (Meloni and Testa, 2014, Pickersgill, 2016, Pinel et al., 2018). This means that a range of
research groups with a diversity of disciplinary backgrounds and expertise can attach themselves to
epigenetics research and participate in collaborative endeavours in consortia. Third, in environmental
epigenetics, what is defined as ‘the environment’ varies greatly from one research team to another,
and many of the environmental factors studied, such as diet or menopause, are difficult to define and
measure, and as such, they can prove difficult to study. Research consortia provide ways for research
teams to bypass this problem by, first, aggregating masses of data that can balance the imprecisions in
the ways the environment is defined; second, adopting consistent definitions across the research
teams involved of what is understood as the environment; and third, standardising the processing
and cleaning of data.
This paper not only contributes empirical insights into the practice of epigenetics research, it also
offers conceptual tools to examine and problematise large-scale research collaborations. By bringing
the body of literature on research collaborations and data repositories into dialogue with STS
work on value production, assetization and rentiership, I unpacked the assumed relationship, present
in both scientists’ discourses and science studies (Dietz and Bozeman, 2005, Lee and Bozeman, 2005,
Subramanyam, 1983), between research collaborations and the production of valuable research.
Specifically, I took a closer look at the normative assumption within ‘big biology’ whereby expanding
the scale of biological enquiry through, for example, the aggregation of more research materials, leads
to better research. Conceptualising data as assets, I examined how these are mobilised within research
consortia towards the production of research results and the creation of epistemic value, pointing in
particular to a rentier regime of accumulation. This analytical frame was instrumental in uncovering
monopolisation practices at play in research consortia, and led to the observation that in order to
create value from the abundance of data within these scientific infrastructures, teams construct their
assets as scarce. While contemporary big biology is marked by the generalised imperative to share
data to create abundance (Lezaun, 2013, Lezaun and Montgomery, 2015), collaborative endeavours
within research consortia are in fact built around forms of exclusion: exclusion of those outside the
collaboration who do not own valuable properties that can be shared and exclusion of those inside the
collaboration from using the ‘shared’ resources.
This paper also contributes to discussions on valuation, providing both empirical and analytical
grounding to the question of what gets valued and how in the knowledge economy. Demonstrating
that knowledge production is entangled with valuation processes, I unpack what it takes for
epigenetics researchers to produce knowledge that can be deemed valuable. Value production in
academic research laboratories takes place through a series of assets, which are mobilised as
resources towards the production of knowledge, while they also function as private properties in their
own right, used by laboratories to extract a revenue, through a rentier regime of accumulation. Such a
regime of accumulation comes together with tight processes of control, whereby research laboratories,
concerned with maximising the revenue that can be extracted from their assets, enact limits and
exclusions on the use of their resources. Finally, this paper offers analytical tools to understand the
origins of value in the knowledge economy. I suggest thinking of the laboratory as a productive
system, which entails, first, unpacking the set of resources making up the laboratory; second,
analysing the set of practices through which resources are turned into valuable assets; and third,
paying attention to the ways in which these come together in the productive system that is the
laboratory towards the production of value.
The collaborative arrangements discussed in this paper, based on principles of scarcity and private
property, contrast with the contemporary Open Science movement (Levin and Leonelli, 2017), which
encourages researchers to disclose a variety of outputs from their work, ranging from datasets, to
biological materials and publications (European Commission, 2015, The Royal Society, 2012,
Research Councils UK, 2013), on the basis that openness will enhance the transparency of research
and promote the reusability of research outputs within and beyond the research community. Within
the epigenetics research community, research consortia demonstrate a selective approach to openness
by organising the scarcity and monopolisation of data on the one hand, and the sharing of research
results, on the other hand. It is through these selective processes of openness and exclusion that data
are rendered highly valuable resources, and the creation of valuable research is enabled.
Acknowledgements
I first would like to thank the staff in the laboratory who shared their time, space and thoughts with
me. I am indebted to Christopher McKevitt and Barbara Prainsack, for their guidance and insightful
comments on this work. I also thank the editors of this special issue for their input on the manuscript.
Finally, my thanks go to David Wyatt for stimulating discussions and providing valuable comments
on earlier versions of this article.
Funding
This work was supported by the Wellcome Trust [grant number WT108574MA].
References
Ankeny, R. & Leonelli, S. 2015. Valuing data in postgenomic biology. In: Richardson, S. & Stevens, H. (eds.) Postgenomics: Perspectives on Biology after the Genome. Durham, NC: Duke University Press.
Anonymous. 2017. The world’s most valuable resource is no longer oil, but data. The Economist (6 May) [Online]. Available: https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource [Accessed 26 March 2019].
Armstrong, L. 2014. Epigenetics, London, Garland Science.
Attride-Stirling, J. 2001. Thematic networks: an analytic tool for qualitative research. Qualitative Research, 1, 385-405.
Baker, K. & Millerand, F. 2010. Infrastructuring ecology: challenges in achieving data sharing. In: Parker, J., Vermeulen, N. & Penders, B. (eds.) Collaboration in the New Life Sciences. Farnham: Ashgate.
Birch, K. 2017. Rethinking Value in the Bio-economy: Finance, Assetization, and the Management of Value. Science, Technology, & Human Values, 42, 460-490.
Birch, K. 2020. Technoscience Rent: Toward a Theory of Rentiership for Technoscientific Capitalism. Science, Technology & Human Values, 45, 3-33.
Birch, K. & Tyfield, D. 2013. Theorizing the Bioeconomy: Biovalue, Biocapital, Bioeconomics or . . . What? Science, Technology, & Human Values, 38, 299-327.
Borgman, C. L., Wallis, J. & Enyedy, N. 2008. Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries. International Journal of Digital Libraries, 17, 17-30.
Bugge, M., Hansen, T. & Klitkou, A. 2016. What Is the Bioeconomy? A Review of the Literature. Sustainability, 8, 691-713.
Bush, W. & Moore, J. 2012. Chapter 11: Genome-wide association studies. PLoS Computational Biology, 8.
Calvert, J. 2013. Systems biology, big science and grand challenges. BioSocieties, 8, 466-479.
Chiappetta, M. & Birch, K. 2018. Limits to biocapital. In: Gibbon, S., Prainsack, B., Hilgartner, S. & Lamoreaux, J. (eds.) Handbook of Genomics, Health and Society. New York: Routledge.
Davies, G., Frow, E. & Leonelli, S. 2013. Bigger, faster, better? Rhetorics and practices of large-scale research in contemporary bioscience. BioSocieties, 8, 386-39.
Decker, K. 2018. Data struggles: The life and times of a database in Historical Climatology. Social Science Information, 57, 6-30.
Dietz, J. S. & Bozeman, B. 2005. Academic careers, patents, and productivity: industry experience as scientific and technical human capital. Research Policy, 34, 349-367.
Edwards, P. N., Mayernik, M. S., Batcheller, A. L., Bowker, G. C. & Borgman, C. L. 2011. Science friction: Data, metadata, and collaboration. Social Studies of Science, 41, 407–414.
Etzkowitz, H. 2011. Normative change in science and the birth of the Triple Helix. Social Science Information, 50, 549-568.
European Commission 2012. Enhancing and focusing EU international cooperation in research and innovation. In: Communication from the Commission to the European Parliament, T. C., The European Economic and Social Committee and the Committee of Regions. Tech. Rep. Com(2012) 497 Final. (ed.). Brussels: European Commission.
European Commission 2015. Validation of the results of the public consultation on Science 2.0: Science in Transition. In: Innovation, R. A. (ed.). Brussels: European Commission.
Fochler, M. 2016. Variants of Epistemic Capitalism: Knowledge Production and the Accumulation of Worth in Commercial Biotechnology and the Academic Life Sciences. Science, Technology, & Human Values, 41, 922-948.
General Secretariat of the Council 2016. Council Conclusions on the Transition towards an Open Science System. Brussels, Belgium: Council of the European Union.
Gitelman, L. (ed.) 2013. Raw data is an oxymoron, Cambridge, MA: MIT Press.
Hackett, E. J. 2005a. Essential Tensions: Identity, Control, and Risk in Research. Social Studies of Science, 35, 787-826.
Hackett, E. J. 2005b. Introduction to the Special Guest-Edited Issue on Scientific Collaboration. Social Studies of Science, 35, 667-671.
Hacking, I. 2002. Historical Ontology, Cambridge, MA, Harvard University Press.
Haig, D. 2012. Commentary: The epidemiology of epigenetics. International Journal of Epidemiology, 41, 13-16.
Hilgartner, S. 2013. Constituting large-scale biology: Building a regime of governance in the early years of the Human Genome Project. BioSocieties, 8, 397-416.
Hilgartner, S. 2017. Reordering Life: Knowledge and Control in the Genomics Revolution, Cambridge, MA, MIT Press.
Jirtle, R. L. 2012. Epigenetics: How genes and environment interact. NIH Director’s Wednesday Afternoon Lecture Series [Online], 18 April 2012. Available: http://videocast.nih.gov/launch.asp?17223 [Accessed 26 March 2019].
Key, T., Verkasalo, P. & Banks, E. 2001. Epidemiology of breast cancer. The Lancet Oncology, 2, 133-140.
Kleinman, D. L. & Suryanarayanan, S. 2013. Dying Bees and the Social Production of Ignorance. Science, Technology, & Human Values, 38, 492-517.
Landecker, H. & Panofsky, A. 2013. From Social Structure to Gene Regulation, and Back: A Critical Introduction to Environmental Epigenetics for Sociology. Annual Review of Sociology, 39, 333-357.
Lee, S. & Bozeman, B. 2005. The Impact of Research Collaboration on Scientific Productivity. Social Studies of Science, 35, 673-702.
Leonelli, S. 2013. Global data for local science: Assessing the scale of data infrastructures in biological and biomedical research. BioSocieties, 8, 449-465.
Leonelli, S. 2016. Data-Centric Biology: A Philosophical Study, Chicago, University of Chicago Press.
Levin, N. & Leonelli, S. 2017. How Does One “Open” Science? Questions of Value in Biological Research. Science, Technology, & Human Values, 42, 280–305.
Lezaun, J. 2013. The escalating politics of ‘Big Biology’. BioSocieties, 8, 480-485.
Lezaun, J. & Montgomery, C. 2015. The Pharmaceutical Commons: Sharing and Exclusion in Global Health Drug Development. Science, Technology, & Human Values, 40, 3-29.
Lloyd, S. & Müller, R. 2018. Situating the biosocial: Empirical engagements with environmental epigenetics from the lab to the clinic. BioSocieties, 13, 675-680.
Lock, M. 2013. The Epigenome and Nature/Nurture Reunification: A Challenge for Anthropology. Medical Anthropology, 32, 291-308.
Lock, M. 2015. Comprehending the body in the era of the epigenome. Current Anthropology, 56, 151-177.
Martin, P. 2015. Commercialising neurofutures: Promissory economies, value creation and the making of a new industry. BioSocieties, 10, 422-443.
Meloni, M. & Testa, G. 2014. Scrutinizing the epigenetics revolution. BioSocieties, 9, 431-456.
Mirowski, P. 2011. Science-Mart: Privatizing American Science, Cambridge, MA, Harvard University Press.
Mukherjee, S. 2016. Same but different: How epigenetics can blur the line between nature and nurture. The New Yorker [Online], 2 May 2016. Available: https://www.newyorker.com/magazine/2016/05/02/breakthroughs-in-epigenetics [Accessed 26 March 2019].
Neff, G., Tanweer, A., Fiore-Gartland, B. & Osburn, L. 2017. Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science. Big data, 5, 85-97.
Organization for Economic Cooperation and Development 2009. OECD Guidelines on Human Biobanks and Genetic Research Databases.
Pavone, V. & Goven, J. 2017. Introduction. In: Pavone, V. & Goven, J. (eds.) Bioeconomies. Life, Technologies, and Capital in the 21st Century. Cham, Switzerland: Palgrave Macmillan.
Pickersgill, M. 2016. Epistemic modesty, ostentatiousness and the uncertainties of epigenetics: on the knowledge machinery of (social) science. The Sociological Review Monographs, 64, 186-202.
Pinel, C. 2018. Hilgartner, S. Reordering Life: Knowledge and Control in the Genomics Revolution. Cambridge, MA: MIT Press. 2017. 368 pp. £27.95 (hbk) $24 (ebk) ISBN 9780262035866. Sociology of Health & Illness, 40, 926-928.
Pinel, C. 2019. Enterprising environments: Knowledge production in epigenetics in two British laboratories. PhD thesis, King's College London.
Pinel, C., Prainsack, B. & Mckevitt, C. 2018. Markers as mediators: A review and synthesis of epigenetics literature. BioSocieties, 13, 276-303.
Pinel, C., Prainsack, B. & Mckevitt, C. In Press. Caring for data: Value creation in a data-intensive research laboratory. Social Studies of Science.
Price, J. D. 1963. Little Science, Big Science, New York, Columbia University Press.
Puschmann, C. & Burgess, J. 2014. Big data, big questions. Metaphors of big data. International Journal of Communication, 8.
Rajan, K. S. 2006. Biocapital: The constitution of postgenomic life, Durham, Duke University Press.
Research Councils UK 2013. RCUK Policy on Open Access and Guidance.
Ribes, D. & Jackson, S. 2013. Data Bite Man: The work of sustaining long-term data collection. In: Gitelman, L. (ed.) ‘Raw Data’ is an oxymoron. Cambridge, MA: MIT Press.
Richardson, S. & Stevens, H. (eds.) 2015. Postgenomics: Perspectives on Biology after the Genome, Durham: Duke University Press.
Sadowski, J. 2019. When data is capital: Datafication, accumulation, and extraction. Big Data & Society.
Slaughter, S. & Leslie, L. 1997. Academic Capitalism, Baltimore/London, The Johns Hopkins University Press.
Subramanyam, K. 1983. Bibliometric studies of research collaboration: A review. Journal of Information Science, 6, 33-38.
The Royal Society 2011. Knowledge, Networks and Nations: Global scientific collaboration in the 21st century. RS Policy document 03/11. London, UK: The Royal Society.
The Royal Society 2012. Science as an Open Enterprise. London: The Royal Society Science Policy Centre report 02/12.
Vermeulen, N. 2009. Supersizing Science: On Building Large-Scale Research Projects in Biology. PhD Thesis, Maastricht University.
Vermeulen, N., Parker, J. N. & Penders, B. 2013. Understanding life together: A brief history of collaboration in biology. Endeavour, 37, 162-171.
Vermeulen, N. & Penders, B. 2010. Collecting Collaborations: Understanding Life Together. In: Parker, J., Vermeulen, N. & Penders, B. (eds.) Collaboration in the New Life Sciences. Farnham: Ashgate.
Waldby, C. 2002. Stem Cells, Tissue Cultures and the Production of Biovalue. Health, 6, 305-23.
Zimmerman, A. & Nardi, B. 2010. Two Approaches to Big Science: An Analysis of LTER and NEON. In: Parker, J., Vermeulen, N. & Penders, B. (eds.) Collaboration in the New Life Sciences. Farnham: Ashgate.