This is not the final version of the paper. The final version is available here: https://journals.sagepub.com/doi/10.1177/0539018419895456#_i10. Please cite as: Pinel, C. 2020. When more data means better results: Abundance and scarcity in research collaborations in epigenetics. Social Science Information [online first: doi.org/10.1177/0539018419895456]

When more data means better results: Abundance and scarcity in research collaborations in epigenetics

Clémence Pinel

Centre for Medical Science and Technology Studies, University of Copenhagen, Denmark

Correspondence to

Dr Clémence Pinel

Centre for Medical Science and Technology Studies

Department of Public Health

University of Copenhagen

Øster Farimagsgade 5

1014 Copenhagen K

Denmark

[email protected]

Abstract

Drawing upon ethnographic findings from an epigenetics research laboratory in the United Kingdom,

this paper explores practices of research collaborations in the field of epigenetics, and epigenomics

research consortia in particular. I demonstrate that research consortia are key scientific infrastructures

that enable the aggregation of masses of data deemed necessary for the production of results and the

fostering of epistemic value. Building on STS scholarship on value production, and the concept of

asset, I show that the production of valuable research within epigenomics research consortia rests on

the active organisation and management of abundance and scarcity. It involves shaping and

standardising the masses of data gathered in consortia, while it also entails research teams enclosing

their data within their laboratories’ walls. As they do so, research teams construct data into scarce and

monopolised assets, which they can put to productive use in collaborative endeavours against a

revenue. In addition to contributing empirical and critical insights into the ways epigenetics

knowledge is formed and negotiated in specific research contexts, this paper offers conceptual tools to

examine and problematise knowledge production practices in data-intensive research more broadly. In

particular, it points out that while contemporary big biology is marked by the generalised imperative

to ‘share’ data and ‘open’ science, collaborative endeavours within research consortia are built around

forms of exclusion.

Keywords: epigenetics, scientific collaborations, research consortium, data, value, rent, asset

Introduction

Epigenetics research refers to the study of the processes that control gene expression but do not entail

a change in DNA sequence (Armstrong, 2014). Research in this field studies the genome in relation to

an extensive and complex developmental machinery. More specifically, epigenetics research

underlines the reactivity of the genome to environmental signals, and explores how environmental

factors, ranging from child abuse, to smoking or exposure to chemicals, can impact gene regulation by

leaving marks on the epigenome (Pinel et al., 2018). In other words, in epigenetics, the point of focus

is not genes per se, but what surrounds the genes – the ‘epi’ to the genes.

Epigenetics has been a rapidly growing field in the world of bioscience (Haig, 2012). This field of

research builds on specific technologies, such as next generation sequencing technologies through

which large-scale epigenomic maps and datasets can be obtained. For example, a common approach

used to detect DNA methylation – the most studied epigenetic change – and enable the production of

large datasets is methylated DNA immunoprecipitation sequencing (MeDIP-seq) (Pinel et al., 2018).

Research collaborations, which enable the putting together of resources (Hackett, 2005b), and in

particular, the ‘sharing’ of data, are also central to the growth of the epigenetics field. National and

international consortia1 specifically dedicated to epigenetics research enable teams across continents

to put together their data (some of these consortia are formed to collect new data, while some organise

the sharing of existing data) and conduct meta-analyses. The purpose of such studies is to examine

data across a number of independent cohorts and determine overall trends. Teams explore associations

between variables and ensure that their results are statistically significant, that is, that the

relationship observed between variables is likely to be caused by something other than chance.

1 The term consortium (a Latin word meaning “partnership” or “association”) is commonly used in academic research to refer to the coming together of research groups with an interest in one topic area. Well-known examples of this type of research collaboration in epigenetics include the International Human Epigenome Consortium (IHEC), the NIH Roadmap Epigenomics Mapping Consortium, the BLUEPRINT project, and the Genetics of DNA Methylation Consortium (GoDMC).

In this paper, I study epigenetics research by taking a close look at research collaborations and data

sharing practices together, specifically unpacking how knowledge is produced within epigenomics

research consortia. Collaborative endeavours within research consortia are often based on the

assumption that “more data means better results”. That is, a common way to ensure statistically

significant results and detect associations between, for example, DNA methylation and rare or

complex traits is to increase the sample size of the study by aggregating more data. Research consortia

enable teams to do just this by reaching out to others who have similar data and working together to

create “bigger” studies with larger sample sizes. Put differently, research consortia enable the

abundance of data across the epigenetics research community and the production of valuable

knowledge. By valuable knowledge I mean knowledge which can be fostered to provide different

forms of value2, according to the multiple evaluative frameworks that coexist within 21st century

universities3, and in particular economic value through the production of wealth, health value which

entails the enhancement of health, and epistemic value through the pursuit of truth. As will become

apparent in the later sections of this paper, epigenetic scientists recurrently engage in research

consortia driven by concerns for epistemic value, as they hope to produce scientific results that are

worthy of publication in high impact-factor journals. In this paper, I concentrate on the processes

through which research consortia lead to the production of epistemic value, and the ways in which

epigenomic data are mobilised, worked on and managed in this process. Specifically, I focus on the

ways the abundance of data is fostered by research consortia, examining what happens to data once it

is ‘shared’. I analyse how the masses of data are shaped into standardised research resources and

foreground the organisational arrangements that enable research teams to manage the use of their data

in the collaborative endeavour. I show that while research consortia are built to create an abundance

of data, the production of epistemic value in these collaborative endeavours rests on the construction

of scarcity.

While this case study is concerned with epigenomic research data, and epigenetics research more

broadly, the observations I make in this field are relevant to other contexts where masses of research

data are created and mobilised towards the production of knowledge. There are, however, some

characteristics in the field of epigenetics research that make the data and research collaborations

practices described in this paper particularly salient. I address these points throughout the paper and in

the discussion.

2 Rose and Novas (2004) discuss three types of biovalue: economic value through the production of wealth, health value which entails the enhancement of health, and ethical value.

3 As Pinel (2019) pointed out, academic science is characterised by the coexistence of different understandings of what constitutes valuable knowledge, and a plurality of criteria for evaluating research. For example, academic research is increasingly understood as an activity that can be fostered to seek economic profit through the commercialisation of research (Mirowski, 2011, Slaughter and Leslie, 1997). This represents an evaluative framework based on values of competition and production. It coexists with the conception of research as a collective endeavour in search of intellectual debate, based on the values of communalism, education and truth (Etzkowitz, 2011). It also coexists with an evaluative framework centred around productivity, which is operationalised in terms of publications or recorded citations.

I begin this paper by briefly discussing how science policy and Science and Technology Studies

(STS) literatures have attended to the topic of research collaborations and data sharing. Next, I discuss

the analytical framework I employ in this paper to study knowledge production practices in research

consortia, emphasising what the lens of value production and the concept of asset enables us to do

when thinking about data-intensive research collaborations. This is followed by a brief discussion of

the research methods and the laboratory in which the study was carried out. I then turn to an analysis

of the practices of research collaborations within epigenomics research consortia.

Large-scale research collaborations in policy and STS literature

At the science policy level, research collaborations are positively connoted: they are thought to enhance

the quality of research, leading to more relevant and efficient research (The Royal Society, 2011),

while they are also understood as opportunities to strengthen the competitiveness of the economy by

bringing together different researchers and bodies of knowledge in order to foster innovation

(European Commission, 2012). Such policies thus assume a relationship between scientific

collaborations and valuable research. Science policy interest in research collaborations is reflected in

funding schemes. For example, the European Union’s Research Framework Programmes require

applicants to establish a consortium with at least three partners in three eligible countries. Data

sharing practices have also been an area of interest for science policy, and this is in part linked to the

understanding that efforts to put together and share data are central to contemporary large-scale

collaborations. A number of policies have been developed to promote data sharing and open access

(Organization for Economic Cooperation and Development, 2009, General Secretariat of the Council,

2016). They have focused on the removal of so-called ‘barriers to access’ by ensuring principles of

data sharing or examining potential ethical issues relating to data sharing such as the protection of

privacy. These policies encouraging, controlling or restricting data flows as part of collaborative

efforts understand data as the “oil wells” of the future (e.g. Anonymous, 2017, Puschmann and

Burgess, 2014), that is, they are seen as highly valuable resources which research teams, commercial

firms or data registries should harvest and work on to create different forms of value, be it economic,

health or scientific.

In the STS literature, a common way to define research collaborations is to see them as a process that

starts with collaborators sharing their respective resources. Hackett defines a collaboration as “a family

of purposeful working relationship between two or more people, groups, or organisations.

Collaborations form to share expertise, credibility, material and technical resources, symbolic and

social capital” (2005b: 671). In another paper, Hackett (2005a) unpacks what collaborations within a

research group entail. He stresses that “members of a group work together using an arrangement of

materials, techniques, instruments, ideas, and enabling theories that I call an ensemble of research

technologies.” (ibid.: 788) For him, a research collaboration is based on tangible or intangible

resources, which are shared among different actors. However, what Hackett does not tell us is the

modalities according to which these resources are shared, nor does he describe how they are

mobilised between researchers to produce valuable research.

Contemporary collaborative arrangements are also often discussed in terms of size or growth, for

example framing those as ‘supersize science’ (Vermeulen, 2009) or more generally as ‘big science’

(Price, 1963, Borgman et al., 2008) while pointing to the increasing manpower and expensive

equipment required to achieve research goals. Beyond such accounts, STS scholarship has provided

more nuanced and critical discussions of large-scale and collaborative biological research, studying,

for example, how contemporary ‘big science’ differs from historical manifestations of large-scale

science (Vermeulen and Penders, 2010), while also exploring the normative assumptions underlying

large-scale biology (Davies et al., 2013). Others have examined what it takes to produce knowledge

within large-scale collaborations by exploring its forms of governance. For example, Hilgartner

(2013) studied the Human Genome Project (HGP) – an international research effort that aimed to

sequence and map a full human genome – and showed that sequence data was central to the

management and coordination of research centres spread out over several geographical locations.

Each participating institution contributing to the collaborative effort competed in the production and

dissemination of sequence data. Collaborative research was thus turned into a matter of incremental

production with the aim of aggregating commensurable and quantifiable masses of data across space,

while speed became a key factor for evaluating the success of contributions by each centre (Davies et

al., 2013).

Authors have paid particular attention to networked structures in contemporary collaborative biology

(Baker and Millerand, 2010, Vermeulen et al., 2013, Vermeulen and Penders, 2010, Zimmerman and

Nardi, 2010). These networks loosely connect a high number of individuals and groups of scientists to

work on a shared goal: they exchange materials, divide work, while still remaining independent and in

control of their own project within the network. Central to many of these networks are data

repositories that allow multiple researchers, laboratories or institutions to collaborate on the creation,

use and reuse of data. Leonelli (2016) greatly contributed to the study of data repositories by

specifically pointing to the human labour involved in making a database. She discussed the work of

‘curators’ who mobilise their expertise and skills to select, prepare and classify data to make it

available in databases. She argued that what is central to the work of curators is to enable the global

circulation of data and its widespread use in a diversity of projects, at the same time that they enable

the local adoption of data in specific research contexts (Leonelli, 2013). For data to circulate across a

wide network of data users, data are increasingly standardised in the ways they are produced and

‘packaged’ with detailed metadata describing their origins (Baker and Millerand, 2010, Edwards et al.,

2011). Contributing to this debate, Decker (2018), in this journal, discussed what is at stake when

constructing a database in historical climatology for individual researchers, their teams and research

field. When data producers deposit ‘their’ data to feed the collective repository and make it freely

accessible, they are concerned with how their data will be ‘treated’ by others. Decker argues that this

does not simply stem from researchers’ concern over academic credit and recognition for their work

producing the data, but also from researchers’ “immense personal involvement” with data as they help

produce it. Or as he puts it, the data are “somehow part of them (…) given the arduous work invested

in it.” (Decker, 2018: 22)

This critical body of literature emphasises that efforts to put together and share data as a research

resource are central to contemporary large-scale collaborations. Researchers, laboratories and

institutions come together and make their data accessible within a network. Such collaborative efforts

enable the abundance of data often deemed necessary for the production of valuable research results.

At the same time, we learn from this body of work that sharing data in a research collaboration entails

work on the part of researchers and curators, who mobilise their skills, expertise and affect to produce,

process and package data. In this process, data become highly valued objects for researchers because

of the labour and personal involvement they invest in creating them, at the same time that data gain

evidential value. This suggests a tension between the individual value data hold for those producing them

and their collective value when data are shared and enable the production of knowledge within the

collaboration. However, this body of work does not tell us how this tension is negotiated in practice,

nor does it analyse the modalities according to which teams share and mobilise their respective data as

part of the collaborative endeavour to produce valuable knowledge. That is, how do research teams

cooperate and share their data with others to produce valuable knowledge? What happens to data once

it is shared in a research consortium? How is the abundance of data managed towards the production

of valuable knowledge? In the rest of this paper, I offer answers to help close this research gap.

This paper thus speaks to broader questions about how we produce valuable knowledge within large-

scale collaborations based on a data sharing ethos. To answer these questions, I draw upon analytical

tools from STS scholarship on value production in the bioeconomy, and in particular the concept of

asset.

Accumulation, assets and value production

STS scholars (Birch, 2017, Birch and Tyfield, 2013, Martin, 2015, Chiappetta and Birch, 2018) have

used the term bioeconomy4 to refer to the set of economic activities derived from biotechnology and

biosciences. These authors stress that capitalist logics are at play in contemporary academic research,

by underlining that different forms of capital (e.g. economic, academic credibility) are accumulated

through scientific work. For example, Fochler (2016) describes a capitalist cycle whereby researchers

mobilise their expertise, skills and technologies as resources to produce results, which can be

converted into publications and serve as assets in the competition for grant funding, a form of

economic capital, which could then be reinvested in the laboratory. He terms this cycle of

accumulation ‘epistemic capitalism’.

For these authors, value in the biosciences is a social process that involves a network of actors, objects

and processes, rather than something that is inherent to biological materials and processes, which is

what, for example, Waldby (2002) discusses with the concept “biovalue”. They specifically

emphasise that value is linked to a series of assets, which are mobilised as resources towards the

production of knowledge and governed according to specific organisational and managerial

arrangements (Birch, 2017, Birch and Tyfield, 2013, Chiappetta and Birch, 2018, Martin, 2015, Birch,

2020). An asset is a tradable resource which can be tangible (e.g. a sequencing technology) or

intangible (e.g. expertise in epigenetics; networks). There are two main ways in which assets can be

mobilised as resources towards the production and accumulation of value. First, assets can be used by

actors to produce commodities, which are then sold to create value. Second, assets have value as

properties, and actors can produce and accumulate value through the ownership of valuable assets,

which they rent to others in exchange for a revenue.

Birch (2017) argues that value in the bioeconomy mainly results from processes of assetization by

which knowledge is turned into a property that yields an income stream. For example, a firm turns its

technoscientific knowledge into Intellectual Property Rights (IPRs). This means that knowledge

becomes an intangible and scarce asset from which it can earn royalties. In such a scenario, value is

specifically derived from the exclusion of other people from accessing this knowledge: while

knowledge is enclosed, those who own IPRs distribute rights to use this knowledge against a revenue.

Authors note that the production of value in the bioeconomy is mainly asset-based, rather than

commodity-based: value is constituted predominantly by ownership and control of assets, such as

IPRs, datasets or brand equity, that underpin potential new products. These observations lead Birch

and Tyfield (2013) to argue that the bioeconomy functions as a “rentier regime of accumulation”

whereby knowledge is made into scarce assets from which actors extract rent to produce value.

4 For these authors, the term bioeconomy refers to the increasing importance of biotechnology research, its applications and commercialisation in different sectors of the economy. There are however several understandings of the term bioeconomy (Bugge et al., 2016, Pavone and Goven, 2017). For example, Pavone and Goven (2017) identify three main understandings of the term: the biotechnology vision, which emphasises the importance of biotechnology research and its commercialisation in different sectors of the economy; the biomass economy vision, which stresses the substitution of biomass for fossil sources of energy, thus presenting the biomass economy as a more sustainable economy; and finally bioeconomy as a form of capitalism, with authors exploring the relations between life sciences and capitalism in specific areas like pharmaceuticals (Rajan, 2006), or in the life sciences in general (Birch, 2017, Birch and Tyfield, 2013).

To understand the origins of value in the bioeconomy, STS scholars therefore not only encourage us

to pay attention to the set of resources and assets mobilised in productive systems, but also to the

ways these assets and resources are controlled and governed. Or, as Hilgartner reminds us in his study of the HGP

(2017; for a summary see also Pinel, 2018), epistemic concerns of securing knowledge

come together with socio-political concerns of securing control. He suggests unpacking the set of

“control-relationships”, defined as “regimes and practices that allocate entitlements and burdens among

agents” (p.7), which are built onto ‘knowledge-objects’, such as data, biomaterials or results.

What we learn from such accounts is that academic science is marked by strategies of accumulation as

teams seek to gain academic credibility, funding or expand their networks. Towards that end, they

build resources, such as technologies or data, and turn them into assets towards the production of

value. This body of literature reminds us that there are different ways of producing value from assets

within the biosciences: either as resources in the production of commodities (e.g. data are used within

the laboratory to produce knowledge and lead to publications5), or as properties that can yield rents

(e.g. data are rented out to other laboratories against a revenue). While the production of valuable

knowledge requires an array of resources, in a rentier regime of accumulation, an actor does not need

to own all the necessary resources. Instead, actors can collaborate and rent one another the necessary

assets. A collaboration thus enables teams to gather remote assets, and once value is produced, it is

redistributed to the different collaborators, often in the form of authorship on publications. In

addition, while an array of resources needs to be aggregated to produce value, in a rentier regime of

accumulation, value is captured when assets are made scarce, and authors encourage us to consider

the specific organisational arrangements and processes through which scarcity is constructed and

control over assets and resources managed. This body of work can therefore help us unpack the

process through which, within epigenomics research consortia, more data means better research, and

specifically, it can help us explain the modalities according to which data are mobilised and shared in

a collaboration to produce valuable knowledge.

Drawing on this body of work, I conceptualise research and clinical data produced within data-

intensive research as resources research teams turn into assets, while I study how collaborators

mobilise them within research consortia to produce value. I pay particular attention to the

collaborative arrangements through which they are accumulated, thus enabling an abundance of data

to be created in the research consortia, at the same time that I explore the processes of control through

which the supply of data is managed.

5 In a citation regime, publications can also function as assets, as they may deliver rent in the form of citations.

The laboratory and research methods

This article draws upon findings from an ethnographic study conducted in a UK-based laboratory

carrying out genetics and epigenetics research on twins. For the purpose of this article, I call the

laboratory Twinomics6. Research in the laboratory explored complex diseases with a particular

interest in age-related diseases, including osteoporosis, diabetes and cardiovascular diseases.

Twinomics was located within a highly-ranked, research-led university in the UK and held a number

of competitive public and private grants to support its research and infrastructures.

Work at Twinomics was centred around a large database of clinical and research data gathered over

several decades. The database included data from thousands of twins, with clinical, physiological and

lifestyle data, as well as hundreds of phenotypes related to common diseases. Twinomics was mostly

a ‘dry-lab’, which meant that scientists conducted computational or applied mathematical analyses of

twins’ data, and using an epidemiological approach, looked for the incidence and distribution of

specific traits in populations.

I carried out ethnographic fieldwork at Twinomics between January and June 2016. I observed and

participated in the daily life of the lab, spending time with researchers at their desks as they ‘ran’

computational analysis of their datasets, observing the production of data and its processing by staff,

sitting in the weekly lab meetings and journal clubs, or sharing lunch and coffee breaks with staff. The

ethnographic data I draw upon in this paper consist of fieldnotes from participant observation,

numerous informal conversations with staff, as well as in-depth semi-structured interviews with

members of Twinomics, from laboratory lead through to junior researchers (17 in total). All interviews

and observations from fieldwork were transcribed, coded and thematically analysed (Attride-Stirling,

2001). The theme of research collaborations, and research consortia in particular, appeared significant

early on during the study, which guided further investigation into the set of concerns motivating such

collaborative endeavours, as well as the practices enabling them.

In what follows, I draw upon this ethnographic data to discuss how researchers at Twinomics

collaborate with others and ‘share’ their epigenetics data in research consortia. I describe three stages

in the process of working together in a research consortium, analysing what this process entails for the

laboratory’s data and how it leads to the production of valuable knowledge.

6 Names and identifying details of places and individuals have been changed.

Aggregating masses of data

The epigenetics team at Twinomics focuses on the study of DNA methylation datasets. These are

comprehensive genome-wide profiles of human DNA methylation, which are produced by analysing

blood, skin and adipose samples from twins through a methylation array. Using an epidemiological

approach, researchers explore the causes and consequences of changes in DNA methylation at

population level. There are two broad areas the epigenetics team conducts research in: on the one

hand, research concerned with characterising the extent to which genetic variation shapes epigenetics,

and on the other hand, research exploring the role of the environment on epigenetics, with the aim of

finding associations between DNA methylation levels and environmental factors. Mark, the Principal

Investigator (PI) of the epigenetics team, explains that they have been focusing on specific

environmental factors:

we're taking things we know have an impact, like smoking, and we're trying to explore that in

more depth. … A number of people in my group are trying to identify other environmental

factors that may have a big signature on the epigenome. And it's been difficult to find things

that strongly influence the epigenome. … we've collaborated with other groups to increase

power, to detect these effects, so alcohol is one. Exercise I'm also very keen to work on that.

Diet also. … I mean there are lots of things you can look at there, but I suppose when you boil

it down to, I'm looking at genetics and environment using previously, sort of, clearly defined

environmental factors.

They focus on “strong environmental factors”, that is, factors which can be defined and measured

relatively well, and which show significant effects on DNA methylation markers. They do so to

ensure their research will lead to the production of results which could be easily interpreted, to then be

converted into publications. For example, smoking is considered a “strong environmental factor”

because it is rather simple for researchers to distinguish smokers from non-smokers using a questionnaire

and a “Yes” or “No” type of question. Smoking data is considered trustworthy and researchers have

been able to identify significant associations between smoking status and DNA methylation. In

contrast, diet is considered a difficult environmental factor to work on because, first, it is challenging

to capture the content of participants’ diet over a long period of time, which leads to imprecise data,

and second, epigenetics researchers have been struggling to find significant associations between diet

and epigenetic markers. Researchers often speak of such data as “noisy” data, because it is difficult to

identify a signal amid the imprecisions in the data. For these projects studying associations between

DNA methylation and environmental factors that can prove difficult for researchers to precisely

define and capture with clinical or research data, the epigenetics team at Twinomics has been working

in collaboration with other teams, and in particular, through research consortia. As Mark put it,

working in consortia provides research teams with “power to find things.” That is, it enables teams to

gain access to a higher number of samples, which can balance out the ‘noise’ associated with the

imprecise factor studied.

Four or five years ago, we had study designs to look at smoking, alcohol, diet, exercise. … So

we did all these things, and out of all of them, smoking is the only one that gave us positive

answers. The others didn't. Now, obviously smoking has a bigger impact on epigenetics, the

other option is that they all have an impact, it's just that numbers were too low to detect the

signals. And as a result of this, we're now contributing towards bigger studies trying to

identify, as I said, alcohol, there is also diet.

Research consortia enable Twinomics to have “bigger” studies, whereby a number of research teams

bring together their data, thus increasing the overall sample size of their research projects. At the

laboratory, the phrase “bigger is better” was often used by researchers when discussing their work. As

Mark explained, a study becomes bigger when it is based on samples from a higher number of

individuals, and it is assumed to be of better quality “because the bigger the number [of individuals in

the study], then the more likely you are to pick up the individual change [on the epigenome]”. For

researchers in environmental epigenetics dealing with hard-to-define environmental factors, research

consortia are thus particularly valuable because they can help them accumulate more data, which means

enhanced possibilities of identifying research results, and therefore higher chances of creating

epistemic value.

David’s experience of research consortia is illustrative. David is a PhD student in his final year, who

worked closely with Mark on a research project exploring associations between DNA methylation and

women’s age at menarche7 and menopause. They first started examining this association on their own

twins’ data:

When I started with the project looking at if there were DNA methylation changes linked to

the age at menopause and the age at menarche, and the difference of time between those two

points … we used around 500 samples that we had in our cohort. We found things that

seemed to be interesting, but we were not reaching that genome-wide significance threshold.

So we thought that increasing the sample size could give us a better view to see if there is

really a DNA methylation change associated with this. (David)

7 Menarche refers to the first menstrual period. The period between menarche and menopause is understood as women’s reproductive time. For David, this time window represents a proxy for studying women’s time of exposure to estrogen. This is considered an interesting research area because a number of studies have shown associations between estrogen and breast cancer (Key et al., 2001).


For this project, Twinomics’ own data was insufficient to provide answers and create epistemic value.

While it pointed to an interesting “region” on the epigenome, it could not be used to produce

statistically significant results that could then be published in scientific journals. As this example

suggests, research consortia are used by research teams as a scientific infrastructure to aggregate vast

quantities of data in order to keep creating and accumulating epistemic value. Or as Katherine,

another member of the epigenetics team, puts it, “One day they just realised that a single cohort can’t

publish very good papers. So they developed this kind of consortium.”
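The statistical intuition behind this move to consortia can be sketched with a rough power calculation. The snippet below is an illustration only, not an analysis reported by the team: it assumes a two-sided z-test, a hypothetical standardised effect size (`EFFECT`), and a significance threshold (`ALPHA`) of 1e-7 as a stand-in for a genome-wide threshold. Only the starting sample size of 500 comes from David's account; the larger sample sizes are hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided z-test for a standardised effect.

    Normal approximation: the test statistic is centred on
    effect_size * sqrt(n) and compared with the critical value for alpha.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)   # critical value for the threshold
    shift = effect_size * n ** 0.5         # expected size of the test statistic
    # Probability of exceeding the critical value in either tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


ALPHA = 1e-7    # assumed stringent, genome-wide style threshold
EFFECT = 0.1    # hypothetical small standardised effect

for n in (500, 1000, 2000, 4000):
    print(f"n={n:>5}  approximate power={approx_power(EFFECT, n, ALPHA):.3f}")
```

Under these assumptions, the chance of detecting a small effect is negligible with a few hundred samples and rises steeply as samples are added, which is the logic researchers invoke when they say that "bigger is better".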

This model of accumulation based on the aggregation of vast quantities of data to identify research

results in epigenetics is influenced by Genome-Wide Association Studies (GWAS). This approach

consists of identifying associations between genetic variations and particular diseases or phenotypes.

It involves scanning the genome of a vast number of individuals and searching for genetic markers or

single nucleotide polymorphisms (SNPs) (Bush and Moore, 2012). In epigenetics research, such

studies are called Epigenome-Wide Association Studies (EWAS): the data being used concern the

epigenome rather than the genome, with researchers looking to identify epigenetic markers

associating with specific traits, such as diet or age at menopause and menarche. Mark suggests that

such studies, which are commonly undertaken in research consortia, are based on a standardised

model, and what changes from project to project is the scale of data mobilised to answer the research

questions:

What we're doing now, we’re just doing it on a bigger scale, which is a little bit boring. …

Interviewer: Why do you say that those studies are boring?

They are just boring to me! [Laughs] I suppose because five years ago we designed them all

on a small scale, and now it's just the same thing, it's just “bigger and better”, “bigger and

better”. When does it end? It's a bit like the GWAS.

In data-intensive research, the stream of data must keep flowing and growing, and it is through data

accumulation that new knowledge is constructed, and value created. Such processes of value

production resonate with data capitalism, which, as Sadowski (2019) notes, is characterised by logics

of accumulation whereby actors look for new ways to produce value through the gathering of

more and more data.

Researchers at Twinomics prefer collaborating with other teams and using their data as part of a

consortium rather than producing more data on their own. This is linked to the fact that data

production is an expensive and time-consuming endeavour as it entails PIs securing funding, research


nurses and PhD students collecting samples from twins, and researchers processing and organising data

to transform it into a valuable research resource that can be used to foster results (for a detailed

description of the different forms of work underlying data production within data-intensive research,

see Pinel et al., In Press). As David puts it, research consortia are a convenient and efficient way to

reach data that is already “out there”:

If we want to increase our power to discover new associations, we could do that by just

increasing our sample sizes. And if there are people out there that have a similar cohort and

have interrogated DNA methylation with the same technology, it is convenient to pull our

samples together and we don't have to spend more resources on collecting more samples and

then doing more arrays.

At the same time, the scientific practices within research consortia reflect normative assumptions

whereby science should be “Bigger and faster” (Calvert, 2013, Davies et al., 2013). Underlying much

of contemporary science is the expectation that expanding the scale and speed of scientific enquiry

will lead to improvements in quality and efficiency, and thus lead to more valuable research. Research

consortia represent a convenient way for teams to expand the scale of their research by bringing

together collaborators who can share their data, to create bigger studies, and divide the work among

themselves, to enable faster research.

When results plateaued, and research teams were not able to foster evidential value from their own

resources and data, they found new ways to create epistemic value by aggregating data through

research consortia. Enabling an abundance of data, research consortia became key research

infrastructures of data-intensive research that scientific research teams regularly use to circumvent

limitations in their own resources, thus facilitating the production of value. While this observation is

not specific to epigenetics research and can be made in other data-intensive research fields, I argue

that the type of data environmental epigenetics deals with, namely broad and hard-to-define

environmental factors, makes research consortia and data accumulation practices particularly salient.

Shaping the abundance of data

Data are the key resources mobilised within research consortia to produce valuable knowledge.

However, for data to be effectively used towards the production and accumulation of value in

epigenetics, it first needs to be turned into assets. In this section, I discuss in more detail what happens

to the masses of data in a research consortium, and specifically analyse the different forms of labour

required to transform data into valuable assets that can be mobilised to produce knowledge. I show


that while research consortia enable the abundance of data, this abundance is carefully shaped as staff

work with other research teams to make their data valuable and produce scientific results.

The process of value production and accumulation in research consortia starts with relational labour

as teams connect with one another and exchange ideas. Once David and Mark had identified age at

menopause and menarche and its association with DNA methylation as an interesting research area to

explore, they approached a consortium bringing together research teams carrying out work in

epigenetics. They looked through the consortium’s research portfolio and observed that teams had

jointly conducted association studies based on DNA methylation data. Mark and David formally

reached out to the consortium by putting forward a proposal summarising their project’s aims and the

type of data it would be based on. David explains:

We proposed [the project] to the consortium. Around eight different cohorts showed interest.

There was another group that was already working in a similar project, well, they were only

focusing on age at menarche. So we decided to work together, linking the projects together – I

would focus more on the menopause part and they would focus on the menarche part. In the

end, we are co-leading this project.

The research consortium described by David functions as a loose network structure, whereby teams

affiliated to the consortium can volunteer to participate in collaborative projects. The consortium is

understood as an inclusive infrastructure in that it welcomes any research team to take part in

collaborative endeavours, no matter how big their datasets are. In particular, researchers value

consortia because, as Juliette, a PhD student in the lab, puts it, they allow “small teams” with “small

datasets” to take part in collaborative projects and gain “better projects” than the ones they would

have been able to conduct on their datasets alone. However, to participate in a consortium, teams must

have some data to share, however little this might be. The inclusive and sharing ethos of the

consortium therefore comes with forms of exclusion (Lezaun and Montgomery, 2015): to become part

of it, one must have something to offer.

In the above example, David and Mark found a number of partners who agreed to take part in their

project. Some laboratories who had relevant data to share became ‘participating cohorts’: their main

role was to provide this data for the project so as to increase the sample size, and in exchange, they

gained secondary authorship on the publication. Another laboratory, who had an aligned research

interest and carried out preliminary work in the area, became a closer partner: they proposed an

analytical focus and mobilised their scientific expertise, as well as their data, towards the production

of knowledge. David and this other team decided to share leadership in this project, which meant that


they worked together to design the project and identify the research question, while they both gained

primary authorship on the publication.

Leadership in a research consortium also means writing an analysis plan. It details how to go about

analysing large datasets computationally, with steps and sub-steps and the necessary “scripts” to use

in each step. The analysis plan starts by providing definitions of the key factors and processes studied.

In David’s project as part of the consortium, partners worked towards agreeing on a common

definition of menopause and menarche, as well as ways to capture it as data. While menarche can be

defined easily as it refers to a clear point in time (age of the first menstrual cycle), menopause is more

difficult to capture. It is often defined as the time when women have had no menstrual periods for 12

consecutive months; however, changes in a woman’s menstrual cycle can begin about six years before

the actual menopause. As such, it can prove difficult to capture the exact date of menopause for

women, and this may result in inconsistent data across teams participating in a consortium. When

writing up the analysis plan, David and his partner discussed different ways to define menopause. A

telephone meeting was scheduled to specifically address this issue. Both David and Mark took part in

the call, and on the other end of the line, their partner was represented by the PI and a postdoc.

David shared how they, at Twinomics, had been defining menopause in their work, arguing that they

adopted this definition because the “data are there” and it aligned with work on the topic that had been

published in the scientific community. Or as Mark put it, “we don’t want to reinvent the wheel.” Their

collaborator seemed happy with that definition but insisted that they check whether this would fit

the data that the participating cohorts already had. The rest of the meeting was spent agreeing on

detailed instructions about how to format and ‘clean’ the data.

Formatting the data means turning the “raw” data that are produced by the sequencing array into a set

of numbers and processing those in ways that will make possible the use of statistical tests and

computational analysis. During my time at Twinomics, every researcher I spent time with was, at one

point or another, involved in cleaning and formatting data. This was the case of Olivia, a postdoc in

the epigenetics team, who was taking part in an epigenomics research consortium bringing together

six different research teams. One afternoon, I observed Olivia as she worked at her desk. She

alternated between enthusiastically typing on her keyboard and nervously staring at her screen. A

number of programmes were running on her computer including ‘R’, the software used to “play with

the data” and “run analyses” and ‘Github’, the online platform used by the consortium to share the

analysis plan. As she explained, she was referring to the consortium’s analysis plan to process and

clean her data in a standardised fashion, which had been agreed upon by members of the consortium.

More specifically, Olivia was copying scripts from the analysis plan onto ‘R’, adding a few lines of

code to make the scripts specific to her data, and then making the programme run. Some seconds

later, a plot appeared on her screen, which, she explained, showed the distribution of the data. She


specifically put her finger on two points on the plot that stood out and explained that these were

“inconsistencies” in her data that needed to be “optimised”. Referring once more to the consortium’s

analysis plan, Olivia then typed in a few additional lines of code. After she let the programme run, a

new plot came up on the screen, and, this time, the two inconsistencies were gone. According to

Olivia, in order to best format and optimise the data for this analysis, she had to delete these data

entries that were inconsistent with the rest of the data. As this example suggests, cleaning and

formatting data entails detecting incomplete, inaccurate or irrelevant parts of the data and then

replacing, modifying or deleting them. Inconsistencies in the data may be caused, for example, by

errors in data entry or the use of multiple definitions of what relevant data are.
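The kind of cleaning step Olivia performed can be sketched in code. The following is a minimal illustration, not the consortium's actual script: the analysis plan's rule for spotting "inconsistencies" is not described, so the interquartile-range rule used here is an assumption, and the function names and values are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the IQR fences.

    The fences are q1 - k * IQR and q3 + k * IQR (an assumed rule of thumb).
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]


def drop_outliers(values, k=1.5):
    """Return the values with flagged entries removed."""
    flags = flag_outliers(values, k)
    return [v for v, is_outlier in zip(values, flags) if not is_outlier]


# Hypothetical measurements for one marker across ten samples.
measurements = [0.48, 0.50, 0.51, 0.49, 0.52, 0.50, 0.47, 0.95, 0.05, 0.51]
cleaned = drop_outliers(measurements)
print(len(measurements), "entries before cleaning,", len(cleaned), "after")
```

Even in this toy version, the threshold `k` is a choice made by the analyst, which is why consortium members need to agree on such rules in advance for their datasets to be comparable.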

These different forms of labour allow standardised data to be aggregated in the consortium, regardless

of their origin. The definitional and standardisation work is particularly important in environmental

epigenetics, where the environmental factors studied are not clear cut and can be defined in a number

of ways. For research teams in environmental epigenetics to come together and study the impact of

the environment onto the epigenome, they first need to agree on what the environment is and how to

measure it in order to produce results of significance.

The analysis plans used by consortia to format, clean and analyse datasets function like

‘standardised packages’ (Fujimura 1978, 1996) that can be used by non-specialists in the laboratory.

For example, in the case of David’s project, it does not require scientists to be familiar with the

biological mechanisms of DNA methylation in reproductive health. However, it requires

computational expertise and skills working with large datasets to be able to manipulate data, run

statistical tests and detect anomalies in the data. In addition, what is required of scientists is

knowledge of their data and its specificities. Researchers in each cohort develop in-depth knowledge

of their data, as they help create it by cleaning and organising it, and work with it to produce

knowledge. As I demonstrated elsewhere (Pinel et al., In Press), researchers form caring relationships

with their data, and this personal involvement in the data is a way of knowing the data and how to

work with it in order to produce valuable knowledge. Researchers “know” their datasets, that is, they

have contextual and practical knowledge about what a dataset seeks to represent, its strengths and

weaknesses. When taking part in a research consortium and putting to use their data, researchers in

each team thus mobilise this in-depth knowledge of their data to shape it.

Through this labour of standardising the masses of data accumulated across the different research

teams, the raw data are turned into valuable research assets that can be mobilised by the groups to

produce scientific results and valuable knowledge. As Juliette suggests, it is thanks to this work

preparing the data that the results obtained later on during the analysis can be “trusted” as a true

reflection of a biological phenomenon, otherwise “[you] might make a wrong interpretation”.


These insights indicate that the creation of valuable knowledge within a research consortium does not

simply originate from an abundance of data, but this abundance is enabled by specific organisational

structures and shaped by different forms of labour. Researchers first need to connect with others and

agree to put together their resources and data for a specific project. The masses of data aggregated

then need to be shaped and processed by each team following detailed protocols. Researchers put

together analysis plans and mobilise their in-depth knowledge of the data to format and process it. It is

through these different forms of labour that researchers in a consortium transform the data

accumulated into valuable research assets that can be used to produce knowledge and create epistemic

value. While a research consortium enables an abundance of data, this abundance is shaped by

researchers mobilising their skills, expertise and know-how to produce value.

Organising scarcity

Once data are processed and formatted in a standardised fashion across the participating cohorts,

analysis takes place. A specific organisational arrangement oversees how data are analysed, while it

allows collaborators to govern and manage value in the consortium by tightly controlling the use of

their assets. David explains:

So after we wrote the analysis plan and sent it to the other cohorts, we waited for them to

collect the samples, do the analysis, and then upload these results. And now I am in the stage

of the meta-analysis.

In this particular arrangement, the participating cohorts use the analysis plan provided by David and

his partner to analyse their data. This means that the labs participating in the research consortium keep

their datasets separate and conduct the analysis on their own, using their intimate knowledge of their

data to apply the ‘analysis scripts’ and produce results, which they then send to David and his partner.

What teams share across the consortium is therefore not data per se, but the results they obtain from

analysing their data. In an interview, I asked David about this particular arrangement, whereby each

team conducts the analysis on their own dataset, instead of bringing those datasets together to then

conduct the analysis in one go:

Every cohort knows their datasets better. So if they have batch effects or things that they

already know can be tricky in their datasets, they can work it out. But yeah of course, it would

be probably better to pull all the samples together and get a result out of that. But it's also

more efficient if you divide the work in different groups, and probably save some time.


This collaborative arrangement serves several functions. First, as David observes, keeping datasets

separate and analysis divided among collaborators is a way to foster each cohort’s in-depth

knowledge of their data. The research consortium recognises that researchers in each team are

personally involved with their data – that is, they know what it can and can’t do, how it “behaves” or

how it prefers to be “treated” – and seeks to foster that knowledge towards the creation of value.

Second, this way of working is also deemed efficient and motivated by a desire to go faster

(Vermeulen et al., 2013), as it divides the work of analysing data among the different teams involved.

Finally, this way of working enables collaborators to tightly control their assets, and in particular, to

organise limits and exclusion on the use of data as a resource. Teams own valuable data, which they

put to productive use by taking part in the collaborative research effort. However, they enclose their

data within their laboratories’ walls as they each conduct the analysis on their own datasets. The

research consortium enables teams to exclude others from accessing their data, thus constructing their

data as scarce. It is from this constructed scarcity that the teams derive value from their data (Birch,

2020). By enforcing exclusion rights over their data, teams construct their data as monopoly assets

only they have access to, which then allows them to capture monopoly rents.

I use here the term monopoly to denote a specific form of monopoly. There are two main forms of

monopoly assets and monopoly rents. First, a position of monopoly can be derived from the unique

quality of an asset. When an actor owns a unique asset, they have exclusive control over it on the

market, and this comes with privileges, such as the opportunity to influence prices and extract

monopoly rents from anyone wishing to use their unique asset. For example, in the data-intensive

research community, if a research team is the only one in possession of a rare dataset or sequencing

technology, that team can be said to be in a position of monopoly. Second, a position of monopoly can

be derived from constructing an asset as scarce and organising limits and exclusion rights over its use.

In this paper, the DNA methylation data shared in consortia conform to this second type of

monopoly asset. Teams coming together in a consortium have similar data, which means they do not

gain monopoly rents from the unique quality of their asset. Instead, the teams gain monopoly rents by

enclosing their data within their laboratories’ walls and restricting access to it.

Monopoly rents are captured once the meta-analysis is conducted and valuable knowledge produced.

In the example above, the participating cohorts sent results originating from their datasets to David

and his partner leading the collaboration. They combined these individual results into a large study,

thus increasing the sample size and improving the estimates of the size of the effect. David and his

partner were able to gather over 4,000 samples and reach a statistically significant result, which they

wrote up into a manuscript for publication. As such, the consortium enabled the production of

scientific results and the creation of value. The peer-reviewed paper in this configuration functioned

as a token of credibility for the teams involved, while it came to recognise the epistemic value


fostered in the consortium. It is through authorship on the publication that the value created from the

meta-analysis was distributed to the participating cohorts: David’s and his partner’s teams were

granted first and senior authorship for their work initiating the project, putting together the analysis

plan, managing the collaboration and conducting the meta-analysis, while the participating cohorts

gained middle authorship. By providing their data to the consortium, the participating cohorts thus

receive a revenue, in terms of authorship on publications. It represents a form of rent in that it is an

income derived from the ownership of valuable and scarce assets.
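The arrangement described above, in which cohorts share results rather than data, can be illustrated with a small meta-analysis of summary statistics. This is a sketch under stated assumptions: the method used by David and his partner is not specified, so fixed-effect inverse-variance weighting is assumed here as one common choice, and the cohort names and numbers are invented.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class CohortResult:
    """Summary statistics a cohort sends to the lead analysts (no raw data)."""
    name: str
    beta: float   # estimated effect at one epigenetic marker
    se: float     # standard error of that estimate


def fixed_effect_meta(results):
    """Combine per-cohort estimates using inverse-variance weights."""
    weights = [1.0 / r.se ** 2 for r in results]
    pooled_beta = sum(w * r.beta for w, r in zip(weights, results)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided p-value
    return pooled_beta, pooled_se, p_value


# Hypothetical results uploaded by three participating cohorts.
uploads = [
    CohortResult("cohort_a", beta=0.020, se=0.010),
    CohortResult("cohort_b", beta=0.015, se=0.008),
    CohortResult("cohort_c", beta=0.025, se=0.012),
]
beta, se, p = fixed_effect_meta(uploads)
print(f"pooled beta={beta:.4f}, se={se:.4f}, p={p:.2e}")
```

The pooled standard error is smaller than that of any single cohort, which is how the combined study gains precision while each dataset stays within its own laboratory.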

This collaborative arrangement within research consortia resembles a rentier regime of accumulation:

research teams participating in the consortium own valuable assets – their large-scale datasets which

they formatted and processed – and construct them as monopoly assets by restricting their use; they

put these assets to productive use in the consortium for the conduct of research and the creation of

valuable research; finally they extract monopoly rents from their assets through authorship on

publications. Rentiership in research consortia thus rests on the active and ongoing organisation and

management of value through the construction of scarcity. This observation complements

existing work on assetization and rentiership in the bioeconomy. STS scholars have mostly applied

the concept of rent to examine specific resources and assets in technoscience such as intellectual

property, with for example discussions about the use of IPRs (Birch, 2017, Birch, 2020, Birch and

Tyfield, 2013). Here, I broaden the conceptual applicability of rentiership to resources like

epigenomic data. I show that a similar regime of accumulation is in place in academic science and

epigenetics research, with research consortia enabling assetization and monopolisation practices.

The particular collaborative arrangements in place within research consortia shed light on an

interesting tension. To enhance the epistemic value of their research, laboratories require an

abundance of data. But in contemporary technoscience, laboratories turn their data into scarce and

monopoly assets by organising limits and exclusion rights on the use of their data. Data are therefore

accumulated, at the same time that they are made into scarce assets through practices of

monopolisation. Data are not actually shared between collaborators, but they are autonomously put to

productive use with the aim of maximising the value that can be extracted from them.

Discussion

Over the years, epigenetics has received high public and scientific attention and some have even gone

so far as to argue that “epigenetics is now the hottest thing in bioscience” (Jirtle, 2012). Social

scientists and humanities scholars, including scholars in the STS field, have taken an interest in

epigenetics. For them, epigenetics is interesting because it represents a new ‘style of reasoning’

(Hacking, 2002) according to which the body, health and illness are more open to the social, thus


symbolising a move away from gene-centric approaches (Lock, 2013, Mukherjee, 2016). However,

few studies have yet empirically examined epigenetics research in practice (Lloyd and Müller, 2018).

This means that little is known about how epigenetics knowledge is formed, negotiated and

interpreted in specific research contexts. This paper, together with other papers in this special issue,

helps to close this gap by providing an empirical and critical account of epigenetics research through

an analysis of scientists’ collaboration practices in the field.

I took a close look at the ways teams produce epigenetics knowledge within research consortia and

showed that what is at stake in these collaborative arrangements is the aggregation of masses of data.

Epigenetics research teams, no longer able to foster scientific results and value from their data alone,

turn to research consortia. These scientific infrastructures loosely connect research teams, who can put

together their datasets to investigate associations between DNA methylation markers and specific

traits or environmental factors. Enabling an abundance of data within the field of epigenetics, research

consortia are key scientific infrastructures that facilitate the production of epistemic value. For data to

become a valuable research resource, it needs to be worked on in a standardised fashion by staff

across the different teams participating in the collaborative endeavour. The masses of data are shaped

by researchers mobilising their computational expertise and in-depth knowledge of their data to

format, process and clean the data before analysis can take place. Data are also turned into scarce

assets, with teams enclosing their respective data within their laboratories’ walls and enacting limits

and exclusions on the use of their resource. Participating cohorts thus turn their data into monopolised

assets only they have control over and can decide to put them to productive use against a revenue, in

the form of authorship on publications. This is a form of rent in that it is derived from the ownership

of valuable and scarce assets. The creation of value within research consortia is therefore carefully

organised, managed and governed, and rests on a set of practices, human labour and knowledges that

render data into valuable and scarce assets.

While discussing epigenetics research practices in research consortia, I underlined that scientists in

this field focus their studies on specific research questions, in particular exploring associations

between DNA methylation and a number of environmental factors such as smoking, diet or alcohol.

They do so because these specific articulations of the notion of environment can be defined and

studied well thanks to the availability of vast amounts of data throughout the epigenetics research

community. As such, the sort of research undertaken in epigenetics and the ways the notion of

environment is defined in this field are influenced by the availability of data and what is likely to

foster epistemic value. These insights resonate with ongoing debates about the social production of

ignorance, with scholars interrogating the social context shaping what we know and don’t (Kleinman

and Suryanarayanan, 2013).


These findings provide a ‘de-romanticised’ picture of epigenetics research. This picture stands in stark contrast

with some of the social science and humanities accounts of epigenetics that tell us about the

revolutionary potential of this research field for the opportunity it represents to examine the body in

its environmental, historical and sociocultural context (Lock, 2015, Landecker and Panofsky, 2013).

By exploring epigenetics research in the making as part of research consortia, I point to what it takes

to produce valuable knowledge in this field of research in terms of practices and knowledges. I show

that the questions being asked in epigenetics are shaped by knowledge infrastructures like research

consortia, and unpack how exactly these knowledge infrastructures enable the production of valuable

knowledge.

A question that emerges is what is particular about epigenetics and the collaboration and valuation

practices discussed in this paper? One could argue that there is nothing specific about epigenetics.

Data accumulation is a common thread of contemporary postgenomics research (Richardson and

Stevens, 2015), with scientists looking to study genetics in their wholeness, while this is made

possible by the use of new sequencing technologies enabling the production of large-scale genomic or

epigenomic maps (Ankeny and Leonelli, 2015). In any field, data need to be processed and cleaned in

order to be made valuable. Scholars in critical data studies (e.g. Leonelli, 2016, Neff et al., 2017,

Ribes and Jackson, 2013, Gitelman, 2013) have contributed to this debate by pointing to the series of

practices, expertise and tacit knowledge required to curate data. In addition, the production of value is

a social process that involves a network of actors and objects. While, in epigenomics research

consortia, value is linked to specific assets (e.g. epigenomic data; computational expertise), similar

processes of value production could be observed in neighbouring fields such as gene expression or

genomics.

In some respects, however, epigenetics lends itself well to the practices described in this paper,

for a number of reasons. First, epigenetics has received high public policy and media attention, as

well as high levels of public and private funding. This means that an important number of research

teams have invested in epigenetics, acquiring and developing the necessary resources (e.g. datasets;

expertise; skills) to produce scientific results in this field. These are the research teams that, having

something to share, turn to consortia for the production of valuable research. Second, research teams’

move to epigenetics was facilitated by the fact that epigenetics is a flexible concept with fluid

boundaries (Meloni and Testa, 2014, Pickersgill, 2016, Pinel et al., 2018). This means that a range of

research groups with a diversity of disciplinary backgrounds and expertise can attach themselves to

epigenetics research and participate in collaborative endeavours in consortia. Third, in environmental

epigenetics, what is defined as ‘the environment’ varies greatly from one research team to another,

and many of the environmental factors studied, such as diet or menopause, are difficult to define and

measure, and as such, they can prove difficult to study. Research consortia provide ways for research

teams to bypass this problem by, first, aggregating masses of data that can balance the imprecisions in

the ways the environment is defined, second, by adopting consistent definitions across the research

teams involved of what is understood as the environment and, third, by standardising the processing

and cleaning of data.

This paper not only contributes empirical insights into the practice of epigenetics research, it also

offers conceptual tools to examine and problematise large-scale research collaborations. By bringing

into dialogue the body of literature on research collaborations and data repositories together with STS

work on value production, assetization and rentiership, I unpacked the assumed relationship, present

in both scientists’ discourses and science studies (Dietz and Bozeman, 2005, Lee and Bozeman, 2005,

Subramanyam, 1983), between research collaborations and the production of valuable research.

Specifically, I took a closer look at the normative assumption within ‘big biology’ whereby expanding

the scale of biological enquiry through, for example, the aggregation of more research materials, leads

to better research. Conceptualising data as assets, I examined how these are mobilised within research

consortia towards the production of research results and the creation of epistemic value, pointing in

particular to a rentier regime of accumulation. This analytical frame was instrumental in uncovering

monopolisation practices at play in research consortia, and led to the observation that in order to

create value from the abundance of data within these scientific infrastructures, teams construct their

assets as scarce. While contemporary big biology is marked by the generalised imperative to share

data to create abundance (Lezaun, 2013, Lezaun and Montgomery, 2015), collaborative endeavours

within research consortia are in fact built around forms of exclusions: exclusion of those outside the

collaboration who do not own valuable properties that can be shared and exclusion of those inside the

collaboration from using the ‘shared’ resources.

This paper also contributes to discussions on valuation, providing both empirical and analytical

grounding to the question of what gets valued and how in the knowledge economy. Demonstrating

that knowledge production is entangled with valuation processes, I unpack what it takes for

epigenetics researchers to produce knowledge that can be deemed valuable. Value production in

academic research laboratories takes place through a series of assets, which are mobilised as

resources towards the production of knowledge, while they also function as private properties in their

own right, used by laboratories to extract a revenue, through a rentier regime of accumulation. Such a

regime of accumulation comes together with tight processes of control, whereby research laboratories,

concerned with maximising the revenue that can be extracted from their assets, enact limits and

exclusions on the use of their resources. Finally, this paper offers analytical tools to understand the

origins of value in the knowledge economy. I suggest thinking of the laboratory as a productive

system, which entails, first, unpacking the set of resources making up the laboratory; second,

analysing the set of practices through which resources are turned into valuable assets; and third,

paying attention to the ways in which these come together in the productive system that is the

laboratory towards the production of value.

The collaborative arrangements discussed in this paper, based on principles of scarcity and private

property, contrast with the contemporary Open Science movement (Levin and Leonelli, 2017), which

encourages researchers to disclose a variety of outputs from their work, ranging from datasets, to

biological materials and publications (European Commission, 2015, The Royal Society, 2012,

Research Councils UK, 2013), on the basis that openness will enhance the transparency of research

and promote the reusability of research outputs within and beyond the research community. Within

the epigenetics research community, research consortia demonstrate a selective approach to openness

by organising the scarcity and monopolisation of data on the one hand, and the sharing of research

results, on the other hand. It is through these selective processes of openness and exclusion that data

are rendered highly valuable resources, and the creation of valuable research enabled.

Acknowledgements

I would first like to thank the staff in the laboratory who shared their time, space and thoughts with

me. I am indebted to Christopher McKevitt and Barbara Prainsack, for their guidance and insightful

comments on this work. I also thank the editors of this special issue for their input on the manuscript.

Finally, my thanks go to David Wyatt for stimulating discussions and providing valuable comments

on earlier versions of this article.

Funding

This work was supported by the Wellcome Trust [grant number WT108574MA].

References

Ankeny, R. & Leonelli, S. 2015. Valuing data in postgenomic biology. In: Richardson, S. & Stevens, H. (eds.) Postgenomics: Perspectives on Biology after the Genome. Durham, NC: Duke University Press.

Anonymous. 2017. The world’s most valuable resource is no longer oil, but data. The Economist (6 May) [Online]. Available: https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource [Accessed 26 March 2019].

Armstrong, L. 2014. Epigenetics, London, Garland Science.

Attride-Stirling, J. 2001. Thematic networks: an analytic tool for qualitative research. Qualitative Research, 1, 385-405.

Baker, K. & Millerand, F. 2010. Infrastructuring ecology: challenges in achieving data sharing. In: Parker, J., Vermeulen, N. & Penders, B. (eds.) Collaboration in the New Life Sciences. Farnham: Ashgate.

Birch, K. 2017. Rethinking Value in the Bio-economy: Finance, Assetization, and the Management of Value. Science, Technology, & Human Values, 42, 460-490.

Birch, K. 2020. Technoscience Rent: Toward a Theory of Rentiership for Technoscientific Capitalism. Science, Technology & Human Values, 45, 3-33.

Birch, K. & Tyfield, D. 2013. Theorizing the Bioeconomy: Biovalue, Biocapital, Bioeconomics or . . . What? Science, Technology, & Human Values, 38, 299-327.

Borgman, C. L., Wallis, J. & Enyedy, N. 2008. Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries. International Journal of Digital Libraries, 17, 17-30.

Bugge, M., Hansen, T. & Klitkou, A. 2016. What Is the Bioeconomy? A Review of the Literature. Sustainability, 8, 691-713.

Bush, W. & Moore, J. 2012. Chapter 11: Genome-wide association studies. PLoS Computational Biology, 8.

Calvert, J. 2013. Systems biology, big science and grand challenges. BioSocieties, 8, 466-479.

Chiappetta, M. & Birch, K. 2018. Limits to biocapital. In: Gibbon, S., Prainsack, B., Hilgartner, S. & Lamoreaux, J. (eds.) Handbook of Genomics, Health and Society. New York: Routledge.

Davies, G., Frow, E. & Leonelli, S. 2013. Bigger, faster, better? Rhetorics and practices of large-scale research in contemporary bioscience. BioSocieties, 8, 386-39.

Decker, K. 2018. Data struggles: The life and times of a database in Historical Climatology. Social Science Information, 57, 6-30.

Dietz, J. S. & Bozeman, B. 2005. Academic careers, patents, and productivity: industry experience as scientific and technical human capital. Research Policy, 34, 349-367.

Edwards, P. N., Mayernik, M. S., Batcheller, A. L., Bowker, G. C. & Borgman, C. L. 2011. Science friction: Data, metadata, and collaboration. Social Studies of Science, 41, 407–414.

Etzkowitz, H. 2011. Normative change in science and the birth of the Triple Helix. Social Science Information, 50, 549-568.

European Commission 2012. Enhancing and focusing EU international cooperation in research and innovation. In: Communication from the Commission to the European Parliament, T. C., The European Economic and Social Committee and the Committee of Regions. Tech. Rep. Com(2012) 497 Final. (ed.). Brussels: European Commission.

European Commission 2015. Validation of the results of the public consultation on Science 2.0: Science in Transition. In: Innovation, R. A. (ed.). Brussels: European Commission.

Fochler, M. 2016. Variants of Epistemic Capitalism: Knowledge Production and the Accumulation of Worth in Commercial Biotechnology and the Academic Life Sciences. Science, Technology, & Human Values, 41, 922-948.

General Secretariat of the Council 2016. Council Conclusions on the Transition towards an Open Science System. Brussels, Belgium: Council of the European Union.

Gitelman, L. (ed.) 2013. Raw data is an oxymoron, Cambridge, MA: MIT Press.

Hackett, E. J. 2005a. Essential Tensions: Identity, Control, and Risk in Research. Social Studies of Science, 35, 787-826.

Hackett, E. J. 2005b. Introduction to the Special Guest-Edited Issue on Scientific Collaboration. Social Studies of Science, 35, 667-671.

Hacking, I. 2002. Historical Ontology, Cambridge, MA, Harvard University Press.

Haig, D. 2012. Commentary: The epidemiology of epigenetics. International Journal of Epidemiology, 41, 13-16.

Hilgartner, S. 2013. Constituting large-scale biology: Building a regime of governance in the early years of the Human Genome Project. BioSocieties, 8, 397-416.

Hilgartner, S. 2017. Reordering Life. Knowledge and Control in the Genomics Revolution, Cambridge, MA: MIT Press.

Jirtle, R. L. 2012. Epigenetics: How genes and environment interact. NIH Director’s Wednesday Afternoon Lecture Series [Online], 18 April 2012. Available: http://videocast.nih.gov/launch.asp?17223 [Accessed 26 March 2019].

Key, T., Verkasalo, P. & Banks, E. 2001. Epidemiology of breast cancer. The Lancet Oncology, 2, 133-140.

Kleinman, D. L. & Suryanarayanan, S. 2013. Dying Bees and the Social Production of Ignorance. Science, Technology, & Human Values, 38, 492-517.

Landecker, H. & Panofsky, A. 2013. From Social Structure to Gene Regulation, and Back: A Critical Introduction to Environmental Epigenetics for Sociology. Annual Review of Sociology, 39, 333-357.

Lee, S. & Bozeman, B. 2005. The Impact of Research Collaboration on Scientific Productivity. Social Studies of Science, 35, 673-702.

Leonelli, S. 2013. Global data for local science: Assessing the scale of data infrastructures in biological and biomedical research. BioSocieties, 8, 449-465.

Leonelli, S. 2016. Data-Centric Biology: A Philosophical Study, Chicago, University of Chicago Press.

Levin, N. & Leonelli, S. 2017. How Does One “Open” Science? Questions of Value in Biological Research. Science, Technology, & Human Values, 42, 280–305.

Lezaun, J. 2013. The escalating politics of ‘Big Biology’. BioSocieties, 8, 480-485.

Lezaun, J. & Montgomery, C. 2015. The Pharmaceutical Commons: Sharing and Exclusion in Global Health Drug Development. Science, Technology, & Human Values, 40, 3-29.

Lloyd, S. & Müller, R. 2018. Situating the biosocial: Empirical engagements with environmental epigenetics from the lab to the clinic. BioSocieties, 13, 675-680.

Lock, M. 2013. The Epigenome and Nature/Nurture Reunification: A Challenge for Anthropology. Medical Anthropology, 32, 291-308.

Lock, M. 2015. Comprehending the body in the era of the epigenome. Current Anthropology, 56, 151-177.

Martin, P. 2015. Commercialising neurofutures: Promissory economies, value creation and the making of a new industry. BioSocieties, 10, 422-443.

Meloni, M. & Testa, G. 2014. Scrutinizing the epigenetics revolution. BioSocieties, 9, 431-456.

Mirowski, P. 2011. Science-Mart: Privatizing American Science, Cambridge, MA, Harvard University Press.

Mukherjee, S. 2016. Same but different: How epigenetics can blur the line between nature and nurture. The New Yorker [Online], 2 May 2016. Available: https://www.newyorker.com/magazine/2016/05/02/breakthroughs-in-epigenetics [Accessed 26 March 2019].

Neff, G., Tanweer, A., Fiore-Gartland, B. & Osburn, L. 2017. Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science. Big data, 5, 85-97.

Organization for Economic Cooperation and Development 2009. OECD Guidelines on Human Biobanks and Genetic Research Databases.

Pavone, V. & Goven, J. 2017. Introduction. In: Pavone, V. & Goven, J. (eds.) Bioeconomies. Life, Technologies, and Capital in the 21st Century. Cham, Switzerland: Palgrave Macmillan.

Pickersgill, M. 2016. Epistemic modesty, ostentatiousness and the uncertainties of epigenetics: on the knowledge machinery of (social) science. The Sociological Review Monographs, 64, 186-202.

Pinel, C. 2018. Hilgartner, S. Reordering Life: Knowledge and Control in the Genomics Revolution. Cambridge, MA: MIT Press. 2017. 368 pp. £27.95 (hbk) $24 (ebk) ISBN 9780262035866. Sociology of Health & Illness, 40, 926-928.

Pinel, C. 2019. Enterprising environments: Knowledge production in epigenetics in two British laboratories. PhD thesis, King's College London.

Pinel, C., Prainsack, B. & McKevitt, C. 2018. Markers as mediators: A review and synthesis of epigenetics literature. BioSocieties, 13, 276-303.

Pinel, C., Prainsack, B. & McKevitt, C. In Press. Caring for data: Value creation in a data-intensive research laboratory. Social Studies of Science.

Price, J. D. 1963. Little Science, Big Science, New York, Columbia University Press.

Puschmann, C. & Burgess, J. 2014. Big data, big questions. Metaphors of big data. International Journal of Communication, 8.

Rajan, K. S. 2006. Biocapital: The constitution of postgenomic life, Durham, Duke University Press.

Research Councils UK 2013. RCUK Policy on Open Access and Guidance.

Ribes, D. & Jackson, S. 2013. Data Bite Man: The work of sustaining long-term data collection. In: Gitelman, L. (ed.) ‘Raw Data’ is an oxymoron. Cambridge, MA: MIT Press.

Richardson, S. & Stevens, H. (eds.) 2015. Postgenomics: Perspectives on Biology after the Genome, Durham: Duke University Press.

Sadowski, J. 2019. When data is capital: Datafication, accumulation, and extraction. Big Data & Society.

Slaughter, S. & Leslie, L. 1997. Academic Capitalism, Baltimore/London, The Johns Hopkins University Press.

Subramanyam, K. 1983. Bibliometric studies of research collaboration: A review. Journal of Information Science, 6, 33-38.

The Royal Society 2011. Knowledge, Networks and Nations: Global scientific collaboration in the 21st century. RS Policy document 03/11. London, UK: The Royal Society.

The Royal Society 2012. Science as an Open Enterprise. London: The Royal Society Science Policy Centre report 02/12.

Vermeulen, N. 2009. Supersizing Science: On Building Large-Scale Research Projects in Biology. PhD Thesis, Maastricht University.

Vermeulen, N., Parker, J. N. & Penders, B. 2013. Understanding life together: A brief history of collaboration in biology. Endeavour, 37, 162-171.

Vermeulen, N. & Penders, B. 2010. Collecting Collaborations: Understanding Life Together. In: Parker, J., Vermeulen, N. & Penders, B. (eds.) Collaboration in the New Life Sciences. Farnham: Ashgate.

Waldby, C. 2002. Stem Cells, Tissue Cultures and the Production of Biovalue. Health, 6, 305-23.

Zimmerman, A. & Nardi, B. 2010. Two Approaches to Big Science: An Analysis of LTER and NEON. In: Parker, J., Vermeulen, N. & Penders, B. (eds.) Collaboration in the New Life Sciences. Farnham: Ashgate.
