Science Beyond the Facts: A Pragmatic Structure for Research Articles
Anita de Waard
Elsevier Labs, Disruptive Technologies; Utrecht University
Introduction
- There was too much scientific information
(43,848 Papers on p53)
- And it was all written in stories....
[demo Papers]
Once Upon a Time....
Research Goal
- Find a structure for research articles that allows computer-aided access to knowledge elements
- Start with Research Articles in Cell Biology
- Expand to other genres/domains?
- How do we extract this structure?
- How do we use this structure?
Pragmatic
1. Colloquial: practical, vs. theoretical
2. Linguistic: ‘meaning of linguistic messages in their context of use’ (per/il/locutionary goals)
3. Pragmatic web: ‘quality of goal-oriented discourse in communities’
English 306A; Harris 4
Functions
Ideational function: What does “The cat is on the mat” mean as an expression in the system of English?
How? Denotation, truth conditions, event schemata, semantic roles, …
Interpersonal function: What does “The cat is on the mat” mean to hearer X, when said by speaker Y, in context Z?
How? Speech acts, conversational maxims, face principles, deixis, …
Meaning
Semantics           Pragmatics
Propositions        Utterances
Truth/falsity       Appropriateness
Context-free        Context-dependent
Language-in-vitro   Language-in-vivo
Method
Genre + Discourse Studies
- Science is written in text, as a story
- Text is created by humans to persuade other humans (peers, that claims are facts)
- To tell the computer how we encode our knowledge, we need to understand:
=> How do humans tell stories?
=> How do stories make sense?
Work on corpus
- Corpus of 14 coherent (citing, cited) articles in Cell Biology, based around (Voorhoeve, 2006)
- Hand-modeled ASCII text; created XML
- Manual (by me + small user validation)
(Preliminary) Results
1st Attempt: Classical rhetoric
Each entry gives Aristotle's term / English name / Quintilian's term ("-" where none is given), a description, and the corresponding section in Cell and in the APA Style Guide.

prooimion / Introduction / exordium
  The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience.
  Cell: Introduction. APA Style Guide: Introduction.

prothesis / Statement of Facts / narratio
  The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge.
  Cell: Introduction. APA Style Guide: Introduction.

- / Summary / propositio
  Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
  Cell: Abstract. APA Style Guide: Abstract.

- / Division/outline / partitio
  Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what's been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory.
  Cell: Table of Contents. APA Style Guide: Article Outline.

pistis / Proof / confirmatio
  Following the division / outline or partitio comes the main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
  Cell: Results. APA Style Guide: Methods, Results.

- / Refutation / refutatio
  Following the confirmatio or section on proof in a classical oration, comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
  Cell: Discussion. APA Style Guide: Discussion.

epilogos / - / peroratio
  Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up (see the figures of summary, below).
  Cell: Discussion. APA Style Guide: Discussion.
2nd Attempt: Story Grammar
Story: The Story of Goldilocks and the Three Bears
Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
Each entry gives the story grammar element / paper element ("-" where none is given), followed by the matching story text and paper text.

Time (Setting) / Background
  Story: Once upon a time
  Paper: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
Characters / Objects of study
  Story: a little girl named Goldilocks
  Paper: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
Location / Experimental setup
  Story: She went for a walk in the forest. Pretty soon, she came upon a house.
  Paper: studied and compared in vivo effects and interactions to those of the human protein
Goal (Theme) / Research goal
  Story: She knocked and, when no one answered,
  Paper: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
Attempt / Hypothesis
  Story: she walked right in.
  Paper: Atx-1 may play a role in the regulation of gene expression
Name (Episode 1) / Name
  Story: At the table in the kitchen, there were three bowls of porridge.
  Paper: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Subgoal / Subgoal
  Story: Goldilocks was hungry.
  Paper: test the function of the AXH domain
Attempt / Method
  Story: She tasted the porridge from the first bowl.
  Paper: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
Outcome / Results
  Story: This porridge is too hot! she exclaimed.
  Paper: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
- / Data
  Story: So, she tasted the porridge from the second bowl.
  Paper: (data not shown),
Outcome / Results
  Story: This porridge is too cold, she said
  Paper: both genotypes show many large holes and loss of cell integrity at 28 days
- / Data
  Story: So, she tasted the last bowl of porridge.
  Paper: (Figures 1B-1D).
Outcome / Results
  Story: Ahhh, this porridge is just right, she said happily and
  Paper: Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
- / Data
  Story: she ate it all up.
  Paper: (Figure 1F),
3rd Attempt: Discourse Segments
- “A text is made up of Discourse Segments and the relations between them” - Grosz and Sidner, Mann and Thompson, Marcu, Swales
- Discourse Segment Purpose: element that has a consistent rhetorical/pragmatic goal.
- Define for Biological Research Article
Discourse Segments In Biology
<Goal>To examine miRNA expression from the miR-Vec system, </Goal>
<Method> a miR-24 minigene-containing virus was transduced into human cells. Expression was determined using an RNase protection assay (RPA) with a probe designed to identify both precursor and mature miR-24 (Figure 1B). </Method>
<Result>Figure 1C shows that cells transduced with miR-Vec-24 clearly express high levels of mature miR-24, whereas little expression was detected in control-transduced cells. </Result>
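A minimal sketch of how segment markup like the example above could be read back into (type, text) pairs. The tag names are the segment types from the slides; the regex-based reader, the function name `read_segments`, and the abridged sample string are illustrative assumptions, not the tooling used for the corpus.

```python
import re

# Segment types as named on the slides. The reader below is a hypothetical
# helper for illustration, not the author's implementation.
SEGMENT_TYPES = ("Fact", "Problem", "Goal", "Method", "Result",
                 "Implication", "Hypothesis")
SEGMENT_RE = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(SEGMENT_TYPES), re.DOTALL)


def read_segments(marked_up):
    """Return (segment_type, text) pairs in document order."""
    return [(m.group(1), " ".join(m.group(2).split()))
            for m in SEGMENT_RE.finditer(marked_up)]


# Abridged version of the miR-Vec example from the slide.
example = (
    "<Goal>To examine miRNA expression from the miR-Vec system, </Goal>"
    "<Method> a miR-24 minigene-containing virus was transduced into "
    "human cells. </Method>"
    "<Result>Figure 1C shows that cells transduced with miR-Vec-24 clearly "
    "express high levels of mature miR-24. </Result>"
)

segments = read_segments(example)
for seg_type, text in segments:
    print(f"{seg_type:8s} {text}")
```

A flat regex is enough here because the example segments are not nested; nested or attributed markup would call for a real XML parser.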
Segments vs. Sections
              Introduction  Method  Results  Discussion  Total
Fact                    63       0      104          37    204
Problem                 20       0       10          15     45
Goal                     2       0       72           6     80
Method                   2     all      129           6    137
Result                  10       0      230          44    284
Implication             14       0      100          36    150
Hypothesis              10       0       33          26     69
Total                  121       0      678         170    969
Segment Tense
                 Fact         Problem      Goal         Method       Result       Implication  Hypothesis   Total
Present active   72 (46%)     27 (60%)     15 (23%)     7 (7%)       37 (16%)     69 (51%)     38 (55%)     265
Present passive  5 (3%)       2 (4%)       2 (3%)       1 (1%)       1 (0%)       11 (8%)      1 (1%)       23
Past active      18 (11%)     5 (11%)      11 (17%)     48 (47%)     122 (54%)    16 (12%)     8 (12%)      228
Past passive     25 (16%)     2 (4%)       1 (2%)       17 (17%)     21 (9%)      1 (1%)       5 (7%)       72
Future           2 (1%)       3 (7%)       0 (0%)       0 (0%)       1 (0%)       0 (0%)       0 (0%)       6
Imperfect: "to"  13 (8%)      2 (4%)       32 (50%)     2 (2%)       20 (9%)      14 (10%)     7 (10%)      90
Gerund ("ing")   22 (14%)     4 (9%)       3 (5%)       28 (27%)     23 (10%)     24 (18%)     10 (14%)     114
Total            157 (100%)   45 (100%)    64 (100%)    103 (100%)   225 (100%)   135 (100%)   69 (100%)    798
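The tense table can be read column by column to find the verb form that dominates each segment type. The sketch below does that with the counts from the table; the names `TENSES`, `COUNTS`, and `dominant_tense` are introduced here for illustration only.

```python
# Counts copied from the Segment Tense table, one list per segment type,
# in the row order of the table. All names here are illustrative.
TENSES = ["Present active", "Present passive", "Past active", "Past passive",
          "Future", 'Imperfect: "to"', 'Gerund ("ing")']

COUNTS = {
    "Fact":        [72, 5, 18, 25, 2, 13, 22],
    "Problem":     [27, 2, 5, 2, 3, 2, 4],
    "Goal":        [15, 2, 11, 1, 0, 32, 3],
    "Method":      [7, 1, 48, 17, 0, 2, 28],
    "Result":      [37, 1, 122, 21, 1, 20, 23],
    "Implication": [69, 11, 16, 1, 0, 14, 24],
    "Hypothesis":  [38, 1, 8, 5, 0, 7, 10],
}


def dominant_tense(counts):
    """Return (tense label, share of the segment type's clauses)."""
    best = max(range(len(counts)), key=counts.__getitem__)
    return TENSES[best], counts[best] / sum(counts)


for segment, counts in COUNTS.items():
    tense, share = dominant_tense(counts)
    print(f"{segment:12s} {tense:16s} {share:.0%}")
```

On these counts, Goal segments are led by the "to" form and Method and Result segments by the past active, while Fact, Problem, Implication, and Hypothesis segments are led by the present active.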
Segment order
(rows: preceding segment; columns: following segment)
             Fact  Hypothesis  Problem  Goal  Method  Result  Implication  End  Total
Start          18           3        1     8       2       2            4    0     38
Fact           83          22       13    17       9      31           12    1    188
Hypothesis     20           5        3     7       6       2            6    3     52
Problem         9           7        7     2       3       5            3    3     39
Goal            7           0        2     4      46       6            0    0     65
Method         13           2        3    10      25      54            3    0    110
Result         23           9        4     6      16      85           78    6    227
Implication    13           6        4    12      11      30           12   25    113
Total         186          54       37    61     118     215          118   38    827
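Reading the segment-order counts as a transition table (rows as the preceding segment, columns as the following one, which is an interpretation based on the Start row and End column), one can estimate how likely each segment type is to follow another. This is a rough sketch; the function names are hypothetical and the numbers are copied from the table.

```python
# Transition counts from the Segment order table. Row = preceding segment,
# column = following segment (an assumption based on Start/End placement).
NEXT = ["Fact", "Hypothesis", "Problem", "Goal", "Method", "Result",
        "Implication", "End"]

TRANSITIONS = {
    "Start":       [18, 3, 1, 8, 2, 2, 4, 0],
    "Fact":        [83, 22, 13, 17, 9, 31, 12, 1],
    "Hypothesis":  [20, 5, 3, 7, 6, 2, 6, 3],
    "Problem":     [9, 7, 7, 2, 3, 5, 3, 3],
    "Goal":        [7, 0, 2, 4, 46, 6, 0, 0],
    "Method":      [13, 2, 3, 10, 25, 54, 3, 0],
    "Result":      [23, 9, 4, 6, 16, 85, 78, 6],
    "Implication": [13, 6, 4, 12, 11, 30, 12, 25],
}


def next_probabilities(segment):
    """Relative frequency of each following segment type."""
    row = TRANSITIONS[segment]
    total = sum(row)
    return {name: count / total for name, count in zip(NEXT, row)}


def most_likely_next(segment):
    """Most frequent following type, ignoring a repeat of the same type."""
    probs = next_probabilities(segment)
    probs.pop(segment, None)
    return max(probs, key=probs.get)


def self_transitions():
    """Number of transitions where a segment type follows itself."""
    return sum(row[NEXT.index(seg)] for seg, row in TRANSITIONS.items()
               if seg in NEXT)


for seg in TRANSITIONS:
    print(f"{seg:12s} -> {most_likely_next(seg)}")
```

On these counts the most frequent non-repeating successors trace a chain Goal -> Method -> Result -> Implication, and the self-transition count matches the "Selfs" figure given in the appendix.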
Discourse: A Fact(ory)
[Diagram. Sections: introduction, results, discussion. Views: shared view, own view. Segment labels: fact (three times), problem (incongruity or ignorance), hypothesis, goal, method, result, implication. Connecting phrases: "to", "we", "resulting in", "suggests that". Realms: hypothetical realm (might, would); realm of activity (to test, to see); realm of models (present); realm of experience (past).]
Links (Under Construction)
To references:
- From/to segment type makes difference: methods link, fact link, agree/disagree link
- Not clear where to link into: is claim truly in referred document? How to locate?
To figures/tables:
- Usually main proof in results (methods) segments: need to allow multi-media elements in system!
Discourse relations:
- Many taxonomies: RST, Hovy, Sanders, ClaiMaker
- Identify textual coherence/argumentation...
Coherence Markers
Columns: Fact, Problem, Goal, Method, Results, Implication, Hypothesis. Each row lists its marker phrases from left to right.
Fact: in animals however (3x); to, we examined (2x); we fused, we utilised; in contrast, we found (5x), though, on average, under our conditions; our data suggest, we propose that, consistent with; suggesting that (2x)
Problem: we fused in this paper
Goal: we isolated we showed
Method: we found (2x), while, as seen; but suggests we predicted
Results: in addition, in contrast; we utilised, we used; interestingly (2x), since (3x), also (2x), while (2x), second (2x), third (2x), finally (2x), subsequent, thereafter, in our study; (strongly) suggests/suggesting that (8x), implicating (2x), consistent with (2x), demonstrating that (3x); we propose, suggesting that
Implication: to verify, to confirm; we replaced, we fused, we tried; however, first (2x), interestingly (2x), consistent with, in our analysis, strikingly, neither; also in theory
Hypothesis: in animals, in support of this, indeed; to test (2x) however, our results provide evidence that
Preliminary Hypotheses
1. 'To' infinitive appears as marker of Goal moves: +
2. Sequential connectives appear within same segment type: -
3. 'though', 'however', 'therefore' - causal connectives occur at all -> Problem and -> Hypothesis boundaries: 0
4. 'suggests' occurs at Results -> Implication/Hypothesis boundary: +/0
5. 'we found' / 'we observed' / 'we showed' -> Result boundary: +/0
6. 'we + other verb' occurs at -> Method boundary: 0
7. Contrast/correspondence in Fact <-> Result <-> Implication moves: +!
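To make the marker hypotheses concrete, here is a rough rule-based sketch that guesses a segment type from surface markers. It only uses the three hypotheses rated + or +/0 that name specific words ('to' infinitive, 'we found/observed/showed', 'suggests'). The rule order, the regular expressions, and the test clauses are assumptions made for illustration; this is not the segment identification method of the study, which was manual.

```python
import re

# Hypothetical marker rules derived from the hypotheses above.
# First matching rule wins; None means no marker was recognised.
RULES = [
    ("Goal", re.compile(r"^to\s+\w+", re.IGNORECASE)),
    ("Result", re.compile(r"\bwe (found|observed|showed)\b", re.IGNORECASE)),
    ("Implication", re.compile(r"\bsuggest(s|ing)?\b", re.IGNORECASE)),
]


def guess_segment_type(clause):
    """Return a guessed segment type for a clause, or None."""
    text = clause.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


# The first clause is from the miR-Vec example; the others are made up
# around marker phrases listed on the Coherence Markers slide.
clauses = [
    "To examine miRNA expression from the miR-Vec system,",
    "we found that transduced cells express mature miR-24",
    "our data suggest that the vector is functional",
    "a miR-24 minigene-containing virus was transduced into human cells.",
]
guesses = [guess_segment_type(c) for c in clauses]
for clause, guess in zip(clauses, guesses):
    print(f"{str(guess):12s} {clause}")
```

Clauses without a listed marker, such as the passive Method clause, stay unlabelled, which is in line with hypothesis 6 being rated 0.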
Discussion
Research Goals fulfilled?
- Allow computer-aided access to knowledge: yes, but:
  > need to identify if they do cover this genre
  > need to finalize a structure of relations
- Other genres/domains?
  > investigate more than cell biology
- How do we extract this structure?
  > collaborative attempts to identify segment markers/relationships - next step
- How do we use this structure?: [ DEMO ]
  > possible collaborations with sensemaking systems?
Preliminary Conclusions
- Science is created in text
- Goal of text is to convince peers that claims (backed by data) belong to fact canon
- Text convinces humans through rhetorical/narrative discourse structure
- Text creates meaning in the human mind
- Discourse parsing could allow access to knowledge structure
- More work needed: collaborations?
Questions?
[email protected]
people.cs.uu.nl/anita
Appendix
Related work
Columns: Bio-informatics, Style Guides, Shum et al., Harmsze, Swales, RST, Teufel, Collier et al.
Sections x x x
Moves x x x x!
Entities x x
Embedding x x
Discourse relations x x x
Argumentational relations x x
* Need complete model for multidocument collection – markup of content elements and relationships
* Unique role as a publisher: can apply/mandate at the source!
             Fact  Problem  Goal  Method  Result  Implication  Hypothesis  End  Total
Start          18        1     8       2       2            4           3    0     38
Fact           83       13    17       9      31           12          22    1    188
Problem         9        7     2       3       5            3           7    3     39
Goal            7        2     4      46       6            0           0    0     65
Method         13        3    10      25      54            3           2    0    110
Result         23        4     6      16      85           78           9    6    227
Implication    13        4    12      11      30           12           6   25    113
Hypothesis     20        3     7       6       2            6           5    3     52
Total         186       37    61     118     215          118          54   38    827
Selfs: 221
Model: 399
% in Model: 65.84%
Nr Section Introduction Results Discussion
A1 Agami, Results 4
A2 Agami, Discussion ½ 2 ½
A3 Agami, Introduction 3
S1 Serrano, Results 2
S2 Serrano, Discussion 1 1
S3 Serrano, Introduction 2
V1 Voorhoeve, Results 2
V2 Voorhoeve, Discussion 3
V3 Voorhoeve, Introduction 1 2
Clause Classification Test
Results
Clause assignment test (8 tests handed in, avg. 38 clauses each):
114 clauses, 51 with no disagreement
13 Fact/Result
11 Fact/Problem
10 Method/Result
7 Result/Implication
4 Goal/Method
3 Problem/Goal
2 Goal/Result
2 Problem/Interpretation
2 Fact/Interpretation
1 Problem/Result
Comments on classification:
• Incomplete sentences are unclear, hard to classify
• Add ‘Hypothesis’ category, e.g. clauses 8, 33, 74a, 77, 78b.
• Other possible categories: Assumption, Observation, “Given that...”
References
• Austin, J.L. How to Do Things with Words. J.O. Urmson, ed. Oxford: Clarendon Press, 1962.
• Bazerman, Charles. Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. Madison, Wisconsin: Univ. of Wisconsin Press, 1988.
• Bex, F.J., H. Prakken, C. Reed & D.N. Walton. Towards a formal account of reasoning about evidence: argumentation schemes and generalisations. Artificial Intelligence and Law 11 (2003), 125-165.
• Buckingham Shum, Simon J., Uren, V., et al. Modelling Naturalistic Argumentation in Research Literatures: Representation and Interaction Design Issues. Tech Report kmi-04-28, December 2004.
• Harmsze, Frédérique. A modular structure for scientific articles in an electronic environment (HTML & PDF). PhD Thesis, February 9, 2000.
• Hovy, E. Automated discourse generation using discourse structure relations. Artificial Intelligence 63(1-2), 1993, 341-386.
• Kircz, Joost G. Modularity: the next form of scientific information presentation? Journal of Documentation 54(2), March 1998, 210-235.
• Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962.
• Latour, B. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, Ma.: Harvard University Press, 1987.
• Latour, Bruno, Steve Woolgar, Jonas Salk. Laboratory Life: The Construction of Scientific Facts. Princeton University Press, 1986.