DELIVERABLE 3.3
Workflow implementation and streamlining for high-throughput image analysis of large-scale studies
Grant agreement no.: 601055 (FP7-ICT-2011-9)
Project acronym: VPH-DARE@IT
Project title: Dementia Research Enabled by IT
Funding Scheme: Collaborative Project
Project co-ordinator: Prof. Alejandro Frangi, University of Sheffield
Tel.: +44 114 22 20153
Fax: +44 114 22 27890
E-mail: [email protected]
Project web site address: http://www.vph-dare.eu
Due date of deliverable Month 24
Actual submission date Month 27
Start date of project April 1st 2013
Project duration 48 months
Work Package & Task WP 3, Task 3.2, 3.3
Lead beneficiary UCL
Editor PMO
Author(s) Nicolas Toussaint, David Cash, Wyke Huizinga
Quality reviewer Peter Metheral, Wiro Niessen
Project co-funded by the European Union within the Seventh Framework Programme
Dissemination level
PU  Public  X
PP  Restricted to other programme participants (including Commission Services)
RE  Restricted to a group specified by the consortium (including Commission Services)
CO  Confidential, only for members of the consortium (including Commission Services)
FP7-601055: VPH-DARE@IT D3.3 – Workflow implementation and streamlining for high-throughput image… 07/07/2015
Issue Record
Version no.  Date      Author(s)                      Reason for modification  Status
1.0          18/06/15  Nicolas Toussaint, David Cash  Initial release          Draft
2.0          30/06/15  Nicolas Toussaint, David Cash  Reviewers' comments      Reviewed
2.1          05/07/15  PMO                            Final check              Finalised
Copyright Notice
Copyright © 2013 VPH-DARE@IT Consortium Partners. All rights reserved. VPH-DARE@IT is an FP7 Project supported by the European Union under grant agreement no.
601055. For more information on the project, its partners, and contributors please see
http://www.vph-dare.eu. You are permitted to copy and distribute verbatim copies of this
document, containing this copyright notice, but modifying this document is not allowed. All
contents are reserved by default and may not be disclosed to third parties without the prior
written consent of the VPH-DARE@IT consortium, except as mandated by the grant agreement
with the European Commission, for reviewing and dissemination purposes. All trademarks and
other rights on third party products mentioned in this document are acknowledged and owned
by the respective holders. The information contained in this document represents the views of
VPH-DARE@IT members as of the date of its publication and should not be taken as
representing the view of the European Commission. The VPH-DARE@IT consortium does not
guarantee that any information contained herein is error-free, or up to date, nor makes
warranties, express, implied, or statutory, by publishing this document.
Author(s) for Correspondence
Nicolas Toussaint, PhD
University College London
Translational Imaging Group,
Centre for Medical Image Computing
3rd Floor, Wolfson House,
London NW1 2HE
United Kingdom
TABLE OF CONTENTS
1. INTRODUCTION
2. BIOMARKER PIPELINE DESIGN
  2.1. GENERAL PIPELINE SPECIFICATIONS
  2.2. INVENTORY OF IMAGE ANALYSIS TOOLS
  2.3. IMAGE ANALYSIS WORKFLOW BLOCKS
  2.4. INCORPORATION INTO VPH-DARE RESEARCH PLATFORM
3. PER BIOMARKER PIPELINE IMPLEMENTATION
  3.1. BIAS CORRECTION
    3.1.1. Evaluation on testing set
  3.2. WHOLE BRAIN PARCELLATION AND TISSUE SEGMENTATION
    3.2.1. Testing set
  3.3. HIPPOCAMPAL VOLUME PROFILE
    3.3.1. Evaluation on AD / control testing set
  3.4. DIFFUSION PROCESSING
    3.4.1. Evaluation on ADNI retrospective cohort
4. BIOMARKER EXTRACTION ROADMAP
5. CONCLUSIONS
6. REFERENCES
7. ANNEXES
  7.1. WORKFLOW
TABLE OF FIGURES
Figure 1: Diagram of the Nipype workflow environment (http://nipy.sourceforge.net/nipype). Distinct image analysis tools are embedded into interfaces that constitute the blocks of a workflow, seamlessly executable and fully reproducible.
Figure 2: Data and information flow in the research platform. The pipelines are stored in the platform and are at the disposition of the user. The platform facilitates the application of validated biomarker extraction workflows on large cohorts for high-throughput analysis.
Figure 3: Correlation of BSI measures when using N3 or N4 for bias correction as pre-processing, presented as Bland-Altman plots.
Figure 4: Brain parcellation workflow. A database is used to propagate template labels to the target input structural image.
Figure 5: Example of whole brain parcellation on one of the 1000 subjects of the RSS.
Figure 6: Top: parcellation showing the hippocampus in red. Bottom: distribution of the hippocampus volume as percentage of intracranial volume as a function of age.
Figure 7: Hippocampus segmentations (left) with estimated left and right long axes (middle and right) superimposed on a structural T1 image.
Figure 8: (Left) Kernel density estimation in 1D: the blue markers represent sparse data points and the black continuous line the estimated density. (Right) Kernel density estimation applied along the principal axis of the hippocampus.
Figure 9: Two examples of output profiles from a healthy subject (left) and a patient suffering from Alzheimer's disease (right). The graphs show the left (red) and right (green) hippocampal volume profiles.
Figure 10: Volume profile generation workflow. Kernel density estimation is used to estimate the continuous function from sparse voxel volumes projected along the hippocampus axis.
Figure 11: Hippocampal volume (in TIV percentage) profile on a population of 27 controls (grey) and 43 AD patients (black) between the anterior and the posterior part.
Figure 12: Workflow for diffusion-weighted imaging data. The diffusion-weighted images are linearly registered to the average B0 image. Field maps and T1 images are used to estimate susceptibility distortion. The resulting corrected images are used for tensor fitting.
Figure 13: The diffusion-processing pipeline outputs maps depicting the white matter arrangement. (Left) T1-weighted image. (Right) Corresponding FA map colour-coded with tissue orientation.
Figure 14: Quality control graphs for diffusion MRI. (Top) DWI image showing some significant signal dropouts. (Bottom) Corresponding inter-slice cross-correlation for B0 (red) and DWI (blue) images, where the problematic volume is automatically detected.
Figure 15: Inter-slice cross-correlation graphs on 216 subjects of the ADNI cohort. Thresholding allows the automatic detection of 13 outliers containing significant signal dropouts.
Figure 16: The Taverna Workbench. During the integration process, the user needs to import the newly created service (top-left panel) into the workbench (main panel) and connect the inputs and outputs of the pipeline (in green) with the DARE portal nodes (in blue). The pipeline can then be run from the menu.
1. INTRODUCTION
Over the first two years of the project, VPH-DARE has collected numerous retrospective
imaging studies in dementia into a single repository represented by the VPH-SHARE
infostructure. This provides the ability to determine whether data and results from multiple
datasets can be pooled together in order to provide a better understanding of disease processes
and what factors (genetic, lifestyle, environmental) could influence them. Some of the most
well established biomarkers, as well as some of the most promising for early disease detection
and differential diagnosis, come from imaging. While many of these databases have already been analysed and contain certain derived imaging biomarkers for some datasets, each database has employed different methodologies, software packages, and program settings to
obtain these values. Thus, it is important to extract the relevant imaging biomarkers from each
of these retrospective studies using standardised pipelines and to make them available to the
consortium for the purposes of mechanistic and phenomenological modelling done in WPs 5
and 6, as well as to provide normal and abnormal distributions to aid in diagnostic decisions as
part of the clinical platform being developed in WP8. This task represents a computationally
expensive endeavour, as there are multiple pipelines that require hours of computing time, and
tens of thousands of datasets from which the biomarkers need to be extracted. The VPH-DARE
research platform offers the opportunity to perform this extraction in a standardised and high-throughput manner.
This deliverable has close ties with deliverable 3.1, in which we laid out the basic requirements
for key biomarkers we felt would be necessary for extraction, and deliverable 7.2, where we
presented methods for integrating these biomarker pipelines into the research platform. In this
document, we first focus on the design approach for constructing these biomarker pipelines
with the consideration that they will be used within the research platform. Then we discuss the
key biomarker pipelines: how we have optimised the pipeline parameters and any adjustments
that we have made to overcome various challenges that arose during the implementation. We
then show evidence that the biomarkers are performing as expected through the use of
validation test sets for each pipeline. Finally, we present the outline of a plan to complete
extraction of imaging biomarkers from the retrospective database as represented by Milestone
33.
2. BIOMARKER PIPELINE DESIGN
The goal of this deliverable and Milestone 33 is to extract imaging biomarkers from the
retrospective databases. This objective had strong implications in terms of the selection and
subsequent implementation of the pipelines. First, the most commonly used imaging
biomarkers in dementia research are based on high-resolution structural MRI using volumetric
T1-weighted imaging. These biomarkers provide quantitative volumetric assessments of key
brain structures and the longitudinal rate of change in volume of these structures. These are the
most well established biomarkers, with evident changes just before symptom onset and a strong
correlation with clinical severity. All of the retrospective studies in VPH-DARE that contain
imaging have some form of volumetric T1-weighted imaging as part of the protocol. As a result,
we decided to primarily focus on extracting biomarkers from these images. Second, there are
numerous publicly available software packages that perform the image processing needed to obtain these biomarkers, and we wanted our design to be flexible and interoperable between these packages, in addition to tools developed in-house, so that the pipeline could take the best components from each. Finally, we wanted to provide results to end users that are clear and reproducible. We felt that this would involve providing reasonable provenance
information (software versions, computer hardware, etc.) as well as one “validated” version of
the pipeline and corresponding end result rather than multiple versions with different settings.
2.1. GENERAL PIPELINE SPECIFICATIONS
Current neuroimaging software provides a large variety of analysis tools that are widely accepted by the scientific community. A non-exhaustive list of tools commonly used in the neuroimaging community is presented below:
- FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki) is a library and software suite containing a
large panel of basic and complex image processing and manipulation tools
- SPM (http://www.fil.ion.ucl.ac.uk/spm) is a library dedicated to statistical analysis of
imaging data
- FreeSurfer (http://surfer.nmr.mgh.harvard.edu) is a software suite especially used for
cortical surface study
- Camino (http://cmic.cs.ucl.ac.uk/camino) is a software package for diffusion image processing
- Slicer (http://slicer.org) is a generic software package for medical image computing
- ANTS (http://stnava.github.io/ANTs) is a processing library used for image
registration and segmentation
It is therefore crucial to take advantage of this repertoire of existing tools, and the choice of workflow environment should be driven by its ability to facilitate the integration of such tools. Additionally, for large studies it is crucial that the workflows aiming at the extraction of imaging biomarkers are reproducible. Furthermore, common image processing techniques such as registration and segmentation will be used numerous times in different workflows. This requires a workflow environment that facilitates the transfer of processing blocks from workflow to workflow.
Such an environment could be achieved using basic shell scripts or by scripting directly around the command-line binaries. This would, however, require additional implementation time: correctly identifying interoperability between software packages, managing the versions of these packages, and recording this information so that it could be saved with the resulting outputs. In some cases, it would be more sensible to do this when integrating into the research platform. Nipype (http://nipy.sourceforge.net/nipype), by contrast, is a Python-based workflow engine that is well adapted to neuroimaging studies and incorporates many of these ancillary capabilities. It therefore comes as a natural choice for implementing the biomarker extraction pipelines, so that they can be incorporated as a major part of the VPH-DARE@IT image analysis workflows.
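To illustrate the idea of composable, provenance-aware workflow blocks, the toy sketch below chains named, versioned steps and records what ran. It is a hypothetical illustration only, not Nipype's actual API: real Nipype wraps each tool in an Interface class and executes a directed graph of Nodes with caching.

```python
# Hypothetical sketch of provenance-aware workflow blocks (not Nipype's API).
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Block:
    """One processing step: a named, versioned wrapper around a tool."""
    name: str
    version: str
    func: Callable[[str], str]  # takes an input path, returns an output path


@dataclass
class Pipeline:
    blocks: List[Block] = field(default_factory=list)
    provenance: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, block: Block) -> None:
        self.blocks.append(block)

    def run(self, input_path: str) -> str:
        current = input_path
        for block in self.blocks:
            current = block.func(current)
            # Record tool name and version so the result is reproducible.
            self.provenance.append((block.name, block.version))
        return current


# Two hypothetical steps standing in for real tools (e.g. bias correction
# followed by brain extraction); here they only rewrite the file name.
pipe = Pipeline()
pipe.add(Block("bias_correct", "N4-2.1", lambda p: p.replace(".nii", "_n4.nii")))
pipe.add(Block("brain_extract", "bet-5.0", lambda p: p.replace(".nii", "_brain.nii")))
result = pipe.run("subject01_T1.nii")
```

The captured provenance list is the kind of ancillary information (tool, version) that the text above argues should travel with every output.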
Figure 1: Diagram of the Nipype workflow environment (http://nipy.sourceforge.net/nipype). Distinct image analysis tools are embedded into interfaces that constitute the blocks of a workflow, seamlessly executable and fully reproducible.
2.2. INVENTORY OF IMAGE ANALYSIS TOOLS
As neuroimaging is a mature area of research, many software tools to analyse and process
neuroimaging data have emerged throughout the past decades.
Partners within the consortium have a strong background in imaging software, and in-house analysis tools play an important role in the implementation of the biomarker extraction workflows. Amongst them, NifTK [1] is an in-house medical imaging software suite that
contains numerous command line applications for automated image-processing tasks such as
registration and segmentation.
2.3. IMAGE ANALYSIS WORKFLOW BLOCKS
Most tools from the libraries described in Sec. 2.1 already have their interfaces integrated into the Nipype environment. Therefore, all FSL, FreeSurfer, SPM, and ANTS binaries can be used as building blocks. A large effort has taken place to implement interfaces for the additional in-house tools contained within the NifTK library. Amongst them, the following three interfaces are commonly used in the implementation of the image-based biomarker extraction workflows:
- NiftyReg (http://sourceforge.net/projects/niftyreg) contains programs to perform rigid,
affine and non-linear registration for medical images
- NiftySeg (http://sourceforge.net/projects/niftyseg) is dedicated to EM based
segmentation and label fusion algorithms for medical images
- NiftyFit (closed-source) provides a selection of routines for model fitting to different
types of MRI data, especially used for diffusion tensor fitting
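As an illustration of how a command-line tool becomes a workflow block, the sketch below assembles (without executing) an affine registration call in the style of NiftyReg's reg_aladin. The wrapper function and file names are hypothetical, and the flag names should be checked against the installed NiftyReg version.

```python
# Illustrative wrapper: assemble a reg_aladin-style affine registration
# command from named inputs/outputs. The helper and file names are
# hypothetical; a real block would execute the command and check its result.
import shlex
from typing import List


def build_reg_aladin_cmd(ref: str, flo: str, aff_out: str, res_out: str) -> List[str]:
    """Build the argument list for an affine registration call."""
    return ["reg_aladin", "-ref", ref, "-flo", flo, "-aff", aff_out, "-res", res_out]


cmd = build_reg_aladin_cmd("mni_t1.nii", "subject_t1.nii",
                           "affine.txt", "subject_in_mni.nii")
# In a real block this list would be passed to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Keeping command construction separate from execution is what lets a workflow engine cache, log, and re-run the block reproducibly.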
2.4. INCORPORATION INTO VPH-DARE RESEARCH PLATFORM
High throughput analysis of these biomarker pipelines will be achieved by incorporating the
pipelines within the VPH-DARE@IT research platform, resulting in the data, derived results,
and pipelines for all projects to exist in a single location allowing for cross-project queries and
analysis. For the purposes of biomarker extraction from the retrospective databases, there was a large design question: should the workflows be incorporated into the research platform as individual atomic units, and then constructed within the research platform? Or should they be imported as a complete "black box", tested outside of the research platform before use for this task or by other users? Some users may want access to these atomic units in order to build novel workflows from scratch on the research platform. For the purpose of biomarker extraction from the retrospective databases, we opted for the latter approach of creating complete "black box" pipelines, for several reasons. First, we believe that the research platform, and specifically these biomarkers, should be geared towards end users/researchers who are more clinically oriented and are interested in using or generating established biomarkers over large studies, rather than technical end users who want to further develop or optimise pipelines for their analysis. These technical end users could generate numerous pipelines, and values resulting from them, that are only slightly different, which could lead to confusion for other users. Second, it is more efficient to test and validate one complete biomarker pipeline off-line and incorporate it into the research platform than to incorporate all of its individual atomic units. Efficiency is important not just in the execution of these pipelines, but also in their design, as this will be on the critical path for biomarker extraction.
The research platform and its components have been described in D7.1 and D7.2. The platform
dedicates appropriate resources from virtual machines to complete a user-specific task. The user
chooses a set of data to perform the task on, and selects a workflow dedicated to the extraction
of a specific biomarker. The key biomarker extraction pipelines and the settings used for testing
are described in the next section. They have been integrated in the virtual machine template in
order for the research platform to easily access them and perform the required task. The outputs
of the pipeline located on the virtual machine are then linked back to the web-based platform
for access and further analysis by the user. A diagram explaining the information flow in the
research platform is presented in Figure 2.
Figure 2: Data and information flow in the research platform. The pipelines are stored in the
platform and are at the disposition of the user. The platform facilitates the application of
validated biomarker extraction workflows on large cohorts for high throughput analysis.
The groups at UCL and USFD have been collaborating in order to integrate these imaging
pipelines into the research platform. Initially, a newly implemented pipeline, whether it is a
command-line script, a binary or a Nipype-based pipeline, needs to be installed as a dedicated
“Application” (see D7.3). This results in the creation of a dedicated resource in the platform.
For instance, the UCL image-based biomarker extraction pipelines' Application is found at this URL1. The Application receives a corresponding XML endpoint to be used later on for pipeline integration. Further details of the incorporation and execution of these pipelines are found in the Appendix.
3. PER BIOMARKER PIPELINE IMPLEMENTATION
Below are the key biomarkers that have been tested and provided to the WP7 team for integration into the research platform. As the purpose of these biomarkers has been discussed in previous deliverables, we only provide enough background here to make clear what optimisation and testing has been done.
3.1. BIAS CORRECTION
Bias correction is one of the most common pre-processing steps performed on structural MR
images. It aims at correcting for low frequency signal variations induced by inhomogeneities
of the static magnetic field of the MRI scanner.
The algorithm used in this pipeline is an improved multi-level version of nonparametric non-uniform intensity normalization (N3) [2]. The algorithm has 4 major input parameters:
- Downsampling level (default: 1): As the bias field corresponds to low-frequency signal variation, it is often sufficient to work on a downsampled version of the input high-resolution structural T1 image.
- Maximum iterations (default: 50): The optimisation occurs iteratively until the number of iterations exceeds the maximum specified by this variable.
- Convergence (default: 0.001): The threshold used to determine convergence (the standard deviation of the ratio between subsequent field estimates is used).
- FWHM (default: 0.15): The full width at half maximum of the Gaussian used to model the bias field.
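The convergence criterion above can be sketched as follows. This is an illustrative re-implementation in plain Python, not the actual N4 code; the field values and the helper name are made up for the example.

```python
# Sketch of the convergence test: the standard deviation of the voxel-wise
# ratio between two successive bias-field estimates is compared against the
# convergence threshold (default 0.001). Illustrative only, not the actual
# N4 implementation.
import math
from typing import Sequence


def field_converged(prev: Sequence[float], curr: Sequence[float],
                    threshold: float = 0.001) -> bool:
    ratios = [c / p for c, p in zip(curr, prev)]
    mean = sum(ratios) / len(ratios)
    std = math.sqrt(sum((r - mean) ** 2 for r in ratios) / len(ratios))
    return std < threshold


# Two nearly identical field estimates: the ratio is almost constant, so its
# standard deviation falls below the threshold and iteration would stop.
prev_field = [1.00, 1.10, 0.95, 1.05]
curr_field = [1.0001, 1.1001, 0.9501, 1.0501]
print(field_converged(prev_field, curr_field))
```
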
In addition to these inputs, bias correction often performs better when the algorithm is provided with a rough segmentation of the intracranial volume, as including low-intensity voxels in the background causes instability in the algorithm. We achieve this segmentation automatically using a quick registration between the input T1 and the MNI template, followed by resampling of the corresponding MNI mask into the input T1 space. The resulting mask is used to restrict the bias correction optimisation to the embedded volume.
The outputs of the pipeline consist of the bias-corrected image, the multiplicative bias field, and (as an indication) the total intracranial mask used to estimate the bias field.
3.1.1. Evaluation on testing set
The pipeline has been thoroughly tested against an in-house UCL database that consists of 180 subjects from a large-scale phase III trial in Alzheimer's disease, with data coming from multiple MR scanners. The results of the bias correction itself are not the main concern; rather, it is the impact of the bias correction on the Boundary Shift Integral (BSI), a measurement of atrophy between two scans and often an exploratory endpoint in clinical trials. The results from the new bias correction were compared to values previously calculated with the N3 algorithm, which was used to compute the BSI endpoint as it was submitted back to the sponsor. The correlation between the resulting BSI measures is presented in Figure 3 as a Bland-Altman plot. These results show
1 https://dare.vph-share.eu/resources/7a78a3e9-f6a6-4f32-85cd-bea8f10a242d/
that there is no systematic difference when using N4 with respect to N3. The multi-resolution aspect of the N4 implementation explains the apparent variance of these differences.
Figure 3: Correlation of BSI measures when using N3 or N4 for bias correction as pre-processing, presented as Bland-Altman plots.
The pipeline is multi-threaded; its computation time is correlated with the input image resolution. The typical time is 10 minutes on a single 2.5 GHz Intel CPU core for an image with 1.1 mm cubic voxels.
3.2. WHOLE BRAIN PARCELLATION AND TISSUE SEGMENTATION
The primary high throughput pipeline is the complete, automated parcellation of the cortical
and subcortical grey regions from a structural T1 image. In this implementation we use an
algorithm based on a label propagation scheme designated as Geodesic Information Flow (GIF)
[3]. This scheme necessitates the use of a template database where structural images and
corresponding segmented regions are used in a graph environment in order to transport or
propagate labels towards the target image. In this implementation we use a database carefully
constructed using the OASIS (65 subjects) and ADNI (85 subjects) data. Unlike other template libraries, this template contains data from subjects with a wide age range, some of whom are affected by Alzheimer's disease. Inclusion of similar subjects often improves the performance of the parcellation. The template labels are based on the braincolor protocol (http://braincolor.mindboggle.info/). The resulting segmented regions can be used for volume-based statistical analyses to help characterise biomarkers such as atrophy or other volume-related measures.
The pipeline input is the structural T1 image. First, the scan is linearly registered to the MNI T1 atlas. The MNI TIV mask is subsequently resampled to the target image and dilated (10 voxels). The dilated mask is used to crop the input image, and the resulting affine transformation is composed with each database subject's transformation in order to initialise the non-linear registrations. Non-linear registrations are then performed between each subject of the database and the input image. To minimise the computation time, the input image is cropped to reduce the volume to the minimum region of interest. These non-linear registrations constitute the most computationally expensive blocks of the pipeline (90% of the run time). For all registrations, NiftyReg (http://sourceforge.net/projects/niftyreg) has been used. Once all registrations are completed, the final label propagation to the target T1 image is performed using the GIF algorithm [4].
The components have been assembled using Nipype and the workflow has been integrated in
the platform as a black box. The workflow is described in Figure 4.
Figure 4: Brain parcellation workflow. A database is used to propagate template labels to the target input structural image.
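The mask-based cropping step described above can be sketched as follows: find the bounding box of a (dilated) brain mask and crop the image to it, so that the expensive non-linear registrations only operate on the region of interest. A 2D list of lists stands in for the 3D image volume, and the helper names and padding value are illustrative only (the pipeline dilates by 10 voxels; a smaller pad is used here to keep the example tiny).

```python
# Illustrative sketch: bounding box of a binary mask with padding, then crop.
# A 2D toy array stands in for a 3D volume; helper names are hypothetical.
from typing import List, Tuple


def mask_bounding_box(mask: List[List[int]], pad: int) -> Tuple[int, int, int, int]:
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    # Expand by `pad` voxels, clamped to the image extent.
    return (max(rows[0] - pad, 0), min(rows[-1] + pad, len(mask) - 1),
            max(cols[0] - pad, 0), min(cols[-1] + pad, len(mask[0]) - 1))


def crop(image: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


mask = [[0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0]]
box = mask_bounding_box(mask, pad=1)
cropped = crop(mask, box)
```

Cropping before the non-linear registrations is purely an efficiency measure: the registrations dominate the run time, and they scale with the number of voxels processed.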
The pipeline outputs consist of several images:
- The cropped and fully bias corrected image
- A tissue segmentation image: a 4D image in which, for each layer, the voxel value is the probability of belonging to, respectively, the CSF, the cortical grey matter, the white matter, the deep grey matter, and the brainstem
- The brain parcellation labelled image
The key outputs that will be reused are the tissue segmentation and the parcellation image. The pipeline has been tested on small databases, and experts have evaluated the resulting segmentations.
3.2.1. Testing set
The whole brain parcellation pipeline was applied to a subset of 1000 T1 weighted scans from
the Rotterdam Scan Study [5]. Because the data from this study is not publicly available, the
pipeline had to be made compatible with the computing cluster of the Erasmus MC, the owner
of the study data. The compatibility was facilitated by the Nipype implementation of the
parcellation pipeline, and allowed us to compute brain region volumes of 1000 subjects in an
age range of 45 – 92 years. These brain region volumes will be compared to volumes computed
by other brain parcellation pipelines, such as FreeSurfer and in-house developed methods.
Figures below show results of a parcellation and the dynamic distributions of hippocampal and
thalamus volume in the age range of 45 – 92 years. The volumes are given as a percentage of
intracranial volume.
3.3. HIPPOCAMPAL VOLUME PROFILE
There is evidence that different forms of neurodegenerative diseases have different patterns of
atrophy in the temporal lobe in the anterior-posterior gradient [6]. While the overall loss of
volume in specifically vulnerable regions like the hippocampus is informative, more localised
characterisation of the atrophy within the structure could provide more information in terms
of differential diagnosis. A pipeline dedicated to the generation of this regional analysis from a
segmented hippocampus has been implemented. The components have been assembled using
Nipype and the workflow has been integrated in the platform as a black box. The input of the
pipeline is the segmentation in the form of a binary mask indicating which voxels in the image
are contained within the hippocampus. From this binary mask, a volume profile is created along the long axis of the hippocampi, running from the anterior head to the posterior tail. Using a Gaussian kernel density estimator [7], we obtain a continuous function giving the local volume at any point along the hippocampal principal axis. The area under this curve therefore represents the total hippocampus volume. Classification and statistical analysis along the principal axis will be performed to determine which areas are the most sensitive in detecting differences between the two disease groups.
Figure 5: Example of whole brain parcellation on one of the 1000 subjects of the RSS.
Figure 6: Top: parcellation showing the hippocampus in red. Bottom: distribution of the hippocampus volume as percentage of intracranial volume as a function of age.
Figure 7: Hippocampus segmentations (left) with estimated left and right long axes (middle and right) superimposed on a structural T1 image.
The pipeline takes advantage of the global coordinate system in order to distinguish between
the left and the right hippocampi. From each structure the long axis of the group of voxels is
calculated using principal component analysis. The volume profile is then generated along a
normalised axis coordinate using a kernel density estimation technique (see Figure 8).
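The two core steps described here, principal component analysis for the long axis followed by kernel density estimation along it, can be sketched as below. This is a minimal illustration only: the function name, its parameters, and the use of SciPy's `gaussian_kde` are assumptions for the sketch, not the pipeline's actual code.

```python
import numpy as np
from scipy.stats import gaussian_kde

def volume_profile(mask, voxel_volume=1.0, n_points=100):
    """Project hippocampal voxels onto their principal axis and estimate
    a continuous volume profile with a Gaussian kernel density estimator.

    `mask` is a 3D binary array; scaling the density by the total volume
    makes the area under the returned curve equal the total volume.
    """
    coords = np.argwhere(mask > 0).astype(float)   # voxel coordinates
    coords -= coords.mean(axis=0)                  # centre the point cloud
    # PCA: the long axis is the eigenvector of the covariance matrix
    # with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords.T))
    long_axis = eigvecs[:, np.argmax(eigvals)]
    # Project every voxel onto the long axis, normalised to [0, 1].
    proj = coords @ long_axis
    t = (proj - proj.min()) / (proj.max() - proj.min())
    # Kernel density estimate of the projected voxel distribution.
    kde = gaussian_kde(t)
    x = np.linspace(0.0, 1.0, n_points)
    profile = kde(x) * mask.sum() * voxel_volume
    return x, profile
```

In practice the profile would be computed separately for the left and right hippocampi after splitting the mask on the mid-sagittal plane of the global coordinate system.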
Figure 8: (left) Kernel density estimation in 1D: the blue markers represent the sparse data points and the black continuous line the resulting density estimate. (right) Kernel density estimation applied along the principal axis of the hippocampus.
Figure 9: Two examples of output profiles from a healthy subject (left) and a patient suffering
from Alzheimer’s disease (right). The graph shows the left (red) and right (green) hippocampal
volume profiles.
Figure 10: Volume Profile Generation workflow. Kernel density estimation is used to estimate
the continuous function from sparse voxel volumes projected along the hippocampus axis.
3.3.1. Evaluation on AD / control testing set
As an example application, this pipeline has been applied to a small database of 60 subjects,
consisting of 23 controls and 40 patients suffering from Alzheimer’s disease (AD). The results
are shown in Figure 11. They demonstrate a difference in global volume between healthy
controls and AD patients. More interestingly, this difference is more pronounced in the anterior
part of the hippocampus.
Figure 11: Hippocampal volume profile (as a percentage of TIV) in a population of 27 controls (grey) and 43 AD patients (black), from the anterior to the posterior part.
3.4. DIFFUSION PROCESSING
Diffusion MRI allows for the depiction of white matter microstructure [8]. Changes in the white
matter due to neuronal dysfunction and other disease processes caused by neurodegenerative
dementias might be picked up by this imaging modality. However, this type of imaging inherently has much lower resolution and is noisier than conventional structural MRI. Image quality
problems are further exacerbated by its high sensitivity to physiological motion [9]. This
modality also generates thousands of images, and manually identifying quality issues in these scans is not feasible. We have provided the VPH-DARE@IT community with
an in-house pipeline dedicated to the pre-processing and processing of diffusion data.
Additionally, this pipeline allows for the automatic detection of major artefacts that can affect
the data. The workflow is described in Figure 12. The input data consists of:
- N Diffusion Weighted Images (DWIs)
- M B=0 non weighted images (B0)
- The T1 structural image
- The magnitude and phase field map images, which are used for correction of echo
planar imaging distortion artefacts (optional)
Due to the duration of scan acquisition (of the order of 5-10 minutes), diffusion MRI is prone
to subject motion. To detect such motion and partially correct for it, a common technique
consists of linearly registering the diffusion-weighted images to the non-weighted one. In the case where M > 1, we first perform a groupwise (rigid) registration
of the B0 images. Due to the difference in signal characteristics between DWIs and B0s, it is useful to
perform the DWI to B0 registration in the log-space. This registration procedure corrects for
motion-induced and eddy-current affine distortions between DWIs. Another typical artefact is
due to magnetic field inhomogeneities and induces low frequency susceptibility distortions. In
order to correct for this, we use a phase unwrapping technique [10] that calculates the
multiplicative distortion field from the magnitude and phase field map images. If these images are not available, a non-linear registration between the B0 and the T1 is performed. Since
the susceptibility distortion is of low frequency, we constrain the registration for smooth
deformations only.
The affine transformations and non-linear distortion fields are composed in order to obtain a
final displacement field for each DWI and B0 image. Each image is then interpolated using the
displacement fields. A (positive) constrained sinc interpolation is used. Note that in this
workflow each image is interpolated only once in order to avoid interpolation-induced over-smoothing. The resulting images and gradients are used to estimate a single tensor model at a
voxel level. For quality control purposes, two separate graphs are calculated. First, from the
DWI to B0 registration we extract the rotation parameters of the transformation in order to
quantify subject motion during the scan. Second, normalised cross-correlation (NCC) is computed
between adjacent slices in all motion corrected images, as significant drops in NCC can help
identify signal dropout or motion artefacts that result in banding (zebra artefact) due to the
interleaved acquisition. We summarise the different processing steps in the diagram of Figure 12. The outputs of the workflow consist of the following:
- The corrected B0-DWI 4D image
- The corrected gradient table
- The estimated tensor map
- Tensor-based scalar maps (FA, MD, AD, RD, RGB)
- The individual DWI to B0 affine transformations
- The subject rotation graph
- The inter-slice cross-correlation graph
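The rotation measure feeding the subject rotation graph can be sketched as a decomposition of each DWI-to-B0 transform. This is an illustrative sketch only: it assumes an x-y-z Euler convention and a shear-free transform, and the function name is hypothetical.

```python
import numpy as np

def rotation_angles(affine):
    """Extract Euler rotation angles (degrees) from the linear part of a
    4x4 DWI-to-B0 affine transform, assuming R = Rz @ Ry @ Rx and no shear."""
    R = np.asarray(affine, dtype=float)[:3, :3]
    R = R / np.linalg.norm(R, axis=0)      # strip per-axis scaling
    ry = np.arcsin(-R[2, 0])               # R[2,0] = -sin(ry)
    rx = np.arctan2(R[2, 1], R[2, 2])      # sin(rx)cos(ry), cos(rx)cos(ry)
    rz = np.arctan2(R[1, 0], R[0, 0])      # sin(rz)cos(ry), cos(rz)cos(ry)
    return np.degrees([rx, ry, rz])
```

Plotting these three angles for every DWI volume gives a per-scan motion trace of the kind listed among the workflow outputs.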
Some examples of diffusion processing pipeline outputs are shown in Figure 13, while quality
control graphs are shown in Figure 14.
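The inter-slice NCC measure used for quality control can be sketched in a few lines. The slice axis and function name below are assumptions for illustration, not the pipeline's implementation.

```python
import numpy as np

def interslice_ncc(volume):
    """Normalised cross-correlation between adjacent slices of a 3D
    volume (slice axis last). Sharp drops in the returned series flag
    signal dropout or interleave (zebra) artefacts."""
    nccs = []
    for k in range(volume.shape[-1] - 1):
        a = volume[..., k].ravel().astype(float)
        b = volume[..., k + 1].ravel().astype(float)
        a -= a.mean()
        b -= b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # A zero denominator (e.g. an empty slice) counts as zero
        # correlation, i.e. a maximal drop.
        nccs.append(float(a @ b / denom) if denom > 0 else 0.0)
    return np.array(nccs)
```

Running this over every motion-corrected B0 and DWI volume yields the inter-slice cross-correlation graph shown in Figure 14.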
Figure 12: Workflow for diffusion weighted imaging data. The diffusion-weighted images are
linearly registered to the average B0 image. Field maps and T1 images are used to estimate
susceptibility distortion. The resulting corrected images are used for tensor fitting.
Figure 13: The diffusion-processing pipeline outputs maps depicting the white matter
arrangement. (Left) T1 weighted image. (Right) corresponding FA map colour-coded with
tissue orientation.
Figure 14: Quality Control graphs for Diffusion MRI. (Top) DWI image showing some
significant signal dropouts. (Bottom) corresponding inter-slice cross-correlation for B0 (red)
and DWI (blue) images, where the problematic volume is automatically detected.
3.4.1. Evaluation on ADNI retrospective cohort
As an example, this workflow has been applied to a large portion of the ADNI retrospective
cohort. The purpose of this study is to demonstrate the capabilities of the pipeline to
automatically detect outliers in a large database. The data consists of 216 subjects: 67 healthy controls, 91 AD patients and 58 patients with mild cognitive impairment. The diffusion pipeline was
applied and the resulting inter-slice normalised cross-correlation was extracted for each subject.
The results are presented in Figure 15. There are 13 subjects that are highlighted as outliers
with a simple threshold on the cross-correlation. This technique can be used to automatically reject subjects from large cohorts within a statistical study, where the large number of subjects and images does not allow exhaustive manual quality control. These results were presented in the Y2 review as a clinically relevant exemplar of a biomarker extraction pipeline.
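The outlier rejection amounts to thresholding each subject's NCC series. The threshold value and data layout below are illustrative assumptions, not the study's actual settings.

```python
import numpy as np

def flag_outliers(ncc_per_subject, threshold=0.7):
    """Return subject IDs whose minimum inter-slice NCC falls below the
    threshold, indicating probable signal dropout or motion artefact.

    `ncc_per_subject` maps a subject ID to its array of NCC values."""
    return sorted(sid for sid, ncc in ncc_per_subject.items()
                  if float(np.min(ncc)) < threshold)
```

Applied to the 216 ADNI subjects, such a threshold on the minimum NCC is what flags the 13 outliers shown in Figure 15.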
Figure 15: Inter-slice cross-correlation graphs on 216 subjects of the ADNI cohort.
Thresholding allows the automatic detection of 13 outliers containing significant signal
dropouts.
4. BIOMARKER EXTRACTION ROADMAP
At the end of Y3, Milestone 33 is scheduled, which is the extraction of biomarkers from the
retrospective studies. This milestone will involve extraction of biomarkers from over 20,000
images, including the Rotterdam study. We will work with all the partners involved in Task 3.4
who will be contributing biomarkers to the milestone, and plan out the process of finalising the workflows, setting them up within the research platform, executing the pipelines on appropriate data from each of the studies, and reviewing the results so that the work is complete by the milestone date. Progress will be monitored by tracking updates of the
research platform within WP7. The progress will be reported back to all partners through
monthly teleconferences and a simple spreadsheet dashboard. During the teleconferences, we
will discuss any potential problems that could cause deviation from the plan. This exercise will
be supported by WP7, who will assist with the incorporation of workflows into the research
platform. WP5 and WP6 will use these biomarkers and incorporate them into the models
developed in these work packages.
5. CONCLUSIONS
This deliverable provides clear guidelines on the implementation of the biomarker extraction
pipelines for the retrospective studies. These pipelines will result in a fully standardised set of
biomarkers in thousands of datasets from multiple retrospective cohorts, which will allow new
research questions to be explored that cannot normally be addressed within any single disease cohort.
6. REFERENCES
1. Clarkson, M.J., et al., The NifTK software platform for image-guided interventions:
platform overview and NiftyLink messaging. International journal of computer assisted
radiology and surgery, 2014: p. 1-16.
2. Tustison, N.J., et al., N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging, 2010. 29(6): p. 1310-1320.
3. Cardoso, M.J., et al., Geodesic information flows, in Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2012. 2012, Springer Berlin Heidelberg. p.
262-270.
4. Cardoso, M.J., et al., Geodesic Information Flows: Spatially-Variant Graphs and Their
Application to Segmentation and Fusion. 2015.
5. Ikram, M.A., et al., The Rotterdam Scan Study: design and update up to 2012. European
journal of epidemiology, 2011. 26(10): p. 811-824.
6. Chan, D., et al., Patterns of temporal lobe atrophy in semantic dementia and
Alzheimer's disease. Annals of neurology, 2001. 49(4): p. 433-442.
7. Turlach, B.A., Bandwidth selection in kernel density estimation: A review. 1993:
Université catholique de Louvain.
8. Westin, C.-F., et al., Processing and visualization for diffusion tensor MRI. Medical
image analysis, 2002. 6(2): p. 93-108.
9. Tournier, J.D., S. Mori, and A. Leemans, Diffusion tensor imaging and beyond. Magnetic Resonance in Medicine, 2011. 65(6): p. 1532-56.
10. Daga, P., et al., Susceptibility artefact correction using dynamic graph cuts:
Application to neurosurgery. Medical image analysis, 2014. 18(7): p. 1132-1142.
7. ANNEXES
7.1. WORKFLOW
Here is a simplified step-by-step procedure for integrating and testing a new pipeline in the platform:
1) Request credentials for the DARE portal: https://dare.vph-share.eu.
2) Download and install Taverna 2.5.0², Java 7³, and the VPH-SHARE Taverna plugin⁴.
3) From Taverna:
a) Import the VPH-SHARE service using your application end-point.
b) Drag and drop the required service to the design panel (see Figure 16).
c) Provide inputs and outputs relative to the DARE portal's filestore architecture⁵.
d) Run the workflow from the Taverna menu.
4) From the DARE portal:
a) Monitor the status of the pipeline from the workflow dashboard⁶.
b) Upload the tested workflow to the portal using the dedicated help page.
Figure 16: The Taverna Workbench. During the integration process, the user needs to import
the newly created service (top-left panel) into the workbench (main panel) and connect inputs
and outputs of the pipeline (in green) with the DARE portal nodes (in blue). The pipeline can
then be run from the menu.
² http://www.taverna.org.uk/download/workbench/2-5/core/
³ http://www.oracle.com/technetwork/java/javase/downloads/index.html
⁴ http://repository.cistib.org/nexus/content/repositories/releases/
⁵ https://portal.vph-share.eu/filestore
⁶ https://dare.vph-share.eu/applications/#switchToWorkflowsView