AJR:196, June 2011 W781
to automate the entire process of extraction of CT dose information from CT dose report
images using Microsoft Visual Basic for Applications (VBA). The tool was able to drive
MODI to conduct OCR, to parse raw OCR results for text extraction, to make error
corrections, and to perform a quality check on the retrieved data. Multiple CT dose report
images could be batch processed, and all results could be saved into an Excel
spreadsheet (Microsoft) for convenient data analysis. The development was summarized and
further illustrated with a case study, which included a large number of patient
examinations from a Siemens Healthcare CT scanner.
Materials and Methods
Figure 1 shows a flowchart of text recognition of CT dose report images and radiation dose
information extraction. It has the following steps:
1. Conduct OCR. Patient dose report images were routed to a hard drive, and each image was
processed by the OCR engine. Results were assigned to a string, i.e., a sequence of characters.
2. Parse. Sample cases were manually reviewed to identify the OCR output formats and error
patterns in character recognition. Locations of a series of keywords were searched in the
OCR output string. They were used to identify the output format, scanning series, and
starting positions of the items of interest, such as total mAs and total DLP. The starting
positions and the specified character lengths were used to extract corresponding items.
The examination date was retrieved using a date format pattern.
3. Perform error correction. The OCR outputs
Automated Extraction of Radiation Dose Information From CT Dose Report Images
Xinhua Li1
Da Zhang
Bob Liu
Li X, Zhang D, Liu B
1All authors: Department of Radiology, Division of Diagnostic Imaging Physics,
Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston,
MA 02114. Address correspondence to B. Liu
Medical Physics and Informatics Technical Innovation
WEB
This is a Web exclusive article.
AJR 2011; 196:W781–W783
0361–803X/11/1966–W781
American Roentgen Ray Society
Exposure to radiation in CT examinations has become a topic of high interest in the past
few years [1–3]. To assess the associated radiation risks and to evaluate CT examination
protocols for adopting CT dose reduction strategies, analyses of CT dose data are often
required for large numbers of patients in many clinical and research projects.
Quantities of interest include the CT dose index (CTDI) for scanning protocol evaluation
and dose-length product (DLP) for effective dose estimation. Historically, these data have
not been stored in machine-editable formats but in CT dose report images. These studies
are labor intensive and error prone because of the need for manual review of many
images. In this article, we report the development of an automated tool for extracting CT
examination information from the CT dose report images.
Optical character recognition (OCR) is a technique that is used to translate scanned
images of text into computer-editable and searchable text. Many of the currently
available OCR packages do not perform well with text in the small fonts usually found in CT
dose report images, such as those from Siemens Healthcare and GE Healthcare. After a
trial-and-error process among different OCR packages, the OCR engine in the Microsoft
Office Document Imaging (MODI) library [4], which showed relatively superior
performance with the small-font text in CT dose report images, was chosen to perform text
recognition. In this work, we developed a tool
Keywords: CT dose image, optical character recognition, text extraction
DOI:10.2214/AJR.10.5718
Received September 3, 2010; accepted after revision November 9, 2010.
OBJECTIVE. The purpose of this article is to describe the development of an automated
tool for retrieving texts from CT dose report images.
CONCLUSION. Optical character recognition was adopted to perform text recognition
of CT dose report images. The developed tool is able to automate the process of analyzing
multiple CT examinations, including text recognition, parsing, error correction, and
exporting data to spreadsheets. The results were precise for total dose-length product (DLP) and
were about 95% accurate for CT dose index and DLP of scanned series.
were not 100% accurate but contained errors. Therefore, corrections were necessary for
higher accuracy. They were performed on the basis of error patterns identified in step 2.
4. Assign a quality flag (iFlag). The results of the previous step were checked for
consistency. When total DLP was identified in a CT examination, it should be equal to the
sum of DLP values of all series. Otherwise, iFlag with a value of 1 was assigned to the
examination. Moreover, the mAs value of a series should not be larger than the total mAs
identified for the entire examination. Otherwise, iFlag with a value of 2 was assigned.
These flagged cases were a small percentage of the total and could be corrected either
by manually checking the original images or by additional programming effort.
5. Export. Results of text extraction were exported to Excel.
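As an illustration of the parsing step, the keyword search and fixed-length extraction can be sketched in a few lines. The sketch is written in Python rather than the VBA of the actual macro. The sample string, the keywords, the field length of 5 characters, and the date layout are assumptions made for this example, not values taken from the tool.

```python
import re

# Hypothetical OCR output; the real layout depends on the scanner and OCR engine.
ocr_text = "12-Oct-2009 10:15 Total mAs 4521 Total DLP 735 mGy*cm"


def extract_after_keyword(text, keyword, length, start=0):
    """Return `length` characters that follow the first occurrence of `keyword`.

    Locate the keyword, then cut a fixed-length slice that starts right
    after it. Returns None when the keyword is absent.
    """
    pos = text.find(keyword, start)
    if pos < 0:
        return None
    begin = pos + len(keyword)
    return text[begin:begin + length].strip()


# Assumed date layout: day, three-character month, four-digit year.
# Digits are allowed in the month so that a misread such as "0ct" still matches.
DATE_PATTERN = re.compile(r"\d{1,2}-[A-Za-z0-9]{3}-\d{4}")


def extract_exam_date(text):
    """Return the first substring that matches the assumed date layout."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


total_dlp = extract_after_keyword(ocr_text, "Total DLP", 5)
total_mas = extract_after_keyword(ocr_text, "Total mAs", 5)
exam_date = extract_exam_date(ocr_text)
```

A fixed slice length is simple but brittle: it works only when the field width is stable across reports, which is one reason the extracted values still need the error correction and consistency checks described in the steps above.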
A Visual Basic macro was written to implement all these steps. Pseudocode shown in Appendix 1
illustrates the key implementations of the macro.
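The two consistency rules behind the quality flag lend themselves to a short sketch. The Python function below is a hypothetical illustration, not the macro itself: the data layout (a list of dictionaries), the rounding tolerance, and the choice to test the DLP rule before the mAs rule are all assumptions, since the text does not state how a case failing both rules is handled.

```python
def assign_iflag(series, total_dlp=None, total_mas=None, tolerance=0.5):
    """Return 1, 2, or None following the two consistency rules.

    series    -- list of dicts with optional 'dlp' and 'mas' numbers
    total_dlp -- total DLP read from the report, or None if absent
    total_mas -- total mAs read from the report, or None if absent
    tolerance -- assumed allowance for rounding in the printed values
    """
    # Rule 1: when a total DLP is present, it should equal the sum over series.
    if total_dlp is not None:
        dlp_sum = sum(s["dlp"] for s in series if s.get("dlp") is not None)
        if abs(dlp_sum - total_dlp) > tolerance:
            return 1
    # Rule 2: no single series may report more mAs than the examination total.
    if total_mas is not None:
        for s in series:
            mas = s.get("mas")
            if mas is not None and mas > total_mas:
                return 2
    return None


# Hypothetical examination; the numbers are invented for the example.
exam = [
    {"name": "Topogram"},
    {"name": "CaScSeq", "mas": 512, "dlp": 42.0},
    {"name": "CorCTA", "mas": 3200, "dlp": 693.0},
]

flag_ok = assign_iflag(exam, total_dlp=735, total_mas=4521)
flag_dlp = assign_iflag(exam, total_dlp=785, total_mas=4521)
flag_mas = assign_iflag(exam, total_dlp=None, total_mas=3000)
```

Passing `total_dlp=None` skips the first rule, which matches the case in which total DLP is not printed on the report.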
The development of this automated tool was further detailed in a case study with a Siemens
Healthcare CT scanner. Figure 2A shows a CT dose report image from a Somatom Definition
scanner (Siemens Healthcare). It contains the examination date, total mAs, and total DLP.
Each series had the following items: description, series number, kV, time per rotation,
and collimated slice. Scanning series other than topogram also contained mAs, volume CTDI,
and DLP. Reference mAs might also be present in some series. Descriptions of the series
included "topogram," "CaScSeq," "TestBolus," and "CorCTA." The OCR output of the CT dose
report image is shown in Figure 2B, in which errors are underlined. These results were
moderately accurate for the CT dose report image but contained errors with patterns that
could be summarized as follows: wrong characters, such as "l" being recognized as "I" and
"kV" being recognized as "IN"; and split texts, such as "100" being recognized as "1 00" in
the line of "TestBolus" and "12D" being recognized as "1 2D" in the line of "CorCTA." From
other cases, we found more descriptions ("Calcium Score," "DS CorCTA," "DELAY," and so on)
and additional error patterns, including wrong characters such as "17" being recognized as
"i7," "7D" as "ID," and "1D" as "ID." In the OCR output, series data were reformatted, i.e.,
laid out differently from the original image, for about 2.5% of all images. We also noticed
that total DLP was not present in about 60% of the Siemens Healthcare CT dose report images.
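Wrong-character patterns of this kind can be handled with small substitution tables. The Python sketch below is a hypothetical illustration: the lookalike map and the rule "accept the substitution only if the token becomes a plain number" are assumptions for this example, while the "0ct" to "Oct" replacement comes from Appendix 1 and "IN" to "kV" from the error patterns listed above.

```python
import re

# Assumed map of letters that OCR may emit in place of digits.
DIGIT_LOOKALIKES = str.maketrans(
    {"i": "1", "I": "1", "l": "1", "O": "0", "o": "0"}
)

# Whole-token replacements for known misreadings.
DIRECT_FIXES = {"0ct": "Oct", "IN": "kV"}


def fix_numeric_token(token):
    """Swap lookalike letters for digits, but only if a number results.

    Tokens that do not become a plain number (for example, a series
    description) are returned unchanged.
    """
    candidate = token.translate(DIGIT_LOOKALIKES)
    if re.fullmatch(r"\d+(\.\d+)?", candidate):
        return candidate
    return token


def fix_direct(text):
    """Apply the whole-token replacements in DIRECT_FIXES."""
    for wrong, right in DIRECT_FIXES.items():
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text


fixed_number = fix_numeric_token("i7")
untouched = fix_numeric_token("CorCTA")
fixed_date = fix_direct("12-0ct-2009")
fixed_unit = fix_direct("120 IN")
```

A table alone cannot resolve every pattern. Because both "7D" and "1D" were recognized as "ID," the misread string is ambiguous, and deciding between the two candidates needs context beyond the characters themselves.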
The OCR output format was identified by the location of keywords in the string. When series
data were not reformatted, the record of a series was identified as text between two
descriptions or after the last description in the OCR output. Series number, kV, and mAs
were determined sequentially from the start of the record; collimated slice, time per
rotation, DLP, and volume CTDI were located in reverse order from its end. With this
approach, reference mAs was not misidentified as another parameter if it was present in a
series. For the cases in which OCR output formats were different from original images, the
parameters of a series were identified according to the reformatted patterns of the OCR
results. Wrong characters were corrected according to error patterns that were recognized
on the basis of the analysis of sample cases in the parsing step. Split texts in a series
were identified and then corrected on the basis of the ranges of parameters, e.g., 80–140 for
kV, and the lengths of numeric expressions. These corrections greatly improved the accuracy
of dose information extraction.
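The range-based repair of split texts can be sketched as follows. This Python function is a hypothetical illustration rather than the authors' implementation: the token lists and the position of the kV field are invented, and only the 80–140 range for kV comes from the text.

```python
def merge_split_number(tokens, index, low, high):
    """Rejoin a number that OCR split in two, using a plausibility range.

    If tokens[index] is not a number inside [low, high] but its
    concatenation with the next token is, the two tokens are merged.
    A new list is returned; the input list is never modified.
    """
    def in_range(text):
        return text.isdigit() and low <= int(text) <= high

    if index + 1 >= len(tokens) or in_range(tokens[index]):
        return list(tokens)
    joined = tokens[index] + tokens[index + 1]
    if in_range(joined):
        return tokens[:index] + [joined] + tokens[index + 2:]
    return list(tokens)


# Hypothetical series records in which the kV value sits at position 2.
split_record = ["TestBolus", "3", "1", "00", "40"]
clean_record = ["CaScSeq", "2", "120", "36"]

repaired = merge_split_number(split_record, 2, 80, 140)
unchanged = merge_split_number(clean_record, 2, 80, 140)
```

The range test does the real work here: "1" is not a plausible kV value and "100" is, so the merge is accepted, whereas a record whose kV token is already plausible is left alone.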
Results
Figure 3 shows a screen shot of the CT dose information extracted from the CT dose
report image shown in Figure 2. All important parameters of the CT examination were
exported to an Excel spreadsheet. The entry of iFlag was not filled because the retrieved
DLP and mAs of this case were accurate. The developed tool processed multiple CT dose
report images in sequence and saved all results into a single spreadsheet. About 1000 CT
dose report images from the Siemens Healthcare CT scanner were analyzed. The results
of total DLP had an accuracy of 100%. The values of CTDI and DLP of the scanning
series were about 95% accurate because of the OCR split-text errors. This accuracy could
be further improved by additional effort in case review and programming.
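The batch export can be illustrated with a short sketch. The Python code below writes one row per processed image to CSV instead of to an Excel worksheet, which is a substitution made for this example. The column names, file names, and values are invented, and the iFlag cell is left blank when no flag was raised, as in the case shown in Figure 3.

```python
import csv
import io

# Assumed column layout for the exported table.
FIELDS = ["file", "exam_date", "total_dlp", "iflag"]


def export_results(results, handle):
    """Write one CSV row per processed dose report image to `handle`."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in results:
        # Missing entries (for example, no flag raised) are left blank.
        writer.writerow(
            {name: "" if row.get(name) is None else row[name] for name in FIELDS}
        )


# Hypothetical results for two images; the values are invented.
results = [
    {"file": "exam001.png", "exam_date": "12-Oct-2009", "total_dlp": 735, "iflag": None},
    {"file": "exam002.png", "exam_date": "13-Oct-2009", "total_dlp": None, "iflag": 2},
]

buffer = io.StringIO()
export_results(results, buffer)
csv_text = buffer.getvalue()
```

Writing to any file-like object keeps the sketch easy to test in memory; in practice the same function could be given an open file so that all results accumulate in a single table.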
Discussion
This article has described a method for developing automated extraction of radiation
dose information from CT dose record images. Among the OCR packages currently
available for conducting text recognition from images, the OCR engine in the MODI
library is capable of text recognition from the CT dose report images. The library is
included in the Microsoft Office suite. The raw OCR outputs are moderately accurate but
contain errors, such as split texts and wrong characters. Sample cases can be manually
reviewed to identify the OCR output formats and error patterns in character recognition.
On the basis of this review, robust algorithms can then be developed to extract texts and
conduct error correction. VBA was chosen because it is fully integrated in the Microsoft
Fig. 1. Flowchart shows radiation dose information extraction from CT dose report images. Steps in order: conduct OCR, parse, make error correction, assign quality flag, export. OCR = optical character recognition.
Fig. 3. Screen shot shows CT dose information extracted from CT dose report image in Figure 2A.
Fig. 2. CT dose report image from Somatom Definition scanner (Siemens Healthcare). A and B, Image (A) and optical character recognition (OCR)-generated (B) texts for dose report image. Text recognition errors are underlined in B. CTDIvol = volume CT dose index, DLP = dose-length product, TI = time per rotation, cSL = collimated slice.
Office family and is convenient for the purpose of automation. In a case study, the
developed Visual Basic macro was able to drive the MODI to perform OCR, process the
OCR outputs for text extraction, and perform error correction. Multiple CT dose report
images were batch processed, and all results were saved into an Excel spreadsheet for
convenient data analysis. The tool can be easily adapted for other CT examinations
and for different CT scanners by following the approach described in this article. CT
dose data analysis on a large patient population can therefore be greatly eased.
References
1. Brenner DJ, Hall EJ. Computed tomography: an increasing source of radiation exposure.
N Engl J Med 2007; 357:2277–2284
2. National Council on Radiation Protection and Measurements. Ionizing radiation exposure
of the population of the United States: 2006. Bethesda, MD: National Council on Radiation
Protection and Measurements, 2009
3. Smith-Bindman R, Lipson J, Marcus R, et al. Radiation dose associated with common
computed tomography examinations and the associated lifetime attributable risk of cancer.
Arch Intern Med 2009; 169:2078–2086
4. Microsoft Website. Microsoft Office Document Imaging Visual Basic Reference.
msdn.microsoft.com/en-us/library/aa279424(v=office.11).aspx. Accessed August 30, 2010
APPENDIX 1: Pseudocode of the Microsoft Visual Basic Macro
' Loop through all image files
For each Image in a Folder
    ' Create an object of a specified type
    Set MIDoc = CreateObject("MODI.Document")
    ' Create a new document
    MIDoc.Create Image
    ' Perform optical character recognition on the entire image
    MIDoc.Images(0).OCR
    ' Save the OCR results into a string
    OcrText = MIDoc.Images(0).Layout.Text
    MIDoc.Close
    ' Search the first occurrence of an item by some characters
    ItemLocation = InStr(StartingPositionForSearch, OcrText, ItemKeyword)
    ' Retrieve a specified number of characters in a string
    ItemContent = Mid(OcrText, ItemLocation, ItemLength)
    ' Conduct direct correction should a specified error occur
    ExamDate = Replace(ExamDate, "0ct", "Oct")
    ' Detect additional errors and make corrections if necessary
    ' If statements and/or loop structures for checking errors and/or performing corrections
    ' Write a result to a cell in an Excel spreadsheet
    WorkSheets.Range("$B$" & CStr(Line)).Value = ItemContent
    ...
Next