978-1-4244-5969-8/10/$26.00 ©2009 IEEE. April 20-22, 2010, Antalya, Turkey.
TOWARD EFFECTIVE MEDICAL SEARCH ENGINES
Mohammed AL Zamil, Aysu Betin Can
Informatics Institute, Middle East Technical University
İnönü Buvari, 06531, Ankara/Turkey
phone: + (+90) 5388506808, email: [email protected]
ABSTRACT
In this paper, we present a domain-specific search engine
that relies on extracting the semantic relations among medical
documents. Our goal is to maximize contextual retrieval
and ranking performance with minimum input from users.
We have performed experiments to measure the effectiveness
of the proposed technique by evaluating the performance
of the retrieval process in terms of recall, precision, and
topical ranking. The results indicate that the proposed
medical search engine achieved higher average precision
compared with the highest-scored runs submitted to TREC-9.
1. INTRODUCTION
Many medical information retrieval systems assume that their
users are medical experts. Today, however, ordinary people
increasingly want to be informed participants in decisions
about their own health. To this end, such non-medical users
search the Internet and try to dig the information out, which
leads to a growing number of medical inquiries on the
Internet by users with no medical training [1].
Traditional information retrieval features, such as terms and
phrases, have been widely used to measure the similarity
between documents and queries. Recent research [2] shows that
combining semantic features improves the retrieval process in
domain-specific information retrieval systems.
In this research, we propose an effective search engine
that creates a document network correlating medical
documents according to the semantic relations among them.
Through this document network we address problems faced by
non-medical users, such as shrinking the hit-list while
keeping the maximum number of relevant documents, and ranking
documents according to their topic. We have performed
experiments to measure the effectiveness of our search engine
by evaluating the performance of the retrieval process in
terms of recall, precision, and topical ranking.
2. RELATED WORK
Modelling the similarity among text entities is an area
that can strongly affect the overall performance of information
retrieval systems. Rather than focusing on algorithm
performance, recent research concentrates on representing
information using different features in order to improve the
retrieval process [8, 9].
In the literature, the common approach is to represent
documents as a collection of all individual terms, often
referred to as the bag-of-words representation. Many studies
[10, 11] show that sophisticated feature representations do
not improve the effectiveness of general-purpose retrieval
systems, but do provide a significant enhancement on
domain-specific text.
In the medical domain, experiments in [12, 13] showed that
using medical terms and medical phrases resulted in better
information retrieval performance compared with the
traditional bag-of-words. In this research, we use medical
terms and the relationships among them defined in UMLS [20].
In addition, we propose the application of more than one
domain feature as a similarity measurement.
Although medical websites and portals such as SNOMED
[14], OMNI [15], and MedHunt [16] offer a useful starting
point for medical information, these tools do not provide
consistent responses for related medical topics. For instance,
OMNI distinguishes between the queries "Breast cancer" and
"Carcinoma of Breast", although these queries are synonyms
in the medical context. In contrast, our tool relies on
semantic enrichment, which enhances the retrieval and
ranking tasks [17-18], to extract such relations and thereby
reduce search bias.
3. SYSTEM DESIGN
In this section, we describe the basic modules of the system.
The design consists of seven modules that are responsible for
receiving the user query as input, performing text operations,
expanding the query with medically related concepts, searching
the inverted file for relevant documents, and ranking the
hit-list.
Figure 1- System Design
3.1 TEXT OPERATIONS
In order to reduce the set of representative words, the system
eliminates stop-words using the Princeton English stop-word
list [19]. Furthermore, a stemming module reduces distinct
words to their common grammatical root. Note that the
inverted index was cleaned of stop-words during construction
and contains both terms and their stems.
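The text-operations step above can be sketched as follows. This is a minimal illustration, not the system's actual code: the stop-word list is a tiny illustrative subset of the SMART/Princeton list, and crude_stem is a simple suffix-stripper standing in for a real stemmer.

```python
# Illustrative stop-word subset; the system uses the full Princeton list [19].
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "with"}

def crude_stem(word: str) -> str:
    """Strip a few common English suffixes (illustrative stand-in only)."""
    for suffix in ("ations", "ation", "ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list[str]:
    """Lower-case, drop stop-words, and stem the remaining terms."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The screening of tumors in the breast"))
# → ['screen', 'tumor', 'breast']
```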
3.2 QUERY EXPANSION
To facilitate the searching process for non-medical users, the
query expansion module is responsible for the automatic
reformulation and expansion of a query with domain concepts.
The goal is to enable the search engine to retrieve relevant
documents from the little information provided by a user and
to minimize the effect of the language ambiguity encountered
by non-medical users.
3.3 CONCEPT GENERATOR
The concept generator module consists of a set of functions
that contact the UMLS (Unified Medical Language System)
database [20] in order to retrieve specific information about
the query terms, such as the type of each term (medical or
non-medical), synonyms, contextually related terms, or
partially related terms. The following table shows the concept
relations retrieved from UMLS for the query "breast cancer".
Table 1- Concept relation list for the query term "breast cancer"

Concept obtained from UMLS      Relation
Breast Carcinoma                SYN
Cancer of Breast                SYN
Mammary Carcinoma               SYN
Carcinoma of Breast             SYN
Malignant Neoplasm of Breast    PAR
Malignant Tumor of Breast       CTX
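The expansion step can be sketched as follows. The lookup table below is a hard-coded stand-in for the UMLS database queries the concept generator performs, and expand_query is a hypothetical helper, not the system's actual interface; the relation labels follow Table 1.

```python
# Hard-coded stand-in for the UMLS concept-relation lookup (see Table 1).
CONCEPT_RELATIONS = {
    "breast cancer": [
        ("Breast Carcinoma", "SYN"),
        ("Cancer of Breast", "SYN"),
        ("Mammary Carcinoma", "SYN"),
        ("Carcinoma of Breast", "SYN"),
        ("Malignant Neoplasm of Breast", "PAR"),
        ("Malignant Tumor of Breast", "CTX"),
    ],
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus every related concept found for it."""
    related = [concept for concept, _rel in CONCEPT_RELATIONS.get(query.lower(), [])]
    return [query] + related

expanded = expand_query("Breast Cancer")
print(len(expanded))  # original query plus six related concepts
```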
3.4 SEARCHING MODULE
The search for relevant documents is performed on a collection
of medical documents harvested by a topical crawler; the
purpose is to maximize the quality of the retrieved hit-list.
The module receives an expanded query and, using the following
formula, computes the similarity between the documents in the
collection and the input query.
Similarity(di, dj) = G(di, dj) × Mass(di) × Mass(dj) / Distance(di, dj)²
Given two documents di and dj, the similarity between them is
calculated analogously to the magnitude of the gravitational
force between two bodies; the idea is inspired by the
definition of the gravitational force [3].
In this formula, G refers to the degree of attractiveness,
which measures the distance between two documents based on the
medical terms in their titles and anchors only; Mass refers to
the importance of a document in the collection; Distance
refers to the cosine value between the feature vectors of the
documents, where the feature vector of a document holds the
frequencies of the medical terms appearing in its text. The
mass of a document is a scalar: the ratio of the sum of the
medical-term frequencies to the frequencies of all terms in
the document.1 Through this feature we give professional
documents, or documents with frequent medical terms, a higher
score than non-professional ones or documents with few medical
terms.
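A minimal sketch of this similarity computation follows, assuming tokenized documents and a given set of medical terms. The function names are illustrative, and G is left as a caller-supplied constant, since the title- and anchor-based attractiveness computation is not modelled here; Distance is taken verbatim as the cosine value, as described above.

```python
import math

def mass(terms: list[str], medical_terms: set[str]) -> float:
    """Ratio of medical-term occurrences to all term occurrences."""
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in medical_terms) / len(terms)

def cosine(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def similarity(di: list[str], dj: list[str],
               medical_terms: set[str], g: float = 1.0) -> float:
    """Similarity(di, dj) = G * Mass(di) * Mass(dj) / Distance(di, dj)^2."""
    freq_i = {t: di.count(t) for t in di if t in medical_terms}
    freq_j = {t: dj.count(t) for t in dj if t in medical_terms}
    distance = cosine(freq_i, freq_j)
    if distance == 0.0:
        return 0.0  # no shared medical terms: no attraction
    return g * mass(di, medical_terms) * mass(dj, medical_terms) / distance ** 2

medical = {"cancer", "tumor"}
di = ["cancer", "tumor", "screening"]
dj = ["cancer", "patient"]
print(round(similarity(di, dj, medical), 4))
```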
3.5 DOCUMENT NETWORK AND RANKING
A document network is a network whose members are documents
and whose links represent the similarity among them. The
system computes the similarity between the retrieved documents
with respect to the user query and creates a network of links
between the documents based on the similarity scores. The
similarity between a query q and a document di is defined in
the same manner as the similarity between two documents.
Similarity(di, q) = G(di, q) × Mass(di) × Mass(q) / Distance(di, q)²
The system then ranks the retrieved documents according to
their position in the network. Figure 2 shows part of a
document network created during the experiments, in which
circles represent documents while squares show the similarity
percentage between documents.
1 Unlike the weirdness factor defined in [21], which is
directed at measuring the differences in the distribution of a
specific term between domain-specific and generic text, a
document's mass measures the weight of domain terms in a
specific document, not how the distribution of these terms
affects the term weighting.
Figure 2- Document Network Created
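The paper does not specify the exact position-based score, so the sketch below ranks each retrieved document by the sum of the similarity weights on its links, its total "attraction" in the document network; this is one plausible reading of ranking by network position, and rank_by_network is a hypothetical helper name.

```python
# Rank documents by their total link weight in the document network.
def rank_by_network(edges: dict[tuple[str, str], float]) -> list[str]:
    """edges maps a (doc_a, doc_b) pair to its similarity score."""
    strength: dict[str, float] = {}
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0.0) + weight
        strength[b] = strength.get(b, 0.0) + weight
    # Strongest-connected documents first.
    return sorted(strength, key=strength.get, reverse=True)

network = {("d1", "d2"): 0.9, ("d1", "d3"): 0.4, ("d2", "d3"): 0.2}
print(rank_by_network(network))  # → ['d1', 'd2', 'd3']
```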
4. RETRIEVAL PERFORMANCE
In TREC-9 [6], 53 runs were submitted, providing evaluation
datasets on the OHSUMED collection for different retrieval
methods over 63 queries. In this context, we report the
results of the top 5 runs submitted by TREC-9 participants,
together with the evaluation of our search engine on the same
dataset. Table 2 shows the retrieval performance in terms of
the precision metric: the average precision of every run and
the precision at N retrieved documents. Note that we obtained
the results in Table 2 by running the trec_eval program on the
datasets listed in the restricted area of the TREC-9 server.
In this table, the last row, marked MIR, shows the performance
of our tool.
Table 2- Retrieval performance on the TREC Filtering Track (top 5 runs)

Run ID   Average Precision  P@5    P@10   P@15    P@500   P@1000
CMUDIR   0.2016             1      0.8    0.8667  0.5180  0.5010
Mer9     0.2131             0.6    0.8    0.6     0.33    0.275
Ok9      0.3538             1      1      0.9333  0.5940  0.5140
KUN      0.3640             1      1      0.9333  0.7140  0.5960
S2RN     0.4629             1      1      1       0.6560  0.5610
MIR      0.5772             1.000  0.800  0.800   0.750   0.690
The results in Table 2 show that MIR outperforms the other
runs, achieving a higher average precision. Furthermore, its
precision at large N (i.e., P@500 and P@1000) is higher than
that of the other runs. The high average precision of our
search engine results from the semantic feature of detecting
anchor terms. This feature guarantees that relevant documents
are detected even if the query terms appear infrequently in
the text, since the degree-of-attractiveness parameter and the
frequency of index terms are independent.
The precision at N documents (P@N) indicates the ability of
our tool to rank relevant documents in the top N hits. We
achieved relatively high precision at both small and large N.
The ranking technique, which ranks documents according to
their topics through the document network, plays a significant
role in achieving this result.
R-precision is a single-value summary of the ranking, computed
as the precision at the R-th position in the ranking, where R
is the total number of relevant documents for the current
query [4]. Table 3 shows the overall average R-precision
reported by the trec_eval program.
Table 3- R-precision values

Run ID     R-Precision
pircT9U2   0.2544
KUNa2T9U   0.2887
KUNb       0.2712
Mer9r1     0.2228
KUNr2      0.3477
S2RNr2     0.4039
MIR        0.5874
The results shown in Table 3 support our previous conclusion
about the ranking task: MIR achieved a higher R-precision than
the other runs. The attractiveness and document-mass
parameters guarantee that relevant documents are assigned a
higher rank.
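The two precision measures used above can be sketched as follows; the helper names are illustrative (trec_eval computes these over full runs and averages across queries), with ranking as the hit-list in rank order and relevant as the judged-relevant set for one query.

```python
def precision_at(ranking: list[str], relevant: set[str], n: int) -> float:
    """P@N: fraction of the top-n ranked documents that are relevant."""
    top = ranking[:n]
    return sum(1 for d in top if d in relevant) / n

def r_precision(ranking: list[str], relevant: set[str]) -> float:
    """Precision at position R, where R is the number of relevant docs."""
    return precision_at(ranking, relevant, len(relevant))

ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
print(precision_at(ranking, relevant, 2))  # → 0.5
print(r_precision(ranking, relevant))
```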
Finally, Figure 3 shows the recall/precision curves of all
runs. Our technique achieves higher precision across all 11
standard recall levels, which implies that our tool performs
better than the other runs at every recall level.
Figure 3- Recall-precision curves at the 11 standard recall levels for CMUDIR, mer9, ok9, KUN, S2RN, and MIR
5. DISCUSSION
Our experimental results showed that our technique performs
effectively in terms of recall, precision, and topical
ranking. We achieved high average precision and high precision
at different recall levels. The following table shows the
improvement we achieved over the different TREC-9 runs.
Table 4- Improvement achieved by the proposed search engine over the top TREC runs

Run ID   Average Precision  Improvement  R-Precision  Improvement
CMUDIR   0.2016             +37.56%      0.2544       +33.30%
Mer9     0.2131             +36.41%      0.2887       +29.87%
Ok9      0.3538             +22.34%      0.2712       +31.62%
KUN      0.3640             +21.32%      0.2228       +36.46%
S2RN     0.4629             +11.43%      0.3477       +23.97%
There are some limitations to be considered in this context:
1. The semantic parameters in the similarity formulation are
restricted to the title and address. We believe that including
more medical-specialized parameters, such as medical grammars,
would increase the performance and effectiveness of our
technique.
2. During the experiments, we observed that we perform better
with long text in terms of precision and ranking. Although our
technique shows good results on a short-text collection
(OHSUMED), we believe the tool can perform even better, in
terms of precision, on collections of longer documents.
In the future, we intend to develop a technique for tracking
user behaviour; this technique will benefit from the structure
of the document network created in this work. Moreover, we
plan to measure, as in [7], the impact of cognitive biases on
the searching task and on relevance rankings. Debiasing
strategies, such as a question-answering user interface, might
be applied to reduce such biases, which would in turn enhance
the overall performance of the proposed system.
6. CONCLUSION
In this study, we have developed a search engine, based on
creating a semantic document network, for retrieving medical
and health information for both medical and non-medical users.
We measured the effectiveness of the technique in terms of
recall and ranking.
In our experiment on the TREC collection, we achieved higher
average precision and R-precision compared with the top 5 runs
in TREC-9. The proposed model is built on top of the vector
model, as it represents documents using vectors, but it
includes additional semantic features directed at the medical
domain. These features are evaluated with the assistance of
medical-domain semantic relations.
These results indicate that the proposed technique is
effective and a good alternative to classical techniques for
retrieving and ranking medical and health information.
REFERENCES
1. Aysu Betin Can and Nazife Baykal. MedicoPort: A medical
search engine for all. Computer Methods and Programs in
Biomedicine (2007) 86:73-86.
2. Meliha Yetisgen-Yildiz and Wanda Pratt. The Effect of
Feature Representation on MEDLINE Document Classification.
AMIA 2005 Symposium Proceedings.
3. Isaac Newton. Philosophiæ Naturalis Principia
Mathematica (1687).
4. Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern
Information Retrieval. ACM press, New York, 1999.
5. Khurshid Ahmad, Lee Gillam, and Lena Tostevin.
Weirdness Indexing for Logical Document Extrapolation
and Retrieval (WILDER).
6. Robertson S and Hull DA (2000). The TREC-9 filtering
track final report. In: Voorhees EM and Harman DK, Eds.,
Proceedings of the Ninth Text REtrieval Conference (TREC-9).
Department of Commerce, National Institute of Standards and
Technology, pp. 25-40.
7. Annie Y.S. Lau and Enrico W. Coiera. Can Cognitive Biases
during Consumer Health Information Searches Be Reduced to
Improve Decision Making? J Am Med Inform Assoc.
(2009)16:54-65. DOI 10.1197/jamia.M2557.
8. Xiubo Geng, Tie-Yan Liu, Tao Qin, and Hang Li. Feature
selection for ranking. Proceedings of the 30th annual
international ACM SIGIR conference on Research and
development in information retrieval (2007) 407-441.
9. Monica Chagoyen, Pedro Carmona-Saez, Hagit Shatkay,
Jose M Caraz and Alberto Pascual-Montano. Discovering
semantic features in the literature: a foundation for
building functional associations. BMC Bioinformatics
2006, 7:41. doi:10.1186/1471-2105-7-41.
10. Dumais, S. T., Platt, J., Heckerman, D., and Sahami, M.
Inductive learning algorithms for text categorization. In
Proceedings of CIKM. 1998. Bethesda, MD.
11. Lewis, D. D. An Evaluation of Phrasal and Clustered
Representations on a Text Categorization Task. In
Proceedings of SIGIR. 1992.
12. Mao, W., and Chu, W.W. Free-text Medical Document
Retrieval via Phrase-based Vector Space Model. In
Proceedings of AMIA. 2002.
13. Boyack, Kevin W., Mane, Ketan and Börner, Katy.
Mapping Medline Papers, Genes, and Proteins Related to
Melanoma Research. IV2004 Conference, London, UK (2004)
965-971.
14. Alan L. Rector and Sebastian Brandt. Why Do It the
Hard Way? The Case for an Expressive Description
Logic for SNOMED. J Am Med Inform Assoc. (2008)
15:744–751. DOI 10.1197/jamia.M2797.
15. OMNI Medical Search. Available from:
www.omnimedicalsearch.com
16. Health On the Net foundation. Available from:
http://www.hon.ch/MedHunt/
17. Yongjing Lin, Wenyuan Li, Keke Chen, Ying Liu. A
Document Clustering and Ranking System for Exploring
MEDLINE Citations. J Am Med Inform Assoc.
(2007)14:651– 661. DOI 10.1197/jamia.M2215.
18. Zhiyong Lu, Won Kim, W. John Wilbur. Evaluating
Relevance Ranking Strategies for MEDLINE Retrieval.
J Am Med Inform Assoc. (2009)16:32–36. DOI
10.1197/jamia.M2935.
19. English Stopword List. Available at:
ftp://ftp.cs.cornell.edu/pub/smart/english.stop
20. National Library of Medicine. Unified Medical Language
System Fact Sheet. Available at:
http://www.nlm.nih.gov/pubs/factsheets/umls.html
21. Khurshid Ahmad, Lee Gillam, and Lena Tostevin.
Weirdness Indexing for Logical Document Extrapolation
and Retrieval (WILDER).