
[IEEE 2010 5th International Symposium on Health Informatics and Bioinformatics - Ankara, Turkey (2010.04.20-2010.04.22)] 2010 5th International Symposium on Health Informatics and



978-1-4244-5969-8/10/$26.00 ©2009 IEEE
April 20-22, 2010, Antalya, Turkey

TOWARD EFFECTIVE MEDICAL SEARCH ENGINES

Mohammed AL Zamil, Aysu Betin Can

Informatics Institute, Middle East Technical University

İnönü Bulvarı, 06531, Ankara/Turkey

phone: + (+90) 5388506808, email: [email protected]

ABSTRACT

In this paper, we present a domain-specific search engine that relies on extracting the semantic relations among medical documents. Our goal is to maximize contextual retrieval and ranking performance with minimum input from users. We have performed experiments to measure the effectiveness of the proposed technique by evaluating the performance of the retrieval process in terms of recall, precision and topical ranking. The results indicate that the proposed medical search engine achieves higher average precision compared with the highest-scored runs submitted to TREC-9.

1. INTRODUCTION

Many medical information retrieval systems assume that their users are medical experts. Today, however, ordinary people increasingly want to be informed enough to participate in decisions about their own health. To this end, such non-medical users search the Internet and try to dig information out, which leads to a growing number of medical inquiries on the Internet by users with no medical training [1].

Traditional information retrieval features, such as terms and phrases, have been widely used to find the similarity between documents and queries. Recent research [2] shows that the application of combined semantic features improves the retrieval process in domain-specific information retrieval systems.

In this research, we propose an effective search engine that creates document networks correlating medical documents according to the semantic relations among them. Through this document network we address problems faced by non-medical users, such as shrinking the hit-list while retaining the maximum number of relevant documents, and ranking documents according to their topic. We have performed experiments to measure the effectiveness of our search engine by evaluating the performance of the retrieval process in terms of recall, precision and topical ranking.

2. RELATED WORK

Modelling the similarity among text entities is a promising area that can affect the overall performance of information retrieval systems. Rather than focusing on algorithm performance, recent research concentrates on representing information using different features in order to improve the retrieval process [8, 9].

In the literature, the common approach is to represent documents as a collection of all individual terms, often referred to as the bag-of-words representation. Several studies [10, 11] show that sophisticated feature representations do not improve the effectiveness of the retrieval process in general-purpose retrieval systems but provide a significant enhancement on domain-specific text.

In the medical domain, experiments in [12, 13] showed that using medical terms and medical phrases resulted in better information retrieval performance compared with the traditional bag-of-words. In this research, we use medical terms and the relationships among them defined in UMLS [20]. In addition, we propose the application of more than one domain feature as a similarity measurement.

Although medical websites and portals such as SNOMED [14], OMNI [15] and MedHunt [16] offer a useful starting point for medical information, these tools do not provide consistent responses for related medical topics. For instance, OMNI distinguishes between the queries "Breast cancer" and "Carcinoma of Breast", although these queries are synonymous in the medical context. In contrast, our tool relies on semantic enrichment, which enhances the retrieval and ranking tasks [17, 18], to extract such relations and thereby reduce search bias.

3. SYSTEM DESIGN

In this section, we describe the basic modules of the system. The design consists of seven modules that are responsible for receiving a user query as input, performing text operations, expanding the query with related medical concepts, searching the inverted file for relevant documents, and ranking the hit-list.


Figure 1- System Design

3.1 TEXT OPERATIONS

In order to reduce the set of representative words, the system eliminates stop-words using the Princeton English stop-word list [19]. Furthermore, the stemming module is used to reduce distinct words to their common grammatical root. Note that the inverted index was cleaned of stop-words during construction and contains both terms and their stems.
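As a rough sketch of this step, stop-word removal followed by suffix stemming might look like the following; the stop-word list and the stemming rules here are illustrative stand-ins for the Princeton list [19] and the system's actual stemmer:

```python
# Illustrative text-operations step: stop-word removal, then a crude
# suffix-stripping stemmer. Both the stop-word set and the suffix rules
# are toy stand-ins, not the system's real resources.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}

def stem(word: str) -> str:
    """Reduce a word to a rough grammatical root by stripping suffixes."""
    for suffix in ("ing", "ies", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list:
    """Lowercase, drop stop-words, and stem the remaining tokens."""
    tokens = [t.lower() for t in text.split()]
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("the tumors are spreading in the breast tissue"))
```

The same preprocessing is applied both to documents at indexing time and to queries at search time, so that query terms match the stems stored in the inverted index.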

3.2 QUERY EXPANSION

To facilitate the searching process for non-medical users, the query expansion module is responsible for the automatic reformulation and expansion of a query with domain concepts. The goal is to enable the search engine to retrieve relevant documents with little information provided by the user and to minimize the effect of the language ambiguity encountered by non-medical users.

3.3 CONCEPT GENERATOR

The concept generator module consists of a set of functions that contact the UMLS (Unified Medical Language System) database [20] in order to retrieve specific information about the query terms, such as the type of each term (medical or non-medical), synonyms, contextually related terms, or partially related terms. Table 1 shows the concept relations retrieved from UMLS for the query "breast cancer".

Table 1 - Concept relation list for the query term "breast cancer"

Concept obtained from UMLS      Relation
Breast Carcinoma                SYN
Cancer of Breast                SYN
Mammary Carcinoma               SYN
Carcinoma of Breast             SYN
Malignant Neoplasm of Breast    PAR
Malignant Tumor of Breast       CTX
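Combining the two modules, query expansion driven by the concept generator could be sketched as follows; the relation table below hard-codes the tuples of Table 1, whereas the real system retrieves them from the UMLS database:

```python
# Sketch of concept-driven query expansion. RELATIONS stands in for the
# concept generator's UMLS lookup; only the "breast cancer" entry from
# Table 1 is included here.
RELATIONS = {
    "breast cancer": [
        ("breast carcinoma", "SYN"),
        ("cancer of breast", "SYN"),
        ("mammary carcinoma", "SYN"),
        ("carcinoma of breast", "SYN"),
        ("malignant neoplasm of breast", "PAR"),
        ("malignant tumor of breast", "CTX"),
    ],
}

def expand_query(query: str, relations=RELATIONS) -> list:
    """Reformulate a query as the original term plus its related concepts."""
    expanded = [query]
    for concept, _relation in relations.get(query.lower(), []):
        expanded.append(concept)
    return expanded

print(expand_query("breast cancer"))  # the original query plus six concepts
```

Because expansion happens automatically, a user who types only "breast cancer" also matches documents that use "mammary carcinoma" or "malignant neoplasm of breast".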

3.4 SEARCHING MODULE

The search for relevant documents is performed on a collection of medical documents that have been harvested by a topical crawler; the purpose is to maximize the quality of the retrieved hit-list. The module receives an expanded query and, using the following formula, computes the similarity between the documents in the collection and the input query.

Similarity(d_i, d_j) = G(d_i, d_j) × Mass(d_i) × Mass(d_j) / Distance(d_i, d_j)²

Given two documents d_i and d_j, the similarity between them is calculated analogously to the magnitude of the gravitational force between two bodies; the idea is inspired by the definition of gravitational force between bodies [3].

In this formula, G refers to the degree of attractiveness, which measures the distance between two documents based on the medical terms in their titles and anchors only; Mass refers to the importance of a document in the collection; Distance refers to the cosine value between the feature vectors of the documents, where the feature vector of a document holds the frequencies of the medical terms appearing in its text. The mass of a document is a scalar: the ratio of the sum of medical-term frequencies to the frequencies of all terms in the document.1 Through this feature we give professional documents, or documents with frequent medical terms, a higher score than non-professional documents with few medical terms.
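A minimal sketch of this score is given below. The medical-term list, the sample documents, and the value of G are illustrative stand-ins, and reading the paper's "cosine value" as a cosine distance (one minus the cosine) is an assumption of this sketch:

```python
import math

# Sketch of the gravitational similarity score. MEDICAL_TERMS, the sample
# documents, and g are illustrative; the paper derives G from title and
# anchor terms, which is not reproduced here.
MEDICAL_TERMS = {"carcinoma", "neoplasm", "tumor", "mammary"}

def mass(term_freqs):
    """Document mass: ratio of medical-term frequencies to the
    frequencies of all terms in the document."""
    total = sum(term_freqs.values())
    medical = sum(f for t, f in term_freqs.items() if t in MEDICAL_TERMS)
    return medical / total if total else 0.0

def medical_vector(term_freqs):
    """Feature vector holding only the medical-term frequencies."""
    return {t: f for t, f in term_freqs.items() if t in MEDICAL_TERMS}

def cosine_distance(u, v):
    """One minus the cosine of the feature vectors (an assumption: the
    paper says 'cosine value' without specifying the distance form)."""
    dot = sum(f * v.get(t, 0) for t, f in u.items())
    nu = math.sqrt(sum(f * f for f in u.values()))
    nv = math.sqrt(sum(f * f for f in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def similarity(doc_i, doc_j, g=1.0):
    """Similarity = G * Mass_i * Mass_j / Distance^2."""
    d = cosine_distance(medical_vector(doc_i), medical_vector(doc_j))
    return float("inf") if d == 0 else g * mass(doc_i) * mass(doc_j) / (d * d)

d1 = {"carcinoma": 3, "tumor": 2, "patient": 5}
d2 = {"carcinoma": 2, "neoplasm": 1, "study": 4}
print(round(similarity(d1, d2), 3))
```

Note how the two factors interact: documents dense in medical terms get large masses, and documents whose medical-term profiles overlap get a small distance, so professional documents on the same topic attract each other strongly.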

3.5 DOCUMENT NETWORK AND RANKING

A document network is a network whose members are documents and whose links represent the similarity among them. The system computes the similarity between the retrieved documents with respect to the user query and creates a network of links between the documents based on the similarity scores. The similarity between a query q and a document d_i is defined in the same manner as the similarity between two documents.

Similarity(d_i, q) = G(d_i, q) × Mass(d_i) × Mass(q) / Distance(d_i, q)²

The system then ranks the retrieved documents according to their position in the network. Figure 2 shows part of a document network created during the experiments, in which circles represent documents while squares show the similarity percentage between documents.

1 Unlike the weirdness factor defined in [21], which measures the differences in the distribution of a specific term between domain-specific and generic text, a document's mass measures the weight of domain terms in a specific document, not how the distribution of these terms affects term weighting.

Figure 2- Document Network Created
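The paper does not give a closed form for "position in the network". One plausible reading, used purely as an illustration here, is to rank each document by the total weight of its links, a degree-centrality heuristic:

```python
# Illustrative ranking by network position: a document's rank is driven
# by the summed weights of its similarity links (a degree-centrality
# heuristic, assumed here; the paper leaves the exact criterion open).
def rank_documents(similarities):
    """similarities maps (doc_a, doc_b) pairs to similarity scores.
    Returns document ids ordered by total link weight, strongest first."""
    strength = {}
    for (a, b), score in similarities.items():
        strength[a] = strength.get(a, 0.0) + score
        strength[b] = strength.get(b, 0.0) + score
    return sorted(strength, key=strength.get, reverse=True)

net = {("d1", "d2"): 0.9, ("d1", "d3"): 0.4, ("d2", "d3"): 0.1}
print(rank_documents(net))  # d1 has the strongest links, so it ranks first
```

Under this reading, a document that is strongly similar to many other retrieved documents sits near the centre of its topic cluster and rises to the top of the hit-list.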

4. RETRIEVAL PERFORMANCE

In TREC-9 [6], 53 runs were submitted, providing evaluation datasets on the OHSUMED collection for different retrieval methods over 63 queries. In this context, we report the results of the 5 top-scoring runs submitted by TREC-9 participants, together with the evaluation of our search engine on the same dataset. Table 2 shows the retrieval performance in terms of the precision metric: the average precision of each run and the precision at N retrieved documents. Note that we obtained the results in Table 2 by running the trec_eval program on the datasets listed in the restricted area of the TREC-9 server. In this table, the last row, marked MIR, shows the performance of our tool.


Table 2 - Retrieval performance on the TREC-9 Filtering Track (top 5 runs)

Run ID   Average Precision  P@5    P@10   P@15    P@500   P@1000
CMUDIR   0.2016             1.000  0.800  0.8667  0.5180  0.5010
Mer9     0.2131             0.600  0.800  0.6000  0.3300  0.2750
Ok9      0.3538             1.000  1.000  0.9333  0.5940  0.5140
KUN      0.3640             1.000  1.000  0.9333  0.7140  0.5960
S2RN     0.4629             1.000  1.000  1.0000  0.6560  0.5610
MIR      0.5772             1.000  0.800  0.8000  0.7500  0.6900

The results in Table 2 show that MIR outperforms the other runs, achieving higher average precision. Furthermore, its precision at large N (i.e., P@500 and P@1000) is higher than that of the other runs. The high average precision of our search engine results from applying the semantic feature of detecting anchor terms. This feature guarantees the detection of relevant documents even if the query terms appear infrequently in the text, since the degree-of-attractiveness parameter and the frequency of index terms are independent.

The precision at N retrieved documents (P@N) indicates the ability of our tool to rank relevant documents in the top N hits. We achieved relatively high precision at both small and large N. The ranking technique, which ranks documents according to their topics through the document network, plays a significant role in achieving this result.

R-precision is a single-value summary of the ranking, computed as the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query [4]. Table 3 shows the overall average R-precision reported by the trec_eval program.

Table 3 - R-precision values

Run ID     R-Precision
pircT9U2   0.2544
KUNa2T9U   0.2887
KUNb       0.2712
Mer9r1     0.2228
KUNr2      0.3477
S2RNr2     0.4039
MIR        0.5874

The results shown in Table 3 support our earlier conclusion about the ranking task: MIR achieved high R-precision compared with the other runs. The attractiveness and document-mass parameters guarantee that relevant documents are assigned a higher rank.
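The P@N and R-precision measures used in these tables can be sketched as follows; the ranking and relevance judgements below are made-up examples, not TREC data:

```python
# Sketch of the evaluation metrics in Tables 2 and 3: precision at N and
# R-precision. `ranking` is a ranked list of document ids; `relevant` is
# the set of documents judged relevant for the query.
def precision_at(ranking, relevant, n):
    """Fraction of the top-n ranked documents that are relevant."""
    hits = sum(1 for d in ranking[:n] if d in relevant)
    return hits / n

def r_precision(ranking, relevant):
    """Precision at position R, where R is the number of relevant docs."""
    return precision_at(ranking, relevant, len(relevant))

ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d3"}
print(precision_at(ranking, relevant, 5))  # 3 of the top 5 are relevant
print(round(r_precision(ranking, relevant), 3))  # precision at R = 3
```

trec_eval reports these values averaged over all queries; the single-query versions above convey the definitions.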

Finally, Figure 3 shows the recall-precision curves of all runs. Our technique achieves higher precision at all 11 standard recall levels, which implies that our tool performs better than the other runs across different recall levels.

[Figure: recall-precision curves for the CMUDIR, mer9, ok9, KUN, S2RN and MIR runs over the 11 standard recall levels.]

Figure 3-Recall-Precision Curve


5. DISCUSSION

Our experimental results showed that our technique provides effective performance in terms of recall, precision and topical ranking: we achieved high average precision and high precision at different recall levels. Table 4 shows the improvement we achieved over the different TREC-9 runs.

Table 4 - Improvement achieved by the proposed search engine over the top TREC-9 runs

Run ID   Average Precision  Improvement  R-Precision  Improvement
CMUDIR   0.2016             +37.56%      0.2544       +33.30%
Mer9     0.2131             +36.41%      0.2887       +29.87%
Ok9      0.3538             +22.34%      0.2712       +31.62%
KUN      0.3640             +21.32%      0.2228       +36.46%
S2RN     0.4629             +11.43%      0.3477       +23.97%

There are some limitations to be considered in this context:

1. The semantic parameters in the similarity formulation are restricted to the title and address. We believe that including more specialized medical parameters, such as medical grammars, would increase the performance and effectiveness of our technique.

2. During the experiments, we observed that we perform better on long text in terms of precision and ranking. Although our technique shows good results on a short-text collection (OHSUMED), we believe the tool can perform even better on long-text collections in terms of precision.

In the future, we intend to develop a technique for tracking user behaviour; this technique will benefit from the structure of the document network created in this work. Moreover, we plan to measure, similarly to [7], the impact of cognitive biases on the searching task and on relevance rankings. Debiasing strategies, such as a question-answering user interface, might be applied to reduce such biases and, in turn, enhance the overall performance of the proposed system.

6. CONCLUSION

In this study, we have developed a search engine, based on creating a semantic document network, for retrieving medical and health information for both medical and non-medical users. We measured the effectiveness of the technique in terms of recall and ranking.

In our experiment on the TREC collection, we achieved higher average precision and R-precision compared with the top 5 runs in TREC-9. The proposed model is built on top of the vector model, as it represents documents using vectors, but it includes additional semantic features directed at the medical domain. These features are evaluated with the assistance of medical-domain semantic relations. These results indicate that the proposed technique is effective and a good alternative to classical techniques for retrieving and ranking medical and health information.

REFERENCES

1. Aysu Betin Can, Nazife Baykal. MedicoPort: A medical search engine for all. Computer Methods and Programs in Biomedicine 86 (2007) 73-86.

2. Meliha Yetisgen-Yildiz and Wanda Pratt. The Effect of Feature Representation on MEDLINE Document Classification. AMIA 2005 Symposium Proceedings.

3. Isaac Newton. Philosophiæ Naturalis Principia Mathematica (1687).

4. Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern

Information Retrieval. ACM press, New York, 1999.

5. Khurshid Ahmad, Lee Gillam, and Lena Tostevin.

Weirdness Indexing for Logical Document Extrapolation

and Retrieval (WILDER).

6. Robertson S and Hull DA (2000) The TREC-9 filtering

track final report. In: Voorhees EM and Harman DK,

Eds., Proceedings of the Ninth Text REtrieval Confer-

ence (TREC-9). Department of Commerce, National In-

stitute of Standards and Technology, pp. 25-40.

7. Annie Y.S. Lau and Enrico W. Coiera. Can Cognitive Biases during Consumer Health Information Searches Be Reduced to Improve Decision Making? J Am Med Inform Assoc. (2009) 16:54-65. DOI 10.1197/jamia.M2557.

8. Xiubo Geng, Tie-Yan Liu, Tao Qin, and Hang Li. Feature selection for ranking. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2007) 407-441.

9. Monica Chagoyen, Pedro Carmona-Saez, Hagit Shatkay,

Jose M Caraz and Alberto Pascual-Montano. Discover-

ing semantic features in the literature: a foundation for

building functional associations. BMC Bioinformatics

2006, 7:41 doi:10.1186/1471-2105-7-41.

10. Dumais, S. T., Platt, J., Heckerman, D., and Sahami, M.

Inductive learning algorithms for text categorization. In

Proceedings of CIKM. 1998. Bethesda, MD.

11. Lewis, D. D. An Evaluation of Phrasal and Clustered

Representations on a Text Categorization Task. In Pro-

ceedings of SIGIR. 1992.

12. Mao, W., and Chu., W.W. Free-text Medical Document

Retrieval via Phrase-based Vector Space Model. In Pro-

ceedings of AMIA. 2002.

13. Boyack, Kevin W., Mane, Ketan and Börner, Katy, edi-

tors. Mapping Medline Papers, Genes, and Proteins Re-

lated to Melanoma Research. IV2004 Conference, Lon-

don, UK (2004) 965-971.

14. Alan L. Rector and Sebastian Brandt. Why Do It the

Hard Way? The Case for an Expressive Description

Logic for SNOMED. J Am Med Inform Assoc. (2008)

15:744–751. DOI 10.1197/jamia.M2797.

15. OMNI Medical Search. Available from:

www.omnimedicalsearch.com

16. Health On the Net foundation. Available from:

http://www.hon.ch/MedHunt/

17. Yongjing Lin, Wenyuan Li, Keke Chen, Ying Liu. A

Document Clustering and Ranking System for Exploring

MEDLINE Citations. J Am Med Inform Assoc.

(2007)14:651– 661. DOI 10.1197/jamia.M2215.

18. Zhiyong Lu, Won Kim, W. John Wilbur. Evaluating

Relevance Ranking Strategies for MEDLINE Retrieval.

J Am Med Inform Assoc. (2009)16:32–36. DOI

10.1197/jamia.M2935.

19. English Stopword List. Available at:

ftp://ftp.cs.cornell.edu/pub/smart/english.stop

20. National Library of Medicine. Unified Medical Lan-

guage System Fact Sheet. Available at:

http://www.nlm.nih.gov/pubs/factsheets/umls.html

21. Khurshid Ahmad, Lee Gillam, and Lena Tostevin.

Weirdness Indexing for Logical Document Extrapolation

and Retrieval (WILDER).