April 7, 2016 15:59 WSPC/INSTRUCTION FILE ws-ijseke
International Journal of Software Engineering and Knowledge Engineering
© World Scientific Publishing Company
A QUESTION-ANSWER BASED HEURISTIC APPROACH TO
CONSTRUCT DUCG KNOWLEDGE BASE FOR CLINICAL
INTELLIGENT DISEASE DIAGNOSES
SHICHAO GENG
School of Computer Science and Engineering, Beihang University, 37 XueYuan Rd., Beijing 100191, China
QIN ZHANG
School of Computer Science and Engineering, Beihang University, 37 XueYuan Rd., Beijing 100191, China
Received (Day Month Year)
Revised (Day Month Year)
Accepted (Day Month Year)
Clinical intelligent diagnostic decision support systems have been a research hotspot for a long time. As one of the alternative models, the Dynamic Uncertain Causality Graph
(DUCG) has recently been presented and applied to diseases such as vertigo and jaundice, and the results so far have been very good. However, to apply DUCG successfully in real practice, it is critical to model clinical experts' knowledge correctly in the DUCG language. So that clinical experts can easily translate their medical knowledge into a DUCG knowledge
base, we propose a question-answer based heuristic approach that guides clinical experts to uncover the causal relationships between diseases and symptoms, between diseases and
complications, and between complications and symptoms, and thereby obtain all the
influences between various diseases and symptoms. As a case study, a knowledge base for “gastrointestinal-thyroid” diseases is elicited with the approach and verified in
our experiment. The results illustrate the feasibility and rationality of this heuristic approach and show that it is effective and convenient to use.
Keywords: Heuristic; knowledge representation; clinical decision support system; causality.
1. Introduction
The applications of artificial intelligence in medicine, especially clinical diagnostic
decision support systems based on various artificial intelligence technologies,
have been a research hotspot for decades [1] [2]. Their main task is to identify
patients' possible diseases from their physical signs, symptoms and health history,
as well as imaging information and medical laboratory tests. With the help of
intelligent systems, clinicians can improve diagnostic accuracy and reduce missed
diagnoses and misdiagnoses [3].
Rule-based reasoning systems, such as MYCIN and INTERNIST-1 [4], were among the
earliest systems applied to clinical diagnosis and are still widely used today [5].
Case-based reasoning medical expert systems began to be researched and applied in
the 1980s and have developed rapidly [6]; many successful medical applications,
for example in psychiatry and epidemiology, have emerged one after another [7] [8].
Since fuzzy logic can handle imprecise and uncertain medical knowledge, it has also
been widely applied in medical expert systems [9]. Similarly, neural networks have
been applied successfully to medical diagnosis [10]. Many other intelligent systems
have been applied to the diagnosis of skin disease, myocardial infarction, dyspepsia,
gastritis, cancer, etc., as well as to treatment decisions [11]. Among these
intelligent systems, the Bayesian network is well known as a probabilistic graphical
model with a solid theoretical foundation, and has therefore been used frequently
in the diagnosis of diseases such as cancer, heart disease, lumbago and infectious
disease [12].
As a new model, DUCG has recently been presented for causal knowledge representation
and probabilistic reasoning [13] [14] [15] [16] [17]. It can represent complex
causal relationships among various event variables intuitively and explicitly, in a
way that matches the knowledge structure of domain experts, so domain experts find it
easy to abstract their knowledge into DUCG and to understand it. Its probabilistic
inference is effective and efficient, and its results are explainable, which is
important for real applications. DUCG can also handle dynamic cases, logic gates
capable of representing any logical relation, uncertain evidence, incomplete knowledge
representations and directed cyclic graphs (DCGs). In clinical diagnosis, DUCG
can handle multiple co-existing diseases, of which one usually dominates, and can
tolerate spurious symptoms. DUCG has initially been applied to the diagnosis of
vertigo, where the diagnostic accordance rate reached 88.3% [18].
The DUCG-based clinical diagnosis system, like many other existing systems, is a
knowledge-based system, and one of its key issues is acquiring knowledge and
constructing the knowledge base in the specific model. The difficulties in knowledge
acquisition and knowledge-base construction are as follows: (1) Many doctors have
never systematically organized their experience and knowledge; they can use
experience and intuition to make the right diagnosis, but find it difficult to
describe the diagnostic process. (2) Medical knowledge is complex and uncertain,
so symptoms, signs and other variables are easily omitted during construction;
moreover, as the knowledge base grows, doctors easily forget the variables and
causal relationships they have already established. (3) Doctors cannot afford to
spend much time learning a highly abstract reasoning model, and the model's way of
reasoning and representation cannot be fully consistent with the way doctors
diagnose.
Therefore, when DUCG is applied to clinical diagnosis, translating the expert
knowledge of the medical domain into a DUCG knowledge base easily and
systematically is the primary task of the DUCG-based clinical
diagnosis system. This paper presents a question-answer based heuristic approach
that guides domain experts in constructing clinical DUCG knowledge bases. First,
based on the characteristics of medical knowledge and of DUCG, medical variables
are divided into five categories: “disease”, “symptom”, “complication”, “pathogenic
abnormality”, and “risk factor”. Then three heuristic question-answer procedures
are proposed. “Procedure 1” elicits the direct causal relationships between
“diseases” and “complications” and between “diseases” and “symptoms”. “Procedure 2”
elicits the “complications” and “symptoms” caused by the “complications” found in
“Procedure 1”, and builds the causal relationships between them. “Procedure 3”
finds the “pathogenic abnormalities” and “risk factors” of “diseases”.
Section 2 briefly introduces the concept of DUCG. Section 3 introduces medical
concepts in terms of DUCG and proposes the question-answer based heuristic approach
for constructing the clinical diagnosis knowledge base of DUCG. Section 4 elicits a
knowledge base for “gastrointestinal-thyroid” diseases using the question-answer
based heuristic approach, and the constructed knowledge base is evaluated through
DUCG reasoning. Section 5 concludes the paper and outlines future work.
2. A Brief Introduction to DUCG
DUCG is composed of a set of nodes/variables and directed arcs connecting the nodes.
The directed arcs represent certain or uncertain causalities among nodes [13];
bidirectional arcs (two directed arcs in opposite directions) and directed cyclic
graphs (DCGs) are allowed [16]. Each node is an event variable with predefined
discrete states. Continuous variables can be fuzzily discretized into fuzzy discrete
variables and then treated like ordinary discrete variables [17]. Five types of
event variables are shown in Table 1: (1) The B-type variable, drawn as a square,
represents a root cause without any input; each B-type variable has a prior
probability distribution. (2) The X-type variable (process variable), drawn as a
circle, represents a consequence or an intermediate cause; it must have at least one
input and may or may not have outputs. (3) The D-type variable, drawn as a pentagon,
is associated with its corresponding X-type variable and represents the default
cause of that X-type variable; when all other parent variables are absent, the
D-type variable determines the probability distribution of the X-type variable.
(4) The G-type variable, drawn as a gate, represents a logic gate, in which any
logical relation among the input nodes can be specified. (5) The BX-type variable,
drawn as a double circle, represents an integrated cause and has at least one input [19].
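The structural rules for the variable types can be checked mechanically while a knowledge base is being assembled. The following Python sketch is only an illustration under our own assumptions: the function name `check_ducg`, the dictionary encoding of the graph, and the rule that a D-type variable feeds exactly one X-type variable are ours, not part of any published DUCG tool, and logic-gate (G-type) semantics are not modeled.

```python
from collections import defaultdict


def check_ducg(nodes, arcs):
    """Return rule violations for a DUCG skeleton (a sketch, not a DUCG tool).

    nodes: dict mapping variable name -> type, one of "B", "X", "D", "G", "BX"
    arcs:  list of (parent, child) directed arcs; cycles are allowed in DUCG,
           so no acyclicity check is performed.
    """
    parents = defaultdict(list)
    children = defaultdict(list)
    for parent, child in arcs:
        parents[child].append(parent)
        children[parent].append(child)

    problems = []
    for name, kind in nodes.items():
        if kind == "B" and parents[name]:
            problems.append(f"{name}: B-type root cause must have no input")
        if kind in ("X", "BX") and not parents[name]:
            problems.append(f"{name}: {kind}-type variable needs at least one input")
        if kind == "D":
            # Assumption: a default cause belongs to exactly one X-type variable.
            targets = children[name]
            if len(targets) != 1 or nodes.get(targets[0]) != "X":
                problems.append(f"{name}: D-type must feed exactly one X-type variable")
    return problems


# A tiny valid skeleton: root cause -> integrated disease -> symptom, plus a default cause.
nodes = {"B1": "B", "BX1": "BX", "X1": "X", "X2": "X", "D2": "D"}
arcs = [("B1", "BX1"), ("BX1", "X1"), ("D2", "X2")]
print(check_ducg(nodes, arcs))   # []
```

Reversing an arc, for example writing `("X1", "B1")` in a graph containing only `B1` and `X1`, would be reported twice: once as an input to a root cause and once as an X-type variable without input.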
In addition to these variables, the directed arcs are also represented as event
variables. In this paper, a directed arc is denoted as an A-type event variable
associated with a weighting factor $(r_{n;i}/r_n)$, in which $A_{n,k;i,j}$ denotes the virtual
causal event that the parent event $V_{i,j}$, $V \in \{B, X, D, G\}$, causes the consequence event
$X_{n,k}$, where the first subscript indexes the variable and the second subscript
indexes the state of the variable, and $r_{n;i} > 0$ represents the causal intensity
between $X_n$ and $V_i$, with $r_n \equiv \sum_i r_{n;i}$. Clearly, $A_{n,k;i,j}$ is an element of the event
matrix $A_{n;i}$. The directed arc can be written in text as $F_{n;i} \equiv (r_{n;i}/r_n)A_{n;i}$.
A dashed directed arc denotes a conditional $F_{n;i}$ with condition event $Z_{n;i}$.
When $Z_{n;i}$ is met, the dashed directed arc becomes solid (i.e., an ordinary $F_{n;i}$); otherwise,
it is eliminated. In the subscripts, “;” separates the child node (former) from the parent node
(latter), and “,” separates the node index (former) from the node state (latter); the “,” can be
omitted when no confusion arises.
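As a worked illustration of the weighting factor, the sketch below computes $r_{n;i}/r_n$ for every parent of a child $X_n$ using exact fractions, so the factors visibly sum to 1. The second function goes one step further and assumes the weighted-sum reading $X_n = \sum_i F_{n;i} V_i$ with independent parents, i.e. $\Pr\{X_{n,k}\} = \sum_i (r_{n;i}/r_n)\sum_j a_{n,k;i,j}\Pr\{V_{i,j}\}$, where $a_{n,k;i,j}$ is the probability of $A_{n,k;i,j}$; this formula is our assumption and should be checked against the DUCG literature [13]. The parent names and all numbers are hypothetical.

```python
from fractions import Fraction


def weighting_factors(r):
    """Map each parent i of X_n to its weighting factor r_{n;i} / r_n.

    r: dict parent name -> causal intensity r_{n;i}; every value must be > 0.
    """
    if not r or any(v <= 0 for v in r.values()):
        raise ValueError("every r_{n;i} must be > 0")
    r_n = sum(Fraction(v) for v in r.values())        # r_n = sum_i r_{n;i}
    return {i: Fraction(v) / r_n for i, v in r.items()}


def child_distribution(r, a, parent_dists):
    """Pr{X_{n,k}} for each state k, under the ASSUMED weighted-sum reading.

    a[i][k][j]      : probability of A_{n,k;i,j} (parent i in state j causes state k)
    parent_dists[i] : Pr{V_{i,j}} for each state j of parent i
    """
    w = weighting_factors(r)
    n_states = len(next(iter(a.values())))           # number of child states k
    probs = [Fraction(0)] * n_states
    for i, w_i in w.items():
        for k in range(n_states):
            probs[k] += w_i * sum(Fraction(a_kj) * Fraction(p_j)
                                  for a_kj, p_j in zip(a[i][k], parent_dists[i]))
    return probs


# Hypothetical child X with states 0 (normal) / 1 (abnormal) and two parents.
r = {"B1": 3, "D1": 1}
a = {
    "B1": [[1, Fraction(1, 5)], [0, Fraction(4, 5)]],   # columns: B1 absent / present
    "D1": [[Fraction(9, 10)], [Fraction(1, 10)]],       # single-state default cause
}
parent_dists = {"B1": [0, 1], "D1": [1]}                # B1 is known to be present
print(weighting_factors(r))                     # {'B1': Fraction(3, 4), 'D1': Fraction(1, 4)}
print(child_distribution(r, a, parent_dists))   # [Fraction(3, 8), Fraction(5, 8)]
```

Exact fractions are used only so that the arithmetic can be checked by hand; floating-point values would work the same way.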
A simple example of DUCG, the knowledge sub-graph of a disease, is shown in Fig. 1.
This sub-graph includes all types of variables used in medicine. The specific
medical meanings of the DUCG variables used in this paper are defined in Table 1.
Fig. 1. The DUCG knowledge sub-graph of a disease. The example includes the various DUCG
graphical representations used in medicine.
3. The Question-Answer Based Heuristic Approach
3.1. The Medical Concepts Related to DUCG
To guide doctors in finding medical variables and the causal relationships among them,
we need a heuristic approach based on standard medical terms, so that doctors can add
the corresponding variables and causal relationships in response to relevant
questions. To make the related medical variables easier to find, we classify them
into five kinds: diseases, symptoms (e.g., physical signs and the various
physiological values of human body systems), risk factors, complications or
functional abnormalities, and pathogenic abnormalities.

Table 1. Medical meaning of variable types used in the medical DUCG.
Variable type | Medical meaning
B | Root cause of a disease; its prior probability represents the prevalence of the disease in the population of people who see doctors.
BX | The integrated posterior probability of a disease, weighted by a combination of disease incidence and risk factors, because in clinical diagnosis the doctor should judge the patient's disease not only by symptoms, signs, and related checks but also by the impact of risk factors on the disease.
X | Symptoms, physical signs, complications, laboratory tests, imaging information, and risk factors.
D | Default causes of risk factors.
F | Causal relations (the causal impact of risk factors on a disease, of a disease on symptoms and medical tests, and of a disease on its complications), directed from parent to child.
Conditional F | An F-type causal functional event or event matrix with a condition Z.
The five kinds of medical variables are described as follows.
“Disease” is an event within the patient's body that causes symptoms, complications
and functional abnormalities. A disease can lead to harmful changes in vital
functions, as well as changes in symptoms, physical signs, and human body systems.
Doctors make a definite diagnosis of a patient's diseases from symptoms, physical
signs, imaging tests and laboratory tests, and then treat the diseases.
A “symptom” is perceptible or observable and results from diseases. Symptoms take
multiple forms: some are subjective feelings, such as dizziness, chest pain and
abdominal distension; some are not subjectively felt and must be examined
objectively before being identified, such as mucous membrane bleeding and darkening
of urine color; some can be both felt subjectively and detected objectively, such as
jaundice and fever; and others manifest as changes in human body systems, such as
frequent micturition, bulimia and obesity. A physical sign, also known as a vital
sign, is likewise a phenomenon caused by diseases and can be detected by physical
examination, such as blood pressure disorder, arrhythmia, hepatomegaly and
splenomegaly. Vital signs can be used to judge the state and severity of an illness.
Whether the various physiological indicators of the human body are abnormal must be
determined through laboratory or imaging tests, such as blood tests, B-ultrasonic
tests, and CT scans,
etc.
“Risk factors” are factors that can influence diseases. The occurrence of risk
factors affects the posterior probability of diseases: in DUCG, the posterior
probability of a disease is a weighted synthesis of the disease incidence and the
risk factors. Risk factors are closely related to chronic and infectious diseases.
They are generally characterized by uncertainty, variability and non-specificity,
and the risk factors that endanger different patients can differ greatly. In this
paper, individual behavior and lifestyle, personal medical history, family history,
epidemiological factors and occupational factors, which can affect the incidence
rate of diseases, are collectively called risk factors.
“Complication” means that a disease leads to another disease during its development,
as a result of a secondary reaction of the disease. For instance, when blood glucose
stays high for a long time, a diabetic patient's kidneys, heart, brain, eyes and
peripheral nerves are endangered, which can cause diabetic nephropathy and diabetic
foot. A functional abnormality means that a disease leads to an abnormal change in
organ function or hormone secretion, so that organs cannot work normally. In this
paper, complications and functional abnormalities are expressed as one kind of
variable and are not regarded as diseases to be diagnosed.
“Pathogenic abnormality” means viral or other pathogenic infection that can directly
cause diseases. Disease incidence is strongly related to pathogenic abnormalities:
if a pathogenic abnormality occurs, the incidence rate of the disease increases
greatly. Moreover, pathogenic abnormalities affect only diseases, not symptoms,
signs, or physiological indexes. Virus infection can be the main cause of disease
incidence, which is common in infectious diseases, and laboratory examination can
confirm whether a virus infection has occurred.
For ease of expression, the upper-case letters “D”, “S”, “R”, “C” and “P” denote
diseases, symptoms, risk factors, complications and pathogenic abnormalities,
respectively. In the DUCG model, V ∈ {B, BX, X, G, D} represents parent
events/variables. For medical use, we attach “D”, “S”, “R”, “C” and “P” as
subscripts to V to give it a specific meaning. The meaning of each kind of variable
is as follows:
BD: In constructing the DUCG medical knowledge base, we use BD to represent the set
of all diseases. Its prior probability is the disease incidence in the population.
BXD: The integrated causal variable BXD represents the posterior probability of a
disease, as the final result. Since an integrated causal variable has no prior
probability, the root cause variable BD is used as the basic input of the disease,
and the probability of the root cause variable's impact on the disease is equal to
“1”. Meanwhile, BXD, the integrated causal variable of the disease, can also take
inputs from variables such as risk factors. The posterior probability of a disease
thus synthesizes the disease incidence probability, the risk factors and the
pathogenic abnormalities of the disease.
XS: It represents the set of symptoms directly caused by disease BXD in the
DUCG medical knowledge base.
XC: It represents the set of complications caused by diseases. Complications can
cause corresponding symptoms, denoted XCS. Meanwhile, complications or functional
abnormalities can cause new complications, denoted XCC.
XR: It represents the set of risk factors, which can affect diseases, symptoms,
physical signs, physiological indexes, pathogenic abnormalities, etc.
XP: It represents the set of pathogenic abnormalities, which can be affected by
risk factors.
The causal relationships among these variable types are expressed as DUCG graphs in
Fig. 2, Fig. 3 and Fig. 4. The occurrence of a disease can cause symptoms XS or
complications XC; the direct causal relationships between diseases and symptoms or
complications are shown in Fig. 2. As complications XC develop, they may cause new
complications XCC, and complications XC and XCC can cause related complication
symptoms XCS. Adding XCC and XCS to Fig. 2 yields Fig. 3.
Fig. 3 expresses how a disease propagates once a person has it, and shows the causal
relationships between the disease and its downstream variables. The occurrence of a
disease may be due to pathogenic abnormalities XP, and risk factors XR can affect
diseases, complications XC and XCC, and pathogenic abnormalities XP, thereby
affecting the posterior probability of the disease. Pathogenic abnormalities and
risk factors are upstream variables of diseases; adding them to the DUCG graph turns
Fig. 3 into Fig. 4.
If pathogenic abnormalities are present in Fig. 4, the risk factors that affect them
must also be present. Risk factors are the topmost X-type variables and have no
input variables; therefore, a default cause variable DR must be added to act on
each risk factor XR.
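The category-level structure described above can be written down as a small set of permitted parent-to-child category pairs, which a knowledge-base editor could use to flag implausible arcs and to find risk factors that still lack a default cause. The Python sketch below encodes our reading of the text (the figures themselves are not reproduced here), so the `ALLOWED` set is an assumption to adjust, not a transcription of the authors' figures; `invalid_arcs` and `risk_factors_missing_default` are hypothetical helper names.

```python
# Permitted parent -> child category pairs. This is our reading of the text,
# not a transcription of Figs. 2-4; adjust it to the figures as needed.
ALLOWED = {
    ("BD", "BXD"),                    # root cause -> integrated disease (Fig. 2)
    ("BXD", "XS"), ("BXD", "XC"),     # disease -> symptom / complication (Fig. 2)
    ("XC", "XCC"), ("XCC", "XCC"),    # complication -> new complication (Fig. 3)
    ("XC", "XCS"), ("XCC", "XCS"),    # complication -> complication symptom (Fig. 3)
    ("XP", "BXD"),                    # pathogenic abnormality -> disease (Fig. 4)
    ("XR", "BXD"), ("XR", "XS"), ("XR", "XC"),
    ("XR", "XCC"), ("XR", "XP"),      # risk factor -> affected variables (Fig. 4)
    ("DR", "XR"),                     # default cause of a topmost risk factor
}


def invalid_arcs(category, arcs):
    """Return the arcs whose (parent category, child category) pair is not in ALLOWED.

    category: dict variable name -> category key ("BD", "BXD", "XS", ...)
    arcs:     iterable of (parent, child) variable names
    """
    return [(p, c) for p, c in arcs if (category[p], category[c]) not in ALLOWED]


def risk_factors_missing_default(category, arcs):
    """Risk factors (XR) that still have no input and therefore need a DR parent."""
    has_input = {c for _, c in arcs}
    return sorted(v for v, cat in category.items() if cat == "XR" and v not in has_input)


category = {"B1": "BD", "BX1": "BXD", "X1": "XS", "X5": "XC", "R1": "XR"}
arcs = [("B1", "BX1"), ("BX1", "X1"), ("BX1", "X5"), ("R1", "BX1"), ("X1", "BX1")]
print(invalid_arcs(category, arcs))                  # [('X1', 'BX1')]
print(risk_factors_missing_default(category, arcs))  # ['R1']
```

Because DUCG allows directed cycles, the sketch checks only category pairs and deliberately performs no acyclicity test.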
In the heuristic question-answer elicitation, not all medical variables need to be
involved: doctors need only choose the ones they are concerned with in their medical
knowledge base. The variable types XP, XC, XCC and XCS are optional and can be
excluded. The heuristic question-answer approach is designed to help doctors
conveniently choose the relevant variables for constructing DUCG knowledge bases;
doctors can select the medical variables they need and represent the causal
relationships among them.
3.2. Three Specific Procedures of Question-Answer Based Heuristic Approach
When a variable is considered on its own, we need only attend to its direct upstream
and downstream variables and find all of them. The basic idea of the heuristic
question-answer approach to constructing a DUCG knowledge base for clinical diagnosis
is to go through each relevant variable and exhaustively look for the upstream and
downstream variables of the variable of interest. As the three figures Fig. 2,
Fig. 3 and Fig. 4 show, the DUCG graph grows gradually from a small scale to a large
scale. Doctors can first find the relationships between diseases and symptoms and
between diseases and complications, and then seek the complications and symptoms
caused by complications. Finally, pathogenic abnormalities and risk factors are added
to the DUCG graph.
Fig. 2. The basic DUCG graph that includes only BD, BXD, XS and XC. The graph expresses
the direct causality between diseases and complications and between diseases and symptoms,
as well as the causality between BD and BXD.
Fig. 3. The basic DUCG graph that adds XCC and XCS to Fig. 2. Compared with Fig. 2, the
graph adds the causality between complications and complications, and between all
complications and their related symptoms. Diseases and all their downstream variables are
contained in the graph.
Based on the DUCG structures of Fig. 2, Fig. 3 and Fig. 4, this paper designs three
heuristic question-answer procedures to guide doctors in building a DUCG knowledge
base for clinical disease diagnosis. The procedures respectively elicit the overall
DUCG structures shown in Fig. 2, Fig. 3 and Fig. 4. The heuristic question-answer
method builds the DUCG clinical knowledge base by stepwise refinement, and doctors
can control the size of the clinical knowledge base as needed: the method can build
a very large-scale knowledge base starting from a single symptom, or a small-scale
knowledge base suited to a specific problem.
Fig. 4. The DUCG graph that includes all types of variables. Pathogenic abnormalities and
risk factors are added to the DUCG together with their causality, so the graph clearly
expresses the causal relationships among all types of DUCG variables.
Clinical diagnosis identifies diseases through symptoms, so doctors can begin with
a symptom of concern and then look for the related diseases with the heuristic
question-answer approach composed of three procedures. The main idea of Procedure 1
is to start with a symptom and find the diseases related to it, then find the
symptoms and complications related to those diseases, and then search for more
diseases from these symptoms and complications, iterating until all the symptoms,
complications and diseases that doctors regard as important have been found. In all
procedures, the numbers in angle brackets denote the steps, and the detailed example
refers to the steps by these numbers.
Procedure 1: Represent the direct causal relationships between diseases and
symptoms, and between diseases and complications.
Input: the symptoms, laboratory test items or complications selected by doctors.
Output: the sub-DUCG containing the direct causal relationships between diseases
and symptoms, and between diseases and complications.
The guidance steps of Procedure 1 are shown in Fig. 5.
Procedure 1 starts with the major symptoms of a kind of disease, from which the
corresponding diseases are found. It then finds the symptoms directly caused by these
diseases, and the further diseases associated with those symptoms. By repeating these
steps, all the diseases, and the symptoms and complications directly caused by them,
that doctors consider necessary can be found. Procedure 1 thus elicits the basic DUCG
graph containing diseases, symptoms and complications, as well as the causality among
them; the general framework of this DUCG knowledge graph is shown in Fig. 2.
Fig. 5. The guidance steps of Procedure 1. The procedure guides the construction of the
overall DUCG structure shown in Fig. 2; during elicitation, users can find the related
variables and causality by answering these questions.
In Procedure 1, nine key question-answer steps are designed: 〈02〉, 〈04〉, 〈05〉, 〈09〉,
〈12〉, 〈13〉, 〈16〉, 〈19〉 and 〈22〉. A new disease is elicited from the initial symptom in
step 〈02〉. In steps 〈04〉 and 〈05〉, doctors are instructed to build and name a sub-graph
for the disease. New symptoms and complications are elicited from the disease in steps
〈09〉 and 〈13〉, respectively. New diseases are elicited from newly added symptoms and
complications in steps 〈19〉 and 〈22〉. The causal relationships between diseases and
symptoms, and between diseases and complications, are built in steps 〈12〉 and 〈16〉,
respectively. Guided by these nine key steps, new diseases, symptoms and complications,
as well as the causal relationships among them, are built.
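The iterative search of Procedure 1 behaves like a worklist algorithm: every newly found symptom or complication is queued so that the doctor is asked which diseases can cause it. The Python sketch below is a minimal illustration under our own assumptions: the doctor's answers are modeled as two hypothetical callbacks, `ask_diseases` and `ask_effects`, the starting variable is recorded as a symptom for simplicity, and the optional `max_diseases` cap stands in for the doctor's control over the knowledge-base size. Comments indicate which key steps each part loosely corresponds to.

```python
from collections import deque


def procedure1(start, ask_diseases, ask_effects, max_diseases=None):
    """Sketch of the Procedure 1 elicitation loop.

    start        : initial symptom, test item or complication (recorded as a symptom)
    ask_diseases : v -> diseases the doctor says can cause v   (steps <02>, <19>, <22>)
    ask_effects  : d -> (symptoms, complications) directly caused by d (steps <09>, <13>)
    max_diseases : optional cap, letting the doctor bound the knowledge-base size
    Returns (diseases, symptoms, complications, arcs); arcs are (disease, effect) pairs.
    """
    diseases, symptoms, complications = [], {start}, set()
    arcs = set()
    queue = deque([start])                 # variables whose causes are still to be asked
    while queue:
        v = queue.popleft()
        for d in ask_diseases(v):
            if d not in diseases:
                if max_diseases is not None and len(diseases) >= max_diseases:
                    continue               # size cap reached: ignore further diseases
                diseases.append(d)         # new sub-graph for d (steps <04>, <05>)
                syms, comps = ask_effects(d)
                for s in syms:
                    arcs.add((d, s))       # disease -> symptom (step <12>)
                    if s not in symptoms:
                        symptoms.add(s)
                        queue.append(s)
                for c in comps:
                    arcs.add((d, c))       # disease -> complication (step <16>)
                    if c not in complications:
                        complications.add(c)
                        queue.append(c)
            arcs.add((d, v))               # d was named as a cause of v
    return diseases, symptoms, complications, arcs


# Hypothetical doctor answers: S* are symptoms, C* complications, D* diseases.
causes = {"S0": ["D1"], "S2": ["D1"], "C1": ["D2"]}
effects = {"D1": (["S1", "S2"], ["C1"]), "D2": (["S3"], [])}
diseases, symptoms, complications, arcs = procedure1(
    "S0", lambda v: causes.get(v, []), lambda d: effects.get(d, ([], [])))
print(diseases)                # ['D1', 'D2']
print(sorted(symptoms))        # ['S0', 'S1', 'S2', 'S3']
print(sorted(complications))   # ['C1']
```

Here the complication C1 leads to a second disease D2, mirroring how a complication elicits a new disease in step 〈22〉; with `max_diseases=1` the search would stop after D1.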
Procedure 2: Add the symptoms or complications caused by complications to each
initially generated sub-graph, starting from the set of complication variables to
find all the complications and final symptoms.
Input: the elicited DUCG sub-graph containing the direct causal relationships
between diseases and symptoms, and between diseases and complications.
Output: the DUCG sub-graph containing all the doctor-selected relationships between
diseases and symptoms, diseases and complications, complications and complications,
and complications and symptoms.
The guidance steps of Procedure 2 are shown in Fig. 6.
Procedure 2 adds to the sub-graph the complications caused by other complications,
and elicits the final symptoms caused by complications. It forms a DUCG sub-graph of
the causal relationships between diseases and symptoms, diseases and complications,
complications and complications, and complications and symptoms. The resulting DUCG
structure is shown in Fig. 3.
Procedure 2 has only six key questions: 〈02〉, 〈05〉, 〈09〉, 〈12〉, 〈18〉 and 〈22〉. In step
〈02〉, the new complications XCC caused by the complications XC that diseases directly
cause are found. In step 〈09〉, all the complications caused by XCC are found
iteratively. In step 〈18〉, the symptoms, physical signs and relevant examinations
caused by all the complications are found. The causal relationships between
complications and complications, and between complications and symptoms, are built
mainly in steps 〈05〉, 〈12〉 and 〈22〉.
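Procedure 2 is again a closure computation, this time over complications: each new complication is queued until no further complication appears. The Python sketch below is an illustration under stated assumptions: the callbacks `ask_new_complications` and `ask_symptoms` are hypothetical stand-ins for the doctor's answers, and the step numbers in the comments follow the description above only loosely. Because DUCG allows directed cycles, the sketch records cyclic arcs but tracks visited complications so that the loop still terminates.

```python
from collections import deque


def procedure2(complications, ask_new_complications, ask_symptoms):
    """Sketch of the Procedure 2 loop over complications.

    complications         : the XC variables produced by Procedure 1
    ask_new_complications : c -> complications the doctor says c can cause (steps <02>, <09>)
    ask_symptoms          : c -> symptoms, signs and tests caused by c      (step <18>)
    Returns (xcc, xcs, arcs): new complications, complication symptoms and
    (cause, effect) arcs, which the text builds mainly in steps <05>, <12>, <22>.
    """
    known = set(complications)
    xcc, xcs, arcs = set(), set(), set()
    queue = deque(complications)
    while queue:                                   # until no new complication appears
        c = queue.popleft()
        for new in ask_new_complications(c):
            arcs.add((c, new))                     # complication -> complication
            if new not in known:
                known.add(new)
                xcc.add(new)
                queue.append(new)                  # XCC may cause further complications
        for s in ask_symptoms(c):
            arcs.add((c, s))                       # complication -> symptom
            xcs.add(s)
    return xcc, xcs, arcs


# Hypothetical answers; the C1 <-> C2 cycle is allowed in DUCG and the loop still stops.
leads_to = {"C1": ["C2"], "C2": ["C1"]}
shows = {"C1": ["S5"], "C2": ["S6"]}
xcc, xcs, arcs = procedure2(["C1"], lambda c: leads_to.get(c, []),
                            lambda c: shows.get(c, []))
print(sorted(xcc), sorted(xcs), len(arcs))   # ['C2'] ['S5', 'S6'] 4
```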
Fig. 6. The guidance steps of Procedure 2. The procedure guides the construction of the
overall DUCG structure shown in Fig. 3; Procedure 2 can elicit all the downstream variables
of diseases, including symptoms, signs, complications and related checks.
Procedure 3: Add pathogenic abnormalities and risk factors to the DUCG knowledge
graph. The procedure forms a complete DUCG knowledge graph containing diseases,
symptoms, physical signs, relevant tests, risk factors and the corresponding causal
relationships.
Input: the DUCG knowledge graph containing diseases, symptoms, complications,
relevant tests and the causal relationships among them.
Output: the DUCG knowledge graph containing diseases, symptoms, functional
abnormalities, laboratory tests, pathogenic abnormalities and risk factors, as well
as the causal relationships among them.
The guidance steps of Procedure 3 are shown in Fig. 7.
Procedure 3 elicits the pathogenic abnormalities and risk factors, as well as the
causal relationships among them. It also elicits the causal relationships between
pathogenic abnormalities and diseases, between risk factors and diseases, and between
risk factors and symptoms, complications and laboratory tests. A general DUCG
knowledge graph is obtained when Procedure 3 ends.
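Procedure 3 works upstream of the diseases: it adds pathogenic abnormalities and risk factors, links risk factors only to variables already in the graph, and finally gives every input-less risk factor a default cause. The Python sketch below illustrates that order of operations under our own assumptions: all three callbacks are hypothetical stand-ins for the doctor, the `"D_" + name` naming of default causes is invented for illustration, and the step numbers in the comments refer to the key steps listed next.

```python
def procedure3(diseases, downstream, ask_pathogens, ask_risk_factors, ask_rf_effects):
    """Sketch of Procedure 3: add upstream variables and default causes.

    diseases         : disease variables (BXD) already in the graph
    downstream       : symptoms, complications and tests already in the graph
    ask_pathogens    : d -> pathogenic abnormalities (XP) that can cause d      (step <03>)
    ask_risk_factors : v -> risk factors (XR) affecting a pathogen or disease v (steps <07>, <15>)
    ask_rf_effects   : r -> existing downstream variables affected by r       (steps <19>, <22>)
    Returns (pathogens, risk_factors, defaults, arcs).
    """
    pathogens, risk_factors, arcs = set(), set(), set()
    for d in diseases:
        for p in ask_pathogens(d):
            pathogens.add(p)
            arcs.add((p, d))                       # pathogenic abnormality -> disease
    for v in sorted(pathogens) + list(diseases):
        for r in ask_risk_factors(v):
            risk_factors.add(r)
            arcs.add((r, v))                       # risk factor -> pathogen / disease
    for r in sorted(risk_factors):
        for x in ask_rf_effects(r):
            if x in downstream:                    # only new arcs, never new variables
                arcs.add((r, x))
    # Step <26>: a risk factor without input gets a default cause D_R.
    has_input = {child for _, child in arcs}
    defaults = {}
    for r in sorted(risk_factors):
        if r not in has_input:
            defaults[r] = "D_" + r                 # hypothetical naming scheme
            arcs.add((defaults[r], r))
    return pathogens, risk_factors, defaults, arcs


# Hypothetical answers for a single disease D1 with downstream variables S1 and C1.
pathogens, risk_factors, defaults, arcs = procedure3(
    ["D1"], {"S1", "C1"},
    ask_pathogens=lambda d: {"D1": ["P1"]}.get(d, []),
    ask_risk_factors=lambda v: {"D1": ["R1"], "P1": ["R2"]}.get(v, []),
    ask_rf_effects=lambda r: {"R1": ["S1", "S9"]}.get(r, []),
)
print(defaults)    # {'R1': 'D_R1', 'R2': 'D_R2'}
```

The answer naming `S9` is ignored because it is not already in the graph, reflecting the text's point that the risk-factor-to-symptom steps add relationships but no new variables.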
In Procedure 3, the key steps are 〈03〉, 〈04〉, 〈07〉, 〈10〉, 〈15〉, 〈16〉, 〈19〉, 〈22〉 and 〈26〉.
All the pathogenic abnormalities of diseases are elicited in step 〈03〉. The risk
factors that cause pathogenic abnormalities are sought in step 〈07〉, and all the risk
factors of diseases are elicited in step 〈15〉. The causal relationships between risk
factors and symptoms or complications are elicited in steps 〈19〉 and 〈22〉; these steps
add no new variables, only new causal relationships. New causal relationships are
elicited after new pathogenic abnormalities or risk factors are added, in steps 〈04〉,
〈10〉 and 〈16〉. In step 〈26〉, risk factors that have no input are given default cause
variables and the corresponding causal relationships.
The conditional causal functional event F with condition Z is not yet involved in the
heuristic question-answer procedures above. This is because, during knowledge-base
construction, conditions must be designed according to the variables already
established, and some conditions cannot be made clear at once; they need to be set
according to the number of diseases and the differences among them. For example, for
the two diseases “alcoholic liver disease” and “nonalcoholic fatty liver disease”,
whether the patient has a history of alcoholism or alcohol consumption distinguishes
the two diseases. When the patient has no such history, “nonalcoholic fatty liver
disease” can be diagnosed from the related symptoms; when the patient does have such a
history, “alcoholic liver disease” is diagnosed. This distinction is expressed by
conditional causal functional events between diseases and symptoms or related checks.
Conditional causal functional events can therefore be set after the construction of
variables is complete.
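Although conditional arcs are deferred until the variables are in place, the condition mechanism itself is simple to state: an ordinary arc always stays, while a conditional arc stays only when its condition holds and is otherwise eliminated. The Python sketch below applies this to the alcohol-history example under our own assumptions: the `Arc` class, `active_arcs`, the condition functions, the plain-dictionary evidence, and the variable names (including the shared finding) are hypothetical, and only the survival of arcs is modeled, not the probabilities they carry.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Arc:
    """A causal arc F_{n;i}; `condition` plays the role of Z_{n;i} (None = unconditional)."""
    parent: str
    child: str
    condition: Optional[Callable[[dict], bool]] = None


def active_arcs(arcs, evidence):
    """Keep unconditional arcs, and conditional arcs whose condition holds on the evidence.

    A conditional arc whose condition fails is dropped, mirroring how a dashed arc
    is eliminated when Z_{n;i} is not met.
    """
    return [arc for arc in arcs if arc.condition is None or arc.condition(evidence)]


def drinks(evidence):
    """Hypothetical condition: a history of alcoholism or alcohol consumption."""
    return evidence.get("alcohol_history", False)


def does_not_drink(evidence):
    return not drinks(evidence)


# Hypothetical variable names; the shared finding is only illustrative.
arcs = [
    Arc("BX_alcoholic_liver_disease", "X_fatty_liver_on_ultrasound", drinks),
    Arc("BX_nonalcoholic_fatty_liver", "X_fatty_liver_on_ultrasound", does_not_drink),
]
print([a.parent for a in active_arcs(arcs, {"alcohol_history": True})])
# ['BX_alcoholic_liver_disease']
```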
Fig. 7. The guidance steps of Procedure 3. The procedure guides the construction of the
overall DUCG structure shown in Fig. 4. When Procedure 3 ends, all the medical variables and
causality that users care about have been elicited, and a usable medical knowledge base is
established.
4. The Application of Question-Answer Based Heuristic Approach
4.1. The Application of Three Procedures
To verify the feasibility of Procedure 1, Procedure 2 and Procedure 3, we use an
example to check whether an effective DUCG knowledge graph structure can be elicited
through these procedures. In the following application, we elicit a DUCG knowledge
graph starting from the symptom “tachycardia”. During elicitation, the states of
X-type variables such as symptoms, physical signs, complications, relevant
examinations and risk factors are ignored. In the heuristic question-answer process,
asking whether diseases and complications can cause symptoms means asking whether
they can cause an abnormal state of those symptoms. If a variable has several
abnormal states, a causal relationship can be built as long as one abnormal state can
be caused.
Knowledge elicitation begins from the symptom “tachycardia”. The basic DUCG knowledge
graph of four diseases is elicited by applying Procedure 1 (Fig. 5); it contains the
direct causal relationships between diseases and symptoms and between diseases and
complications. The key steps of Procedure 1 were applied 32 times in total, eliciting
4 diseases, 9 symptoms and 3 complications, together with their causal relationships.
Starting from the symptom “tachycardia” (X1), step 〈02〉 elicits the disease “thyroid
disease” (BX1). Step 〈05〉 establishes a sub-graph G1 and elicits the root cause B1 of
BX1. Step 〈09〉 finds the symptoms “palpitation” (X2), “high level of FT3” (X3) and
“high level of FT4” (X4) from disease BX1, and step 〈12〉 builds the causal
relationships between BX1 and X2, X3, X4. Step 〈13〉 obtains “atrial fibrillation”
(X5) from BX1, and step 〈16〉 establishes the causal relationship between BX1 and X5.
Thus Fig. 8(a) is established.
Starting from X5, step 〈22〉 elicits the disease “hiatus hernia” (BX2). Step 〈05〉
establishes a sub-graph G2 named “hiatus hernia” and elicits the root cause B2 of BX2.
Step 〈09〉 finds the symptoms “barium swallow evidence of hiatal hernia” (X6), “upper
GI endoscopy evidence of hiatal hernia” (X7) and “burning pain” (X8) from disease BX2,
and step 〈12〉 builds the causal relationships between BX2 and X6, X7, X8. Step 〈13〉
adds the complication “gastroesophageal reflux” (X9) to G2, and step 〈16〉 builds the
causal relationship between BX2 and X9. Through these steps, Fig. 8(b) is established.
Step 〈19〉 elicits the disease “gastric ulcer” (BX3) from X8. Step 〈05〉 builds sub-graph G3 and adds the root cause B3 of BX3 to it. Step 〈09〉 elicits the symptom “upper GI endoscopy evidence of ulcer” (X10) from BX3, and step 〈12〉 builds the causal relationship between them. Step 〈13〉 elicits the complication “gastric mucosal lesion” (X11) of BX3, and the causal relationship between BX3 and X11 is then completed. These steps yield Fig. 8(c).
Step 〈22〉 finds the disease “gastritis” (BX4) from X11. Step 〈05〉 builds sub-graph G4 and adds the root cause B4 of BX4 to it. Step 〈09〉 finds the symptom “upper GI endoscopy evidence of gastritis” (X12) of BX4, and step 〈12〉 builds the causal relationship between BX4 and X12. Step 〈13〉 finds no new complications, and steps 〈19〉 and 〈22〉 add no new diseases, so Fig. 8(d) is the last sub-graph elicited by Procedure 1.
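To make the walkthrough concrete, the result of Procedure 1 can be recorded as plain data and checked against the reported counts. The `SubGraph` class and its field names are our own illustration, not the DUCG software's data model, and sub-graph membership follows our reading of the steps above (a disease elicited from a symptom or complication via steps 〈02〉, 〈19〉 or 〈22〉 is taken to cause that variable).

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class SubGraph:
    """Hypothetical record of one disease sub-graph after Procedure 1."""
    root_cause: str  # B-type variable
    disease: str     # BX-type variable
    symptoms: Set[str] = field(default_factory=set)
    complications: Set[str] = field(default_factory=set)

    def edges(self) -> Set[Tuple[str, str]]:
        """Direct causal links: B -> BX, then BX -> each symptom or complication."""
        links = {(self.root_cause, self.disease)}
        links |= {(self.disease, x) for x in self.symptoms | self.complications}
        return links


# Sub-graphs G1-G4 as read from the walkthrough (Fig. 8(a)-(d)).
procedure1: Dict[str, SubGraph] = {
    "G1": SubGraph("B1", "BX1", {"X1", "X2", "X3", "X4"}, {"X5"}),
    "G2": SubGraph("B2", "BX2", {"X6", "X7", "X8"}, {"X5", "X9"}),
    "G3": SubGraph("B3", "BX3", {"X8", "X10"}, {"X11"}),
    "G4": SubGraph("B4", "BX4", {"X12"}, {"X11"}),
}

diseases = {g.disease for g in procedure1.values()}
symptoms = set().union(*(g.symptoms for g in procedure1.values()))
complications = set().union(*(g.complications for g in procedure1.values()))
print(len(diseases), len(symptoms), len(complications))  # 4 9 3
```

Because shared variables such as X5, X8 and X11 are counted once in the unions, the totals match the 4 diseases, 9 symptoms and 3 complications reported for Procedure 1.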
Procedure 1 thus builds the knowledge graph of four diseases. Because the purpose here is only to illustrate the elicitation process, the size of the knowledge base is deliberately kept small. In practical applications, doctors can build a knowledge base of whatever scale they need.
Procedure 1 yields three complications: X5, X9 and X11. Procedure 2 then finds the complications caused by these three complications, as well as the symptoms of all the complications. Step 〈02〉 finds that X5 cannot cause new complications, whereas X9 and X11 can both cause a new complication, “stomach disorder” (X13). Step 〈04〉 adds X13 to the sub-graphs G2, G3 and G4, which contain X9 or X11, and step 〈05〉 builds the corresponding causal relationships. Step 〈09〉 shows that X13 cannot cause any further complications, so the final knowledge base includes four complications: X5, X9, X11 and X13. Step 〈18〉 finds that X5 can cause the symptoms “tachycardia” (X1), “palpitation” (X2), “irregular pulse” (X14) and “ECG evidence” (X15). These variables are added to the sub-graphs G1 and G2, which contain X5, and steps 〈21〉 and 〈22〉 build the causal relationships between them. Similarly, step 〈18〉 finds the symptoms “upper GI endoscopy evidence of esophagitis” (X16), “24-hour esophageal pH monitoring” (X17) and “cough” (X18) caused by X9, and the symptoms “acid regurgitation” (X19) and “heartburn” (X20) caused by X13. The corresponding causal relationships in sub-graphs G2, G3 and G4 are built by steps 〈21〉 and 〈22〉.
Fig. 8. The basic DUCG knowledge sub-graphs elicited by Procedure 1 in the “gastrointestinal-thyroid” example. Starting from the symptom “tachycardia”, four diseases are found, and the symptoms and complications directly caused by them are added to the sub-graphs together with the corresponding causal relationships.
In applying Procedure 2, the key steps are used 17 times, eliciting a DUCG knowledge graph that contains all the symptoms, physical signs, examinations and complications or functional abnormalities related to the diseases. G1, G2, G3 and G4 are shown in Fig. 9(a), 9(b), 9(c) and 9(d).
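The complication-to-complication search in Procedure 2 behaves like a worklist (breadth-first) closure: each newly found complication is itself questioned until no new complication appears. The sketch below is our illustration under that reading; the function and variable names are hypothetical, and the expert answers are taken from the example.

```python
from collections import deque
from typing import Dict, Set


def close_complications(initial: Set[str], causes: Dict[str, Set[str]]) -> Set[str]:
    """Keep asking 'which complications does this complication cause?'
    until no new complication is found (cf. steps 〈02〉 and 〈09〉)."""
    found = set(initial)
    queue = deque(sorted(initial))
    while queue:
        current = queue.popleft()
        for new in causes.get(current, set()):
            if new not in found:  # only newly found complications are questioned
                found.add(new)
                queue.append(new)
    return found


# Expert answers from the example: X9 and X11 both cause X13; nothing else is new.
answers = {"X5": set(), "X9": {"X13"}, "X11": {"X13"}, "X13": set()}
final_complications = close_complications({"X5", "X9", "X11"}, answers)
print(sorted(final_complications))  # ['X11', 'X13', 'X5', 'X9']

# Symptoms of complications (step 〈18〉) added to the 9 symptoms from Procedure 1.
symptoms_of = {"X5": {"X1", "X2", "X14", "X15"}, "X9": {"X16", "X17", "X18"},
               "X13": {"X19", "X20"}}
procedure1_symptoms = {"X1", "X2", "X3", "X4", "X6", "X7", "X8", "X10", "X12"}
all_symptoms = procedure1_symptoms.union(*symptoms_of.values())
print(len(all_symptoms))  # 16
```

The `found` set guarantees termination even if experts report cyclic complication chains, since each complication is queued at most once.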
Fig. 9. The four DUCG knowledge sub-graphs elicited by Procedure 2 in the “gastrointestinal-thyroid” example, based on the result of Procedure 1. Procedure 2 adds the complications caused by complications, the symptoms caused by all complications, and all the corresponding causal relationships to the four sub-graphs.
When Procedure 3 is applied, step 〈02〉 finds no pathogenic abnormalities for the four diseases, so the causal relationships between pathogenic abnormalities and diseases, and between risk factors and pathogenic abnormalities, need not be considered. Step 〈14〉 elicits the risk factors “gender” (X21) and “age” (X22), which can cause BX1, adds them to sub-graph G1, and builds the causal relationships between X21, X22 and BX1. For BX2 in sub-graph G2, steps 〈14〉 and 〈16〉 elicit the risk factors “gender” (X21), “age” (X22) and “overweight” (X23) and build the causal relationships between them and BX2. In sub-graphs G3 and G4, steps 〈14〉 and 〈16〉 find the risk factors “gender” (X21), “age” (X22), “blood pressure” (X24) and “proton pump inhibitors” (X25), together with their causal relationships to BX3 and BX4. Step 〈19〉 finds that risk factor X21 can cause complications X5 and X9, and step 〈22〉 adds the causal relationships between X21 and X5, X9 to sub-graphs G1, G2, G3 and G4. Steps 〈19〉 and 〈22〉 likewise elicit the effects of risk factors X22, X23, X24 and X25 on complications, together with the corresponding causal relationships in each sub-graph. Finally, step 〈26〉 adds the default causal variables DR for the risk factors. After Procedure 3 is completed, four sub-graphs containing the diseases, symptoms, complications, related examinations and risk factors have been elicited; they are shown in Fig. 10(a), 10(b), 10(c) and 10(d).
Procedure 3 elicits the risk factors and their causal relationships with the diseases. Its key steps are used 35 times. Because the four diseases are considered to have no pathogenic abnormalities, key steps 〈03〉, 〈04〉, 〈07〉 and 〈10〉 are not applied. Construction of the DUCG knowledge graph is complete once Procedure 3 ends. Throughout Procedures 1, 2 and 3, doctors can control the amount of knowledge according to actual requirements.
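A risk factor shared by several diseases is a single variable, and step 〈26〉 gives each distinct risk factor one default causal variable. The sketch below uses the risk-factor assignments read from the Procedure 3 walkthrough; the function name and the naming scheme for D-type variables (X21 to D21, and so on) are hypothetical. It reproduces the count of five default causal variables given in the summary of the example.

```python
from typing import Dict, Set

# Risk factors elicited for each disease in Procedure 3 (read from the text).
risk_factors: Dict[str, Set[str]] = {
    "BX1": {"X21", "X22"},
    "BX2": {"X21", "X22", "X23"},
    "BX3": {"X21", "X22", "X24", "X25"},
    "BX4": {"X21", "X22", "X24", "X25"},
}


def add_default_causes(risks: Dict[str, Set[str]]) -> Dict[str, str]:
    """Attach one D-type default causal variable to each distinct risk factor.

    The naming scheme (X21 -> D21) is hypothetical.
    """
    distinct = set().union(*risks.values())  # shared risk factors counted once
    return {x: "D" + x[1:] for x in sorted(distinct)}


defaults = add_default_causes(risk_factors)
print(len(defaults))    # 5
print(defaults["X23"])  # D23
```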
Starting from the symptom “tachycardia”, we obtain the sub-DUCGs of four diseases through the heuristic question-answer procedures. The above example uses 153 steps, 84 of which are key steps. It elicits 4 diseases (4 BX-type and 4 B-type variables), 16 symptoms, physical signs and relevant examinations (X-type variables), 4 complications or functional abnormalities (X-type variables), 5 risk factors (X-type variables), 5 default causal variables of the risk factors (D-type variables) and 68 causal relationships (F-type variables). The prior probabilities of the diseases can be set according to their incidence in the population as recorded in the relevant literature, while the degrees of causal influence should be set by doctors according to whether a symptom is cardinal or secondary. Combining the four knowledge sub-graphs in Fig. 10 with the DUCG Intelligent Diagnosis System gives the composite DUCG knowledge graph of “gastrointestinal-thyroid” disease shown in Fig. 11. Duplicate variables and causal relationships in the sub-graphs are merged in the composite graph.
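Merging sub-graphs into the composite graph amounts to a set union over variables and causal links, so a variable or link that appears in several sub-graphs is kept once. The fragments below are abbreviated excerpts of G1 and G2 chosen for illustration, and `merge_subgraphs` is a hypothetical helper, not the DUCG Intelligent Diagnosis System's API. The last lines check that the per-procedure key-step counts (32, 17 and 35) add up to the reported total of 84.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]
SubGraph = Tuple[Set[str], Set[Edge]]  # (variables, causal links)


def merge_subgraphs(subgraphs: Iterable[SubGraph]) -> SubGraph:
    """Union the variables and causal links; duplicates collapse automatically."""
    variables: Set[str] = set()
    links: Set[Edge] = set()
    for sub_vars, sub_links in subgraphs:
        variables |= sub_vars
        links |= sub_links
    return variables, links


# Abbreviated excerpts of G1 and G2 (not the full sub-graphs).
g1 = ({"B1", "BX1", "X1", "X2", "X5"},
      {("B1", "BX1"), ("BX1", "X1"), ("BX1", "X5"), ("X5", "X1"), ("X5", "X2")})
g2 = ({"B2", "BX2", "X1", "X2", "X5"},
      {("B2", "BX2"), ("BX2", "X5"), ("X5", "X1"), ("X5", "X2")})

variables, links = merge_subgraphs([g1, g2])
print(len(variables), len(links))  # 7 7 (X1, X2, X5 and two X5 links are shared)

# Key steps reported per procedure add up to the stated total.
key_steps = {"Procedure 1": 32, "Procedure 2": 17, "Procedure 3": 35}
print(sum(key_steps.values()))  # 84
```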
Each step of the heuristic question-answer method focuses on direct causal relationships, which refines and localizes each question and so reduces the effort needed to reason it through. The method mainly helps doctors find the diseases related to given symptoms and the symptoms caused by given diseases, so as to work out the causal relationships between them and finally assemble the structure of the DUCG knowledge graph for clinical medicine as a whole. The elicited medical DUCG knowledge graph is open-ended: medical experts can keep extending it for as long as they wish. The complications or functional abnormalities in the procedures can also be regarded as a kind of symptom or relevant test; this classification helps medical experts apply stepwise refinement more effectively when building DUCG knowledge graphs.
When using the heuristic question-answer method, each doctor can attend only to the diseases with which he or she is familiar, and several experts can construct a knowledge base jointly using the functions of the DUCG software system. Doctors can check whether a variable already exists with the variable-checking function of the software system. When knowledge is contradictory, the software alerts doctors automatically, and multiple doctors can reconcile it after consultation. If the same knowledge appears repeatedly in many sub-graphs, the software merges it in the composite graph, ensuring that each piece of knowledge is unique in the final DUCG knowledge graph. Supported by these software functions, the heuristic question-answer method improves the efficiency and quality of constructing a DUCG medical knowledge base and helps ensure its consistency and integrity.
Fig. 10. The four DUCG knowledge sub-graphs elicited by Procedure 3 in the “gastrointestinal-thyroid” example, based on the result of Procedure 2. Procedure 3 adds pathogenic abnormalities and risk factors to the four sub-graphs; after Procedure 3 ends, the sub-graphs of the four diseases are complete.
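The variable check, contradiction alert and merging of repeated knowledge can be sketched with a small shared registry. The class name, the use of a single numeric causal-influence value per link, and the rule that any differing value for the same link counts as a contradiction are our assumptions for illustration; the actual DUCG software may detect contradictions differently.

```python
from typing import Dict, List, Tuple


class KnowledgeRegistry:
    """Hypothetical shared store: one entry per variable and per causal link."""

    def __init__(self) -> None:
        self.variables: Dict[str, str] = {}            # variable id -> description
        self.links: Dict[Tuple[str, str], float] = {}  # (cause, effect) -> influence
        self.conflicts: List[Tuple[str, str, float, float]] = []

    def variable_exists(self, var_id: str) -> bool:
        return var_id in self.variables

    def add_variable(self, var_id: str, description: str) -> None:
        # Reuse an existing variable instead of creating a duplicate.
        self.variables.setdefault(var_id, description)

    def add_link(self, cause: str, effect: str, influence: float) -> None:
        key = (cause, effect)
        old = self.links.get(key)
        if old is None:
            self.links[key] = influence  # new knowledge
        elif old != influence:
            # Assumed notion of contradiction: same link, different influence value.
            self.conflicts.append((cause, effect, old, influence))
        # Identical repeated knowledge is silently merged (kept once).


registry = KnowledgeRegistry()
registry.add_variable("X11", "gastric mucosal lesion")
print(registry.variable_exists("X11"))  # True
registry.add_link("BX3", "X11", 0.6)    # first doctor
registry.add_link("BX3", "X11", 0.6)    # second doctor, same knowledge: merged
registry.add_link("BX3", "X11", 0.4)    # third doctor disagrees: flagged
print(len(registry.links), registry.conflicts)  # 1 [('BX3', 'X11', 0.6, 0.4)]
```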
Fig. 11. The DUCG knowledge graph of “gastrointestinal-thyroid” disease. Four sub-graphs of Fig. 10 can be combined into the graph. The graph includes .
4.2. Verification of the Elicited Knowledge Base
The evidence information obtained through inquiry and physical examination is
shown as follows:
$$E'_1 = X_{1,1};\; E'_2 = X_{2,1};\; E'_3 = X_{14,1};\; E'_4 = X_{18,1};\; E'_5 = X_{19,1};\; E'_6 = X_{20,1};$$
$$E'_7 = X_{21,2};\; E'_8 = X_{24,4};\; E'_9 = X_{23,1};\; E'_{10} = X_{24,1};\; E'_{11} = X_{25,1}.$$
The states of all other variables in the knowledge base are unknown. Inference based on the above evidence yields the integrated causal variables of the diseases and their sorting probabilities: hiatus hernia (BX2: 34.27%), gastritis (BX4: 31.91%), thyroid disease (BX1: 29.02%) and peptic ulcer (BX3: 4.80%). The inference knowledge graphs of the four diseases are shown in Fig. 12, where light blue marks abnormal symptoms and physical signs and light yellow marks abnormal risk factors.
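The sorting probability places the four candidates on one scale. We assume here that it is obtained by dividing each integrated causal posterior probability by the sum over all candidates; this is a sketch of that assumed normalization, not the DUCG implementation, and the function names are hypothetical. It checks that the reported values are consistent with such a normalization (they sum to 100%) and reproduces the ranking.

```python
from typing import Dict, List, Tuple


def sorting_probabilities(posteriors: Dict[str, float]) -> Dict[str, float]:
    """Normalize posterior probabilities so the candidates sum to 1 (assumed form)."""
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("at least one candidate needs a positive posterior probability")
    return {h: p / total for h, p in posteriors.items()}


def ranked(probs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidates from most to least probable."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Reported sorting probabilities (percent) for the first group of evidence:
reported = {"BX1": 29.02, "BX2": 34.27, "BX3": 4.80, "BX4": 31.91}
print(round(sum(reported.values()), 2))  # 100.0
print([h for h, _ in ranked(reported)])  # ['BX2', 'BX4', 'BX1', 'BX3']

# Normalization rescales but never reorders (hypothetical posteriors):
example = sorting_probabilities({"Ha": 0.02, "Hb": 0.06, "Hc": 0.02})
print(ranked(example)[0][0])  # Hb
```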
In Fig. 12, 3 symptoms can be explained by B1, 6 by B2, and only 2 each by B3 and B4. The symptoms that cannot be explained by B1, B3 and B4 are affected by risk factors. Among the four initial events, B2 has the lowest prior probability (3.3%), but it explains the largest number of symptoms and three risk factors affect BX2, so BX2 has a high posterior probability and sorting probability. B4 has the highest prior probability (80%), and three risk factors also affect BX4, so the posterior probability of BX4 is high as well. The prior probability of B1 (18%) is higher than that of B3 (11%), and B1 explains more symptoms, so the integrated causal posterior probability of BX1 is higher than that of BX3. B3 and B4 have the same causal chain to the symptoms, and the causal influence of both B3 and B4 on X11 is set to 0.6, so causal influence affects the posterior probabilities of B3 and B4 equally. More generally, causal influence has only a limited effect on the overall sorting probability: the differences in degree among causal influences are small, and causal influence terms appear in both the numerator and the denominator of the posterior probability formula, which weakens their effect further. This is also why causal influence probabilities need only be specified as relative degrees.
Fig. 12. The inference knowledge graphs of the four diseases under the first group of evidence. Light blue marks abnormal symptoms and physical signs, and light yellow marks abnormal risk factors. The graphs clearly show the symptoms that each disease can affect, as well as the effect of the risk factors on each disease.
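The cancellation argument can be checked with a toy calculation. This is not the DUCG inference algorithm: it is a deliberately simplified model, assumed for illustration, in which each candidate's unnormalized score is its prior times the causal influences along the chain explaining the evidence, and sorting probabilities come from normalization. The priors of B3 and B4 and the value 0.6 come from the text; the shared chain value 0.5 and the alternative value 0.55 are hypothetical.

```python
import math
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide each score by the total so the candidates sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


priors = {"B3": 0.11, "B4": 0.80}  # prior probabilities from the text
shared_chain = 0.5                 # hypothetical influence common to both chains
x11_influence = 0.6                # value set for both B3 -> X11 and B4 -> X11

without = normalize({k: p * shared_chain for k, p in priors.items()})
with_x11 = normalize({k: p * shared_chain * x11_influence for k, p in priors.items()})

# A factor common to every candidate appears in numerator and denominator and cancels.
for k in priors:
    print(k, round(without[k], 4), round(with_x11[k], 4))
# B3 0.1209 0.1209
# B4 0.8791 0.8791

# A small difference in degree shifts the result only slightly (hypothetical 0.55):
slightly_different = normalize({"B3": 0.11 * shared_chain * 0.6,
                                "B4": 0.80 * shared_chain * 0.55})
print(round(slightly_different["B3"], 4))  # 0.1304
```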
Further examination of the patient yields additional evidence:
$$E'_{12} = X_{6,1};\; E'_{13} = X_{7,1};\; E'_{14} = X_{3,0};\; E'_{15} = X_{4,0};\; E'_{16} = X_{10,0};\; E'_{17} = X_{12,0}.$$
The states of all other variables remain unknown. After reasoning, “hiatus hernia” is confirmed. The inference knowledge graphs of the four diseases under all the evidence are shown in Fig. 13. In Fig. 13(b), the sub-graph of “hiatus hernia” explains all the symptoms and relevant examinations, whereas B1, B3 and B4 cannot explain the abnormal variables X6 and X7 in Fig. 13(a), 13(c) and 13(d), so virtual default causes are used to explain them.
Fig. 13. The DUCG inference knowledge graphs under all the evidence. Only “hiatus hernia” can explain all the evidence, so its posterior probability is the largest. The graphs clearly show the relationships between diseases and symptoms and between risk factors and diseases.
In this example, the diagnosis was obtained through two-step reasoning, and both the reasoning process and its result are well explained. This verifies the rationality of the knowledge base built in this paper and shows that the elicited knowledge base is compatible with the comprehensiveness and uniformity that characterize the DUCG model structure and its probabilistic reasoning.
5. Conclusion
Knowledge acquisition is a key difficulty for knowledge-based intelligent systems. For a DUCG-based clinical diagnostic decision system, the way medical knowledge is transformed into the causal relationships of the DUCG model determines the accuracy and reliability of diagnosis. Based on the features of the DUCG model and of medical knowledge, this paper starts from a symptom and elicits the DUCG causal relationships in medical knowledge with a heuristic question-answer method consisting of three procedures. Procedure 1 elicits the relevant diseases from symptoms and then elicits symptoms from diseases; through repetition, it elicits the direct causal relationships between diseases and symptoms and between diseases and complications. Procedure 2 starts from complications, elicits the complications and symptoms they cause, and builds the causal relationships among them. Procedure 3 elicits the risk factors of diseases, as well as the impact of risk factors on complications and symptoms. All the relevant medical knowledge can be elicited through the three procedures. To verify the procedures, we started from “tachycardia” and elicited a simple knowledge graph for “gastrointestinal-thyroid” disease. The final reasoning and calculation in this example verified the validity and reliability of the elicited knowledge base. The heuristic method for constructing DUCG medical knowledge bases proposed in this paper can help medical experts construct a DUCG medical knowledge base quickly and systematically. Together with the functions of the software system, it can improve the efficiency of constructing medical knowledge bases and the usability of DUCG clinical diagnostic decision systems.
Acknowledgments
This research is supported by the National Natural Science Foundation of China
under grant 61273330.
References
[1] A. Berlin, M. Sorani, and I. Sim, A taxonomic description of computer-based clinical decision support systems. Journal of Biomedical Informatics. vol. 39, no. 6, pp. 656-667, 2006.
[2] A. Yardimci, Soft computing in medicine. Applied Soft Computing. vol. 9, no. 3, pp. 1029-1043, 2009.
[3] A. X. Garg, N. K. Adhikari, H. McDonald, M. P. Rosas-Arellano, P. J. Devereaux, J. Beyene, J. Sam, and R. B. Haynes, Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. The Journal of the American Medical Association. vol. 293, no. 10, pp. 1223-1238, 2005.
[4] R. A. Miller, H. E. Pople, and J. D. Myers, Internist-1, an experimental computer-based diagnostic consultant for general internal medicine. The New England Journal of Medicine. vol. 307, no. 8, pp. 468-476, 1982.
[5] R. D. Keith, S. Beckley, J. M. Garibaldi, J. A. Westgate, E. C. Ifeachor, and K. R. Greene, A multicentre comparative study of 17 experts and an intelligent computer
system for managing labour using the cardiotocogram. vol. 102, no. 8, pp. 688-700, 1995.
[6] A. Holt, I. Bichindaritz, R. Schmidt, and P. Perner, Medical applications in case-based reasoning. The Knowledge Engineering Review. vol. 20, no. 3, pp. 289-292, 2005.
[7] I. Bichindaritz and C. Marling, Case-based reasoning in the health sciences: What's next? Artificial Intelligence in Medicine. vol. 36, no. 2, pp. 127-135, 2006.
[8] Y. J. Park, S. Chun, and B. C. Kim, Cost-sensitive case-based reasoning using a genetic algorithm: application to medical diagnosis. Artificial Intelligence in Medicine. vol. 51, no. 2, pp. 133-145, 2011.
[9] M. Mahfouf, M. F. Abbod, and D. A. Linkens, A survey of fuzzy logic monitoring and control utilization in medicine. Artificial Intelligence in Medicine. vol. 21, no. 1-3, pp. 27-42, 2001.
[10] R. Narasinga, G. Sridhar, and K. Madhu, A clinical decision support system using multi-layer perceptron neural network to predict quality of life in diabetes. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. vol. 4, no. 1, pp. 57-59, 2010.
[11] B. Pandey and R. B. Mishra, Knowledge and intelligent computing system in medicine. Computers in Biology and Medicine. vol. 39, no. 3, pp. 215-230, 2009.
[12] N. Cruz-Ramirez, H. G. Acosta-Mesa, H. Carrillo-Calvet, L. A. Nava-Fernandez, and R. E. Barrientos-Martinez, Diagnosis of breast cancer using Bayesian networks: a case study. Computers in Biology and Medicine. vol. 37, no. 11, pp. 1553-1564, 2007.
[13] Q. Zhang, Dynamic uncertain causality graph for knowledge representation and reasoning: discrete DAG cases. Journal of Computer Science and Technology. vol. 27, no. 1, pp. 1-23, 2012.
[14] Q. Zhang, C. Dong, Y. Cui, and Z. Yang, Dynamic uncertain causality graph for knowledge representation and probabilistic reasoning: statistics base, matrix, and application. IEEE Transactions on Neural Networks and Learning Systems. vol. 25, no. 4, pp. 645-663, 2014.
[15] Q. Zhang and S. Geng, Dynamic uncertain causality graph applied to dynamic fault diagnoses of large and complex systems. IEEE Transactions on Reliability. vol. 64, no. 3, pp. 910-927, 2015.
[16] Q. Zhang, Dynamic uncertain causality graph for knowledge representation and probabilistic reasoning: directed cyclic graph and joint probability distribution. IEEE Transactions on Neural Networks and Learning Systems. vol. 26, no. 7, pp. 1503-1517, 2015.
[17] Q. Zhang, Dynamic uncertain causality graph for knowledge representation and probabilistic reasoning: continuous variable, uncertain evidence, and failure forecast. IEEE Transactions on Systems, Man, and Cybernetics: Systems. vol. 45, no. 7, pp. 990-1003, 2015.
[18] C. Dong, Y. Wang, Q. Zhang, and N. Wang, The methodology of dynamic uncertain causality graph for intelligent diagnosis of vertigo. Computer Methods and Programs in Biomedicine. vol. 113, no. 1, pp. 162-174, 2014.
[19] S. Geng and Q. Zhang, Calculation method to diagnose integrated causes of faults in process systems by means of dynamic uncertain causality graph. 2014 Asia-Pacific Computer Science and Application Conference (CSAC 2014). pp. 306-311, 2014.