PANI: A Novel Algorithm for Fast Discovery of Putative Target Nodes in Signaling Networks
Huey-Eng Chua, Sourav S Bhowmick, Lisa Tucker-Kellogg, Qing Zhao, C F Dewey, Jr, Hanry Yu
School of Computer Engineering, Nanyang Technological University, Singapore
Mechanobiology Institute, National University of Singapore, Singapore
Department of Physiology, National University of Singapore, Singapore
Division of Biological Engineering, Massachusetts Institute of Technology, USA
chua0530|assourav|[email protected], LisaTK|[email protected], [email protected]
ABSTRACT
Biological network analysis often aims at the target identification problem, which is to predict which molecule to inhibit (or activate) for a disease treatment to achieve optimum efficacy and safety. A related goal, arising from the increasing availability of semi-automated assays and moderately parallel experiments, is to suggest many molecules as potential targets. The target prioritization problem is to predict a subset of molecules in a given disease-associated network that contains successful drug targets with the highest probability. Sensitivity analysis prioritizes targets in a dynamic network model according to principled criteria, but it fails to penalize off-target effects and does not scale to large networks. We describe Pani (Putative TArget Nodes PrIoritization), a novel method that prunes and ranks the possible target nodes by exploiting concentration-time profiles and network structure (topological) information. Pani and two sensitivity analysis methods were applied to three signaling networks, mapk-pi3k, myosin light chain (mlc) phosphorylation, and the sea urchin endomesoderm gene regulatory network, which are implicated in ovarian cancer, atrial fibrillation, and embryonic deformity, respectively. Predicted targets were compared against a reference set: the molecules known to be targeted by drugs in clinical use for the respective diseases. Pani is orders of magnitude faster than sensitivity analysis and ranks the majority of known targets higher. This highlights a potential disagreement between absolute mathematical sensitivity and our intuition of influence. We conclude that empirical, structural methods like Pani, which demand almost no run time, offer benefits not available from quantitative simulation and sensitivity analysis.
1. INTRODUCTION
Drug discovery research has gradually shifted from observation-based approaches with phenotypic screening toward target-based research aimed at molecular mechanisms of disease [26]. Observation-based approaches screen drug compounds in vitro, ex vivo, or in vivo, and measure empirical outcomes to determine which drugs are active against the disease. In contrast, target-based approaches identify a particular molecule (e.g., an enzyme or receptor) that functions prominently in a validated mechanism of the disease, and then synthesize a drug compound that interacts specifically with that target molecule. One key expectation in target-based drug development is that specificity for one disease-causing molecule, together with lack of binding to other molecules, will minimize toxicity. Another emerging research area is network-based drug discovery, which exploits knowledge of disease mechanisms at a systems level [226].
There is an ongoing debate on the value of observation-based, target-based, and network-based approaches in drug development [39, 107, 97]. As the debate continues, innovations in experimental methods are chipping away at the previous distinctions between these approaches. Hybrid strategies are increasingly accessible, in part because of cost-effective methods for parallelizing experiments on tissues and living cells. Technologies such as microfluidic cell culture arrays and high-content imaging [220, 45, 40] facilitate investigations that bridge the screening of empirical outcomes, the targeted study of disease mechanisms, and/or network-based systems biology [155]. The specific methods are diverse and rapidly changing, but one clear trend is an increase in customized [155] designs for high-throughput experiments, meaning that individual investigators decide not only the treatments and controls for the input samples but also which measurements to perform on the samples (e.g., which genes to measure, which behaviors to quantify). This level of design is particularly challenging when experiments have enough throughput to exceed the biological expertise of any single investigator, but not enough coverage to skip the decision-making step and simply measure every variable.
Experimental innovation often creates novel computational problems, including immature research topics where simple algorithms might be effective. Experimental trends in drug discovery research are now creating demand for computational automation to assist in the selection of molecule sets for multiplex assays. This paper addresses molecule selection in a manner that deliberately spans the gap from network-based and target-based computation to observation-based goals for final outcome. The first step of this work is to formalize “target prioritization”, the problem of choosing a set of putative target molecules for further study. Next, we present a fast and novel algorithm called Pani (Putative TArget Nodes PrIoritization), which uses network information and simple empirical scores to prioritize and rank biologically relevant target molecules in signaling networks.
A putative target node in a signaling network is a protein that, when perturbed, is able to achieve desirable efficacy and safety in terms of regulating a particular output node. Informally, an output node is a protein that is either involved in some biological process (e.g., proliferation) whose deregulation results in the manifestation of a disease (e.g., cancer), or is of interest because of its physiological role in the disease. An example of an output node for cancer drug design might be Akt, a node of interest in the mapk-pi3k network [93]. Constitutive activation of Akt has been shown to be oncogenic, and Akt may be targeted to disrupt ovarian tumor cell growth [8]. Regulating the output node provides a means to restore normalcy to the diseased network [106].
Pani is a generic algorithm applicable to any biological signaling network. It starts with a preprocessing step that prunes the candidate nodes (the nodes being considered for analysis) based on a reachability rule, to reduce computation cost. Then, in its main phase, Pani prioritizes nodes using a score based on a kinetic property called profile shape similarity distance (pssd) and two network structural properties, namely target downstream effect (tde) and bridging centrality (bc) [111]. Putative target nodes are the nodes with high ranking scores.
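The preprocessing step can be illustrated with a minimal sketch. The exact reachability rule is not spelled out here, so the sketch assumes one plausible version: a candidate is kept only if it has a directed path to the output node, since a node that cannot reach the output node cannot regulate it. The names `prune_by_reachability` and `rank_candidates`, the toy edge list, and the placeholder scores are all hypothetical; how Pani actually combines pssd, tde, and bc into a single ranking score is not shown.

```python
from collections import deque


def prune_by_reachability(edges, output_node):
    """Keep only nodes that have a directed path to output_node.

    Assumed (hypothetical) rule: a node that cannot reach the output
    node cannot regulate it, so it is dropped before scoring.
    """
    # Reverse every edge so that one BFS from the output node
    # visits exactly the nodes upstream of it.
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)
    seen = {output_node}
    queue = deque([output_node])
    while queue:
        node = queue.popleft()
        for upstream in reverse.get(node, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    seen.discard(output_node)  # the output node is not its own target
    return seen


def rank_candidates(candidates, score):
    """Sort candidates by descending score, breaking ties by name."""
    return sorted(candidates, key=lambda n: (-score[n], n))


# Toy network loosely inspired by the MAPK and PI3K branches (illustrative only).
edges = [("Ras", "Raf"), ("Raf", "MEK"), ("MEK", "ERK"), ("ERK", "Elk1"),
         ("Ras", "PI3K"), ("PI3K", "Akt"), ("Akt", "mTOR")]
candidates = prune_by_reachability(edges, "Akt")
print(sorted(candidates))  # ['PI3K', 'Ras']
# Placeholder scores standing in for a combined pssd/tde/bc score.
print(rank_candidates(candidates, {"PI3K": 0.9, "Ras": 0.4}))  # ['PI3K', 'Ras']
```

A single breadth-first search over the reversed edges runs in time linear in the number of nodes and edges, which is why a pruning pass of this kind is cheap compared with the scoring that follows.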
Profile shape similarity distance measures the similarity between concentration-time series profiles (plots of a node's concentration against time) using a customized distance measure that, for example, permits delays and inversions; pssd is used to identify the most relevant upstream regulators. Target downstream effect measures the potential impact on the network when a node is perturbed. The impact is determined by the number of nodes downstream of the target and the likelihood that these nodes are associated with off-target effects. Bridging centrality identifies nodes that are located at a connecting bridge between modular subregions in a network [111]. Note that the structural properties (tde and bc) are used to identify important nodes that the kinetic property (pssd) fails to identify (e.g., Raf in the mapk-pi3k network).
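The exact form of pssd is not given in this section. As an illustration only, the sketch below assumes one simple way to build a shape distance that tolerates delays and inversions: z-normalize both profiles so that only their shape matters, then take the smallest root-mean-square difference over a range of time lags and over both signs. The function names, the `max_lag` parameter, and the toy profiles are hypothetical.

```python
import math


def z_normalize(xs):
    """Rescale a profile to zero mean and unit variance (keeps only its shape)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if sd == 0:
        return [0.0] * n  # a flat profile carries no shape information
    return [(x - mean) / sd for x in xs]


def shape_distance(upstream, output, max_lag=3):
    """Hypothetical pssd-style distance between two equally sampled profiles.

    Returns the smallest root-mean-square difference between the
    z-normalized profiles, minimized over
      * lags 0..max_lag, with the upstream profile leading the output
        (tolerates a delay), and
      * sign +1 / -1 (tolerates an inversion, e.g. an inhibitor whose
        profile mirrors the output).
    Lower values mean more similar shapes.
    """
    if len(upstream) != len(output):
        raise ValueError("profiles must be sampled at the same time points")
    n = len(output)
    best = math.inf
    # Keep at least two overlapping points so normalization is meaningful.
    for lag in range(min(max_lag, n - 2) + 1):
        # Align upstream[t] with output[t + lag]; re-normalize the overlap.
        a = z_normalize(upstream[:n - lag])
        b = z_normalize(output[lag:])
        for sign in (1, -1):
            msd = sum((sign * x - y) ** 2 for x, y in zip(a, b)) / len(a)
            best = min(best, math.sqrt(msd))
    return best


output = [0, 0, 1, 3, 4, 4, 4]     # toy output-node profile
activator = [0, 1, 3, 4, 4, 4, 4]  # same shape, one step earlier
inhibitor = [4, 3, 1, 0, 0, 0, 0]  # mirror image, one step earlier
unrelated = [4, 0, 4, 0, 4, 0, 4]  # oscillating, no shared shape
for name, profile in [("activator", activator), ("inhibitor", inhibitor),
                      ("unrelated", unrelated)]:
    print(name, round(shape_distance(profile, output), 3))
```

Because the minimum is taken over both signs, this particular measure cannot by itself distinguish activation from inhibition; if the direction of regulation matters, the sign that attains the minimum would have to be reported separately.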
In Section 5, we evaluate the performance of Pani by comparing it against two state-of-the-art global sensitivity analysis (gsa)-based techniques [296, 238] on three signaling networks, namely mapk-pi3k [93], myosin light chain (mlc) phosphorylation [160], and endomesoderm [138], which are implicated in ovarian cancer, atrial fibrillation, and embryonic deformity, respectively. Instead of defining success according to the internal logic of the original network, the goal is to agree with empirical outcome: namely, to predict the set of molecules that is actually targeted by drugs given to human patients. Our study shows that Pani can identify a majority of the targets in these networks, and that many of these targets are ignored by the two gsa-based approaches (multi-parametric sensitivity analysis (mpsa) [296] and sobol [238]). Further, Pani is orders of magnitude faster than mpsa [296] and sobol [238]. Finally, trends extrapolated from the results suggest some insights into why the empirical outcome of disease is not addressed well by sensitivity analysis.
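The evaluation criterion, agreement with the set of molecules actually targeted by clinical drugs, can be made concrete with simple rank-based measures. The sketch below shows two such measures, recall at k and the mean rank of the known targets; it is not necessarily the measure used in Section 5, and the function names, node names, and rankings are made up for illustration.

```python
def recall_at_k(ranking, known_targets, k):
    """Fraction of known drug targets that appear among the top k ranked nodes."""
    top = set(ranking[:k])
    return sum(1 for t in known_targets if t in top) / len(known_targets)


def mean_target_rank(ranking, known_targets):
    """Average 1-based rank of the known targets (lower is better).

    A target missing from the ranking (for example, a pruned node) is
    assigned rank len(ranking) + 1, i.e. treated as ranked last.
    """
    position = {node: i + 1 for i, node in enumerate(ranking)}
    worst = len(ranking) + 1
    return sum(position.get(t, worst) for t in known_targets) / len(known_targets)


# Made-up example: two methods ranking the same toy nodes.
known = ["A", "B", "C"]
method_1 = ["A", "B", "C", "D", "E"]
method_2 = ["E", "D", "F", "B", "A"]
print(recall_at_k(method_1, known, 3), recall_at_k(method_2, known, 3))  # 1.0 0.0
print(mean_target_rank(method_1, known), mean_target_rank(method_2, known))  # 2.0 5.0
```

Recall at k matches the practical setting of choosing a fixed-size molecule set for a multiplex assay, while the mean rank also rewards placing known targets near the top when k is not fixed in advance.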
2. RELATED WORK
Sensitivity analysis [296, 193, 108] is a family of closely related methods that are frequently proposed for target identification. Sensitivity analysis measures the effect of a parameter perturbation (e.g., a change in a kinetic rate constant) on the output node and assigns a sensitivity value to a node based on the extent of the output node perturbation. The parameters are ranked according to their sensitivity values, and sensitive parameters (parameters with high sensitivity values) are then selected as potential targets [108]. The parameter values of a real biological network vary depending on genetics, cellular environment, and cell type. Thus, no single “true” nominal parameter value exists. Hence, more appropriate
