

PANI: A Novel Algorithm for Fast Discovery of Putative Target Nodes in Signaling Networks

Huey-Eng Chua; Sourav S Bhowmick; Lisa Tucker-Kellogg; Qing Zhao; C F Dewey, Jr; Hanry Yu

School of Computer Engineering, Nanyang Technological University, Singapore
Mechanobiology Institute, National University of Singapore, Singapore
Department of Physiology, National University of Singapore, Singapore
Division of Biological Engineering, Massachusetts Institute of Technology, USA

chua0530|assourav|[email protected], LisaTK|[email protected], [email protected]

ABSTRACT

Biological network analysis often aims at the target identification problem, which is to predict which molecule to inhibit (or activate) for a disease treatment to achieve optimum efficacy and safety. A related goal, arising from the increasing availability of semi-automated assays and moderately parallel experiments, is to suggest many molecules as potential targets. The target prioritization problem is to predict a subset of molecules in a given disease-associated network which contains successful drug targets with highest probability. Sensitivity analysis prioritizes targets in a dynamic network model according to principled criteria, but fails to penalize off-target effects, and does not scale for large networks. We describe Pani (Putative TArget Nodes PrIoritization), a novel method that prunes and ranks the possible target nodes by exploiting concentration-time profiles and network structure (topological) information. Pani and two sensitivity analysis methods were applied to three signaling networks, namely MAPK-PI3K, myosin light chain (MLC) phosphorylation, and the sea urchin endomesoderm gene regulatory network, which are implicated in ovarian cancer, atrial fibrillation, and embryonic deformity, respectively. Predicted targets were compared against a reference set: the molecules known to be targeted by drugs in clinical use for the respective diseases. Pani is orders of magnitude faster and prioritizes the majority of known targets higher than sensitivity analysis does. This highlights a potential disagreement between absolute mathematical sensitivity and our intuition of influence. We conclude that empirical, structural methods like Pani, which demand almost no run time, offer benefits not available from quantitative simulation and sensitivity analysis.

1. INTRODUCTION

Drug discovery research has gradually shifted from observation-based approaches with phenotypic screening, toward target-based research aimed at molecular mechanisms of disease [26]. Observation-based approaches screen drug compounds in vitro, ex vivo, or in vivo, and measure empirical outcome to determine which drugs are “active” against the disease. In contrast, target-based approaches identify a particular molecule (e.g., enzyme, receptor) that functions prominently in a validated mechanism of the disease, and then synthesize a drug compound to interact specifically with that target molecule. One key expectation in target-based drug development is that specificity for one disease-causing molecule and lack of binding to other molecules will minimize toxicity. Another emerging research area is network-based drug discovery, which exploits knowledge of disease mechanism at a systems level [226].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.

There is an ongoing debate on the value of observation-based, target-based, and network-based approaches in drug development [39, 107, 97]. As the debate continues, innovations in experimental methods are chipping away at the previous distinctions between these approaches. Hybrid strategies are increasingly accessible, in part because of cost-effective methods for parallelizing experiments on tissues and living cells. Technologies such as microfluidic cell culture arrays and high-content imaging [220, 45, 40] facilitate investigations that bridge between screening empirical outcomes, targeted study of disease mechanisms, and/or network-based systems biology [155]. The specific methods are diverse and rapidly changing, but one clear trend is an increase in customized [155] designs for high-throughput experiments, meaning that individual investigators decide not only the treatments and controls for the input samples, but also which measurements to perform on the samples (e.g., which genes to measure, which behaviors to quantify). This level of design is particularly challenging when experiments have enough “high throughput” coverage to exceed the biological expertise of any single investigator, but not enough coverage to skip the decision-making step and simply measure every variable.

Experimental innovation often creates novel computational problems, including immature research topics where simple algorithms might be effective. Experimental trends in drug discovery research are now creating demand for computational automation to assist in the selection of molecule sets for multiplex assays. This paper addresses molecule selection in a manner that deliberately spans the gap from network-based and target-based computation to observation-based goals for final outcome. The first step of this work is to formalize “target prioritization”, the problem of choosing a set of putative target molecules for further study. Next, we present a fast and novel algorithm called Pani (Putative TArget Nodes PrIoritization), which uses network information and simple empirical scores to prioritize and rank biologically relevant target molecules in signaling networks.

A putative target node in a signaling network is a protein that, when perturbed, is able to achieve desirable efficacy and safety in terms of regulation of a particular output node. Informally, an output node is a protein that is either involved in some biological process (e.g., proliferation) which may be deregulated, resulting in manifestation of a disease (e.g., cancer), or is of interest due to its physiological role in the disease. An example of an output node for the application of cancer drug design might be Akt (a node of interest in the MAPK-PI3K network [93]). Constitutive activation of Akt was shown to be oncogenic and may be targeted to disrupt ovarian tumor cell growth [8]. Regulation of the output node provides a means to restore normalcy to the diseased network [106].

Pani is a generic algorithm applicable to any biological signaling network. It starts with a preprocessing step that prunes the candidate nodes (nodes being considered for analysis) based on a reachability rule to reduce computation cost. Then, in its main phase, Pani prioritizes nodes using a score based on a kinetic property called profile shape similarity distance (PSSD) and two network structural properties, namely target downstream effect (TDE) and bridging centrality (BC) [111]. Putative target nodes are the nodes with high ranking scores.
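The pruning step can be illustrated with a minimal Python sketch. The reachability rule is not spelled out at this point in the text, so the sketch assumes that a candidate is kept only if a directed path leads from it to the output node. The function name `prune_candidates`, the edge-list representation, and the toy node names are hypothetical and are not taken from Pani's implementation.

```python
from collections import deque


def prune_candidates(edges, output_node):
    """Return the nodes that have a directed path to output_node.

    Assumption (not from the paper): the reachability rule keeps
    exactly the nodes upstream of the output node.
    """
    # Reverse adjacency: for each node, the set of its direct regulators.
    regulators = {}
    for src, dst in edges:
        regulators.setdefault(dst, set()).add(src)

    # Breadth-first search backwards from the output node.
    seen = {output_node}
    queue = deque([output_node])
    while queue:
        node = queue.popleft()
        for parent in regulators.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    seen.discard(output_node)  # the output node itself is not a candidate
    return seen


# Hypothetical toy network: R -> A -> B -> Out, plus side branches.
toy_edges = [("R", "A"), ("A", "B"), ("B", "Out"),
             ("A", "C"), ("Out", "D"), ("E", "D")]
kept = prune_candidates(toy_edges, "Out")
print(sorted(kept))  # ['A', 'B', 'R']
```

In this toy network, C, D, and E are dropped because perturbing them cannot propagate to the output node along the directed edges, which is the kind of saving a reachability-based pruning step is meant to provide before any scoring is done.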

Profile shape similarity distance measures the similarity between concentration-time series profiles (plots of a node's concentration against time) using a customized distance measure that, for example, permits delays and inversions. PSSD is a distance measure which identifies the most relevant upstream regulators. Target downstream effect measures the potential impact on the network when a node is perturbed. The impact is determined by the number of nodes downstream of the target and the likelihood that these nodes are associated with off-target effects. Bridging centrality identifies nodes that are located at a connecting bridge between modular subregions in a network [111]. Note that the structural properties (TDE and BC) are used to identify important nodes that the kinetic property (PSSD) fails to identify (e.g., Raf in the MAPK-PI3K network).
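To make the idea of a distance that "permits delays and inversions" concrete, the following Python sketch may help. It is not the PSSD definition, which is not given in this part of the text. As a stand-in, it takes one minus the largest absolute Pearson correlation over a small range of time shifts. The names `pearson`, `shape_distance`, and `max_delay`, and the sample profiles, are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def shape_distance(candidate, output, max_delay=3):
    """Illustrative stand-in for a delay- and inversion-tolerant distance.

    Shifts the output profile by 0..max_delay samples relative to the
    candidate and keeps the strongest correlation; abs() treats an
    inverted profile as equally similar.
    """
    if len(candidate) != len(output):
        raise ValueError("profiles must be sampled at the same time points")
    best = 0.0
    for delay in range(max_delay + 1):
        x = candidate[:len(candidate) - delay]
        y = output[delay:]
        if len(x) < 3:  # too few overlapping samples to compare shapes
            break
        best = max(best, abs(pearson(x, y)))
    return max(0.0, 1.0 - best)


# Hypothetical profiles: the output is the candidate inverted and delayed by 2 samples.
candidate_profile = [0, 1, 3, 2, 1, 0, 0, 0]
output_profile = [5, 5, 5, 4, 2, 3, 4, 5]
flat_profile = [1] * 8

d_related = shape_distance(candidate_profile, output_profile)
d_flat = shape_distance(candidate_profile, flat_profile)
```

Under these assumptions `d_related` is close to zero even though the two profiles look different at a glance, while the flat profile gets the maximum distance of 1.0. A plain pointwise distance would rank both as dissimilar, which is why tolerance to delays and inversions matters when looking for upstream regulators.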

In Section 5, we evaluate the performance of Pani by comparing it against two state-of-the-art global sensitivity analysis (GSA)-based techniques [296, 238] run on three signaling networks, namely MAPK-PI3K [93], myosin light chain (MLC) phosphorylation [160], and endomesoderm [138], which are implicated in ovarian cancer, atrial fibrillation, and embryonic deformity, respectively. Instead of defining success according to the internal logic of the original network, the goal is to agree with empirical outcome: namely, to predict the set of molecules that is actually targeted by drugs given to human patients. Our study shows that Pani can identify a majority of targets in these networks, and many of these targets are ignored by the two GSA-based approaches (multi-parametric sensitivity analysis (MPSA) [296] and Sobol [238]). Further, it is orders of magnitude faster than MPSA [296] and Sobol [238]. Finally, extrapolating trends from the results suggests some insights and possible reasons why empirical outcome of disease is not addressed well by sensitivity analysis.
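The comparison against a reference set of known drug targets can be sketched in a few lines of Python. The helper names `known_target_ranks` and `recall_at_k`, the node labels, and both rankings are hypothetical and exist only to show the bookkeeping. They are not results or metrics reported in the paper.

```python
def known_target_ranks(ranking, known_targets):
    """Map each known target to its 1-based rank (None if it is not ranked)."""
    position = {node: i + 1 for i, node in enumerate(ranking)}
    return {target: position.get(target) for target in known_targets}


def recall_at_k(ranking, known_targets, k):
    """Fraction of known targets that appear among the top-k ranked nodes."""
    top_k = set(ranking[:k])
    return len(top_k & set(known_targets)) / len(known_targets)


# Hypothetical rankings from two methods over the same five nodes.
ranking_a = ["n3", "n1", "n5", "n2", "n4"]
ranking_b = ["n2", "n4", "n3", "n5", "n1"]
reference_set = {"n1", "n3"}  # hypothetical clinically targeted nodes

ranks_a = known_target_ranks(ranking_a, reference_set)
recall_a = recall_at_k(ranking_a, reference_set, k=2)
recall_b = recall_at_k(ranking_b, reference_set, k=2)
```

Here the first ranking places both reference targets in its top two, while the second places neither there. A comparison of this kind is independent of how each method computes its scores, which fits the stated goal of judging methods by agreement with empirical outcome.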

2. RELATED WORK

Sensitivity analysis [296, 193, 108] is a family of closely related methods that is frequently proposed for target identification. Sensitivity analysis measures the effect of a parameter perturbation (e.g., a kinetic rate constant change) on the output node and assigns sensitivity values to a node based on the extent of output node perturbation. The parameters are ranked according to the sensitivity value and sensitive parameters (parameters with high sensitivity values) are then selected as potential targets [108]. The parameter values of a real biological network vary depending on genetics, cellular environment and cell type. Thus, no single “true” nominal parameter value exists. Hence, more appropriate