Scripps bioinformatics seminar_day_2

Day 2 of Computing on the shoulders of giants: how existing knowledge is represented and applied in bioinformatics

Benjamin Good
bgood@scripps.edu
Assistant Professor, Department of Molecular and Experimental Medicine

Recap from Day 1
• Make things (articles, genes, antibodies, etc.) easier to find
• Answer questions
• Generate hypotheses

• Controlled vocabularies (MeSH)
• Ontologies (Gene Ontology)
• Knowledge graphs on the Web: the SPARQL query language
• Knowledge plus computation = inference: the ABC model

Computing with knowledge
• Challenges with knowledge graphs
  • Too much data -> query, sort, visualize, interact
  • Not enough data -> mine for more
• Goal for the practical day: go beyond PubMed!
  • Gain hands-on experience using a knowledge graph, either with tools built for the purpose or with your own code

Assignment: knowledge graph to hypothesis
• Option 1: Coding
  • Implement and apply an ABC-model-style hypothesis-generating program (you can adapt the example provided)
  • Explain its logic, explain how you used it to generate a hypothesis, and explain the hypothesis (provide a visual)
• Option 2: Non-coding
  • Use one or more knowledge discovery applications (list provided) to define a new hypothesis
  • If you can't think of where to start, try to explain why metformin may contribute to cancer survival
• Assignment deliverables: a document containing
  • the inputs you gave to your program or the online tool(s) you used
  • what was generated in response, and the underlying logic
  • an image and text describing the results, especially any hypothesis you could derive
• (For Option 1, also submit any code written or files generated as a tar or zip archive)

Online tools for knowledge discovery
• http://knowledge.bio (* we make this one…)
• http://www.biograph.be (a good tool, but it often breaks down)
• http://epiphanet.uth.tmc.edu (also on the flaky side, but can be good)
• https://skr3.nlm.nih.gov/SemMed/ (works okay; requires a free account)
• http://arrowsmith.psych.uic.edu (ugly interface, but a good tool)

Demos
• http://knowledge.bio
• http://www.biograph.be
• http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/start.cgi

Example question: repurposing all drugs

http://tinyurl.com/hwm9388

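[Figure: query pattern linking a drug to ?disease through the edges "interacts with" (drug to protein), "encoded by" (protein to gene) and "genetic association" (gene to ?disease); the predicted edge is drug "treats??" ?disease]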
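The graph pattern above can be written as a SPARQL query. The sketch below is a minimal, hedged version, not the query behind the tinyurl link. It assumes the Wikidata properties P129 ("physically interacts with"), P702 ("encoded by") and P2293 ("genetic association"), and it calls the public endpoint with only the Python standard library. The User-Agent string is a placeholder.

```python
import json
import re
import urllib.parse
import urllib.request
from string import Template

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed property IDs: P2293 gene -> disease, P702 protein -> gene,
# P129 drug -> protein. $disease is replaced with a Wikidata item ID.
REPURPOSING_TEMPLATE = Template("""
SELECT DISTINCT ?drug ?drugLabel ?gene ?geneLabel WHERE {
  ?gene wdt:P2293 wd:$disease .
  ?protein wdt:P702 ?gene .
  ?drug wdt:P129 ?protein .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")


def build_repurposing_query(disease_qid):
    """Return a SPARQL query for drugs linked to a disease (A) through genes (B)."""
    # Accept only plain item IDs so the input cannot inject extra SPARQL.
    if not re.fullmatch(r"Q\d+", disease_qid):
        raise ValueError(f"not a Wikidata item ID: {disease_qid!r}")
    return REPURPOSING_TEMPLATE.substitute(disease=disease_qid)


def run_query(query, endpoint=WIKIDATA_ENDPOINT):
    """Send the query with HTTP GET and flatten the JSON result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Wikidata asks clients to identify themselves; this agent is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "abc-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    # Each binding maps a variable name to {"type": ..., "value": ...}.
    return [{name: cell["value"] for name, cell in row.items()}
            for row in data["results"]["bindings"]]
```

To use it, look up the item ID of your disease on wikidata.org and call `run_query(build_repurposing_query(...))` with that ID; each returned dict has `drug`, `drugLabel`, `gene` and `geneLabel` keys.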

Example program (feel free to follow it or adapt it to your interests)
• Example
  • Input = a disease (A)
  • Output = a ranked list of drugs (C) that might be used to treat it
  • Render the results of your workflow as a Cytoscape network that illustrates the reasoning behind the predictions
• Implementation
  • Python
  • Use a SPARQL endpoint such as http://query.wikidata.org
  • + Identify and use another endpoint (e.g. EBI, UniProt)
  • ++ Access PubMed articles and MeSH indexing
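Turning query results into the ranked list of drugs (C) can be as simple as counting distinct linking genes (B) per drug. The sketch below assumes each result row is a dict with `drugLabel` and `geneLabel` keys, which is one way of flattening Wikidata JSON results. The rows in the example are hypothetical, not real query output.

```python
from collections import defaultdict


def rank_drugs(rows):
    """Rank candidate drugs (C) by how many distinct genes (B) link them to A.

    Returns (drug, gene_count, sorted_genes) tuples, highest count first.
    """
    genes_per_drug = defaultdict(set)
    for row in rows:
        # A set counts each gene once, even if several proteins lead to it.
        genes_per_drug[row["drugLabel"]].add(row["geneLabel"])
    ranked = [(drug, len(genes), sorted(genes)) for drug, genes in genes_per_drug.items()]
    # Most genes first; ties broken alphabetically so the output is stable.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


example_rows = [  # hypothetical rows for illustration only
    {"drugLabel": "drug X", "geneLabel": "GENE1"},
    {"drugLabel": "drug X", "geneLabel": "GENE2"},
    {"drugLabel": "drug Y", "geneLabel": "GENE1"},
    {"drugLabel": "drug X", "geneLabel": "GENE2"},  # repeated path, counted once
]
for drug, n_genes, genes in rank_drugs(example_rows):
    print(drug, n_genes, genes)
# drug X 2 ['GENE1', 'GENE2']
# drug Y 1 ['GENE1']
```

Keeping the list of linking genes next to each count makes it easy to show the reasoning behind each prediction, which is what the Cytoscape rendering is meant to illustrate.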

Python setup
• pip install rdflib SPARQLWrapper pandas …
• Hopefully Jupyter is already installed; if not, install it: http://jupyter.readthedocs.io/en/latest/install.html
• Get the notebook from https://github.com/SuLab/sparql_to_pandas/blob/master/SPARQL_pandas.ipynb
• Go to the directory where you put the notebook
• Run it with
  >jupyter notebook
• It should be ready to run

The notebook
• Will run a basic search for disease-gene-drug connections in Wikidata
• Will sort the results by the number of intervening genes
• Will export the data to a tab-delimited file you can view in Excel or a text editor, or load into Cytoscape

Your job
• Run it and extend it by one or more of:
  • adapting the query
  • changing the way the results are sorted
  • working with the output in Cytoscape to produce an informative visualization
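If you write your own exporter instead of using the notebook's, a tab-delimited edge list with source, interaction and target columns is a convenient shape, since Cytoscape can import a network from a table file and let you pick those columns. This sketch is not the notebook's code: the row keys `drugLabel`, `geneLabel` and `diseaseLabel` and the interaction names are hypothetical, and the protein step is collapsed into one drug-gene edge.

```python
import csv


def write_cytoscape_edges(rows, path):
    """Write drug-gene and gene-disease edges as a tab-delimited file.

    Each row is assumed to be a dict with the hypothetical keys
    "drugLabel", "geneLabel" and "diseaseLabel". Duplicate edges are
    written once. Returns the number of edges written.
    """
    edges = set()
    for row in rows:
        edges.add((row["drugLabel"], "drug-gene", row["geneLabel"]))
        edges.add((row["geneLabel"], "gene-disease", row["diseaseLabel"]))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["source", "interaction", "target"])  # header row
        writer.writerows(sorted(edges))  # sorted for reproducible files
    return len(edges)
```

Collapsing duplicates matters because several proteins can connect the same drug and gene, and repeated rows would otherwise show up as parallel edges in Cytoscape.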

[Figure: example output rendered in Cytoscape]

Other queries from Day 1 (slides 48-54)
• Drugs that target a cancer and impact a specific biological process
  • http://tinyurl.com/j222k6g
• Drugs that could treat a new disease that is linked, through a biological pathway with shared genes, to the disease the drug is currently used to treat
  • http://tinyurl.com/gpfr9kj

Possible inputs for adaptations
• Browse and examine wikidata.org to see what you might make use of, e.g.:
  • Type of physical interaction between gene and drug
  • Gene Ontology annotation (what evidence codes?)
  • Disease Ontology hierarchy
  • Drug characteristics
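One way to use a disease hierarchy is to widen the query from a single disease to the disease plus all of its subtypes. This sketch assumes Wikidata records the hierarchy with P279 ("subclass of") and gene-disease links with P2293 ("genetic association"); the SPARQL 1.1 property path `wdt:P279*` follows zero or more subclass steps.

```python
import re


def build_subtype_gene_query(disease_qid):
    """Genes associated with a disease or any of its subclasses in Wikidata.

    Because wdt:P279* matches zero or more subclass steps, the disease
    itself is included along with all of its descendants.
    """
    # Accept only plain item IDs so the input cannot inject extra SPARQL.
    if not re.fullmatch(r"Q\d+", disease_qid):
        raise ValueError(f"not a Wikidata item ID: {disease_qid!r}")
    return (
        "SELECT DISTINCT ?subtype ?subtypeLabel ?gene ?geneLabel WHERE {\n"
        f"  ?subtype wdt:P279* wd:{disease_qid} .\n"
        "  ?gene wdt:P2293 ?subtype .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
    )
```

For a broad disease class this can return many more rows than the single-disease query, so expect slower queries and consider adding a LIMIT while experimenting.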

Other possible knowledge sources
• SPARQL
  • UniProt http://sparql.uniprot.org
  • EBI SPARQL https://www.ebi.ac.uk/rdf/documentation/sparql-endpoints
  • Look for unique identifiers on genes and proteins that you can use to link Wikidata content to their content
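A shared identifier such as a UniProt accession is one way to bridge the two endpoints: fetch accessions from Wikidata, then feed them to UniProt's SPARQL endpoint. This is a sketch under several assumptions to check against each service's documentation: Wikidata stores accessions under P352 ("UniProt protein ID"), UniProt names entries `http://purl.uniprot.org/uniprot/<accession>`, and the entry name is available as `up:mnemonic`. The accession check is deliberately loose.

```python
import re

# Run on https://query.wikidata.org: proteins with UniProt accessions whose
# genes have any genetic association (P702 = encoded by, P2293 = genetic association).
WIKIDATA_UNIPROT_IDS = """
SELECT ?protein ?proteinLabel ?uniprot WHERE {
  ?protein wdt:P352 ?uniprot ;
           wdt:P702 ?gene .
  ?gene wdt:P2293 ?disease .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

UNIPROT_ENTRY = "http://purl.uniprot.org/uniprot/"


def uniprot_lookup_query(accessions):
    """Build a query for sparql.uniprot.org that fetches entry names."""
    for acc in accessions:
        # Loose check: accessions are 6 or 10 uppercase letters/digits.
        # It also keeps arbitrary text out of the generated query.
        if not re.fullmatch(r"[A-Z0-9]{6}|[A-Z0-9]{10}", acc):
            raise ValueError(f"unexpected accession: {acc!r}")
    values = " ".join(f"<{UNIPROT_ENTRY}{acc}>" for acc in accessions)
    return (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "SELECT ?protein ?mnemonic WHERE {\n"
        f"  VALUES ?protein {{ {values} }}\n"
        "  ?protein up:mnemonic ?mnemonic .\n"
        "}\n"
    )
```

The `VALUES` clause keeps the second query small: only proteins already found through Wikidata are looked up, rather than scanning UniProt as a whole.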

• Text
  • Use the NCBI E-utilities (E-utils) API to programmatically access PubMed articles and MeSH indexing http://www.ncbi.nlm.nih.gov/books/NBK25501/
  • You can use it to build co-occurrence networks of, e.g., MeSH terms
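A MeSH co-occurrence network can be built from two E-utilities calls: `esearch` to get PubMed IDs for a query, and `efetch` to get the PubMed XML records, whose `MeshHeading/DescriptorName` elements hold the MeSH indexing. The sketch below only builds the URLs and parses XML you have already downloaded, so the network step (and NCBI's request-rate limits) stay under your control. The XML in the test is a hand-made fragment, not a real record.

```python
import itertools
import urllib.parse
import xml.etree.ElementTree as ET
from collections import Counter

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, retmax=100):
    """URL that returns PubMed IDs (PMIDs) matching a query, as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def efetch_url(pmids):
    """URL that returns full PubMed records (including MeSH headings) as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return EUTILS + "efetch.fcgi?" + urllib.parse.urlencode(params)


def mesh_terms_per_article(pubmed_xml):
    """Map each PMID to the set of MeSH descriptor names indexed for it."""
    root = ET.fromstring(pubmed_xml)
    result = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        result[pmid] = {d.text for d in article.iter("DescriptorName")}
    return result


def mesh_cooccurrence(terms_by_article):
    """Count, for each unordered pair of MeSH terms, the articles they share."""
    counts = Counter()
    for terms in terms_by_article.values():
        # Sorting makes (x, y) and (y, x) the same key.
        for pair in itertools.combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts
```

The resulting pair counts can be written out as weighted edges and loaded into Cytoscape, or used as edge weights for the ABC ranking methods.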

Good luck! Ask questions!

ABC ranking algorithms
• Out of all C, which are most strongly related to A?
• Rank by the number of shared B concepts:
  • c2: 4, c4: 3, c1: 1, c3: 1, c5: 1, c6: 1
• Next level: adjust to down-weight highly connected nodes

[Figure: A linked through intermediate B concepts to candidate C concepts c1-c6]
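Both ranking ideas on this slide can be sketched in a few lines. The version below treats the knowledge graph as undirected edges, skips C concepts that are already directly linked to A, and scores each C by the number of shared B concepts. For down-weighting it uses 1 / degree(B), which is just one simple choice (not a method prescribed here) so that hub-like B concepts count for less. The toy graph is hypothetical and is not the graph in the figure.

```python
from collections import defaultdict


def abc_rank(edges, a, downweight=False):
    """Rank C concepts reachable from A through intermediate B concepts.

    edges: iterable of undirected (x, y) pairs.
    downweight=False: score = number of B concepts shared with A.
    downweight=True:  each shared B adds 1 / degree(B) instead of 1.
    """
    neighbours = defaultdict(set)
    for x, y in edges:
        neighbours[x].add(y)
        neighbours[y].add(x)
    b_concepts = neighbours[a]
    scores = defaultdict(float)
    for b in b_concepts:
        for c in neighbours[b]:
            # Skip A itself and concepts already directly linked to A.
            if c == a or c in b_concepts:
                continue
            scores[c] += (1 / len(neighbours[b])) if downweight else 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical graph: b2 and b3 are hubs, b1 is specific to c1.
toy_edges = [("A", "b1"), ("A", "b2"), ("A", "b3"), ("b1", "c1")]
toy_edges += [(b, c) for b in ("b2", "b3") for c in ("c2", "c3", "c4", "c5")]

print(abc_rank(toy_edges, "A")[0])                   # ('c2', 2.0)
print(abc_rank(toy_edges, "A", downweight=True)[0])  # ('c1', 0.5)
```

In the toy graph, plain counting favours c2 to c5 (two shared B each), while down-weighting promotes c1, which is reached through a B concept linked to nothing else.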

ABC ranking algorithms: advanced (these require large networks to be useful)
• Average Minimum Weight (AMW) (Wren)
  • http://bioinformatics.oxfordjournals.org/content/20/3/389.full.pdf
• Linking Term Count with Average Minimum Weight (LTC-AMW) (Yetisgen-Yildiz and Pratt)
  • https://www.researchgate.net/publication/23759128_A_new_evaluation_methodology_for_literature-based_discovery_systems
• Predicate inter-dependence (Rastegar-Mojarad)
  • https://s3.amazonaws.com/uploads.hipchat.com/25885/154162/UaGvvQqbrhPBAWN/A%20new%20method.pdf
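As these measures are commonly described, AMW scores a candidate C by averaging, over its shared B concepts, the weaker of the two edge weights w(A, B) and w(B, C). LTC-AMW ranks first by linking term count (the number of shared B) and uses AMW to break ties. The sketch below follows that reading and should be checked against the cited papers before you rely on it. Edge weights are assumed to be given (for example, association scores from co-occurrence); how to compute them is left open, and the toy weights are hypothetical.

```python
from collections import defaultdict


def ltc_amw_rank(weights, a):
    """Score C concepts by linking term count (LTC) and Average Minimum Weight (AMW).

    weights: dict mapping undirected (x, y) pairs to an edge weight.
    Returns (c, ltc, amw) tuples ordered by LTC, ties broken by AMW.
    """
    neighbours = defaultdict(dict)
    for (x, y), w in weights.items():
        neighbours[x][y] = w
        neighbours[y][x] = w
    linked_to_a = neighbours[a]
    minima = defaultdict(list)  # c -> [min(w(A,b), w(b,c)) for each shared b]
    for b, w_ab in linked_to_a.items():
        for c, w_bc in neighbours[b].items():
            # Skip A itself and concepts already directly linked to A.
            if c == a or c in linked_to_a:
                continue
            minima[c].append(min(w_ab, w_bc))
    scored = [(c, len(m), sum(m) / len(m)) for c, m in minima.items()]
    scored.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return scored


toy_weights = {  # hypothetical association weights
    ("A", "b1"): 0.9, ("A", "b2"): 0.2,
    ("b1", "c1"): 0.8, ("b2", "c1"): 0.7,
    ("b1", "c2"): 0.1, ("b2", "c2"): 0.9,
    ("b1", "c3"): 0.6,
}
for c, ltc, amw in ltc_amw_rank(toy_weights, "A"):
    print(c, ltc, round(amw, 2))
```

With these weights, c1 and c2 tie on LTC (2 each) and AMW puts c1 ahead (0.5 vs 0.15); c3 comes last despite the highest AMW (0.6), because it has only one linking term. Sorting by AMW alone would put c3 first.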
