http://www.semoss.org

http://youtube.com/user/semossanalytics

http://twitter.com/semossanalytics

Agenda

Shark Tank Overview

SEMOSS

HHS Ignite Use Case

Demo

HHS Ignite is an “incubator for new ideas” run out of the HHS IDEA Lab.

NIAID SEB Innovation Challenge

HHS Ignite Innovation Program

Expansion

Jan ’14

Jul ’14 Aug ’14 Sep ’14

Nov ’14

The Evolution of HHS Ignite

Phase 1: HHS Ignite boot camp

Phase 2: NIH Interviews & Pilot Development

Phase 3: HHS Ignite Shark Tank

Vincent Munster, PhD

Peter Jahrling, PhD

NIH SEMOSS team with HHS Deputy Secretary Bill Corr & HHS CTO Bryan Sivak

Responding to an informal request for innovation ideas from the NIH’s National Institute of Allergy and Infectious Diseases (NIAID), a small Deloitte team submitted a written proposal. At the client’s suggestion, the proposal was submitted to the 2014 HHS IDEA Lab Ignite innovation tournament, and the team was selected to compete. During the 3-month pilot, the Deloitte team engaged 10 NIAID customers and created a functional proof-of-concept solution for 2 intramural scientists. The Deloitte/NIAID team successfully presented its pitch to a panel of relevant federal executives and the HHS CTO during the concluding Shark Tank on September 30, 2014.

SEMOSS Evolution

Solution History

SEMOSS is the result of several years of federal investment in federated, semantic web technology.

In 2010, the Deputy Chief Management Officer (DCMO) of the Department of Defense began experimenting with Semantic Web technology. The Military Health System (MHS), with help from Deloitte, created a graph-based toolset for multi-dimensional analysis of disparate data sources to determine investment sequencing for the MHS IT portfolio. After MHS presented its solution to DCMO, a joint investment was made to fund a similar tool utilizing the Semantic Web. The guiding principles of the tool were that it must be standards-based, allow integrating data from multiple sources, and adopt visualizations and analytics on an as-needed basis. This investment spawned SEMOSS.

Solution Evolution

2010-2011: Excel / Tableau
• Data Sources: 1-2
• Stakeholders: 1
• Analysis: excessive time spent on data preparation; analysis and visualization constraints
• Knowledge Exploration: minimal; repetitive visualizations
• Issues: focus on visualization; not malleable

2011-2012: Neo4J
• Data Sources: 3-4
• Stakeholders: 2-3
• Analysis: limited to single database; answers modeled as graph traversal demonstrations
• Knowledge Exploration: single dimensional; difficult to customize
• Issues: proprietary; long cycle times

2012-2013: SEMOSS
• Data Sources: no limit
• Stakeholders: no limit
• Analysis: integrated knowledge analytics environment; transitive across databases; collaboration; answers modeled as reports
• Knowledge Exploration: multi-dimensional; self service
• Issues: none, as the product was created to meet client needs

What Does Federated Analytics Mean?

Data
• Elastic data integration with more than 6 connectors, including Excel/CSV, NLP, RDBMS, and cloud-aware data sources
• Context-aware data that can link across databases
• W3C standards: RDF, SPARQL

Viz.
• Rich library of visualizations: parallel coordinates, Excel-style charting, network visualization, heat maps
• Extensibility to adopt any visualization
• Overlay visualizations to see overlaps

Analytics
• Graph algorithms
• Optimization: linear and non-linear algorithms
• Statistical algorithms
• Equation solving
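To make the RDF and SPARQL bullet concrete, the following is a minimal Python sketch of subject-predicate-object triples and a single SPARQL-style pattern match. It does not use SEMOSS or any RDF library; the `match` helper, the predicate names, and all identifiers are hypothetical placeholders chosen for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples; the identifiers are placeholders, not real records.
TRIPLES: List[Triple] = [
    ("gene:G1", "associatedWith", "disease:D1"),
    ("gene:G1", "mentionedIn", "pub:P1"),
    ("gene:G2", "associatedWith", "disease:D1"),
    ("pub:P1", "writtenBy", "author:A1"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None behaves like a query variable."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


# Roughly: SELECT ?gene WHERE { ?gene associatedWith disease:D1 }
genes_for_d1 = sorted(t[0] for t in match(TRIPLES, p="associatedWith", o="disease:D1"))
print(genes_for_d1)
```

Because every fact has the same three-part shape, triples from different sources can be appended to one list and queried with the same pattern, which is the property the bullets above rely on.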

Federate Data

Discover Insights

Perform Analysis

Visualize Decisions

Share Knowledge

Types of Visualizations Included in SEMOSS

HHS IGNITE INNOVATION PROGRAM USE CASE

Diverse Researchers across HHS

Dawei Lin, PhD, NIH: Computer Modeling

Vincent Munster, PhD, NIH: Infectious Diseases

Marie Parker, NIH: Research Initiatives

Susanna Visser, DrPH, CDC: ADHD

Common Research Goals

Data Access

Robust Analysis

Collaboration

Technology Barriers

Big Data

Inaccessibility

Isolated Analysis

Collaboration Barriers

Multiple Sources Integration Challenges

Dr. Munster’s Research

Vincent Munster, PhD, NIH: Infectious Diseases

Middle East Respiratory Syndrome (MERS)

The platform allows me to analyze and grasp large seemingly incomprehensible datasets.

- Vincent Munster, PhD

Dr. Munster’s Research Challenges

1) Diseases

2) Articles

3) Collaborators

Private Data

PubMed

FAERS

DisGeNet

PharmGKB

HGNC

Our Tested Solution

1) Diseases

2) Articles

3) Collaborators

Private Data

PubMed

FAERS

DisGeNet

PharmGKB

HGNC

Use Case Metamodel

Data sources: CTD, PharmGKB, PubMed, DisGeNet, DrugBank, HGNC, PubChem, Private Datasets

Concepts: Gene, Publication, Author, Chemical, Disease, Drug, Drug Component, Researcher Datasets, Pathway, Molecular Function, Biological Process, Chromosome, Cell Component

• No single database has exhaustive information. Multiple connections ensure complete data.

• The data sources above reflect the information requested by our customer. This solution can be easily customized for any researcher.
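One way to picture a metamodel like this is as a small graph of concept types, where answering a question means finding a chain of concepts between two of them. The sketch below is only illustrative: the slide names the concepts but not how they connect, so every edge in `METAMODEL` is an assumption, and `concept_path` is a hypothetical helper, not part of SEMOSS.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical edges between concept types; the connections are assumed.
METAMODEL: Dict[str, List[str]] = {
    "Author": ["Publication"],
    "Publication": ["Author", "Gene"],
    "Gene": ["Publication", "Disease", "Pathway"],
    "Disease": ["Gene", "Drug"],
    "Drug": ["Disease", "Chemical"],
    "Chemical": ["Drug"],
    "Pathway": ["Gene"],
}


def concept_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of concepts from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in METAMODEL.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of concepts links the two


print(concept_path("Author", "Disease"))
# → ['Author', 'Publication', 'Gene', 'Disease']
```

Customizing the solution for another researcher would, under this picture, amount to adding concept types and edges rather than rewriting the traversal.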

HHS IGNITE DEMO

Appendix

SEMOSS Supplementing Insights

Data Sources

1. Private Research Data
2. Online Mendelian Inheritance in Man (OMIM)
3. PubMed
4. HUGO Gene Nomenclature Committee (HGNC)
5. DrugBank
6. Comparative Toxicogenomics Database (CTD)
7. Disease Gene Network (DisGeNet)
8. PubChem
9. PharmGKB

Relevant Data

1. Gene Expression
2. Chemical
3. Cellular Pathway
4. Molecular Function
5. Biological Process
6. Cytolocation
7. Cell Component
8. Gene Nomenclature
9. Disease
10. Publication
11. Author

Solution Benefits & Capabilities

Researcher Benefits
• Data Accuracy: ensure you are using validated, authoritative sources
• Time Efficiency: eliminate days spent reading publications and searching for data
• Single Platform: use a centralized platform rather than multiple data locations
• Rapid Visualization & Analysis: gain insight and accelerate research
• Scientific Collaboration: secure public/private cloud instance for collaboration

Solution Capabilities
• Big Data: navigate and distill relevant data seamlessly
• Extensible, Scalable Data Model: shared model of understanding
• Undirected Research: what questions do we ask public data that we do not have answers to?
• Broad Applicability: across many subject areas and data types
• Open Data Initiatives: federal public data initiatives with no data consumption tool

SEMOSS maximizes HHS Open Data ROI by leveraging the vast networks of public and private life science data to promote insight and discovery.

Solution Overview

SEMOSS Solution: Federal Health Data Environment; Cloud Infrastructure; SEMOSS Platform; End Users

Scientific Use Case: a cancer researcher asks SEMOSS, "Which diseases are associated with my genes of interest?" Data sources shown: PharmGKB, CTD, PubMed, DisGeNet, HGNC, MESH, FAERS

Solution Demonstration

1) Diseases

2) Articles

3) Collaborators

Data Sources

1. Private Data
2. HGNC
3. OMIM
4. DisGeNet
5. CTD
6. PharmGKB
7. PubMed
8. FAERS
9. VAERS

SEMOSS Supplementing Insights

Identify Question

SEMOSS pre-packages more than eighty questions across domains that can be readily utilized. New questions can be modeled as reports.

Synthesize Meta Model

SEMOSS has more than ten different domain metamodels. New models can be created / extended to emulate mental models.

Find and Import Data

SEMOSS has industry data across healthcare, infrastructure, data and BPR that can be readily explored. Link Excel data or RDBMS to existing data for analysis.

Visual Analysis

SEMOSS allows automatic linking of data across databases and allows cross-database visualization. Users no longer need to import everything into a single database.
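The idea of querying across databases without first importing everything into one can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how SEMOSS implements it: each source is modeled as a separate lookup function, the source names are borrowed from the slides, and all gene and disease records are invented placeholders.

```python
from typing import Callable, Dict, List, Set, Tuple

# Each "database" stays separate and is reached only through a lookup function.
# The records below are invented placeholders, not real data from these sources.
SOURCE_A: Dict[str, Set[str]] = {"GENE1": {"Disease X"}, "GENE2": {"Disease Y"}}
SOURCE_B: Dict[str, Set[str]] = {"GENE1": {"Disease X", "Disease Z"}}

SOURCES: Dict[str, Callable[[str], Set[str]]] = {
    "DisGeNet": lambda gene: SOURCE_A.get(gene, set()),
    "CTD": lambda gene: SOURCE_B.get(gene, set()),
}


def diseases_for_genes(genes: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Ask every source about every gene and record which sources back each link."""
    links: Dict[Tuple[str, str], List[str]] = {}
    for gene in genes:
        for source_name, lookup in SOURCES.items():
            for disease in lookup(gene):
                links.setdefault((gene, disease), []).append(source_name)
    return links


result = diseases_for_genes(["GENE1", "GENE2"])
for (gene, disease), backing_sources in sorted(result.items()):
    print(gene, disease, backing_sources)
```

Keeping the source name with each gene-disease link shows where sources agree and where only one of them reports an association.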

The Team

Prabhu Kapaleeswaran, Author, SEMOSS, MHS

Joe Croghan, Project Supervisor, NIH

Brock Smith, Project Lead, NIH

Alexander Sherman, Technical SME, NIH

Karthik Balakrishnan, Technical Lead, NIH

LeeAnn Bailey, PhD, Science SME, FDA

Alexandra Kwit, Science Lead, NIH

Regina Cox, Data SME, CDC

Special Thanks to…

Vincent Munster, PhD, NIH

Mike Tartakovsky, NIH

Alex Rosenthal, NIH

Dawei Lin, PhD, NIH

Peter Jahrling, PhD, NIH

David Parrish, NIH