When Healthcare Meets Data Science (Anastasiia Kornilova Technology Stream)

When Healthcare Meets Data Science

Anastasiia Kornilova

http://www.slideshare.net/WebCongress/mars-one-bas-lansdorp

The Medicine of the Future

http://www.healthbizdecoded.com/2013/05/hies-meeting-the-sustainability-challenge/

http://graphics.wsj.com/infectious-diseases-and-vaccines/

«One or two patients died per week in a certain smallish town because of the lack of information flow between the hospital’s emergency room and the nearby mental health clinic»

[«Doing Data Science», O’Neil]

60% of US doctors still use paper medical records

Let’s create our own EHR standard

Patient gender   Code
Male             0
Female           1

Patient gender   Code
Male             1
Female           0

Patient gender   Code
Male             M
Female           F
Unknown          U

Let’s code gender

Standard A

Standard B

Standard C
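A minimal sketch of what reconciling these codings looks like in practice. It assumes the three tables above correspond to Standards A, B and C in that order; the names `GENDER_CODES` and `normalize_gender` are hypothetical and only serve the illustration.

```python
# Code tables copied from the slides; the standard_a/b/c keys are assumed labels.
GENDER_CODES = {
    "standard_a": {"0": "male", "1": "female"},
    "standard_b": {"1": "male", "0": "female"},
    "standard_c": {"M": "male", "F": "female", "U": "unknown"},
}


def normalize_gender(standard, code):
    """Translate a source-specific gender code into one shared vocabulary."""
    table = GENDER_CODES[standard]
    # Codes may arrive as ints or lower-case strings, so normalise first.
    return table.get(str(code).strip().upper(), "unknown")


# The same raw value 0 means different things depending on the source.
records = [("standard_a", 0), ("standard_b", 0), ("standard_c", "f")]
normalized = [normalize_gender(standard, code) for standard, code in records]
print(normalized)
```

Every new source needs its own table, which is the maintenance burden the next slides describe.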

There are 5 key data standards

ICD - diagnoses, billing, world-wide

CPT - procedures, billing, US-specific, classification

LOINC - lab tests and observations, world-wide

NDC - medications, US-specific, classification

SNOMED - clinical terminology across medicine

… and a lot of custom standards

Even within one data standard:

ICD-9

174 malignant neoplasm of female breast

174.1 malignant neoplasm of central portion of female breast

ICD-10

C50 malignant neoplasm of breast

C50.1 malignant neoplasm of central portion of breast

C50.111 malignant neoplasm of central portion of right female breast

C50.112 malignant neoplasm of central portion of left female breast
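The codes above get more specific as characters are added. A small sketch of that nesting, assuming a parent code can be obtained by trimming trailing characters (true for the examples on this slide; `code_ancestors` is a hypothetical helper, not part of any standard tooling):

```python
def code_ancestors(code):
    """Return the less specific codes obtained by trimming trailing characters."""
    compact = code.replace(".", "")
    ancestors = []
    # Stop at three characters, the category level in both ICD-9 and ICD-10.
    for length in range(len(compact) - 1, 2, -1):
        prefix = compact[:length]
        if length == 3:
            ancestors.append(prefix)
        else:
            ancestors.append(prefix[:3] + "." + prefix[3:])
    return ancestors


print(code_ancestors("C50.111"))
print(code_ancestors("174.1"))
```

The ICD-10 code carries laterality and sex in its extra characters, so it has more levels above it than the ICD-9 code for the same site.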

You have to be a doctor to handle them

Problem summary

Standard 1

Standard 2

Standard N

medical expertise

a lot of (expensive) hours

Knowledge

Standards are changing

Artificial Intelligence Way

Feed a lot of medical texts to «medical doctor»

Use NLP power

Make it unsupervised

Key idea:

«Semantically similar words occur in similar contexts», Harris, 1954

«You shall know a word by the company it keeps», Firth, 1957

«It was the year when Udacity, Coursera and edX, the three leading MOOC companies, took the education world by storm and promised a lot» [Huffington Post]

«Many places offer MOOCs, and many more will. But Coursera, Udacity and edX are the leading providers.» [NYTimes]
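The two quotes can be used to check the key idea by hand. The sketch below counts the words that appear within a few positions of a target word; the window size of 3 and the helper name `context_counts` are assumptions made for the illustration.

```python
import re
from collections import Counter

# The two quoted sentences from the slides.
sentences = [
    "It was the year when Udacity, Coursera and edX, the three leading MOOC "
    "companies, took the education world by storm and promised a lot",
    "Many places offer MOOCs, and many more will. But Coursera, Udacity and "
    "edX are the leading providers.",
]


def context_counts(target, window=3):
    """Count the words seen within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == target:
                start = max(0, i - window)
                neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts


udacity = context_counts("udacity")
coursera = context_counts("coursera")
# Words that keep company with both names.
shared = sorted(set(udacity) & set(coursera))
print(shared)
```

Even on two sentences the company names share most of their neighbours, which is the signal a model trained on a large corpus picks up.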

Distributed Vector Representations

Two-layer neural network

Input: text corpus

Output: set of vectors

Groups the vectors of similar words together in vector space (detects similarities mathematically)
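"Detecting similarity mathematically" usually means comparing vectors with cosine similarity. A minimal sketch with made-up three-dimensional vectors (real models learn hundreds of dimensions; the numbers below are invented for illustration only):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors, not the output of any trained model.
vectors = {
    "coursera": [0.9, 0.8, 0.1],
    "udacity": [0.85, 0.75, 0.2],
    "hospital": [0.1, 0.2, 0.9],
}

similar = cosine_similarity(vectors["coursera"], vectors["udacity"])
different = cosine_similarity(vectors["coursera"], vectors["hospital"])
print(round(similar, 3), round(different, 3))
```

Words used in similar contexts end up with a similarity close to 1, unrelated words with a much lower value.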

Predict a word using context

All you need is love
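Predicting a word from its neighbours is what the word2vec papers call the continuous bag-of-words (CBOW) setup. A sketch of how training examples could be cut from the example sentence, assuming a window of two words on each side (`cbow_pairs` is a hypothetical helper):

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, target word) training pairs from a token list."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side, excluding the target itself.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


pairs = cbow_pairs("all you need is love".split())
for context, target in pairs:
    print(context, "->", target)
```

The network is trained to output the target given the context; the word vectors are a by-product of that training.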

Resulting vectors

All you

need is

love

[0.2, 0.11, 0.87, 0.9, …, 0.2] [0.1, 0.98, 0.1, 0.26, …, 0.82] [0.7, 0.22, 0.3, 0.1, …, 0.45]

[0.5, 0.21, 0.67, 0.82, …, 0.49] [0.6, 0.34, 0.21, 0.45, …, 0.2]

Vector Relationships

http://nlp.stanford.edu/projects/glove/images/company_ceo.jpg

http://nlp.stanford.edu/projects/glove/images/comparative_superlative.jpg
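The comparative/superlative picture suggests that a relationship between words shows up as a roughly constant offset between their vectors. A sketch with hand-made two-dimensional vectors, chosen so the offsets are exact (real embeddings only approximate this; `analogy` is a hypothetical helper):

```python
import math

# Hand-made toy vectors: the second coordinate encodes the degree of comparison.
vectors = {
    "strong": [1.0, 0.0], "stronger": [1.0, 1.0], "strongest": [1.0, 2.0],
    "clear": [3.0, 0.0], "clearer": [3.0, 1.0], "clearest": [3.0, 2.0],
}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' via the offset b - a + c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(query, vectors[w]))


answer = analogy("strong", "stronger", "clear")
print(answer)
```

With these toy vectors the query lands exactly on "clearer"; with learned vectors the nearest neighbour of the query is taken as the answer.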

ICD-9

174 malignant neoplasm of female breast

174.1 malignant neoplasm of central portion of female breast

ICD-10

C50 malignant neoplasm of breast

C50.1 malignant neoplasm of central portion of breast

C50.111 malignant neoplasm of central portion of right female breast

C50.112 malignant neoplasm of central portion of left female breast
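To make the matching idea concrete, here is a deliberately simplified sketch that links each ICD-9 description to the closest ICD-10 description. It uses plain word overlap (Jaccard similarity) as a stand-in for learned word vectors, so it is not the approach from the talk, only an illustration of the nearest-match step; `best_match` is a hypothetical helper.

```python
# Descriptions copied from the slides.
icd9 = {
    "174": "malignant neoplasm of female breast",
    "174.1": "malignant neoplasm of central portion of female breast",
}
icd10 = {
    "C50": "malignant neoplasm of breast",
    "C50.1": "malignant neoplasm of central portion of breast",
}


def jaccard(text_a, text_b):
    """Share of distinct words the two descriptions have in common."""
    a, b = set(text_a.split()), set(text_b.split())
    return len(a & b) / len(a | b)


def best_match(description, candidates):
    """Return the candidate code whose description overlaps the most."""
    return max(candidates, key=lambda code: jaccard(description, candidates[code]))


mapping = {code: best_match(text, icd10) for code, text in icd9.items()}
print(mapping)
```

Word overlap breaks down as soon as two standards use different vocabulary for the same concept, which is where vectors learned from medical text are meant to help.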

Summary

Links

Efficient Estimation of Word Representations in Vector Space (Mikolov et al.)

Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al.)

word2vec Parameter Learning Explained (Rong)

Questions?
