Exploration, sharing and privacy of data Linked Data – challenges for Imagiology and Radiology Francisco Couto Ciências ULisboa FISMED 2017 – 6 Nov 2017

Page 1: Linked Data – challenges for Imagiology and Radiology

Exploration, sharing and privacy of data

Linked Data – challenges for Imagiology and Radiology

Francisco Couto Ciências ULisboa

FISMED 2017 – 6 Nov 2017

Page 2: Linked Data – challenges for Imagiology and Radiology
Page 3: Linked Data – challenges for Imagiology and Radiology

Images

Text

Data

Linked Data

Page 4: Linked Data – challenges for Imagiology and Radiology

Linked Data

Spouse

Daughter

Page 5: Linked Data – challenges for Imagiology and Radiology

Application

• Similar medical images?

• Computer-assisted image processing

• Related medical images?

• Which are not necessarily similar

• Linked Data

Page 6: Linked Data – challenges for Imagiology and Radiology

Example

Discover new drugs to treat Alzheimer’s

What proteins are involved in signal transduction and are related to pyramidal neurons?

Tim Berners-Lee’s Linked Data slides, from TED 2009

(Tim Berners-Lee is best known as the inventor of the World Wide Web)

http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
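
The question on this slide can be read as a join over subject-predicate-object statements. The following is a minimal sketch, assuming a toy in-memory dataset; the protein names and predicate names are hypothetical placeholders, not real records:

```python
# Toy statements in (subject, predicate, object) form.
# All names below are hypothetical and only illustrate the idea.
TRIPLES = {
    ("proteinA", "involved_in", "signal transduction"),
    ("proteinA", "related_to", "pyramidal neuron"),
    ("proteinB", "involved_in", "signal transduction"),
    ("proteinC", "related_to", "pyramidal neuron"),
}


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject that has the given predicate/object pair."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


# Proteins that satisfy both conditions: the intersection of two lookups.
candidates = subjects("involved_in", "signal transduction") & subjects(
    "related_to", "pyramidal neuron"
)
print(sorted(candidates))
```

A keyword search would match each phrase separately; the linked statements allow both conditions to be required of the same entity.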

Page 7: Linked Data – challenges for Imagiology and Radiology

Many hits but no relevant results

Page 8: Linked Data – challenges for Imagiology and Radiology

Linked healthcare data

32 hits, 32 results

Page 9: Linked Data – challenges for Imagiology and Radiology

LINKED DATA

Page 10: Linked Data – challenges for Imagiology and Radiology

Definition

The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web using international standards of the World Wide Web Consortium.

Page 11: Linked Data – challenges for Imagiology and Radiology

Wikipedia

• Infoboxes

• box in the upper right of the page

• names, dates, places

• DBpedia project

• extracts this structured data

• publish as Linked Data

http://dbpedia.org
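
As an illustration of how the published data could be requested, here is a minimal sketch that only builds a query URL and does not send it. It assumes DBpedia's public SPARQL endpoint and uses the resource for Sintra as an arbitrary example; the helper name `build_query_url` is hypothetical:

```python
from urllib.parse import urlencode

# DBpedia's public SPARQL endpoint (assumed to be reachable at this address).
ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(resource_uri, limit=10):
    """Build a URL asking for the property/value pairs of one resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        f"<{resource_uri}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )
    # The SPARQL protocol passes the query text in the 'query' parameter.
    return ENDPOINT + "?" + urlencode({"query": query})


# Example resource chosen for illustration only.
url = build_query_url("http://dbpedia.org/resource/Sintra")
print(url)
```

Sending the request and parsing the response are left out on purpose, since they depend on the network and on the result format requested.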

Page 12: Linked Data – challenges for Imagiology and Radiology
Page 13: Linked Data – challenges for Imagiology and Radiology

State of the LOD Cloud 2014

Topic                     Datasets   Share
Government                     183   18.05%
Publications                    96    9.47%
Life Sciences                   83    8.19%
User-generated content          48    4.73%
Cross-domain                    41    4.04%
Media                           22    2.17%
Geographic                      21    2.07%
Social Web                     520   51.28%
Total                         1014

Page 14: Linked Data – challenges for Imagiology and Radiology

Important properties

• easily combined with other Linked Data

• best reason to explore and use Linked Data

• Important for data sharing

• self-documenting

• immediately figure out what a term means

• Linked Data can be private

• widely deployed behind enterprise firewalls on private networks

Page 15: Linked Data – challenges for Imagiology and Radiology

Linked Data from Forms

• Manual human annotation, as in Wikipedia

• only works with a large user base

• Selecting the right terms is non-trivial

• takes time

• requires knowledge

• Common problems

• missing values

• too generic to be useful

B. Inácio, J. Ferreira, and F. Couto, Metadata analyser: measuring metadata quality, in Practical Applications of Computational Biology and Bioinformatics (PACBB), pp. 197-204, 2017

Page 16: Linked Data – challenges for Imagiology and Radiology

Alternatives for Imagiology

• Explore the written reports

• Text mining approaches

• Use a specialized terminology (RadLex)

• multilingual approaches

• Sharing and privacy

• Not the images, only the metadata

• There are still privacy concerns

Page 17: Linked Data – challenges for Imagiology and Radiology

Imagiology Written Reports

• written in free-text

• well structured

• more than any other clinical notes

• since the information has to pass from one health care professional to another

Page 18: Linked Data – challenges for Imagiology and Radiology

Text Mining Tools

• Dictionary-based

• only requires a common terminology

• limited to that terminology

• ambiguity of terms

• Machine Learning

• requires a training set

• not limited to a terminology

• learns with experience
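
The dictionary-based approach can be sketched in a few lines. This is a minimal illustration, assuming a tiny made-up lexicon; the identifiers are hypothetical and are not real RadLex codes:

```python
import re

# Hypothetical mini-lexicon: surface form -> made-up identifier.
LEXICON = {
    "pleural effusion": "TERM:0001",
    "effusion": "TERM:0002",
    "left lung": "TERM:0003",
    "lung": "TERM:0004",
}


def annotate(text, lexicon=LEXICON):
    """Return (start, end, surface form, identifier), preferring longer terms."""
    taken = [False] * len(text)  # characters already covered by a match
    found = []
    # Try longer terms first so "pleural effusion" wins over "effusion".
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                found.append((m.start(), m.end(), m.group(0), lexicon[term]))
                for i in range(m.start(), m.end()):
                    taken[i] = True
    return sorted(found)


annotations = annotate("Small pleural effusion in the left lung.")
print(annotations)
```

The sketch shows both limitations listed above: nothing outside the lexicon is found, and an ambiguous term is matched regardless of its context.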

Page 19: Linked Data – challenges for Imagiology and Radiology

RadLex

• RSNA has produced RadLex(R)

• ontology focused on radiology

• terms (i.e. a lexicon) and the relationships

• over 75,000 terms and synonyms

https://www.rsna.org/RadLex.aspx

Page 20: Linked Data – challenges for Imagiology and Radiology

WHY USE A COMMON TERMINOLOGY?

Page 21: Linked Data – challenges for Imagiology and Radiology

Cell

Page 22: Linked Data – challenges for Imagiology and Radiology

London Bills of Mortality listed possible ways to die throughout the sixteenth, seventeenth and eighteenth centuries

Source: http://faculty.up.edu/asarnow/popular7.htm

Page 23: Linked Data – challenges for Imagiology and Radiology

Aggregated Stats

Page 24: Linked Data – challenges for Imagiology and Radiology

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:description>
      Gold collar. It was made from three circular sectioned and tapering gold bars that are fused at the ends forming a penannular neck-ring.
    </dc:description>
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal</dc:location>
    <dc:type>Gold</dc:type>
  </rdf:Description>
</rdf:RDF>

Page 25: Linked Data – challenges for Imagiology and Radiology

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:description>
      Gold collar. It was made from three circular sectioned and tapering gold bars that are fused at the ends forming a penannular neck-ring.
    </dc:description>
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal http://yboss.yahooapis.com/geo/placefinder?woeid=748874</dc:location>
    <dc:type>Gold http://purl.obolibrary.org/obo/CHEBI_30050</dc:type>
  </rdf:Description>
</rdf:RDF>
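
The record above can be read back into subject-predicate-value statements with a standard XML parser. The following is a minimal sketch using only the Python standard library; the function name `triples_from_rdf` and the rule of treating any token starting with "http" as a link are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal http://yboss.yahooapis.com/geo/placefinder?woeid=748874</dc:location>
    <dc:type>Gold http://purl.obolibrary.org/obo/CHEBI_30050</dc:type>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdf(xml_text):
    """Return (subject, predicate, label, link) for each described property."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree tags look like "{namespace}localname".
            predicate = child.tag.replace("{", "").replace("}", "")
            words = (child.text or "").split()
            # Separate the human-readable label from any embedded link.
            links = [w for w in words if w.startswith("http")]
            label = " ".join(w for w in words if not w.startswith("http"))
            out.append((subject, predicate, label, links[0] if links else None))
    return out


statements = triples_from_rdf(RDF_XML)
for statement in statements:
    print(statement)
```

With the links in place, "Gold" is no longer just a string: it points to a shared identifier that other datasets can reference as well.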

Page 26: Linked Data – challenges for Imagiology and Radiology

[Figure: ontology diagram with is-a relations and mappings. Terms shown: Metal, Coinage, Precious, Silver, Palladium, Gold, Platinum, Copper.]

Page 28: Linked Data – challenges for Imagiology and Radiology
Page 29: Linked Data – challenges for Imagiology and Radiology
Page 30: Linked Data – challenges for Imagiology and Radiology
Page 31: Linked Data – challenges for Imagiology and Radiology
Page 32: Linked Data – challenges for Imagiology and Radiology

Language

Usually written in the native language

Text Mining tools mostly developed for English

RadLex is currently only available in English

German and Portuguese translations of RadLex are currently in development

An obstacle to sharing information

Page 33: Linked Data – challenges for Imagiology and Radiology

Two approaches

1) Translate the lexicon itself

• each new version of the lexicon requires a new translation

2) Translate the reports

• efficient automatic translation services are now available

• state-of-the-art Text Mining tools tuned for English

Page 34: Linked Data – challenges for Imagiology and Radiology

Multilingual Reports

• reports accessible to any doctor

• who understands English

• tourists access their reports in their language

• send them to their personal doctor at home

• hospital can get a highly specialized second opinion

• in complex clinical cases from international experts

Page 35: Linked Data – challenges for Imagiology and Radiology

Translation Efficiency

• 51 Research Articles related to Radiology

• originally written in Portuguese

• a human translation into English was available

• We measured NER accuracy

• using Yandex, Google and Unbabel

L. Campos, V. Pedro, and F. Couto, Impact of translation on named-entity recognition in radiology texts, Database, vol. 2017, no. bax064, pp. 1-9, 2017
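
One common way to quantify this kind of comparison is to treat the entities found in the human translation as the reference and score the entities found in each machine translation against it. The sketch below assumes precision, recall and F1 over sets of term identifiers; the slide does not state which measure was used, and the identifiers are hypothetical:

```python
def precision_recall_f1(reference, predicted):
    """Score a predicted set of entities against a reference set."""
    reference, predicted = set(reference), set(predicted)
    true_positives = len(reference & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical term identifiers recognized in each version of one text.
human_translation = {"TERM:0001", "TERM:0003", "TERM:0007"}
machine_translation = {"TERM:0001", "TERM:0003", "TERM:0009"}

p, r, f1 = precision_recall_f1(human_translation, machine_translation)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```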

Page 36: Linked Data – challenges for Imagiology and Radiology
Page 37: Linked Data – challenges for Imagiology and Radiology

Multilingual System Prototype

• Given a report

• Automatically translate the report

• Find the most similar reports

• based on the most similar RadLex terms
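
The retrieval step can be sketched as a ranking over the sets of terms annotated in each report. This is a minimal illustration that assumes Jaccard overlap as the similarity measure, which may differ from the measure used in the prototype; report names and term identifiers are hypothetical:

```python
def jaccard(a, b):
    """Jaccard similarity: shared terms divided by all distinct terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_terms, corpus, top_k=3):
    """Rank reports in the corpus by term overlap with the query report."""
    scored = [(rid, round(jaccard(query_terms, terms), 3)) for rid, terms in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical reports, each reduced to the terms annotated in it.
CORPUS = {
    "report-1": {"TERM:0001", "TERM:0003"},
    "report-2": {"TERM:0003", "TERM:0005", "TERM:0006"},
    "report-3": {"TERM:0008"},
}

ranking = most_similar({"TERM:0001", "TERM:0003", "TERM:0005"}, CORPUS, top_k=2)
print(ranking)
```

Exact overlap ignores related terms; a measure that also uses the relationships in the terminology could rank reports that share no identical term.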

Page 38: Linked Data – challenges for Imagiology and Radiology
Page 39: Linked Data – challenges for Imagiology and Radiology
Page 40: Linked Data – challenges for Imagiology and Radiology
Page 41: Linked Data – challenges for Imagiology and Radiology

PRIVACY

Page 42: Linked Data – challenges for Imagiology and Radiology

Privacy in Imagiology

• Like sharing photos

• Once on the web, it stays on the web

• An issue can be passed from family to family

• Encrypt data

• If the data is important, breaking the encryption is a question of time

• Cloud storage

• Is it secure?

• Split through different providers.
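
One possible reading of splitting data across providers is sketched below: the data is masked with a random pad, and the pad and the masked bytes are stored with different providers, so that neither share alone reveals the content. This is an assumption about what the bullet means, not a description of a specific system; the function names are hypothetical:

```python
import secrets


def split(data: bytes):
    """Split data into two shares; both are needed to recover it."""
    pad = secrets.token_bytes(len(data))  # random share for provider A
    masked = bytes(a ^ b for a, b in zip(data, pad))  # share for provider B
    return pad, masked


def combine(share_a: bytes, share_b: bytes) -> bytes:
    """Recover the original data by XOR-ing the two shares."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


share_a, share_b = split(b"imaging report")
restored = combine(share_a, share_b)
print(restored)
```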

Page 43: Linked Data – challenges for Imagiology and Radiology

Privacy in linked data

• You can share metadata without sharing truly privacy-sensitive information

• show what kind of images are available

• You control who receives that data

• using a secure channel

M. Fernandes, J. Decouchant, F. Couto, and P. Verissimo, Cloud-assisted read alignment and privacy, in Practical Applications of Computational Biology and Bioinformatics (PACBB), pp. 220-227, 2017
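
Sharing only descriptive metadata can be sketched as an allow-list applied to each record before it is published. The field names and the record below are hypothetical; which fields are safe to expose is a policy decision that this sketch does not settle:

```python
# Hypothetical allow-list of fields considered safe to publish.
SHAREABLE_FIELDS = {"modality", "body_part", "findings_terms", "year"}


def shareable_view(record, allowed=SHAREABLE_FIELDS):
    """Return a copy of the record containing only allow-listed fields."""
    return {key: value for key, value in record.items() if key in allowed}


# Hypothetical internal record for one imaging study.
record = {
    "patient_name": "Jane Doe",
    "birth_date": "1970-01-01",
    "modality": "MRI",
    "body_part": "brain",
    "findings_terms": ["TERM:0001"],
    "year": 2017,
}

public = shareable_view(record)
print(public)
```

An allow-list fails closed: a field added to the record later stays private until someone decides to expose it. Combinations of published fields can still narrow down who a record belongs to.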

Page 44: Linked Data – challenges for Imagiology and Radiology

Final Remarks (1)

• Linked Data is not necessarily free or open data

• nor is it necessarily sound data

• it can have access restrictions

• it can be incomplete and contain errors

• But many successful use cases in the Life and Health Sciences

M. Barros and F. Couto, Knowledge representation and management: a linked data perspective, IMIA Yearbook of Medical Informatics, pp. 178-183, 2016

Page 45: Linked Data – challenges for Imagiology and Radiology

Final Remarks (2)

• go beyond technological advances

• create motivation mechanisms

• encourage data owners to share their data

• in a meaningful way

• Science is about replication

• without access to data there is no replication

Page 46: Linked Data – challenges for Imagiology and Radiology

Thanks!

Current Team:

• André Lamúrias and Tânia Maldonado (Text Mining)

• Gonçalo Figueiró (MRIR)

• Maria Fernandes and Mariana Pinhão (Genomic Privacy)

Publications & Tools

• http://labs.fc.ul.pt/

• http://webpages.fc.ul.pt/~fjcouto/?page_id=100