Linked Data – challenges for Imagiology and Radiology

Exploration, sharing and privacy of data

Francisco Couto, Ciências ULisboa

FISMED 2017 – 6 Nov 2017

Images

Text

Data

Linked Data

Spouse

Daughter

Application

• Similar medical images?

• Computer-assisted image processing

• Related medical images ?

• Which are not necessarily similar

• Linked Data

Example

Discover new drugs to treat Alzheimer’s

what proteins are involved in signal transduction

and

are related to pyramidal neurons?

Source: Tim Berners-Lee (best known as the inventor of the World Wide Web), Linked Data slides from TED 2009, http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

Many hits but no relevant results

Linked healthcare data: 32 hits, 32 results
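The Alzheimer's question above is a conjunctive query: it asks for things that satisfy two linked statements at once. A minimal sketch in Python, assuming a toy set of triples (the protein names and predicate names below are hypothetical placeholders, not real annotations):

```python
# Toy triple store: (subject, predicate, object).
# All identifiers are hypothetical placeholders, not real biological data.
TRIPLES = {
    ("proteinA", "involved_in", "signal transduction"),
    ("proteinA", "related_to", "pyramidal neuron"),
    ("proteinB", "involved_in", "signal transduction"),
    ("proteinC", "related_to", "pyramidal neuron"),
}


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject that has the given predicate and object."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def answer():
    """Proteins involved in signal transduction AND related to pyramidal neurons."""
    return subjects("involved_in", "signal transduction") & subjects(
        "related_to", "pyramidal neuron"
    )


print(sorted(answer()))
```

Because every statement is structured, each hit satisfies both conditions, which is why the number of hits and the number of results can coincide.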

LINKED DATA

Definition

The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web using international standards of the World Wide Web Consortium

Wikipedia

• Infoboxes

• box in the upper right of the page

• names, dates, places

• DBpedia project

• extracts this structured data

• publishes it as Linked Data

http://dbpedia.org
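To make the idea of turning infobox fields into Linked Data concrete, here is a minimal sketch that serializes a dictionary of name/value pairs as N-Triples. It is not DBpedia's actual extraction framework; the `http://example.org/` namespaces and the field names `material` and `place` are assumptions made for illustration.

```python
def infobox_to_ntriples(page_uri, infobox, vocab="http://example.org/property/"):
    """Turn an infobox-like dict into N-Triples lines with literal objects.

    The vocabulary namespace is a hypothetical placeholder.
    """
    lines = []
    for key, value in sorted(infobox.items()):
        # Escape backslashes and quotes so the literal stays valid.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{page_uri}> <{vocab}{key}> "{literal}" .')
    return lines


example = infobox_to_ntriples(
    "http://example.org/resource/Sintra_Collar",
    {"material": "Gold", "place": "Sintra, Portugal"},
)
for line in example:
    print(line)
```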

Topic                    Datasets   %
Government               183        18.05%
Publications             96         9.47%
Life Sciences            83         8.19%
User-generated content   48         4.73%
Cross-domain             41         4.04%
Media                    22         2.17%
Geographic               21         2.07%
Social Web               520        51.28%
Total                    1014

Source: State of the LOD Cloud 2014
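The percentage column follows directly from the dataset counts. A short check in Python, using only the numbers listed above:

```python
# Dataset counts per topic, copied from the table above.
counts = {
    "Government": 183,
    "Publications": 96,
    "Life Sciences": 83,
    "User-generated content": 48,
    "Cross-domain": 41,
    "Media": 22,
    "Geographic": 21,
    "Social Web": 520,
}

total = sum(counts.values())

# Share of each topic as a percentage of all datasets, rounded to two decimals.
shares = {topic: round(100 * n / total, 2) for topic, n in counts.items()}

for topic, share in shares.items():
    print(f"{topic}: {share}%")
```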

Important properties

• easily combined with other Linked Data

• best reason to explore and use Linked Data

• Important for data sharing

• self-documenting

• immediately figure out what a term means

• Linked Data can be private

• widely deployed behind enterprise firewalls on private networks

Linked Data from Forms

• Manual human annotation, as in Wikipedia

• only works with a wide set of users

• Selecting the right terms is non-trivial

• takes time

• requires knowledge

• Common problems

• missing values

• too generic to be useful

B. Inácio, J. Ferreira, and F. Couto, Metadata analyser: measuring metadata quality, in Practical Applications of Computational Biology and Bioinformatics (PACBB), pp. 197-204, 2017

Alternatives for Imagiology

• Explore the written reports

• Text mining approaches

• Use a specialized terminology (RadLex)

• multilingual approaches

• Sharing and privacy

• Share only the metadata, not the images

• Privacy concerns still remain

Imagiology Written Reports

• written in free-text

• well structured

• more than any other clinical notes

• since the information has to pass from one health care professional to another

Text Mining Tools

• Dictionary-based

• only requires a common terminology

• limited to that terminology

• ambiguity of terms

• Machine Learning

• requires a training set

• not limited to a terminology

• learns with experience
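The dictionary-based approach can be sketched in a few lines of Python. The three-term lexicon below is a hypothetical stand-in for a real terminology; a working system would load its terms from a resource such as RadLex.

```python
import re

# Hypothetical mini-lexicon; a real system would load a full terminology.
LEXICON = {"pleural effusion", "nodule", "left lung"}


def find_terms(text, lexicon=LEXICON):
    """Return (start, end, term) for each lexicon term matched as whole words."""
    hits = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), term))
    return sorted(hits)


report = "Small nodule in the left lung. No pleural effusion."
for start, end, term in find_terms(report):
    print(start, end, term)
```

The sketch also shows the limits listed above: it only finds what is in the lexicon, and it matches "pleural effusion" even though the report negates it.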

RadLex

• RSNA has produced RadLex(R)

• ontology focused on radiology

• terms (i.e. a lexicon) and the relationships

• over 75,000 terms and synonyms

https://www.rsna.org/RadLex.aspx

WHY USE A COMMON TERMINOLOGY?

Cell

London Bills of Mortality listed possible ways to die throughout the sixteenth, seventeenth and eighteenth centuries

Source: http://faculty.up.edu/asarnow/popular7.htm

Aggregated Stats

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:description>Gold collar. It was made from three circular sectioned and tapering gold bars that are fused at the ends forming a penannular neck-ring.</dc:description>
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal</dc:location>
    <dc:type>Gold</dc:type>
  </rdf:Description>
</rdf:RDF>

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:description>Gold collar. It was made from three circular sectioned and tapering gold bars that are fused at the ends forming a penannular neck-ring.</dc:description>
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal http://yboss.yahooapis.com/geo/placefinder?woeid=748874</dc:location>
    <dc:type>Gold http://purl.obolibrary.org/obo/CHEBI_30050</dc:type>
  </rdf:Description>
</rdf:RDF>
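A record like the one above can be read with Python's standard library alone. The sketch below parses a shortened copy of the record and collects the Dublin Core fields per described resource; the function name `dc_fields` is an assumption for illustration, and a full RDF library would be the more robust choice in practice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the record shown above.
RECORD = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:type>Gold http://purl.obolibrary.org/obo/CHEBI_30050</dc:type>
  </rdf:Description>
</rdf:RDF>"""


def dc_fields(rdf_xml):
    """Map each described resource URI to its Dublin Core element values."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        fields = {}
        for child in desc:
            if child.tag.startswith(f"{{{DC_NS}}}"):
                name = child.tag.split("}", 1)[1]
                # Collapse internal whitespace left over from the XML layout.
                fields[name] = " ".join((child.text or "").split())
        result[about] = fields
    return result


print(dc_fields(RECORD))
```

The value of `dc:type` now carries a URI next to the word "Gold", so software can follow it instead of guessing what the string means.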

[Figure: is-a hierarchy of metals (Metal; Coinage, Precious; Silver, Palladium, Gold, Platinum, Copper), with mappings between terms]

Language

Usually written in the native language

Text Mining tools mostly developed for English

RadLex is currently only available in English

German and Portuguese translations of RadLex are currently in development

An obstacle to sharing information

Two approaches

1) Translate the lexicon itself

• new version of lexicon requires translation

2) Translate the reports

• efficient automatic translation services are now available

• state-of-the-art Text Mining tools tuned for English

Multilingual Reports

• reports accessible to any doctor

• who understands English

• tourists access their reports in their language

• send them to their personal doctor at home

• hospital can get a highly specialized second opinion

• in complex clinical cases from international experts

Translation Efficiency

• 51 Research Articles related to Radiology

• originally written in Portuguese

• a human translation in English was available

• We measured NER accuracy

• using Yandex, Google and Unbabel

L. Campos, V. Pedro, and F. Couto, Impact of translation on named-entity recognition in radiology texts, Database, vol. 2017, no. bax064, pp. 1-9, 2017
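One common way to score named-entity recognition on translated text is to compare the entities found in a machine translation against those found in the human translation. The sketch below computes precision, recall and F1 over two term sets; this is an assumed scoring scheme for illustration, not necessarily the exact protocol of the study, and the term sets are hypothetical.

```python
def precision_recall_f1(reference, predicted):
    """Score predicted entities against reference entities (exact set match)."""
    reference, predicted = set(reference), set(predicted)
    true_pos = len(reference & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical entities recognized in each version of the same article.
human = {"nodule", "left lung", "pleural effusion"}
machine = {"nodule", "left lung", "lump"}

p, r, f = precision_recall_f1(human, machine)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```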

Multilingual System Prototype

• Given a report

• Automatically translate the report

• Find the most similar reports

• According to most similar RadLex terms
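The ranking step of such a prototype can be sketched as set similarity over the terms recognized in each report. The example assumes translation and term recognition have already happened; Jaccard similarity, the report identifiers and the term sets are all assumptions made for illustration, not the prototype's actual measure.

```python
def jaccard(a, b):
    """Jaccard similarity between two term sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_terms, corpus, k=2):
    """Return the k reports whose term sets are closest to the query."""
    scored = [(rid, round(jaccard(query_terms, terms), 3)) for rid, terms in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical reports, each reduced to the terminology terms found in it.
CORPUS = {
    "report-1": {"nodule", "left lung"},
    "report-2": {"pleural effusion", "right lung"},
    "report-3": {"nodule", "left lung", "pleural effusion"},
}

query = {"nodule", "left lung"}
print(most_similar(query, CORPUS))
```

Exact term overlap is the simplest choice; a measure that also uses the is-a relationships of the terminology could rank related but non-identical terms as similar.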

PRIVACY

Privacy in Imagiology

• Like sharing photos

• Once on the web, it stays on the web

• An issue can be passed from family to family

• Encrypt data

• If the data is important, breaking the encryption is a question of time

• Cloud storage

• Is it secure?

• Split through different providers.
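One possible reading of splitting data across providers is secret splitting: each provider stores a share that is useless on its own. The sketch below uses a simple two-share XOR scheme from the Python standard library; it is an assumption for illustration, not the method proposed in the talk.

```python
import secrets


def split_secret(data: bytes):
    """Split data into two shares; each share alone reveals nothing about it."""
    pad = secrets.token_bytes(len(data))  # random share for provider A
    masked = bytes(a ^ b for a, b in zip(data, pad))  # share for provider B
    return pad, masked


def join_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the two shares into the original data."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


share_a, share_b = split_secret(b"report 42")
assert join_shares(share_a, share_b) == b"report 42"
```

Both providers would have to collude, or both be breached, before the data could be reconstructed.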

Privacy in linked data

• You can share metadata without sharing truly privacy-sensitive information

• show what kind of images are available

• You control who receives that data

• using a secure channel

M. Fernandes, J. Decouchant, F. Couto, and P. Verissimo, Cloud-assisted read alignment and privacy, in Practical Applications of Computational Biology and Bioinformatics (PACBB), pp. 220-227, 2017

Final Remarks (1)

• Linked Data is not free or open data

• is not sound data

• it can have access restrictions

• be incomplete and have errors

• But many successful use cases in the Life and Health Sciences

M. Barros and F. Couto, Knowledge representation and management: a linked data perspective, IMIA Yearbook of Medical Informatics, pp. 178-183, 2016

Final Remarks (2)

• go beyond technological advances

• create motivation mechanisms

• encourage data owners to share their data

• in a meaningful way

• Science is about replication

• without access to data there is no replication

Thanks!

Current Team:

• André Lamúrias and Tânia Maldonado (Text Mining)

• Gonçalo Figueiró (MRIR)

• Maria Fernandes and Mariana Pinhão (Genomic Privacy)

Publications & Tools

• http://labs.fc.ul.pt/

• http://webpages.fc.ul.pt/~fjcouto/?page_id=100
