Exploration, sharing and privacy of data
Linked Data – challenges for Imagiology and Radiology
Francisco Couto Ciências ULisboa
FISMED 2017 – 6 Nov 2017
[Figure: Images, Text, Data and Linked Data; an example graph with "Spouse" and "Daughter" links]
Application
• Similar medical images?
• Computer-assisted image processing
• Related medical images?
• which are not necessarily similar
• Linked Data
Example
Discover new drugs to treat Alzheimer's: what proteins are involved in signal transduction and are related to pyramidal neurons?
Source: Linked Data slides by Tim Berners-Lee (best known as the inventor of the World Wide Web), TED 2009: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
Many hits but no relevant results
Linked healthcare data: 32 hits, 32 relevant results
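The question above has a precise answer only when both facts are published about the same identifier. The following is a minimal Python sketch that assumes a toy in-memory graph with hypothetical identifiers (ex:ProteinA, ex:involvedIn, ex:relatedTo, and so on). It shows how the two-part question turns into a join on the shared subject. In practice, such queries are usually written in SPARQL and run against a triple store.

```python
# Minimal sketch: Linked Data as a set of (subject, predicate, object) triples.
# All identifiers below are hypothetical, chosen only for illustration.
TRIPLES = {
    ("ex:ProteinA", "ex:involvedIn", "ex:SignalTransduction"),
    ("ex:ProteinA", "ex:relatedTo", "ex:PyramidalNeuron"),
    ("ex:ProteinB", "ex:involvedIn", "ex:SignalTransduction"),
    ("ex:ProteinC", "ex:relatedTo", "ex:PyramidalNeuron"),
}


def subjects_with(triples, predicate, obj):
    """Return every subject s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def candidate_proteins(triples):
    """Proteins involved in signal transduction AND related to pyramidal neurons.

    Both conditions refer to the same subject identifier, so answering the
    question is an intersection (a join on that identifier).
    """
    in_pathway = subjects_with(triples, "ex:involvedIn", "ex:SignalTransduction")
    near_neurons = subjects_with(triples, "ex:relatedTo", "ex:PyramidalNeuron")
    return sorted(in_pathway & near_neurons)


print(candidate_proteins(TRIPLES))  # ['ex:ProteinA']
```

A keyword search cannot express "both conditions hold for the same protein". A graph with shared identifiers can.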
LINKED DATA
Definition
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web using international standards of the World Wide Web Consortium.
Wikipedia
• Infoboxes
• box in the upper right of the page
• names, dates, places
• DBpedia project
• extracts this structured data
• publishes it as Linked Data
http://dbpedia.org
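The idea of turning infobox key/value pairs into Linked Data can be sketched in a few lines of Python. This is a deliberately simplified sketch, not the DBpedia extraction framework, which is far more elaborate. It assumes DBpedia-style URI patterns (http://dbpedia.org/resource/ for entities, http://dbpedia.org/property/ for raw infobox keys). It also assumes a hypothetical rule: only the keys listed in link_keys point to other resources, and every other value becomes a literal.

```python
BASE_RESOURCE = "http://dbpedia.org/resource/"
BASE_PROPERTY = "http://dbpedia.org/property/"


def to_uri_name(label):
    # DBpedia-style resource names replace spaces with underscores.
    return label.strip().replace(" ", "_")


def infobox_to_ntriples(page_title, infobox, link_keys):
    """Turn infobox key/value pairs into N-Triples lines.

    Values whose key is in link_keys become links to other resources;
    all other values become plain string literals.
    """
    subject = f"<{BASE_RESOURCE}{to_uri_name(page_title)}>"
    lines = []
    for key, value in infobox.items():
        predicate = f"<{BASE_PROPERTY}{key}>"
        if key in link_keys:
            obj = f"<{BASE_RESOURCE}{to_uri_name(value)}>"
        else:
            # Escape backslashes and quotes as N-Triples literals require.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


infobox = {"name": "Tim Berners-Lee", "birthPlace": "London"}
for line in infobox_to_ntriples("Tim Berners-Lee", infobox, link_keys={"birthPlace"}):
    print(line)
```

The birth place becomes a link to another resource rather than a string. That link is what lets other datasets connect to it.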
State of the LOD Cloud 2014 (datasets by topic)
Topic                   Datasets   % of total
Government                   183       18.05%
Publications                  96        9.47%
Life Sciences                 83        8.19%
User-generated content        48        4.73%
Cross-domain                  41        4.04%
Media                         22        2.17%
Geographic                    21        2.07%
Social Web                   520       51.28%
Total                       1014
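The percentage column follows directly from the dataset counts. This short Python check uses only the counts in the table and recomputes each share of the 1014 datasets.

```python
# Dataset counts per topic, copied from the LOD Cloud 2014 table.
COUNTS = {
    "Government": 183,
    "Publications": 96,
    "Life Sciences": 83,
    "User-generated content": 48,
    "Cross-domain": 41,
    "Media": 22,
    "Geographic": 21,
    "Social Web": 520,
}


def percentages(counts):
    """Share of the total for each topic, rounded to two decimals."""
    total = sum(counts.values())
    return {topic: round(100 * n / total, 2) for topic, n in counts.items()}


total = sum(COUNTS.values())
for topic, pct in percentages(COUNTS).items():
    print(f"{topic:<24}{COUNTS[topic]:>5}{pct:>8.2f}%")
print(f"{'Total':<24}{total:>5}")
```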
Important properties
• easily combined with other Linked Data
• the best reason to explore and use Linked Data
• important for data sharing
• self-documenting
• anyone can immediately figure out what a term means
• Linked Data can be private
• widely deployed behind enterprise firewalls on private networks
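Two of the properties above, easy combination and self-documentation, can be illustrated with a toy example. This minimal Python sketch assumes two hypothetical, independently published datasets that use the same identifier for the same concept: one set of reports and one terminology. Because the identifiers match, merging the datasets here is just a set union of triples. Following a term's identifier then yields its human-readable label.

```python
# Hypothetical datasets; all identifiers are illustrative.
REPORTS = {
    ("ex:Report1", "ex:mentions", "ex:Term42"),
    ("ex:Report2", "ex:mentions", "ex:Term7"),
}
TERMINOLOGY = {
    ("ex:Term42", "rdfs:label", "pneumothorax"),
    ("ex:Term7", "rdfs:label", "pleural effusion"),
}


def labels_mentioned(graph, report):
    """Follow ex:mentions links from a report, then read each term's label."""
    terms = {o for (s, p, o) in graph if s == report and p == "ex:mentions"}
    return sorted(o for (s, p, o) in graph if s in terms and p == "rdfs:label")


# Shared identifiers make the merge a plain union of the two triple sets.
merged = REPORTS | TERMINOLOGY
print(labels_mentioned(merged, "ex:Report1"))  # ['pneumothorax']
```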
Linked Data from Forms
• Manual annotation by humans, as in Wikipedia
• only works with a large community of users
• Selecting the right terms is non-trivial
• takes time
• requires knowledge
• Common problems
• missing values
• too generic to be useful
B. Inácio, J. Ferreira, and F. Couto, Metadata analyser: measuring metadata quality, in Practical Applications of Computational Biology and Bioinformatics (PACBB), pp. 197-204, 2017
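The two common problems above can both be checked automatically. The Python sketch below is a hypothetical illustration, not the Metadata Analyser's method. It assumes a toy is-a hierarchy and treats a term as "too generic" when it sits fewer than min_depth steps below the root. Both the threshold and the field names are arbitrary choices for the example.

```python
# Hypothetical is-a hierarchy (child -> parent); None marks the root.
IS_A = {
    "entity": None,
    "anatomical structure": "entity",
    "organ": "anatomical structure",
    "lung": "organ",
}


def depth(term):
    """Number of is-a steps from `term` up to the root."""
    d = 0
    while IS_A[term] is not None:
        term = IS_A[term]
        d += 1
    return d


def quality_issues(record, required_fields, min_depth=2):
    """Flag missing values and terms that sit too close to the root."""
    issues = []
    for field in required_fields:
        value = record.get(field)
        if not value:
            issues.append(f"missing value: {field}")
        elif value in IS_A and depth(value) < min_depth:
            issues.append(f"too generic: {field}={value}")
    return issues


record = {"anatomy": "anatomical structure", "modality": ""}
print(quality_issues(record, ["anatomy", "modality"]))
# ['too generic: anatomy=anatomical structure', 'missing value: modality']
```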
Alternatives for Imagiology
• Explore the written reports
• Text mining approaches
• Use a specialized terminology (RadLex)
• multilingual approaches
• Sharing and privacy
• Share only the metadata, not the images
• Privacy concerns still remain
Imagiology Written Reports
• written in free text
• but well structured
• more so than any other clinical notes
• since the information has to pass from one health care professional to another
Text Mining Tools
• Dictionary-based
• only requires a common terminology
• limited to that terminology
• ambiguity of terms
• Machine Learning
• requires a training set
• not limited to a terminology
• learns with experience
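The strengths and limits of the dictionary-based approach appear even in a toy tagger. This minimal Python sketch performs greedy longest-match lookup against a hypothetical mini-lexicon; the concept IDs are placeholders, not real terminology identifiers. The word "lung" is missed because it is not in the lexicon, which shows the approach is limited to its terminology. The word "mass" maps to two candidate concepts, which shows term ambiguity.

```python
import re

# Hypothetical mini-lexicon: surface form -> candidate concept IDs.
LEXICON = {
    "pleural effusion": ["C_PLEURAL_EFFUSION"],
    "effusion": ["C_EFFUSION"],
    "mass": ["C_MASS_LESION", "C_MASS_PHYSICAL"],  # ambiguous term
}


def dictionary_ner(text, lexicon):
    """Greedy longest-match lookup of lexicon entries over word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    max_len = max(len(entry.split()) for entry in lexicon)
    i, found = 0, []
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found


print(dictionary_ner("Small left pleural effusion and a lung mass.", LEXICON))
```

A machine learning tagger would instead learn such boundaries and senses from a training set, at the cost of needing annotated examples.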
RadLex
• The RSNA (Radiological Society of North America) has produced RadLex®
• an ontology focused on radiology
• terms (i.e., a lexicon) and the relationships between them
• over 75,000 terms and synonyms
https://www.rsna.org/RadLex.aspx
WHY USE A COMMON TERMINOLOGY?
London Bills of Mortality listed possible ways to die throughout the sixteenth, seventeenth and eighteenth centuries.
Source: http://faculty.up.edu/asarnow/popular7.htm
Aggregated Stats
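The Bills of Mortality illustrate why aggregation needs a common terminology: counts are only meaningful when different labels for the same cause are merged. This minimal Python sketch assumes a hypothetical synonym table and hypothetical records. It maps each free-text label to one canonical term before counting, so that three different labels end up counted as one cause.

```python
from collections import Counter

# Hypothetical synonym table mapping free-text labels to one canonical term.
SYNONYMS = {
    "consumption": "tuberculosis",
    "phthisis": "tuberculosis",
    "tuberculosis": "tuberculosis",
}


def aggregate(labels, synonyms):
    """Count records per canonical term; unknown labels go to 'unmapped'."""
    return Counter(synonyms.get(label.strip().lower(), "unmapped") for label in labels)


records = ["Consumption", "phthisis", "Tuberculosis", "Fever"]
print(Counter(records))              # four distinct labels, each counted once
print(aggregate(records, SYNONYMS))  # Counter({'tuberculosis': 3, 'unmapped': 1})
```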
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:description>Gold collar. It was made from three circular sectioned and tapering gold bars that are fused at the ends forming a penannular neck-ring.</dc:description>
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal</dc:location>
    <dc:type>Gold</dc:type>
  </rdf:Description>
</rdf:RDF>
The same description, annotated with links to external resources:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:description>Gold collar. It was made from three circular sectioned and tapering gold bars that are fused at the ends forming a penannular neck-ring.</dc:description>
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal http://yboss.yahooapis.com/geo/placefinder?woeid=748874</dc:location>
    <dc:type>Gold http://purl.obolibrary.org/obo/CHEBI_30050</dc:type>
  </rdf:Description>
</rdf:RDF>
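A program can consume the annotated description with Python's standard xml.etree.ElementTree module. The sketch below uses a trimmed copy of that description. It reads each Dublin Core field and picks out the embedded URIs. In the example, the URIs sit inside the text of each element, so a consumer has to extract them from the string. Idiomatic RDF/XML would instead state a link with an rdf:resource attribute. The helper names descriptions and links are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Trimmed copy of the annotated Sintra Collar description.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Sintra_Collar">
    <dc:date>1250BC-800BC (circa)</dc:date>
    <dc:location>Sintra, Portugal http://yboss.yahooapis.com/geo/placefinder?woeid=748874</dc:location>
    <dc:type>Gold http://purl.obolibrary.org/obo/CHEBI_30050</dc:type>
  </rdf:Description>
</rdf:RDF>"""


def descriptions(xml_text):
    """Map each rdf:about URI to its Dublin Core fields (local name -> text)."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        fields = {}
        for child in desc:
            if child.tag.startswith(f"{{{DC}}}"):
                local_name = child.tag[len(DC) + 2:]  # strip "{namespace}"
                fields[local_name] = (child.text or "").strip()
        result[desc.get(f"{{{RDF}}}about")] = fields
    return result


def links(value):
    """Tokens of a literal that look like URIs."""
    return [tok for tok in value.split() if tok.startswith(("http://", "https://"))]


collar = descriptions(DOC)["http://en.wikipedia.org/wiki/Sintra_Collar"]
print(collar["date"])          # 1250BC-800BC (circa)
print(links(collar["type"]))   # ['http://purl.obolibrary.org/obo/CHEBI_30050']
```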
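[Figure: fragment of an is-a hierarchy (Metal; Coinage and Precious metals; Silver, Palladium, Gold, Platinum, Copper) with mappings from the annotated terms to the ontology]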
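Once "Gold" is mapped to an ontology term, the is-a links let a query for a broader class retrieve it. This minimal Python sketch assumes an illustrative fragment of such a hierarchy, in which gold and silver are both precious and coinage metals. It also assumes hypothetical item annotations. A query for "Precious metal" then finds the Sintra Collar even though that term never appears in its description.

```python
# Illustrative is-a fragment: term -> set of direct parents.
IS_A = {
    "Gold": {"Precious metal", "Coinage metal"},
    "Silver": {"Precious metal", "Coinage metal"},
    "Platinum": {"Precious metal"},
    "Palladium": {"Precious metal"},
    "Copper": {"Coinage metal"},
    "Precious metal": {"Metal"},
    "Coinage metal": {"Metal"},
    "Metal": set(),
}


def ancestors(term, is_a):
    """All terms reachable from `term` by following is-a links (excluding itself)."""
    seen, stack = set(), list(is_a.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(is_a.get(t, ()))
    return seen


def items_of_type(annotations, query, is_a):
    """Items annotated with `query` itself or with any of its descendants."""
    return sorted(item for item, term in annotations.items()
                  if term == query or query in ancestors(term, is_a))


annotations = {"Sintra Collar": "Gold", "Hypothetical coin": "Copper"}
print(items_of_type(annotations, "Precious metal", IS_A))  # ['Sintra Collar']
print(items_of_type(annotations, "Metal", IS_A))  # ['Hypothetical coin', 'Sintra Collar']
```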
Example
• MER
• Minimal Entity Recognizer: http://labs.fc.ul.pt/mer/
F. M. Couto, L. Campos, and A. Lamurias, MER: a minimal named-entity recognition tagger and annotation server, Proceedings of the BioCreative, 2017
• DiShIn
• Semantic similarity measures using Disjunctive Shared Information: http://labs.fc.ul.pt/dishin/
F. Couto and M. Silva, Disjunctive shared information between ontology concepts: application to Gene Ontology, Journal of Biomedical Semantics, vol. 2, no. 5, pp. 1-16, 2011
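For background on shared-information similarity, the Python sketch below implements Resnik's classic measure. Resnik's measure is a swapped-in baseline, not DiShIn itself. Information content is computed as IC(t) = -log(freq(t) / freq(root)), where freq(t) counts the annotations to t and to all its descendants. The similarity of two terms is the IC of their most informative common ancestor. The hierarchy and annotation counts are hypothetical. The example shows the limitation that motivates DiShIn: Gold and Silver share two distinct ancestors (Precious metal and Coinage metal), yet Resnik scores them the same as Gold and Platinum. DiShIn was designed to also take such disjunctive common ancestors into account.

```python
import math

# Hypothetical is-a fragment (term -> direct parents) and annotation counts.
IS_A = {
    "Gold": {"Precious metal", "Coinage metal"},
    "Silver": {"Precious metal", "Coinage metal"},
    "Platinum": {"Precious metal"},
    "Palladium": {"Precious metal"},
    "Copper": {"Coinage metal"},
    "Precious metal": {"Metal"},
    "Coinage metal": {"Metal"},
    "Metal": set(),
}
DIRECT = {"Gold": 4, "Silver": 2, "Platinum": 1, "Palladium": 1, "Copper": 2}


def ancestors_or_self(term, is_a):
    """The term plus everything reachable through is-a links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(is_a, direct):
    """IC(t) = -log(freq(t) / freq(root)); annotations propagate upwards."""
    freq = {t: 0 for t in is_a}
    for term, n in direct.items():
        for a in ancestors_or_self(term, is_a):
            freq[a] += n
    total = max(freq.values())  # the root accumulates every annotation
    return {t: -math.log(f / total) for t, f in freq.items() if f > 0}


def resnik(t1, t2, is_a, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors_or_self(t1, is_a) & ancestors_or_self(t2, is_a)
    return max(ic[t] for t in common)


IC = information_content(IS_A, DIRECT)
print(round(resnik("Gold", "Silver", IS_A, IC), 3))    # 0.223
print(round(resnik("Gold", "Platinum", IS_A, IC), 3))  # 0.223
```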
Language
Reports are usually written in the native language
Text mining tools are mostly developed for English
RadLex is currently only available in English
German and Portuguese translations of RadLex are currently in development
An obstacle to sharing information
Two approaches
1) Translate the lexicon itself
• each new version of the lexicon requires a new translation
2) Translate the reports
• efficient automatic translation services are now available
• state-of-the-art text mining tools are tuned for English
Multilingual Reports
• reports become accessible to any doctor
• who understands English
• tourists can access their reports in their own language
• and send them to their personal doctor at home
• hospitals can get a highly specialized second opinion
• from international experts in complex clinical cases
Translation Efficiency
• 51 research articles related to radiology
• originally written in Portuguese
• with a human translation into English available
• We measured named-entity recognition (NER) accuracy
• on translations produced by Yandex, Google and Unbabel
L. Campos, V. Pedro, and F. Couto, Impact of translation on named-entity recognition in radiology texts, Database, vol. 2017, no. bax064, pp. 1-9, 2017
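The cited article's exact evaluation protocol is not reproduced here. One common way to score NER output, assumed here only for illustration, treats the entities found in the human translation as the reference. It then computes precision, recall and F1 for the entities found in a machine translation. The entity strings below are hypothetical.

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall and F1 of predicted entities vs a reference."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical entities recognized in the human vs machine translation.
reference = {"pleural effusion", "pneumothorax", "nodule"}
predicted = {"pleural effusion", "nodule", "effusion"}
print(precision_recall_f1(predicted, reference))  # each value is 2/3
```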
Multilingual System Prototype
• Given a report
• Automatically translates the report
• Finds the most similar reports
• based on the similarity of their RadLex terms
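Below is a minimal Python sketch of the ranking step. It assumes each translated report has already been reduced to a set of term identifiers; the IDs T1 to T5 are hypothetical placeholders, not real RadLex identifiers. The sketch swaps in plain Jaccard overlap of the term sets, which rewards only exactly shared terms. Comparing reports through the similarity of their RadLex terms, as the prototype does, would need a semantic similarity measure between terms instead.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_terms, reports, k=2):
    """IDs of the k reports whose term sets overlap most with the query."""
    scored = [(jaccard(query_terms, terms), rid) for rid, terms in reports.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by ID
    return [rid for _, rid in scored[:k]]


# Hypothetical reports, each reduced to a set of term identifiers.
reports = {"r1": {"T1", "T2"}, "r2": {"T2", "T3", "T4"}, "r3": {"T5"}}
print(most_similar({"T2", "T3"}, reports))  # ['r2', 'r1']
```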
PRIVACY
Privacy in Imagiology
• Like sharing photos
• once on the web, it stays on the web
• a health issue can be passed down within a family
• Encrypt data
• if the data is valuable, breaking the encryption is only a question of time
• Cloud storage
• Is it secure?
• Split the data across different providers
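Splitting data across providers can be made concrete with the simplest secret-splitting scheme, a 2-of-2 XOR split. This is a swapped-in textbook technique, not necessarily the one the author had in mind. Each provider stores one share, and a single share on its own is uniformly random, so it reveals nothing about the content except its length. The function names split and combine are hypothetical.

```python
import secrets


def split(data: bytes) -> tuple[bytes, bytes]:
    """Split data into two shares; each alone is indistinguishable from noise."""
    pad = secrets.token_bytes(len(data))           # share for provider 1
    masked = bytes(a ^ b for a, b in zip(data, pad))  # share for provider 2
    return pad, masked


def combine(share1: bytes, share2: bytes) -> bytes:
    """XOR the two shares back together to recover the original data."""
    return bytes(a ^ b for a, b in zip(share1, share2))


share_a, share_b = split(b"report")
print(combine(share_a, share_b))  # b'report'
```

The trade-off is that storage doubles, and losing either share makes the data unrecoverable. Threshold schemes such as Shamir's secret sharing relax this.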
Privacy in linked data
• Metadata can be shared without sharing truly privacy-sensitive information
• e.g., showing what kinds of images are available
• You control to whom you give the actual data
• using a secure channel
M. Fernandes, J. Decouchant, F. Couto, and P. Verissimo, Cloud-assisted read alignment and privacy, in Practical Applications of Computational Biology and Bioinformatics (PACBB), pp. 220-227, 2017
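Publishing only metadata can be as simple as an allow-list filter. This minimal Python sketch assumes a hypothetical record layout. It also assumes a hypothetical set of fields (modality, body_part, year) judged safe to publish, enough to show what kinds of images exist without exposing who they belong to.

```python
# Hypothetical allow-list of metadata fields considered safe to publish.
PUBLIC_FIELDS = {"modality", "body_part", "year"}


def public_metadata(record, allowed=PUBLIC_FIELDS):
    """Keep only allow-listed fields; everything else stays private."""
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "patient_name": "Jane Doe",  # hypothetical identifying field
    "image_path": "/archive/img001.dcm",  # hypothetical private location
    "modality": "MRI",
    "body_part": "brain",
    "year": 2017,
}
print(public_metadata(record))  # {'modality': 'MRI', 'body_part': 'brain', 'year': 2017}
```

Dropping direct identifiers is not sufficient on its own, because combinations of the remaining fields can still re-identify a person. This is why the actual data is handed over only to chosen people through a secure channel.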
Final Remarks (1)
• Linked Data is not necessarily free or open data
• nor is it necessarily sound data
• it can have access restrictions
• it can be incomplete and have errors
• But many successful use cases in the Life and Health Sciences
M. Barros and F. Couto, Knowledge representation and management: a linked data perspective, IMIA Yearbook of Medical Informatics, pp. 178-183, 2016
Final Remarks (2)
• go beyond technological advances
• create motivation mechanisms
• encourage data owners to share their data
• in a meaningful way
• Science is about replication
• without access to data there is no replication
Thanks!
Current Team:
• André Lamúrias and Tânia Maldonado (Text Mining)
• Gonçalo Figueiró (MRIR)
• Maria Fernandes and Mariana Pinhão (Genomic Privacy)
Publications & Tools
• http://labs.fc.ul.pt/
• http://webpages.fc.ul.pt/~fjcouto/?page_id=100