
It’s All About the Links: An Automated Approach to Promote Research Networking Systems More Efficiently


DESCRIPTION

SC CTSI Team Members Present at International VIVO Conference: Over 115 authors from five continents presented at the 2014 VIVO conference focused on opportunities created by advancing data sharing and team science. Topics ranged from implementation to ontologies, visualization, and collaboration in research and scholarship. Large research organizations have become increasingly interested in VIVO and other Research Networking Systems (RNSs) as a means of addressing multidisciplinary research challenges, fostering collaboration, and increasing the visibility of individual investigators. Read the full story: http://sc-ctsi.org/index.php/news/sc-ctsi-team-members-present-at-international-vivo-conference#.U_aiTmK9mKU


Name Linking Application (NLA) REST API

INTRODUCTION Large research organizations have begun deploying Research Networking Systems (RNSs) that allow for easy search of their research experts and related networks as a means of fostering research and collaborations. However, increasing the adoption of RNSs has posed challenges. Previous data show that cross-linking (i.e., links to the RNSs on other websites) is an effective way to significantly increase the traffic to and adoption of RNSs. Embedding those cross-links on other websites, however, requires the establishment of partnerships and is a time-intensive effort that may take months or years before yielding significant results. We developed a more effective solution: an automated approach, the Name Linking Application (NLA), that identifies researchers' names on web pages (e.g., university news, directory, and department pages) and links each name to the respective researcher page, if one exists.

RESULTS

METHODS Our NLA uses natural language processing based on Named Entity Recognition (NER). To identify names in text, we use an NER library based on Conditional Random Fields (CRF), originally developed at Stanford University. The application then searches the RNS database for each name found and assigns every match a score based on the quality of the match. A threshold score filters out distant (poor) matches, which reduces false positives. Each article was manually checked for accuracy to verify that all names in it were automatically recognized and linked to the correct profiles. To make the NLA easy to use, we gave it a RESTful interface to which any entity on the Internet can send content scanning requests. Thus, the NLA can be used with any website either i) dynamically, by injecting a client-side script into the website's pages, or ii) statically, by integrating the NLA with the Content Management System (CMS) used by the website.
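The scoring and threshold step described above can be illustrated with a minimal Python sketch. It is not the NLA's actual implementation: the poster does not say how match quality is computed, so the sketch assumes plain string similarity from the standard library's `difflib.SequenceMatcher`. The `PROFILES` dictionary, the example URLs, and the `THRESHOLD` value are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the RNS profile database: name -> profile URL.
PROFILES = {
    "Jane Q. Doe": "https://profiles.example.org/jane-doe",
    "John Smith": "https://profiles.example.org/john-smith",
}

# Assumed cut-off; the poster does not report the threshold actually used.
THRESHOLD = 0.8


def score(found_name, profile_name):
    """Return a similarity in [0, 1], ignoring case (assumed scoring method)."""
    return SequenceMatcher(None, found_name.lower(), profile_name.lower()).ratio()


def best_match(found_name):
    """Return (profile_name, url, score) for the best match, or None if it is too distant."""
    scored = [(score(found_name, name), name, url) for name, url in PROFILES.items()]
    top_score, name, url = max(scored)
    if top_score < THRESHOLD:
        return None  # distant (poor) match: treated as a likely false positive
    return name, url, top_score
```

With these assumed values, `best_match("Jane Doe")` resolves to the "Jane Q. Doe" profile, while `best_match("Alice Nguyen")` returns `None` because no candidate reaches the threshold. A higher threshold trades missed links for fewer wrong links.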

CONCLUSION Previous data show that RNS search traffic is heavily dominated by commercial search engines such as Google and Bing. We therefore believe that this application will help research organizations promote their RNSs efficiently and increase adoption without time-intensive manual cross-linking. The tool allows for automatic and accurate cross-linking from news articles and website content to specific profile pages, improving search engine ranking and generating referral traffic.

NEXT STEPS Our NLA can be extended to any RNS (including VIVO) with minimal code modifications. We will develop a client-side library that can be installed on any website. This library will allow the NLA tool to automatically link content on partner websites.

ACKNOWLEDGEMENTS This project was partially supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR000130 (formerly by the National Center for Research Resources, Award Number UL1RR031986). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

It’s All About the Links: An Automated Approach to Promote Research Networking Systems More Efficiently

Anirudha Kumar, Praveen Angyan, Francis Ukpolo, Katja Reuter, PhD

Southern California Clinical and Translational Science Institute (SC CTSI)

100% of full names in news articles were detected. A full name is defined as a first name followed by a last name, with an optional middle name.

97.4% of names in news articles were linked to the correct page on our Profiles RNS installation. In four instances the NLA did not succeed, either because of name contractions or because the legal name had changed.

10 sec per article: 144 news articles, containing 156 names, were processed by the NLA.
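As a quick consistency check, the reported linking accuracy can be reproduced from the counts above. The sketch assumes the four failed instances are counted against the 156 names, which the poster implies but does not state. The total processing time is a rough estimate that further assumes the articles were processed one after another.

```python
total_names = 156        # names in the 144 processed articles (from the poster)
failed_links = 4         # instances where linking did not succeed (from the poster)
articles = 144           # processed news articles (from the poster)
seconds_per_article = 10 # reported processing time per article

correct_links = total_names - failed_links
accuracy = correct_links / total_names
print(f"{correct_links}/{total_names} = {accuracy:.1%}")  # prints "152/156 = 97.4%"

# Rough estimate only; assumes sequential processing.
total_minutes = articles * seconds_per_article / 60
print(f"about {total_minutes:.0f} minutes in total")       # prints "about 24 minutes in total"
```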

HOW IT WORKS [Figure panels: News Article on Organizational Website; Linked Investigator Name in News Article; Linked Investigator Profile]

Framework of the NLA [Figure components: Content Management System (CMS); REST API; Content Processor; Stanford Named Entity Recognizer (NER); Profiles RNS. Labeled data flows: Unprocessed HTML Content; Processed HTML Content (with Profile links); Full Name Search; Search Results]
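The framework can be sketched end to end in a few lines of Python: unprocessed HTML goes in, names are recognized and looked up, and processed HTML with profile links comes out. This is an illustration under stated assumptions, not the authors' code. A simple regular expression stands in for the Stanford NER, a dictionary (`PROFILE_URLS`, with a made-up URL) stands in for the Profiles RNS full-name search, and the function names are hypothetical.

```python
import re
from html import escape

# Hypothetical stand-in for the Profiles RNS full-name search.
PROFILE_URLS = {
    "Katja Reuter": "https://profiles.example.org/katja-reuter",
}

# Simplified stand-in for the NER step: first name, optional middle name or
# initial, last name. The actual NLA uses the Stanford NER, not a regex.
FULL_NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z](?:[a-z]+|\.))? [A-Z][a-z]+\b")


def search_profile(full_name):
    """Full Name Search: return the profile URL, or None if there is no match."""
    return PROFILE_URLS.get(full_name)


def process_html(unprocessed_html):
    """Content Processor: wrap each recognized, matched name in a profile link."""
    def link(match):
        name = match.group(0)
        url = search_profile(name)
        if url is None:
            return name  # leave unmatched names untouched
        return f'<a href="{escape(url, quote=True)}">{name}</a>'

    return FULL_NAME.sub(link, unprocessed_html)


print(process_html("<p>Katja Reuter presented the poster.</p>"))
```

The regex is far cruder than a CRF-based recognizer: it also fires on any two capitalized words and would miss a name preceded by a capitalized title. A real deployment would also have to skip text inside existing links and tag attributes.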