

Integrating and Building on Elasticsearch and VIVO-ISF Data

Don Elsborg, Nate Prewitt and Alex Viggio

https://experts.colorado.edu

Why Elastic and Faceted Search?

• VIVO only facets on direct class hierarchies. Attempts were made at a VIVO hackathon to create faceted views of profile data that crossed class hierarchies. It proved a complex undertaking, involving a deep understanding of many VIVO Java classes. This effort hasn’t continued past the hackathon.

• RPI’s Deep Carbon site adopted the FacetView2 JavaScript frontend, which easily creates facets for data. However, it is based on Elasticsearch, an alternative to Apache Solr. RPI created a Python extract script to pull VIVO data via SPARQL and import it into Elasticsearch.

• Once the data was in Elasticsearch it was not challenging to create VIVO faceted views for lists of People and Equipment/Analytical Services. We plan on replacing other VIVO list views in the near future.

• CU Boulder doesn’t rely on VIVO’s functionality to edit data. The CU Experts data is refreshed nightly from systems of record, allowing us to reuse that Python script to update our Elasticsearch backend.
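The extract step described above (SPARQL results from VIVO loaded into Elasticsearch) might look roughly like the following. This is a minimal sketch, not the actual RPI or CU Boulder script: the SPARQL variable names `person`, `name`, and `area`, the index name `people`, the document fields, and the sample URI are all assumptions for illustration. It only reshapes a result set in the standard SPARQL JSON results format into the newline-delimited body that the Elasticsearch bulk API accepts; running the query and posting the body are left out.

```python
import json

# Hand-written sample in the SPARQL 1.1 JSON results format.
# Variable names and values are hypothetical, not real CU Experts data.
SAMPLE_RESULTS = {
    "head": {"vars": ["person", "name", "area"]},
    "results": {
        "bindings": [
            {
                "person": {"type": "uri", "value": "https://example.org/individual/n1"},
                "name": {"type": "literal", "value": "Doe, Jane"},
                "area": {"type": "literal", "value": "Literary History"},
            },
            {
                "person": {"type": "uri", "value": "https://example.org/individual/n1"},
                "name": {"type": "literal", "value": "Doe, Jane"},
                "area": {"type": "literal", "value": "Literary Criticism"},
            },
        ]
    },
}


def bindings_to_docs(sparql_json):
    """Group SPARQL result rows into one document per person URI."""
    docs = {}
    for row in sparql_json["results"]["bindings"]:
        uri = row["person"]["value"]
        doc = docs.setdefault(
            uri, {"uri": uri, "name": row["name"]["value"], "researchArea": []}
        )
        area = row.get("area")  # optional variable: may be unbound for a row
        if area and area["value"] not in doc["researchArea"]:
            doc["researchArea"].append(area["value"])
    return list(docs.values())


def to_bulk_body(docs, index="people"):
    """Serialize documents as a bulk request body: action line, then source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uri"]}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_bulk_body(bindings_to_docs(SAMPLE_RESULTS)), end="")
```

Using the person URI as the document `_id` means a nightly re-run overwrites each person's document rather than creating duplicates, which suits a full refresh from systems of record.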

What’s special about Facets?

• Facets allow a user to narrow down results by selecting the types of items that should be included in a result set. For example, they can return the number of faculty members in the English and Humanities departments with expertise in “Literary History” and “Literary Criticism”.

• With a search term and a few clicks, a user can quickly find what they are looking for.

• Facets are helpful to refine a search result set. However, it is challenging to add data to a result set once it has been narrowed down. This led us to implement a version of the Capability Map visualization developed by the VIVO team at the University of Melbourne.

• The top image demonstrates how CU Experts research terms are mined from the researchers in a selection. Any research terms that are not in the selection can be clicked on and added to the map.
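To make the facet idea concrete, the sketch below builds an Elasticsearch request body that combines a search term, the facet values a user has clicked, and a terms aggregation per facet to supply the counts shown next to each value. It is an illustration only, not what FacetView2 sends: the field names `department` and `researchArea` are hypothetical and are assumed to be mapped as keyword fields, and `build_facet_query` is a helper invented here.

```python
import json


def build_facet_query(search_text, selected, facet_fields, size=10):
    """Build a search body with filters for selected facets and counts per facet.

    selected:     dict mapping a field name to the list of values clicked so far
    facet_fields: field names for which value counts should be returned
    """
    # One "terms" filter per field: values within a field are ORed,
    # while the filters for different fields are ANDed by the bool query.
    filters = [
        {"terms": {field: values}} for field, values in selected.items() if values
    ]
    if search_text:
        must = [{"query_string": {"query": search_text}}]
    else:
        must = [{"match_all": {}}]
    return {
        "query": {"bool": {"must": must, "filter": filters}},
        "aggs": {
            field: {"terms": {"field": field, "size": size}}
            for field in facet_fields
        },
    }


if __name__ == "__main__":
    body = build_facet_query(
        "literature",
        {
            "department": ["English", "Humanities"],
            "researchArea": ["Literary History", "Literary Criticism"],
        },
        ["department", "researchArea"],
    )
    print(json.dumps(body, indent=2))
```

Because every filter can only shrink the result set, this structure also shows why adding data back to a narrowed selection is awkward with facets alone.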

Moving Forward

• Indexing is fundamental to providing aggregate summary data in modern web applications, whether the index is Elasticsearch, Apache Solr, or Funnelback. As a community we should strive to provide a consistent API for client developers.

• Work with the VIVO community to ensure that all referenced ontologies are accurate, persistent, and routable via the Internet.

• Explore the utility of a JavaScript library that harnesses VIVO-ISF JSON-LD data more readily for presentation. This library would traverse the context nodes, VIVO faux properties, and generate structures with simple relationships. For example, publication/author relationships use context nodes which aren’t easily comprehended by external developers.

• Work with the VIVO community to enhance VIVO’s linked data content generation. Should the content on a VIVO web page and a VIVO LOD document be consistent?

• We must keep sight of the end game:

Institutions hold a vast amount of metadata about research and scholarship. VIVO is a mechanism for aggregating and surfacing this metadata. The public and industry would derive great value from this metadata. Let’s focus on getting this metadata out to the public!
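The context-node traversal proposed in the list above can be illustrated with a small sketch. Python is used here only for illustration; the poster proposes a JavaScript library, and the same traversal would apply there. The input is a hand-written, simplified JSON-LD-style graph, not actual VIVO output: the `ex:` identifiers, the compact property keys, and the helper `flatten_authorships` are assumptions, and real VIVO-ISF documents would likely need a JSON-LD processor to reach a shape like this. The sketch walks each authorship context node and produces a plain publication-to-authors structure.

```python
# Node types treated as publications in this sketch (an assumed, partial list).
PUBLICATION_TYPES = {"bibo:AcademicArticle", "bibo:Book"}

# Hypothetical, simplified graph: each Authorship node links one
# publication to one person and carries the author's rank.
SAMPLE_GRAPH = [
    {"@id": "ex:pub1", "@type": "bibo:AcademicArticle", "rdfs:label": "A Sample Article"},
    {"@id": "ex:p1", "@type": "foaf:Person", "rdfs:label": "Author, First"},
    {"@id": "ex:p2", "@type": "foaf:Person", "rdfs:label": "Author, Second"},
    {"@id": "ex:a1", "@type": "vivo:Authorship", "vivo:rank": 2,
     "vivo:relates": [{"@id": "ex:pub1"}, {"@id": "ex:p2"}]},
    {"@id": "ex:a2", "@type": "vivo:Authorship", "vivo:rank": 1,
     "vivo:relates": [{"@id": "ex:pub1"}, {"@id": "ex:p1"}]},
]


def flatten_authorships(graph):
    """Collapse Authorship context nodes into a simple list of publications with authors."""
    by_id = {node["@id"]: node for node in graph}
    ranked = {}  # publication id -> list of (rank, author name)
    for node in graph:
        if node.get("@type") != "vivo:Authorship":
            continue
        # Resolve the nodes this authorship relates, skipping unknown ids.
        related = [
            by_id[ref["@id"]]
            for ref in node.get("vivo:relates", [])
            if ref["@id"] in by_id
        ]
        pubs = [n for n in related if n.get("@type") in PUBLICATION_TYPES]
        people = [n for n in related if n.get("@type") == "foaf:Person"]
        for pub in pubs:
            for person in people:
                ranked.setdefault(pub["@id"], []).append(
                    (node.get("vivo:rank", 0), person["rdfs:label"])
                )
    return [
        {
            "id": pub_id,
            "title": by_id[pub_id]["rdfs:label"],
            # Sort by rank so authors appear in their stated order.
            "authors": [name for _, name in sorted(entries)],
        }
        for pub_id, entries in ranked.items()
    ]


if __name__ == "__main__":
    for publication in flatten_authorships(SAMPLE_GRAPH):
        print(publication)
```

A client developer can then read `authors` directly from each publication without knowing that an intermediate authorship node exists.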

Acknowledgements

Special thanks to our collaborators who provided leadership and guidance in this space: Stephan Zednik, Patrick West, and Simon Porter.

References

• https://deepcarbon.net/

• http://findanexpert.unimelb.edu.au

• https://github.com/CottageLabs/facetview2

• https://github.com/elastic/elasticsearch

The Case for CU Experts Map

• In the screenshot above, any of the blue research terms can be clicked and added to the Experts Map. This is in addition to the initial Capability Map functionality developed by the University of Melbourne VIVO team to remove terms, highlighted in yellow, from the map.

• Currently Elasticsearch only holds basic index information for people. The researcher information shown in the Info tab above therefore comes out of Elasticsearch and links to a standard VIVO researcher profile page. It would be useful if the Experts Map could display all of a researcher’s profile data on request.

• Should this additional data be stored in and delivered by Elasticsearch, or should it be read from a VIVO API?