Innovations in Digital Library Curation, Ranking, & Management

Mayank Singh, Assistant Professor

Dept. of CSE, Indian Institute of Technology Gandhinagar

Changing Landscape of Science & Technology Libraries (CLSTL 2019)

Scientific documents

Secondary Sources

Books

Scientific Literature

Patent articles

Primary Sources

Review articles

Scientific Literature

Tertiary Sources

Magazines Blogs Crowd-sourced platforms

Focus of the Talk

Platforms that curate, rank & manage

Scientific Documents

Scientific documents

Google Scholar

Michael Gusenbauer, Scientometrics’18

389 million documents including articles, citations and patents

Microsoft Academic Search

212M documents, 256M authors

Semantic Scholar

GrapAL

And many more…

● Ferosa

● CLScholar

● Discern

● ….

Scholarly Ranking

Title

Older papers

Well-known

URLs

Can we rank papers based on performance?

Tabular Information

Comparative

Descriptive

Performance Improvement Graphs

Embedding comparative information into graphs to rank research papers

Dataset

Local Sanitization (prune noisy extractions)

improvement scores

Prune edges having improvement scores > 100%.

Improvement score

Im(B,C) = 100 * (0.6 - 0.1) / 0.1 = 500%
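The improvement score and the pruning rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the tuple layout `(paper_a, paper_b, baseline_value, improved_value)`, and the assumption that a higher metric value is better are all hypothetical choices made for the example.

```python
def improvement_score(baseline: float, improved: float) -> float:
    """Percentage improvement of `improved` over `baseline` (assumes higher is better)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


def prune_noisy_edges(edges, max_improvement=100.0):
    """Drop edges whose improvement score exceeds `max_improvement` percent.

    `edges` is an iterable of (paper_a, paper_b, baseline_value, improved_value)
    tuples; this layout is an assumption made for the sketch.
    """
    kept = []
    for paper_a, paper_b, baseline, improved in edges:
        score = improvement_score(baseline, improved)
        if score <= max_improvement:
            kept.append((paper_a, paper_b, score))
    return kept


if __name__ == "__main__":
    # The slide's example (0.1 -> 0.6) scores about 500% and is pruned;
    # a 0.5 -> 0.6 comparison scores about 20% and is kept.
    edges = [("B", "C", 0.1, 0.6), ("A", "B", 0.5, 0.6)]
    print(prune_noisy_edges(edges))
```

With the values from the slide, the B/C comparison exceeds the 100% threshold, so it would be treated as a noisy extraction under this rule.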

Local Sanitization (combining multi-edges)

No guarantee of the same data set or experimental conditions across different tables, let alone different papers.

1. UNW: Unweighted graph

2. ALL: Weighted graph (total number of comparisons)

3. UNQ: Weighted graph (unique number of metric comparisons)

4. SIG: Sigmoid of actual improvements on edges (max and average)
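The four ways of collapsing multi-edges listed above could look roughly like this in Python. This is a hedged sketch: the input layout `(source, target, metric, improvement_percent)`, the dictionary keys, and in particular the scaling of percentages to fractions before applying the sigmoid are assumptions, since the slides only say "sigmoid of actual improvements".

```python
import math
from collections import defaultdict


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_multi_edges(comparisons):
    """Collapse parallel edges between the same pair of papers into one weight per scheme.

    `comparisons` holds (source, target, metric, improvement_percent) tuples
    (a hypothetical layout chosen for this sketch).
    """
    grouped = defaultdict(list)
    for source, target, metric, improvement in comparisons:
        grouped[(source, target)].append((metric, improvement))

    weights = {}
    for pair, items in grouped.items():
        improvements = [imp for _, imp in items]
        average = sum(improvements) / len(improvements)
        weights[pair] = {
            "UNW": 1.0,                                    # edge exists, no weight
            "ALL": float(len(items)),                      # every comparison counts
            "UNQ": float(len({m for m, _ in items})),      # distinct metrics only
            "SIG_MAX": sigmoid(max(improvements) / 100.0), # assumed scaling to a fraction
            "SIG_AVG": sigmoid(average / 100.0),           # assumed scaling to a fraction
        }
    return weights


if __name__ == "__main__":
    comparisons = [
        ("A", "B", "accuracy", 20.0),
        ("A", "B", "accuracy", 40.0),
        ("A", "B", "f1", 10.0),
    ]
    print(combine_multi_edges(comparisons)[("A", "B")])
```

In the usage example, three comparisons over two distinct metrics give ALL = 3 and UNQ = 2 for the same pair of papers.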

Ranking Schemes and SOTA Lists

1. Sink nodes

2. Cocitation

3. Linear tournament

4. Exponential tournament

5. PageRank

https://github.com/sbrugman/deep-learning-papers
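Two of the ranking schemes listed above, sink nodes and PageRank, can be illustrated on a small performance improvement graph. The sketch below is plain Python and makes assumptions the slides do not state: an edge `(u, v, w)` is taken to mean that paper `v` reports an improvement over paper `u` with positive weight `w`, and the damping factor of 0.85 and fixed iteration count are conventional defaults rather than values from the talk.

```python
def sink_nodes(edges):
    """Papers with no outgoing edge, i.e. never improved upon (under the assumed direction)."""
    nodes = {n for u, v, _ in edges for n in (u, v)}
    has_outgoing = {u for u, _, _ in edges}
    return sorted(nodes - has_outgoing)


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for u, _, w in edges:
        out_weight[u] += w

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is redistributed evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u, v, w in edges:
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: B improves on A, C improves on both A and B.
    edges = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0)]
    print(sink_nodes(edges))
    ranks = pagerank(edges)
    print(sorted(ranks, key=ranks.get, reverse=True))
```

Under the assumed edge direction, rank flows toward papers that outperform others, so the paper that nothing improves upon ends up both a sink node and the top-ranked node in this toy graph.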

Performance Evaluation

● Google Scholar (GS) and Semantic Scholar (SS) are mediocre

● Among baselines, sink node search led to the worst performance

● Co-citation performed quite well

● UNW is better than UNQ

Organic Leaderboards

Metrics

Competing papers

Ranks

Generating Organic Leaderboards

Conclusion and Future Directions

● Framework to mine experimental performance embedded within comparative tables in papers.

● Information extraction from figures and tables embedded in PDF research articles.

● Extension to non-CS domains.

Thank you for your attention!

Contact me: mayank4490@gmail.com, singh.mayank@iitgn.ac.in

Webpage: http://mayank4490.github.io/
