Continuing toward a Global Digital Mathematics Library

{The International Mathematical Knowledge Trust}

Patrick D. F. Ion 1 Olaf Teschke 2

1 MR AMS ret’d & University of Michigan, MI USA [email protected]

2 zbMath, Berlin, Germany [email protected]

10 January 2018 / JMM 2018, Special Session 83A


GDML (Global Digital Mathematics Library): What is it?

• Global: for all the World, from all the World
• Digital: using current technology
• Mathematics: for a specific subject, especially research
• Library: a knowledge base

GDML: Perhaps better

Worldwide Information System for Digitally Organized Mathematics

History: Ancient

• Great Library of Alexandria in the Mouseion, founded ca. 323 BCE by Ptolemy.
  - Archimedes (287-212 BCE)
  - Eratosthenes (276-195 BCE)
  - Apollonius (262-190 BCE)
  - Aristarchus of Samos (310-230 BCE)
  - Hero (ca. 10 CE-70 CE)
  - Hypatia, daughter of Theon (the last director of the Mouseion), lynched by a rabble in 415 CE
• New Bibliotheca Alexandrina
• ca. 700 years

Bibliotheca Alexandrina, Alexandria

History: Recent

• Pasigraphy: E. Schröder, G. Peano at ICM 1897
• Georg Valentin’s mathematical bibliography to 1928
• Paul Otlet and Henri La Fontaine:
  - “Répertoire Bibliographique Universel” (RBU) from 1895
  - Mundaneum 1924 to ca. 1941 in Mons
• Vannevar Bush imagined Memex in 1945 (Shannon)

History: Otlet

• (1895+) A highly advanced index card machine: “a moving desk shaped like a wheel, powered by a network of hinged spokes beneath a series of moving surfaces. The machine would let users search, read and write their way through a vast mechanical database stored on millions of 3×5 index cards. This new research environment would do more than just let users retrieve documents; it would also let them annotate the relationships between one another, the connections each [document] has with all other [documents], forming from them what might be called the Universal Book.”

Around 1900 - Otlet’s Vision

History: Otlet

• (1934) Otlet suggests plans for a global network of electric telescopes that would allow people to search and browse through millions of interlinked documents, images, audio and video files. He described how people would use the devices to send messages to one another, share files and even congregate in online social networks. He called the whole thing a “réseau”.
• Otlet described a networked world where “anyone in his armchair would be able to contemplate the whole of creation”.

Around 1940 - Otlet’s End

Vannevar Bush - Memex - 1945

Royal McBee LGP30 - 1958

History: WDML

• Late 1990s: initial vision
• 1998: WDML endorsed by the International Mathematical Union (IMU)
• 2001: IMU issues “Call to All Mathematicians to Make Publications Electronically Available”
• 2000s: large digitization projects
• 2006: IMU Report Digital Mathematics Library: “A Vision for the Future”

History: WDML

• 2010: European Digital Mathematics Library (EuDML)
• Digital Public Library of America launches with support of Sloan Foundation
• 2011: Alfred P. Sloan Foundation funds WDML workshop at NAS, November 2012
• 2013: NAS/NRC Report “The Mathematical Sciences in 2025”, January
• 2014: NAS/NRC Report “Developing a 21st Century Global Library for Mathematics Research”, March [Daubechies, Lynch]

GDML

• 2014: Seoul ICM Meeting, August
• Creation of GDML WG
• 2015: Recognized as WG of IMU CEIC: Committee on Electronic Information and Communication

GDML Working Group: Austria 1, Canada 1, France 1, Germany 2, USA 3

• Thierry Bouche (Université Joseph Fourier, Grenoble, France)
• Bruno Buchberger (Johannes Kepler Universität, Linz, Austria)
• Patrick Ion (AMS & UM, Ann Arbor MI, USA)
• Michael Kohlhase (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)
• Jim Pitman (University of California, Berkeley CA, USA)
• Olaf Teschke (zbMATH, Berlin, Germany)
• Stephen Watt (University of Waterloo, Waterloo ON, Canada)
• Eric Weisstein (Wolfram Research Inc, Champaign IL, USA)

GDML: Mission

To construct, as a global public good, an open knowledge base encompassing the results of the world’s mathematics through collaborations deploying both present and new technology, and to foster a supporting community.

GDML: Goals

• To enhance openness and accessibility of all mathematical knowledge world-wide, present, past and future.
• To serve research mathematics, education and the scientific and technological use of mathematics.
• To be a resource for developing tools to promote use and development of mathematics.
• To facilitate creation, dissemination and archiving of semantically annotated mathematical material.
• To encourage the collaborative development of services based on semantic annotation.

GDML: Role

The GDML tries to achieve its goals by building collaborations. The effort involves the creation of standards and indications of best practices, encouraging the instantiation of such standards with content, and making such content openly available.

Issues: Categories

• Organization, Governance & Community
• Corpus & Collection
• Tools & Services
• Knowledge Management

Issues: Organization, Governance & Community

• Chicken and egg: International Mathematical Union, WG
• International (legal, communication): examples
  - HathiTrust, DPLA, JSTOR, COS, ...
• Mathematics as a Universal Language: math community

Issues: Corpus & Collection

• What must a GDML include?
• How big is our literature?
• What progress is there in digitization, both in quality and quantity?
• Where are digital math collections today, whether records of print, newer document types or software and data?
• What are copyright and other restrictions on our mathematical legacy?

Issues: Corpus & Collection

• Boundaries
  - Advanced Research Mathematics (mostly)
  - Applied Mathematics (the theoretical)
  - Any natural language (mostly English presently)
  - MR ∪ zb ?? [What is MR ∩ zb ?]
  - Legacy material versus broad present
• Ownership
  - Publishing is a business
  - Mathematics is a branch of knowledge
  - Mathematical facts are not patentable
  - Much publication metadata is public
  - Collections of such are not intrinsically held to be public
• Competition to be replaced by collaboration

Math Literature

• Formal literature
• Informal literature
• Research monographs
• Expository works (surveys, tutorials, user guides)
• Specialized collections for specific topics:

Where is Mathematical Knowledge?

• People
• Research Journals
• Textbooks and Monographs
• Informal literature
• Datasets
• Software
• Web sites

Math Knowledge: Variety

• Conjectures that turn out to be false.
• Proofs with flaws.
• Proof sketches and analogies.
• Application of results where conditions are not verified.
• Approximations and probabilistic statements.
• “most of these terms are probably wrong, but a little inaccuracy sometimes saves tons of explanation” [H. H. Munro (“Saki”) 1870-1916]

Web: Article Repositories

• Publisher archives [Science Direct, SpringerLink . . . ]
• JSTOR
• HathiTrust
• arXiv
• EuDML
• NUMDAM
• Gallica
• Göttinger Digitalisierungszentrum
• RusDML
• Private Lists

Web: Indexes, Reviews, Author Databases

• Inspec
• MathSciNet [Mathematical Reviews]
• zbMATH [Zentralblatt; Jahrbuch FM]
• Math Genealogy Project
• Google Scholar
• ResearchGate
• Wikipedia
• MathWorld

Web: Specialized Tools and Databases

• OEIS: Online Encyclopedia of Integer Sequences
• DLMF: Digital Library of Mathematical Functions
• DRMF: Digital Repository of Mathematical Formulas
• DDMF: Dynamic Dictionary of Mathematical Functions
• Online Integral Calculator
• Inverse Symbolic Calculator
• Atlas of Finite Simple Groups
• LMFDB: L-functions and modular forms database
• Combinatorial Statistic Finder
• A Catalogue of Lattices
• Encyclopedia of Triangle Centers

Web: Proof Libraries

• Mizar Mathematical Library
• Archive of Formal Proofs
• MetaMath Proof Explorer
• . . .

Web: Software Systems

• GeoGebra
• Pari/GP
• ProofWeb
• Wolfram|Alpha ...
• Maple
• Mathematica
• Flyspeck
• Coq
• . . .
• Guide to Available Mathematical Software
• GitHub

Digitization: Knowledge from Documents

1. Assemble document collections
2. Capture page sets (born-digital vs. scanned)
3. OCR of text and formulas
4. Capture metadata / index and link
5. Semantic capture
6. Apply knowledge tools

Issues: Corpus & Collection

• Materials: just discussed
• Cataloging: metadata standards; EuDML
  - zbMATH, MathSciNet
  - EuDML, Beebe, other public aggregations
• Authority, Trust, Provenance: current standards?
• Reproducible Research Standard - V. Stodden
• Crowd sourcing
• Annotation and personal collecting: Mendeley, Bibsonomy

Issues: Tools & Services

• Multilingual: Unicode
• Formulas: MathML, OpenMath, TeX / LaTeX
• Multiform: XML for description, or whatever’s needed
• Listings; Annotation: lack of full support; W3C Annotation
• Data-mining: LDA; NLP+: MathWordNet
  - Blei & Lafferty; Zanibbi & Giles
• Corpus structure: graph analysis & visualization
  - simplicial complex homology; persistent homology

Issues: Knowledge Management

• Classification: MSC 2010 in SKOS (Linked Open Data); MSC 2020 Revision
• Ontology
• Semantic Intermediate Abstraction Language
  - between basic markup and formalization
  - previous attempts: flexiformality
  - Part of Math tagging
  - semantic search, . . .
• Previous attempts
  - Automath (1960), . . .
  - Maple, Mathematica
• Issues of proof
  - Computer Assisted
  - Four-Color, Kepler-Hales, Odd-Order, . . . Theorems
  - JVM and chip verification

Status: Content aggregation (I)

• Corpus estimation: Based on a large sample, MR ∩ zb ∼ 60% of zb / 66% of MR (±5% matching error)
• Total since 1868: about 190,000 books averaging ∼350 pages, and about 3.9 million articles averaging ∼14.4 pages
• This makes up about 120 million pages of mathematics, almost evenly distributed between books and articles (note: this ratio has changed significantly through the years!)
• Note: Consistent with older estimates (70-100 million pages some years ago by Keith Dennis); older items are relatively small in number but require high digitization effort (e.g., Göttingen bequests collection)
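The page estimate above can be checked with a few lines of arithmetic. The following is only a back-of-envelope sketch in Python: the counts and averages are the ones quoted on the slide, while the function names `corpus_pages` and `size_ratio_zb_to_mr` are illustrative and do not come from the talk.

```python
# Back-of-envelope check of the corpus estimate quoted on the slide.
# All figures are the slide's own; the function names are hypothetical.
BOOKS = 190_000            # books since 1868
PAGES_PER_BOOK = 350       # average pages per book
ARTICLES = 3_900_000       # articles since 1868
PAGES_PER_ARTICLE = 14.4   # average pages per article


def corpus_pages(books=BOOKS, pages_per_book=PAGES_PER_BOOK,
                 articles=ARTICLES, pages_per_article=PAGES_PER_ARTICLE):
    """Return (book_pages, article_pages, total_pages)."""
    book_pages = books * pages_per_book
    article_pages = articles * pages_per_article
    return book_pages, article_pages, book_pages + article_pages


def size_ratio_zb_to_mr(overlap_share_of_zb=0.60, overlap_share_of_mr=0.66):
    """If the overlap is 60% of zb and 66% of MR, then
    0.60 * |zb| = 0.66 * |MR|, hence |zb| / |MR| = 0.66 / 0.60."""
    return overlap_share_of_mr / overlap_share_of_zb


book_pages, article_pages, total = corpus_pages()
print(f"book pages:    {book_pages / 1e6:.1f} million")
print(f"article pages: {article_pages / 1e6:.1f} million")
print(f"total pages:   {total / 1e6:.1f} million")
print(f"book share:    {book_pages / total:.0%}")
print(f"|zb| / |MR|:   {size_ratio_zb_to_mr():.2f}")
```

This gives roughly 66.5 million book pages and 56.2 million article pages, about 123 million in total, with books contributing about 54%. That agrees with the rounded figure of 120 million pages and with the "almost evenly distributed" remark. The overlap percentages imply that zb is about 10% larger than MR, subject to the stated ±5% matching error.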

Status: Digitization

Various levels of digitization have been achieved:

• Scan/pdf (∼80% of documents, ∼60% of pages)
• Openly available pdf (∼20% of documents, ∼10% of pages)
• Openly available LaTeX, XML, MathML ready for content analysis, formula processing . . . (∼5% of documents, ∼2.5% of pages)
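To give a sense of scale, the page percentages can be turned into absolute page counts. This Python sketch assumes a corpus of roughly 120 million pages, the estimate given earlier in the talk; the dictionary keys and the function name `pages_by_level` are illustrative only.

```python
# Convert the slide's page shares into rough absolute page counts.
# Assumption: about 120 million pages in total, as estimated earlier in the talk.
TOTAL_PAGES = 120_000_000

# Share of pages at each digitization level, as quoted on the slide.
PAGE_SHARE = {
    "scan/pdf": 0.60,
    "openly available pdf": 0.10,
    "open LaTeX/XML/MathML": 0.025,
}


def pages_by_level(total_pages=TOTAL_PAGES, shares=PAGE_SHARE):
    """Return a dict mapping each level to its approximate page count."""
    return {level: round(total_pages * share) for level, share in shares.items()}


for level, pages in pages_by_level().items():
    print(f"{level:<24} ~{pages / 1e6:.0f} million pages")
```

Under that assumption, about 72 million pages exist as scans or pdf, about 12 million are openly available as pdf, and only about 3 million are openly available in a form ready for content analysis.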

Status: Content aggregation (II)

Beyond literature, mathematical information is aggregated in increasingly diverse forms

• Mathematical software: GAMS, swMATH, repositories . . .
• Research data collections: OEIS, DLMF, LMFDB, Manifold Atlas Project, Electronic Geometry Models, ATLAS of Finite Group Representations . . .
• Oral and visual mathematics (conference videos, collections of slides, visualizations . . .)
• Discussion/Collaboration platforms (MathOverflow, Polymath, Encyclopedia of Mathematics, Wikidata . . .)

Examples

GDML: 2016

• JMM Special Session on Mathematical Information in the Digital Age of Science, Seattle Jan 9-11 2016
• Semantic Representation of Mathematical Knowledge Workshop, Fields Institute February 3–5 2016, with Wolfram Research as Sloan grant recipient
• Applied for and received Sloan grant to found an International Mathematical Knowledge Trust (IMKT)

GDML: 2017

• Foundation of IMKT based in Waterloo ON, Canada [July]
• Boards: Governance and Scientific Advisory
• Work groups
  - Short term: Outreach, seed projects, coordination
  - Long term: Make available the “totality” of mathematical knowledge in digital form employing human- and machine-usable knowledge tools
• Initiatives
  - FAbstracts
  - Special Function Concordance
  - FHarmony
  - Document analysis: n-gram studies
• FAbstracts and FHarmony at Big Proof, Cambridge 10–14 July 2017

Initiative: FAbstracts

• FAbstracts means formal abstracts.
• Extract the main results from published mathematical papers into language both human and machine readable.
• Each mathematical term used should be defined in language both human and machine readable.
• Ultimately the statements and definitions should be so precise that they can be translated in a fully automated way into statements and definitions in a proof assistant.
• The language that is used for the FAbstracts should be so expressive that ordinary mathematicians should be able to understand entries.
• A start could be made on such a project by choosing a suitable area for which there is already a good basis of formalized results available.
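As an illustration of what a formal abstract might look like once translated into a proof assistant, here is a minimal sketch in Lean 4 (core library only). It is a hypothetical entry, not taken from the FAbstracts project: the names `IsPrime` and `infinitely_many_primes` are invented for this example. The point is that the entry records a precise definition and a precise statement, while the proof itself is left out (`sorry`).

```lean
-- Hypothetical formal-abstract entry (illustrative names, Lean 4 core only).

/-- Definition used by the statement: `n` is prime. -/
def IsPrime (n : Nat) : Prop :=
  2 ≤ n ∧ ∀ d : Nat, d ∣ n → d = 1 ∨ d = n

/-- Main result, stated precisely; the proof is deliberately omitted,
    since a formal abstract captures statements rather than proofs. -/
theorem infinitely_many_primes : ∀ n : Nat, ∃ p : Nat, n ≤ p ∧ IsPrime p := by
  sorry
```

Every term in the statement is either built in or defined in the entry itself, which is the property the bullets above ask for.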

Initiative: FHarmony

• FHarmony is a harmonization project for formal systems. This is related naturally to the previous necessity for communication between parallel efforts in FAbstracts.
• FHarmony is concerned with the technical aspects of constructing bridgework and crossovers between formal frameworks. For instance, how similar or different are different formalizations of the Jordan Curve Theorem, say. Perhaps this is a particularly good example because of the controversy over the history of its proof.
• HoTT libraries for Coq, HOL Light, Agda, Lean, . . .

GDML: 2018

• JMM Special Session on Mathematical Information in the Digital Age of Science, San Diego, Jan 9–11 2018
• ICM 2018, 1–9 August 2018, Panel on Digital Libraries

Future

• Organization, Governance & Community
  - Community building, Asian, US and European Trust entities
  - Web presence and Wiki on the initiatives
• Collection Development
  - Collaboration with EuDML, arXiv and Euclid
  - Collaboration with Wikipedia, Wikidata
  - Contact with potential Asian partners
• Tools & Services
  - Mathematical Object Identifiers (MOI)
  - Proposal toward open access book identification
  - Digitization Catalog & DML documents and wiki
  - IMU Proceedings
• Knowledge
  - Initiatives
  - Stacks Project?
  - Machine Learning results: Lafferty & Blei; Zanibbi & Giles
  - Collaboration with WRI

Our Mission

To construct, as a global public good, an open knowledge base encompassing the results of the world’s mathematics through collaborations deploying both present and new technology, and to foster a supporting community.

Grand Challenge

To extract and mechanize the world’s mathematical knowledge.

• Mathematical knowledge appears in the literature, data and code.
• Mathematical knowledge management tools become useful and necessary when dealing with large corpora.
• Collecting the entirety of published mathematics and applying MKM is within reach.