Continuing toward a Global Digital Mathematics Library
The International Mathematical Knowledge Trust
Patrick D. F. Ion (1), Olaf Teschke (2)
(1) MR AMS ret’d & University of Michigan, MI, USA
(2) zbMATH, Berlin, Germany
10 January 2018 / JMM 2018 — Special Session 83A
GDML (Global Digital Mathematics Library): What is it?
- Global: for all the World, from all the World
- Digital: using current technology
- Mathematics: for a specific subject, especially research
- Library: a knowledge base
History: Ancient
- Great Library of Alexandria in the Mouseion, founded ca. 323 BCE by Ptolemy.
- Archimedes (287-212 BCE)
- Eratosthenes (276-195 BCE)
- Apollonius (262-190 BCE)
- Aristarchus of Samos (310-230 BCE)
- Hero (ca. 10 CE-70 CE)
- Hypatia (daughter of Theon, the last director of the Mouseion), lynched by a rabble in 415 CE
- New Bibliotheca Alexandrina
- ca. 700 years
History: Recent
- Pasigraphy: E. Schröder, G. Peano at ICM 1897
- Georg Valentin’s mathematical bibliography to 1928
- Paul Otlet and Henri La Fontaine: “Répertoire Bibliographique Universel” (RBU) from 1895; Mundaneum 1924 to ca. 1941 in Mons
- Vannevar Bush imagined Memex in 1945 (Shannon)
History: Otlet
- (1895+) A highly advanced index card machine: “a moving desk shaped like a wheel, powered by a network of hinged spokes beneath a series of moving surfaces. The machine would let users search, read and write their way through a vast mechanical database stored on millions of 3×5 index cards. This new research environment would do more than just let users retrieve documents; it would also let them annotate the relationships between one another, the connections each [document] has with all other [documents], forming from them what might be called the Universal Book.”
History: Otlet
- (1934) Otlet suggests plans for a global network of electric telescopes that would allow people to search and browse through millions of interlinked documents, images, audio and video files. He described how people would use the devices to send messages to one another, share files and even congregate in online social networks. He called the whole thing a “réseau”.
- Otlet described a networked world where “anyone in his armchair would be able to contemplate the whole of creation”.
History: WDML
- Late 1990s: initial vision
- 1998: WDML endorsed by the International Mathematical Union (IMU)
- 2001: IMU issues “Call to All Mathematicians to Make Publications Electronically Available”
- 2000s: large digitization projects
- 2006: IMU Report Digital Mathematics Library: “A Vision for the Future”
History: WDML
- 2010: European Digital Mathematics Library (EuDML)
- Digital Public Library of America launches with support of Sloan Foundation
- 2011: Alfred P. Sloan Foundation funds WDML workshop at NAS, November 2012
- 2013: NAS/NRC Report “The Mathematical Sciences in 2025”, January
- 2014: NAS/NRC Report “Developing a 21st Century Global Library for Mathematics Research”, March [Daubechies, Lynch]
GDML
- 2014: Seoul ICM Meeting, August
- Creation of GDML WG
- 2015: Recognized as WG of IMU CEIC: Committee on Electronic Information and Communication
GDML Working Group: Austria 1, Canada 1, France 1, Germany 2, USA 3
- Thierry Bouche (Université Joseph Fourier, Grenoble, France)
- Bruno Buchberger (Johannes Kepler Universität, Linz, Austria)
- Patrick Ion (AMS & UM, Ann Arbor MI, USA)
- Michael Kohlhase (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)
- Jim Pitman (University of California, Berkeley CA, USA)
- Olaf Teschke (zbMATH, Berlin, Germany)
- Stephen Watt (University of Waterloo, Waterloo ON, Canada)
- Eric Weisstein (Wolfram Research Inc, Champaign IL, USA)
GDML: Mission
To construct, as a global public good, an open knowledge base encompassing the results of the world’s mathematics through collaborations deploying both present and new technology, and to foster a supporting community.
GDML: Goals
- To enhance openness and accessibility of all mathematical knowledge world-wide, present, past and future.
- To serve research mathematics, education and the scientific and technological use of mathematics.
- To be a resource for developing tools to promote use and development of mathematics.
- To facilitate creation, dissemination and archiving of semantically annotated mathematical material.
- To encourage the collaborative development of services based on semantic annotation.
GDML: Role
The GDML tries to achieve its goals by building collaborations. The effort involves the creation of standards and indications of best practices, encouraging the instantiation of such standards with content, and making such content openly available.
Issues: Categories
- Organization, Governance & Community
- Corpus & Collection
- Tools & Services
- Knowledge Management
Issues: Organization, Governance & Community
- Chicken and egg: International Mathematical Union, WG
- International (legal, communication): examples
  - HathiTrust, DPLA, JSTOR, COS, ...
- Mathematics as a Universal Language: math community
Issues: Corpus & Collection
- What must a GDML include?
- How big is our literature?
- What progress is there in digitization, both in quality and quantity?
- Where are digital math collections today, whether records of print, newer document types or software and data?
- What are copyright and other restrictions on our mathematical legacy?
Issues: Corpus & Collection
- Boundaries
  - Advanced Research Mathematics (mostly)
  - Applied Mathematics (the theoretical)
  - Any natural language (mostly English presently)
  - MR ∪ zb ?? [What is MR ∩ zb ?]
  - Legacy material versus broad present
- Ownership
  - Publishing is a business
  - Mathematics is a branch of knowledge
  - Mathematical facts are not patentable
  - Much publication metadata is public
  - Collections of such are not intrinsically held to be public
- Competition to be replaced by collaboration
Math Literature
- Formal literature
- Informal literature
- Research monographs
- Expository works (surveys, tutorials, user guides)
- Specialized collections for specific topics:
Where is Mathematical Knowledge?
- People
- Research Journals
- Textbooks and Monographs
- Informal literature
- Datasets
- Software
- Web sites
Math Knowledge: Variety
- Conjectures that turn out to be false.
- Proofs with flaws.
- Proof sketches and analogies.
- Application of results where conditions are not verified.
- Approximations and probabilistic statements.
- “most of these terms are probably wrong, but a little inaccuracy sometimes saves tons of explanation” [H. H. Munro (“Saki”) 1870-1916]
Web: Article Repositories
- Publisher archives [Science Direct, SpringerLink . . . ]
- JSTOR
- HathiTrust
- arXiv
- EuDML
- NUMDAM
- Gallica
- Göttinger Digitalisierungszentrum
- RusDML
- Private Lists
Web: Indexes, Reviews, Author Databases
- Inspec
- MathSciNet [Mathematical Reviews]
- zbMATH [Zentralblatt; Jahrbuch FM]
- Math Genealogy Project
- Google Scholar
- ResearchGate
- Wikipedia
- MathWorld
Web: Specialized Tools and Databases
- OEIS: Online Encyclopedia of Integer Sequences
- DLMF: Digital Library of Mathematical Functions
- DRMF: Digital Repository of Mathematical Formulas
- DDMF: Dynamic Dictionary of Mathematical Functions
- Online Integral Calculator
- Inverse Symbolic Calculator
- Atlas of Finite Simple Groups
- LMFDB: L-functions and Modular Forms Database
- Combinatorial Statistic Finder
- A Catalogue of Lattices
- Encyclopedia of Triangle Centers
Web: Proof Libraries
- Mizar Mathematical Library
- Archive of Formal Proofs
- MetaMath Proof Explorer
- . . .
Web: Software Systems
- GeoGebra
- Pari/GP
- ProofWeb
- Wolfram|Alpha ...
- Maple
- Mathematica
- Flyspeck
- Coq
- . . .
- Guide to Available Mathematical Software
- GitHub
Digitization: Knowledge from Documents
1. Assemble document collections
2. Capture page sets (born-digital vs. scanned)
3. OCR of text and formulas
4. Capture metadata / index and link
5. Semantic capture
6. Apply knowledge tools
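The six steps above can be read as a processing pipeline. The following is a minimal sketch only: the `Document` record, the stage names and the rule that born-digital documents skip OCR are all assumptions made for illustration, and each stage is a placeholder that merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical record carried through the pipeline."""
    identifier: str
    born_digital: bool
    history: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[Document], Document]:
    """Build a placeholder stage that only records that it ran."""
    def stage(doc: Document) -> Document:
        doc.history.append(name)
        return doc
    return stage


# Stage names follow the six listed steps; the names themselves are made up here.
STAGE_NAMES = [
    "assemble_collection",
    "capture_page_set",
    "ocr_text_and_formulas",
    "capture_metadata_index_link",
    "semantic_capture",
    "apply_knowledge_tools",
]


def run_pipeline(doc: Document) -> Document:
    """Run every stage in order on one document."""
    for name in STAGE_NAMES:
        # Assumption: born-digital documents already carry text, so OCR is skipped.
        if name == "ocr_text_and_formulas" and doc.born_digital:
            continue
        doc = make_stage(name)(doc)
    return doc


scanned = run_pipeline(Document("scan-001", born_digital=False))
native = run_pipeline(Document("tex-001", born_digital=True))
print(scanned.history)
print(native.history)
```

Keeping the stages as an ordered list makes the order of the steps explicit and lets a real implementation replace one placeholder at a time.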
Issues: Corpus & Collection
- Materials: just discussed
- Cataloging: metadata standards; EuDML
  - zbMATH, MathSciNet
  - EuDML, Beebe, other public aggregations
- Authority, Trust, Provenance: current standards?
- Reproducible Research Standard - V. Stodden
- Crowd sourcing
- Annotation and personal collecting: Mendeley, Bibsonomy
Issues: Tools & Services
- Multilingual: Unicode
- Formulas: MathML, OpenMath, TeX / LaTeX
- Multiform: XML for description, or whatever’s needed
- Listings; Annotation: lack of full support; W3C Annotation
- Data-mining: LDA; NLP+: MathWordNet; Blei & Lafferty; Zanibbi & Giles
- Corpus structure: graph analysis & visualization; simplicial complex homology; persistent homology
Issues: Knowledge Management
- Classification: MSC 2010 in SKOS (Linked Open Data); MSC 2020 Revision
- Ontology
- Semantic Intermediate Abstraction Language
  - between basic markup and formalization
  - previous attempts: flexiformality
  - Part of Math tagging
  - semantic search, . . .
- Previous attempts
  - Automath (1960), . . .
  - Maple, Mathematica
- Issues of proof
  - Computer Assisted
  - Four-Color, Kepler-Hales, Odd-Order, . . . Theorems
  - JVM and chip verification
Status: Content aggregation (I)
- Corpus estimation: Based on a large sample, MR ∩ zb ∼ 60% of zb / 66% of MR (±5% matching error)
- Total since 1868 about 190,000 books with average ∼350 pages, about 3.9 million articles with average ∼14.4 pages
- Makes up about 120 million pages of mathematics, almost evenly distributed between books and articles (note: this ratio has changed significantly through the years!)
- Note: Consistent with older estimates (70-100 million pages some years ago by Keith Dennis); older items are relatively small in number but require high digitization effort (e.g., Göttingen bequests collection)
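The page total can be checked directly from the quoted counts: about 66.5 million book pages plus about 56.2 million article pages, roughly 123 million in all, which the slide rounds to 120 million. The sketch below redoes that arithmetic. It also derives what share of MR ∪ zb the overlap would be under the quoted 60% / 66% figures. That last number is a derivation added here, not a figure from the talk, and the variable and function names are illustrative.

```python
# Counts quoted in the estimate above.
BOOKS = 190_000
PAGES_PER_BOOK = 350
ARTICLES = 3_900_000
PAGES_PER_ARTICLE = 14.4

book_pages = BOOKS * PAGES_PER_BOOK
article_pages = round(ARTICLES * PAGES_PER_ARTICLE)
total_pages = book_pages + article_pages
book_share = book_pages / total_pages


def overlap_share_of_union(share_of_a: float, share_of_b: float) -> float:
    """Given |A ∩ B| / |A| and |A ∩ B| / |B|, return |A ∩ B| / |A ∪ B|.

    Uses |A ∪ B| = |A| + |B| - |A ∩ B|, with |A| and |B| expressed
    as multiples of the overlap.
    """
    return 1.0 / (1.0 / share_of_a + 1.0 / share_of_b - 1.0)


# Overlap is ~60% of zb and ~66% of MR (before the ±5% matching error).
overlap_of_union = overlap_share_of_union(0.60, 0.66)

print(f"book pages:    {book_pages:,}")
print(f"article pages: {article_pages:,}")
print(f"total pages:   {total_pages:,}")
print(f"book share of pages: {book_share:.1%}")
print(f"overlap as share of MR ∪ zb: {overlap_of_union:.1%}")
```

On these inputs books account for a little over half of the pages, which matches the "almost evenly distributed" remark.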
Status: Digitization
Various levels of digitization have been achieved:
- Scan/pdf (∼ 80% of documents, ∼ 60% of pages)
- Openly available pdf (∼ 20% of documents, ∼ 10% of pages)
- Openly available LaTeX, XML, MathML ready for content analysis, formula processing . . . (∼ 5% of documents, ∼ 2.5% of pages)
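To give a sense of scale, the percentages can be turned into absolute numbers. This is a rough sketch under two assumptions that the slides do not state: that the total is the rounded 120 million pages quoted earlier, and that "documents" means the 190,000 books plus the 3.9 million articles. The names are illustrative.

```python
# Assumed totals, taken from the corpus estimate given earlier in the talk.
TOTAL_PAGES = 120_000_000
TOTAL_DOCUMENTS = 190_000 + 3_900_000  # books + articles (an assumption)

# Shares in per mille, so that the arithmetic stays in exact integers.
LEVELS = {
    "scan/pdf": {"documents": 800, "pages": 600},
    "openly available pdf": {"documents": 200, "pages": 100},
    "openly available LaTeX/XML/MathML": {"documents": 50, "pages": 25},
}


def absolute_counts(levels, total_documents, total_pages):
    """Convert per-mille shares into absolute document and page counts."""
    return {
        name: {
            "documents": total_documents * share["documents"] // 1000,
            "pages": total_pages * share["pages"] // 1000,
        }
        for name, share in levels.items()
    }


counts = absolute_counts(LEVELS, TOTAL_DOCUMENTS, TOTAL_PAGES)
for name, c in counts.items():
    print(f"{name}: about {c['documents']:,} documents, {c['pages']:,} pages")
```

Under these assumptions, material ready for content analysis comes to about 3 million pages.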
Status: Content aggregation (II)
Beyond literature, mathematical information is aggregated in increasingly diverse forms:
- Mathematical software: GAMS, swMATH, repositories . . .
- Research data collections: OEIS, DLMF, LMFDB, Manifold Atlas Project, Electronic Geometry Models, ATLAS of Finite Group Representations . . .
- Oral and visual mathematics (conference videos, collections of slides, visualizations . . .)
- Discussion/Collaboration platforms (MathOverflow, Polymath, Encyclopedia of Mathematics, Wikidata . . .)
GDML: 2016
- JMM Special Session on Mathematical Information in the Digital Age of Science, Seattle, Jan 9-11 2016
- Semantic Representation of Mathematical Knowledge Workshop, Fields Institute, February 3–5 2016, with Wolfram Research as Sloan grant recipient
- Applied for and received Sloan grant to found an International Mathematical Knowledge Trust (IMKT)
GDML: 2017
- Foundation of IMKT based in Waterloo ON, Canada [July]
- Boards: Governance and Scientific Advisory
- Work groups
  - Short term: Outreach, seed projects, coordination
  - Long term: Make available the “totality” of mathematical knowledge in digital form employing human- and machine-usable knowledge tools
- Initiatives
  - FAbstracts
  - Special Function Concordance
  - FHarmony
  - Document analysis: n-gram studies
- FAbstracts and FHarmony at Big Proof, Cambridge, 10–14 July 2017
Initiative: FAbstracts
- FAbstracts means formal abstracts.
- Extract the main results from published mathematical papers into language both human and machine readable.
- Each mathematical term used should be defined in language both human and machine readable.
- Ultimately the statements and definitions should be so precise that they can be translated in a fully automated way into statements and definitions in a proof assistant.
- The language used for the FAbstracts should be expressive enough that ordinary mathematicians can understand the entries.
- A start could be made on such a project by choosing a suitable area for which there is already a good basis of formalized results available.
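To make the idea concrete, here is what one such entry might look like. It is an illustration only: the talk does not fix a language for FAbstracts, and the choice of Lean 4, of Euclid's theorem as the example, and of the names `IsPrime` and `exists_prime_above` are assumptions made here. The term used in the statement is defined first, and the proof is deliberately left out, since a formal abstract records statements and definitions.

```lean
-- Hypothetical formal abstract, written in Lean 4 without external libraries.

/-- A natural number is prime if it is at least 2 and
    its only divisors are 1 and itself. -/
def IsPrime (p : Nat) : Prop :=
  2 ≤ p ∧ ∀ d : Nat, d ∣ p → d = 1 ∨ d = p

/-- Main result (Euclid): there are arbitrarily large primes.
    The proof is omitted on purpose; only the statement is recorded. -/
theorem exists_prime_above (n : Nat) : ∃ p : Nat, n ≤ p ∧ IsPrime p := by
  sorry
```

A mathematician can read the statement almost as written, while a proof assistant can check that it is well formed, which is the pairing of human and machine readability the bullets ask for.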
Initiative: FHarmony
- FHarmony is a harmonization project for formal systems. This relates naturally to the need for communication between parallel efforts in FAbstracts.
- FHarmony is concerned with the technical aspects of constructing bridgework and crossovers between formal frameworks. For instance, how similar or different are different formalizations of the Jordan Curve Theorem, say. Perhaps this is a particularly good example because of the controversy over the history of its proof.
- HoTT libraries for Coq, HOL Light, Agda, Lean, . . .
GDML: 2018
- JMM Special Session on Mathematical Information in the Digital Age of Science, San Diego, Jan 9–11 2018
- ICM 2018, 1–9 August 2018, Panel on Digital Libraries
Future
- Organization, Governance & Community
  - Community building, Asian, US and European Trust entities
  - Web presence and Wiki on the initiatives
- Collection Development
  - Collaboration with EuDML, arXiv and Euclid
  - Collaboration with Wikipedia, WikiData
  - Contact with potential Asian partners
- Tools & Services
  - Mathematical Object Identifiers (MOI)
  - Proposal toward open access book identification
  - Digitization Catalog & DML documents and wiki
  - IMU Proceedings
- Knowledge
  - Initiatives
  - Stacks Project?
  - Machine Learning results: Lafferty & Blei; Zanibbi & Giles
  - Collaboration with WRI
Our Mission
To construct, as a global public good, an open knowledge base encompassing the results of the world’s mathematics through collaborations deploying both present and new technology, and to foster a supporting community.
Grand Challenge
To extract and mechanize the world’s mathematical knowledge.
- Mathematical knowledge appears in the literature, data and code.
- Mathematical knowledge management tools become useful and necessary when dealing with large corpora.
- Collecting the entirety of published mathematics and applying MKM is within reach.