With slides from CICM 2014 by Moritz Schubotz and Gabriel Wicke 29.05.2015 www.formulasearchengine.com 1 Mathoid Robust, Scalable, Fast and Accessible Math Rendering for Wikipedia Math on Wikipedia Meeting 29.5.2015 15h (CEST) #wikimedia-services on freenode


With slides from CICM 2014 by

Moritz Schubotz and Gabriel Wicke

29.05.2015 www.formulasearchengine.com 1

Mathoid: Robust, Scalable, Fast and Accessible Math Rendering for Wikipedia

Math on Wikipedia Meeting 29.5.2015 15h (CEST)

#wikimedia-services on freenode


History of Math and Wikipedia

• Math support since 2003

• 10-2010 Client side MathJax support

• 11-2011 MathML setting is removed

• 10-2013 Mathoid implementation ready

• 06-2014 Majority of the new code is reviewed

• 09-2014 Mathoid is deployed in production

• 03-2015 Mathoid 0.2.6 with speech output

• 05-2015 Mathoid 0.2.8 complies with MediaWiki services template


User preferences (2014)


PNG; 8664

HTML; 13720

MathJax; 7801

MODERN; 47603

MATHML; 10124

SOURCE; 2441

invalid; 552


Bringing MathML to Wikipedia

• Dimensions

– Coverage

– Speed

– Robustness

– Maintainability

– Accessibility


Browser support

• MathML support in Firefox

• No MathML support in Chrome

• Fallback SVG images
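
The fallback order above can be sketched as a small selection function. This is only an illustration: the `BrowserCaps` flags and the `choose_rendering` name are hypothetical, and the slides do not describe how capabilities are actually detected. The PNG branch reflects the note elsewhere in the deck that IE6 and older receive the texvc-generated PNG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrowserCaps:
    """Hypothetical capability flags; real detection is not covered by the slides."""
    mathml: bool
    svg: bool


def choose_rendering(caps: BrowserCaps) -> str:
    """Pick an output format: MathML first, then SVG, then PNG."""
    if caps.mathml:
        return "mathml"  # e.g. Firefox
    if caps.svg:
        return "svg"     # fallback image, e.g. Chrome
    return "png"         # last resort for very old browsers


firefox = BrowserCaps(mathml=True, svg=True)
chrome = BrowserCaps(mathml=False, svg=True)
legacy = BrowserCaps(mathml=False, svg=False)
print(choose_rendering(firefox), choose_rendering(chrome), choose_rendering(legacy))
```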


Mathoid


Wikipedia Dataset

• Wikipedia (en) 446 485 formulae (27 671 pages)

– ~280k distinct formulae

– ~3 GB of formulae (presentation + content markup)

– Generated on a workstation from 150 MB of source data
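
A rough sketch of how total and distinct formula counts like these could be obtained from wikitext. It is not the pipeline used for the dataset: the regular expression, the whitespace normalization, and the function names are all assumptions made for illustration.

```python
import re

# Assumption: formulae are delimited by <math ...> ... </math> tags in wikitext.
MATH_TAG = re.compile(r"<math\b[^>]*>(.*?)</math>", re.DOTALL | re.IGNORECASE)


def extract_formulae(wikitext: str) -> list[str]:
    """Return the TeX source of each formula with whitespace runs collapsed."""
    return [" ".join(body.split()) for body in MATH_TAG.findall(wikitext)]


def count_formulae(pages: list[str]) -> tuple[int, int]:
    """Return (total formulae, distinct formulae) over all pages."""
    total = 0
    distinct: set[str] = set()
    for text in pages:
        formulae = extract_formulae(text)
        total += len(formulae)
        distinct.update(formulae)
    return total, len(distinct)


sample_pages = [
    "Energy: <math>E = mc^2</math>, Pythagoras: <math>a^2+b^2=c^2</math>",
    "Again: <math>E  =  mc^2</math>",
]
print(count_formulae(sample_pages))  # (3, 2)
```

Collapsing whitespace treats `E = mc^2` and `E  =  mc^2` as the same formula. A real count may use a different notion of equality.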


Comparison of rendering methods


LaTeXML got faster and supports SVG now


Mathoid needs you

• Help with testing of MathML in Wikipedia

• http://math-preview.wmflabs.org

• Report bugs: https://phabricator.wikimedia.org/tag/math/

• Review code


Technical Details: Caching

Rendering | Database table | Page cache              | Browser cache
PNG       | filename       | special page            | yes
SVG       | yes            | special page            | yes
MathML    | yes            | page output             |
Lifetime  | infinity       | one week (special page) | one hour (special page)


texvc generates an output hash for each PNG image. This output hash cannot be computed from the TeX input without running texvc, and it changes whenever texvc is recompiled. The database stored the filename of the PNG file. The special page receives the hash of the TeX input and uses it to retrieve the image. If the image file is considered correct (valid XML for SVG, an existing file for PNG), the special page sets caching information in its response header. Currently Mathoid displays the texvc-generated PNG for IE6 and older.


Proposed changes (1)

• PNG generation independent of texvc

– Currently Java

– Next step: SVG-to-PNG conversion in Node

• Minimal change to the database caching layer


Rendering | Database table | Page cache              | Browser cache
PNG       | yes            | special page            | yes
SVG       | yes            | special page            | yes
MathML    | yes            | page output             |
Lifetime  | infinity       | one week (special page) | one hour (special page)


Proposed changes (2)

• Use the standard image pipeline for the image fallback

• Minimal change to the database caching layer


Rendering | Database table | Page cache              | Browser cache
MathML    | yes            | page output             |
Lifetime  | infinity       | one week (special page) | one hour (special page)


Proposed changes (3)

• Completely remove database caching layer

• MathSearch extension maintains math index and is updated via the (already existing) hook
