The challenge
The authoring platform of choice in many math-heavy disciplines is LaTeX. It produces typeset documents of excellent quality and handles formulas and mathematical diagrams extremely well. Practically every researcher or instructor in mathematics, physics, and computer science is adept at using it, and it has a wide user base outside these core disciplines as well (e.g., philosophy and economics).
Unfortunately, it only produces PDF output. PDF is not an accessible format: it does not scale well to display on tablets or phones, its text does not reflow, it contains no semantic information (e.g., what’s a heading or what’s a list), and its images, formulas, and diagrams are accessible only visually. This creates difficulties for readers who rely on alternative presentations of material (in other colors, text sizes, or fonts, or in non-visual formats such as audio or Braille) or who simply want to read the material on a device that is not the size of a printed page (e.g., a smartphone or small e-reader).
A partial solution is to provide the content in HTML. HTML handles accessibility much better than PDF, and tools that convert HTML to other formats are widely available. HTML is also accessible both to screen reader software designed specifically for users with low or no vision and to simpler text-to-speech (TTS) software, which many sighted users also rely on (e.g., those with dyslexia or ADHD). In math-heavy disciplines, the widespread practice of producing OERs only in LaTeX and PDF poses a unique challenge (e.g., only about half of the textbooks on the American Institute of Mathematics list are provided in HTML).
The availability of material in HTML format to ensure accessibility is a desideratum for all OER. For math-heavy disciplines, presenting mathematical formulas in the HTML version of the material poses a second, difficult challenge. Mathematical formulas have long been hard to display on web pages. Early solutions included displaying pictures of formulas or recreating them as text with special formatting and fonts. The modern solution is MathML, a format for representing mathematical formulas that can be included in HTML documents. MathML is not universally supported by web browsers, so the most widespread solution is for a webpage to include MathJax, a JavaScript library that acts as a polyfill and displays MathML to the user. MathML is a low-level format, not suited to writing formulas by hand. However, good utilities exist for converting LaTeX formula notation to MathML, and MathJax can also directly display LaTeX formulas embedded in webpages. For instance, the code \int_{x=0}^\infty \frac{1}{x^2} dx produces: $$\int_{x=0}^\infty \frac{1}{x^2} dx$$ whereas the MathML representation is unintelligible (right-click on the formula and select “Show Math As > MathML Code” to see it). MathJax can render the formula itself, show the LaTeX code used to generate it, or convert it into another format that the browser renders (e.g., MathML, HTML, or SVG; right-click on the formula and select “Math Settings > Math Renderer” to see the differences).
Alternatives to LaTeX
One option is to avoid LaTeX as the authoring platform from the start, or to convert existing LaTeX code to a format that is itself more easily converted to HTML. The following three options all allow the use of LaTeX notation for entering mathematical symbols and formulas.
- Pressbooks is a web-based authoring and publishing tool for OERs that supports LaTeX formulas and export to PDF for printing. It is built on top of WordPress, so in a sense it is web-first. While it is possible to use mathematical formulas in a Pressbooks project, it is not a popular option for math-heavy disciplines. Example: A Concise Introduction to Logic (note that formal proofs are displayed as images, images have no ALT text, and stand-alone formulas use neither MathML nor even Unicode characters; e.g., the logical and symbol is presented as a caret ^ and the logical or as the letter “v”).
- PreTeXt is a platform for authoring mathematics textbooks in XML; it converts the XML source to other formats (including LaTeX for printing, HTML for display in a web browser, and ePub for display on e-readers such as Kindle). PreTeXt is one of the oldest open publishing solutions and is popular with mathematicians. For open textbooks, free help with conversion to PreTeXt is available. Example: Abstract Algebra
- Markdown is a simple markup language that can easily be converted to other formats (including HTML, LaTeX, PDF, and Word) using pandoc. R Markdown (and its extension/successor Quarto) and Bookdown are popular interfaces for authoring and publishing Markdown documents (and use pandoc and LaTeX “under the hood”). Mathematical formulas and symbols can be included using simplified LaTeX code. Because of its close connection to the statistics package R, this option is popular with statisticians, economists, psychologists, and data scientists. Examples: Modern Statistical Methods for Psychology, Odds & Ends
All of the above come with advantages and drawbacks. Depending on the scope and complexity of the project and the functionality required, converting an existing project to, e.g., Markdown or PreTeXt may be a viable option, and these platforms should be considered especially for new projects. A significant advantage of Markdown is that it can easily be converted to other formats (including LaTeX).
An obvious barrier to using any of the above is that authors have to learn a new system and/or language and unfamiliar tools. A more significant disadvantage is that none of them can draw on the huge LaTeX ecosystem. LaTeX (or at least its predecessor, TeX) has been around for almost half a century. There are numerous packages that aid in the production of documents, from sophisticated citation managers to packages for the production of specialized diagrams and complex layout of mathematical formulas. LaTeX is also easily extensible; authors can define their own macros quite easily. Very few of these features are available to documents authored in Markdown or PreTeXt, and almost none in Pressbooks. Converting an entire existing textbook will usually require a substantial amount of work, in part because many things that LaTeX does easily will have to be recreated from scratch.
LaTeX to HTML conversion
A second option is to use software to automatically convert a LaTeX project to HTML. Because of the complexity and variability of LaTeX projects, there are few good conversion utilities. The solution I prefer is LaTeXML. It is a reimplementation of LaTeX, but outputs to XML instead of to PDF, and can compile mathematical formulas to MathML. LaTeXML is what ar5iv uses: a project to compile everything on the arXiv to HTML.
Because LaTeXML simulates what LaTeX actually does, it can (to a large extent) deal with packages and LaTeX programming directly. It natively supports a large number of popular packages and classes, and packages it does not support can be loaded and “compiled” using the --includestyles flag. This support is not perfect (e.g., many newer packages that in turn rely on the expl3 package cannot yet be compiled). LaTeXML is under active development and is likely to keep improving and to be supported for the foreseeable future. In any case, because many commonly used packages are either supported already or work with the --includestyles flag, LaTeXML is probably the best candidate for a tool to convert an existing LaTeX project to HTML.
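A minimal sketch of how the flag comes into play, assuming a book whose preamble loads a local style file. The file name mymacros.sty is hypothetical. latexmlc is LaTeXML’s all-in-one command-line driver, and --dest and --includestyles are documented LaTeXML options, but the exact behaviour may differ between LaTeXML versions, so treat this as illustrative rather than definitive.

```latex
% Conversion command (run in a shell, shown here as a comment):
%   latexmlc --includestyles --dest=book.html book.tex
% Without --includestyles, LaTeXML ignores .sty files it has no
% binding for (and warns); with it, LaTeXML tries to process them.
\documentclass{book}
\usepackage{amsmath}  % has a native LaTeXML binding
\usepackage{mymacros} % hypothetical local mymacros.sty, no LaTeXML binding
\begin{document}
\[ \int_{x=0}^\infty \frac{1}{x^2}\, dx \]
\end{document}
```

A style file that only defines simple macros with \newcommand is more likely to get through this route than one built on expl3.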
The output produced by LaTeXML directly is not terribly visually appealing. Since the HTML output will not just be used by screen readers (where visual presentation is secondary), some effort is required to style the HTML produced by LaTeXML using CSS to produce webpages that look attractive and display well on a range of devices and browsers (i.e., responsive web pages).
One simple and readily available solution is BookML, developed by mathematician Vincenzo Mantova at the University of Leeds. BookML uses LaTeXML to produce webpages styled after those produced by Bookdown. LaTeXML and BookML let authors supply different code depending on whether LaTeX is producing a PDF or LaTeXML is producing HTML. BookML extends this capability, e.g., by making it possible to insert HTML code directly into the resulting webpages, or to add alt text to images produced by means other than LaTeX’s \includegraphics command. BookML also automatically produces a SCORM bundle of the project that can be uploaded to a learning management system (such as Brightspace, Canvas, or Moodle). This is especially useful for authors who don’t have an easy way of hosting the resulting website on a server. LaTeXML (but not yet BookML) can also produce ePub.
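The PDF/HTML switch described above can be sketched with the \iflatexml conditional from LaTeXML’s latexml package, which is false when LaTeX typesets the document and true when LaTeXML converts it. This sketch assumes latexml.sty (which ships with LaTeXML) is on LaTeX’s search path. The macro name \blank and both of its definitions are hypothetical illustrations, not taken from any particular project, and BookML’s own additions (such as inserting raw HTML) use their own commands, which are not shown here.

```latex
\documentclass{article}
\usepackage{latexml} % provides the \iflatexml conditional

% Hypothetical macro for a fill-in blank in an exercise.
\iflatexml
  % LaTeXML (HTML): a word that screen readers can read aloud
  \newcommand{\blank}{[blank]}
\else
  % LaTeX (PDF): a long horizontal rule drawn on the baseline
  \newcommand{\blank}{\rule{5em}{0.4pt}}
\fi

\begin{document}
A sentence is \blank{} if it is true on every valuation.
\end{document}
```

Only one definition of \blank is active in each output, so the body text itself needs no changes.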
Case study: An open textbook on formal logic
The University of Calgary Department of Philosophy teaches symbolic logic in its PHIL 279 course to over 700 students (mainly Computer Science majors). With support from the Taylor Institute for Teaching and Learning, we adapted the open textbook forall x by P.D. Magnus; the resulting open textbook, forall x: Calgary, has been in use in PHIL 279 since 2017. The Calgary version is now also widely adopted and has been translated into German and Portuguese.
I converted this text to HTML in 2024 using LaTeXML and BookML. The basic (error-free) conversion to HTML was simple and took about a day of work. It mainly involved changing bits of LaTeX code that LaTeXML couldn’t handle. Approximately another week of work was required to fine-tune the LaTeX code and CSS so that they produced better HTML and visual output. For example, markup for lists sometimes resulted in odd spacing on the resulting web page, and LaTeX’s mechanisms for producing links sometimes didn’t work (they produced incorrect links or link text when run through LaTeXML). Many of these issues were caused by oddities of the legacy LaTeX code we started from, and such fixes wouldn’t be necessary for a LaTeX project with clean source code that uses standard packages.
The impetus for the conversion was a request from the University of Cincinnati Accessibility Center, which needed to accommodate a blind student in a course using this textbook. I took this as an opportunity to make the HTML version as accessible as possible, specifically, to make it work well with screen readers. This involved the following steps:
- Add ALT text to all diagrams and images.
- Provide accessible alternatives to some text elements (e.g., we use a long underline to indicate a blank in a sentence, but this long underline cannot be interpreted by screen readers).
- Switch the language on foreign terms and names so that screen readers can pronounce them in the right voice.
- Develop a non-visual representation of formal proofs and rewrite the code that produces them so that (a) LaTeXML and BookML can display them cleanly in the HTML version using CSS, and (b) screen readers can provide the missing visual information in textual (i.e., auditory) form. The proofs in the PDF are produced with the fitch package. When run through LaTeXML/BookML, they are produced using fitchml.sty and styled with fitchml.css in the project source. The non-visual presentation is described in the accessibility notes for forall x. (Thanks to Patrick Girard and Audrey Yap for discussions on how to present proofs non-visually. The image at the top of this post is an example.)
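The language switch and the proof-package switch in the list above might look roughly as follows. This is a sketch under assumptions, not the forall x source: it assumes LaTeXML carries babel’s \foreignlanguage markup into the language attribute of the generated HTML (worth checking in the output), and it assumes fitchml.sty can be loaded in place of fitch when LaTeXML runs (e.g., with --includestyles). The actual project may select these files differently.

```latex
\documentclass{book}
\usepackage[german,english]{babel} % the last option (english) is the main language
\usepackage{latexml}               % provides \iflatexml

% Assumed switch between the two proof packages named in the text
\iflatexml
  \usepackage{fitchml} % HTML: proof markup styled by fitchml.css
\else
  \usepackage{fitch}   % PDF: proofs typeset by the fitch package
\fi

\begin{document}
% Mark the German title so its language is recorded in the output
Frege set out his logic in the \foreignlanguage{german}{Begriffsschrift}.
\end{document}
```

In the PDF, \foreignlanguage also switches hyphenation patterns, so marking foreign terms is useful in both outputs.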
There is still work to be done, and the results haven’t been tested by actual students with low or no vision, on their own or in the context of using the materials in a course.
Pitfalls and tricks
It is difficult to test web versions of OER for accessibility. There are basic tools (e.g., WAVE) that automatically check various things, e.g., that contrast and colors are suitable for colorblind readers, that images have ALT text, etc. Code produced by LaTeXML generally does well on everything that can be automatically checked (the developers have accessibility in mind) and on everything the available resources for OER authors provide guidance on (e.g., the BCcampus Open Accessibility Toolkit). But detailed testing is a challenge for an author with no accessibility training or experience.
What works in the screen reader you have (say, VoiceOver on macOS or Narrator on Windows) may not work in others, or may work in one version but not in others, and any hacks used to make it work might break elsewhere. For non-experts, testing on a wide range of assistive technologies is nearly impossible: you’d need several different computers and the ability to install various assistive technologies on them, some of which are not free. Testing Braille output requires at least a knowledge of Braille, if not separate hardware.
Given this, it’s usually best to follow documented best practice. (E.g., I originally used the aria-label attribute to provide explicit hints for how things should be pronounced, but support for aria-label is inconsistent.)
When fine-tuning code and deciding on various settings, I felt pulled in competing directions: between providing an optimal experience for readers who casually use TTS extensions and not degrading the experience for users who rely on true screen readers like JAWS and NVDA. TTS extensions tend to have poor support for pronouncing Unicode characters, MathML with assistive alternative text, and tables. Doing things one way might get VoiceOver on Macs or Windows Narrator to read out formulas and special symbols, but then prevent NVDA from working properly. I also had a hard time navigating accessibility advice and was unable to obtain advice or support from on-campus sources like our accessibility service center.
Tricking screen readers into pronouncing things the right way is in any case a fool’s errand and may have unintended side effects (see The Curious Case of “iff” and Overriding Screenreader Pronunciations by Ben Myers). It’s usually best to “leave things be” but provide guidance in a page on accessibility (here is the one for forall x). Screen reader users are accustomed to changing the settings of their preferred software to fix things, and you can help by letting them know what to watch for. A good screen reader can replace text with other text that produces better pronunciation. E.g., depending on the voice synthesizer, the letter “A” in a formula might be pronounced as a schwa (i.e., like “uh”). MathJax tells the screen reader to read the symbol “A” as “upper A”, and the user can replace this everywhere with “upper Eh”.
Links
- Accessible Mathematics
- Converting LaTeX to HTML: technical notes
- Teaching logic to blind students
- Sample output of forall x with screen readers:
Another LaTeX to MathML option that I have had a good experience with is temml — it works in the browser or server side with Node.js.
https://temml.org/
Hi Richard,
thanks for the good summary. Yes, accessible research is hard. At arXiv, we are dealing with this on a daily basis, trying to provide all of our corpus in accessible HTML.
We organized the arXiv Accessibility Forum 2024 https://accessibility2024.arxiv.org/ last year, which has some good comparisons between PDF and HTML regarding screen readers, and much more.
PDF/UA-2 is also available, and it will hopefully make PDFs accessible too (in a real sense, not like the current state), but producing PDF/UA-2 files is not yet a “no-brainer”. After our current project at arXiv is finished, I want to look into generating PDF/UA-2 for all new submissions, too.
BTW, Deyan Ginev, one of the maintainers of LaTeXML, is now also part of the arXiv team, so we hopefully can improve the conversion quality even more.
@rzach TeXmacs (www.texmacs.org) can also output to HTML. Despite the name it does not depend on TeX, and has its own format and typesetter.