| Word 2007 Math
(Updated 06-22-07)
Last year we reported to eXtyles® customers that Word 2007's
new equation editor did not support MathML. We have since
learned that this is not correct. Several recent blog
postings have shed new light on the situation. By Microsoft
employees:
http://blogs.msdn.com/brian_jones/archive/2006/08/16/700494.aspx
http://blogs.msdn.com/murrays/archive/2007/03/16/math-find-replace-and-rich-text-searches.aspx
http://blogs.msdn.com/murrays/archive/2007/06/05/science-and-nature-have-difficulties-with-word-2007-mathematics.aspx
By non-Microsoft employees:
http://dpcarlisle.blogspot.com/2007/04/xhtml-and-mathml-from-office-20007.html
http://www.robweir.com/blog/2007/04/math-markup-marked-down.html
Word 2007 can convert equations to and from MathML via the
clipboard, although this feature is not turned on by default,
and a recent Google search shows that the control to turn it
on/off is virtually undocumented by Microsoft. To turn this
support on in Word 2007, select the Insert Ribbon and add an
equation (using the new Word 2007 equation editor). In the
Design Ribbon that appears, click the down arrow to the right
of "Tools" to open the Equation Options dialog, and there
select the option "Copy MathML to the Clipboard as plain
text".
The transformation that allows you to copy/paste equations
via MathML is driven by two XSLT scripts (omml2mml.xsl and
mml2omml.xsl). These scripts can be used outside of Word if
you are reading or manipulating DOCX XML files directly.
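
As a rough illustration of reading DOCX XML directly, the following Python sketch pulls the new-format equations out of a DOCX file so they could be handed to an XSLT processor together with omml2mml.xsl. It assumes the main document part is stored at word/document.xml inside the DOCX zip package and that equations appear as oMath elements in the Office Math namespace. The function name extract_omml_equations is our own, chosen for illustration. The sketch uses only the Python standard library, which has no XSLT engine, so the transform step itself is not shown.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the Office Math (OMML) elements used by the new equation editor.
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def extract_omml_equations(docx_file):
    """Return every oMath element in the main document part as an XML string.

    docx_file may be a file path or a binary file-like object.
    The name of this helper is hypothetical; it is not part of Word or eXtyles.
    """
    # A DOCX file is a zip package; the body text is assumed to be in
    # word/document.xml.
    with zipfile.ZipFile(docx_file) as package:
        document_xml = package.read("word/document.xml")

    root = ET.fromstring(document_xml)

    # iter() walks the whole tree, so equations nested inside paragraphs
    # or oMathPara wrappers are found as well.
    return [
        ET.tostring(equation, encoding="unicode")
        for equation in root.iter(f"{{{OMML_NS}}}oMath")
    ]
```

An empty result suggests the file contains no equations from the new equation editor. Each returned string is a standalone XML fragment that an XSLT processor could take as input.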
So if Word 2007 supports MathML, then what's all the
fuss about? We at Inera still see several problems:
1. If a Word 2007 file is opened in an earlier
version of Word, the equations from the new equation editor
are converted to graphics.
This point is a key problem. Most publishers are not yet
ready to upgrade to Word 2007, and if they edit documents
in earlier versions of Word, the equations are no longer
editable and must be re-keyed for the remainder of the
publication workflow (typesetting and/or XML production).
2. Scholarly publishers aren't ready to upgrade to Word
2007.
Most publishers are a long way from upgrading to Word 2007.
There are two main reasons for this delay. First, other
systems in the publication workflow, most notably online
submission/peer review systems and editorial tools, are not
yet compatible with DOCX files. If the surrounding
infrastructure doesn't support DOCX, there's no impetus to
switch internally. Second, the user interface is very
different from previous versions of Word, and this change
brings transition issues, especially with editors who may balk
at the degree of change and see Word as a tool to do their
work, not a tool to relearn with each new release.
3. We have concerns about the Microsoft MathML transform.
Even if all parties in the workflow change to Word 2007,
we still have concerns about the Microsoft MathML transform.
At the micro level, we have talked with several XML
professionals who have tested this transform, and all noted
bugs within an hour of testing. So we do not believe that the
transform is "ready for prime time." At the macro level, we at
Inera developed a transform many years ago from a (different)
linear format to MathML. The transform, after much engineering
work, was not flawless, and we discovered that certain linear
constructs could not be unambiguously converted to correct
MathML without human intervention. So we have concerns with
the degree to which this Microsoft conversion can be made
robust. As an additional data point, Design Science, the maker
of MathType, which has been in the business of math much
longer than Microsoft, is still tuning its MathML conversion
to this day because it's just not a simple task.
So where does all this leave us? We expect that it is
inevitable that publishers will have to accept DOCX files,
and publishers must prepare for this day sooner rather than
later. However, until internal systems, especially in
editorial, have been switched to Word 2007, we are strongly
recommending that publishers discourage use of the new Word
2007 equation editor by authors. Instead, they should
recommend that authors continue to use the legacy equation
editor, which can be accessed via Insert Object on the
Insert Ribbon. Only when publishers have fully converted
internally to Word 2007 will it be reasonable to consider
accepting DOCX files with the new equation format. However,
even in this case, we believe that the transform to MathML
requires further testing and tuning before it can be used
in production. For this reason, we recommend updating
instructions to authors to advise use of the legacy equation
editor for the foreseeable future.
And where does eXtyles fit in with all
of this? As of June 1, 2007, eXtyles is compatible with Word
2007, excepting equations inserted with the new equation
editor. We have a strategy for this last part that we expect
to implement over the summer, and hope to report by September
that eXtyles is fully compatible with Word 2007. However, this
strategy is dependent on Word 2007's MathML transforms, and
so it will only be as robust as the transforms themselves.
Note: Design Science, which produces MathType, has posted
a press release about Word 2007, equations, and scientific
journal submissions. This press release includes a link to instructions for adding
a button for the legacy equation editor to the Word 2007
quick-access toolbar.
Information about the Word 2007 DOCX file format is available
here.
|