| Word 2007 Scholarly Publishing Update
(Updated 04-07-08)
Last June, we wrote about our
concerns regarding Word 2007 in scholarly publishing workflows.
As the New Year begins, it is a good time to revisit this blog.
Scholarly Publishing Status
In late July 2007, a group of scholarly publishers and vendors
met with Microsoft to discuss these issues (see meeting summary
by Howard Ratner).
In response, Microsoft set up a web site in December 2007
with information for scholarly authors, publishers, and the
vendors that serve publishers.
Today, most scholarly publishers are not ready to accept DOCX
files, and several (e.g., Science, Nature) have
explicitly asked authors to avoid using Word 2007 or to use it
in compatibility mode. Many of the specialized applications used
by publishers are not yet fully compatible with Office 2007. For
example, most online submission systems are not fully
compatible with Office 2007 as of this writing,
and so authors find they must revert to DOC format for manuscript
submission to journals. Inera expects most of the specialized applications to
become fully compatible during 2008.
In an unscientific survey of its customers, Inera has found few
DOCX files have actually been submitted to their journals. Inera
believes this small number of DOCX submissions may be attributable
to more than submission-system incompatibility. The collaborative
nature of scholarly research may contribute to this dearth of DOCX
files. When a researcher emails a DOCX file for review, a recipient
who lacks Word 2007 is likely to report being unable to open it,
especially if their institution does not grant the administrator
privileges needed to install the Word 2003 compatibility pack. After
such failed attempts to share documents, authors may set Word 2007 to
save in DOC rather than DOCX format by default, so a significant number
of Word 2007 users may not be using the new format at all. For these
reasons, Inera believes that DOCX files will not be widely used by
researchers before 2009.
eXtyles® and Word 2007 Equation Builder
Inera completed eXtyles compatibility for Word 2007 last October.
However, during testing, our engineering team encountered two significant
problems with Word's new Equation Builder feature.
1. Word 2007 has a frequently occurring bug in which some
characters in Equation Builder equations become corrupt when DOCX files
are saved in RTF format (part of the eXtyles process). For example, note
the equation in the following excerpt:
[Image: excerpt showing the original Equation Builder equation]
and its appearance after the file is saved to RTF:
[Image: the same excerpt after saving to RTF]
The "W" changes to "r". In broader testing, a more
common occurrence is for Chinese characters to appear, such as when
the "n" in this line:
[Image: line containing the original equation with "n"]
is saved to RTF:
[Image: the same line after saving to RTF]
On March 18, 2008, Microsoft resolved this problem via a
hotfix.
Inera advises all organizations that use RTF as part of their
workflow when handling DOCX files to obtain this patch from Microsoft.
2. The transform provided in Word 2007 to convert Equation Builder
math to MathML (OMML2MML.XSL) has bugs. For example, the expression:
[Image: a summation sign with "i,j" as its lower limit]
is incorrectly converted by Word 2007 to this MathML:
<munder>
<mo>∑</mo>
<mi>i</mi><mo>,</mo><mi>j</mi>
</munder>
<mrow>
</mrow>
The correct MathML, in which the lower limit is grouped in an <mrow>
so that <munder> has exactly two children, is:
<munder>
<mo>∑</mo>
<mrow>
<mi>i</mi><mo>,</mo><mi>j</mi>
</mrow>
</munder>
Though Inera has not conducted exhaustive testing of the OMML-to-MathML
transform, we remain concerned that several problems were encountered in
testing fewer than 100 equations.
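Errors of the kind shown above, where a layout element ends up with the
wrong number of children, can be caught mechanically. The following is a
minimal sketch in Python, not part of eXtyles or of Microsoft's transform.
The function names and the table of required child counts are our own
illustrative choices, and the sketch assumes the MathML arrives as a
well-formed fragment without an XML declaration or undeclared prefixes.

```python
import xml.etree.ElementTree as ET

# Number of children that MathML requires for some layout elements.
# This table is illustrative and deliberately incomplete.
REQUIRED_CHILDREN = {
    "mfrac": 2, "msub": 2, "msup": 2, "munder": 2, "mover": 2,
    "msubsup": 3, "munderover": 3,
}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def find_arity_errors(mathml_fragment):
    """Return (element, actual_children, required_children) for each violation."""
    # Wrap the fragment so that several top-level elements still parse.
    root = ET.fromstring("<wrapper>" + mathml_fragment + "</wrapper>")
    errors = []
    for element in root.iter():
        name = local_name(element.tag)
        required = REQUIRED_CHILDREN.get(name)
        if required is not None and len(element) != required:
            errors.append((name, len(element), required))
    return errors


# The two fragments from the example above.
word_output = (
    "<munder><mo>∑</mo><mi>i</mi><mo>,</mo><mi>j</mi></munder><mrow></mrow>"
)
corrected = (
    "<munder><mo>∑</mo><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></munder>"
)

print(find_arity_errors(word_output))  # [('munder', 4, 2)]
print(find_arity_errors(corrected))    # []
```

A check like this only detects structural errors. It cannot tell whether
structurally valid MathML still represents the equation the author wrote.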
Inera has found that there are no differences among the transforms
shipped with the beta, release, and SP1 versions of Word 2007. We believe
that Microsoft's transform has not been adequately tested and debugged.
This problem stems in part from lack of use. Many users have not discovered
the transform because it is undocumented by Microsoft (a search in Word
2007 SP1 Help for "MathML" yields no results).
It is likely that many other problems will need to be found and fixed
before the OMML-to-MathML transform is sufficiently robust for use
by scholarly publishers. Microsoft is working on an update to the
transforms, but we do not have any information about a release plan.
Summary
As of January 2008, most scholarly publishers are not ready to accept
DOCX files. Inera expects that most third-party applications essential
for scholarly publishers will be compatible with Office 2007 by the end
of 2008. Beyond the lack of fully compatible systems, most publishers we
have (unscientifically) surveyed have not upgraded their editorial and
production staff to Office 2007. Inera expects such upgrades to start
slowly in 2008 but not to accelerate before 2009.
For 2008, Inera recommends that publishers continue to develop and
test their workflows for handling DOCX files and postpone accepting such
files from authors until all systems are fully tested.
Inera recommends that publishers who handle content with any significant
amount of math decline to accept files that use Microsoft's Equation Builder
until the two specific issues listed above are resolved by Microsoft and
fully tested by publishers.
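Publishers who follow this recommendation need a way to tell whether a
submitted DOCX file contains Equation Builder math. One possible check is
sketched below in Python. It is an illustration under stated assumptions,
not a tested production tool: it assumes the main document part is stored
at its conventional location, word/document.xml, and it looks only at that
part, so math in footnotes, headers, or text boxes stored elsewhere would be
missed. The function names and the sample-building helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespaces defined by the Office Open XML file format.
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Equation Builder math is stored in these OMML elements.
MATH_TAGS = {"{%s}oMath" % OMML_NS, "{%s}oMathPara" % OMML_NS}


def docx_uses_equation_builder(docx_file):
    """Return True if word/document.xml contains an OMML math element.

    docx_file may be a path or a binary file-like object.
    Assumption: the main document part is at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as archive:
        with archive.open("word/document.xml") as part:
            # Stream the XML so that large manuscripts need not fit in memory.
            for _event, element in ET.iterparse(part):
                if element.tag in MATH_TAGS:
                    return True
    return False


def build_sample_docx(body_xml):
    """Hypothetical helper: build a tiny in-memory archive for demonstration.

    The result holds only the main document part and is not a complete,
    Word-openable DOCX file.
    """
    document = (
        '<w:document xmlns:w="%s" xmlns:m="%s"><w:body>%s</w:body></w:document>'
        % (WORD_NS, OMML_NS, body_xml)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
    buffer.seek(0)
    return buffer


with_math = build_sample_docx(
    "<w:p><m:oMath><m:r><m:t>x</m:t></m:r></m:oMath></w:p>"
)
text_only = build_sample_docx("<w:p><w:r><w:t>No math here</w:t></w:r></w:p>")

print(docx_uses_equation_builder(with_math))  # True
print(docx_uses_equation_builder(text_only))  # False
```

Equations created with the older Equation Editor or with MathType are
embedded as objects rather than as OMML, so this check would not flag them.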
Additional information about Word 2007 math is available
here.
Additional information about the Word 2007 DOCX file format is available
here.
|