Creating semantic structure using paragraph styles, part 3: Challenges and pain points

In our long experience of helping editorial and production staff transition to using eXtyles, we’ve observed that making the mental leap from formatting Word content to applying semantic structure to Word content is often one of the most challenging aspects of adopting an XML-driven workflow.

What are some differences between formatting and semantically structuring documents? Why is this challenging transition so important to make? What are some benefits of a semantic mindset? In this 3-part series of posts, we discuss these questions.

Are you sitting comfortably? Then let’s begin!

What’s difficult about switching?

Over many years of helping customers with the transition from a formatting-based workflow that relies on human interpretation to a structure-based workflow, we’ve observed some common threads.

“Why do I have to do all this extra work?”

Changing from “making it look right” to “identifying what it is” can feel like more work for editorial staff, who now need to make semantic distinctions explicit rather than relying on typesetters, designers, or production staff (“the typesetters know what to do!”).

While this is a valid concern, remember that the work hasn’t increased, just shifted around! At some point in every publishing workflow, someone must identify the elements of the content and assign the appropriate styles or tags.

Remember, too, that shifting the work of document structuring upstream means assigning it to the people most qualified to make decisions about what things are—copyeditors and authors. This shift makes it less likely that structural errors (e.g., assigning the wrong level to a heading) will make it all the way to proofs, and it makes errors easier to correct, since they will be detected earlier in the process.

“All of these things look the same, so why do they need different styles/tags?”

Many text designs use essentially the same formatting for front-matter headings (e.g., Abstract), level 1 headings (e.g., Introduction, Conclusion), and back-matter headings (e.g., References, Acknowledgements), so changing your practice to distinguish all these types of headings when applying styles to a document can be frustrating. Similarly, if your design uses the same formatting for authors’ affiliations, correspondence information, disclaimers, acknowledgements, and other author-related text, it’s tempting to use the same style for all of them. After all, no matter which style you choose, it will all come out looking the way it should in the published article … right?

Well … yes and no. Why is this kind of distinction important? As a human reader, you can look at several chunks of text that share the same format but occur in different contexts and contain different words, and recognize what makes them different. A machine reading the structure of your content can’t rely on those contextual cues; it needs explicit ones, such as distinct paragraph styles or XML tags.

This fact about how machines read content doesn’t just affect how typesetting software lays out your content! It also affects

  • The online discoverability of your content (by correctly designating metadata elements and distinguishing front matter from body text or back matter, for example)
  • The accessibility of your content to readers with print disabilities (by making it possible for screen readers to identify the parts of an article correctly)
  • The possibilities for repurposing your content in the future
  • The ability to use a highly structured single source (such as an XML file) to easily output multiple publication formats (PDF, HTML, EPUB), each with its own style sheets and formatting rules for various document elements

“But our typesetters/vendors already know what to do with our content!”

That’s great! But …

  • What happens when you switch vendors, temporarily or long term? Or when you decide to adopt a new workflow, such as single-source publishing from XML?
  • Are typesetters in the best position to identify and mark up the structure of your content? Or would it make more sense for these decisions to be made by editorial staff who are engaging closely with the text?
  • When someone makes the wrong decision about a content element, how much time, effort, and cost is involved in fixing the problem (assuming the problem is caught, which it might not be)?

It’s also worth considering that if you’re looking to increase automation and lower the costs of your typesetting, using semantic markup to clearly identify what things are in your content is a key first step!

“Okay, you’ve made your argument for how this helps the overall workflow. But what’s in it for me?”

Adopting a workflow based on semantic markup has big-picture benefits, but that’s not all! Using eXtyles to apply meaningful paragraph styles to your content in Word helps you do more, even before you export XML. That’s because eXtyles leverages document structuring to enforce editorial styles and check the accuracy and completeness of your documents. For example, because it knows what’s what in your document thanks to paragraph styles, eXtyles can

  • apply editorial rules to certain document elements (e.g., delete punctuation at the ends of headings; enforce your preferred format for figure or table titles)
  • automatically copyedit journal and book references according to your editorial style
  • search the PubMed and Crossref databases to find DOI and PMID links
  • warn you if a reference, figure, or table is not cited in the text, if a callout is out of order, or if an in-text citation has no matching reference
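To make the last point concrete, here is a minimal Python sketch of one such check: finding references that are never cited and citations that have no matching reference. It is not how eXtyles works internally. The style names "Body Text" and "Reference", the numbered-bracket citation format, and the function check_citations are all hypothetical assumptions chosen for illustration. The point is that the check only works because each paragraph’s style says what it is; if body text and references shared one style, the script couldn’t tell them apart.

```python
import re

# Hypothetical paragraph style names (real style names depend on your setup).
BODY_STYLE = "Body Text"
REF_STYLE = "Reference"

# Assumed citation format: numbered brackets such as "[3]" or "[2, 5]".
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def check_citations(paragraphs):
    """Return (uncited reference numbers, citations with no matching reference).

    `paragraphs` is a list of (style_name, text) pairs in document order.
    References are assumed to be numbered by their order among the
    paragraphs styled REF_STYLE. Only paragraphs styled BODY_STYLE are
    searched for citations; all other styles are ignored.
    """
    cited = set()
    ref_count = 0
    for style, text in paragraphs:
        if style == BODY_STYLE:
            # Collect every number inside every bracketed citation group.
            for group in CITATION.findall(text):
                cited.update(int(n) for n in group.split(","))
        elif style == REF_STYLE:
            ref_count += 1
    references = set(range(1, ref_count + 1))
    return sorted(references - cited), sorted(cited - references)


doc = [
    ("Heading 1", "Introduction"),
    ("Body Text", "Earlier work [1] and later studies [1, 4] disagree."),
    ("Reference", "Smith J. ..."),
    ("Reference", "Jones K. ..."),
    ("Reference", "Lee M. ..."),
]
uncited, unmatched = check_citations(doc)
print(uncited, unmatched)
```

Here references 2 and 3 are flagged as uncited, and citation 4 is flagged because only three paragraphs are styled as references.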

Wrapping it all up

Why should you switch from formatting manuscript files to using styles to structure them? Here’s the short version:

  • It can help you adopt an XML-first or XML-early workflow.
  • It will make your post-XML workflow (e.g., typesetting) less prone to error and will make errors easier to spot and resolve.
  • It removes guesswork and reduces the potential for errors in typesetting.
  • It will help you meet your accessibility goals.
  • It can be leveraged to automate key editorial tasks with eXtyles.
  • It’s a kindness to your future self, and to future users of your content!