Standardizing Standards 7: Multilingual Standards Publishing with XML

Transcript

Hello, my name’s Antti Saari, I’m from the Finnish Standards Association SFS, here to tell you about the way we do multilingual standards publishing with STS XML.

The Finnish Standards Association is a national standards body in Finland, one of three. We do most of our standardization work in ISO and CEN, and work with the other national standards bodies in Finland to also publish and sell national adoptions and translations from CENELEC and IEC.

We also do some Finnish national standards—those are not a big priority for us, but we do publish a few and they are included in the numbers that I’ll show you in our later slides.

Since the topic of this presentation is multilingual standards, I think that we should first define what we mean by a multilingual standard.

In our context, most of what we publish is adoptions of CEN and ISO standards. We publish those in English, the original English text of the standard. We are bound by CEN rules to publish everything or at least adopt everything that they publish, so that accounts for something around a thousand individual standards per year.

Now, we look through those standards and try to find the ones that are particularly interesting to Finnish industry, Finnish stakeholders, and provide a Finnish translation of those CEN and ISO standards.

The Finnish translation that we do does not have the same status as the original text of the standard. We have this disclaimer on the cover page of every translation that we publish, saying that in case there’s anything different about our translation, or if the translation doesn’t seem to match with the original, the original text is the one that applies—follow the original.

So, the customers who buy the translation, we have to provide them with the original text as well. What we call a multilingual standard consists of the Finnish translation and the English original standard in one product, or one PDF right now.

And it has to be made in a way that makes it easy for the user to read through the text in one language and then cross-check with the original text, and then get back to the translation and continue reading.

In terms of numbers, these are from last year, we published around 250 of this kind of multilingual standard. The top row says 268, but like I mentioned before, that includes our national standards as well, so that’s a handful, a dozen, maybe a bit more. There are about 250 translations for multilingual publications.

Of those, we did a bit over 200 in STS XML using a workflow including eXtyles and Typefi, and the rest were done in our old workflow that I’ll cover in the next slide.

The reason we’re not doing everything with STS XML is because sometimes the original standard, the English original coming from ISO, CEN, CENELEC, whoever, is not made in STS XML. And if it’s not made in STS XML then well, we’re not going to do it either.

Our idea, when we’re making these multilingual standards, is to use the English XML we get from the international organization as is, and then just add our translation to it. So if we don’t get the English XML, then we just don’t produce an XML publication from it.

The percentage of XML to non-XML is roughly similar to what CEN itself is doing. CEN manages to publish XML with over 80 percent of their publications and we’re in the same ball park.

Now, we’ve been doing this kind of multilingual standard since way before STS XML was a thing. I’m not sure about the exact year we started, but way before my time.

The way we used to do those, and the way we still do anything that we can’t get in XML, is with Adobe FrameMaker and a custom plug-in that we have for that.

I think the main reason this is interesting is that the workflow we had for FrameMaker was, and is, very similar to what we are doing now with eXtyles and Typefi.

So we had a Word file from CEN and ISO. Since we are a member we get all of their output—we get their PDF files, their Word files, and their XML files if they produce them.

We take the Word file, and have it translated so we get another Word file. Someone would go through those Word files and style everything according to our specification, using just Word’s built-in tools.

Then we’d have a plug-in that would read through the Word file, convert it to XML, apply structure, and our editors could work through it in FrameMaker.

The reason we were using FrameMaker is because it allows you to have multiple text flows, among other things. When we started doing standards like this, we were mostly thinking about people reading standards as a physical book where you have actual binding and you have to turn a page, and stuff like that.

So, the idea was, in that book we’d have always the left hand side page in Finnish, and the right hand side page would be in English. FrameMaker lets you do that pretty easily, and our plug-in made it easier still.

The only thing our editors really had to do was to make sure that the standard would proceed in both languages at the same rate. So normally Finnish takes a bit more space than the English does, and they would have to make the Finnish take up less space by using layout tricks like reducing the spacing between paragraphs or between words, or even between letters.

Now, when we moved to our current publishing process with STS XML using eXtyles and Typefi, we also wanted to rethink the end product we have since years have passed and those book standards are not so important anymore.

More people are reading standards on computer screens or even tablets, and if you’re reading a standard on an iPad you can’t really fit more than one page on the screen at a time.

So if you had that kind of book layout we had before, that would mean that every other page is in Finnish and every other page is in English, and that would be really annoying to read if you could only see one page at a time and you were trying to read through it in one language.

So, when we moved to eXtyles and Typefi we just wanted to rethink that.

Also, the kind of adjustment our editors used to make to compress the space the Finnish text takes is kind of hard to automate. At least, that’s how we felt back in 2014.

I’ve heard now that some of Typefi’s customers are doing the same kind of thing, using Typefi automation to basically make that kind of side by side layout automatically, but in 2014 it seemed difficult for us.

So, the way we’re publishing standards now with eXtyles and Typefi, we have one process that we use for everything that we publish, including our national stuff, and standards from CEN and ISO.

The way the process works—well first of all someone has to decide that ok, we want to translate the standard. So, we send the translator a Word file. We’re still dealing with Word files for the translation part.

And they just type their translation on top of the Word file, or if they have access to special software like Trados, they might import the Word file into Trados and re-export it back into the same structure.

As it happens, the translated Word file we receive will be already styled in the exact same styles that we need for eXtyles to be able to export XML from it. That is because the Word file we send to the translator has already been treated by CEN or ISO, and their editors.

And while it’s possible that the translator might make a bit of a mess with some of Word’s built-in styles, or do something unexpected, I’d say 90 percent of the styling work that eXtyles is used for in this process has already been done in the translated file. We really only have to do some minor cleanup at this point.

The reason we are using Word files for translation, even though XML is available, is that the dedicated software that you’d need to do XML translations is kind of expensive. Not all of our translators have access to that software, so that’s an issue.

Also, if you know there’s a mistake in the translation, and the translation only exists in XML, then the only person who can fix it is the one who has access to the translation software and can re-export the XML.

The only alternative is really to manually fix the XML file and we are trying to avoid that—that’s not something our editors are generally comfortable doing.

So, the part where Typefi comes into this process is where we already have two XML files. The first one is the one we’ve produced in-house—it has all of the translated content of the standard in Finnish, and all of our metadata is in there.

And then there’s the second file. It’s the same XML file as ISO or CEN created and let their members download. It’s the same file, we do not add anything to that file.

We feed those two files to Typefi, and it merges the contents in the order that we want them to be.

So there’s our cover page, then it gets the cover page from CEN coming from the CEN XML file, then comes all the Finnish contents from our XML file, followed by the English contents from the CEN XML file.

Something like that. There are some variations to this, but that’s the general idea.
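The ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Typefi does it internally: the element names (`standard`, `cover`, `body`, `sec`), the `lang` and `source-lang` attributes, and the function name are all made up for the example and are not real STS markup.

```python
import xml.etree.ElementTree as ET

# Tiny stand-ins for the two input files; element names are illustrative only.
FINNISH_XML = """<standard lang="fi">
  <cover>SFS cover page</cover>
  <body><sec id="sec_1"><title>Soveltamisala</title></sec></body>
</standard>"""

ENGLISH_XML = """<standard lang="en">
  <cover>CEN cover page</cover>
  <body><sec id="sec_1"><title>Scope</title></sec></body>
</standard>"""


def merge_in_reading_order(finnish_xml, english_xml):
    """Build one tree: our cover, original cover, Finnish body, English body."""
    fi = ET.fromstring(finnish_xml)
    en = ET.fromstring(english_xml)
    merged = ET.Element("publication")
    # The tuple order below is the reading order of the final product.
    for source, tag in ((fi, "cover"), (en, "cover"), (fi, "body"), (en, "body")):
        part = source.find(tag)
        if part is not None:
            # Remember which input file each part came from.
            part.set("source-lang", source.get("lang", ""))
            merged.append(part)
    return merged


merged = merge_in_reading_order(FINNISH_XML, ENGLISH_XML)
for part in merged:
    print(part.get("source-lang"), part.tag)
```

The English input is read but never edited at the source, which matches the point that the file from CEN or ISO is used as is.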

While Typefi is doing this, every time it sees a section heading it says ok, this section heading came from the Finnish XML file. So Typefi will create a cross-reference to the section heading with the exact same ID attribute value that comes from the English XML file.

Typefi doesn’t check whether that kind of section really exists, it’s just assumed that there will be a corresponding section with this exact same ID attribute. The link is created, and we ensure that there will be a target for that link.
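As a rough sketch of that linking rule, assuming a simplified `sec` element with an `id` attribute: every section gets an anchor in its own language and a link that points to the same ID in the other language, with no lookup in the other file. The `anchor` and `link-to` attributes and the `fi:` / `en:` prefixes are invented for this example; they are not Typefi or STS features.

```python
import xml.etree.ElementTree as ET

FINNISH_BODY = """<body>
  <sec id="sec_foreword"><title>Esipuhe</title></sec>
  <sec id="sec_1"><title>Soveltamisala</title></sec>
</body>"""


def add_language_links(root, lang, other_lang):
    """Give each section an anchor and a link to the same ID in the other language."""
    linked = 0
    for sec in root.iter("sec"):
        sec_id = sec.get("id")
        if sec_id is None:
            continue  # nothing to link without an ID
        sec.set("anchor", f"{lang}:{sec_id}")
        # No check against the other file: the target is simply assumed to exist.
        sec.set("link-to", f"{other_lang}:{sec_id}")
        linked += 1
    return linked


fi_body = ET.fromstring(FINNISH_BODY)
linked_count = add_language_links(fi_body, "fi", "en")
for sec in fi_body.iter("sec"):
    print(sec.get("anchor"), "->", sec.get("link-to"))
```

Running the same function on the English body with the languages swapped would give the links back to the Finnish text.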

At this point I’ll just quickly show what a PDF file like this looks like.

You’re looking at an SFS cover page of an ISO standard that has been adopted by CEN and then again adopted and translated by us.

So really, this actually has contents from three XML files, one from us, one from CEN, and one from ISO. The general idea is the same.

Now, the first actual section in the standard is the foreword coming from CEN. And right next to the section heading you see a hyperlink. When I click the hyperlink it takes me to the same section in English.

I can scroll through the standard in English, I get the terms and definitions, and there’s a cross-reference to the terms and definitions section in Finnish—so I could do it like this, and now I’m at terms and definitions in Finnish.

That’s basically all there is to it. It’s stupidly simple, even. But it’s kind of convenient I think, it’s an easy way to cross-reference between languages, and you could apply the same idea to online platforms as well.

We’re doing it with PDF files for now, but it’s the same kind of idea behind how you create links and allow the customer to read through the standard in different languages and check the other language.

If that’s the kind of thing you want to do, this is a surprisingly easy way to do it.

Now, being able to do it like this is not a given, I have to stress that. It’s easy for us because when we started our project with eXtyles and Typefi, we copied everything that ISO had done.

ISO had already started a project, they were publishing everything in XML by that time.

CEN had already started as well. They weren’t quite finished in the sense that they were still developing parts of their project, but all the eXtyles templates and customizations were already in place. So we could just march in and get all their stuff.

We wanted our stuff to be as similar as possible to what they were doing, so we just told eXtyles and Typefi, don’t make any customizations, just give us what they’re doing and we’ll work with that.

What that leads to is that when we have sections, all sections have an ID attribute. That’s something that ISO and CEN have decided, and that’s something that we’ve decided to follow them on, so it’s easy for us to always assume that ok, we have a section heading, we can create this kind of hyperlink, because the ID attribute will be there.

Similarly, the ID attribute values need to match because, like I said, we are not actually trying to look for a matching section in the other language, we’re just assuming that a section with the exact same ID will be there. So that means that the section ID schemes must be the same in the translation and in the original text.
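Because the links are built on that assumption alone, a quick consistency check between the two files can be useful. The following is a minimal sketch under the assumption that sections are `sec` elements with an `id` attribute; the function names, sample data, and the `sec_na` ID are hypothetical.

```python
import xml.etree.ElementTree as ET


def section_ids(xml_text):
    """Collect the ID attribute values of all <sec> elements in a document."""
    root = ET.fromstring(xml_text)
    return {sec.get("id") for sec in root.iter("sec") if sec.get("id") is not None}


def unmatched_ids(translation_xml, original_xml):
    """Return translation section IDs that have no counterpart in the original."""
    return sorted(section_ids(translation_xml) - section_ids(original_xml))


ORIGINAL = '<body><sec id="sec_1"/><sec id="sec_2"/></body>'
# The translation carries one extra, hypothetical section that the original lacks.
TRANSLATION = '<body><sec id="sec_1"/><sec id="sec_2"/><sec id="sec_na"/></body>'

dangling = unmatched_ids(TRANSLATION, ORIGINAL)
print(dangling)
```

Any ID reported here would produce a link with no target, so it needs either a matching section in the other file or special handling.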

I’m bringing this up because of the official NISO STS documentation for the ID attribute. I have an example here from the documentation, and it shows an ID attribute with the value s6.7.1.5.

I suppose that’s a good way to identify your section 6.7.1.5 but that is different from what ISO is actually doing, and what CEN is doing, and what we’re doing.

So, if someone were to look at the NISO STS documentation and decide that ok, this is how the section IDs work in the examples, we’re going to do exactly that, it would make sense for them.

But if we wanted to produce a translation and a bilingual standard using that XML from that organization, we would be in trouble, because our ID attribute schemes do not match. Well, it wouldn’t be hard, but we would have to create some kind of mapping to be able to match them.
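Such a mapping could be as small as the sketch below. The `s6.7.1.5` form comes from the documentation example mentioned above; the target form `sec_6.7.1.5` and the function name are assumptions made up for this illustration and should not be read as the actual ISO, CEN, or SFS scheme.

```python
import re

# Matches IDs like "s6" or "s6.7.1.5": the letter s followed by a dotted number.
_DOC_STYLE_ID = re.compile(r"^s(\d+(?:\.\d+)*)$")


def to_house_id(supplier_id):
    """Map a documentation-style ID to a hypothetical house-style ID."""
    match = _DOC_STYLE_ID.match(supplier_id)
    if match is None:
        # Not in the expected form: leave it alone rather than guess.
        return supplier_id
    return "sec_" + match.group(1)


print(to_house_id("s6.7.1.5"))
print(to_house_id("sec_foreword"))
```

IDs that do not fit the expected pattern are passed through unchanged, so unnumbered sections such as a foreword would still need their own mapping entries.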

All in all, these are not huge issues. We have some custom sections there to handle national content and we’ve created workarounds for them, but this is a pretty easy thing to do.

Now, how has this all worked out for us? This shows how many pages per year we’re publishing. Our production of STS XML started in full in the middle of 2015, and in 2016 we were using STS XML / eXtyles / Typefi as our main publishing process.

The page count in those two years was pretty ok, normal. The amount of pages isn’t really limited by what we can publish, it’s more limited by what our translators are able to produce, so if they produce 17,000 pages of translations then that’s what we publish. No more, and no less.

The thing to take away from this is that the page count is normal, it’s the same as it was before.

Which takes me to the next slide. On top, the line shows how many days it took from when a draft translation was first submitted to our publishing department, to when there was a ready translation, ready to be sold to our customers.

You can see a fairly dramatic drop in there, almost a month compared to 2015. But in 2015 we had a lot of issues because, you know, we were starting this whole new publication process that took a lot of resources.

But even compared to the previous years when we were still completely using the old system, we’ve managed to decrease the amount of time it takes to create a publication by quite a lot.

And, furthermore, we’re doing it with a lot less effort required. The bars or columns on the bottom show how many person-years we’ve worked in our publishing department per year. So it’s not just creating those translations, it’s everything that our publishing department does, including creating advertisement materials and publishing the thousand adoptions that we do, all that stuff.

The amount of work, you can see, has gone down a lot, by what, one and a half person-years compared to when we were only using the old method.

Compare it to 2014 when we produced a very comparable amount of pages, also 17,000, but we had over eight people working to create those pages.

So we’re doing a lot of pages, we’re doing it faster than before, and we’re doing it with less work required than before.

That’s all from me, thank you for listening.