A deep dive on DOIs

eXtyles has been linking to Crossref for a very long time—DOI linking and correction is a core element of our reference processing. Now we’ve extended our Crossref linking to not only better handle data citations but also improve how eXtyles verifies DOIs for all reference types.

Remind me how eXtyles DOI linking works?

Until now, eXtyles has found and added DOI links by first parsing unstyled references into component pieces, then performing a metadata lookup on Crossref using all available metadata except an author-provided DOI. For example, with a journal reference, eXtyles DOI linking queries Crossref based on the first author’s surname, article title, journal name, volume, first page, and year. For a conference reference, eXtyles often queries Crossref using only the first author’s name, paper title, and year.
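To make the "everything except the DOI" idea concrete, here is a minimal sketch of how a parsed reference could be turned into a lookup. It is an illustration, not eXtyles code: the field names and the `build_metadata_query` helper are hypothetical, and the use of Crossref's public REST API with its `query.bibliographic` parameter is our assumption for the example, since this post does not say which Crossref interface eXtyles calls.

```python
from urllib.parse import urlencode

# Hypothetical field names for a parsed reference; "doi" is deliberately not listed,
# so an author-supplied DOI never influences the lookup.
LOOKUP_FIELDS = (
    "first_author_surname",
    "title",
    "journal",
    "volume",
    "first_page",
    "year",
)


def build_metadata_query(ref: dict) -> str:
    """Build a Crossref REST API search URL from whatever metadata is available."""
    # Skip fields that are missing or empty (a conference reference may only
    # have the first author, title, and year).
    parts = [str(ref[name]) for name in LOOKUP_FIELDS if ref.get(name)]
    params = {"query.bibliographic": " ".join(parts), "rows": 1}
    return "https://api.crossref.org/works?" + urlencode(params)


# Example with made-up reference data:
journal_ref = {
    "first_author_surname": "Smith",
    "title": "A study of things",
    "journal": "Journal of Examples",
    "volume": "12",
    "first_page": "34",
    "year": 2019,
    "doi": "10.1234/possibly-wrong",  # present in the reference, ignored by the query
}
print(build_metadata_query(journal_ref))
```

Because missing fields are simply skipped, the same helper covers the sparser conference case described above.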

When the reference included an author-supplied DOI, eXtyles did one of three things:

  1. If the author’s DOI matched the DOI returned by the Crossref query, then eXtyles simply used the DOI returned by the Crossref query.
  2. If the author’s DOI did not match the DOI returned by the Crossref query, then eXtyles used the DOI returned by the Crossref query, and inserted a warning that the author’s DOI had been corrected.
  3. If no match was returned by Crossref, eXtyles silently ignored the DOI.
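The three cases above boil down to a small decision function. The sketch below is only an illustration of that logic: the `legacy_resolve` name, the return shape, and the warning text are hypothetical, and comparing DOIs case-insensitively is our assumption (DOI names are case-insensitive).

```python
from typing import Optional, Tuple


def legacy_resolve(
    author_doi: Optional[str], crossref_doi: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (doi_to_use, warning) following the three cases listed above."""
    if crossref_doi is None:
        # Case 3: no match from Crossref, so the author's DOI is ignored
        # without any warning.
        return None, None
    if author_doi is not None:
        # Assumption: compare after trimming whitespace and lowercasing.
        if author_doi.strip().lower() != crossref_doi.strip().lower():
            # Case 2: the Crossref DOI wins and the change is flagged.
            return crossref_doi, "Author-supplied DOI was corrected."
    # Case 1 (or no author DOI at all): use the Crossref DOI as returned.
    return crossref_doi, None
```

Note how the third case returns nothing at all: that silent path is exactly the gap the new process closes.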

This means that while eXtyles has been very good at finding and adding DOIs based on citation metadata (excluding the DOI), until now it hasn’t been so good at flagging problems with author-supplied DOIs.

Until a few years ago, this was usually a valid approach, not only because authors rarely provided DOIs but also because, in our experience, author-provided DOIs had a 20% error rate! Now that authors are increasingly citing online-only materials, such as journal articles published online ahead of print, preprints, and data sets, we see more and more author-provided DOIs in references, so it was high time to revisit our methodology.

So how does DOI linking work now?

We’ve shifted from relying exclusively on metadata queries to a multi-step process that also allows eXtyles to verify author-supplied DOIs. Here’s how it works:

If the author has provided a DOI, then eXtyles works through the following steps:

  1. eXtyles queries Crossref using available metadata (except the DOI), just as it always has. If Crossref returns a DOI, then eXtyles follows the logic described above. If not,
  2. eXtyles checks to see if the author-supplied DOI is malformed (e.g., missing the 10.XXXX prefix). If the DOI is malformed, then eXtyles adds a warning comment. If the DOI syntax is valid,
  3. eXtyles queries Crossref to see if the DOI is registered. If the return from Crossref is valid, then eXtyles turns the author-supplied DOI into a hyperlink. If Crossref indicates that the DOI is not registered with Crossref,
  4. eXtyles queries Crossref again to discover the registration agency for the DOI. If Crossref returns a registration agency, eXtyles adds a comment identifying the agency. If not,
  5. eXtyles adds a warning comment to indicate that the DOI is unknown, and that the author should be queried for a corrected DOI.
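The five steps can be sketched as a single fall-through function. This is a simplified illustration rather than the actual eXtyles implementation: the function name, the action labels, and the two callables standing in for the Crossref lookups are all hypothetical, and the syntax pattern (a `10.` prefix with 4 to 9 digits, a slash, and a non-empty suffix) is an assumed approximation of a well-formed DOI.

```python
import re
from typing import Callable, Optional, Tuple

# Assumed syntax check: "10." + 4-9 digit registrant code + "/" + non-empty suffix.
DOI_SYNTAX = re.compile(r"^10\.\d{4,9}/\S+$")


def check_author_doi(
    author_doi: str,
    metadata_doi: Optional[str],
    registered_with_crossref: Callable[[str], bool],
    find_agency: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (action, detail) for a reference that carries an author-supplied DOI."""
    doi = author_doi.strip()

    # Step 1: the metadata query found a DOI, so the earlier logic applies.
    if metadata_doi is not None:
        if doi.lower() == metadata_doi.strip().lower():
            return ("use_crossref_doi", metadata_doi)
        return ("corrected_with_warning", metadata_doi)

    # Step 2: flag a malformed DOI (for example, a missing 10.XXXX prefix).
    if not DOI_SYNTAX.match(doi):
        return ("warn_malformed", doi)

    # Step 3: a DOI registered with Crossref becomes a hyperlink.
    if registered_with_crossref(doi):
        return ("hyperlink", doi)

    # Step 4: otherwise, try to identify the registration agency.
    agency = find_agency(doi)
    if agency:
        return ("comment_agency", agency)

    # Step 5: nothing is known about this DOI, so the author should be queried.
    return ("warn_unknown_query_author", doi)
```

Passing the two lookups in as callables keeps the decision logic separate from the network calls, so the flow can be exercised with simple stand-ins such as `lambda doi: False`.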

→ Did you know? Crossref is not the only registration agency for DOIs! For example, DataCite DOIs are registered with DataCite, not with Crossref.
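For the curious, one public way to ask "which agency registered this DOI?" is the agency route of Crossref's REST API. The sketch below shows how such a lookup could be addressed and how its answer could be read. Whether eXtyles uses this particular route is not stated here, so treat that as our assumption; the helper names are hypothetical, and the sample payload is a hand-written illustration with a made-up DOI, not a recorded response.

```python
from typing import Optional
from urllib.parse import quote


def agency_url(doi: str) -> str:
    """Build the agency lookup URL for a DOI (slashes in the DOI are kept as-is)."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/") + "/agency"


def parse_agency(payload: dict) -> Optional[str]:
    """Pull a readable agency name out of a decoded JSON response, if there is one."""
    agency = payload.get("message", {}).get("agency", {})
    # Prefer the human-readable label and fall back to the short id.
    return agency.get("label") or agency.get("id")


# Illustrative payload in the assumed response shape (hypothetical DOI):
sample = {
    "status": "ok",
    "message": {
        "DOI": "10.5061/dryad.example",
        "agency": {"id": "datacite", "label": "DataCite"},
    },
}
print(agency_url("10.5061/dryad.example"))
print(parse_agency(sample))
```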

With this multi-step process, eXtyles now verifies every single DOI in a reference list, for every type of reference!