What’s new in eXtyles Reference Processing: 2020 edition

Our Development team is continually working to improve how our software solutions handle customers’ content. Working closely with our Support and Configuration teams to stay on top of what improvements are needed, Development works on every aspect of our software. A major focus is always our suite of reference processing tools, which were originally developed for eXtyles but also form the basis of Edifix.

Every new version of eXtyles we ship to a customer includes Release Notes detailing what we’ve fixed, upgraded, and added since the previous release.

If you haven’t been reading your Release Notes, we encourage you to do so!

Meanwhile, here’s a quick snapshot of some improvements we made to eXtyles reference processing during 2020.

Expanding coverage: Improvements in what we identify, parse, and validate

  • As part of the ongoing expansion of our journal database, we’ve added about 800 new journal titles from the NCBI Molecular Biology Journals database (Build 4428) and about 1,100 from PubMed (Build 4439)!
  • We’ve also improved the ability of eXtyles Reference Processing to recognize variously formatted supplement numbers in journal references (Build 4584).
    → For example:

    • Smieszek T, Pouwels KB, Dolk FCK, Potential for reducing inappropriate antibiotic prescribing in English primary care. J Antimicrob Chemother. 2018;73(suppl_2):36-43.
    • Kleinman JT, Mlynash M, Zaharchuk G, et al. Yield of CT perfusion for the evaluation of transient ischaemic attack. Int J Stroke. 2015;10(suppl A100):25-29.
  • We’ve built out eXtyles Reference Processing to parse references to preprints, which are now tagged as <prpt> rather than <jrn> (Build 4551) or, as in the case of references with arXiv links, as <eref> (Build 4602, which also includes improved recognition of arXiv IDs and other elements).
  • For all reference types, eXtyles will now use either bib_article or bib_title, whichever is found, when looking up an entry on PubMed and Crossref (Build 4527). If for some reason a title is styled in an unexpected way, eXtyles will still be able to find the PubMed or Crossref record if one exists.
  • As part of expanding eXtyles Reference Processing to parse references to data sets, we’ve updated our XML export filters to remap <article-title> to <data-title> in data references, or <source> to <data-title> if the reference contains no <article-title> element (Build 4615).
  • In addition to parsing preprint references, eXtyles can now link those references on Crossref and insert or verify DOIs. When possible (i.e., when the necessary information is available in Crossref’s database), eXtyles will also add comments flagging (a) journal articles previously published as preprints, and (b) preprints which have since been published as journal articles (Build 4551).
  • We’ve added support to report an Expression of Concern in PubMed Reference Checking (Build 4614).
    → For example, in this reference entry:

    • Bombardier C, Laine L, Reicin A, et al. Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis. VIGOR Study Group. N Engl J Med 2000;343:1520-1528. Medline
      eXtyles will add a Comment linking to the relevant Expression of Concern.
  • For reference types (or parts of references) that eXtyles can’t restructure, we’ve made sure that face markup won’t be lost (Build 4665).
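
To make the <data-title> remapping rule above concrete, here is a minimal Python sketch of the same logic applied to JATS-style XML. It is an illustration only, not the eXtyles export filter itself. The function name remap_data_titles is hypothetical, and the sketch assumes that data references are marked with publication-type="data" on an <element-citation> or <mixed-citation> element.

```python
import xml.etree.ElementTree as ET

CITATION_TAGS = ("element-citation", "mixed-citation")


def remap_data_titles(root):
    """Rename the title element of each data citation to <data-title>.

    Assumption for this sketch: a data citation carries
    publication-type="data". Returns the number of elements renamed.
    """
    renamed = 0
    for citation in root.iter():
        if citation.tag not in CITATION_TAGS:
            continue
        if citation.get("publication-type") != "data":
            continue
        if citation.find("data-title") is not None:
            continue  # already has a data title; leave it alone
        # Prefer <article-title>; fall back to <source> only if it is absent.
        target = citation.find("article-title")
        if target is None:
            target = citation.find("source")
        if target is not None:
            target.tag = "data-title"
            renamed += 1
    return renamed


sample = ET.fromstring(
    "<ref-list>"
    '<element-citation publication-type="data">'
    "<article-title>Ocean temperatures</article-title><source>Dryad</source>"
    "</element-citation>"
    '<element-citation publication-type="data">'
    "<source>Survey results</source>"
    "</element-citation>"
    "</ref-list>"
)
remap_data_titles(sample)
print(ET.tostring(sample, encoding="unicode"))
```

In the first sample citation the <article-title> is renamed and the <source> is kept. In the second, which has no <article-title>, the <source> itself becomes the <data-title>.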

Managing metadata: Improvements to how we handle persistent identifiers

  • We’ve updated our code to deal with DOIs that follow a page number and a period with no space (Build 4656) and DOIs to which spaces have been added (Build 4595).
    → For example:

    • Andersson, P. (2008). Happiness and health: Well-being among the self-employed. The Journal of Socio-Economics, 37(1), 213-236.https://doi.org/10.1016/j.socec.2007.03.003.
    • Andersson P. Happiness and health: well-being among the self-employed. J Socio-Economics. 2008;37(1):213–36. https://doi.org/ 10.1016/j.socec. 2007.03.003.
  • We’ve also updated how eXtyles parses references that include a DOI expressed as a hyperlink, so that hyperlinked and plain-text DOIs are handled consistently (Build 4536), and improved parsing of references that include a malformed DOI (Build 4535).
  • eXtyles will now add a Word comment when an author-supplied DOI is malformed and therefore can’t be queried at Crossref (Build 4598).
    → For example:

    • Bhoori S, Rossi RE, Citterio D, Mazzaferro V. COVID-19 in long-term liver transplant patients: preliminary experience from an Italian transplant centre in Lombardy. The Lancet Gastroenterology & Hepatology. 2020 doi: 10·1016/S2468–1253(20)30116–3. [Epub ahead of print].
    • Bernardes de Jesus, B., Vera, E., Schneeberger, K., Tejera, A.M., Ayuso, E., Bosch, F., & Blasco, M.A. (2012). Telomerase gene therapy in adult and old mice delays aging and increases longevity without increasing cancer. EMBO Mol. Med. 4, 691-704. doi:610.1002/emmm.201200245.
  • As part of building preprint handling into our reference processing, we’ve added support for a dot between arXiv prefix and arXiv ID (Build 4498).
    → For example:

    • Moeck P, Fraundorf P. Structural fingerprinting in the transmission electron microscope: Overview and opportunities to implement enhanced strategies for nanocrystal identification [preprint]. arXiv. Posted 14 Jun 2007. arXiv.0706.2021
  • And speaking of preprints, eXtyles can also now insert DOIs deposited to Crossref as (Build 4524).
  • We’ve updated PubMed Reference Linking and Correction to cope with references that have more than one DOI marked and to better handle references that include a “corrupted” or malformed DOI, like the examples above (Build 4644).
  • We’ve also updated PubMed Reference Linking and Correction to use NCBI’s new URL format, as the old format has been deprecated and will eventually stop working (Build 4660).
  • Finally, we’ve made two key updates to the code behind our Crossref linking and correction process that help eXtyles to verify DOIs for data citations and improve DOI verification for all reference types (Build 4582):
    • If a reference can’t be matched on Crossref but does contain an author-supplied DOI, eXtyles will query Crossref again, using the DOI as the query value, to verify the DOI.
    • If there’s still no match, eXtyles will then query Crossref to find the registration agency for the DOI. For valid DOIs, eXtyles reports the registration agency (most commonly, we see DOIs for data citations deposited with DataCite); otherwise, it flags the DOI as invalid.
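
The two-step fallback described above can be sketched as a short decision chain. This is a simplified Python illustration under stated assumptions, not the eXtyles implementation: the three callables (match_reference, doi_exists, find_agency) are hypothetical stand-ins for the actual Crossref queries, and the status labels are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DoiCheck:
    # One of: "matched", "unmatched", "verified",
    # "registered-elsewhere", "invalid" (labels invented for this sketch).
    status: str
    agency: Optional[str] = None


def check_reference(
    reference: str,
    doi: Optional[str],
    match_reference: Callable[[str], bool],
    doi_exists: Callable[[str], bool],
    find_agency: Callable[[str], Optional[str]],
) -> DoiCheck:
    """Apply the fallback chain: reference match, DOI query, agency query."""
    # Normal path: the reference metadata matches a Crossref record.
    if match_reference(reference):
        return DoiCheck("matched")
    # No match and no author-supplied DOI: nothing more to try.
    if doi is None:
        return DoiCheck("unmatched")
    # Fallback 1: query again, using the DOI itself as the query value.
    if doi_exists(doi):
        return DoiCheck("verified")
    # Fallback 2: ask which registration agency, if any, holds the DOI.
    agency = find_agency(doi)
    if agency:
        return DoiCheck("registered-elsewhere", agency)
    return DoiCheck("invalid")


# Example with stub lookups: a data citation whose DOI lives at DataCite.
result = check_reference(
    "Example data set reference",
    "10.5061/example",  # made-up DOI for illustration
    match_reference=lambda ref: False,
    doi_exists=lambda d: False,
    find_agency=lambda d: "DataCite",
)
print(result)
```

Passing the lookups in as parameters keeps the decision logic separate from any network code, so the chain can be exercised with stubs like the ones shown.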

Tidying up: Improvements to manage creative punctuation

  • To ensure we can correctly parse more references and to make our URL Checking process more useful, we’ve added support for various dash characters (rather than hyphens) in URL domains (Build 4657).
    → For example, in this reference entry:

    • Martin, C.M., User guide for ABC – Analysis of Bearing Capacity, Department of Engineering Science, University of Oxford, Report No. OUEL 2261/03, available from http://www‐civil.eng.ox.ac.uk/people/cmm/software/abc/

    The string http://www‐civil contains a character that looks like a hyphen, but isn’t. eXtyles can now recognize the entire URL as a URL, rather than ignoring this malformed string, which means both that the whole reference entry will be correctly parsed and that URL Checking will be able to verify the whole URL.

  • We’ve improved how eXtyles handles various formats for expressing access dates and online publication dates, including 1-1-1989 and 1.1.89 (Build 4538) or 1989-01-01 (Build 4575).
  • In addition, eXtyles can now allow a period after a spelled-out month in a reference date (Build 4565), cope with a hard return before an access date (Build 4567), and do a better job of parsing journal references in which a period incorrectly follows a year, as in 1995.346:530-6 (Build 4529).
  • We’ve also updated eXtyles to support various content numbering formats, including document numbers with slashes and hyphens in book references (Build 4494); alphanumeric reference numbers separated by a dot, such as A.1 (Build 4560); and a page prefix after a dot after a year, such as 2003.p.1303-23 (Build 4539).
  • In reference parsing, we now support page numbers in the Cell Press STAR Methods format, i.e., a last page with a suffix like “.e4”, e.g. “673-684.e4” (Build 4611).
  • Finally, eXtyles can now support reference numbers formatted as superscript numbers in baseline parentheses or brackets, as well as reference numbers preceded by a period (Build 4637).
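
For readers who handle such URLs in their own scripts, the dash-in-domain case from the first item in this list can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not a description of how eXtyles works internally: the function name normalize_host_dashes and the particular set of look-alike characters are choices made for the example. It replaces hyphen look-alikes in the domain part of a URL with a plain hyphen and leaves the rest of the URL untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Hyphen look-alikes (an illustrative, not exhaustive, selection):
# U+2010 hyphen, U+2011 non-breaking hyphen, U+2012 figure dash,
# U+2013 en dash, U+2014 em dash, U+2212 minus sign.
DASH_LOOKALIKES = dict.fromkeys(
    map(ord, "\u2010\u2011\u2012\u2013\u2014\u2212"), "-"
)


def normalize_host_dashes(url: str) -> str:
    """Replace dash look-alikes in the domain with an ASCII hyphen."""
    parts = urlsplit(url)
    host = parts.netloc.translate(DASH_LOOKALIKES)
    # Only the domain is changed; path, query, and fragment stay as supplied.
    return urlunsplit(parts._replace(netloc=host))


# The domain from the Martin reference, written here with U+2010.
print(normalize_host_dashes(
    "http://www\u2010civil.eng.ox.ac.uk/people/cmm/software/abc/"
))
```

Restricting the substitution to the domain is a deliberate choice in this sketch: a dash character in a path or query string may be a legitimate part of the address, whereas a domain name can only contain the ASCII hyphen.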

What’s next?

That’s a great question!

As we discussed at this year’s XUG, we’re able to implement these improvements for our customers thanks to continual review of current content. Staying up to date with what authors are doing and what publishers are seeing remains a top priority for our team, and we’re eager to see what 2021 brings to the world of references!