PubMed

This section describes the mapping implemented for MEDLINE/PubMed.

Input

The native data is collected from the PubMed FTP baseline site. It consists of XML records compliant with the schema available at www.nlm.nih.gov.
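As an illustration of reading such input, the sketch below streams records out of a gzip-compressed XML file using only the Python standard library. It assumes the file is gzip-compressed and that each record is a `PubmedArticle` element; the function name `iter_records` is hypothetical and this is not the OpenAIRE implementation.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def iter_records(stream: BinaryIO) -> Iterator[ET.Element]:
    """Yield one PubmedArticle element at a time from a gzip-compressed XML stream."""
    with gzip.open(stream, "rb") as xml_file:
        # iterparse avoids loading the whole file in memory
        for _event, element in ET.iterparse(xml_file, events=("end",)):
            if element.tag == "PubmedArticle":  # assumed record element name
                yield element
                # free the subtree once the caller asks for the next record
                element.clear()
```

Because each element is cleared when the generator advances, the caller should extract the fields it needs from a record before requesting the next one.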

Incremental harvesting

PubMed exposes an FTP entry point, the FTP baseline update site, that publishes all the updates. We collect each new file and generate the new dataset by upserting the existing items.
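The upsert step can be pictured with a minimal sketch. It assumes that records are keyed by their PMID and that a record from an update file fully replaces the stored one; the `upsert` function and the dictionary-based dataset are hypothetical simplifications, not the actual workflow code.

```python
from typing import Dict, Iterable, Tuple

Record = Dict[str, str]


def upsert(dataset: Dict[str, Record], updates: Iterable[Tuple[str, Record]]) -> Dict[str, Record]:
    """Return a new dataset where each update adds or replaces the record with the same PMID."""
    merged = dict(dataset)  # leave the previous dataset untouched
    for pmid, record in updates:
        merged[pmid] = record  # insert if new, overwrite if already present
    return merged
```

Applying the update files in the order they were published means the most recent version of each record is the one that is kept.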

Entity Mapping

The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.

| OpenAIRE Result field path | PubMed record field xpath | Notes |
|---|---|---|
| Publication Mapping | | |
| id | //PMID | id in the form pmid_________::md5(pmid) |
| pid | //PMID | classid = classname = pmid |
| publicationdate | //PubmedPubDate | clean and normalize the date to the format YYYY-MM-DD |
| maintitle | //Title | |
| description | //AbstractText | |
| language | //Language | cleaning vocabulary -> dnet:languages |
| subjects | //DescriptorName | classId, className = keyword |
| Author Mapping | | |
| author.surname | //Author/LastName | |
| author.name | //Author/ForeName | |
| author.fullname | //Author/FullName | concatenation of ForeName and LastName, if they exist |
| author.rank | FOR ALL AUTHORS | sequential number starting from 1 |
| Journal Mapping | | |
| container.conferencedate | //Journal/PubDate | the date of the journal |
| container.name | //Journal/Title | name of the journal |
| container.vol | //Journal/Volume | journal volume |
| container.issPrinted | //Journal/ISSN | the journal ISSN |
| container.iss | //Journal/Issue | the journal issue |
| Instance Mapping | | |
| instance.type | //PublicationType | if the article contains the typology Journal Article, we apply this type; otherwise we look for a term that matches the vocabulary and, if none matches, we discard it |
| type | \attributes\types\resourceType; \attributes\types\resourceTypeGeneral; attributes\types\schemaOrg | using the dnet:result_typologies vocabulary, we look up the instance.type synonym to generate one of the following main entities: publication, dataset, software, otherresearchproduct |
| instance.pid | //PMID | maps the PMID to the pid of the instance |
| instance.url | //PMID | creates the URL by prepending https://pubmed.ncbi.nlm.nih.gov/ to the PMID |
| instance.alternateIdentifier | //ArticleId[./@IdType="doi"] | |
| instance.publicationdate | //PubmedPubDate | clean and normalize the date to the format YYYY-MM-DD |
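A few of the rules in the table (identifier, URL, date normalization, author full name and rank) can be sketched in Python as follows. This is an illustrative approximation and not the OpenAIRE implementation. It assumes that the MD5 is the hex digest of the UTF-8 encoded PMID, that the date element has numeric `Year`, `Month` and `Day` children, and that a missing month or day defaults to 01. The element names are taken from the xpaths in the table; all function names and the `value` key are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

ID_PREFIX = "pmid_________::"  # prefix copied from the mapping table
URL_PREFIX = "https://pubmed.ncbi.nlm.nih.gov/"
DATE_TAG = "PubmedPubDate"  # element name as written in the mapping table


def make_id(pmid: str) -> str:
    # Assumption: md5(pmid) is the hex digest of the UTF-8 encoded PMID string.
    return ID_PREFIX + hashlib.md5(pmid.encode("utf-8")).hexdigest()


def normalize_date(date_el: Optional[ET.Element]) -> Optional[str]:
    # Assumption: numeric Year/Month/Day children; missing Month or Day becomes 01.
    if date_el is None or not date_el.findtext("Year"):
        return None
    year = int(date_el.findtext("Year"))
    month = int(date_el.findtext("Month") or 1)
    day = int(date_el.findtext("Day") or 1)
    return f"{year:04d}-{month:02d}-{day:02d}"


def map_authors(record: ET.Element) -> List[Dict[str, object]]:
    authors = []
    # rank is a sequential number starting from 1, in document order
    for rank, author in enumerate(record.iter("Author"), start=1):
        name = author.findtext("ForeName")
        surname = author.findtext("LastName")
        # fullname joins whichever of forename and last name exist
        fullname = " ".join(part for part in (name, surname) if part)
        authors.append({"name": name, "surname": surname, "fullname": fullname, "rank": rank})
    return authors


def map_record(record: ET.Element) -> Dict[str, object]:
    pmid = record.findtext(".//PMID")
    if not pmid:
        raise ValueError("record has no PMID")
    date = normalize_date(record.find(f".//{DATE_TAG}"))
    return {
        "id": make_id(pmid),
        "pid": {"classid": "pmid", "classname": "pmid", "value": pmid},
        "publicationdate": date,
        "author": map_authors(record),
        "instance": {"url": URL_PREFIX + pmid, "publicationdate": date},
    }
```

The sketch leaves out the vocabulary-based steps (language cleaning and instance type resolution), since they depend on the dnet vocabularies, which are not reproduced here.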