Version: 9.0.0

Microsoft Academic Graph

Data acquisition

The Microsoft Academic Graph dataset is generated from the latest released version of the graph, dated 06-12-2021.

Changes from the previous version

  • New workflow: MAG is no longer created within the DOIBoost process. Now, a new workflow normalizes the various MAG tables into a single table, from which the action set is generated.
  • MAG discontinued: Note that MAG has been discontinued. Normalization is therefore performed only once, after the data has been imported from a complete MAG dump.

Process

The Microsoft Academic Graph (MAG) is a heterogeneous graph that contains scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. The MAG schema is designed to capture the rich and complex relationships between these entities.

The main node types in the MAG schema are:

  • Paper: Publications represent works of scientific research, such as articles, books, and book chapters.
  • PaperAbstractsInvertedIndex: PaperAbstractsInvertedIndex stores each paper's abstract as an inverted index and is used to map the paper abstracts.
  • Authors: Authors represent the people who wrote the publications.
  • Institutions: Institutions represent the organizations with which the authors are affiliated.
  • Journals: Journals represent the periodical series in which the publications are published.
  • Conferences: Conferences represent the academic meetings in which the publications are presented.
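
Since abstracts are held as an inverted index rather than as running text, a short sketch may help show how the text can be recovered. The Python below is illustrative only and is not the code used by the workflow. It assumes each entry is a JSON object with an `IndexLength` field (number of token positions) and an `InvertedIndex` field (token to list of positions); both field names and the helper name `rebuild_abstract` are assumptions made for this example.

```python
import json


def rebuild_abstract(inverted_index_json: str) -> str:
    """Rebuild the abstract text from a JSON-serialized inverted index.

    Assumed layout (not confirmed by this page):
    {"IndexLength": <int>, "InvertedIndex": {"<token>": [<position>, ...]}}
    """
    data = json.loads(inverted_index_json)
    words = [""] * data["IndexLength"]
    # Place every token at each position where it occurs.
    for token, positions in data["InvertedIndex"].items():
        for position in positions:
            words[position] = token
    # Skip positions that were never filled.
    return " ".join(word for word in words if word)


# Hypothetical sample entry, invented for illustration.
sample = json.dumps({
    "IndexLength": 5,
    "InvertedIndex": {
        "graphs": [0, 4],
        "connect": [1],
        "papers": [2],
        "to": [3],
    },
})
print(rebuild_abstract(sample))
```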

The main edge types in the MAG schema are:

  • Citation relationships: Citation relationships connect citing publications to cited publications.
  • Affiliation relationships: Affiliation relationships connect authors to the institutions with which they are affiliated.

Preprocess

In the first phase, a normalized table is defined containing all papers and associated relationships.
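
The idea of a single normalized table can be pictured with a small in-memory join. This is a minimal sketch under stated assumptions, not the actual workflow: the function name `normalize`, the use of plain Python dictionaries, and the shape of the input tables (a paper-author link table carrying `PaperId`, `AuthorId`, `AffiliationId`, and `AuthorSequenceNumber`) are all hypothetical. The column names are taken from the mapping table in this page.

```python
from collections import defaultdict


def normalize(papers, paper_authors, authors, affiliations):
    """Join paper, author, and affiliation rows into one record per paper.

    All inputs are lists of dicts; the table shapes are assumptions.
    """
    authors_by_id = {a["AuthorId"]: a for a in authors}
    affiliations_by_id = {a["AffiliationId"]: a for a in affiliations}

    # Group the paper-author links by paper.
    links_by_paper = defaultdict(list)
    for link in paper_authors:
        links_by_paper[link["PaperId"]].append(link)

    rows = []
    for paper in papers:
        row = dict(paper)
        row["authors"] = []
        links = links_by_paper.get(paper["PaperId"], [])
        # Keep authors in their declared order.
        for link in sorted(links, key=lambda l: l["AuthorSequenceNumber"]):
            author = authors_by_id.get(link["AuthorId"], {})
            affiliation = affiliations_by_id.get(link.get("AffiliationId"), {})
            row["authors"].append({
                "AuthorId": link["AuthorId"],
                "AuthorName": author.get("AuthorName"),
                "AuthorSequenceNumber": link["AuthorSequenceNumber"],
                "AffiliationId": link.get("AffiliationId"),
                "AffiliationName": affiliation.get("AffiliationName"),
                "GridId": affiliation.get("GridId"),
            })
        rows.append(row)
    return rows
```

Each resulting row carries the paper fields plus a nested list of authors with their affiliations, which is one possible shape for a table from which both product properties and affiliation relations can be derived.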

Mapping MAG properties into the OpenAIRE Graph

Properties in OpenAIRE research products are set based on the logic described in the following table:

| OpenAIRE Research Product field path | MAG path(s) | Notes |
|---|---|---|
| id | PaperId | id in the form mag_________::md5(PaperId) |
| instance.alternateIdentifier[@type = DOI] | Doi | DOI intersected with Crossref. Only MAG papers with a DOI present in Crossref are filtered |
| instance.instancetype | DocType | Using the dnet:result_typologies vocabulary, we look up the DocType synonym to generate one of the following main entities: publication, dataset, software, otherresearchproduct |
| maintitle | OriginalTitle | |
| publicationdate | Year | publication date if Date is not available |
| publicationdate | Date | |
| publicationdate | OnlineDate | Date the article was put online |
| publisher | Publisher | |
| journal.name | ConferenceName | |
| journal.issnPrinted | JournalISSN | |
| journal.edition | JournalPublisher | |
| journal.ConferencePlace | ConferenceLocation | |
| journal.conferencedate | ConferenceStartDate, ConferenceEndDate | conference date as an append of conferencestartdate-conferenceenddate |
| journal.vol | Volume | |
| journal.iss | Issue | |
| journal.sp | FirstPage | |
| journal.ep | LastPage | |
| abstract | Paper abstract | |
| Author Mapping | | |
| author.fullname | AuthorName | |
| organization.legalname | AffiliationName | |
| organization.id | AffiliationId | id in the form mag_________::md5(AffiliationId) |
| organization.id | AffiliationId | for each affiliation we generate an affiliation relation between paper and organization |
| author.pid[@type = mag] | AuthorId | |
| author.rank | AuthorSequenceNumber | |
| organization.pid | GridId | |
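
A few of the mapping rules above (identifier construction, the publication date fallback, and the conference date) can be sketched in Python. This is an illustrative sketch, not the production mapping code. The helper names are hypothetical, and several details are assumptions: that the MD5 is computed over the UTF-8 string form of the MAG id, that a bare year is emitted as-is when `Date` is missing, and that a single available conference date is returned on its own.

```python
import hashlib

# Prefix copied from the identifier pattern in the mapping table.
MAG_PREFIX = "mag_________"


def mag_identifier(mag_id) -> str:
    """Build an id in the form mag_________::md5(<MAG id>).

    Assumption: the hash is taken over the UTF-8 string form of the id.
    """
    digest = hashlib.md5(str(mag_id).encode("utf-8")).hexdigest()
    return f"{MAG_PREFIX}::{digest}"


def publication_date(record: dict):
    """Prefer Date; fall back to Year when Date is not available."""
    if record.get("Date"):
        return record["Date"]
    year = record.get("Year")
    # Assumption: the year is emitted unchanged, as a string.
    return str(year) if year else None


def conference_date(record: dict):
    """Append ConferenceStartDate and ConferenceEndDate with a hyphen."""
    start = record.get("ConferenceStartDate")
    end = record.get("ConferenceEndDate")
    if start and end:
        return f"{start}-{end}"
    # Assumption: with only one date present, return that date.
    return start or end or None
```

The same `mag_identifier` helper would apply to both `PaperId` and `AffiliationId`, since the table gives the same pattern for product and organization identifiers.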