OpenAIRE Graph

Scientific Discovery. Bibliometrics. Open Science Monitor

Methodology - Infrastructure - Access

It's all about transparency.

What is the OpenAIRE Graph?

The OpenAIRE Graph is a free and open resource that brings together and interlinks hundreds of millions of metadata records from over 100k data sources trusted by researchers. Launched in 2012 as one of the first research knowledge graphs, it has since grown into one of the world's largest and is the authoritative source for the European Open Science Cloud (EOSC). Here, researchers, communities, institutions, companies, and citizens thrive by freely sharing research products and related information.

Why the need?

As Open Science gradually becomes the norm in research, the way researchers collaborate, publish, discover, and access scientific knowledge has been changing.

For the past decade, researchers have been increasingly publishing research products beyond the article, to share all scientific products generated during experiments or research projects. These include research data, research software, experiments, and other research outputs.

These are then published in scholarly communication data sources (e.g., data archives, institutional repositories, research software repositories), rely where possible on persistent identifiers (e.g., DOI, ORCID, Grid.ac, PDBs), and specify meaningful links to other research products (e.g. supplementedBy, citedBy, versionOf), projects, and funders.

By following such practices, researchers are implicitly constructing a Global Open Science Graph.

The OpenAIRE Graph is obtained by collecting this metadata and these links, which are then further enriched with additional metadata and links.

What is a knowledge graph?

By "graph" we mean a large database of research data that is stored in a graph format. A graph is a way of storing data that represents relationships between different entities. In the OpenAIRE Graph, these entities are research outputs, such as papers, datasets, and software. The relationships between these entities represent informative data such as citations, funding, and collaborations.
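As a rough illustration of the idea, the sketch below stores a few entities and typed relationships in plain Python and then follows the links from one entity. The identifiers, titles, and the `Entity`, `Relation`, and `neighbours` names are invented for this example and do not reflect the Graph's actual schema; the relation labels are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str     # hypothetical local identifier
    kind: str   # e.g. "publication", "dataset", "software"
    title: str


@dataclass(frozen=True)
class Relation:
    source: str  # id of the entity the link starts from
    target: str  # id of the entity the link points to
    kind: str    # illustrative label, e.g. "cites", "supplementedBy"


# A tiny, made-up graph: one paper linked to a dataset and a software tool.
entities = {
    e.id: e
    for e in [
        Entity("pub-1", "publication", "A study of ocean currents"),
        Entity("data-1", "dataset", "Ocean current measurements"),
        Entity("sw-1", "software", "Current analysis toolkit"),
    ]
}

relations = [
    Relation("pub-1", "data-1", "supplementedBy"),
    Relation("pub-1", "sw-1", "cites"),
]


def neighbours(entity_id):
    """Return (relation kind, target entity) pairs for outgoing links."""
    return [(r.kind, entities[r.target]) for r in relations if r.source == entity_id]


for kind, target in neighbours("pub-1"):
    print(f"pub-1 --{kind}--> {target.kind}: {target.title}")
```

Keeping relationships as first-class records, rather than as fields buried inside each entity, is what makes it cheap to ask questions such as "what does this paper link to?".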

The OpenAIRE Graph in Comparison

    Database        | Items          | Datasets | Software | Publications   | Other Research Products | Funders | Grants | Citations | Open Access Works | Cost
    OpenAIRE Graph  | 256.7M         | 59M      | 364.1K   | 175.1M         | 21.2M                   | 184     | 3.4M   | 2.4B      | 80M               | Free
    Scopus          | 94M            |          |          | 94M            |                         |         |        | 1.7B      | 23.4M             | Paid
    Web of Science* | 217M           | 14.5M    |          | ?              | ?                       | ?       | ?      | 2.5B      | 38M               | Paid
    Dimensions*     | 331.6M         | 29M      | ?        | 140M           | ?                       | ?       | 7M     | 1.7B      | 33.5M             | Paid
    Google Scholar* | >400M (**est.) |          |          | >400M (**est.) | ?                       |         |        | ?         | ?                 | Free
    As of March 2024.

    *The database does not deduplicate its data, or does not state that it does; the figures shown therefore count some records more than once and are not representative of the actual range of data or the actual number of citations.

    ** Estimate based on the study: Gusenbauer, M. Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases. Scientometrics 118, 177–214 (2019). https://doi.org/10.1007/s11192-018-2958-5
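    Database       | FoS Classifications | SDG Indicators | Open Access Indicators | Affiliations | Deduplication | Open Data | Impact & Usage Metrics
    OpenAIRE Graph | Y                   | Y              | Y                      | Y            | Y             | Y         | Y
    Scopus         | Y                   | N              | Y                      | Y            | Y             | N         | N
    Web of Science | Y                   | N              | Y                      | ?            | ?             | N         | Y
    Dimensions     | Y Y Y ? N ? (six values given for the seven columns; alignment not recoverable)
    Google Scholar | N                   | N              | N                      | N            | ?             | N         | N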

    Glossary

    FoS Classifications: Fields of Science Classifications

    SDG Indicators: Indicators of products relevant to the United Nations' Sustainable Development Goals

    Open Access Indicators: Indicators of products that are Open Access. Note: the OpenAIRE Graph also includes indicators for tracking Open Access publication models

    Affiliations: Links measured between authors & organisations, or papers & organisations

    Deduplication: Records and citations are subjected to deduplication, meaning that they are not counted more than once, allowing for more accurate monitoring and evaluation

    Open Data: All data provided is free and open, meaning it can be used for your own bibliometric and other analyses

    Impact & Usage Metrics: Indicators on the impact of research products in science and on their usage (e.g., views and downloads in repositories)

Where is it used?

Scientific Discovery

The Graph is used in the OpenAIRE EXPLORE service, enabling users to efficiently search and navigate through an interconnected research ecosystem. Additionally, it serves as the primary resource supporting the exploration and recommendations in the EOSC EU Node (EOSC Portal).

Bibliometrics

The Graph is used for research evaluation as a replacement for proprietary databases such as Web of Science or Scopus. Its benefits include openness and transparency, embedded citation metrics and indicators, access to information beyond publications such as data and software, and the ability to easily integrate your own databases. Get a glimpse in OpenAIRE MONITOR.

Open Science Monitoring

The Graph maintains data related to monitoring Open Access and Open Science policies. It is currently used in services such as the Open Science Observatory, the EOSC Observatory, and the Irish National OA Monitor. Also check OpenAIRE MONITOR to view out-of-the-box indicators for organisations.


The Data Model

The heart of the OpenAIRE Graph is a data model that maps the Scholarly Communication Knowledge Model: a collection of interlinked descriptions of concepts, entities, relationships, and events. The Graph puts data in context via linking and semantic metadata, and therefore provides a framework for data integration, unification, analytics, and sharing.

In a nutshell, the main entities are:

  • Research Products: outcomes of research activities.
  • Data Sources: sources from which the metadata of Graph objects are collected.
  • Organisations: companies or research institutions that are involved in projects, are responsible for operating data sources, or are the affiliations of Product creators.
  • Projects: research project grants funded by a Funding Stream of a Funder.
  • Communities: groups of people with a common research intent (e.g. research infrastructures, university alliances).
  • Persons: individual researchers who are involved in the design, creation or maintenance of research products. Currently, this is a non-materialized entity type in the Graph, which means that the respective metadata (and relationships) are encapsulated in the author field of the respective research products.
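To make the "non-materialized" Persons point concrete, here is a minimal sketch in which authors exist only inside the product record, so finding a person's work means scanning products rather than looking up a Person entity. The class names, field names, and ORCID values are placeholders chosen for this example, not the Graph's real schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Author:
    # Not a standalone entity: an Author only exists inside a product record.
    full_name: str
    orcid: Optional[str] = None  # placeholder values below, not real ORCID iDs


@dataclass
class ResearchProduct:
    id: str
    kind: str  # "publication", "dataset", "software", or "other"
    title: str
    authors: List[Author] = field(default_factory=list)


def products_by_orcid(products, orcid):
    """Find products whose embedded author list mentions the given ORCID."""
    return [p for p in products if any(a.orcid == orcid for a in p.authors)]


products = [
    ResearchProduct(
        "pub-1", "publication", "A study of ocean currents",
        [Author("A. Researcher", "0000-0000-0000-0001"), Author("B. Colleague")],
    ),
    ResearchProduct(
        "data-1", "dataset", "Ocean current measurements",
        [Author("A. Researcher", "0000-0000-0000-0001")],
    ),
    ResearchProduct(
        "sw-1", "software", "Current analysis toolkit",
        [Author("C. Developer", "0000-0000-0000-0002")],
    ),
]

for product in products_by_orcid(products, "0000-0000-0000-0001"):
    print(product.kind, "-", product.title)
```

The trade-off of this design is that person-level questions require a pass over product records, while the product records themselves stay self-contained.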

How is the OpenAIRE Graph made?

Governance

The OpenAIRE Graph is a public good which operates under community governance.

This effort is spearheaded by OpenAIRE AMKE, a non-profit organisation comprising 47 member entities representing academic and research institutions dedicated to advancing Open Science.

Through its participatory governance model, OpenAIRE AMKE facilitates the endorsement, adoption, operation, and long-term viability of the Graph within its member base, national contexts, and broader research communities.

A complex aggregation-enrichment-deduplication pipeline

Using the OpenAIRE Guidelines and the metadata validation mechanism in PROVIDE, the OpenAIRE Graph aggregates millions of metadata records collected from trusted data sources that facilitate Open Science, including:

  • Repositories registered in OpenDOAR, re3data.org, and FAIRSharing.org
  • Journals registered in DOAJ
  • Pre-print servers 

The content then goes through a deduplication process utilising PIDs and other features of entities.
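The following is a simplified sketch of PID-based deduplication, not the Graph's actual algorithm: records that share a normalised DOI are merged, and records without a DOI fall back to a normalised title. The record layout, the `normalise_doi` and `deduplicate` helpers, and the `10.1234/...` DOI are assumptions made for illustration.

```python
from collections import defaultdict


def normalise_doi(doi):
    """Lower-case a DOI and strip common URL or scheme prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def deduplicate(records):
    """Group records by DOI (or by title when no DOI exists) and merge each group."""
    groups = defaultdict(list)
    for rec in records:
        doi = rec.get("doi")
        if doi:
            key = ("doi", normalise_doi(doi))
        else:
            # Fallback feature: case- and whitespace-insensitive title match.
            key = ("title", " ".join(rec["title"].lower().split()))
        groups[key].append(rec)

    merged = []
    for recs in groups.values():
        merged.append({
            "title": recs[0]["title"],
            "doi": next((normalise_doi(r["doi"]) for r in recs if r.get("doi")), None),
            # Remember every source that contributed to the merged record.
            "sources": sorted({r["source"] for r in recs}),
        })
    return merged


records = [
    {"title": "Graph Paper", "doi": "10.1234/abc", "source": "repo-a"},
    {"title": "Graph paper", "doi": "https://doi.org/10.1234/ABC", "source": "repo-b"},
    {"title": "Another Work", "doi": None, "source": "repo-a"},
]

for rec in deduplicate(records):
    print(rec)
```

One known limitation of this sketch is that a record with a DOI and a record of the same work without one end up in different groups; a fuller approach would compare several features of the entities, as the text above notes.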

Once the deduplication process has taken place, the Graph team then enriches the data with records from:

  • Crossref
  • Unpaywall
  • ORCID
  • Microsoft Academic
  • PubMed
  • DataCite
  • OpenCitations
  • UsageCounts

Information is then extracted from full texts (project, data, and software citations, as well as references), from which Fields of Science, SDGs, and citations are classified. The information is also analysed to produce statistics for services such as the OpenAIRE MONITOR and the Open Science Observatory.

The in-depth enrichment and deduplication stages in this pipeline yield a knowledge graph that is not only comprehensive but also of higher overall data quality.

Harnessing the power of AI

To enrich the OpenAIRE Graph, we use a state-of-the-art AI-driven analytical workflow, which is constantly being expanded through partnerships with publishers and AI experts.

Enriching Metadata

Metadata of research artefacts is enriched with further information, such as authors' ORCID and organisation IDs, Fields of Science classifications, objects' Open Access status, and links to similar artefacts.

Identifying Relationships

Relationships, such as affiliation, co-authorship, citations (between publications, data, software) and funding, are identified between research objects.

Providing Recommendations

Recommendations are provided for related research objects, funding opportunities, and collaborators.

Facilitating Data Analysis

Structured and interconnected views of research data are provided which can be used to identify trends, patterns, and relationships in research data. Different types of citation indexes are computed and added into the records.
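As one very simple example of a citation index, the sketch below counts how often each product is cited, counting a repeated citation link only once. The edge list and the `citation_counts` helper are invented for illustration; the indexes actually computed for the Graph are not specified here and may be considerably more elaborate.

```python
from collections import Counter

# (citing, cited) pairs; the duplicate pair mimics the same link
# arriving from two different sources.
citations = [
    ("pub-2", "pub-1"),
    ("pub-3", "pub-1"),
    ("pub-3", "pub-1"),
    ("pub-3", "pub-2"),
]


def citation_counts(edges):
    """Count incoming citations per product, ignoring duplicate links."""
    unique_edges = set(edges)
    return Counter(cited for _citing, cited in unique_edges)


counts = citation_counts(citations)
for product_id, n in counts.most_common():
    print(product_id, n)
```

Collapsing duplicate links before counting is the citation-level counterpart of record deduplication: without it, the same citation harvested twice would inflate the count.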


The Infrastructure

The OpenAIRE Graph is operated and maintained at ICM's cutting-edge technology centre. There, the Graph is hosted on the Okeanos supercomputer, which has 26,016 cores in total and provides 1082 Tflop/s. The whole setup has been designed with energy efficiency in mind and achieves a power efficiency of 1.554 Gflops/Watt, which placed it 160th on the 'Top500 by energy-efficiency' list (as of 2019).

A Big Data Infrastructure

ICM supports the continuous operation of the infrastructure, including data aggregation, deduplication, inference, and provision, ensuring 24/7 system uptime and availability. System administration activities cover hardware maintenance and the provisioning of new computational resources, as well as high-availability solutions: service-level redundancy for resilience to failures, and load balancing to distribute workloads uniformly across servers.

The most crucial parts of the persisted graph are covered by backups with well-defined restoration procedures. All monitoring activities rely on aggregated system-level monitoring, accessible via various dashboards, which gives a better overview of system stability and of where system elements may need to be extended. System-level monitoring is supplemented by availability monitoring of all publicly accessible endpoints, which underpins the high standards of the OpenAIRE public API.

All maintenance operations are undertaken by experienced system administrators and are founded on well-established routines and emergency maintenance procedures.


Access the OpenAIRE Graph

You may access the Graph data in the different ways listed below, and also via our value-added services such as EXPLORE, MONITOR, and CONNECT. We are currently investigating access via different commercial cloud services.

Unauthenticated Requests

  • Direct downloads from Zenodo, updated every 6 months
  • REST API with up to 60 requests per hour
  • Email Support

Authenticated Requests

  • REST API with up to 7200 requests per hour
  • Access to all enrichments and metrics, with trust measures
  • Technical Support
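
A client that wants to stay within the hourly request limits listed above can simply space its calls evenly. The sketch below shows one way to do that on the client side; the `Throttle` class and its parameters are hypothetical helpers written for this example and are not part of any OpenAIRE library, and no API endpoint is called.

```python
import time

# Hourly limits taken from the access tiers above.
UNAUTHENTICATED_LIMIT = 60
AUTHENTICATED_LIMIT = 7200


def min_interval_seconds(requests_per_hour):
    """Smallest even spacing between requests that respects the hourly limit."""
    return 3600 / requests_per_hour


class Throttle:
    """Blocks just long enough to keep calls at or below the hourly limit."""

    def __init__(self, requests_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(requests_per_hour)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Call before each request; sleeps if the previous call was too recent."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


print(min_interval_seconds(UNAUTHENTICATED_LIMIT))  # 60.0 seconds between requests
print(min_interval_seconds(AUTHENTICATED_LIMIT))    # 0.5 seconds between requests
```

In use, a script would create one `Throttle(UNAUTHENTICATED_LIMIT)` (or the authenticated limit) and call `wait()` before every request. Even spacing is a conservative choice: it never bursts, at the cost of not using the full allowance quickly.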