Enrichment by mining | OpenAIRE Graph Documentation

📄️ Affiliation matching

Short description: The goal of the affiliation matching module is to match affiliations extracted from PDF and XML documents with organizations from the OpenAIRE organization database.

📄️ Citation matching

Short description: During a citation matching task, bibliographic entries are linked to the documents that they reference. The citation matching module, one of the modules of the Information Inference Service (IIS), receives as input a list of documents accompanied by their metadata and bibliography. It discovers such links among these documents and returns them as a list. In this document we shall evaluate if the module has been properly integrated with the whole

📄️ Classifiers

Short description: A document classification algorithm that employs analysis of free text stemming from the abstracts of the publications. The purpose of applying a document classification module is to assign a scientific text to one or more predefined content classes.

📄️ Documents similarity

Short description: The document similarity module is responsible for finding similar documents among the ones available in the OpenAIRE Information Space. It produces "similarity" links between the documents stored in the OpenAIRE Information Space. Each link is assigned a similarity score in the range [0, 1]; the higher the score, the more similar the documents are expected to be with respect to their content.
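
To illustrate what a similarity score in the range [0, 1] can look like, here is a minimal sketch that assumes cosine similarity over word counts. The description above does not name the measure the module uses, so the technique and the helper names `tokenize` and `similarity` are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of word-count vectors (an assumed measure).

    Counts are non-negative, so the result lies in [0, 1].
    """
    counts_a = Counter(tokenize(doc_a))
    counts_b = Counter(tokenize(doc_b))
    # Dot product over the tokens the two documents share.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    # Clamp to guard against floating-point rounding just above 1.
    return min(1.0, dot / (norm_a * norm_b))
```

Under this assumption, two documents with no words in common score 0, and two documents with identical word counts score 1 up to floating-point rounding.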

📄️ Extraction of acknowledged concepts

Short description: Scans the plaintexts of publications for acknowledged concepts, including grant identifiers (projects) of funders, accession numbers of bioentities, EPO patent mentions, as well as custom concepts that can link research objects to specific research communities and initiatives in OpenAIRE.
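
As a rough illustration of scanning plaintext for grant identifiers, the sketch below uses a single regular expression. The pattern (the phrase "grant agreement" followed by a six-digit number) and the names `GRANT_PATTERN` and `find_grant_ids` are hypothetical; they are not the rules the module actually applies.

```python
import re

# Hypothetical pattern: "grant agreement", an optional "no." or "number",
# then exactly six digits. Real funders use many other formats.
GRANT_PATTERN = re.compile(
    r"grant\s+agreement\s+(?:no\.?|number)?\s*(\d{6})\b",
    re.IGNORECASE,
)


def find_grant_ids(plaintext: str) -> list[str]:
    """Return the distinct candidate grant identifiers, sorted."""
    return sorted(set(GRANT_PATTERN.findall(plaintext)))
```

A pattern match alone only yields candidates; a practical pipeline would presumably check each candidate against a list of known projects before creating a link.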

📄️ Extraction of cited concepts

Short description: Scans the plaintexts of publications for cited concepts, currently for references to datasets and software URIs.

📄️ Metadata extraction

Short description: The metadata extraction algorithm is responsible for extracting plaintext and metadata from PDF documents. It is based on the CERMINE project.