Version: 5.1.2

Extraction of acknowledged concepts

Short description: Scans the plaintexts of publications for acknowledged concepts. These include funders' grant identifiers (projects), accession numbers of bioentities, EPO patent mentions, and custom concepts that can link research objects to specific research communities and initiatives in OpenAIRE.

Algorithmic details: The algorithm processes the publication's fulltext and extracts references to acknowledged concepts. It applies pattern matching and a string join between the fulltext and a target database that contains the title, acronym, and identifier of each searched concept.
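The exact patterns used by iis/referenceextraction are not documented here. As a minimal sketch of the pattern-matching step, assuming that EPO publication numbers appear as "EP" followed by seven digits and an optional kind code (e.g. "A1"), and that grant numbers follow a phrase such as "grant agreement No.", the regular expressions below pull candidate identifiers out of a fulltext. The names `EPO_PATTERN`, `GRANT_PATTERN`, `find_patent_mentions`, and `find_grant_numbers` are hypothetical.

```python
import re

# Hypothetical pattern (an assumption, not the module's rule) for EPO publication
# numbers such as "EP 1234567 A1": "EP", optional space, seven digits,
# and an optional kind code made of one capital letter and one digit.
EPO_PATTERN = re.compile(r"\bEP\s?(\d{7})(?:\s?([A-Z]\d))?\b")

# Hypothetical pattern for a grant number introduced by an acknowledgement
# phrase, e.g. "grant agreement No. 777541" or "grant number 123456".
GRANT_PATTERN = re.compile(
    r"\bgrant\s+(?:agreement\s+)?(?:no\.?|number)?\s*(\d{6,9})",
    re.IGNORECASE,
)


def find_patent_mentions(text):
    """Return EP publication numbers, normalized to the form 'EP1234567'."""
    return ["EP" + m.group(1) for m in EPO_PATTERN.finditer(text)]


def find_grant_numbers(text):
    """Return digit strings that follow a grant acknowledgement phrase."""
    return [m.group(1) for m in GRANT_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = ("This work received funding under grant agreement No. 777541. "
              "See also EP 1234567 A1 and EP2345678.")
    print(find_grant_numbers(sample))
    print(find_patent_mentions(sample))
```

Regular expressions like these only produce candidates; resolving a candidate to a known project or patent still requires matching it against the target database.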

Parameters: concept titles, acronyms, and identifiers; publication identifiers and fulltexts
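The string join over these parameters can be approximated with Python's built-in `sqlite3` module standing in for madIS, which is built on SQLite. The sketch below is an assumption, not the module's implementation: it normalizes concept titles and acronyms, generates word n-grams from each fulltext, and joins the two tables on exact equality. The function names `normalize` and `extract_concepts`, the table names, and the tuple layouts are hypothetical.

```python
import re
import sqlite3


def normalize(text):
    """Lowercase the text and keep only word characters, joined by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def extract_concepts(concepts, publications):
    """Join word n-grams of each fulltext against concept titles and acronyms.

    concepts: iterable of (title, acronym, identifier) string tuples
    publications: iterable of (publication_id, fulltext) string tuples
    Returns a sorted list of distinct (publication_id, identifier) pairs.
    """
    concepts = [(normalize(t), normalize(a), ident) for t, a, ident in concepts]
    # The longest title or acronym (in words) bounds the n-gram length needed.
    max_n = max((len(s.split()) for t, a, _ in concepts for s in (t, a)), default=0)

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE concepts (title TEXT, acronym TEXT, identifier TEXT)")
    con.execute("CREATE TABLE ngrams (pub_id TEXT, ngram TEXT)")
    con.executemany("INSERT INTO concepts VALUES (?, ?, ?)", concepts)

    for pub_id, fulltext in publications:
        words = normalize(fulltext).split()
        # Deduplicated set of all 1..max_n word n-grams for this publication.
        grams = {(pub_id, " ".join(words[i:i + n]))
                 for n in range(1, max_n + 1)
                 for i in range(len(words) - n + 1)}
        con.executemany("INSERT INTO ngrams VALUES (?, ?)", grams)

    # The join itself: an n-gram matches a concept if it equals its title or acronym.
    rows = con.execute(
        """
        SELECT DISTINCT g.pub_id, c.identifier
        FROM ngrams AS g
        JOIN concepts AS c ON g.ngram = c.acronym OR g.ngram = c.title
        ORDER BY g.pub_id, c.identifier
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    concepts = [
        ("Open Access Infrastructure for Research in Europe", "OpenAIRE", "grant-0001"),
        ("Human Brain Project", "HBP", "grant-0002"),
    ]
    publications = [
        ("pub1", "We thank the Human Brain Project for support."),
        ("pub2", "Data were deposited via OpenAIRE."),
    ]
    print(extract_concepts(concepts, publications))
```

The identifiers `grant-0001` and `grant-0002` are made-up placeholders. Matching acronyms on lowercase n-grams alone yields false positives for short acronyms that are also ordinary words, so a production pipeline would likely add context checks, for example requiring a nearby grant number or funder name.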

Limitations: -

Environment: Python, madIS, APSW

References:

  • Foufoulas, Y., Zacharia, E., Dimitropoulos, H., Manola, N., Ioannidis, Y. (2022). DETEXA: Declarative Extensible Text Exploration and Analysis. In: Linking Theory and Practice of Digital Libraries. TPDL 2022. Lecture Notes in Computer Science, vol 13541. Springer, Cham. doi:10.1007/978-3-031-16802-4_9

Authority: ATHENA RC

License: CC-BY/CC-0

Code: iis/referenceextraction