
Merge by id

In the metadata aggregation system it is common to find the same record provided by different datasources and, sometimes, even within the same datasource (especially in the case of aggregators). Because harmonisation is performed per datasource, the resulting records are the output of different mapping implementations. This approach has the advantage of being deeply customisable, so that datasource-specific aspects can be captured, but it leaves room for inconsistencies between the mappings applied to the various datasources.

This phase is therefore responsible for compensating for such inconsistencies. It performs a global grouping of every record available in the graph:

entities are grouped by id
relations are grouped by source, target, reltype

This ensures that the same record, possibly assigned different types by different mappings, appears only once in the graph and under a single type. When identifiers clash, the properties are merged (including the provenance information), and the resulting type is chosen according to the following precedence order:

publication > dataset > software > other
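
As an illustration only, the entity grouping and the type precedence described above could be sketched as follows. The record layout (dictionaries with `id`, `type` and `provenance` keys) and the function name `merge_entities` are assumptions made for this example and are not taken from the actual implementation; merging of the remaining properties is omitted for brevity.

```python
from collections import defaultdict

# Precedence for the resulting type, highest first (as listed above).
TYPE_PRECEDENCE = ["publication", "dataset", "software", "other"]


def merge_entities(records):
    """Group records by id and merge each group into a single record.

    Hypothetical record layout: {"id": str, "type": str, "provenance": [str]}.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["id"]].append(record)

    merged = []
    for entity_id, group in groups.items():
        # The winning type is the one that comes first in TYPE_PRECEDENCE.
        result_type = min((r["type"] for r in group), key=TYPE_PRECEDENCE.index)
        # Keep every provenance entry once, in first-seen order.
        provenance = list(dict.fromkeys(p for r in group for p in r["provenance"]))
        merged.append({"id": entity_id, "type": result_type, "provenance": provenance})
    return merged
```

With this sketch, two records sharing the same id but typed `dataset` and `publication` would collapse into one record typed `publication` that carries the provenance of both.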

The same holds for relationships: since the same relation (e.g. a DOI-to-DOI citation) can be aggregated from multiple sources, this grouping phase collapses all the duplicates into a single relation that retains all the individual provenances.
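
The relation grouping can be sketched in the same spirit. Again, the dictionary layout (`source`, `target`, `reltype`, `provenance`) and the function name `merge_relations` are hypothetical and chosen only for illustration.

```python
def merge_relations(relations):
    """Collapse relations sharing (source, target, reltype) into one relation.

    Hypothetical layout:
    {"source": str, "target": str, "reltype": str, "provenance": [str]}.
    """
    merged = {}
    for rel in relations:
        key = (rel["source"], rel["target"], rel["reltype"])
        if key not in merged:
            merged[key] = {
                "source": rel["source"],
                "target": rel["target"],
                "reltype": rel["reltype"],
                "provenance": [],
            }
        # Accumulate the provenance of every duplicate, without repeats.
        for p in rel["provenance"]:
            if p not in merged[key]["provenance"]:
                merged[key]["provenance"].append(p)
    return list(merged.values())
```

A citation relation collected from two sources would thus appear once in the output, listing both sources in its provenance.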