EMBL-EBI's Protein Data Bank in Europe
This section describes the mapping implemented for EMBL-EBI's Protein Data Bank in Europe.
The Europe PMC RESTful Web Service provides a datalinks API that returns data-literature links in Scholix format.
How the data is collected
Starting from the PubMed collection, the API below is called for each PubMed identifier to obtain the bioentities related to that publication.
Example:
curl -s "https://www.ebi.ac.uk/europepmc/webservices/rest/MED/33024307/datalinks?format=json" | jq '.'
{
  "version": "6.8",
  "hitCount": 9,
  "request": {
    "id": "33024307",
    "source": "MED"
  },
  "dataLinkList": {
    "Category": [
      {
        "Name": "Nucleotide Sequences",
        "CategoryLinkCount": 5,
        "Section": [
          {
            "ObtainedBy": "tm_accession",
            "Tags": [
              "supporting_data"
            ],
            "SectionLinkCount": 5,
            "Linklist": {
              "Link": [
                {
                  "ObtainedBy": "tm_accession",
                  "PublicationDate": "04-11-2022",
                  "LinkProvider": {
                    "Name": "Europe PMC"
                  },
                  "RelationshipType": {
                    "Name": "References"
                  },
                  "Source": {
                    "Type": {
                      "Name": "literature"
                    },
                    "Identifier": {
                      "ID": "33024307",
                      "IDScheme": "MED"
                    }
                  },
                  "Target": {
                    "Type": {
                      "Name": "dataset"
                    },
                    "Identifier": {
                      "ID": "AY278488",
                      "IDScheme": "ENA",
                      "IDURL": "http://identifiers.org/ebi/ena.embl:AY278488"
                    },
                    "Title": "AY278488",
                    "Publisher": {
                      "Name": "Europe PMC"
                    }
                  },
                  [...]
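The nesting in the response above (Category, Section, Linklist, Link) can be walked with a small helper. The Python sketch below is illustrative only: `datalinks_url` and `iter_links` are hypothetical names, and the code assumes the response has the shape shown in the example.

```python
from typing import Any, Dict, Iterator

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def datalinks_url(pmid: str, source: str = "MED") -> str:
    """Build the datalinks URL used in the curl example above."""
    return f"{BASE_URL}/{source}/{pmid}/datalinks?format=json"


def iter_links(response: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every Link object, walking Category -> Section -> Linklist -> Link."""
    categories = response.get("dataLinkList", {}).get("Category", [])
    for category in categories:
        for section in category.get("Section", []):
            # Missing keys are treated as empty so partial responses do not raise.
            for link in section.get("Linklist", {}).get("Link", []):
                yield link
```

The helper takes an already parsed JSON document, so it can be applied to the output of any HTTP client.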
Mapping
The table below describes the mapping from the EBI link records to the OpenAIRE Graph dump format. Only links whose target pid type is ena, pdb or uniprot are kept. For each such target, a Bioentity is constructed with the following mapping:
OpenAIRE Result field path | EBI record field xpath | Notes |
---|---|---|
id | target/identifier/ID and target/identifier/IDScheme | id in the form SCHEMA_________::md5(pid) |
pid | target/identifier/ID and target/identifier/IDScheme | classid = classname = schema |
publicationdate | target/PublicationDate | The date is cleaned and normalized to the format YYYY-mm-dd |
maintitle | target/Title | |
Instance Mapping | ||
instance.type | Bioentity | |
type | Dataset | |
instance.pid | target/identifier/ID and target/identifier/IDScheme | classid = classname = schema |
instance.url | target/identifier/IDURL | Copy the value as it is |
instance.publicationdate | //PubmedPubDate | The date is cleaned and normalized to the format YYYY-mm-dd |
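As a rough illustration of the filter and the field mapping in the table, the Python sketch below builds a Bioentity-like dictionary from one link. It is not the OpenAIRE implementation. The function names, the output dictionary layout, the case-insensitive scheme comparison, the 12-character underscore padding of the id prefix, hashing the raw ID string, and reading the input date as DD-MM-YYYY are all assumptions made for the example. The PubMed publication date is passed in separately and is assumed to be already normalized.

```python
import hashlib
from datetime import datetime
from typing import Any, Dict, Optional

ALLOWED_SCHEMES = {"ena", "pdb", "uniprot"}


def make_id(schema: str, pid: str, width: int = 12) -> str:
    """Return '<schema padded with underscores>::<md5 of pid>'.

    The padding width is an assumption; adjust it to the real convention.
    """
    prefix = schema.lower().ljust(width, "_")
    digest = hashlib.md5(pid.encode("utf-8")).hexdigest()
    return f"{prefix}::{digest}"


def normalize_date(raw: str) -> str:
    """Convert a date assumed to be DD-MM-YYYY into YYYY-MM-DD."""
    return datetime.strptime(raw, "%d-%m-%Y").strftime("%Y-%m-%d")


def to_bioentity(link: Dict[str, Any],
                 pubmed_pub_date: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Map one link to a Bioentity-like dict, or None if it is filtered out."""
    target = link.get("Target", {})
    identifier = target.get("Identifier", {})
    schema = identifier.get("IDScheme", "").lower()
    pid = identifier.get("ID")
    if schema not in ALLOWED_SCHEMES or not pid:
        return None
    pid_entry = {"classid": schema, "classname": schema, "value": pid}
    # In the example response, PublicationDate sits on the link itself.
    raw_date = link.get("PublicationDate")
    return {
        "id": make_id(schema, pid),
        "pid": [pid_entry],
        "publicationdate": normalize_date(raw_date) if raw_date else None,
        "maintitle": target.get("Title"),
        "type": "Dataset",
        "instance": [{
            "type": "Bioentity",
            "pid": [pid_entry],
            "url": identifier.get("IDURL"),  # copied as it is
            "publicationdate": pubmed_pub_date,
        }],
    }
```

Links whose scheme falls outside the three accepted pid types yield `None`, so a caller can drop them with a simple filter.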
Relation Mapping
OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
---|---|---|
IsRelatedTo | result/result | A relationship is created between the Bioentity and the PubMed publication |
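The relation row can be pictured with the following Python sketch, which emits the relation in both directions because the table lists the semantic together with its inverse. The function name and the dictionary keys are hypothetical and only serve the illustration. Treating IsRelatedTo as its own inverse is also an assumption.

```python
from typing import Dict, List


def make_relations(bioentity_id: str, publication_id: str) -> List[Dict[str, str]]:
    """Return an IsRelatedTo relation and its inverse between two results."""
    return [
        {"source": bioentity_id, "target": publication_id, "relClass": "IsRelatedTo"},
        {"source": publication_id, "target": bioentity_id, "relClass": "IsRelatedTo"},
    ]
```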