feat(Crossref): add CrossrefPIDProvider#2177
feat(Crossref): add CrossrefPIDProvider#2177mfenner wants to merge 21 commits intoinveniosoftware:masterfrom
Conversation
* register DOIs with Crossref, functionality analogous to DataCitePIDProvider * optionally multiple prefixes with per community prefix * CrossrefXMLSerializer with metadata export
* register DOIs with Crossref, functionality analogous to DataCitePIDProvider * optionally multiple prefixes with per community prefix * CrossrefXMLSerializer with metadata export
…er/invenio-rdm-records into crossref_pid_provider
| community_prefix = dig(community.data, "custom_fields.rs:prefix") | ||
| if community_prefix and community_prefix in [str(p) for p in prefixes]: | ||
| prefix = str(community_prefix) | ||
| current_app.logger.debug( | ||
| f"Using DOI prefix: {prefix} for community: {default_community}" | ||
| ) |
There was a problem hiding this comment.
I think that we need to discuss this internally (ping @slint) because there is a hidden assumption that this custom_field will exist.
I am wondering if we have a better way to handle this, for example by allowing a different custom field name, or using a config map instead of a custom field (just ideas).
Does this support minting it without a community?
There was a problem hiding this comment.
This checks for the custom_field and falls back to the first prefix defined. My assumption was that a future update to invenio-communities would provide the prefix, either as an optional standard field or as a built-in custom field similar to the journal custom field. The current implementation works exactly as DATACITE_PREFIX if only one Crossref prefix is defined.
Co-authored-by: Nicola <ntarocco@gmail.com>
Co-authored-by: Nicola <ntarocco@gmail.com>
| **options, | ||
| ) | ||
|
|
||
| def dump_obj(self, record): |
There was a problem hiding this comment.
This whole function looks not really as a invenio serializer looks like. for me it is to much going on in the dump_obj method.
would it be possible to move this stuff into the CrossrefXMLSchema or a subclass created in rdm-records like class RDMCrossrefXMLSchema(CrossrefXMLSChema):
in my thinking it should not be necessary to override the dump_obj method. it schema.dump should do the magick and return a xml if that is necessary.
There was a problem hiding this comment.
I simplified the dump_obj method, using the write_crossref_xml method from commonmeta-py instead. What makes serialization complicated is that Crossref XML has a different field order depending on content type, therefore the field order has to be changed depending on type before XML serialization.
899d7f6 to
d26656d
Compare
4e4fddf to
a380b57
Compare
|
@slint what is your recommendation regading allowing prefixes per community? Currently this requires a community custom_field, falling back to one Crossref prefix per repo - the same as the DataCite PID Provider. Should prefix become an optional standard field in invenio-communities? Should this also work with the DataCite PID Provider? Or should this be handled internally in the Crossrer PID Provider? I could write an optional function for the Crossref PID Provider that determines which prefix to use. And this could be overwritten in invenio.cfg, e.g. if the prefix is determined not by community but some other criterium, e.g. content type. In this case no changes to invenio-communities would be needed, and at a later time similar functionality could be implemented for the DataCite PID Provider. |
|
@ntarocco @slint I looked into how I can determine the prefix based on the default community without code changes to In my instance I currently have about 175 blog communities and almost all use a standard prefix that could be provided by CROSSREF_PREFIX. Eight (soon 10) communities use a custom prefix. Instead of providing the prefix as community metadatra, I could also define a dict configuration variable, e.g. CROSSREF_PREFIXES, that maps community slug to prefix and encode the logic in the |
|
For the use case of using the DataCite and Crossref (and External) DOI Provider in the same instance, it may make sense to implement the "pick the prefix" logic further upstream, i.e. in the Base Provider The generate_doi method in the Crossref DOI Provider could then be as simple as in the DataCite DOI Provider, with the "prefix picked" based in via self. |
|
I'm thinking the prefix needs to be set at the provider level, and then there needs to be a way to configure what provider gets used. A map from community slug to provider could be a simple initial implementation. For DataCite the prefix gets passed into the client, so I don't think it could just live in the |
|
I would add that it depends if this should be a user choice, or not. In general, once the prefix and provider is configured for your community, it is hardly changed. However, should the community's manager define its own prefix and provider from the web interface? Given that DOI minting requires a contract with the provider, I would say that users should not be able to configure it. |
|
Thinking more about this, I would suggest to separate out the "prefix per community" into a separate PR, as this is not specific to Crossref but equally could be used with DataCite DOIs. I would then refactor this PR to do DOI assignment basically as in the DataCite DOI Provider. Until this second PR is approved and merged, I can use site-specific code for the prefix by community feature in my instance. For. the "prefix per community" second PR would would be good to decide beforehand whether a) we first want to update invenio-communities to include optional prefix metadata, b) whether this PR should also cover using both DataCite and Crossref in one instance as this use case is closely related (use DataCite for the instance with the exception of a handful communities that use Crossref). And I agree with Nico that this doesn't need to be configured in the UI, a COMMUNITY_PREFIXES configuration option that maps communities to prefixes that override the default (ideally using the community slug and not id) would also work |
|
Before you go ahead, another question: in InvenioRDM, you can move records from one community to the other (I don't mean including them in multiple communities, I mean transfer it from the main community). How do we deal with that? Does it create any issue if the DOI was minted with the community prefix, and then transferred to another community? What happens if the user uploads a second version of the record after the transfer? The record will have a different prefix.
If we don't want the user to act on it, is there any advantage to add the prefix as community's metadata? I would only display the configuration in the community's settings, as a greyed out info, disabled, cannot be changed. Just for information (if really interesting to show?)
For me, the important part is to keep PRs small, clean, easy to review and understand the impact. There are more chances that it can be merged more quickly.
Watch out! The community slug can be changed by the user via the web interface. We use the UUID everywhere instead for that reason. |
|
@ntarocco you have convinced me that there really is no strong reason to have prefixes configured in invenio-communities, but that a central config variable mapping id (not slug!) to prefix makes more sense. And that a prefix should never change if a record default community is changed or the record gets a new version. Enough detail to think about that separating this out into a separate PR makes sense. |
|
I have now refactored the pull request and deployed to my production instance. The prefix by community logic is now implemented in a RDM_COMMUNITY_DOI_PREFIXES config variable, which is optional and a dict that maps community_ids to DOI prefixes. It is implemented in the base class for the PIDProvider, so works for the Crossref and DataCite DOI Provider (and hopefully also for a combination, but not tested). I had to make a very small change to the DataCitePIDProvider, adding a kwargs argument that can pass in the community prefix to the DataCiteClient The CrossrefPIDProvider is now simplified and looks very similar to the DataCite Provider, no use of community custom_fields and the prefixes config variable has been renamed to CROSSREF_PREFIX. I ended up not separating this into another pull request as the code change is minimal (optional new config variable and 8 lines of new code in the base class of the PIDProvider, and we are having a longer discussion on this in this PR already. Plus, this makes it easier for me to run this code in a live instance. |
|
Something to keep in mind in this multiple DOI prefix discussion: while this makes DOI registration more complicated, it is not a cost factor. Prefixes are free with DataCite and Crossref (and they are not charged by the DOI Foundation), so if your use case requires multiple DOI prefixes, go for it. |
* enable crossref pid provider in default config * add crossref_mock_client
274695b to
08b278a
Compare
fd28b4a to
a134ae6
Compare
|
I'll drop my two cents here:
Given the complexity (but also good set of use-cases) of |
a134ae6 to
b4e3300
Compare
|
@slint thank you for your feedback. I have removed all code that enables multiple DOI prefixes, and will implement this in a separate PR. The Crossref DOI Provider is now a drop-in replacement for the DataCite DOI Provider (with the limitation that Crossref only supports textual resource types). |
b4e3300 to
05a0f83
Compare
|
I now have implemented a CustomPIDProvider class (currently in invenio.cfg) as a subclass of the CrossrefPIDProvider that overrides the generate_id method to support community-specific DOI prefixes. This should work the same way to support community-specific DOI prefixes with the DataCitePIDProvider. |
3aaf5e3 to
cafce5b
Compare
cafce5b to
9ef5c19
Compare
|
This PR was automatically marked as stale. |
|
This needs a rebase. And we should really move this forward since it's now a fairly straightforward addition of CrossRef in a way that doesn't impact the rest of RDM. |
|
I can try that rebase tomorrow. |
|
Closing this PR in favor of #2249 as too many code changes have happened since originally opening the PR. |
❤️ Thank you for your contribution!
Description
This pull-request adds Crossref DOI registration support to InvenioRDM. The configuration is very similar to DataCite DOI registration, but multiple prefixes are supported via CROSSREF_PREFIXES. The new PID provider should not interfere with DataCite DOI registration. DOI registration with both DataCite and Crossref has not been tested and needs additional work (decision tree which PID provider is used for a record). Multiple prefixes with one prefix per community are currently supported via a community custom_field.
The Crossref PID provider makes heavy use of the commonmeta-py library, similar to how the datacite library is used in the DataCite PID provider. commonmeta-py already was a dependency, but a newer version (>= 0.183) is required for Crossref DOI registration functionality.
Checklist
Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:
Frontend
Reminder
By using GitHub, you have already agreed to the GitHub’s Terms of Service including that: