Convert citations#
Some data sources use temporary identifiers (temp: prefix) to link metadata rows to their citation relationships. These IDs have no meaning outside the processing context and are not persisted in RDF. After Meta processing, the output CSV preserves the mapping between each temp: ID and its assigned OMID (e.g., temp:gesis-ssoar-34729_b1 omid:br/06019115518 in the id column). This script reads that mapping and rewrites citation CSVs with OMIDs, producing output ready for the OpenCitations Index cnc.py.
The standard Index pipeline (meta2redis.py → cnc.py) only loads persistent identifiers (DOI, ISBN, ORCID, etc.) into Redis; temp: IDs are excluded to prevent collisions. When citation data uses temp: IDs, this script is a required pre-processing step: it resolves them to OMIDs before feeding the output to cnc.py. It also validates transitive closure by reporting any citation IDs that could not be mapped.
Usage#
uv run python -m oc_meta.run.meta.convert_citations \
-m /path/to/meta/output/csv/dir \
-c /path/to/citations/dir \
-o /path/to/output/dir
Argument |
Description |
|---|---|
|
Directory containing Meta output CSVs (the curated CSVs with OMIDs in the |
|
Directory containing input citation CSVs with |
|
Directory where converted citation CSVs will be written |
How the mapping works#
Meta’s output CSVs contain all assigned identifiers in the id column, space-separated:
temp:gesis-ssoar-34729_b1 omid:br/06019115518
doi:10.14361/9783839434291 isbn:9783839434291 omid:br/0622049481
The script maps every non-OMID identifier to its OMID. In the first row, temp:gesis-ssoar-34729_b1 maps to omid:br/06019115518. In the second, both the DOI and the ISBN map to omid:br/0622049481.
Input and output format#
Input citation CSVs must have citing_id and cited_id columns. Extra columns are ignored.
citing_id,citing_publication_date,cited_id,cited_publication_date
temp:urn:nbn:de:0168-ssoar-347295,2000,temp:gesis-ssoar-34729_b1,1941
Output CSVs use citing and cited columns with OMIDs, matching the format expected by cnc.py:
citing,cited
omid:br/06019115517,omid:br/06019115518
Unresolvable citations#
If a citing or cited ID has no OMID mapping, the citation row is dropped. The script prints a summary with the count of resolved and unresolved citations, and lists any orphan IDs.
This typically happens when the metadata and citations files were not cleaned together. Both files should always be used as a matched pair from the same cleaning step.