OpenCitations Meta

Process bibliographic metadata and generate RDF compliant with the OpenCitations Data Model

Data curation

Validates, normalizes, and cleans bibliographic metadata from CSV files. Handles identifier validation, duplicate detection, and data normalization.

RDF generation

Converts curated data into RDF following the OpenCitations Data Model. Creates bibliographic resources, responsible agents, and identifiers with provenance tracking.

Duplicate detection

Identifies duplicate entities across the dataset by analyzing identifiers in RDF files. Groups related entities for batch processing.
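One way to picture this step is an inverted index from identifier to the entities that carry it. The sketch below is illustrative only and is not the oc_meta implementation: the function name, the `br/N` entity labels, and the `doi:10.1000/example` value are hypothetical, and the real tool reads identifiers from RDF files rather than from an in-memory dictionary.

```python
from collections import defaultdict


def find_duplicate_candidates(entity_ids):
    """Return identifiers shared by more than one entity.

    entity_ids maps an entity label to the set of identifiers attached to it.
    """
    by_identifier = defaultdict(set)
    for entity, identifiers in entity_ids.items():
        for identifier in identifiers:
            by_identifier[identifier].add(entity)
    # Only identifiers attached to two or more entities signal a duplicate.
    return {
        identifier: sorted(entities)
        for identifier, entities in by_identifier.items()
        if len(entities) > 1
    }


# Hypothetical entities; labels and the last DOI are placeholders.
entities = {
    "br/1": {"doi:10.1162/qss_a_00292", "pmid:38034492"},
    "br/2": {"doi:10.1162/qss_a_00292"},
    "br/3": {"doi:10.1000/example"},
}
candidates = find_duplicate_candidates(entities)
```

Here `br/1` and `br/2` share a DOI, so they form one candidate group, while `br/3` stands alone and is left out.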

Entity merging

Merges duplicate entities using a Union-Find algorithm. Handles bibliographic resources, responsible agents, and identifiers with parallel processing.
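For readers unfamiliar with Union-Find, the following is a minimal generic sketch of the data structure, assuming duplicate pairs have already been identified. It is not taken from oc_meta: the class, its method names, and the `br/N` labels are hypothetical, and the parallel processing mentioned above is left out.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        grouped = defaultdict(list)
        for item in list(self.parent):
            grouped[self.find(item)].append(item)
        return list(grouped.values())


# Hypothetical duplicate pairs; br/1, br/2 and br/5 end up in one group
# even though br/1 and br/5 were never paired directly.
uf = UnionFind()
for left, right in [("br/1", "br/2"), ("br/2", "br/5"), ("br/3", "br/4")]:
    uf.union(left, right)
merge_groups = sorted(sorted(group) for group in uf.groups())
```

The point of the structure is transitivity: pairwise duplicate links collapse into groups that can each be merged into a single surviving entity.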

Entity editing

Modifies existing RDF entities: add, update, or delete triples. Generates provenance snapshots for each modification.

CSV generation

Generates CSV dumps from RDF data. Extracts bibliographic metadata back to tabular format for analysis or migration.

Info dir management

Manages Redis counters for entity numbering. Rebuilds counters from RDF files after system recovery or data import.

Migration tools

Imports RDF from external sources, extracts subsets from triplestores, and converts provenance formats.

Benchmarks

Measures processing performance with synthetic data. Supports scalability analysis across dataset sizes.

Install:

pip install oc_meta

Run the main processing pipeline:

python -m oc_meta.run.meta_process -c meta_config.yaml

Meta expects CSV files with these columns:

| Column | Description |
| --- | --- |
| id | Space-separated identifiers (doi:10.1162/qss_a_00292 pmid:38034492) |
| title | Title of the work |
| author | Semicolon-separated names with optional identifiers (Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David) |
| pub_date | ISO 8601 date (2024-01-22, 2024-01, or 2024) |
| venue | Container title with optional identifier (Quantitative Science Studies [issn:2641-3337]) |
| volume | Volume number |
| issue | Issue number |
| page | Page range (50-75) |
| type | Resource type (journal article, book chapter, proceedings article, etc.) |
| publisher | Publisher name with optional identifier (MIT Press [crossref:281]) |
| editor | Same format as author |
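To make the field conventions concrete, here is a rough sketch that splits the `id`, `author`, `venue`, and `pub_date` fields of one row using only the Python standard library. It is not Meta's own parser: the helper names and regular expressions are assumptions, the title is a placeholder, and the date check only tests the shape of the value, not whether it is a real calendar date.

```python
import csv
import io
import re

# Shape check only: YYYY, YYYY-MM, or YYYY-MM-DD.
ISO_DATE_SHAPE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
# A name optionally followed by bracketed, space-separated identifiers.
NAME_WITH_IDS = re.compile(r"^(?P<name>[^\[\]]*?)\s*(?:\[(?P<ids>[^\]]*)\])?$")


def split_name_and_ids(value):
    """Split 'Name [scheme:value ...]' into a name and a list of identifiers."""
    value = value.strip()
    match = NAME_WITH_IDS.match(value)
    if match is None:
        # Unexpected bracket layout: keep the raw text as the name.
        return {"name": value, "ids": []}
    return {
        "name": match.group("name").strip(),
        "ids": (match.group("ids") or "").split(),
    }


def parse_row(row):
    return {
        "ids": row["id"].split(),
        "authors": [
            split_name_and_ids(part)
            for part in row["author"].split(";")
            if part.strip()
        ],
        "venue": split_name_and_ids(row["venue"]),
        "date_ok": bool(ISO_DATE_SHAPE.match(row["pub_date"])),
    }


# One illustrative row; fields containing commas are quoted.
sample = io.StringIO(
    "id,title,author,pub_date,venue\n"
    '"doi:10.1162/qss_a_00292 pmid:38034492","Placeholder title",'
    '"Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David",'
    '"2024-01","Quantitative Science Studies [issn:2641-3337]"\n'
)
parsed = parse_row(next(csv.DictReader(sample)))
```

Because author names are written as "Family, Given", any field holding them must be quoted in the CSV so the comma is not read as a column separator.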

See the CSV format reference for the complete specification.