
Getting started

Install via pip:

pip install oc_meta

For development, clone the repository and use uv:

git clone https://github.com/opencitations/oc_meta.git
cd oc_meta
uv sync

Meta requires:

  • Python 3.10+
  • Redis for counter handling and caching
  • Triplestore (Virtuoso or Blazegraph) for RDF storage

For local development, you can use Docker.

Redis:

docker run -d --name redis -p 6379:6379 redis:latest

Virtuoso (data):

docker run -d --name virtuoso-data -p 8890:8890 -p 1111:1111 openlink/virtuoso-opensource-7:latest

Virtuoso (provenance):

docker run -d --name virtuoso-prov -p 8891:8890 -p 1112:1111 openlink/virtuoso-opensource-7:latest
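
Before moving on, it can help to confirm that the three containers are accepting connections. The following is a minimal sketch and not part of oc_meta: it only checks that the TCP ports published by the Docker commands above are open on localhost, which does not prove the services are fully initialized. The names SERVICES, is_port_open, and check_services are hypothetical.

```python
import socket

# Host ports published by the docker run commands above.
SERVICES = {
    "redis": 6379,
    "virtuoso-data": 8890,
    "virtuoso-prov": 8891,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```
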

  1. Create a configuration file (meta_config.yaml):
triplestore_url: "http://127.0.0.1:8890/sparql"
provenance_triplestore_url: "http://127.0.0.1:8891/sparql"
base_iri: "https://w3id.org/oc/meta/"
context_path: "https://w3id.org/oc/corpus/context.json"
resp_agent: "https://w3id.org/oc/meta/prov/pa/1"
source: "https://api.crossref.org/"
redis_host: "localhost"
redis_port: 6379
redis_db: 0
redis_cache_db: 1
supplier_prefix: "060"
dir_split_number: 10000
items_per_file: 1000
input_csv_dir: "/path/to/input"
  2. Prepare input CSV with these columns:
Column     Example
id         doi:10.1162/qss_a_00292
title      OpenCitations Meta
author     Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David
pub_date   2024-01-22
venue      Quantitative Science Studies [issn:2641-3337]
volume     5
issue      1
page       50-75
type       journal article
publisher  MIT Press [crossref:281]
editor     (same format as author)

See CSV format for supported identifiers and formats.

  3. Run processing:
uv run python -m oc_meta.run.meta_process -c meta_config.yaml
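
The input CSV from the second step can also be generated programmatically. The following is a minimal sketch using Python's csv module that reuses the example row from the table above. The helper name write_input_csv is hypothetical, the editor field is left empty for illustration, and the column order simply follows the table; whether oc_meta depends on that order is not stated here.

```python
import csv
from pathlib import Path

# Column names as listed in the table above.
COLUMNS = [
    "id", "title", "author", "pub_date", "venue", "volume",
    "issue", "page", "type", "publisher", "editor",
]

# Example values copied from the table above.
EXAMPLE_ROW = {
    "id": "doi:10.1162/qss_a_00292",
    "title": "OpenCitations Meta",
    "author": "Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David",
    "pub_date": "2024-01-22",
    "venue": "Quantitative Science Studies [issn:2641-3337]",
    "volume": "5",
    "issue": "1",
    "page": "50-75",
    "type": "journal article",
    "publisher": "MIT Press [crossref:281]",
    "editor": "",  # same format as author; left empty in this sketch
}


def write_input_csv(path, rows):
    """Write rows (dicts keyed by COLUMNS) to a UTF-8 CSV file with a header."""
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling write_input_csv with a file inside the directory configured as input_csv_dir and the list [EXAMPLE_ROW] produces a one-record input file. The csv module quotes the author field automatically because it contains a comma.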

See the configuration reference for all available options.
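
A few mistakes in meta_config.yaml are easy to catch before a run. The sketch below is an illustration, not oc_meta's own validation. It assumes the file has already been parsed into a dict (for example with PyYAML's yaml.safe_load), it treats every key from the example above as required, and it flags identical endpoints or Redis databases only because the example keeps them separate. The function name find_config_problems is hypothetical.

```python
# Keys taken from the example meta_config.yaml above. Treating all of them
# as required is an assumption made for this sketch.
EXPECTED_KEYS = (
    "triplestore_url", "provenance_triplestore_url", "base_iri",
    "context_path", "resp_agent", "source", "redis_host", "redis_port",
    "redis_db", "redis_cache_db", "supplier_prefix", "dir_split_number",
    "items_per_file", "input_csv_dir",
)


def find_config_problems(config: dict) -> list[str]:
    """Return human-readable problems found in an already-parsed config."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in config]

    # The example keeps data and provenance in separate endpoints.
    data_url = config.get("triplestore_url")
    if data_url is not None and data_url == config.get("provenance_triplestore_url"):
        problems.append("data and provenance triplestore URLs are identical")

    # The example uses different Redis databases for counters and cache.
    redis_db = config.get("redis_db")
    if redis_db is not None and redis_db == config.get("redis_cache_db"):
        problems.append("redis_db and redis_cache_db are identical")

    # An unquoted 060 in YAML is parsed as a number, losing the literal prefix.
    prefix = config.get("supplier_prefix")
    if prefix is not None and not isinstance(prefix, str):
        problems.append("supplier_prefix should be a quoted string")

    return problems


if __name__ == "__main__":
    example = {
        "triplestore_url": "http://127.0.0.1:8890/sparql",
        "provenance_triplestore_url": "http://127.0.0.1:8891/sparql",
        "base_iri": "https://w3id.org/oc/meta/",
        "context_path": "https://w3id.org/oc/corpus/context.json",
        "resp_agent": "https://w3id.org/oc/meta/prov/pa/1",
        "source": "https://api.crossref.org/",
        "redis_host": "localhost",
        "redis_port": 6379,
        "redis_db": 0,
        "redis_cache_db": 1,
        "supplier_prefix": "060",
        "dir_split_number": 10000,
        "items_per_file": 1000,
        "input_csv_dir": "/path/to/input",
    }
    print(find_config_problems(example))  # prints []
```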

A production workflow usually follows these steps:

  1. Preprocess - Deduplicate input and filter existing IDs
  2. Process - Run the main Meta pipeline
  3. Verify - Check that all identifiers were processed correctly

Preprocess (optional but recommended):

uv run python -m oc_meta.run.meta.preprocess_input input/ preprocessed/ --storage-type redis

Process:

uv run python -m oc_meta.run.meta_process -c meta_config.yaml

Verify:

uv run python -m oc_meta.run.meta.check_results meta_config.yaml --output report.txt
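
The three commands above can be chained so that a failing step stops the workflow. This is a minimal sketch built only from the commands shown in this section. The function names build_commands and run_workflow are hypothetical, and the sketch assumes that each command exits with a non-zero status on failure and that input_csv_dir in meta_config.yaml already points at the preprocessed directory.

```python
import subprocess


def build_commands(input_dir, preprocessed_dir, config_path, report_path):
    """Return the preprocess, process, and verify commands in execution order."""
    return [
        ["uv", "run", "python", "-m", "oc_meta.run.meta.preprocess_input",
         input_dir, preprocessed_dir, "--storage-type", "redis"],
        ["uv", "run", "python", "-m", "oc_meta.run.meta_process",
         "-c", config_path],
        ["uv", "run", "python", "-m", "oc_meta.run.meta.check_results",
         config_path, "--output", report_path],
    ]


def run_workflow(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later steps never run after a failed one.
        subprocess.run(command, check=True)
```

For the paths used in this section, the call would be run_workflow(build_commands("input/", "preprocessed/", "meta_config.yaml", "report.txt")).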