Getting started

Installation

Install via pip:

pip install oc_meta

For development, clone the repository and use uv:

git clone https://github.com/opencitations/oc_meta.git
cd oc_meta
uv sync

Prerequisites

Meta requires:

  • Python 3.10+

  • Triplestore (Virtuoso or Blazegraph) for RDF storage

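For local development, you can run the triplestores in Docker. The commands below start two Virtuoso instances on separate host ports, one for data and one for provenance.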

Virtuoso (data):

docker run -d --name virtuoso-data -p 8890:8890 -p 1111:1111 openlink/virtuoso-opensource-7:latest

Virtuoso (provenance):

docker run -d --name virtuoso-prov -p 8891:8890 -p 1112:1111 openlink/virtuoso-opensource-7:latest
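Before moving on, it can help to confirm that both containers answer SPARQL requests. The following is a minimal sketch, not part of oc_meta: the helper names `build_probe_url` and `is_endpoint_up` are hypothetical, and the check assumes each Virtuoso instance exposes its SPARQL endpoint at `/sparql` on the mapped host port and accepts a query passed as a GET parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint URLs derived from the port mappings above (8890 for data, 8891 for provenance).
ENDPOINTS = {
    "data": "http://127.0.0.1:8890/sparql",
    "provenance": "http://127.0.0.1:8891/sparql",
}

# A trivial query: it only asks whether the store can evaluate a pattern at all.
PROBE_QUERY = "ASK { ?s ?p ?o }"


def build_probe_url(endpoint: str) -> str:
    """Return a GET URL that sends the probe query to the given endpoint."""
    return f"{endpoint}?{urlencode({'query': PROBE_QUERY})}"


def is_endpoint_up(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers the probe query with HTTP 200."""
    request = Request(
        build_probe_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
```

Calling `is_endpoint_up(ENDPOINTS["data"])` and `is_endpoint_up(ENDPOINTS["provenance"])` should both return `True` once the containers have finished starting; a freshly launched Virtuoso container may need a few seconds before it accepts connections.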

Your first run

  1. Create a configuration file (meta_config.yaml):

triplestore_url: "http://127.0.0.1:8890/sparql"
provenance_triplestore_url: "http://127.0.0.1:8891/sparql"
base_iri: "https://w3id.org/oc/meta/"
resp_agent: "https://w3id.org/oc/meta/prov/pa/1"
source: "https://api.crossref.org/"

supplier_prefix: "060"
dir_split_number: 10000
items_per_file: 1000

input_csv_dir: "/path/to/input"

  2. Prepare an input CSV with these columns:

Column      Example
----------  ----------------------------------------------------------------
id          doi:10.1162/qss_a_00292
title       OpenCitations Meta
author      Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David
pub_date    2024-01-22
venue       Quantitative Science Studies [issn:2641-3337]
volume      5
issue       1
page        50-75
type        journal article
publisher   MIT Press [crossref:281]
editor      (same format as author)

See CSV format for supported identifiers and formats.

  3. Run processing:

uv run python -m oc_meta.run.meta_process -c meta_config.yaml

See the configuration reference for all available options.
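The input file from step 2 can also be generated programmatically. The sketch below uses only Python's standard `csv` module and is not taken from oc_meta itself: the function name `write_input_csv` is hypothetical, and the column order, comma delimiter, and UTF-8 encoding are assumptions based on the column list above. Check the CSV format page before relying on them.

```python
import csv
from pathlib import Path

# Column names in the order they are listed above (order is an assumption).
COLUMNS = [
    "id", "title", "author", "pub_date", "venue", "volume",
    "issue", "page", "type", "publisher", "editor",
]

# The example record from the column list; editor is left empty here.
EXAMPLE_ROW = {
    "id": "doi:10.1162/qss_a_00292",
    "title": "OpenCitations Meta",
    "author": "Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David",
    "pub_date": "2024-01-22",
    "venue": "Quantitative Science Studies [issn:2641-3337]",
    "volume": "5",
    "issue": "1",
    "page": "50-75",
    "type": "journal article",
    "publisher": "MIT Press [crossref:281]",
    "editor": "",
}


def write_input_csv(path, rows):
    """Write rows (dicts keyed by COLUMNS) to a CSV file and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings and quoting.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

For example, `write_input_csv("/path/to/input/example.csv", [EXAMPLE_ROW])` would place a one-record file in the directory that `input_csv_dir` points to. Fields containing commas, such as the author list, are quoted automatically by the `csv` module.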

Typical workflow

A production workflow usually follows these steps:

  1. Preprocess - Deduplicate input and filter existing IDs

  2. Process - Run the main Meta pipeline

  3. Verify - Check that all identifiers were processed correctly

Preprocess (optional but recommended):

uv run python -m oc_meta.run.meta.preprocess_input input/ preprocessed/ --redis-port 6379

Process:

uv run python -m oc_meta.run.meta_process -c meta_config.yaml

Verify:

uv run python -m oc_meta.run.meta.check_results meta_config.yaml report.json
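The three commands above can be chained in a small driver script so that a failure in one step stops the run. This is a minimal sketch under stated assumptions: the function names `build_workflow_commands` and `run_workflow` are hypothetical, the default paths simply mirror the examples above, and the script only shells out to the documented command lines rather than calling any oc_meta Python API.

```python
import subprocess


def build_workflow_commands(
    config="meta_config.yaml",
    raw_dir="input/",
    clean_dir="preprocessed/",
    report="report.json",
    redis_port=6379,
):
    """Return the preprocess, process, and verify commands as argument lists."""
    runner = ["uv", "run", "python", "-m"]
    return [
        # Step 1: deduplicate input and filter existing IDs.
        runner + ["oc_meta.run.meta.preprocess_input", raw_dir, clean_dir,
                  "--redis-port", str(redis_port)],
        # Step 2: run the main Meta pipeline.
        runner + ["oc_meta.run.meta_process", "-c", config],
        # Step 3: check that all identifiers were processed.
        runner + ["oc_meta.run.meta.check_results", config, report],
    ]


def run_workflow(commands):
    """Run each command in order; stop at the first non-zero exit status."""
    for command in commands:
        print("$", " ".join(command))
        # check=True raises CalledProcessError if the step fails.
        subprocess.run(command, check=True)
```

Calling `run_workflow(build_workflow_commands())` runs the steps in sequence. Note that when preprocessing is used, `input_csv_dir` in the configuration file presumably needs to point at the preprocessed output directory rather than the raw input; that is an inference from the command arguments, not something stated above.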

Next steps