Verification

After running Meta, use the verification script to check that all identifiers were processed correctly and have associated data in the triplestore.

Running verification

uv run python -m oc_meta.run.meta.check_results <CONFIG_PATH> <OUTPUT_FILE>

Example:

uv run python -m oc_meta.run.meta.check_results meta_config.yaml report.json

The script exits with code 0 if all checks pass, or 1 if any errors are found.
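Because the exit code mirrors the outcome of the checks, the script can gate a later pipeline step. The following is a minimal sketch, assuming `uv` is available on the PATH. The helper names `build_command` and `run_verification` are hypothetical and are not part of oc_meta.

```python
import subprocess


def build_command(config_path: str, output_file: str) -> list[str]:
    """Assemble the documented invocation as an argument list."""
    return [
        "uv", "run", "python", "-m", "oc_meta.run.meta.check_results",
        config_path, output_file,
    ]


def run_verification(config_path: str, output_file: str) -> bool:
    """Run the verification script; True means it exited with code 0."""
    completed = subprocess.run(build_command(config_path, output_file))
    return completed.returncode == 0
```

A caller could invoke `run_verification("meta_config.yaml", "report.json")` and stop the pipeline when it returns `False`.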

What it checks

1. Identifier analysis

The script parses the identifiers found in the following columns of the input CSV files:

  • id column (DOIs, PMIDs, etc.)

  • author column (ORCID identifiers)

  • editor column (ORCID identifiers)

  • publisher column (Crossref identifiers)

  • venue column (ISSNs, ISBNs)
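The column scan above can be sketched as follows. This is an illustration, not the script's actual parser. It assumes identifiers are written as `schema:value` tokens, space-separated in the id column and enclosed in square brackets after a name in the other columns. The function name `extract_identifiers` is hypothetical.

```python
import re

# Assumption: identifiers are "schema:value" tokens, bare in the id column
# and inside square brackets (after a name) in the other columns.
ID_COLUMNS = ("id", "author", "editor", "publisher", "venue")
BRACKETED = re.compile(r"\[([^\]]*)\]")


def extract_identifiers(row: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (column, schema, value) triples found in one CSV row."""
    found = []
    for column in ID_COLUMNS:
        cell = row.get(column, "") or ""
        # The id column holds bare tokens; other columns hold bracketed ones.
        chunks = [cell] if column == "id" else BRACKETED.findall(cell)
        for chunk in chunks:
            for token in chunk.split():
                schema, sep, value = token.partition(":")
                if sep and schema and value:
                    found.append((column, schema.lower(), value))
    return found
```

Rows can be fed to this function from `csv.DictReader`. Keeping the column name alongside each identifier is what lets a report point back to the exact cell.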

2. OMID verification

For each identifier, the script queries the triplestore to check:

  • Does the identifier have an associated OMID?

  • Does the identifier resolve to more than one OMID? (This indicates a disambiguation issue.)
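The two questions above amount to a simple classification once the lookup results are available. The sketch below assumes the triplestore has already been queried and the results collected into a mapping from identifier to OMIDs. The query itself is omitted because it is not shown here, and `classify_identifiers` is a hypothetical name.

```python
def classify_identifiers(
    omids_by_identifier: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Split identifiers into those with no OMID and those with several."""
    missing = []
    multiple = []
    for identifier, omids in omids_by_identifier.items():
        unique = set(omids)  # repeated bindings for the same OMID count once
        if not unique:
            missing.append(identifier)
        elif len(unique) > 1:
            multiple.append(identifier)
    return {"missing_omid": missing, "multiple_omids": multiple}
```

Identifiers with exactly one OMID fall into neither list, which is the expected outcome for correctly processed input.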

3. Data graph verification

Because Meta always generates RDF files, the script:

  • Verifies that RDF files exist for each entity

  • Reports missing data graphs

4. Provenance verification

For each OMID found, the script:

  • Queries the provenance triplestore

  • Verifies provenance graphs exist

  • Reports OMIDs missing provenance data

Output format

The script produces a JSON report with the following structure:

{
  "status": "FAIL",
  "timestamp": "2026-04-04T12:00:00",
  "config_path": "/path/to/meta_config.yaml",
  "total_files_processed": 1,
  "files": [
    {
      "file": "input.csv",
      "total_rows": 100,
      "rows_with_ids": 95,
      "total_identifiers": 200,
      "identifiers_with_omids": 190,
      "identifiers_without_omids": 10
    }
  ],
  "summary": {
    "total_rows": 100,
    "total_identifiers": 200,
    "identifiers_with_omids": 190,
    "identifiers_without_omids": 10,
    "omids_with_provenance": 185,
    "omids_without_provenance": 5
  },
  "errors": [
    {
      "type": "missing_omid",
      "schema": "doi",
      "value": "10.1234/example",
      "file": "input.csv",
      "row": 5,
      "column": "id"
    }
  ],
  "warnings": [
    {
      "type": "multiple_omids",
      "identifier": "doi:10.1234/duplicate",
      "omid_count": 2,
      "omids": ["https://w3id.org/oc/meta/br/0601", "https://w3id.org/oc/meta/br/0602"],
      "occurrences": [{"file": "input.csv", "row": 10, "column": "id"}]
    }
  ]
}
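A report with this structure can be read back with the standard library to compute coverage figures. The sketch below relies only on the `summary` keys shown above. The function names are hypothetical, and treating an empty input as full coverage is an assumption made for convenience.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Read the JSON report written by the verification script."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def omid_coverage(report: dict) -> float:
    """Fraction of identifiers that resolved to an OMID."""
    summary = report["summary"]
    total = summary["total_identifiers"]
    if total == 0:
        return 1.0  # assumption: nothing to check counts as full coverage
    return summary["identifiers_with_omids"] / total


def provenance_coverage(report: dict) -> float:
    """Fraction of the OMIDs found that have provenance data."""
    summary = report["summary"]
    total = summary["omids_with_provenance"] + summary["omids_without_provenance"]
    if total == 0:
        return 1.0
    return summary["omids_with_provenance"] / total
```

With the sample figures above, `omid_coverage` works out to 190 of 200 identifiers and `provenance_coverage` to 185 of 190 OMIDs.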

Status semantics

  • status: "PASS": all identifiers have OMIDs and all OMIDs have provenance. Exit code 0.

  • status: "FAIL": at least one error was found. Exit code 1.

Error types

  • missing_omid: an identifier from the input CSV has no corresponding OMID in the triplestore. Indicates a processing failure.

  • missing_provenance: an OMID exists in the triplestore but has no provenance record. Indicates incomplete ingestion.
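When a report fails, it helps to triage the errors list by type first. The sketch below counts errors per type and lists where each `missing_omid` error occurred, using only the fields shown in the sample report. The fields of a `missing_provenance` entry are not shown there, so the sketch reads nothing beyond its `type`. Both function names are hypothetical.

```python
from collections import Counter


def count_errors_by_type(report: dict) -> Counter:
    """Count the entries of the report's errors list per error type."""
    return Counter(error["type"] for error in report.get("errors", []))


def missing_omid_locations(report: dict) -> list[tuple[str, int, str, str]]:
    """Return (file, row, column, identifier) for each missing_omid error."""
    return [
        (e["file"], e["row"], e["column"], f'{e["schema"]}:{e["value"]}')
        for e in report.get("errors", [])
        if e["type"] == "missing_omid"
    ]
```

The resulting locations point at the exact CSV cells to inspect before re-running Meta on the affected rows.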

Warning types

  • multiple_omids: an identifier is associated with more than one OMID across files. Indicates a disambiguation issue that should be resolved via the merge pipeline.
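To prepare input for a merge, the ambiguous identifiers can be pulled out of the warnings list. This is a minimal sketch that uses only the warning fields shown in the sample report. The name `merge_candidates` is hypothetical, and the sketch does not decide which OMID should survive the merge.

```python
def merge_candidates(report: dict) -> dict[str, list[str]]:
    """Map each ambiguous identifier to the sorted OMIDs it resolves to."""
    return {
        warning["identifier"]: sorted(warning["omids"])
        for warning in report.get("warnings", [])
        if warning["type"] == "multiple_omids"
    }
```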