Verification

After running Meta, use the verification script to check that all identifiers were processed correctly and have associated data in the triplestore.

uv run python -m oc_meta.run.meta.check_results <CONFIG_PATH> <OUTPUT_FILE>

Example:

uv run python -m oc_meta.run.meta.check_results meta_config.yaml report.json

The script exits with code 0 if all checks pass, or 1 if any errors are found.
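In a CI job or batch script, the exit code alone is enough to gate later steps. The following is a minimal Python sketch that assumes `uv` is on the `PATH` and the command runs from the project directory. The function name `run_check` and its injectable `runner` parameter are illustrative and not part of oc_meta.

```python
import subprocess
from typing import Callable, Sequence


def run_check(
    config_path: str,
    output_path: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run the Meta verification script; True only if it exits with code 0."""
    cmd: Sequence[str] = [
        "uv", "run", "python", "-m", "oc_meta.run.meta.check_results",
        config_path, output_path,
    ]
    # check=False: exit code 1 means "checks failed", which the caller decides
    # how to handle, rather than an exception to raise here.
    result = runner(cmd, check=False)
    return result.returncode == 0
```

For example, `if not run_check("meta_config.yaml", "report.json"): raise SystemExit(1)` stops a pipeline before any downstream step runs. The `runner` parameter exists only so that the function can be exercised without launching the real script.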

The script parses all identifiers from input CSV files, including:

  • id column (DOIs, PMIDs, etc.)
  • author column (ORCID identifiers)
  • editor column (ORCID identifiers)
  • publisher column (Crossref identifiers)
  • venue column (ISSNs, ISBNs)

For each identifier, the script queries the triplestore to check:

  • Does the identifier have an associated OMID?
  • Is the identifier associated with more than one OMID? (This indicates a disambiguation issue.)
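Both checks can be sketched as one SPARQL lookup per identifier plus a classification of its results. The sketch assumes the triplestore follows the OpenCitations Data Model: an entity links to a `datacite:Identifier` through `datacite:hasIdentifier`, and that identifier carries its scheme and literal value. The function names are hypothetical, and the query the script actually sends may differ. Depending on how literals are stored, the value may also need an explicit `^^xsd:string` datatype.

```python
from typing import Iterable

# Assumption: OpenCitations Data Model prefixes.
PREFIXES = (
    "PREFIX datacite: <http://purl.org/spar/datacite/>\n"
    "PREFIX literal: <http://www.essepuntato.it/2010/06/literalreification/>\n"
)


def omid_query(schema: str, value: str) -> str:
    """SPARQL query returning every entity that carries schema:value."""
    # Escape backslashes and quotes so the value is a valid SPARQL string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return (
        PREFIXES
        + "SELECT DISTINCT ?entity WHERE {\n"
        + f"  ?identifier datacite:usesIdentifierScheme datacite:{schema} ;\n"
        + f'              literal:hasLiteralValue "{escaped}" .\n'
        + "  ?entity datacite:hasIdentifier ?identifier .\n"
        + "}"
    )


def classify(omids: Iterable[str]) -> str:
    """Map the entities found for one identifier to a check outcome."""
    unique = set(omids)
    if not unique:
        return "missing_omid"      # listed under "errors" in the report
    if len(unique) > 1:
        return "multiple_omids"    # listed under "warnings" in the report
    return "ok"
```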

Because RDF files are always generated, the script also:

  • Verifies that RDF files exist for each entity
  • Reports missing data graphs
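The RDF check can be made concrete by mapping an OMID to the file that should hold it. This sketch assumes the directory layout of the oc_ocdm storer: entity type, then supplier prefix, then directory and file buckets sized by Meta's directory-split and items-per-file settings. It also assumes uncompressed `.json` output and OMIDs shaped like those in the report's `omids` field. `expected_rdf_path` and `missing_rdf_files` are hypothetical helpers. A file's existence alone does not prove that the entity's data graph is inside it.

```python
import math
import re
from pathlib import Path
from typing import Iterable

# Assumption: OMIDs look like https://w3id.org/oc/meta/<type>/<prefix><number>,
# where the supplier prefix is "0", one or more non-zero digits, then "0" (e.g. "060").
OMID_RE = re.compile(r"^https://w3id\.org/oc/meta/([a-z]{2})/(0[1-9]+0)([1-9][0-9]*)$")


def expected_rdf_path(omid: str, rdf_dir: Path, dir_split: int, items_per_file: int) -> Path:
    """Hypothetical helper: the JSON file that should contain this entity."""
    match = OMID_RE.match(omid)
    if match is None:
        raise ValueError(f"not a Meta OMID: {omid}")
    entity_type, prefix, number = match.group(1), match.group(2), int(match.group(3))
    # Entities are bucketed into files and directories; each bucket is
    # named after its upper bound (the next multiple of the bucket size).
    file_bucket = math.ceil(number / items_per_file) * items_per_file
    dir_bucket = math.ceil(number / dir_split) * dir_split
    return rdf_dir / entity_type / prefix / str(dir_bucket) / f"{file_bucket}.json"


def missing_rdf_files(
    omids: Iterable[str], rdf_dir: Path, dir_split: int, items_per_file: int
) -> list[str]:
    """OMIDs whose expected RDF file does not exist on disk."""
    return [
        omid for omid in omids
        if not expected_rdf_path(omid, rdf_dir, dir_split, items_per_file).exists()
    ]
```

With a directory split of 10000 and 1000 items per file, for instance, `br/0601` maps to `br/060/10000/1000.json`.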

For each OMID found, the script:

  • Queries the provenance triplestore
  • Verifies that provenance graphs exist
  • Reports OMIDs missing provenance data
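The provenance check reduces to one ASK query per OMID. This sketch assumes the OpenCitations provenance model, in which each snapshot of an entity (for example `<OMID>/prov/se/1`) points back to it via `prov:specializationOf` and is stored in a named graph. The endpoint is whatever provenance triplestore your configuration points to, and the function names are hypothetical. Only the standard SPARQL 1.1 Protocol (form-encoded POST) and the SPARQL 1.1 JSON results format are used.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable

PROV = "http://www.w3.org/ns/prov#"


def provenance_ask_query(omid: str) -> str:
    """ASK whether any named graph holds a snapshot that specializes this OMID."""
    # Assumption: snapshots link to the entity via prov:specializationOf and
    # live in named graphs, hence the GRAPH ?g wrapper.
    return f"PREFIX prov: <{PROV}>\nASK {{ GRAPH ?g {{ ?snapshot prov:specializationOf <{omid}> }} }}"


def parse_ask(body: bytes) -> bool:
    # SPARQL 1.1 JSON results: an ASK answer is {"head": {}, "boolean": true|false}.
    return bool(json.loads(body)["boolean"])


def has_provenance(endpoint: str, omid: str, timeout: float = 30.0) -> bool:
    data = urllib.parse.urlencode({"query": provenance_ask_query(omid)}).encode()
    # Passing data makes this a POST; urllib defaults the body type to
    # application/x-www-form-urlencoded, as the SPARQL protocol expects.
    request = urllib.request.Request(
        endpoint, data=data, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ask(response.read())


def omids_without_provenance(endpoint: str, omids: Iterable[str]) -> list[str]:
    """OMIDs for which the provenance store has no snapshot."""
    return [omid for omid in omids if not has_provenance(endpoint, omid)]
```

One request per OMID keeps the sketch simple. For large runs, batching OMIDs into a `VALUES` clause would cut the number of round trips.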

The script produces a JSON report with the following structure:

{
  "status": "FAIL",
  "timestamp": "2026-04-04T12:00:00",
  "config_path": "/path/to/meta_config.yaml",
  "total_files_processed": 3,
  "files": [
    {
      "file": "input.csv",
      "total_rows": 100,
      "rows_with_ids": 95,
      "total_identifiers": 200,
      "identifiers_with_omids": 190,
      "identifiers_without_omids": 10
    }
  ],
  "summary": {
    "total_rows": 100,
    "total_identifiers": 200,
    "identifiers_with_omids": 190,
    "identifiers_without_omids": 10,
    "omids_with_provenance": 185,
    "omids_without_provenance": 5
  },
  "errors": [
    {
      "type": "missing_omid",
      "schema": "doi",
      "value": "10.1234/example",
      "file": "input.csv",
      "row": 5,
      "column": "id"
    }
  ],
  "warnings": [
    {
      "type": "multiple_omids",
      "identifier": "doi:10.1234/duplicate",
      "omid_count": 2,
      "omids": ["https://w3id.org/oc/meta/br/0601", "https://w3id.org/oc/meta/br/0602"],
      "occurrences": [{"file": "input.csv", "row": 10, "column": "id"}]
    }
  ]
}
  • status: "PASS": all identifiers have OMIDs and all OMIDs have provenance. Exit code 0.
  • status: "FAIL": at least one error found. Exit code 1.
  • missing_omid (error): an identifier from the input CSV has no corresponding OMID in the triplestore. Indicates a processing failure.
  • missing_provenance (error): an OMID exists in the triplestore but has no provenance record. Indicates incomplete ingestion.
  • multiple_omids (warning): an identifier is associated with more than one OMID across files. Indicates a disambiguation issue that should be resolved via the merge pipeline.
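Because the report is plain JSON, it is straightforward to post-process. This minimal sketch reads only fields that appear in the structure above: `status`, `summary`, `errors`, `warnings`, and each entry's `type`, `file` and `row`. The function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_report(path: str) -> dict:
    """Read the JSON report written by check_results."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_report(report: dict) -> str:
    """One-line summary built from status, summary, errors and warnings."""
    summary = report["summary"]
    error_counts = Counter(item["type"] for item in report.get("errors", []))
    warning_counts = Counter(item["type"] for item in report.get("warnings", []))
    parts = [
        report["status"],
        f"{summary['identifiers_without_omids']} identifier(s) without OMID",
        f"{summary['omids_without_provenance']} OMID(s) without provenance",
    ]
    parts += [f"{kind}: {n} error(s)" for kind, n in sorted(error_counts.items())]
    parts += [f"{kind}: {n} warning(s)" for kind, n in sorted(warning_counts.items())]
    return "; ".join(parts)


def rows_missing_omids(report: dict) -> list[tuple[str, int]]:
    """Distinct (file, row) pairs with at least one missing_omid error."""
    return sorted(
        {(item["file"], item["row"])
         for item in report.get("errors", [])
         if item["type"] == "missing_omid"}
    )
```

For instance, `print(summarize_report(load_report("report.json")))` gives a one-line status for logs, and `rows_missing_omids` lists the input rows to re-examine before rerunning Meta.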