Check info dir#
Verifies that filesystem counter files are consistent with the provenance data in the RDF files. Performs two checks:
Entity counters (`info_file_*.txt`): the global counter for each entity type must be greater than or equal to the maximum resource number found in the provenance files.
Provenance counters (`prov_file_*.txt`): the counter value for each entity must match the maximum snapshot number found in its provenance file.
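
The two rules differ in strictness, which a minimal sketch can make explicit. The function names below are hypothetical and are not taken from `oc_meta`; the snippet only illustrates the comparisons described above.

```python
def entity_counter_ok(counter: int, max_resource_number: int) -> bool:
    """Entity check: the counter may be ahead of the data, never behind it."""
    return counter >= max_resource_number


def prov_counter_ok(counter: int, max_snapshot_number: int) -> bool:
    """Provenance check: the counter must equal the latest snapshot number."""
    return counter == max_snapshot_number


# An entity counter of 400000 fails when resource number 500000 exists.
print(entity_counter_ok(400000, 500000))
# A provenance counter of 2 fails when snapshot 3 exists.
print(prov_counter_ok(2, 3))
```

The inequality in the entity check tolerates counters that have advanced past the highest stored resource, while the provenance check flags any difference in either direction.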
Usage#
uv run python -m oc_meta.run.infodir.check <directory> <info_dir> [-o OUTPUT]
Parameters#
| Parameter | Required | Description |
|---|---|---|
| `directory` | Yes | Path to the RDF directory |
| `info_dir` | Yes | Base directory for counter files |
| `-o OUTPUT` | No | Output JSON report path (default: |
Process#
Loads all counter files from the info directory into memory
Collects all provenance ZIP files from the RDF directory
Processes each ZIP in parallel: extracts entity URIs and max snapshot numbers, compares against in-memory provenance counters
After processing all files, compares the global max resource number per entity type against entity counters
Writes a structured JSON report
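
The per-ZIP comparison in the third step can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: it assumes snapshot URIs have the form `<entity URI>/prov/se/<number>`, that the provenance counters are already loaded into a plain dict keyed by entity URI, and that a missing counter counts as 0. The names `max_snapshots` and `find_prov_mismatches` are hypothetical.

```python
import re

# Assumed snapshot URI shape: <entity URI>/prov/se/<snapshot number>
SNAPSHOT_RE = re.compile(r"^(?P<entity>.+)/prov/se/(?P<number>\d+)$")


def max_snapshots(snapshot_uris):
    """Map each entity URI to the highest snapshot number seen for it."""
    result = {}
    for uri in snapshot_uris:
        match = SNAPSHOT_RE.match(uri)
        if match is None:
            continue  # not a snapshot URI, ignore it
        entity = match.group("entity")
        number = int(match.group("number"))
        if number > result.get(entity, 0):
            result[entity] = number
    return result


def find_prov_mismatches(snapshot_uris, prov_counters, zip_file):
    """Compare max snapshot numbers against in-memory provenance counters."""
    mismatches = []
    for entity_uri, expected in sorted(max_snapshots(snapshot_uris).items()):
        # Assumption: an entity with no stored counter is treated as 0.
        actual = prov_counters.get(entity_uri, 0)
        if actual != expected:
            mismatches.append({
                "entity_uri": entity_uri,
                "expected": expected,
                "actual": actual,
                "zip_file": zip_file,
            })
    return mismatches
```

Because each ZIP is compared only against counters held in memory, a function of this shape has no shared mutable state and could be handed to a worker pool, which is consistent with the parallel processing described above.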
Example#
uv run python -m oc_meta.run.infodir.check /srv/oc_meta/rdf /srv/oc_meta/info_dir -o /tmp/report.json
Output#
A JSON report with the following structure:
{
  "timestamp": "2026-05-01T12:00:00+00:00",
  "root_path": "/srv/oc_meta/rdf",
  "info_dir": "/srv/oc_meta/info_dir",
  "total_zip_files": 1364452,
  "total_mismatched_entity_counters": 1,
  "total_mismatched_prov_counters": 3,
  "mismatched_entity_counters": [
    {
      "prefix": "060",
      "short_name": "br",
      "expected_min": 500000,
      "actual": 400000
    }
  ],
  "mismatched_prov_counters": [
    {
      "entity_uri": "https://w3id.org/oc/meta/br/06101234",
      "expected": 3,
      "actual": 2,
      "zip_file": "/srv/oc_meta/rdf/br/060/10000/1000/prov/se.zip"
    }
  ]
}
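
A report with this structure is easy to post-process. The sketch below turns an already-parsed report into one line per mismatch. `summarize_report` is a hypothetical helper, not part of `oc_meta`, and it relies only on the keys shown in the example above. It also assumes that `actual` holds the stored counter value and that `expected_min` and `expected` hold the values derived from the provenance files.

```python
def summarize_report(report: dict) -> list:
    """Return one human-readable line per mismatch found in the report."""
    lines = []
    for item in report["mismatched_entity_counters"]:
        lines.append(
            f"entity counter {item['prefix']}/{item['short_name']}: "
            f"counter is {item['actual']}, expected at least {item['expected_min']}"
        )
    for item in report["mismatched_prov_counters"]:
        lines.append(
            f"prov counter {item['entity_uri']}: "
            f"counter is {item['actual']}, expected {item['expected']} "
            f"({item['zip_file']})"
        )
    return lines
```

To use it, load the file written with `-o` using `json.load` and pass the resulting dict to `summarize_report`.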