Provenance to N-Quads
Recursively searches for `se.zip` provenance files, extracts their JSON-LD content, converts it to N-Quads, and verifies that the quad count matches between input and output.
```bash
uv run python -m oc_meta.run.migration.provenance_to_nquads <input_dir> <output_dir> [options]
```

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| `input_dir` | Yes | - | Directory containing `se.zip` files (searched recursively) |
| `output_dir` | Yes | - | Output directory for converted `.nq` files |
| `-w`, `--workers` | No | CPU count | Number of worker processes |
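The parameter table maps directly onto a standard `argparse` interface. The sketch below is a hypothetical reconstruction of that interface for illustration, not the module's actual source. The function name `build_parser` is an assumption, and so is the fallback to 1 worker when `os.cpu_count()` returns `None`.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI; not the real module's code.
    parser = argparse.ArgumentParser(
        description="Convert se.zip provenance archives to N-Quads."
    )
    parser.add_argument(
        "input_dir", help="Directory containing se.zip files (searched recursively)"
    )
    parser.add_argument(
        "output_dir", help="Output directory for converted .nq files"
    )
    parser.add_argument(
        "-w",
        "--workers",
        type=int,
        # os.cpu_count() may return None; falling back to 1 is an assumption.
        default=os.cpu_count() or 1,
        help="Number of worker processes (default: CPU count)",
    )
    return parser
```

For example, `build_parser().parse_args(["/srv/oc_meta/rdf", "/data/provenance_nquads", "-w", "8"])` yields `workers == 8`. Omitting `-w` leaves it at the CPU count.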
Process
- Recursively finds all `se.zip` files in the input directory
- For each archive, extracts the JSON-LD content
- Converts the JSON-LD to N-Quads using rdflib
- Writes the output to a flat directory, with filenames derived from each source file's path
- Verifies that the quad count matches between input and output
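The discovery, extraction, and output-counting steps above can be sketched with the standard library alone. The rdflib conversion step is deliberately left out here. The sketch rests on a few assumptions: each `se.zip` holds exactly one member containing the JSON-LD document, that member is UTF-8, and the output quad count is taken as the number of non-blank, non-comment lines in the `.nq` text. The helper names `find_archives`, `read_jsonld`, and `count_nquads` are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterator


def find_archives(root: Path) -> Iterator[Path]:
    """Yield every se.zip file under root, searching recursively."""
    # sorted() only makes the processing order deterministic.
    yield from sorted(root.rglob("se.zip"))


def read_jsonld(archive: Path) -> str:
    """Return the JSON-LD text stored in an se.zip archive.

    Assumption: the archive contains exactly one (UTF-8) file member.
    """
    with zipfile.ZipFile(archive) as zf:
        members = [name for name in zf.namelist() if not name.endswith("/")]
        if len(members) != 1:
            raise ValueError(f"{archive}: expected 1 member, found {len(members)}")
        return zf.read(members[0]).decode("utf-8")


def count_nquads(text: str) -> int:
    """Count statements in N-Quads text.

    N-Quads puts one statement per line, so count lines that are
    neither blank nor comment-only.
    """
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
```

Comparing `count_nquads` on the written file against the number of quads rdflib parsed from the JSON-LD is one way to implement the final verification step.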
Output naming
Output filenames are derived from the relative path of the source file, with path separators replaced by dashes and the `.zip` extension replaced by `.nq`:

- Input: `ra/0610/10000/1000/prov/se.zip`
- Output: `ra-0610-10000-1000-prov-se.nq`
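The naming rule can be expressed as a small path transformation. This sketch assumes the relative path is taken with respect to `input_dir`, which is consistent with the example but not stated explicitly. The function name `output_name` is hypothetical.

```python
from pathlib import Path, PurePosixPath


def output_name(archive: Path, input_root: Path) -> str:
    """Flatten an archive path into a single .nq filename (hypothetical helper)."""
    # Path of the archive relative to the input directory, as POSIX components.
    rel = PurePosixPath(archive.relative_to(input_root).as_posix())
    # Swap the .zip extension for .nq, then join the components with dashes.
    return "-".join(rel.with_suffix(".nq").parts)
```

With `input_root=Path("/srv/oc_meta/rdf")`, the archive `/srv/oc_meta/rdf/ra/0610/10000/1000/prov/se.zip` maps to `ra-0610-10000-1000-prov-se.nq`, so every output file lands directly in `output_dir` without subdirectories.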
Example
```bash
uv run python -m oc_meta.run.migration.provenance_to_nquads /srv/oc_meta/rdf /data/provenance_nquads \
    --workers 8
```

Report

At completion, the script reports the number of successfully processed files and failures:

```
Final report
Success: 12345
Failed: 0
```
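A minimal sketch of how per-file outcomes from a worker pool could be tallied into a report of this shape. The names `summarize` and `run_all` are hypothetical, as is the `convert_one` callable. The sketch also assumes that `convert_one` returns `True` when a file converts and its quad counts match, and that any raised exception counts as a failure. The report layout mirrors the sample output above.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import Callable, Iterable, Sequence


def summarize(outcomes: Iterable[bool]) -> str:
    """Render the final report from per-file outcomes (True means success)."""
    results = list(outcomes)
    success = sum(results)  # True counts as 1, False as 0
    failed = len(results) - success
    return f"Final report\nSuccess: {success}\nFailed: {failed}"


def run_all(
    paths: Sequence[str], convert_one: Callable[[str], bool], workers: int
) -> str:
    """Run convert_one over paths in a process pool and summarize the outcomes.

    convert_one must be a picklable, module-level function; an exception
    raised inside a worker is recorded as a failure.
    """
    outcomes = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(convert_one, p) for p in paths]
        for future in as_completed(futures):
            try:
                outcomes.append(bool(future.result()))
            except Exception:
                outcomes.append(False)
    return summarize(outcomes)
```

Catching exceptions per future keeps one malformed archive from aborting the whole run, which is what makes a non-zero `Failed` count meaningful.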