RDF to N-Quads#
Recursively searches for ZIP files containing JSON-LD data and converts the content to N-Quads format.
Usage#
uv run python -m oc_meta.run.migration.rdf_to_nquads <input_dir> <output_dir> [options]
Parameters#
| Parameter | Required | Default | Description |
|---|---|---|---|
| <input_dir> | Yes | - | Directory containing ZIP files (searched recursively) |
| <output_dir> | Yes | - | Output directory for converted .nq files |
| --mode | No | all | Mode: all, data, or prov |
| --workers | No | CPU count | Number of worker processes |
| --compress | No | disabled | Compress output files using 7z format |
Modes#
All mode (default)#
Processes all ZIP files, both entity data and provenance.
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/nquads
Data mode#
Searches for numeric ZIP files (e.g., 1000.zip, 2000.zip) excluding se.zip and files inside prov/ directories.
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/meta_nquads --mode data
Provenance mode#
Searches for se.zip files in prov/ directories. These contain provenance snapshots.
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/provenance_nquads --mode prov
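The selection rules for the three modes can be sketched as a small predicate over relative paths. This is only an illustration of the rules described above, not the script's actual code: the function name `matches_mode` and the literal mode value `"all"` are assumptions.

```python
from pathlib import PurePosixPath


def matches_mode(relative_path: str, mode: str = "all") -> bool:
    """Return True if a ZIP file should be processed in the given mode.

    Hypothetical helper: mirrors the documented rules, not the real implementation.
    """
    path = PurePosixPath(relative_path)
    if path.suffix != ".zip":
        return False
    # A file counts as provenance-related when any parent directory is named "prov".
    in_prov_dir = "prov" in path.parent.parts
    if mode == "all":
        return True
    if mode == "data":
        # Numeric archives such as 1000.zip, outside prov/ directories.
        return path.stem.isdigit() and not in_prov_dir
    if mode == "prov":
        # Provenance snapshots are stored as se.zip inside prov/ directories.
        return path.name == "se.zip" and in_prov_dir
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `matches_mode("br/060/10000/1000.zip", "data")` accepts the file, while the same call with `"prov"` rejects it.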
Compression#
By default, output files are written as plain text .nq files. Use the --compress flag to compress each output file individually using 7z format:
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/nquads --compress
Each N-Quads file is compressed into its own .nq.7z archive. 7z typically achieves better compression ratios than ZIP, which reduces storage requirements for large datasets.
Process#
1. Recursively finds ZIP files based on the selected mode.
2. For each archive, extracts the JSON-LD content.
3. Converts JSON-LD to N-Quads using rdflib.
4. Writes the output to a flat directory with filenames derived from the path.
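The extract-and-convert steps for a single archive might look roughly like the sketch below. It assumes rdflib 6 or later (where `serialize` returns a string) and that the JSON-LD members inside each archive end in `.json`. The names `jsonld_to_nquads` and `convert_archive` are hypothetical and not taken from the script.

```python
import zipfile
from pathlib import Path


def jsonld_to_nquads(jsonld_text: str) -> str:
    """Parse a JSON-LD document and return it serialized as N-Quads."""
    # rdflib is a third-party dependency; imported lazily so the rest of the
    # sketch can be exercised without it.
    from rdflib import Dataset

    dataset = Dataset()
    dataset.parse(data=jsonld_text, format="json-ld")
    return dataset.serialize(format="nquads")


def convert_archive(zip_path, output_path, convert=jsonld_to_nquads) -> int:
    """Convert every .json member of one ZIP archive into a single .nq file.

    Returns the number of members converted.
    """
    chunks = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in sorted(archive.namelist()):
            # Assumption: JSON-LD members use the .json extension.
            if member.endswith(".json"):
                text = archive.read(member).decode("utf-8")
                chunks.append(convert(text))
    Path(output_path).write_text("".join(chunks), encoding="utf-8")
    return len(chunks)
```

Passing the converter as a parameter keeps the ZIP handling separate from the RDF handling, so the archive logic can be checked with a stub converter.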
Output naming#
Output filenames are derived from the relative path of the source file, with path separators replaced by dashes:
Input: ra/0610/10000/1000/prov/se.zip → Output: ra-0610-10000-1000-prov-se.nq
Input: br/060/10000/1000.zip → Output: br-060-10000-1000.nq
With --compress enabled:
Input: br/060/10000/1000.zip → Output: br-060-10000-1000.nq.7z
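The naming rule can be expressed in a few lines. The following is a minimal sketch of the mapping described above, using a hypothetical helper name `output_name`. It is not the script's own function.

```python
from pathlib import PurePosixPath


def output_name(zip_path: str, input_dir: str, compress: bool = False) -> str:
    """Derive the flat output filename for a source ZIP file (illustrative only)."""
    # Path of the archive relative to the input directory, e.g. br/060/10000/1000.zip
    relative = PurePosixPath(zip_path).relative_to(input_dir)
    # Drop the .zip suffix and join the remaining path components with dashes.
    stem = "-".join(relative.with_suffix("").parts)
    return stem + (".nq.7z" if compress else ".nq")
```

Because the whole relative path is encoded in the filename, files from different subdirectories cannot collide in the flat output directory.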
Examples#
Convert all RDF data:
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/all_nquads \
--workers 8
Convert entity data only:
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/meta_nquads \
--mode data --workers 8
Convert provenance only:
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/provenance_nquads \
--mode prov --workers 8
Convert with 7z compression:
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/compressed_nquads \
--compress --workers 8
Report#
At completion, the script reports the number of successfully processed files and failures:
Final report
Success: 12345
Failed: 0
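One way to obtain such success and failure counts from a pool of worker processes is sketched below. This is an assumption about the general approach, not the script's code: `run_conversions`, `convert_one`, and the `executor_cls` parameter are hypothetical names, and a job is counted as failed whenever its worker raises an exception.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_conversions(jobs, convert_one, workers=None, executor_cls=ProcessPoolExecutor):
    """Run convert_one over all jobs in parallel and return (success, failed)."""
    success = 0
    failed = 0
    # max_workers=None lets the executor pick its default, which is based on
    # the machine's CPU count for ProcessPoolExecutor.
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(convert_one, job) for job in jobs]
        for future in as_completed(futures):
            try:
                future.result()
                success += 1
            except Exception:
                # A raised exception marks this job as failed; the others continue.
                failed += 1
    return success, failed
```

With a process pool, `convert_one` must be a picklable top-level function. Swapping in `ThreadPoolExecutor` through `executor_cls` is convenient for quick local checks.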