
RDF to N-Quads

Recursively searches for ZIP files containing JSON-LD data and converts the content to N-Quads format.

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads <input_dir> <output_dir> [options]

| Parameter | Required | Default | Description |
|---|---|---|---|
| input_dir | Yes | - | Directory containing ZIP files (searched recursively) |
| output_dir | Yes | - | Output directory for converted .nq files |
| -m, --mode | No | all | Mode: all for all ZIP files, data for entity data only, prov for provenance only |
| -w, --workers | No | CPU count | Number of worker processes |
| -c, --compress | No | disabled | Compress output files using 7z format |

Processes all ZIP files, both entity data and provenance.

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/nquads

Searches for numeric ZIP files (e.g., 1000.zip, 2000.zip) excluding se.zip and files inside prov/ directories.

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/meta_nquads --mode data

Searches for se.zip files in prov/ directories. These contain provenance snapshots.

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/provenance_nquads --mode prov
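
The three modes amount to a filter over the ZIP paths found under input_dir. The following is a minimal sketch of that filter, based only on the rules described above; the function name matches_mode is hypothetical and is not taken from the oc_meta source, which may implement the selection differently.

```python
from pathlib import PurePosixPath


def matches_mode(relative_path: str, mode: str = "all") -> bool:
    """Return True if a path would be selected under the given mode.

    Hypothetical helper: it mirrors the documented rules, not the real code.
    """
    path = PurePosixPath(relative_path)
    if path.suffix != ".zip":
        return False
    # A file counts as provenance when one of its parent directories is "prov".
    in_prov = "prov" in path.parts[:-1]
    if mode == "all":
        return True
    if mode == "data":
        # Numeric archives such as 1000.zip, outside prov/ directories.
        return path.stem.isdigit() and not in_prov
    if mode == "prov":
        # se.zip archives located inside prov/ directories.
        return path.name == "se.zip" and in_prov
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, br/060/10000/1000.zip is selected by all and data, while br/060/10000/1000/prov/se.zip is selected by all and prov.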

By default, output files are written as plain text .nq files. Use the --compress flag to compress each output file individually using 7z format:

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/nquads --compress

Each N-Quads file is compressed into its own .nq.7z archive. 7z typically achieves better compression ratios than ZIP, which reduces storage requirements for large datasets.

  1. Recursively finds ZIP files based on the selected mode
  2. For each archive, extracts the JSON-LD content
  3. Converts JSON-LD to N-Quads using rdflib
  4. Writes the output to a flat directory with filenames derived from the path

Output filenames are derived from the relative path of the source file, with path separators replaced by dashes:

  • Input: ra/0610/10000/1000/prov/se.zip → Output: ra-0610-10000-1000-prov-se.nq
  • Input: br/060/10000/1000.zip → Output: br-060-10000-1000.nq

With --compress enabled:

  • Input: br/060/10000/1000.zip → Output: br-060-10000-1000.nq.7z
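
The naming rule can be expressed in a few lines. The sketch below reproduces the mappings listed above; the function name output_name is hypothetical, and paths are assumed to be relative to input_dir and to use forward slashes.

```python
from pathlib import PurePosixPath


def output_name(relative_zip_path: str, compress: bool = False) -> str:
    """Derive the flat output filename from a ZIP path relative to input_dir.

    Hypothetical helper that follows the documented rule, not the real code.
    """
    path = PurePosixPath(relative_zip_path)
    # Drop the .zip suffix, then join the path components with dashes.
    stem = "-".join(path.with_suffix("").parts)
    return stem + (".nq.7z" if compress else ".nq")
```

Because every path component ends up in the filename, archives from different subdirectories cannot collide in the flat output directory, provided the components themselves contain no ambiguous dashes.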

Convert all RDF data:

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/all_nquads \
--workers 8

Convert entity data only:

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/meta_nquads \
--mode data --workers 8

Convert provenance only:

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/provenance_nquads \
--mode prov --workers 8

Convert with 7z compression:

Terminal window
uv run python -m oc_meta.run.migration.rdf_to_nquads /srv/oc_meta/rdf /data/compressed_nquads \
--compress --workers 8

At completion, the script reports the number of successfully processed files and failures:

Final report
Success: 12345
Failed: 0