
RDF from export

Processes gzipped RDF files (JSON-LD or N-Quads) and organizes them into OC Meta’s standard directory structure with configurable splitting parameters.

uv run python -m oc_meta.run.migration.rdf_from_export <input_folder> <output_root> [options]

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| input_folder | Yes | - | Folder containing gzipped input files |
| output_root | Yes | - | Root folder for output OC Meta RDF files |
| --base_iri | No | https://w3id.org/oc/meta/ | Base URI of entities |
| --file_limit | No | 10000 | Number of files per folder |
| --item_limit | No | 1000 | Number of items per file |
| --zip_output | No | True | Zip output JSON files |
| --input_format | No | jsonld | Input format: jsonld or nquads |
| --chunk_size | No | 1000 | Files to process before merging |
| --cache_file | No | None | File to store processed file names |
| --stop_file | No | ./.stop | File to signal process termination |

  1. Reads gzipped RDF files from the input folder
  2. Parses each file according to the specified format
  3. Extracts entity URIs and determines output paths based on OC Meta’s structure
  4. Writes entities to appropriate JSON-LD files in the output directory
  5. Merges files in parallel after each chunk is processed
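
The first three steps can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the script's actual implementation: the function names `list_gzipped_files` and `extract_entity_uris` are hypothetical, the `*.gz` file pattern is an assumption, and the line-based N-Quads handling is a simplification that a full RDF parser would replace.

```python
import gzip
import json
from pathlib import Path


def list_gzipped_files(input_folder):
    """Return every .gz file under input_folder in a stable order.

    Assumption: input files end in .gz and may sit in subfolders.
    """
    return sorted(Path(input_folder).rglob("*.gz"))


def _jsonld_subjects(node, found):
    """Collect the @id of each node object, descending into @graph lists."""
    if isinstance(node, list):
        for item in node:
            _jsonld_subjects(item, found)
    elif isinstance(node, dict):
        if "@graph" in node:
            # A named graph: its own @id names the graph, not an entity.
            _jsonld_subjects(node["@graph"], found)
        else:
            identifier = node.get("@id")
            # Skip blank nodes, which have no URI to map to an output path.
            if isinstance(identifier, str) and not identifier.startswith("_:"):
                found.add(identifier)


def extract_entity_uris(path, input_format="jsonld"):
    """Return the set of subject URIs found in one gzipped RDF file."""
    found = set()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        if input_format == "jsonld":
            _jsonld_subjects(json.load(handle), found)
        elif input_format == "nquads":
            for line in handle:
                line = line.strip()
                # Each statement starts with its subject; keep only IRIs.
                if line.startswith("<"):
                    found.add(line[1:line.index(">")])
        else:
            raise ValueError(f"Unsupported input format: {input_format}")
    return found
```

Each URI returned here would then be mapped to an output file under `output_root`; that mapping depends on OC Meta's directory layout and is not reproduced in this sketch.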
uv run python -m oc_meta.run.migration.rdf_from_export /data/export /srv/oc_meta/rdf \
--input_format nquads \
--file_limit 10000 \
--item_limit 1000 \
--cache_file /tmp/processed.txt

Create a .stop file (or the path specified with --stop_file) to gracefully terminate the process after the current chunk completes.
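
The way --chunk_size, --cache_file, and --stop_file could interact is sketched below. This is an assumed pattern, not the script's real code: `load_cache` and `process_in_chunks` are hypothetical names, the cache is assumed to hold one file name per line, and the stop file is checked only between chunks, so a chunk that has started always finishes.

```python
from pathlib import Path


def load_cache(cache_file):
    """Return the set of file names already recorded as processed."""
    path = Path(cache_file)
    if not path.exists():
        return set()
    return {line for line in path.read_text(encoding="utf-8").splitlines() if line}


def process_in_chunks(files, process_chunk, chunk_size=1000,
                      cache_file=None, stop_file="./.stop"):
    """Process files chunk by chunk and return those handled in this run.

    Files listed in cache_file are skipped, and the loop ends before the
    next chunk as soon as stop_file exists.
    """
    done = load_cache(cache_file) if cache_file else set()
    pending = [name for name in files if name not in done]
    processed = []
    for start in range(0, len(pending), chunk_size):
        if Path(stop_file).exists():
            break  # graceful stop: the previous chunk is already complete
        chunk = pending[start:start + chunk_size]
        process_chunk(chunk)
        processed.extend(chunk)
        if cache_file:
            # Record the chunk only after it has been processed.
            with open(cache_file, "a", encoding="utf-8") as handle:
                handle.writelines(f"{name}\n" for name in chunk)
    return processed
```

Under this pattern, creating the stop file from another shell (for example with `Path("./.stop").touch()`) ends the run at the next chunk boundary, and a later run with the same cache file resumes with the files not yet recorded.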