RDF from export
Processes gzipped RDF files (JSON-LD or N-Quads) and organizes them into OC Meta’s standard directory structure with configurable splitting parameters.
Usage
uv run python -m oc_meta.run.migration.rdf_from_export <input_folder> <output_root> [options]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| input_folder | Yes | - | Folder containing gzipped input files |
| output_root | Yes | - | Root folder for output OC Meta RDF files |
| | No | | Base URI of entities |
| --file_limit | No | 10000 | Number of files per folder |
| --item_limit | No | 1000 | Number of items per file |
| | No | True | Zip output JSON files |
| --input_format | No | jsonld | Input format: jsonld or nquads |
| | No | 1000 | Files to process before merging |
| --cache_file | No | None | File to store processed file names |
| --stop_file | No | ./.stop | File to signal process termination |
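To make the two splitting defaults concrete, the sketch below shows one plausible way a sequential entity number could map to a folder and a file. The function name `bucket_path` and the round-up-to-upper-bound naming are assumptions made for illustration only; the actual layout is decided by OC Meta and may differ.

```python
def bucket_path(entity_number: int, item_limit: int = 1000, file_limit: int = 10000) -> tuple[int, int]:
    """Return (folder_bucket, file_bucket) for a 1-based entity number.

    Assumption (not confirmed by the tool): each file holds `item_limit`
    entities, each folder holds `file_limit` files, and every bucket is
    named after the highest entity number it can contain.
    """
    if entity_number < 1:
        raise ValueError("entity_number must be a positive integer")

    def round_up(value: int, step: int) -> int:
        # Integer ceiling to the next multiple of `step` (no float rounding).
        return -(-value // step) * step

    file_bucket = round_up(entity_number, item_limit)
    folder_bucket = round_up(entity_number, item_limit * file_limit)
    return folder_bucket, file_bucket
```

Under these assumptions, entities 1 to 1000 share one file, entity 1001 starts the next file, and a new folder begins only after 10000 files have been filled.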
Process
1. Reads gzipped RDF files from the input folder
2. Parses each file according to the specified format
3. Extracts entity URIs and determines output paths based on OC Meta’s structure
4. Writes entities to appropriate JSON-LD files in the output directory
5. Merges files in parallel after each chunk is processed
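The first steps above can be sketched with the standard library alone. This is a minimal illustration for the JSON-LD case, not the tool's implementation: `iter_entities`, `group_by_target`, and the `target_for` callback are hypothetical names, and the sketch assumes each input file is a JSON array whose items may wrap their nodes in `@graph`.

```python
import gzip
import json
from collections import defaultdict
from pathlib import Path
from typing import Callable, Iterator


def iter_entities(path: Path) -> Iterator[dict]:
    """Yield every node that carries an "@id" from one gzipped JSON-LD file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        data = [data]  # tolerate a single top-level object
    for item in data:
        # Named graphs keep their nodes under "@graph"; otherwise the item is the node.
        for node in item.get("@graph", [item]):
            if "@id" in node:
                yield node


def group_by_target(input_folder: Path, target_for: Callable[[str], str]) -> dict[str, list[dict]]:
    """Group entities by the output location that `target_for` assigns to their URI."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for path in sorted(input_folder.glob("*.gz")):
        for node in iter_entities(path):
            grouped[target_for(node["@id"])].append(node)
    return dict(grouped)
```

Here `target_for` stands in for whatever logic maps an entity URI to its output file; writing and merging the grouped entities are left out of the sketch.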
Example
uv run python -m oc_meta.run.migration.rdf_from_export /data/export /srv/oc_meta/rdf \
--input_format nquads \
--file_limit 10000 \
--item_limit 1000 \
--cache_file /tmp/processed.txt
Graceful shutdown
To stop the process gracefully, create a file at ./.stop (or at the path given with --stop_file). The process exits once the current chunk completes.
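The interplay between chunking, the cache file, and the stop file can be pictured as a loop like the one below. This is a hedged sketch rather than the tool's code: `process_in_chunks` and `handle_chunk` are hypothetical names, and the one-name-per-line cache format is an assumption.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence


def process_in_chunks(
    files: Sequence[str],
    handle_chunk: Callable[[Sequence[str]], None],
    chunk_size: int = 1000,
    stop_file: Path = Path(".stop"),
    cache_file: Optional[Path] = None,
) -> list[str]:
    """Process files chunk by chunk and return the names handled in this run."""
    already_done: set[str] = set()
    if cache_file is not None and cache_file.exists():
        # Assumed cache format: one processed file name per line.
        already_done = set(cache_file.read_text(encoding="utf-8").splitlines())

    pending = [name for name in files if name not in already_done]
    processed: list[str] = []
    for start in range(0, len(pending), chunk_size):
        # The stop file is only checked between chunks, so a chunk that has
        # started always runs to completion.
        if stop_file.exists():
            break
        chunk = pending[start:start + chunk_size]
        handle_chunk(chunk)
        processed.extend(chunk)
        if cache_file is not None:
            with cache_file.open("a", encoding="utf-8") as out:
                out.writelines(name + "\n" for name in chunk)
    return processed
```

With a cache file in place, a later run skips the names already recorded, so stopping and restarting does not repeat finished chunks in this sketch.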