Stream N-Quads#
Converts JSON-LD ZIP archives to N-Quads using rdflib and writes the output to stdout. Designed to pipe directly into QLever’s indexer, eliminating the need for intermediate files on disk.
Usage#
uv run python -m oc_meta.run.migration.stream_nquads <rdf_dir> [options]
Parameters#
Parameter |
Required |
Default |
Description |
|---|---|---|---|
|
Yes |
- |
Root directory containing RDF ZIP archives |
|
No |
|
Mode: |
|
No |
min(8, CPU count) |
Number of worker processes |
Modes#
All mode (default)#
Processes all ZIP files, both entity data and provenance.
uv run python -m oc_meta.run.migration.stream_nquads /srv/oc_meta/rdf > output.nq
Data mode#
Processes numeric ZIP files (e.g., 1000.zip, 2000.zip) excluding se.zip and files inside prov/ directories.
uv run python -m oc_meta.run.migration.stream_nquads /srv/oc_meta/rdf --mode data > data.nq
Provenance mode#
Processes se.zip files in prov/ directories.
uv run python -m oc_meta.run.migration.stream_nquads /srv/oc_meta/rdf --mode prov > prov.nq
Piping into QLever#
The primary use case is piping directly into QLever’s indexer via Docker:
uv run python -m oc_meta.run.migration.stream_nquads /srv/oc_meta/rdf --mode data | \
docker run --rm -i -u $(id -u):$(id -g) \
--mount type=bind,src=$(pwd),target=/index -w /index \
--entrypoint qlever-index \
docker.io/adfreiburg/qlever:latest \
-i index-name -s index-name.settings.json -F nq -f - \
--stxxl-memory 50G
See the index.sh scripts in the QLever data directories for ready-to-use examples.
Output format#
Each line is a valid N-Quads statement. Named graphs come from the @id field at the top level of the JSON-LD array:
Data files produce triples in shared graphs like
<https://w3id.org/oc/meta/br/>Provenance files produce triples in per-entity graphs like
<https://w3id.org/oc/meta/br/06790181/prov/>
The N-Quads stream is written to stdout with no other output, so it can be piped directly into other tools.