
Triples

Counts RDF triples or quads in files using parallel processing. Supports ZIP, GZIP, and uncompressed files. The output labels the total as “quads” for quad-based formats (nquads, trig) and as “triples” for triple-based formats (json-ld, turtle).
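Because N-Quads is a line-based format (each line holds at most one statement), counting it does not strictly require an RDF parser. The sketch below illustrates that idea for the three supported container types; it is not this tool's implementation. `count_nquads` and `_count_statement_lines` are hypothetical names, and the approach only applies to line-based N-Quads/N-Triples, not to json-ld, turtle, or trig, which need a real parser.

```python
import gzip
import io
import zipfile
from pathlib import Path
from typing import Iterable


def _count_statement_lines(lines: Iterable[str]) -> int:
    # N-Quads allows at most one statement per line, so every line that is
    # neither blank nor a comment is counted as one statement.
    count = 0
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            count += 1
    return count


def count_nquads(path: Path) -> int:
    # Hypothetical helper: dispatch on the file suffix to open plain,
    # gzip-compressed, or ZIP-archived N-Quads as UTF-8 text.
    if path.suffix == ".gz":
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            return _count_statement_lines(fh)
    if path.suffix == ".zip":
        total = 0
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if info.is_dir():  # skip directory entries inside the archive
                    continue
                with archive.open(info) as member:
                    total += _count_statement_lines(
                        io.TextIOWrapper(member, encoding="utf-8")
                    )
        return total
    with path.open("r", encoding="utf-8") as fh:
        return _count_statement_lines(fh)
```

Counting lines is fast but performs no validation: a malformed line is still counted as one statement.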

```shell
uv run python -m oc_meta.run.count.triples <DIRECTORY> [OPTIONS]
```

| Option | Default | Description |
| --- | --- | --- |
| `--pattern` | `*.nq.gz` | Glob pattern for locating files |
| `--format` | `nquads` | RDF format: `nquads`, `json-ld`, `turtle`, or `trig` |
| `--recursive` | `false` | Search subdirectories recursively |
| `--prov-only` | `false` | Count only files in `prov` subdirectories |
| `--data-only` | `false` | Count only files not in `prov` subdirectories |
| `--workers` | CPU count | Number of parallel workers |
| `--show-per-file` | `false` | Print the count for each file |
| `--keep-going` | `false` | Continue processing even if errors occur |
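The file-selection options (`--pattern`, `--recursive`, `--prov-only`, `--data-only`) can be pictured as a glob followed by a path filter. This is a sketch under assumptions, not the tool's code: `find_files` is a hypothetical name, and “in prov subdirectories” is assumed to mean that some directory between `<DIRECTORY>` and the file is literally named `prov`. The tool's exact rule may differ.

```python
from pathlib import Path


def find_files(
    directory: Path,
    pattern: str = "*.nq.gz",
    recursive: bool = False,
    prov_only: bool = False,
    data_only: bool = False,
) -> list[Path]:
    # Hypothetical re-creation of the selection options.
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = directory.rglob(pattern) if recursive else directory.glob(pattern)
    selected = []
    for path in candidates:
        if not path.is_file():
            continue
        # Assumption: a file counts as provenance if any directory component
        # between `directory` and the file itself is named "prov".
        in_prov = "prov" in path.relative_to(directory).parts[:-1]
        if prov_only and not in_prov:
            continue
        if data_only and in_prov:
            continue
        selected.append(path)
    return sorted(selected)
```

In this sketch, setting both `prov_only` and `data_only` selects nothing.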

Count quads in gzip-compressed N-Quads files (the default `--pattern` and `--format`), searching subdirectories:

```shell
uv run python -m oc_meta.run.count.triples /data/rdf --recursive
```

Count triples in ZIP files containing JSON-LD:

```shell
uv run python -m oc_meta.run.count.triples /data/rdf --pattern "*.zip" --format json-ld --recursive
```

Count only data files (excluding provenance):

```shell
uv run python -m oc_meta.run.count.triples /data/rdf --recursive --data-only
```

Count only provenance files:

```shell
uv run python -m oc_meta.run.count.triples /data/rdf --recursive --prov-only
```

Show per-file counts with 8 workers:

```shell
uv run python -m oc_meta.run.count.triples /data/rdf --recursive --workers 8 --show-per-file
```
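`--workers`, `--show-per-file`, and `--keep-going` suggest a fan-out/fan-in loop: submit one job per file, sum the results, and either stop at the first failure or record it and continue. The following is a minimal sketch of that pattern, not the tool's implementation; `count_all` is a hypothetical name, and `count_file` stands for any callable that counts the statements in one file.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Optional


def count_all(
    files: list[Path],
    count_file: Callable[[Path], int],
    workers: Optional[int] = None,
    show_per_file: bool = False,
    keep_going: bool = False,
) -> tuple[int, list[tuple[Path, Exception]]]:
    # Default to one worker per CPU, mirroring the documented default.
    workers = workers or os.cpu_count() or 1
    total = 0
    errors: list[tuple[Path, Exception]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(count_file, f): f for f in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                n = future.result()
            except Exception as exc:
                if not keep_going:
                    raise  # stop at the first failing file
                errors.append((path, exc))  # record the failure and continue
                continue
            total += n
            if show_per_file:
                print(f"{path}: {n}")  # printed in completion order
    return total, errors
```

A thread pool keeps the sketch self-contained. For CPU-bound parsing, `ProcessPoolExecutor` exposes the same `submit`/`as_completed` interface, provided `count_file` is a picklable, module-level function.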

Count uncompressed Turtle files:

```shell
uv run python -m oc_meta.run.count.triples /data/rdf --pattern "*.ttl" --format turtle
```