
Bulk loader

This script offers a sequential method to load gzip-compressed N-Quads files (*.nq.gz) into a Virtuoso instance using the standard Virtuoso bulk loading procedure (ld_dir or ld_dir_all, followed by rdf_loader_run).

This script only processes files ending in .nq.gz. This restriction is intentional and provides significant performance advantages:

  • Avoiding artificial throttling: By reading compressed data, the load on Virtuoso’s internal processing is more balanced. Loading many uncompressed files can sometimes overwhelm Virtuoso’s internal mechanisms, causing it to introduce artificial delays to manage the workload.

Therefore, for optimal bulk loading performance with Virtuoso’s ld_dir/rdf_loader_run mechanism, using .nq.gz files is strongly recommended.
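
Before starting a load, it can help to see which files the extension filter would accept. The sketch below only illustrates that filter: partition_by_suffix is a hypothetical helper, not part of the script, and it classifies path strings purely by the .nq.gz suffix described above.

```python
from pathlib import PurePosixPath

SUFFIX = ".nq.gz"  # the only extension the loader processes


def partition_by_suffix(paths):
    """Split path strings into (loadable, skipped) by the .nq.gz suffix."""
    loadable, skipped = [], []
    for path in paths:
        name = PurePosixPath(path).name
        (loadable if name.endswith(SUFFIX) else skipped).append(path)
    return loadable, skipped


# Illustrative file names, not taken from a real dump.
loadable, skipped = partition_by_suffix(
    ["dump/a.nq.gz", "dump/b.nq", "dump/c.nt.gz", "dump/sub/d.nq.gz"]
)
print("would load:", loadable)
print("would skip:", skipped)
```

Files in the second list would need to be converted to N-Quads and gzip-compressed before the loader picks them up.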

The loading procedure works as follows:

  1. It first registers files found in the specified directory using the ld_dir (or ld_dir_all for recursive loading) ISQL function, adding them to the DB.DBA.load_list queue.
  2. Then, it executes the rdf_loader_run() ISQL function once to process this queue sequentially.
  3. Progress and errors can be monitored by querying DB.DBA.load_list.
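
The three steps above can be sketched as the ISQL statements they correspond to. This is an illustration, not the script's actual code: build_isql_statements and summarize are hypothetical helpers, the fallback graph IRI passed as the third argument of ld_dir is a placeholder, and the ll_state meanings (0 queued, 1 in progress, 2 done) follow the usual description of DB.DBA.load_list and should be checked against your Virtuoso version.

```python
def _quote(value):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_isql_statements(directory, recursive=False,
                          graph="http://example.org/default-graph"):
    """Return the ISQL statements for one bulk-load run, in order.

    The graph value is a placeholder (assumption); N-Quads files carry
    their own graph IRIs.
    """
    register = "ld_dir_all" if recursive else "ld_dir"
    return [
        # Step 1: queue every matching file in DB.DBA.load_list.
        f"{register}({_quote(directory)}, '*.nq.gz', {_quote(graph)});",
        # Step 2: process the queue.
        "rdf_loader_run();",
        # Step 3: inspect per-file state and errors.
        "SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list;",
    ]


# Assumed meaning of ll_state values.
LOAD_STATES = {0: "queued", 1: "in progress", 2: "done"}


def summarize(rows):
    """Count rows per state and collect (file, error) pairs with an error."""
    counts = {label: 0 for label in LOAD_STATES.values()}
    failures = []
    for ll_file, ll_state, ll_error in rows:
        counts[LOAD_STATES[ll_state]] += 1
        if ll_error is not None:
            failures.append((ll_file, ll_error))
    return counts, failures


statements = build_isql_statements("/rdf_mount_in_container", recursive=True)
for statement in statements:
    print(statement)

# Made-up rows shaped like the result of the SELECT above.
counts, failures = summarize([
    ("/rdf_mount_in_container/a.nq.gz", 2, None),
    ("/rdf_mount_in_container/b.nq.gz", 2, "syntax error (illustrative)"),
    ("/rdf_mount_in_container/c.nq.gz", 0, None),
])
print(counts, failures)
```

A file can reach the done state and still carry an error message, which is why the sketch checks ll_error separately from ll_state.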

Prerequisites for the data directory (-d):

  • When using Docker (--docker-container): -d specifies the absolute path inside the container (e.g., /rdf_mount_in_container). Ensure this path corresponds to a mounted volume and is included in the container’s DirsAllowed configuration. Using virtuoso-launch with --mount-volume handles this automatically.
  • When not using Docker: -d specifies the path on the host system accessible by the Virtuoso process. Ensure this host path is listed in the server’s virtuoso.ini DirsAllowed.
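
A quick way to verify the DirsAllowed requirement is to read the setting from virtuoso.ini and compare it with the directory you plan to pass to -d. The sketch below is a rough check under stated assumptions: is_dir_allowed is a hypothetical helper, the setting is assumed to live in the [Parameters] section, subdirectories of an allowed entry are assumed to be accepted, and relative entries are ignored because they depend on the server's working directory.

```python
from configparser import ConfigParser
from pathlib import PurePosixPath


def is_dir_allowed(ini_text, directory):
    """Return True if directory equals or lies under an absolute DirsAllowed entry."""
    parser = ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    raw = parser.get("Parameters", "DirsAllowed", fallback="")
    target = PurePosixPath(directory)
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        allowed = PurePosixPath(entry)
        if not allowed.is_absolute():
            continue  # relative entries depend on the server's working directory
        if target.parts[: len(allowed.parts)] == allowed.parts:
            return True
    return False


# Illustrative configuration fragment, not a complete virtuoso.ini.
SAMPLE_INI = """
[Parameters]
DirsAllowed = ., ../vad, /rdf_mount_in_container
"""

print(is_dir_allowed(SAMPLE_INI, "/rdf_mount_in_container/dumps"))
print(is_dir_allowed(SAMPLE_INI, "/home/user/my_rdf_data"))
```

To check a real installation, the text of the server's virtuoso.ini can be read from disk and passed in place of SAMPLE_INI.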

Basic usage:

# With pipx (global installation)
virtuoso-bulk-load \
  -d /path/accessible/by/virtuoso/server \
  -k <your_virtuoso_password>

# With uv (development)
uv run python virtuoso_utilities/bulk_load.py \
  -d /path/accessible/by/virtuoso/server \
  -k <your_virtuoso_password>

With explicit connection options and recursive search:

virtuoso-bulk-load \
  -d /path/accessible/by/virtuoso/server \
  -k <your_virtuoso_password> \
  --host <virtuoso_host> \
  --port <virtuoso_port> \
  --user <virtuoso_user> \
  --recursive

With a Docker container:

# Example: Launch Virtuoso first
virtuoso-launch \
  --name my-virtuoso-loader \
  --isql-port 1112 \
  --data-dir ./virtuoso-loader-data \
  --mount-volume /home/user/my_rdf_data:/rdf_mount_in_container

# Then run the bulk loader
virtuoso-bulk-load \
  -d /rdf_mount_in_container \
  -k <your_virtuoso_password> \
  --port 1112 \
  --docker-container my-virtuoso-loader \
  --recursive
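
In the Docker case, the value given to -d is the container-side path, not the host path. The sketch below shows that translation for a mount written as HOST:CONTAINER, which is the form used in the --mount-volume example above. to_container_path is a hypothetical helper written for illustration only.

```python
from pathlib import PurePosixPath


def to_container_path(mount_spec, host_path):
    """Map a host path under the mounted directory to its path in the container.

    Raises ValueError if host_path is not inside the mounted host directory.
    """
    host_root, container_root = mount_spec.split(":")[:2]
    relative = PurePosixPath(host_path).relative_to(host_root)
    return str(PurePosixPath(container_root) / relative)


MOUNT = "/home/user/my_rdf_data:/rdf_mount_in_container"

# The mount root maps to the container directory passed to -d above.
print(to_container_path(MOUNT, "/home/user/my_rdf_data"))
# A subdirectory keeps its relative location inside the container.
print(to_container_path(MOUNT, "/home/user/my_rdf_data/dumps/2024"))
```

A host path outside the mounted directory raises ValueError, a sign that the files are not visible to the container at all.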

Use virtuoso-bulk-load --help to see all available options:

| Argument | Description |
| --- | --- |
| -d, --data-directory | Path where the script will search for .nq.gz files. See prerequisites for path requirements |
| -k, --password | Virtuoso dba user password |

| Argument | Description | Default |
| --- | --- | --- |
| -H, --host | Virtuoso server host | localhost |
| -P, --port | Virtuoso server ISQL port. Use the host port if mapped via Docker | 1111 |
| -u, --user | Virtuoso username | dba |
| --recursive | Search for .nq.gz files recursively (uses ld_dir_all instead of ld_dir) | false |
| --log-level | Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL | ERROR |

| Argument | Description |
| --- | --- |
| --docker-container | Name or ID of the running Virtuoso Docker container. If provided, isql will be run via docker exec |

The bulk_load function can be imported and called directly from Python code:

from virtuoso_utilities.bulk_load import bulk_load

bulk_load(
    data_directory="/path/to/nquads",
    password="dba",
    host="localhost",
    port=1111,
    recursive=True,
    docker_container="my-virtuoso",
)

The function parameters correspond to the CLI arguments described above.
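
When calling the function from a larger program, one option is to collect the keyword arguments in one place instead of hard-coding the password. The sketch below builds them from environment variables. It is an illustration only: settings_from_env and every VIRTUOSO_* variable name are hypothetical, the defaults mirror the CLI table above, and leaving out docker_container when unset assumes the parameter is optional in the function as it is on the command line.

```python
import os


def settings_from_env(env=os.environ):
    """Build keyword arguments for bulk_load from (hypothetical) variables."""
    settings = {
        "data_directory": env["VIRTUOSO_DATA_DIR"],
        "password": env["VIRTUOSO_PASSWORD"],
        "host": env.get("VIRTUOSO_HOST", "localhost"),
        "port": int(env.get("VIRTUOSO_PORT", "1111")),
        "recursive": env.get("VIRTUOSO_RECURSIVE", "").lower()
        in {"1", "true", "yes"},
    }
    container = env.get("VIRTUOSO_DOCKER_CONTAINER")
    if container:
        settings["docker_container"] = container
    return settings


# Illustrative values; a real run would read os.environ.
example = settings_from_env({
    "VIRTUOSO_DATA_DIR": "/path/to/nquads",
    "VIRTUOSO_PASSWORD": "dba",
    "VIRTUOSO_RECURSIVE": "true",
})
print(example)

# With the package installed, the result could be passed on as:
#   from virtuoso_utilities.bulk_load import bulk_load
#   bulk_load(**example)
```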