Bulk loader
This script sequentially loads gzipped N-Quads files (*.nq.gz) into a Virtuoso instance using the standard Virtuoso bulk loading procedure (ld_dir or ld_dir_all, followed by rdf_loader_run).
Performance note: why only .nq.gz?
This script only processes files ending in .nq.gz. This restriction is intentional and provides a significant performance advantage:
- Avoiding artificial throttling: By reading compressed data, the load on Virtuoso’s internal processing is more balanced. Loading many uncompressed files can sometimes overwhelm Virtuoso’s internal mechanisms, causing it to introduce artificial delays to manage the workload.
Therefore, for optimal bulk loading performance with Virtuoso’s ld_dir/rdf_loader_run mechanism, using .nq.gz files is strongly recommended.
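
Because files without the .nq.gz extension are skipped, plain .nq files have to be compressed before loading. The following is a minimal sketch using only the Python standard library; the helper name `compress_nquads` is hypothetical and not part of this package, and it leaves the original .nq files in place.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def compress_nquads(directory: str) -> List[Path]:
    """Gzip every plain .nq file in `directory` and return the new .nq.gz paths."""
    created = []
    for source in sorted(Path(directory).glob("*.nq")):
        target = source.with_name(source.name + ".gz")
        if target.exists():
            continue  # already compressed on an earlier run
        # Stream the file through gzip so large dumps are never held in memory.
        with source.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        created.append(target)
    return created
```

The resulting directory can then be passed to the loader with -d, subject to the path prerequisites described on this page.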
How it works
- It first registers files found in the specified directory using the ld_dir (or ld_dir_all for recursive loading) ISQL function, adding them to the DB.DBA.load_list queue
- Then, it executes the rdf_loader_run() ISQL function once to process this queue sequentially
- Progress and errors can be monitored by querying DB.DBA.load_list
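
The three steps above correspond roughly to the ISQL statements built below. This is a sketch of the standard Virtuoso procedure, not the script's actual source: the helper name `registration_sql` and the fallback graph URI are assumptions for illustration. N-Quads files carry their own graph names, so the third ld_dir argument is generally only used as a fallback.

```python
def registration_sql(
    data_directory: str,
    recursive: bool = False,
    fallback_graph: str = "http://localhost:8890/DAV",  # assumed placeholder
) -> str:
    """Return the ISQL statement that queues *.nq.gz files from a directory."""
    function = "ld_dir_all" if recursive else "ld_dir"
    # Double any single quote so the values stay valid SQL string literals.
    safe_dir = data_directory.replace("'", "''")
    safe_graph = fallback_graph.replace("'", "''")
    return f"{function}('{safe_dir}', '*.nq.gz', '{safe_graph}');"


# Step 2: process everything currently queued in DB.DBA.load_list.
RUN_LOADER_SQL = "rdf_loader_run();"

# Step 3: inspect the queue. ll_state is 0 (queued), 1 (in progress) or
# 2 (done); ll_error is non-NULL for files that failed to load.
MONITOR_SQL = "SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list;"
```

For example, `registration_sql("/rdf_mount_in_container", recursive=True)` produces an ld_dir_all call for that directory.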
Important prerequisites
- When using Docker (--docker-container): -d specifies the absolute path inside the container (e.g., /rdf_mount_in_container). Ensure this path corresponds to a mounted volume and is included in the container’s DirsAllowed configuration. Using virtuoso-launch with --mount-volume handles this automatically.
- When not using Docker: -d specifies the path on the host system accessible by the Virtuoso process. Ensure this host path is listed in the DirsAllowed setting of the server’s virtuoso.ini.
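
A quick way to verify the DirsAllowed prerequisite before starting a load is to parse virtuoso.ini and compare paths. The sketch below is illustrative only: the helper names are hypothetical, it assumes DirsAllowed lives in the [Parameters] section as a comma-separated list, and it assumes that subdirectories of an allowed entry are also accepted. Relative entries such as "." are not resolved against the server's working directory here.

```python
import configparser
from pathlib import PurePosixPath
from typing import List


def allowed_dirs(ini_text: str) -> List[str]:
    """Extract the DirsAllowed entries from the text of a virtuoso.ini file."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get("Parameters", "DirsAllowed", fallback="")
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def is_allowed(data_directory: str, allowed: List[str]) -> bool:
    """True if data_directory equals, or lies under, one of the allowed entries."""
    target = PurePosixPath(data_directory)
    return any(
        target == PurePosixPath(entry) or PurePosixPath(entry) in target.parents
        for entry in allowed
    )
```

With Docker, the same check applies to the virtuoso.ini and paths inside the container, not to those on the host.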
Basic usage (host Virtuoso)

    # With pipx (global installation)
    virtuoso-bulk-load \
      -d /path/accessible/by/virtuoso/server \
      -k <your_virtuoso_password>

    # With uv (development)
    uv run python virtuoso_utilities/bulk_load.py \
      -d /path/accessible/by/virtuoso/server \
      -k <your_virtuoso_password>

Customized usage (host Virtuoso)

    virtuoso-bulk-load \
      -d /path/accessible/by/virtuoso/server \
      -k <your_virtuoso_password> \
      --host <virtuoso_host> \
      --port <virtuoso_port> \
      --user <virtuoso_user> \
      --recursive

Usage with Docker

    # Example: Launch Virtuoso first
    virtuoso-launch \
      --name my-virtuoso-loader \
      --isql-port 1112 \
      --data-dir ./virtuoso-loader-data \
      --mount-volume /home/user/my_rdf_data:/rdf_mount_in_container

    # Then run the bulk loader
    virtuoso-bulk-load \
      -d /rdf_mount_in_container \
      -k <your_virtuoso_password> \
      --port 1112 \
      --docker-container my-virtuoso-loader \
      --recursive

Arguments
Use virtuoso-bulk-load --help to see all available options:
Required arguments
| Argument | Description |
|---|---|
| -d, --data-directory | Path where the script will search for .nq.gz files. See prerequisites for path requirements |
| -k, --password | Virtuoso dba user password |
Optional arguments
| Argument | Description | Default |
|---|---|---|
| -H, --host | Virtuoso server host | localhost |
| -P, --port | Virtuoso server ISQL port. Use the host port if mapped via Docker | 1111 |
| -u, --user | Virtuoso username | dba |
| --recursive | Search for .nq.gz files recursively (uses ld_dir_all instead of ld_dir) | false |
| --log-level | Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL | ERROR |
Docker options
| Argument | Description |
|---|---|
| --docker-container | Name or ID of the running Virtuoso Docker container. If provided, isql will be run via docker exec |
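
To illustrate what "run via docker exec" means in practice, the sketch below builds the argument list for a single isql call, optionally wrapped in docker exec. It is not the script's actual implementation: the function name `isql_command`, the `isql_path` parameter, and the assumption that an executable named isql is on the PATH (on the host or inside the container) are all hypothetical. Which host and port isql should target from inside a container depends on the container's configuration and is not settled here.

```python
from typing import List, Optional


def isql_command(
    sql: str,
    password: str,
    user: str = "dba",
    host: str = "localhost",
    port: int = 1111,
    docker_container: Optional[str] = None,
    isql_path: str = "isql",  # assumed to be resolvable on the PATH
) -> List[str]:
    """Build the argv list for running one SQL statement through isql."""
    command = [isql_path, f"{host}:{port}", user, password, f"exec={sql}"]
    if docker_container:
        # Run the same isql invocation inside the container instead of on the host.
        command = ["docker", "exec", docker_container, *command]
    return command
```

The returned list can be handed to `subprocess.run(..., check=True)`; passing a list rather than a shell string avoids quoting problems with the SQL statement.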
Programmatic usage
The bulk_load function can be imported and called directly from Python code:

    from virtuoso_utilities.bulk_load import bulk_load

    bulk_load(
        data_directory="/path/to/nquads",
        password="dba",
        host="localhost",
        port=1111,
        recursive=True,
        docker_container="my-virtuoso",
    )

The function parameters correspond to the CLI arguments described above.