Extract subset
Extracts a subset of RDF data from a SPARQL endpoint. The starting entities are either the instances of a specified class or the URIs listed in a file; from there, the script recursively follows URI references. The result is written in N-Quads or N-Triples format.
uv run python -m oc_meta.run.migration.extract_subset [options]
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `--endpoint` | No | http://localhost:8890/sparql | SPARQL endpoint URL |
| `--class` | No | http://purl.org/spar/fabio/Expression | Class URI to extract instances of (mutually exclusive with `--entities-file`) |
| `--entities-file` | No | - | File with entity URIs to extract, one per line (mutually exclusive with `--class`) |
| `--limit` | No | 1000 | Maximum number of initial entities |
| `--output` | No | output.nq | Output file name |
| `--compress` | No | False | Compress output with gzip |
| `--retries` | No | 5 | Maximum retries for failed queries |
| `--no-graphs` | No | False | Disable named graph queries and output N-Triples instead of N-Quads |
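
To make the relationship between the options concrete, here is a minimal sketch of how a command line with these flags and defaults could be declared with `argparse`. It is an illustration only, not the script's actual source: the function name `build_parser` and the destination name `class_uri` are hypothetical, and the real script may validate its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="extract_subset")
    parser.add_argument("--endpoint", default="http://localhost:8890/sparql")

    # --class and --entities-file are documented as mutually exclusive.
    source = parser.add_mutually_exclusive_group()
    source.add_argument(
        "--class",
        dest="class_uri",  # "class" is a Python keyword, so use another name
        default="http://purl.org/spar/fabio/Expression",
    )
    source.add_argument("--entities-file", default=None)

    parser.add_argument("--limit", type=int, default=1000)
    parser.add_argument("--output", default="output.nq")
    parser.add_argument("--compress", action="store_true")
    parser.add_argument("--retries", type=int, default=5)
    parser.add_argument("--no-graphs", action="store_true")
    return parser
```

In this sketch, passing both `--class` and `--entities-file` makes `argparse` exit with a usage error, while omitting both falls back to the default class URI.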
Process
- Discovers entities by querying instances of a class, or loads them from a file
- For each entity, fetches all triples (or quads) where it appears as subject
- Recursively processes any URI found as object
- Serializes the collected data as N-Quads (default) or N-Triples (`--no-graphs`)
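
The steps above can be sketched as a worklist traversal. This is a simplified illustration under stated assumptions, not the script's implementation: `Statement`, `build_query`, and `collect_subset` are hypothetical names, the query text is only a guess at what a subject lookup might look like, and an explicit queue with a visited set stands in for recursion so that cyclic references terminate. The `fetch` callable represents whatever code sends the query to the endpoint.

```python
from collections import deque
from typing import Callable, Iterable, List, NamedTuple, Optional


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_uri: bool       # False for literals
    graph: Optional[str]   # None when named graphs are disabled


def build_query(subject: str, use_graphs: bool = True) -> str:
    """Assumed query shape: all statements with `subject` as subject."""
    if use_graphs:
        return f"SELECT ?p ?o ?g WHERE {{ GRAPH ?g {{ <{subject}> ?p ?o }} }}"
    return f"SELECT ?p ?o WHERE {{ <{subject}> ?p ?o }}"


def collect_subset(
    seeds: Iterable[str],
    fetch: Callable[[str], Iterable[Statement]],
) -> List[Statement]:
    """Collect statements for the seeds and every URI reachable from them."""
    visited = set()
    queue = deque(seeds)
    collected: List[Statement] = []
    while queue:
        entity = queue.popleft()
        if entity in visited:
            continue  # already processed, avoids loops on cyclic data
        visited.add(entity)
        for statement in fetch(entity):
            collected.append(statement)
            # Follow URI objects; literals are kept but not expanded.
            if statement.obj_is_uri and statement.obj not in visited:
                queue.append(statement.obj)
    return collected
```

For a quick check without an endpoint, `fetch` can be backed by a dictionary, for example `collect_subset(["http://example.org/br/1"], lambda s: store.get(s, []))` where `store` maps subject URIs to lists of `Statement` values.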
Example
Extract 500 bibliographic resources with their related entities:
uv run python -m oc_meta.run.migration.extract_subset \
  --endpoint http://localhost:8890/sparql \
  --class http://purl.org/spar/fabio/Expression \
  --limit 500 \
  --output subset.nq.gz \
  --compress
Use cases
- Create test datasets from production data
- Extract samples for debugging
- Migrate specific portions of a triplestore
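
When an extracted file is used as a test dataset or debugging sample, a quick sanity check is to count the statements it contains. The helper below is a small sketch for that purpose; `count_statements` is a hypothetical name and not part of the script. It assumes one statement per line, as in N-Quads and N-Triples, and treats a `.gz` suffix as a gzip-compressed file.

```python
import gzip


def count_statements(path: str) -> int:
    """Count non-empty, non-comment lines in an N-Quads or N-Triples file."""
    # Pick the opener from the file suffix; both accept text mode "rt".
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        )
```

Calling `count_statements("subset.nq.gz")` on the file produced by the example command would return the number of quads it holds.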