Storing data
The Storer class writes the contents of a GraphSet, ProvSet or MetadataSet to files or uploads them to a SPARQL endpoint.
Basic usage
    from oc_ocdm.graph import GraphSet
    from oc_ocdm import Storer

    base_iri = "https://w3id.org/oc/meta/"
    resp_agent = "https://w3id.org/oc/meta/prov/pa/1"

    g_set = GraphSet(base_iri)
    br = g_set.add_br(resp_agent)
    br.has_title("OpenCitations Meta")

    storer = Storer(g_set)
    storer.store_graphs_in_file("output.jsonld")

Constructor parameters
Storer takes several optional parameters that control how output is organized:

    storer = Storer(
        g_set,
        output_format="json-ld",
        dir_split=0,
        n_file_item=1,
        default_dir="_",
        zip_output=False,
        context_map=None,
        modified_entities=None
    )

output_format: the serialization format. Accepted values: json-ld (default), nt, nt11, ntriples, application/n-triples, nquads, application/n-quads.
dir_split: when using store_all(), this controls how entity files are distributed across subdirectories. For example, a value of 10000 means entities 1 through 10000 go in one directory, 10001 through 20000 in the next, and so on. Defaults to 0 (everything in a single directory).
n_file_item: the number of entities per output file when using store_all(). Defaults to 1.
default_dir: when store_all() organizes files, it groups them into subdirectories named after the entity’s supplier prefix (the 060 in https://w3id.org/oc/meta/br/0601). Entities whose IRI has no supplier prefix (e.g. https://w3id.org/oc/meta/br/1) use default_dir as the subdirectory name instead. Defaults to "_". See Supplier prefixes for how prefixes work.
zip_output: if True, output files are compressed as ZIP archives.
context_map: maps JSON-LD @context URLs to local file paths. During serialization, rdflib uses the local copy to produce compact JSON-LD without fetching the URL over the network. The output file still references the original URL. See Context maps for details.
modified_entities: an optional set of entity IRIs. When provided, only entities in this set are stored; others are skipped.
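The interplay of dir_split, n_file_item and default_dir can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names (bucket_upper_bound, sketch_relative_path), the IRI regular expression, the choice to label each bucket by its upper bound, and the resulting path layout are all assumptions made to be consistent with the parameter descriptions above. File extensions are left out.

```python
import re
from pathlib import PurePosixPath

# Assumed IRI shape: <base>/<two-letter type>/<optional supplier prefix><number>.
# The supplier prefix is assumed to look like "0" + non-zero digits + "0" (e.g. 060).
IRI_PATTERN = re.compile(r"^.+/([a-z]{2})/(0[1-9]+0)?([1-9][0-9]*)$")


def bucket_upper_bound(number: int, size: int) -> int:
    """Return the upper bound of the size-wide range that contains number."""
    return ((number - 1) // size + 1) * size


def sketch_relative_path(iri: str, dir_split: int = 0, n_file_item: int = 1,
                         default_dir: str = "_") -> PurePosixPath:
    """Hypothetical layout: type / prefix-or-default / dir bucket / file bucket."""
    match = IRI_PATTERN.match(iri)
    if match is None:
        raise ValueError(f"Unrecognized entity IRI: {iri}")
    short_name, prefix, number = match.group(1), match.group(2), int(match.group(3))

    # Entities without a supplier prefix fall back to default_dir.
    parts = [short_name, prefix or default_dir]
    # dir_split == 0 means no extra directory level.
    if dir_split > 0:
        parts.append(str(bucket_upper_bound(number, dir_split)))
    # Each file holds up to n_file_item entities.
    parts.append(str(bucket_upper_bound(number, n_file_item)))
    return PurePosixPath(*parts)


print(sketch_relative_path("https://w3id.org/oc/meta/br/0601",
                           dir_split=10000, n_file_item=1000))
print(sketch_relative_path("https://w3id.org/oc/meta/br/1"))
```

Under these assumptions, entity 10001 with dir_split=10000 falls into the second directory bucket, which matches the "1 through 10000, 10001 through 20000" example given for dir_split.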
Storing to files
Two methods are available.
store_graphs_in_file() writes all entities in the set to a single file:
    storer.store_graphs_in_file("output.jsonld")

Pass a context_path to embed a JSON-LD context:

    storer.store_graphs_in_file("output.jsonld", context_path="https://example.com/context.json")

store_all() distributes entities across a directory hierarchy following the OCDM file organization convention. It returns the list of file paths that were written:

    written_files = storer.store_all(
        base_dir="/data/rdf",
        base_iri=base_iri
    )

The optional process_id parameter appends a suffix to file paths, useful for parallel processing to avoid file conflicts.
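To see why a per-process suffix avoids conflicts, consider the following sketch. It does not reproduce the library's code: with_process_suffix is a hypothetical helper, and placing the suffix before the file extension with an underscore separator is an assumption.

```python
from pathlib import PurePosixPath


def with_process_suffix(file_path: str, process_id: str) -> str:
    """Hypothetical helper: insert "_<process_id>" before the file extension."""
    path = PurePosixPath(file_path)
    return str(path.with_name(f"{path.stem}_{process_id}{path.suffix}"))


# Two workers targeting the same logical file end up with distinct paths.
target = "/data/rdf/br/060/10000/1000.json"
print(with_process_suffix(target, "1"))
print(with_process_suffix(target, "2"))
```

Because each worker writes to its own file, no two processes open the same path for writing. Merging the per-process files afterwards is a separate step that this sketch does not cover.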
Uploading to a SPARQL endpoint
upload_all() computes the SPARQL UPDATE queries for all entities in the set (based on the diff between their current and preexisting state) and sends them to the endpoint:

    storer.upload_all("https://opencitations.net/meta/sparql")

Queries are sent in batches. The default batch size is 10; adjust it with the batch_size parameter:

    storer.upload_all("https://opencitations.net/meta/sparql", batch_size=50)

To save the generated SPARQL queries to disk instead of executing them, pass save_queries=True and a base_dir:

    storer.upload_all(
        "https://opencitations.net/meta/sparql",
        base_dir="/data/queries",
        save_queries=True
    )

upload() uploads a single entity:

    storer.upload(br, "https://opencitations.net/meta/sparql")

execute_query() runs an arbitrary SPARQL UPDATE query:

    storer.execute_query(
        "DELETE DATA { <https://w3id.org/oc/meta/br/0605> <http://purl.org/dc/terms/title> 'Old title' }",
        "https://opencitations.net/meta/sparql"
    )

Storing provenance and metadata
Provenance and metadata sets work the same way. Create a Storer for each set:

    from oc_ocdm.prov import ProvSet
    from oc_ocdm.metadata import MetadataSet

    prov_set = ProvSet(g_set, base_iri)
    prov_set.generate_provenance()

    prov_storer = Storer(prov_set)
    prov_storer.store_graphs_in_file("provenance.jsonld")

    meta_set = MetadataSet(base_iri)
    dataset = meta_set.add_dataset("OpenCitations Meta", resp_agent)
    dataset.has_modification_date("2024-03-01T00:00:00")

    meta_storer = Storer(meta_set)
    meta_storer.store_graphs_in_file("metadata.jsonld")