time_agnostic_library package

Submodules

time_agnostic_library.agnostic_entity module

class time_agnostic_library.agnostic_entity.AgnosticEntity(res: str, config: dict, include_related_objects: bool = False, include_merged_entities: bool = False, include_reverse_relations: bool = False)

Bases: object

The entity of which you want to materialize one or all versions, based on the provenance snapshots available for that entity.

Parameters:

res (str) – The URI of the entity
config (dict) – The configuration dictionary.
include_related_objects (bool, optional) – True, if you also want to return information on related entities, those that appear as objects in triples whose subject is res, recursively; False otherwise.
Default: False
include_merged_entities (bool, optional) – True, if you also want to return information on entities that were merged into the current entity, False otherwise.
Default: False
include_reverse_relations (bool, optional) – True, if you also want to return information on entities that have the current entity as an object, recursively; False otherwise.
Default: False

get_history(include_prov_metadata: bool = False) → Tuple[Dict[str, Dict[str, Graph]], Dict[str, Dict[str, Dict[str, str]]]]

It materializes all versions of an entity. If any of the include_* parameters are True, it also materializes all versions of related entities based on the configured parameters:

- include_related_objects: entities that have res as subject (recursively)
- include_merged_entities: entities that were merged into res
- include_reverse_relations: entities that have res as object (recursively)

If include_prov_metadata is True, the provenance metadata of the returned entity/entities is also returned.

The output is a tuple: the first element is a dictionary that maps the URI of res to its merged graphs, indexed by timestamp; the second element is a dictionary containing the provenance metadata, if requested.

The output has the following format:

(
    {
        RES_URI: {
            TIME_1: ENTITY_GRAPH_AT_TIME_1,
            TIME_2: ENTITY_GRAPH_AT_TIME_2
        }
    },
    {
        RES_URI: {
            SNAPSHOT_URI_AT_TIME_1: {
                'generatedAtTime': GENERATION_TIME,
                'wasAttributedTo': ATTRIBUTION,
                'hadPrimarySource': PRIMARY_SOURCE
            },
            SNAPSHOT_URI_AT_TIME_2: {
                'generatedAtTime': GENERATION_TIME,
                'wasAttributedTo': ATTRIBUTION,
                'hadPrimarySource': PRIMARY_SOURCE
            }
        }
    }
)

Returns: Tuple[dict, Union[dict, None]] – The output is always a two-element tuple. The first is a dictionary containing all the versions of a given resource. The second is a dictionary containing all the provenance metadata linked to that resource if include_prov_metadata is True, or None otherwise.
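Because both elements of the tuple are plain nested dictionaries, the output can be post-processed with standard Python. The following is a minimal sketch, assuming ISO 8601 timestamps with explicit UTC offsets (parsable by `datetime.fromisoformat`) and using strings in place of the rdflib `Graph` objects the method actually returns; the helper names `versions_in_order` and `snapshots_in_order` and all sample URIs are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

# Assumed usage with the library (not executed here; needs configured sources):
#   history, prov_metadata = AgnosticEntity(RES, config).get_history(include_prov_metadata=True)

RES = "https://example.org/br/1"  # hypothetical entity URI


def versions_in_order(history: Dict[str, Dict[str, Any]], res: str) -> List[Tuple[str, Any]]:
    """Return the (timestamp, graph) pairs of res, oldest first."""
    versions = history.get(res, {})
    # Compare parsed datetimes, not raw strings, so different offsets sort correctly.
    return sorted(versions.items(), key=lambda item: datetime.fromisoformat(item[0]))


def snapshots_in_order(
    prov_metadata: Optional[Dict[str, Dict[str, Dict[str, str]]]], res: str
) -> List[Tuple[str, str, str]]:
    """Return (snapshot URI, generatedAtTime, wasAttributedTo) for res, oldest first."""
    if prov_metadata is None:  # include_prov_metadata was False
        return []
    rows = [
        (uri, meta["generatedAtTime"], meta["wasAttributedTo"])
        for uri, meta in prov_metadata.get(res, {}).items()
    ]
    return sorted(rows, key=lambda row: datetime.fromisoformat(row[1]))


# Hypothetical sample shaped like the documented output.
history = {
    RES: {
        "2021-06-01T10:00:00+00:00": "<graph v2>",
        "2021-05-01T10:00:00+00:00": "<graph v1>",
    }
}
prov_metadata = {
    RES: {
        RES + "/prov/se/2": {
            "generatedAtTime": "2021-06-01T10:00:00+00:00",
            "wasAttributedTo": "https://example.org/agent/2",
            "hadPrimarySource": "https://example.org/source",
        },
        RES + "/prov/se/1": {
            "generatedAtTime": "2021-05-01T10:00:00+00:00",
            "wasAttributedTo": "https://example.org/agent/1",
            "hadPrimarySource": "https://example.org/source",
        },
    }
}
```

Sorting on parsed datetimes rather than on the raw keys avoids surprises when timestamps carry different UTC offsets.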

get_state_at_time(time: Tuple[str | None], include_prov_metadata: bool = False) → Tuple[Graph, dict, dict | None]

Given a time interval, the method returns the states of the resource (and, optionally, of its related entities) within that interval, the metadata of the returned snapshots and, optionally, the hooks to the previous and subsequent snapshots.

Related entities are included based on the configured parameters:

- include_related_objects: entities that have res as subject (recursively)
- include_merged_entities: entities that were merged into res
- include_reverse_relations: entities that have res as object (recursively)

The output has the following format:

(
    {
        ENTITY_URI_1: {
            TIME_1: GRAPH_AT_TIME_1,
            TIME_2: GRAPH_AT_TIME_2
        },
        ENTITY_URI_2: {
            TIME_1: GRAPH_AT_TIME_1,
            TIME_2: GRAPH_AT_TIME_2
        }
    },
    {
        ENTITY_URI_1: {
            SNAPSHOT_URI_AT_TIME_1: METADATA,
            SNAPSHOT_URI_AT_TIME_2: METADATA
        },
        ENTITY_URI_2: {
            SNAPSHOT_URI_AT_TIME_1: METADATA,
            SNAPSHOT_URI_AT_TIME_2: METADATA
        }
    },
    {
        ENTITY_URI_1: {
            OTHER_SNAPSHOT_URI_1: METADATA,
            OTHER_SNAPSHOT_URI_2: METADATA
        },
        ENTITY_URI_2: {
            OTHER_SNAPSHOT_URI_1: METADATA,
            OTHER_SNAPSHOT_URI_2: METADATA
        }
    }
)

Parameters:

time (Tuple[Union[str, None]]) – A time interval, in the form (START, END). If one of the two values is None, only the other is considered. Times can be expressed in any standard date and time format.
include_prov_metadata (bool, optional) – If True, the metadata of the previous and subsequent snapshots (the hooks) is also returned.
Default: False
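The (START, END) convention, where a None bound leaves that side of the interval open, can be illustrated with a simple membership test. This is a sketch only, assuming ISO 8601 timestamps with explicit UTC offsets and inclusive bounds; the library accepts other formats as well, and the helper name `in_interval` is hypothetical.

```python
from datetime import datetime
from typing import Optional, Tuple


def in_interval(timestamp: str, interval: Tuple[Optional[str], Optional[str]]) -> bool:
    """Check whether timestamp falls within (START, END).

    A None bound is ignored, so (START, None) means "from START on" and
    (None, END) means "up to END". Inclusive bounds are an assumption of
    this sketch.
    """
    start, end = interval
    moment = datetime.fromisoformat(timestamp)
    if start is not None and moment < datetime.fromisoformat(start):
        return False
    if end is not None and moment > datetime.fromisoformat(end):
        return False
    return True
```

For instance, `in_interval("2021-05-01T00:00:00+00:00", ("2021-01-01T00:00:00+00:00", None))` is True, because only the start bound is checked.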

Returns:

Tuple[Dict[str, Dict[str, Graph]], Dict[str, Dict[str, Dict[str, str]]], Union[Dict[str, Dict[str, Dict[str, str]]], None]] – The method returns a tuple of three elements: the first is a dictionary mapping entity URIs to their graphs at timestamps within the specified interval; the second contains the metadata of the returned snapshots; if the include_prov_metadata parameter is True, the third element is the metadata of the other snapshots (the hooks), otherwise an empty dictionary.
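A common follow-up is to keep, for each entity, only its most recent state inside the interval. The sketch below assumes ISO 8601 timestamp keys and uses strings in place of rdflib graphs; `latest_states` and the sample URIs are hypothetical, and `entity` in the comment stands for an AgnosticEntity instance.

```python
from datetime import datetime
from typing import Any, Dict, Tuple

# Assumed usage with the library (not executed here):
#   states, returned_meta, other_meta = entity.get_state_at_time((START, END), include_prov_metadata=True)


def latest_states(states: Dict[str, Dict[str, Any]]) -> Dict[str, Tuple[str, Any]]:
    """For each entity URI, keep the most recent (timestamp, graph) pair in the interval."""
    latest: Dict[str, Tuple[str, Any]] = {}
    for entity_uri, by_time in states.items():
        if not by_time:
            continue  # no state of this entity falls within the interval
        newest = max(by_time, key=datetime.fromisoformat)
        latest[entity_uri] = (newest, by_time[newest])
    return latest


# Hypothetical first element of the returned tuple; strings stand in for graphs.
states = {
    "https://example.org/br/1": {
        "2021-05-01T10:00:00+00:00": "<br/1 v1>",
        "2021-06-01T10:00:00+00:00": "<br/1 v2>",
    },
    "https://example.org/ra/1": {
        "2021-05-15T10:00:00+00:00": "<ra/1 v1>",
    },
}
```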

time_agnostic_library.agnostic_query module

class time_agnostic_library.agnostic_query.AgnosticQuery(query: str, on_time: Tuple[str | None] = (None, None), other_snapshots: bool = False, config_path: str = './config.json', config_dict: dict | None = None)

Bases: object

get_full_text_search(uris_in_triple: set) → str

class time_agnostic_library.agnostic_query.DeltaQuery(query: str, on_time: Tuple[str | None] = (), changed_properties: Set[str] = {}, config_path: str = './config.json', config_dict=None)

Bases: AgnosticQuery

This class allows single-time and cross-time structured delta queries.

Parameters:

query (str) – A SPARQL query string, used to identify the entities whose changes you want to investigate.
on_time (Tuple[Union[str, None]], optional) – If you want to query specific snapshots, specify the time interval here, in the form (START, END). If one of the two values is None, only the other is considered. Times can be expressed in any standard date and time format.
Default: ()
changed_properties (Set[str], optional) – A set of properties. It restricts the results to those entities in which the specified properties have changed.
Default: {}
config_path (str, optional) – The path to the configuration file.
Default: './config.json'

run_agnostic_query() → Tuple[Dict[str, Dict[str, str]], dict]

Queries the deltas relevant to the query and to the specified properties within the given time interval. If no property was specified, any change is considered. If no time interval was specified, the whole history of the dataset is considered. The output has the following format:

{
    RES_URI_1: {
        "created": TIMESTAMP_CREATION,
        "modified": {
            TIMESTAMP_1: UPDATE_QUERY_1,
            TIMESTAMP_2: UPDATE_QUERY_2,
            TIMESTAMP_N: UPDATE_QUERY_N
        },
        "deleted": TIMESTAMP_DELETION
    },
    RES_URI_2: {
        "created": TIMESTAMP_CREATION,
        "modified": {
            TIMESTAMP_1: UPDATE_QUERY_1,
            TIMESTAMP_2: UPDATE_QUERY_2,
            TIMESTAMP_N: UPDATE_QUERY_N
        },
        "deleted": TIMESTAMP_DELETION
    },
    RES_URI_N: {
        "created": TIMESTAMP_CREATION,
        "modified": {
            TIMESTAMP_1: UPDATE_QUERY_1,
            TIMESTAMP_2: UPDATE_QUERY_2,
            TIMESTAMP_N: UPDATE_QUERY_N
        },
        "deleted": TIMESTAMP_DELETION
    },              
}            

Returns: Tuple[Dict[str, Dict[str, str]], dict] – The first element of the returned tuple is a dictionary that reports the modified entities and when they were created, modified, and deleted. Changes are reported as SPARQL UPDATE queries. If an entity was not created or deleted within the specified interval, the “created” or “deleted” value is None. If the entity does not exist within the interval, the “modified” value is an empty dictionary.
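The None conventions for "created" and "deleted" make it easy to classify each entity's lifecycle within the queried interval. The following sketch works only on the documented dictionary shape; `summarize_deltas` and the sample URIs and update query are hypothetical.

```python
from typing import Any, Dict


def summarize_deltas(deltas: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Describe, for each entity, what happened to it within the queried interval."""
    summary: Dict[str, str] = {}
    for uri, delta in deltas.items():
        created = delta.get("created")
        deleted = delta.get("deleted")
        updates = len(delta.get("modified") or {})
        if created is not None and deleted is not None:
            label = "created and deleted"
        elif created is not None:
            label = "created"
        elif deleted is not None:
            label = "deleted"
        else:
            # "created"/"deleted" are None when those events fall outside the interval.
            label = "neither created nor deleted"
        summary[uri] = f"{label}, {updates} update(s)"
    return summary


# Hypothetical first element of run_agnostic_query()'s output.
deltas = {
    "https://example.org/br/1": {
        "created": "2021-05-01T10:00:00+00:00",
        "modified": {
            "2021-06-01T10:00:00+00:00": "INSERT DATA { <https://example.org/br/1> <http://purl.org/dc/terms/title> 'New title' . }",
        },
        "deleted": None,
    },
    "https://example.org/br/2": {"created": None, "modified": {}, "deleted": None},
}
```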

class time_agnostic_library.agnostic_query.VersionQuery(query: str, on_time: Tuple[str | None] = '', other_snapshots=False, config_path: str = './config.json', config_dict=None)

Bases: AgnosticQuery

This class allows time-travel queries, both on a single version and on all versions of the dataset.

Parameters:

query (str) – The SPARQL query string.
on_time (Tuple[Union[str, None]], optional) – If you want to query a specific version, specify the time interval here, in the form (START, END). If one of the two values is None, only the other is considered. Times can be expressed in any standard date and time format.
Default: ''
config_path (str, optional) – The path to the configuration file.
Default: './config.json'

run_agnostic_query() → Tuple[Dict[str, Set[Tuple]], dict]

Runs the given query as a time-travel query. If on_time was specified, the query runs on the versions within that interval; otherwise, it runs on all versions.

Returns: Tuple[Dict[str, Set[Tuple]], dict] – The first element of the returned tuple is a dictionary whose keys are the snapshots relevant to the query. Each value is a set of tuples containing the query results at the time given by the key; the position of each element in a tuple follows the order of the variables indicated in the query.
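Since the tuples are positional, pairing each value with its variable name often makes per-snapshot results easier to read. This is a minimal sketch on the documented shape; it assumes the caller supplies the variable names in the same order as the query's SELECT clause, and `rows_as_dicts` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def rows_as_dicts(
    results: Dict[str, Set[Tuple[Optional[str], ...]]], variables: Sequence[str]
) -> Dict[str, List[Dict[str, Optional[str]]]]:
    """Pair each tuple position with its query variable, snapshot by snapshot."""
    named: Dict[str, List[Dict[str, Optional[str]]]] = {}
    for snapshot, rows in results.items():
        # Sets are unordered, so sort rows for a stable presentation (None sorts as "").
        ordered = sorted(rows, key=lambda row: tuple("" if v is None else v for v in row))
        named[snapshot] = [dict(zip(variables, row)) for row in ordered]
    return named


# Hypothetical output for: SELECT ?br ?title WHERE { ?br a fabio:Expression . OPTIONAL { ?br dcterms:title ?title } }
results = {
    "2021-05-01T10:00:00+00:00": {("https://example.org/br/1", "Old title")},
    "2021-06-01T10:00:00+00:00": {
        ("https://example.org/br/1", "New title"),
        ("https://example.org/br/2", None),
    },
}
```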

time_agnostic_library.agnostic_query.get_insert_query(graph_iri: URIRef, data: Graph) → Tuple[str, int]

time_agnostic_library.prov_entity module

class time_agnostic_library.prov_entity.ProvEntity

Bases: object

Snapshot of entity metadata: a particular snapshot recording the metadata associated with an individual entity (either a bibliographic entity or an identifier) at a particular date and time, including the agent (such as a person, organisation, or automated process) that created or modified the entity metadata.

DCTERMS: ClassVar[Namespace] = Namespace('http://purl.org/dc/terms/')

OCO: ClassVar[Namespace] = Namespace('https://w3id.org/oc/ontology/')

PROV: ClassVar[Namespace] = Namespace('http://www.w3.org/ns/prov#')

classmethod get_prov_properties()

iri_description: ClassVar[URIRef] = rdflib.term.URIRef('http://purl.org/dc/terms/description')

iri_entity: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/ns/prov#Entity')

iri_generated_at_time: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/ns/prov#generatedAtTime')

iri_had_primary_source: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/ns/prov#hadPrimarySource')

iri_has_update_query: ClassVar[URIRef] = rdflib.term.URIRef('https://w3id.org/oc/ontology/hasUpdateQuery')

iri_invalidated_at_time: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/ns/prov#invalidatedAtTime')

iri_specialization_of: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/ns/prov#specializationOf')

iri_was_attributed_to: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/ns/prov#wasAttributedTo')

iri_was_derived_from: ClassVar[URIRef] = rdflib.term.URIRef('http://www.w3.org/ns/prov#wasDerivedFrom')
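The IRIs above are enough to write a provenance query by hand. The sketch below builds a SPARQL SELECT that lists the snapshots of an entity with their metadata, oldest first. It assumes snapshots are linked to the entity through prov:specializationOf, as the iri_specialization_of attribute suggests; it is not the query the library issues internally, and `snapshots_query` is a hypothetical name.

```python
PROV = "http://www.w3.org/ns/prov#"
OCO = "https://w3id.org/oc/ontology/"

# IRIs copied from the ProvEntity attributes documented above.
IRI_SPECIALIZATION_OF = PROV + "specializationOf"
IRI_GENERATED_AT_TIME = PROV + "generatedAtTime"
IRI_INVALIDATED_AT_TIME = PROV + "invalidatedAtTime"
IRI_WAS_ATTRIBUTED_TO = PROV + "wasAttributedTo"
IRI_HAS_UPDATE_QUERY = OCO + "hasUpdateQuery"


def snapshots_query(entity_uri: str) -> str:
    """Build a SPARQL SELECT listing the snapshots of entity_uri, oldest first."""
    # Refuse characters that would break the <...> IRI syntax when interpolated.
    if any(ch in entity_uri for ch in '<>" '):
        raise ValueError("not a valid IRI for interpolation: " + entity_uri)
    return f"""SELECT ?snapshot ?generated ?invalidated ?agent ?update
WHERE {{
    ?snapshot <{IRI_SPECIALIZATION_OF}> <{entity_uri}> ;
              <{IRI_GENERATED_AT_TIME}> ?generated .
    OPTIONAL {{ ?snapshot <{IRI_INVALIDATED_AT_TIME}> ?invalidated . }}
    OPTIONAL {{ ?snapshot <{IRI_WAS_ATTRIBUTED_TO}> ?agent . }}
    OPTIONAL {{ ?snapshot <{IRI_HAS_UPDATE_QUERY}> ?update . }}
}}
ORDER BY ?generated"""
```

Invalidation time, attribution and update query are wrapped in OPTIONAL because not every snapshot necessarily records them.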

time_agnostic_library.sparql module

class time_agnostic_library.sparql.Sparql(query: str, config: dict)

Bases: object

The Sparql class handles SPARQL queries. It is instantiated with the query string and a configuration dictionary; the configuration is usually kept in a JSON file, whose default location is “./config.json”. The configuration must contain information on the sources to be queried. There are two types of sources, dataset and provenance, and they must be specified separately. Both triplestores and JSON files are supported. In addition, some optional values can be set to make execution faster and more efficient:

blazegraph_full_text_search: Specify an affirmative Boolean value if Blazegraph was used as a triplestore, and a textual index was built to speed up queries. For more information, see https://github.com/blazegraph/database/wiki/Rebuild_Text_Index_Procedure. The allowed values are “true”, “1”, 1, “t”, “y”, “yes”, “ok”, or “false”, “0”, 0, “n”, “f”, “no”.
graphdb_connector_name: Specify the name of the Lucene connector if GraphDB was used as a triplestore and a textual index was built to speed up queries. For more information, see https://graphdb.ontotext.com/documentation/free/general-full-text-search-with-connectors.html.
cache_triplestore_url: Specify the URL of a triplestore to use as a cache to make queries faster. If your triplestore uses different endpoints for reading and writing (e.g. GraphDB), specify the endpoint for reading in the “endpoint” field and the endpoint for writing in the “update_endpoint” field. If there is only one endpoint (e.g. Blazegraph), specify it in both fields.
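The accepted spellings for blazegraph_full_text_search listed above can be normalized to a Boolean as in the sketch below. It is a minimal illustration, not the library's own parser; the function name `parse_flag` is hypothetical, and the case-insensitive comparison is an assumption.

```python
from typing import Union

# Spellings listed in the documentation for blazegraph_full_text_search.
TRUE_VALUES = {"true", "1", "t", "y", "yes", "ok"}
FALSE_VALUES = {"false", "0", "n", "f", "no"}


def parse_flag(value: Union[str, int]) -> bool:
    """Map an allowed flag value (a string or the integers 1/0) to a bool."""
    # str() turns the integers 1 and 0 into "1" and "0"; lower() assumes case-insensitivity.
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"unsupported flag value: {value!r}")
```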

Here is an example of the configuration file content:

{
    "dataset": {
        "triplestore_urls": ["http://127.0.0.1:7200/repositories/data"],
        "file_paths": []
    },
    "provenance": {
        "triplestore_urls": [],
        "file_paths": ["./prov.json"]
    },
    "blazegraph_full_text_search": "no",
    "graphdb_connector_name": "fts",
    "cache_triplestore_url": {
        "endpoint": "http://127.0.0.1:7200/repositories/cache",
        "update_endpoint": "http://127.0.0.1:7200/repositories/cache/statements"
    }
}            
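Reading such a file needs only the standard library. The sketch below loads a configuration and performs a basic sanity check on the two source sections; the rule that each section must list at least one triplestore URL or file path is an assumption of this sketch, and `load_config` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Read a JSON configuration and check that both source sections are usable."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    for section in ("dataset", "provenance"):
        source = config.get(section)
        if not isinstance(source, dict):
            raise ValueError(f"missing '{section}' section")
        # Assumption: each section needs at least one triplestore URL or file path.
        if not (source.get("triplestore_urls") or source.get("file_paths")):
            raise ValueError(f"'{section}' has no triplestore URL or file path")
    return config


# The example configuration shown above, round-tripped through a temporary file.
example = {
    "dataset": {"triplestore_urls": ["http://127.0.0.1:7200/repositories/data"], "file_paths": []},
    "provenance": {"triplestore_urls": [], "file_paths": ["./prov.json"]},
    "blazegraph_full_text_search": "no",
    "graphdb_connector_name": "fts",
    "cache_triplestore_url": {
        "endpoint": "http://127.0.0.1:7200/repositories/cache",
        "update_endpoint": "http://127.0.0.1:7200/repositories/cache/statements",
    },
}
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    config_file.write_text(json.dumps(example, indent=4), encoding="utf-8")
    loaded = load_config(str(config_file))
```

A dictionary loaded this way has the same shape as the example, so it could presumably be passed as the config argument of AgnosticEntity or Sparql.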

Parameters:

query (str) – The SPARQL query to run.
config (dict) – The configuration dictionary, structured as in the example above.

run_ask_query() → bool

run_construct_query() → Dataset

Given a CONSTRUCT query, it returns the results in a Dataset.

Returns: Dataset – A Dataset containing the results of the query.

run_select_query() → Set[Tuple]

Given a SELECT query, it returns the results in a set of tuples.

Returns: Set[Tuple] – A set of tuples in which the position of each element follows the order of the variables indicated in the query.
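The tuple ordering described above can be illustrated with a converter from the W3C SPARQL 1.1 Query Results JSON format, which SPARQL endpoints commonly return for SELECT queries. This is a sketch of the idea under that assumption, not the code run_select_query uses; `bindings_to_tuples` and the sample response are hypothetical.

```python
from typing import Any, Dict, Optional, Set, Tuple


def bindings_to_tuples(results: Dict[str, Any]) -> Set[Tuple[Optional[str], ...]]:
    """Convert SPARQL JSON results into a set of tuples ordered like head.vars."""
    variables = results["head"]["vars"]
    rows: Set[Tuple[Optional[str], ...]] = set()
    for binding in results["results"]["bindings"]:
        # Unbound variables (e.g. from an OPTIONAL that did not match) become None.
        rows.add(tuple(binding[var]["value"] if var in binding else None for var in variables))
    return rows


# Hypothetical endpoint response for: SELECT ?s ?label WHERE { ... }
response = {
    "head": {"vars": ["s", "label"]},
    "results": {
        "bindings": [
            {"s": {"type": "uri", "value": "https://example.org/br/1"},
             "label": {"type": "literal", "value": "A title"}},
            {"s": {"type": "uri", "value": "https://example.org/br/2"}},
        ]
    },
}
```

Because the result is a set, duplicate solution rows collapse into one, and the original row order is not preserved.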

time_agnostic_library.support module

time_agnostic_library.support.convert_to_datetime(time_string: str, stringify: bool = False) → datetime

time_agnostic_library.support.generate_config_file(config_path: str = './config.json', dataset_urls: list = [], dataset_dirs: list = [], dataset_is_quadstore: bool = True, provenance_urls: list = [], provenance_dirs: list = [], provenance_is_quadstore: bool = True, blazegraph_full_text_search: bool = False, fuseki_full_text_search: bool = False, virtuoso_full_text_search: bool = False, graphdb_connector_name: str = '', cache_endpoint: str = '', cache_update_endpoint: str = '') → dict

Given the configuration parameters, a file compliant with the syntax of the time-agnostic-library configuration files is generated.

Parameters:

config_path (str, optional) – The output configuration file path
Default: './config.json'
dataset_urls (list, optional) – A list of triplestore URLs containing data
Default: []
dataset_dirs (list, optional) – A list of directories containing data
Default: []
dataset_is_quadstore (bool, optional) – Indicates if the dataset store is a quadstore
Default: True
provenance_urls (list, optional) – A list of triplestore URLs containing provenance metadata
Default: []
provenance_dirs (list, optional) – A list of directories containing provenance metadata
Default: []
provenance_is_quadstore (bool, optional) – Indicates if the provenance store is a quadstore
Default: True
blazegraph_full_text_search (bool, optional) – True if Blazegraph was used as a triplestore, and a textual index was built to speed up queries
Default: False
fuseki_full_text_search (bool, optional) – True if Fuseki was used as a triplestore, and a textual index was built to speed up queries
Default: False
virtuoso_full_text_search (bool, optional) – True if Virtuoso was used as a triplestore, and a textual index was built to speed up queries
Default: False
graphdb_connector_name (str, optional) – The name of the Lucene connector if GraphDB was used as a triplestore and a textual index was built
Default: ''
cache_endpoint (str, optional) – A triplestore URL to use as a cache to make queries on provenance faster
Default: ''
cache_update_endpoint (str, optional) – If your triplestore uses different endpoints for reading and writing (e.g. GraphDB), specify the endpoint for writing
Default: ''
