Application Settings

HERITRACE uses a Python configuration file (config.py) to manage all application settings. This page covers all available configuration options.

Configuration Setup

Initial Setup

Copy the template configuration:
Terminal window
```
cp config.example.py config.py
```
Edit config.py with your specific settings
Restart the application to apply changes

Core Application Settings

Basic Configuration

class Config(object):
    # Application Identity
    APP_TITLE = 'Your App Title'
    APP_SUBTITLE = 'Your App Subtitle'

    # Security
    SECRET_KEY = 'your-secret-key-here'  # REQUIRED: Change this!

    # Performance Caching
    CACHE_FILE = 'cache.json'
    CACHE_VALIDITY_DAYS = 7

Setting	Type	Description	Required
`APP_TITLE`	String	Main application title shown in UI	No
`APP_SUBTITLE`	String	Subtitle displayed in interface	No
`SECRET_KEY`	String	Flask secret key for security	Yes
`CACHE_FILE`	String	Path to the performance cache file.	No
`CACHE_VALIDITY_DAYS`	Integer	Days until the performance cache expires.	No

Performance Caching Explained

The application uses a specific caching mechanism to improve startup performance. Here’s how it works:

Purpose: The cache’s primary role is to prevent the application from running an expensive SPARQL query on every startup. This query initializes the entity counters (used for URI generation) by reading the entire provenance database and populating a Redis instance.
CACHE_FILE: This file (e.g., cache.json) does not store application data. Instead, it holds a single timestamp that marks the last time the counter initialization was successfully performed.
CACHE_VALIDITY_DAYS: This setting determines how long the timestamp in CACHE_FILE is considered valid. After this period, the application will run the initialization query again to refresh the counters in Redis and update the timestamp.

If cache.json is missing or the timestamp is older than CACHE_VALIDITY_DAYS, the application will perform the full initialization. This makes the first startup or periodic restarts potentially slower, but subsequent restarts will be much faster.

Database Configuration

HERITRACE is designed to be compatible with any SPARQL 1.1 compliant triplestore, including popular options like GraphDB, Apache Jena, and Fuseki. However, the application has been specifically tested and optimized for Virtuoso and Blazegraph.

The full-text search feature is used for internal query optimization. When DATASET_DB_TEXT_INDEX_ENABLED is True, HERITRACE leverages the database’s native text indexing to accelerate SPARQL queries that filter on literal values. This is a performance enhancement and does not expose a direct search functionality to the end-user. This optimization is currently implemented only for Virtuoso and Blazegraph.

Triplestore Settings

# Triplestore Types
DATASET_DB_TRIPLESTORE = 'virtuoso'      # 'virtuoso' or 'blazegraph'
PROVENANCE_DB_TRIPLESTORE = 'virtuoso'   # 'virtuoso' or 'blazegraph'

# Database URLs
DATASET_DB_URL = 'http://localhost:8999/sparql'
PROVENANCE_DB_URL = 'http://localhost:8998/sparql'

# Storage Types
DATASET_IS_QUADSTORE = True
PROVENANCE_IS_QUADSTORE = True

# Features
DATASET_DB_TEXT_INDEX_ENABLED = True

DATASET_DB_TRIPLESTORE = 'virtuoso'
PROVENANCE_DB_TRIPLESTORE = 'virtuoso'
DATASET_DB_TEXT_INDEX_ENABLED = True

Virtuoso is the recommended database for its performance, stability, and active maintenance.

Features:

Actively Supported: Benefits from continuous development and community support.
Internal Query Optimization: Uses its powerful full-text index to accelerate internal queries on text literals when enabled.
Proven Scalability: Reliable for production environments with large datasets.
Named Graph Support: Fully compatible with quad-store setups.

Visit the Virtuoso GitHub Repository

DATASET_DB_TRIPLESTORE = 'blazegraph'
PROVENANCE_DB_TRIPLESTORE = 'blazegraph'
DATASET_DB_TEXT_INDEX_ENABLED = True

Blazegraph is an open-source alternative that also supports named graphs and text indexing for query optimization.

Features:

Open Source: Fully open-source and community-driven.
Internal Query Optimization: Supports full-text indexing to speed up internal queries.
Named Graph Support: Compatible with quad-store setups.

Visit the Blazegraph GitHub Repository

Storage Configuration

Setting	Options	Description
`DATASET_IS_QUADSTORE`	`True`/`False`	Enable named graph support
`PROVENANCE_IS_QUADSTORE`	`True`/`False`	Enable provenance named graphs
`DATASET_DB_TEXT_INDEX_ENABLED`	`True`/`False`	Enable internal query optimization using the database’s full-text search index.

Schema and Display Configuration

These settings control the data model’s constraints and its presentation in the user interface.

# Schema and Display
SHACL_PATH = shacl_path
DISPLAY_RULES_PATH = display_rules_path

Setting	Description	Default
`SHACL_PATH`	Path to the SHACL schema file that defines data validation rules.	`resources/shacl.ttl`
`DISPLAY_RULES_PATH`	Path to the YAML display rules file that controls how entities are rendered.	`resources/display_rules.yaml`

Data Provenance and Versioning

These settings configure how the application handles data provenance and versioning, aligning with the W3C PROV Ontology.

# Provenance and Versioning
DATASET_GENERATION_TIME = '2024-09-16T00:00:00+02:00'
PRIMARY_SOURCE = 'https://doi.org/your-doi'
CHANGE_TRACKING_CONFIG = os.path.join(BASE_HERITRACE_DIR, 'change_tracking.json')

Setting	Description	Default
`PRIMARY_SOURCE`	Defines the default primary source (DOI/URL). This value is used for the `prov:hadPrimarySource` property. It’s used for the initial dataset import and proposed as the default when creating or modifying entities.	Required
`DATASET_GENERATION_TIME`	Specifies the creation timestamp for a pre-existing dataset. This value is used for the `prov:generatedAtTime` property in the initial historical snapshot.	Current time
`CHANGE_TRACKING_CONFIG`	Path to the change tracking configuration. This file is generated automatically by the underlying time-agnostic-library based on the provenance database endpoint. The default location is generally sufficient.	`change_tracking.json`

URI Generation

The URI_GENERATOR and COUNTER_HANDLER settings work together to define how new, unique identifiers are created for entities within the system.

# URI Generation
URI_GENERATOR = meta_uri_generator
COUNTER_HANDLER = counter_handler

URI_GENERATOR

This setting specifies the class responsible for generating new URIs. Whenever a new entity is created, HERITRACE calls the generate_uri method of the object assigned to this setting.

A valid URI generator class must inherit from the abstract base class heritrace.uri_generator.URIGenerator and implement the generate_uri(entity_type) method. The abstract class also defines an initialize_counters(sparql) method.

generate_uri(entity_type): This is the core method responsible for returning a new, unique URI as a string based on the entity’s type.
initialize_counters(sparql): This method is specifically for URI strategies that use sequential counters. It’s called at application startup to query the database and determine the last-used identifier, preventing duplicates. If your URI_GENERATOR does not rely on counters (e.g., if it uses UUIDs), this method can have an empty implementation (pass).

This structure allows you to implement custom URI generation schemes tailored to your specific project needs. HERITRACE’s default implementation is MetaURIGenerator, designed to create URIs compliant with the OpenCitations Meta standard.

Here is an example of a generator that uses UUIDs instead of counters. This approach guarantees uniqueness without needing a persistent counter.

import uuid
from heritrace.uri_generator.uri_generator import URIGenerator

class UUIDURIGenerator(URIGenerator):
    """
    A simple URI generator that creates unique URIs using UUIDs.
    This generator does not rely on sequential counters.
    """
    def __init__(self, base_iri: str):
        self.base_iri = base_iri.rstrip('/')

    def generate_uri(self, entity_type: str | None = None) -> str:
        """
        Generates a new URI by appending a new UUID to the base IRI.
        The entity_type is ignored in this implementation.
        """
        return f"{self.base_iri}/{uuid.uuid4()}"

    def initialize_counters(self, sparql) -> None:
        """
        This method is not needed for a UUID-based generator,
        so it has an empty implementation.
        """
        pass

COUNTER_HANDLER

The COUNTER_HANDLER is a component designed to work with counter-based URI generators. It manages the persistent state of the numerical counters. While optional, it is crucial for ensuring URI uniqueness across application restarts when using a counter-based strategy.

A custom counter handler should implement the following methods:

read_counter(entity_name: str) -> int: Returns the current counter value for a given entity type. Should return 0 if the counter doesn’t exist.
set_counter(new_value: int, entity_name: str): Sets the counter for an entity type to a specific value.
increment_counter(entity_name: str) -> int: Atomically increments the counter for an entity type by one and returns the new value.
close(): Closes any open connections (e.g., to a database).

The default implementation, MetaCounterHandler, uses a Redis database to store these counters. This provides a fast and reliable way to persist the counter state.

Here is an example of a minimal, non-persistent counter handler. Note: This is for demonstration only and is not suitable for production, as counter values will be lost on application restart.

import threading

class InMemoryCounterHandler:
    """
    A simple, non-persistent counter handler that stores counters in memory.

    This implementation is for demonstration purposes only. It is NOT
    suitable for production as all counter states will be lost when the
    application restarts.
    """
    def __init__(self):
        self._counters = {}
        self._lock = threading.Lock()

    def set_counter(self, new_value: int, entity_name: str) -> None:
        """Sets the counter for a given entity type to a specific value."""
        with self._lock:
            self._counters[entity_name] = new_value

    def read_counter(self, entity_name: str) -> int:
        """Reads the current value of a counter, returning 0 if not found."""
        with self._lock:
            return self._counters.get(entity_name, 0)

    def increment_counter(self, entity_name: str) -> int:
        """Increments a counter by one and returns the new value."""
        with self._lock:
            current_value = self._counters.get(entity_name, 0)
            new_value = current_value + 1
            self._counters[entity_name] = new_value
            return new_value

    def close(self):
        """No action needed for the in-memory handler."""
        pass

ORCID Integration

Basic Setup

# ORCID OAuth Configuration
ORCID_CLIENT_ID = 'your-client-id'
ORCID_CLIENT_SECRET = 'your-client-secret'

# ORCID Endpoints
ORCID_AUTHORIZE_URL = 'https://orcid.org/oauth/authorize'
ORCID_TOKEN_URL = 'https://orcid.org/oauth/token'
ORCID_API_URL = 'https://pub.orcid.org/v2.1'
ORCID_SCOPE = '/authenticate'

# Access Control
ORCID_WHITELIST = [
    'your-allowed-orcid-1',
    'https://orcid.org/your-allowed-orcid-2' # Full URLs are also supported
]

Setting Up ORCID

Create an ORCID Account: If you don’t already have one, create an ORCID account.
Get Credentials:
- Go to ORCID Developer Tools.
- Create a new application and note your Client ID and Client Secret.
Configure Redirect URI:
- In your ORCID application settings, add the redirect URI. The path is /auth/callback.
- For local development: https://127.0.0.1:5000/auth/callback
- For production: https://your-domain.com/auth/callback

Add Credentials to config.py: Update your config.py with the credentials from ORCID.

ORCID_CLIENT_ID = 'APP-XXXXXXXXXX'  # From ORCID registration
ORCID_CLIENT_SECRET = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

Whitelist Your ORCID ID: To enable access, you must add authorized ORCID iDs to the ORCID_WHITELIST. The application automatically extracts the ID from full ORCID URLs, so you can use either format.

# Allow specific researchers by ID or full URL
ORCID_WHITELIST = [
    '0000-0002-1825-0097',                # Josiah Carberry (ID only)
    'https://orcid.org/0000-0001-5109-3700' # Another researcher (Full URL)
]

Internationalization

# Language Support
LANGUAGES = ['en', 'it']
BABEL_TRANSLATION_DIRECTORIES = os.path.join(
    BASE_HERITRACE_DIR,
    'babel',
    'translations'
)

Entity Handling Strategies

HERITRACE provides configurable strategies to manage how the application handles entities that are left without connections after a deletion. These strategies apply to two distinct types of entities: orphans and proxies.

# Entity Handling Strategies
ORPHAN_HANDLING_STRATEGY = OrphanHandlingStrategy.ASK
PROXY_HANDLING_STRATEGY = ProxyHandlingStrategy.DELETE

Strategy Options

Orphan Handling
Proxy Handling

An orphan is an entity that is no longer referenced by any other entity in the database. For example, if you remove the last author from a book, the author’s record becomes an orphan if no other books or entities refer to it.

This strategy controls what happens to these orphaned entities:

ASK: (Default) Prompts the user for confirmation before deleting any orphaned entities.
DELETE: Automatically deletes any entities that become orphans as a result of a deletion.
KEEP: Keeps the orphaned entities in the database, even if they are no longer connected to anything.

A proxy (or intermediate) entity is a resource that links two other entities, often to add specific attributes to their relationship. For example, a pro:RoleInTime entity can connect a foaf:Person to a fabio:Book, defining their role as an author.

When the primary relationship (e.g., the book’s author entry) is deleted, this strategy determines what happens to the intermediate pro:RoleInTime entity:

DELETE: (Default) Automatically deletes the intermediate entity.
ASK: Prompts the user for confirmation before deleting.
KEEP: Retains the intermediate entity in the database.

Defining a Proxy Relationship

A property is treated as a proxy relationship in resources/display_rules.yaml by using the intermediateRelation key. This key tells HERITRACE to create an intermediate entity to connect the subject to the target object.

You must specify two things under intermediateRelation:

class: The RDF class of the intermediate entity (e.g., pro:RoleInTime).
targetEntityType: The RDF class of the final entity you want to create and link to (e.g., foaf:Agent).

# In resources/display_rules.yaml
- property: "http://purl.org/spar/pro/isDocumentContextFor"
  displayName: "Author"
  intermediateRelation:
    class: "http://purl.org/spar/pro/RoleInTime"
    targetEntityType: "http://xmlns.com/foaf/0.1/Agent"
  displayRules:
    - shape: "http://schema.org/AuthorShape"
      displayName: "Author"
      # ...