
Application Settings

HERITRACE uses a Python configuration file (config.py) to manage all application settings. This page covers all available configuration options.

  1. Copy the template configuration:

    cp config.example.py config.py
  2. Edit config.py with your specific settings.

  3. Restart the application to apply the changes.

class Config(object):
    # Application Identity
    APP_TITLE = 'Your App Title'
    APP_SUBTITLE = 'Your App Subtitle'

    # Security
    SECRET_KEY = 'your-secret-key-here'  # REQUIRED: Change this!

    # Performance Caching
    CACHE_FILE = 'cache.json'
    CACHE_VALIDITY_DAYS = 7

| Setting | Type | Description | Required |
|---|---|---|---|
| APP_TITLE | String | Main application title shown in the UI | No |
| APP_SUBTITLE | String | Subtitle displayed in the interface | No |
| SECRET_KEY | String | Flask secret key for security | Yes |
| CACHE_FILE | String | Path to the performance cache file | No |
| CACHE_VALIDITY_DAYS | Integer | Days until the performance cache expires | No |
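Flask uses SECRET_KEY to sign session cookies, so the placeholder must be replaced with a long random value. A minimal sketch using Python's standard secrets module (the helper name generate_secret_key is illustrative):

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string suitable for Flask's SECRET_KEY."""
    # token_hex yields two hex characters per byte of randomness
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into config.py as SECRET_KEY
    print(generate_secret_key())
```

Run it once and paste the output into config.py. Keep the value stable afterwards: changing it invalidates sessions signed with the old key.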

The application uses a specific caching mechanism to improve startup performance. Here’s how it works:

  • Purpose: The cache’s primary role is to prevent the application from running an expensive SPARQL query on every startup. This query initializes the entity counters (used for URI generation) by reading the entire provenance database and populating a Redis instance.
  • CACHE_FILE: This file (e.g., cache.json) does not store application data. Instead, it holds a single timestamp that marks the last time the counter initialization was successfully performed.
  • CACHE_VALIDITY_DAYS: This setting determines how long the timestamp in CACHE_FILE is considered valid. After this period, the application will run the initialization query again to refresh the counters in Redis and update the timestamp.

If cache.json is missing or its timestamp is older than CACHE_VALIDITY_DAYS, the application performs the full initialization. The first startup, and the first startup after the cache expires, can therefore be slow; restarts within the validity window skip the query and are much faster.
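The startup check described above can be sketched as follows. The page only says that cache.json holds a single timestamp, so the JSON key last_initialization and both function names are assumptions; HERITRACE's actual file layout may differ.

```python
import json
import os
from datetime import datetime, timedelta, timezone

# Hypothetical key: the real layout of cache.json is not documented here.
TIMESTAMP_KEY = "last_initialization"


def needs_initialization(cache_file, validity_days, now=None):
    """Return True if counters must be re-initialized from the provenance DB."""
    now = now or datetime.now(timezone.utc)
    if not os.path.exists(cache_file):
        return True  # first startup: no cache yet
    try:
        with open(cache_file, encoding="utf-8") as f:
            last_run = datetime.fromisoformat(json.load(f)[TIMESTAMP_KEY])
        # Expired once more than validity_days have passed since the last run
        return now - last_run > timedelta(days=validity_days)
    except (OSError, ValueError, KeyError, TypeError):
        return True  # unreadable or malformed cache: re-initialize to be safe


def record_initialization(cache_file, now=None):
    """Store the timestamp after a successful counter initialization."""
    now = now or datetime.now(timezone.utc)
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump({TIMESTAMP_KEY: now.isoformat()}, f)
```

Treating a corrupt cache like a missing one costs at most one slow startup, which is safer than trusting stale counters.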

HERITRACE is designed to be compatible with any SPARQL 1.1-compliant triplestore, including popular options such as GraphDB, Apache Jena, and Fuseki. However, it has been specifically tested and optimized for Virtuoso and Blazegraph.

The full-text search feature serves internal query optimization only. When DATASET_DB_TEXT_INDEX_ENABLED is True, HERITRACE uses the database’s native text index to accelerate SPARQL queries that filter on literal values. This is a performance enhancement; it does not expose a search feature to end users. The optimization is currently implemented only for Virtuoso and Blazegraph.
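The page does not show how HERITRACE phrases these optimized queries. As an illustration only, the hypothetical helper below emits the kind of triplestore-specific clause involved: Virtuoso's bif:contains and Blazegraph's bds:search full-text predicates, with a plain SPARQL 1.1 CONTAINS filter when the index is disabled.

```python
def literal_match_clause(var, term, triplestore, text_index_enabled):
    """Return a SPARQL snippet matching literals bound to `var` against `term`.

    Hypothetical helper for illustration; not HERITRACE's implementation.
    Assumes `term` is a single word (phrases need each store's own quoting rules).
    """
    # Escape backslashes and double quotes for a SPARQL string literal
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    if text_index_enabled and triplestore == "virtuoso":
        # Virtuoso full-text predicate; the bif: prefix is built in
        return f'{var} bif:contains "{escaped}" .'
    if text_index_enabled and triplestore == "blazegraph":
        # Blazegraph full-text "magic predicate"; the query must declare
        # PREFIX bds: <http://www.bigdata.com/rdf/search#>
        return f'{var} bds:search "{escaped}" .'
    # Portable fallback: scans every candidate literal, so it is slow on large data
    return f'FILTER(CONTAINS(STR({var}), "{escaped}"))'
```

Full-text predicates match indexed tokens while CONTAINS matches substrings, so the two paths can return slightly different results.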

# Triplestore Types
DATASET_DB_TRIPLESTORE = 'virtuoso' # 'virtuoso' or 'blazegraph'
PROVENANCE_DB_TRIPLESTORE = 'virtuoso' # 'virtuoso' or 'blazegraph'
# Database URLs
DATASET_DB_URL = 'http://localhost:8999/sparql'
PROVENANCE_DB_URL = 'http://localhost:8998/sparql'
# Storage Types
DATASET_IS_QUADSTORE = True
PROVENANCE_IS_QUADSTORE = True
# Features
DATASET_DB_TEXT_INDEX_ENABLED = True

Virtuoso is the recommended database for its performance, stability, and active maintenance.

Features:

  • Actively Supported: Benefits from continuous development and community support.
  • Internal Query Optimization: Uses its powerful full-text index to accelerate internal queries on text literals when enabled.
  • Proven Scalability: Reliable for production environments with large datasets.
  • Named Graph Support: Fully compatible with quad-store setups.
More information is available in the Virtuoso GitHub repository.

| Setting | Options | Description |
|---|---|---|
| DATASET_IS_QUADSTORE | True/False | Enable named graph support |
| PROVENANCE_IS_QUADSTORE | True/False | Enable provenance named graphs |
| DATASET_DB_TEXT_INDEX_ENABLED | True/False | Enable internal query optimization using the database’s full-text search index |
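How HERITRACE uses the *_IS_QUADSTORE flags internally is not described here. In general SPARQL terms, a store that keeps data in named graphs lets a query bind or restrict the graph through a GRAPH clause, so a client supporting both layouts might branch as in this hypothetical sketch (wrap_for_store and build_select are illustrative names):

```python
def wrap_for_store(pattern, is_quadstore, graph_var="?g"):
    """Wrap a triple pattern in GRAPH when data lives in named graphs."""
    if is_quadstore:
        # Binds the named graph each matching triple comes from
        return f"GRAPH {graph_var} {{ {pattern} }}"
    return pattern  # plain triple store: query the default graph directly


def build_select(subject_uri, is_quadstore):
    """Select all predicate/object pairs of one entity (illustrative query)."""
    pattern = wrap_for_store(f"<{subject_uri}> ?p ?o .", is_quadstore)
    return f"SELECT ?p ?o WHERE {{ {pattern} }}"
```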

These settings control the data model’s constraints and its presentation in the user interface.

# Schema and Display
SHACL_PATH = shacl_path
DISPLAY_RULES_PATH = display_rules_path

| Setting | Description | Default |
|---|---|---|
| SHACL_PATH | Path to the SHACL schema file that defines data validation rules. | resources/shacl.ttl |
| DISPLAY_RULES_PATH | Path to the YAML display rules file that controls how entities are rendered. | resources/display_rules.yaml |
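In config.py, shacl_path and display_rules_path are ordinary variables defined before the Config class. A sketch of how they might be derived from the defaults in the table, with an early existence check; the helper names are hypothetical and BASE_HERITRACE_DIR is assumed to be the project root:

```python
import os


def resource_paths(base_dir):
    """Default SHACL and display-rule paths relative to base_dir (hypothetical helper)."""
    resources = os.path.join(base_dir, "resources")
    return {
        "SHACL_PATH": os.path.join(resources, "shacl.ttl"),
        "DISPLAY_RULES_PATH": os.path.join(resources, "display_rules.yaml"),
    }


def missing_files(paths):
    """Names of settings whose file does not exist, in definition order."""
    return [name for name, path in paths.items() if not os.path.isfile(path)]
```

For example, shacl_path = resource_paths(BASE_HERITRACE_DIR)["SHACL_PATH"]; calling missing_files at startup reports a wrong path immediately rather than at first use.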

These settings configure how the application handles data provenance and versioning, aligning with the W3C PROV Ontology.

# Provenance and Versioning
DATASET_GENERATION_TIME = '2024-09-16T00:00:00+02:00'
PRIMARY_SOURCE = 'https://doi.org/your-doi'
CHANGE_TRACKING_CONFIG = os.path.join(BASE_HERITRACE_DIR, 'change_tracking.json')

| Setting | Description | Default |
|---|---|---|
| PRIMARY_SOURCE | Default primary source (DOI or URL), recorded as prov:hadPrimarySource. It is used for the initial dataset import and proposed as the default when creating or modifying entities. | Required |
| DATASET_GENERATION_TIME | Creation timestamp of a pre-existing dataset, recorded as prov:generatedAtTime in the initial historical snapshot. | Current time |
| CHANGE_TRACKING_CONFIG | Path to the change tracking configuration. This file is generated automatically by the underlying time-agnostic-library from the provenance database endpoint; the default location is generally sufficient. | change_tracking.json |
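prov:generatedAtTime is typed as xsd:dateTime, so DATASET_GENERATION_TIME should be an ISO 8601 timestamp such as the example above. A minimal validation sketch; the helper name is hypothetical, and rejecting timestamps without a UTC offset is a choice of this sketch, not a documented HERITRACE rule:

```python
from datetime import datetime, timezone


def parse_generation_time(value):
    """Parse DATASET_GENERATION_TIME, falling back to the current time."""
    if not value:
        # Mirrors the documented default ("Current time")
        return datetime.now(timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Without an offset the instant is ambiguous across servers
        raise ValueError("DATASET_GENERATION_TIME should include a UTC offset, e.g. +02:00")
    return parsed
```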

The URI_GENERATOR and COUNTER_HANDLER settings work together to define how new, unique identifiers are created for entities within the system.

# URI Generation
URI_GENERATOR = meta_uri_generator
COUNTER_HANDLER = counter_handler

URI_GENERATOR holds an instance of the class responsible for generating new URIs. Whenever a new entity is created, HERITRACE calls that object's generate_uri method.

A valid URI generator class must inherit from the abstract base class heritrace.uri_generator.URIGenerator and implement the generate_uri(entity_type) method. The abstract class also defines an initialize_counters(sparql) method.

  • generate_uri(entity_type): This is the core method responsible for returning a new, unique URI as a string based on the entity’s type.
  • initialize_counters(sparql): This method is specifically for URI strategies that use sequential counters. It’s called at application startup to query the database and determine the last-used identifier, preventing duplicates. If your URI_GENERATOR does not rely on counters (e.g., if it uses UUIDs), this method can have an empty implementation (pass).

This structure allows you to implement custom URI generation schemes tailored to your specific project needs. HERITRACE’s default implementation is MetaURIGenerator, designed to create URIs compliant with the OpenCitations Meta standard.
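A minimal sketch of a custom generator that mints UUID-based URIs and therefore needs no counters. To stay self-contained it defines a stand-in for heritrace.uri_generator.URIGenerator mirroring the two methods described above (whether initialize_counters is abstract in the real class is an assumption); in a real project, subclass the HERITRACE class instead. The class name, base IRI, and path layout are illustrative.

```python
import uuid
from abc import ABC, abstractmethod


class URIGenerator(ABC):
    """Stand-in for heritrace.uri_generator.URIGenerator (interface as described above)."""

    @abstractmethod
    def generate_uri(self, entity_type) -> str: ...

    @abstractmethod
    def initialize_counters(self, sparql) -> None: ...  # assumed abstract


class UUIDURIGenerator(URIGenerator):
    """Hypothetical generator: base IRI + lowercase type name + random UUID."""

    def __init__(self, base_iri: str):
        self.base_iri = base_iri.rstrip("/")

    def generate_uri(self, entity_type) -> str:
        segment = "entity"
        if entity_type:
            # Last path or fragment component of the type IRI, e.g. ".../fabio/Expression" -> "expression"
            segment = str(entity_type).rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1].lower()
        return f"{self.base_iri}/{segment}/{uuid.uuid4()}"

    def initialize_counters(self, sparql) -> None:
        pass  # UUIDs are unique without counters, so there is nothing to restore
```

It could then be wired in with URI_GENERATOR = UUIDURIGenerator('https://example.org') in config.py; with no counters involved, COUNTER_HANDLER would presumably not be needed.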

The COUNTER_HANDLER is a component designed to work with counter-based URI generators. It manages the persistent state of the numerical counters. While optional, it is crucial for ensuring URI uniqueness across application restarts when using a counter-based strategy.

A custom counter handler should implement the following methods:

  • read_counter(entity_name: str) -> int: Returns the current counter value for a given entity type. Should return 0 if the counter doesn’t exist.
  • set_counter(new_value: int, entity_name: str): Sets the counter for an entity type to a specific value.
  • increment_counter(entity_name: str) -> int: Atomically increments the counter for an entity type by one and returns the new value.
  • close(): Closes any open connections (e.g., to a database).

The default implementation, MetaCounterHandler, uses a Redis database to store these counters. This provides a fast and reliable way to persist the counter state.
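For tests or local experiments, a handler can keep counters in memory. This hypothetical InMemoryCounterHandler implements the four methods listed above, using a lock so that increment_counter is atomic across threads. Because nothing is persisted, it does not provide the cross-restart guarantee that MetaCounterHandler's Redis storage does.

```python
import threading


class InMemoryCounterHandler:
    """Hypothetical counter handler; counters are lost when the process exits."""

    def __init__(self):
        self._counters = {}
        self._lock = threading.Lock()  # serializes read-modify-write updates

    def read_counter(self, entity_name: str) -> int:
        with self._lock:
            return self._counters.get(entity_name, 0)  # 0 for unknown types

    def set_counter(self, new_value: int, entity_name: str) -> None:
        with self._lock:
            self._counters[entity_name] = new_value

    def increment_counter(self, entity_name: str) -> int:
        with self._lock:
            self._counters[entity_name] = self._counters.get(entity_name, 0) + 1
            return self._counters[entity_name]

    def close(self) -> None:
        pass  # no connections to release; a Redis-backed handler would close its client
```

A Redis-backed handler can obtain the same atomicity from Redis's INCR command instead of a local lock.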

# ORCID OAuth Configuration
ORCID_CLIENT_ID = 'your-client-id'
ORCID_CLIENT_SECRET = 'your-client-secret'
# ORCID Endpoints
ORCID_AUTHORIZE_URL = 'https://orcid.org/oauth/authorize'
ORCID_TOKEN_URL = 'https://orcid.org/oauth/token'
ORCID_API_URL = 'https://pub.orcid.org/v2.1'
ORCID_SCOPE = '/authenticate'
# Access Control
ORCID_WHITELIST = [
    'your-allowed-orcid-1',
    'https://orcid.org/your-allowed-orcid-2'  # Full URLs are also supported
]
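HERITRACE performs the ORCID sign-in itself; this sketch only illustrates how the settings above map onto a standard OAuth 2.0 authorization-code request to ORCID_AUTHORIZE_URL. The helper name is hypothetical, and a production flow should also send and verify a state parameter.

```python
from urllib.parse import urlencode


def orcid_authorize_url(authorize_url, client_id, redirect_uri, scope="/authenticate"):
    """Build the URL that sends a user to ORCID to sign in (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "response_type": "code",  # authorization-code flow
        "scope": scope,
        "redirect_uri": redirect_uri,  # must match a URI registered with ORCID
    }
    # urlencode percent-encodes reserved characters such as "/" and ":"
    return f"{authorize_url}?{urlencode(params)}"
```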
  1. Create an ORCID Account: If you don’t already have one, create an ORCID account.

  2. Get Credentials:

  3. Configure Redirect URI:

    • In your ORCID application settings, add the redirect URI. The path is /auth/callback.
    • For local development: https://127.0.0.1:5000/auth/callback
    • For production: https://your-domain.com/auth/callback
  4. Add Credentials to config.py: Update your config.py with the credentials from ORCID.

    ORCID_CLIENT_ID = 'APP-XXXXXXXXXX' # From ORCID registration
    ORCID_CLIENT_SECRET = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
  5. Whitelist Your ORCID iD: To enable access, add the authorized ORCID iDs to ORCID_WHITELIST. The application automatically extracts the iD from full ORCID URLs, so either format works.

    # Allow specific researchers by ID or full URL
    ORCID_WHITELIST = [
        '0000-0002-1825-0097',  # Josiah Carberry (ID only)
        'https://orcid.org/0000-0001-5109-3700'  # Another researcher (Full URL)
    ]
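The whitelist normalization described in step 5 can be sketched with a regular expression over the iD format (four groups of four characters, the last possibly X). This is an illustration with hypothetical helper names, not HERITRACE's code, and it does not verify the ISO 7064 MOD 11-2 check digit.

```python
import re

# Four hyphen-separated groups of digits; the final check character may be X
ORCID_ID = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])")


def extract_orcid_id(value):
    """Return the bare ORCID iD from an iD or URL, or None if none is found."""
    match = ORCID_ID.search(value.strip())
    return match.group(1) if match else None


def is_whitelisted(user_orcid, whitelist):
    """True if the user's iD matches any whitelist entry, in either format."""
    allowed = {extract_orcid_id(entry) for entry in whitelist} - {None}
    return extract_orcid_id(user_orcid) in allowed
```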
# Language Support
LANGUAGES = ['en', 'it']
BABEL_TRANSLATION_DIRECTORIES = os.path.join(
    BASE_HERITRACE_DIR,
    'babel',
    'translations'
)

HERITRACE provides configurable strategies to manage how the application handles entities that are left without connections after a deletion. These strategies apply to two distinct types of entities: orphans and proxies.

# Entity Handling Strategies
ORPHAN_HANDLING_STRATEGY = OrphanHandlingStrategy.ASK
PROXY_HANDLING_STRATEGY = ProxyHandlingStrategy.DELETE

An orphan is an entity that is no longer referenced by any other entity in the database. For example, if you remove the last author from a book, the author’s record becomes an orphan if no other books or entities refer to it.

This strategy controls what happens to these orphaned entities:

  • ASK: (Default) Prompts the user for confirmation before deleting any orphaned entities.
  • DELETE: Automatically deletes any entities that become orphans as a result of a deletion.
  • KEEP: Keeps the orphaned entities in the database, even if they are no longer connected to anything.
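The three strategies can be sketched as a small decision function. The enum below is a stand-in whose member values are assumed, and find_orphans models the graph as a hypothetical set of (subject, object) references; against a triplestore, the same check would typically be a SPARQL query such as ASK { ?s ?p <entity> }.

```python
from enum import Enum


class OrphanHandlingStrategy(Enum):
    """Stand-in for HERITRACE's enum; member values are assumed."""
    ASK = "ask"
    DELETE = "delete"
    KEEP = "keep"


def find_orphans(candidates, references):
    """Return candidates that no remaining (subject, object) reference points to."""
    referenced = {obj for _, obj in references}
    return [c for c in candidates if c not in referenced]


def handle_orphans(orphans, strategy, confirm):
    """Return the orphans to delete; `confirm` asks the user when strategy is ASK."""
    if not orphans or strategy is OrphanHandlingStrategy.KEEP:
        return []
    if strategy is OrphanHandlingStrategy.DELETE:
        return list(orphans)
    # ASK: delete only if the user agrees
    return list(orphans) if confirm(orphans) else []
```

In the book example, removing the last author reference leaves that author in the orphan list, and the configured strategy then decides its fate.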