Application settings

HERITRACE is configured through Docker environment variables.

Configure settings in your docker-compose.yml file:

environment:
# Application settings
- FLASK_ENV=production
- APP_TITLE=HERITRACE
- APP_SUBTITLE=A semantic editor for heritage data
# Security (REQUIRED: Change this!)
- SECRET_KEY=your-secure-secret-key-here
# Query Configuration
- COUNT_LIMIT=10000
# Database endpoints
- DATASET_DB_URL=http://host.docker.internal:8999/sparql
- PROVENANCE_DB_URL=http://host.docker.internal:8998/sparql
- DATASET_DB_TRIPLESTORE=virtuoso
- PROVENANCE_DB_TRIPLESTORE=virtuoso
# ORCID authentication
- ORCID_CLIENT_ID=your-client-id
- ORCID_CLIENT_SECRET=your-client-secret
- ORCID_SAFELIST=0000-0000-0000-0000,0000-0000-0000-0001

| Environment Variable | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| APP_TITLE | String | Main application title shown in UI | No | HERITRACE |
| APP_SUBTITLE | String | Subtitle displayed in interface | No | Heritage Enhanced Repository Interface |
| SECRET_KEY | String | Flask secret key for security | Yes | generate-a-secure-random-key |
| CACHE_VALIDITY_DAYS | Integer | Days until the performance cache expires (internal Redis only) | No | 7 |
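
Because SECRET_KEY must be changed before deployment, it helps to have a quick way to produce a random value. The sketch below uses Python's standard secrets module; the function name and the 32-byte length are assumptions for illustration, not a HERITRACE requirement.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string suitable for use as SECRET_KEY."""
    # token_hex draws from the operating system's secure random source
    # and returns two hexadecimal characters per byte.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into docker-compose.yml as SECRET_KEY=...
    print(generate_secret_key())
```

Run it once, copy the output into docker-compose.yml, and keep the value out of version control.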

HERITRACE uses Redis for counter management and caching. You can use either an internal Redis instance (default) or connect to an external Redis server.

By default, HERITRACE runs a Redis instance inside the application container. No additional configuration is needed.

# No REDIS_URL specified - uses internal Redis
environment:
- CACHE_VALIDITY_DAYS=7

To use an external Redis instance, set the REDIS_URL environment variable:

environment:
# Redis Configuration
- REDIS_URL=redis://heritrace-redis:6379/0
# Counter initialization is skipped when using external Redis
# Make sure counters are already populated in the external instance

| Environment Variable | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| REDIS_URL | String | Connection URL for external Redis server | No | redis://localhost:6379/0 (internal) |
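
The choice between internal and external Redis can be pictured as a small lookup on REDIS_URL. The following is a minimal sketch of that decision, not HERITRACE's actual startup code; the function names are hypothetical, and the only facts taken from this page are the documented default URL and the rule that counter initialization is skipped for an external instance.

```python
import os
from typing import Mapping, Tuple

# Documented default: the Redis instance bundled in the application container.
INTERNAL_REDIS_URL = "redis://localhost:6379/0"


def resolve_redis(env: Mapping[str, str] = os.environ) -> Tuple[str, bool]:
    """Return (redis_url, is_external) for the given environment."""
    url = env.get("REDIS_URL", "").strip()
    if url:
        # An explicit REDIS_URL means an external server is in use.
        return url, True
    return INTERNAL_REDIS_URL, False


def should_initialize_counters(env: Mapping[str, str] = os.environ) -> bool:
    """Counters are only initialized when the internal Redis is used."""
    _, is_external = resolve_redis(env)
    return not is_external
```

With REDIS_URL=redis://heritrace-redis:6379/0 set, should_initialize_counters returns False, which is why the counters must already be populated in the external instance.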

HERITRACE is compatible with SPARQL 1.1 triplestores and has been tested with Virtuoso and Blazegraph.

When DATASET_DB_TEXT_INDEX_ENABLED is set to true, HERITRACE uses the database's text index to optimize SPARQL queries on literals. This option is available for Virtuoso and Blazegraph only.

environment:
# Triplestore Types
- DATASET_DB_TRIPLESTORE=virtuoso # 'virtuoso' or 'blazegraph'
- PROVENANCE_DB_TRIPLESTORE=virtuoso # 'virtuoso' or 'blazegraph'
# Database URLs
- DATASET_DB_URL=http://localhost:8999/sparql
- PROVENANCE_DB_URL=http://localhost:8998/sparql
# Storage Types
- DATASET_IS_QUADSTORE=true
- PROVENANCE_IS_QUADSTORE=true
# Features
- DATASET_DB_TEXT_INDEX_ENABLED=true

Virtuoso:

environment:
- DATASET_DB_TRIPLESTORE=virtuoso
- PROVENANCE_DB_TRIPLESTORE=virtuoso
- DATASET_DB_TEXT_INDEX_ENABLED=true

Pros:

  • Mature and battle-tested database
  • Open source and SPARQL 1.1 compliant
  • Excellent performance even with large datasets
  • Full-text indexing and named graph support

Cons:

  • Complex configuration
  • Poor documentation

Visit the Virtuoso GitHub Repository

| Environment Variable | Options | Description |
| --- | --- | --- |
| DATASET_IS_QUADSTORE | true/false | Enable named graph support |
| PROVENANCE_IS_QUADSTORE | true/false | Enable provenance named graphs |
| DATASET_DB_TEXT_INDEX_ENABLED | true/false | Enable internal query optimization using the database’s full-text search index |
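
Since these options arrive as plain strings from the environment, a typo such as DATASET_IS_QUADSTORE=ture is easy to miss. The sketch below shows one way to validate the dataset options before starting the container. It is an illustration only: the helper names are hypothetical, every variable is treated as required because this page does not state the defaults, and HERITRACE's own parsing may differ.

```python
from dataclasses import dataclass
from typing import Mapping

# The two triplestore types named in the configuration comments.
SUPPORTED_TRIPLESTORES = ("virtuoso", "blazegraph")


@dataclass(frozen=True)
class DatasetStoreSettings:
    triplestore: str
    is_quadstore: bool
    text_index_enabled: bool


def parse_bool(value: str) -> bool:
    """Accept only 'true' or 'false' (case-insensitive)."""
    normalized = value.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(f"Expected 'true' or 'false', got {value!r}")
    return normalized == "true"


def read_dataset_settings(env: Mapping[str, str]) -> DatasetStoreSettings:
    """Validate the dataset triplestore options found in env."""
    triplestore = env["DATASET_DB_TRIPLESTORE"].strip().lower()
    if triplestore not in SUPPORTED_TRIPLESTORES:
        raise ValueError(f"Unsupported triplestore: {triplestore!r}")
    return DatasetStoreSettings(
        triplestore=triplestore,
        is_quadstore=parse_bool(env["DATASET_IS_QUADSTORE"]),
        text_index_enabled=parse_bool(env["DATASET_DB_TEXT_INDEX_ENABLED"]),
    )
```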

These settings control the data model’s constraints and its presentation in the user interface.

Environment VariableDescriptionDefault
SHACL_PATHPath to the SHACL schema file that defines data validation rules./shacl.ttl
DISPLAY_RULES_PATHPath to the YAML display rules file that controls how entities are rendered./display_rules.yaml

These settings configure how the application handles data provenance and versioning, aligning with the W3C PROV Ontology.

environment:
# Provenance and Versioning
- DATASET_GENERATION_TIME=2024-09-16T00:00:00+02:00
- PRIMARY_SOURCE=https://doi.org/your-doi

| Environment Variable | Description | Default |
| --- | --- | --- |
| PRIMARY_SOURCE | Defines the default primary source (DOI/URL). This value is used for the prov:hadPrimarySource property. It’s used for the initial dataset import and proposed as the default when creating or modifying entities | Required |
| DATASET_GENERATION_TIME | Specifies the creation timestamp for a pre-existing dataset. This value is used for the prov:generatedAtTime property in the initial historical snapshot | Current time |
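
The two provenance variables follow different rules: PRIMARY_SOURCE must be present, while DATASET_GENERATION_TIME falls back to the current time. A minimal sketch of that resolution is shown below. The function name is hypothetical, and using UTC for the fallback is an assumption, since this page only says "current time".

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Tuple


def resolve_provenance_settings(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, datetime]:
    """Return (primary_source, generation_time) from the environment."""
    primary_source = env.get("PRIMARY_SOURCE", "").strip()
    if not primary_source:
        raise ValueError("PRIMARY_SOURCE is required")

    raw_time = env.get("DATASET_GENERATION_TIME", "").strip()
    if raw_time:
        # Parses ISO 8601 values such as 2024-09-16T00:00:00+02:00.
        generation_time = datetime.fromisoformat(raw_time)
    else:
        # Assumption: the fallback "current time" is taken in UTC.
        generation_time = datetime.now(timezone.utc)
    return primary_source, generation_time
```

Parsing the timestamp up front surfaces a malformed value at startup rather than when the first snapshot is written.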

HERITRACE uses a pluggable architecture for URI generation and counter management, allowing you to customize how unique identifiers are created for entities.

environment:
# Component classes - specify custom implementations via these variables
- COUNTER_HANDLER_CLASS=default_components.meta_counter_handler.MetaCounterHandler
- URI_GENERATOR_CLASS=default_components.meta_uri_generator.MetaURIGenerator

You can create custom URI generators and counter handlers by:

  1. Creating custom components: Write your own Python classes in a custom_components/ directory
  2. Mounting the volume: Add - ./custom_components:/app/custom_components to your docker-compose volumes
  3. Updating environment variables: Set the class paths to your custom implementations

Example:

environment:
- COUNTER_HANDLER_CLASS=custom_components.my_counter.FileCounterHandler
- URI_GENERATOR_CLASS=custom_components.my_uri.UUIDURIGenerator
volumes:
- ./custom_components:/app/custom_components
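
Values such as custom_components.my_uri.UUIDURIGenerator are dotted paths: everything before the last dot names a module, and the final segment names a class inside it. The sketch below shows how such a path can be resolved with the standard importlib module. It illustrates the general technique and is not necessarily how HERITRACE loads its components; load_class is a hypothetical helper.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(
            f"Expected 'package.module.ClassName', got {dotted_path!r}"
        )
    # Import the module, then look the class up by name.
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"Module {module_path!r} has no attribute {class_name!r}"
        ) from exc
```

For the import to succeed, the mounted custom_components/ directory has to be importable from the application's working directory, which is what the /app/custom_components mount target provides.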

The default components are configured directly in their respective Python files:

  • MetaCounterHandler: Configure Redis connection settings directly in default_components/meta_counter_handler.py
  • MetaURIGenerator: Configure base IRI, supplier prefix, and regex patterns directly in default_components/meta_uri_generator.py

The URI generator (set through URI_GENERATOR_CLASS) is responsible for creating unique URIs for new entities. A custom URI generator should implement the following methods:

  • generate_uri(entity_type: str | None = None) -> str: Generates a new URI for the given entity type
  • initialize_counters(sparql) -> None: Initializes any required state from existing data in the database
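
As an illustration of the two methods listed above, here is a minimal sketch of a UUID-based generator that could live in custom_components/my_uri.py. The base IRI is a placeholder, and the no-argument constructor is an assumption, because this page does not say how HERITRACE instantiates the class. Optional[str] is used in place of str | None so the code also runs on Python versions before 3.10.

```python
import uuid
from typing import Optional


class UUIDURIGenerator:
    """Mints URIs from random UUIDs, so no counters are needed."""

    def __init__(self, base_iri: str = "https://example.org/entity/") -> None:
        # Placeholder namespace: replace it with the one used by your dataset.
        self.base_iri = base_iri if base_iri.endswith("/") else base_iri + "/"

    def generate_uri(self, entity_type: Optional[str] = None) -> str:
        # entity_type is accepted for interface compatibility but unused here.
        return f"{self.base_iri}{uuid.uuid4()}"

    def initialize_counters(self, sparql) -> None:
        # Random UUIDs carry no state to recover from the triplestore.
        return None
```

A generator like this does not depend on a counter handler, which is the trade-off against the shorter, sequential identifiers of a counter-based strategy.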

The counter handler (set through COUNTER_HANDLER_CLASS) works with counter-based URI generators by managing the persistent state of their numerical counters. It is optional, but a counter-based strategy needs it to keep URIs unique across application restarts.

A custom counter handler should implement the following methods:

  • read_counter(entity_name: str) -> int: Returns the current counter value for a given entity type. Should return 0 if the counter doesn’t exist.
  • set_counter(new_value: int, entity_name: str): Sets the counter for an entity type to a specific value.
  • increment_counter(entity_name: str) -> int: Atomically increments the counter for an entity type by one and returns the new value.
  • close(): Closes any open connections (e.g., to a database).

The default implementation, MetaCounterHandler, uses a Redis database to store these counters. This provides a fast and reliable way to persist the counter state.
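
For comparison with the Redis-backed default, the sketch below implements the same four methods on top of a JSON file, along the lines of the FileCounterHandler named in the earlier example. It is an assumption-laden illustration: the file name and constructor are hypothetical, and the lock only protects threads within a single process, so it is not a substitute for Redis when several workers run in parallel.

```python
import json
import os
import threading
from typing import Dict


class FileCounterHandler:
    """Stores counters as a JSON object mapping entity names to integers."""

    def __init__(self, path: str = "counters.json") -> None:
        self._path = path
        self._lock = threading.Lock()  # guards threads in this process only

    def _load(self) -> Dict[str, int]:
        if not os.path.exists(self._path):
            return {}
        with open(self._path, "r", encoding="utf-8") as handle:
            return json.load(handle)

    def _save(self, counters: Dict[str, int]) -> None:
        # Write to a temporary file, then swap it in with os.replace so a
        # crash mid-write cannot leave a truncated counters file behind.
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(counters, handle)
        os.replace(tmp_path, self._path)

    def read_counter(self, entity_name: str) -> int:
        with self._lock:
            return self._load().get(entity_name, 0)

    def set_counter(self, new_value: int, entity_name: str) -> None:
        with self._lock:
            counters = self._load()
            counters[entity_name] = new_value
            self._save(counters)

    def increment_counter(self, entity_name: str) -> int:
        with self._lock:
            counters = self._load()
            counters[entity_name] = counters.get(entity_name, 0) + 1
            self._save(counters)
            return counters[entity_name]

    def close(self) -> None:
        # Nothing to release: the file is opened and closed on each call.
        pass
```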

environment:
# ORCID OAuth Configuration
- ORCID_CLIENT_ID=your-client-id
- ORCID_CLIENT_SECRET=your-client-secret
# ORCID Endpoints
- ORCID_AUTHORIZE_URL=https://orcid.org/oauth/authorize
- ORCID_TOKEN_URL=https://orcid.org/oauth/token
- ORCID_API_URL=https://pub.orcid.org/v2.1
- ORCID_SCOPE=/authenticate
# Access Control
- ORCID_SAFELIST=your-allowed-orcid-1,your-allowed-orcid-2
  1. Create an ORCID Account: If you don’t already have one, create an ORCID account.

  2. Get Credentials:

  3. Configure Redirect URI:

    • In your ORCID application settings, add the redirect URI. The path is /auth/callback.
    • For local development: https://127.0.0.1:5000/auth/callback
    • For production: https://your-domain.com/auth/callback
  4. Add Credentials to docker-compose.yml: Update your environment variables with the credentials from ORCID.

    environment:
    - ORCID_CLIENT_ID=APP-XXXXXXXXXX # From ORCID registration
    - ORCID_CLIENT_SECRET=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  5. Safelist Your ORCID ID: To enable access, you must add authorized ORCID IDs to the ORCID_SAFELIST. The application automatically extracts the ID from full ORCID URLs, so you can use either format.

    environment:
    # Allow specific researchers by ID or full URL (comma-separated)
    - ORCID_SAFELIST=0000-0002-1825-0097,https://orcid.org/0000-0001-5109-3700
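
To make the "either format" behaviour concrete, the following sketch normalizes a safelist value so that bare IDs and full ORCID URLs end up in the same form. It is an illustration of the idea rather than HERITRACE's implementation; the function names are hypothetical, and the pattern checks only the shape of the identifier, not its checksum.

```python
import re
from typing import Set

# Four groups of four characters: digits, with an optional X as the last one.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def parse_orcid_safelist(raw: str) -> Set[str]:
    """Extract bare ORCID IDs from a comma-separated list of IDs or URLs."""
    allowed = set()
    for entry in raw.split(","):
        match = ORCID_PATTERN.search(entry.strip().upper())
        if match:
            allowed.add(match.group(0))
    return allowed


def is_allowed(orcid: str, safelist: Set[str]) -> bool:
    """Check a login's ORCID (ID or URL) against the parsed safelist."""
    match = ORCID_PATTERN.search(orcid.strip().upper())
    return match is not None and match.group(0) in safelist
```

Entries that contain no recognizable ID are silently dropped in this sketch; a stricter variant could raise an error so that typos in docker-compose.yml are noticed early.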

HERITRACE provides configurable strategies to manage how the application handles entities that are left without connections after a deletion. These strategies apply to two distinct types of entities: orphans and proxies.

environment:
# Entity Handling Strategies
- ORPHAN_HANDLING_STRATEGY=ASK # Options: ASK, DELETE, KEEP
- PROXY_HANDLING_STRATEGY=ASK # Options: ASK, DELETE, KEEP

An orphan is an entity that is no longer referenced by any other entity in the database. For example, if you remove the last author from a book, the author’s record becomes an orphan if no other books or entities refer to it.

This strategy controls what happens to these orphaned entities:

  • ASK: (Default) Prompts the user for confirmation before deleting any orphaned entities.
  • DELETE: Automatically deletes any entities that become orphans as a result of a deletion.
  • KEEP: Keeps the orphaned entities in the database, even if they are no longer connected to anything.
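
The orphan definition and the three strategies can be sketched over a plain list of triples. This is a simplified model for illustration only: HERITRACE works against a triplestore rather than an in-memory list, and the names below (find_orphans, entities_to_delete, user_confirmed) are hypothetical.

```python
from enum import Enum
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class OrphanStrategy(Enum):
    ASK = "ASK"
    DELETE = "DELETE"
    KEEP = "KEEP"


def find_orphans(triples: Iterable[Triple], candidates: Iterable[str]) -> Set[str]:
    """Return the candidates that no other entity refers to any more."""
    # Self-references do not count as being referenced by another entity.
    referenced = {obj for subj, _, obj in triples if subj != obj}
    return {entity for entity in candidates if entity not in referenced}


def entities_to_delete(
    orphans: Set[str], strategy: OrphanStrategy, user_confirmed: bool = False
) -> Set[str]:
    """Apply the configured strategy to a set of detected orphans."""
    if strategy is OrphanStrategy.DELETE:
        return set(orphans)
    if strategy is OrphanStrategy.ASK and user_confirmed:
        return set(orphans)
    # KEEP, or ASK without confirmation: nothing is removed.
    return set()
```

In the book example, once the last triple pointing at the author is removed, find_orphans reports the author, and the strategy decides whether that record is deleted, kept, or left to the user.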

These settings control the behaviour and display of the catalogue and time vault interfaces.

environment:
# Query Configuration
- COUNT_LIMIT=10000
# Catalogue pagination settings
- CATALOGUE_DEFAULT_PER_PAGE=50
- CATALOGUE_ALLOWED_PER_PAGE=50,100,200,500

| Environment Variable | Type | Description | Default |
| --- | --- | --- | --- |
| COUNT_LIMIT | Integer | Maximum count for entity queries. When exceeded, displays “LIMIT+” in the catalogue | 10000 |
| CATALOGUE_DEFAULT_PER_PAGE | Integer | Default number of items displayed per page in the catalogue | 50 |
| CATALOGUE_ALLOWED_PER_PAGE | String | Comma-separated list of pagination options available to users | 50,100,200,500 |
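
The three settings interact in a simple way, which the sketch below illustrates. It assumes that "LIMIT+" means the configured number followed by a plus sign (for example 10000+) and that a page size outside the allowed list falls back to the default; both are assumptions for illustration, and the function names are hypothetical.

```python
from typing import List


def format_count(count: int, limit: int = 10000) -> str:
    """Show the exact count, or '<limit>+' once the limit is exceeded."""
    return f"{limit}+" if count > limit else str(count)


def parse_allowed_per_page(raw: str = "50,100,200,500") -> List[int]:
    """Turn the comma-separated setting into a list of integers."""
    return [int(part) for part in raw.split(",") if part.strip()]


def resolve_per_page(requested: int, allowed: List[int], default: int = 50) -> int:
    """Accept a requested page size only if it is one of the allowed options."""
    return requested if requested in allowed else default
```

Keeping CATALOGUE_DEFAULT_PER_PAGE among the values of CATALOGUE_ALLOWED_PER_PAGE avoids a default that users cannot select again after changing it.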