Application Settings
HERITRACE is configured through Docker environment variables.
Configuration Method
Configure settings in the environment section of your docker-compose.yml file:

```yaml
environment:
  # Application settings
  - FLASK_ENV=production
  - APP_TITLE=HERITRACE
  - APP_SUBTITLE=A semantic editor for heritage data

  # Security (REQUIRED: Change this!)
  - SECRET_KEY=your-secure-secret-key-here

  # Database endpoints
  - DATASET_DB_URL=http://host.docker.internal:8999/sparql
  - PROVENANCE_DB_URL=http://host.docker.internal:8998/sparql
  - DATASET_DB_TRIPLESTORE=virtuoso
  - PROVENANCE_DB_TRIPLESTORE=virtuoso

  # ORCID authentication
  - ORCID_CLIENT_ID=your-client-id
  - ORCID_CLIENT_SECRET=your-client-secret
  - ORCID_WHITELIST=0000-0000-0000-0000,0000-0000-0000-0001
```
Core Application Settings
Basic Configuration
Environment Variable | Type | Description | Required | Default |
---|---|---|---|---|
APP_TITLE | String | Main application title shown in UI | No | HERITRACE |
APP_SUBTITLE | String | Subtitle displayed in interface | No | Heritage Enhanced Repository Interface |
SECRET_KEY | String | Flask secret key used to sign session cookies; replace the default with a long random value | Yes | generate-a-secure-random-key |
CACHE_VALIDITY_DAYS | Integer | Days until the performance cache expires | No | 7 |
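SECRET_KEY has no usable default, so you need to generate a long, unpredictable value yourself. One way to do that is with Python's standard secrets module; the helper name generate_secret_key below is ours and is not part of HERITRACE.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hexadecimal string suitable for SECRET_KEY.

    secrets.token_hex(n) draws n bytes from a cryptographically secure
    source and encodes each byte as two hex characters, so 32 bytes
    produce a 64-character string.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print a line that can be pasted into the environment section.
    print(f"SECRET_KEY={generate_secret_key()}")
```

Paste the printed value into docker-compose.yml and keep that file out of public version control.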
Database Configuration
HERITRACE is compatible with SPARQL 1.1 triplestores and has been tested with Virtuoso and Blazegraph.
When DATASET_DB_TEXT_INDEX_ENABLED is set to true, HERITRACE uses the database’s full-text index to optimize SPARQL queries on literals. This option is available only for Virtuoso and Blazegraph.
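How HERITRACE rewrites its queries internally is not documented here. As a rough illustration only, the sketch below shows how a literal search can change shape when a full-text index is available: Virtuoso exposes the bif:contains predicate and Blazegraph the bds:search predicate, whereas without an index a portable SPARQL 1.1 FILTER must scan every literal. The function name and the query layout are hypothetical.

```python
def build_literal_search_query(term: str, triplestore: str, text_index_enabled: bool) -> str:
    """Return a SPARQL query selecting subjects with a literal matching `term`.

    Hypothetical helper: it only illustrates how a text index changes the query.
    """
    # Keep the sketch simple: reject characters that would need escaping.
    if any(ch in term for ch in "\"'\\"):
        raise ValueError("term must not contain quotes or backslashes")

    prefix = ""
    if text_index_enabled and triplestore == "virtuoso":
        # Virtuoso full-text predicate; the inner single quotes mark a phrase.
        match = f"?o bif:contains \"'{term}'\" ."
    elif text_index_enabled and triplestore == "blazegraph":
        # Blazegraph full-text search through its bds:search predicate.
        prefix = "PREFIX bds: <http://www.bigdata.com/rdf/search#>\n"
        match = f'?o bds:search "{term}" .'
    else:
        # No index: a standard SPARQL 1.1 filter that scans every literal.
        match = f'FILTER(isLiteral(?o) && CONTAINS(LCASE(STR(?o)), LCASE("{term}")))'

    return (
        f"{prefix}SELECT DISTINCT ?s WHERE {{\n"
        "  ?s ?p ?o .\n"
        f"  {match}\n"
        "}"
    )
```

The index-backed variants let the database answer from its text index instead of evaluating CONTAINS on every literal, which is where the optimization comes from.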
Triplestore Settings
```yaml
environment:
  # Triplestore Types
  - DATASET_DB_TRIPLESTORE=virtuoso    # 'virtuoso' or 'blazegraph'
  - PROVENANCE_DB_TRIPLESTORE=virtuoso # 'virtuoso' or 'blazegraph'

  # Database URLs
  # Inside a container, localhost refers to the container itself; if the
  # triplestore runs on the host, use e.g. host.docker.internal instead.
  - DATASET_DB_URL=http://localhost:8999/sparql
  - PROVENANCE_DB_URL=http://localhost:8998/sparql

  # Storage Types
  - DATASET_IS_QUADSTORE=true
  - PROVENANCE_IS_QUADSTORE=true

  # Features
  - DATASET_DB_TEXT_INDEX_ENABLED=true
```
Database Options
Virtuoso

```yaml
environment:
  - DATASET_DB_TRIPLESTORE=virtuoso
  - PROVENANCE_DB_TRIPLESTORE=virtuoso
  - DATASET_DB_TEXT_INDEX_ENABLED=true
```
Pros:
- Mature and battle-tested database
- Open source and SPARQL 1.1 compliant
- Excellent performance even with large datasets
- Full-text indexing and named graph support
Cons:
- Complex configuration
- Poor documentation
Blazegraph

```yaml
environment:
  - DATASET_DB_TRIPLESTORE=blazegraph
  - PROVENANCE_DB_TRIPLESTORE=blazegraph
  - DATASET_DB_TEXT_INDEX_ENABLED=true
```
Pros:
- Simple to configure
- Open source and SPARQL 1.1 compliant
- Full-text indexing and named graph support
Cons:
- No longer actively maintained
- Performance degrades and may stop accepting triples with large datasets
- Poor documentation
Storage Configuration
Environment Variable | Options | Description |
---|---|---|
DATASET_IS_QUADSTORE | true/false | Enable named graph (quad) support for the dataset database |
PROVENANCE_IS_QUADSTORE | true/false | Enable named graph support for the provenance database |
DATASET_DB_TEXT_INDEX_ENABLED | true/false | Enable internal query optimization using the database’s full-text search index |
Schema and Display Configuration
These settings control the data model’s constraints and its presentation in the user interface.
Environment Variable | Description | Default |
---|---|---|
SHACL_PATH | Path to the SHACL schema file that defines data validation rules | ./shacl.ttl |
DISPLAY_RULES_PATH | Path to the YAML display rules file that controls how entities are rendered | ./display_rules.yaml |
Data Provenance and Versioning
These settings configure how the application handles data provenance and versioning, in line with the W3C PROV Ontology.

```yaml
environment:
  # Provenance and Versioning
  - DATASET_GENERATION_TIME=2024-09-16T00:00:00+02:00
  - PRIMARY_SOURCE=https://doi.org/your-doi
```
Environment Variable | Description | Default |
---|---|---|
PRIMARY_SOURCE | The default primary source (a DOI or URL), used as the value of the prov:hadPrimarySource property. It is applied during the initial dataset import and proposed as the default when creating or modifying entities | Required |
DATASET_GENERATION_TIME | Specifies the creation timestamp for a pre-existing dataset. This value is used for the prov:generatedAtTime property in the initial historical snapshot | Current time |
URI Generation and Counter Handling
HERITRACE uses a pluggable architecture for URI generation and counter management, allowing you to customize how unique identifiers are created for entities.

```yaml
environment:
  # Component classes - specify custom implementations via these variables
  - COUNTER_HANDLER_CLASS=default_components.meta_counter_handler.MetaCounterHandler
  - URI_GENERATOR_CLASS=default_components.meta_uri_generator.MetaURIGenerator
```
Custom Components
You can create custom URI generators and counter handlers by:
- Creating custom components: Write your own Python classes in a custom_components/ directory.
- Mounting the volume: Add - ./custom_components:/app/custom_components to your docker-compose volumes.
- Updating environment variables: Set the class paths to your custom implementations.
Example:

```yaml
environment:
  - COUNTER_HANDLER_CLASS=custom_components.my_counter.FileCounterHandler
  - URI_GENERATOR_CLASS=custom_components.my_uri.UUIDURIGenerator
volumes:
  - ./custom_components:/app/custom_components
```
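Values such as custom_components.my_uri.UUIDURIGenerator are dotted paths: a module path followed by a class name. This page does not describe how HERITRACE resolves them; a common Python approach, shown here as a hypothetical load_class helper, uses importlib from the standard library.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object (hypothetical helper)."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'package.module.ClassName', got {dotted_path!r}")
    # Raises ModuleNotFoundError if the module cannot be imported.
    module = importlib.import_module(module_path)
    # Raises AttributeError if the module has no such class.
    return getattr(module, class_name)
```

Under this approach the custom_components package has to be importable inside the container, which is presumably why the volume is mounted under /app.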
Default Components Configuration
The default components are configured through environment variables within their respective Python files:
- MetaCounterHandler: Configure Redis connection settings directly in default_components/meta_counter_handler.py
- MetaURIGenerator: Configure the base IRI, supplier prefix, and regex patterns directly in default_components/meta_uri_generator.py
URI_GENERATOR
The URI_GENERATOR component is responsible for creating unique URIs for new entities. A custom URI generator should implement the following methods:
- generate_uri(entity_type: str | None = None) -> str: Generates a new URI for the given entity type.
- initialize_counters(sparql) -> None: Initializes any required state from existing data in the database.
COUNTER_HANDLER
The COUNTER_HANDLER is a component designed to work with counter-based URI generators. It manages the persistent state of the numerical counters. Although optional, it is essential for guaranteeing URI uniqueness across application restarts when you use a counter-based strategy.
A custom counter handler should implement the following methods:
- read_counter(entity_name: str) -> int: Returns the current counter value for a given entity type. Should return 0 if the counter doesn’t exist.
- set_counter(new_value: int, entity_name: str): Sets the counter for an entity type to a specific value.
- increment_counter(entity_name: str) -> int: Atomically increments the counter for an entity type by one and returns the new value.
- close(): Closes any open connections (e.g., to a database).
The default implementation, MetaCounterHandler, uses a Redis database to store these counters. This provides a fast and reliable way to persist the counter state.
ORCID Integration
Basic Setup
```yaml
environment:
  # ORCID OAuth Configuration
  - ORCID_CLIENT_ID=your-client-id
  - ORCID_CLIENT_SECRET=your-client-secret

  # ORCID Endpoints
  - ORCID_AUTHORIZE_URL=https://orcid.org/oauth/authorize
  - ORCID_TOKEN_URL=https://orcid.org/oauth/token
  - ORCID_API_URL=https://pub.orcid.org/v2.1
  - ORCID_SCOPE=/authenticate

  # Access Control
  - ORCID_WHITELIST=your-allowed-orcid-1,your-allowed-orcid-2
```
Setting Up ORCID
1. Create an ORCID account: If you don’t already have one, create an ORCID account.
2. Get credentials:
   - Go to ORCID Developer Tools.
   - Create a new application and note your Client ID and Client Secret.
3. Configure the redirect URI:
   - In your ORCID application settings, add the redirect URI. The path is /auth/callback.
   - For local development: https://127.0.0.1:5000/auth/callback
   - For production: https://your-domain.com/auth/callback
4. Add credentials to docker-compose.yml: Update your environment variables with the credentials from ORCID.

   ```yaml
   environment:
     - ORCID_CLIENT_ID=APP-XXXXXXXXXX # From ORCID registration
     - ORCID_CLIENT_SECRET=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   ```

5. Whitelist your ORCID ID: To enable access, you must add authorized ORCID IDs to ORCID_WHITELIST. The application automatically extracts the ID from full ORCID URLs, so you can use either format.

   ```yaml
   environment:
     # Allow specific researchers by ID or full URL (comma-separated)
     - ORCID_WHITELIST=0000-0002-1825-0097,https://orcid.org/0000-0001-5109-3700
   ```
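The whitelist handling described in step 5 can be sketched as follows. This is a hypothetical helper, not HERITRACE’s implementation: it accepts bare IDs or https://orcid.org/ URLs and relies on the standard ORCID iD shape (four hyphen-separated blocks of four characters, the last character being a digit or X).

```python
import re

# Bare ID at the start of the entry or right after a '/', e.g. in a full ORCID URL.
ORCID_ID = re.compile(r"(?:^|/)(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def parse_orcid_whitelist(raw: str) -> set[str]:
    """Turn a comma-separated ORCID_WHITELIST value into a set of bare IDs."""
    ids: set[str] = set()
    for entry in raw.split(","):
        entry = entry.strip().rstrip("/")
        if not entry:
            continue  # tolerate empty items such as a trailing comma
        match = ORCID_ID.search(entry)
        if match is None:
            raise ValueError(f"not a valid ORCID ID or URL: {entry!r}")
        ids.add(match.group(1))
    return ids
```

Normalizing to bare IDs means a login can then be checked with a simple set membership test, whichever format the administrator used.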
Entity Handling Strategies
HERITRACE provides configurable strategies to manage how the application handles entities that are left without connections after a deletion. These strategies apply to two distinct types of entities: orphans and proxies.
```yaml
environment:
  # Entity Handling Strategies
  - ORPHAN_HANDLING_STRATEGY=ASK # Options: ASK, DELETE, KEEP
  - PROXY_HANDLING_STRATEGY=ASK  # Options: ASK, DELETE, KEEP
```
Strategy Options
ORPHAN_HANDLING_STRATEGY

An orphan is an entity that is no longer referenced by any other entity in the database. For example, if you remove the last author from a book, the author’s record becomes an orphan if no other books or entities refer to it.
This strategy controls what happens to these orphaned entities:
- ASK: (Default) Prompts the user for confirmation before deleting any orphaned entities.
- DELETE: Automatically deletes any entities that become orphans as a result of a deletion.
- KEEP: Keeps the orphaned entities in the database, even if they are no longer connected to anything.
PROXY_HANDLING_STRATEGY

A proxy (or intermediate) entity is a resource that links two other entities, often to add specific attributes to their relationship. For example, a pro:RoleInTime entity can connect a foaf:Person to a fabio:Book, defining that person’s role as author.
When the primary relationship (e.g., the book’s author entry) is deleted, this strategy determines what happens to the intermediate pro:RoleInTime entity:
- ASK: (Default) Prompts the user for confirmation before deleting.
- DELETE: Automatically deletes the intermediate entity.
- KEEP: Retains the intermediate entity in the database.
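The orphan rule and the three strategy options can be sketched over a plain set of (subject, predicate, object) tuples. This is a hypothetical model, not HERITRACE’s code: the Strategy enum, find_orphans, entities_to_delete, and the confirm callback standing in for the UI prompt are all assumptions.

```python
from enum import Enum
from typing import Callable

Triple = tuple[str, str, str]


class Strategy(Enum):
    ASK = "ASK"
    DELETE = "DELETE"
    KEEP = "KEEP"


def find_orphans(triples: set[Triple], candidates: set[str]) -> set[str]:
    """Return candidates that no remaining triple references as object from another entity."""
    referenced = {o for s, _, o in triples if s != o}
    return {c for c in candidates if c not in referenced}


def entities_to_delete(orphans: set[str], strategy: Strategy,
                       confirm: Callable[[set[str]], bool]) -> set[str]:
    """Apply a handling strategy to the orphans left by a deletion."""
    if not orphans or strategy is Strategy.KEEP:
        return set()
    if strategy is Strategy.DELETE:
        return set(orphans)
    # ASK: defer to the user (e.g. a confirmation dialog in the interface).
    return set(orphans) if confirm(orphans) else set()
```

For example, removing the triple linking a book to its only author leaves the author unreferenced, so find_orphans reports it and the configured strategy decides its fate. The same dispatch could plausibly serve PROXY_HANDLING_STRATEGY for intermediate entities.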
Defining a Proxy Relationship
In resources/display_rules.yaml, a property is treated as a proxy relationship when it uses the intermediateRelation key. This key tells HERITRACE to create an intermediate entity that connects the subject to the target object.
You must specify two things under intermediateRelation:
- class: The RDF class of the intermediate entity (e.g., pro:RoleInTime).
- targetEntityType: The RDF class of the final entity you want to create and link to (e.g., foaf:Agent).
```yaml
# In resources/display_rules.yaml
- property: "http://purl.org/spar/pro/isDocumentContextFor"
  displayName: "Author"
  intermediateRelation:
    class: "http://purl.org/spar/pro/RoleInTime"
    targetEntityType: "http://xmlns.com/foaf/0.1/Agent"
  displayRules:
    - shape: "http://schema.org/AuthorShape"
      displayName: "Author"
      # ...
```