Application Settings
HERITRACE uses a Python configuration file (config.py) to manage all application settings. This page covers all available configuration options.
Configuration Setup
Initial Setup
- Copy the template configuration:
  cp config.example.py config.py
- Edit config.py with your specific settings.
- Restart the application to apply changes.
Core Application Settings
Basic Configuration
class Config(object):
    # Application Identity
    APP_TITLE = 'Your App Title'
    APP_SUBTITLE = 'Your App Subtitle'
    # Security
    SECRET_KEY = 'your-secret-key-here'  # REQUIRED: Change this!
    # Performance Caching
    CACHE_FILE = 'cache.json'
    CACHE_VALIDITY_DAYS = 7
Setting | Type | Description | Required |
---|---|---|---|
APP_TITLE | String | Main application title shown in UI | No |
APP_SUBTITLE | String | Subtitle displayed in interface | No |
SECRET_KEY | String | Flask secret key used to sign session cookies; keep it long, random, and private | Yes |
CACHE_FILE | String | Path to the performance cache file | No |
CACHE_VALIDITY_DAYS | Integer | Number of days before the performance cache expires | No |
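SECRET_KEY only needs to be long, random, and private. One way to generate a suitable value is Python's standard secrets module; this is a common convention rather than a HERITRACE requirement:

```python
import secrets

# 32 random bytes rendered as 64 hexadecimal characters.
secret_key = secrets.token_hex(32)
print(secret_key)
```

Paste the printed value into config.py and keep that file out of version control.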
Performance Caching Explained
HERITRACE uses a timestamp-based cache to speed up startup. It works as follows:
- Purpose: The cache’s primary role is to avoid running an expensive SPARQL query on every startup. This query initializes the entity counters (used for URI generation) by reading the entire provenance database and populating a Redis instance.
- CACHE_FILE: This file (e.g., cache.json) does not store application data. Instead, it holds a single timestamp marking the last time the counter initialization completed successfully.
- CACHE_VALIDITY_DAYS: This setting determines how long the timestamp in CACHE_FILE is considered valid. After this period, the application runs the initialization query again to refresh the counters in Redis and update the timestamp.
If cache.json is missing or its timestamp is older than CACHE_VALIDITY_DAYS, the application performs the full initialization. The first startup, and the first startup after each expiry, is therefore slower; restarts within the validity window skip the query and are much faster.
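The startup decision described above can be sketched as follows. This is an illustration, not HERITRACE's code: the function names and the JSON layout of the cache file (a single last_initialization key holding an ISO 8601 timestamp) are assumptions.

```python
import json
import os
from datetime import datetime, timedelta, timezone


def counters_need_initialization(cache_file: str, validity_days: int) -> bool:
    """Return True when the counter-initialization query should run again.

    Assumes (hypothetically) a JSON file of the form
    {"last_initialization": "<ISO 8601 timestamp>"}.
    """
    if not os.path.exists(cache_file):
        return True  # First startup: no timestamp recorded yet.
    try:
        with open(cache_file, encoding="utf-8") as f:
            last_run = datetime.fromisoformat(json.load(f)["last_initialization"])
    except (OSError, ValueError, KeyError, TypeError):
        return True  # Unreadable or malformed cache: re-initialize to be safe.
    if last_run.tzinfo is None:
        last_run = last_run.replace(tzinfo=timezone.utc)  # Treat naive times as UTC.
    return datetime.now(timezone.utc) - last_run > timedelta(days=validity_days)


def record_initialization(cache_file: str) -> None:
    """Write the current time after a successful initialization."""
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump({"last_initialization": datetime.now(timezone.utc).isoformat()}, f)
```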
Database Configuration
HERITRACE is designed to be compatible with any SPARQL 1.1 compliant triplestore, such as GraphDB or Apache Jena Fuseki. However, it has been specifically tested and optimized for Virtuoso and Blazegraph.
The full-text search feature is used for internal query optimization. When DATASET_DB_TEXT_INDEX_ENABLED is True, HERITRACE uses the database’s native text index to accelerate SPARQL queries that filter on literal values. This is a performance enhancement; it does not expose a search feature to end users. The optimization is currently implemented only for Virtuoso and Blazegraph.
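To illustrate what the text-index optimization changes, the sketch below builds a literal-matching SPARQL fragment in three ways: Virtuoso's bif:contains predicate, Blazegraph's bds:search predicate, and a portable SPARQL 1.1 FILTER that scans literals without an index. The helper name is hypothetical, and the fragments are not the exact queries HERITRACE generates.

```python
# Prefix Blazegraph needs for its full-text search predicate.
BDS_PREFIX = "PREFIX bds: <http://www.bigdata.com/rdf/search#>"


def literal_match_pattern(var: str, term: str, triplestore: str, text_index: bool) -> str:
    """Return a SPARQL fragment restricting `var` to literals containing `term`.

    Illustrative only: shows the vendor syntax a text index makes available.
    """
    if any(ch in term for ch in "\"'\\"):
        raise ValueError("this sketch does not escape quotes or backslashes")
    if text_index and triplestore == "virtuoso":
        # Virtuoso free-text predicate; the inner double quotes form a phrase query.
        return f"{var} bif:contains '\"{term}\"' ."
    if text_index and triplestore == "blazegraph":
        # Blazegraph full-text predicate (the query must also declare BDS_PREFIX).
        return f'{var} bds:search "{term}" .'
    # Portable SPARQL 1.1 fallback: compares literals one by one, no index.
    return f'FILTER(CONTAINS(LCASE(STR({var})), LCASE("{term}")))'
```

The portable FILTER works on any SPARQL 1.1 triplestore but has to test every candidate literal, which is why a native index helps on large datasets.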
Triplestore Settings
# Triplestore Types
DATASET_DB_TRIPLESTORE = 'virtuoso'  # 'virtuoso' or 'blazegraph'
PROVENANCE_DB_TRIPLESTORE = 'virtuoso'  # 'virtuoso' or 'blazegraph'
# Database URLs
DATASET_DB_URL = 'http://localhost:8999/sparql'
PROVENANCE_DB_URL = 'http://localhost:8998/sparql'
# Storage Types
DATASET_IS_QUADSTORE = True
PROVENANCE_IS_QUADSTORE = True
# Features
DATASET_DB_TEXT_INDEX_ENABLED = True
Database Options
Virtuoso
DATASET_DB_TRIPLESTORE = 'virtuoso'
PROVENANCE_DB_TRIPLESTORE = 'virtuoso'
DATASET_DB_TEXT_INDEX_ENABLED = True
Virtuoso is the recommended database for its performance, stability, and active maintenance.
Features:
- Actively Supported: Benefits from continuous development and community support.
- Internal Query Optimization: Uses its powerful full-text index to accelerate internal queries on text literals when enabled.
- Proven Scalability: Reliable for production environments with large datasets.
- Named Graph Support: Fully compatible with quad-store setups.
Blazegraph
DATASET_DB_TRIPLESTORE = 'blazegraph'
PROVENANCE_DB_TRIPLESTORE = 'blazegraph'
DATASET_DB_TEXT_INDEX_ENABLED = True
Blazegraph is an open-source alternative that also supports named graphs and text indexing for query optimization.
Features:
- Open Source: Fully open-source and community-driven.
- Internal Query Optimization: Supports full-text indexing to speed up internal queries.
- Named Graph Support: Compatible with quad-store setups.
Storage Configuration
Setting | Options | Description |
---|---|---|
DATASET_IS_QUADSTORE | True / False | Whether the dataset database stores data in named graphs (quads) |
PROVENANCE_IS_QUADSTORE | True / False | Whether the provenance database stores data in named graphs (quads) |
DATASET_DB_TEXT_INDEX_ENABLED | True / False | Enable internal query optimization using the database’s full-text index (Virtuoso and Blazegraph only) |
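A quick sanity check of these settings before startup can catch typos early. The sketch below is hypothetical (HERITRACE does not necessarily validate its configuration this way) and takes the settings as a plain dictionary:

```python
from typing import List, Mapping
from urllib.parse import urlparse

# Triplestores for which this page says the text-index optimization exists.
TEXT_INDEX_TRIPLESTORES = {"virtuoso", "blazegraph"}


def check_database_settings(cfg: Mapping[str, object]) -> List[str]:
    """Return a list of warnings; an empty list means nothing looked wrong."""
    warnings: List[str] = []
    for prefix in ("DATASET", "PROVENANCE"):
        url = str(cfg.get(f"{prefix}_DB_URL", ""))
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            warnings.append(f"{prefix}_DB_URL is not an HTTP(S) endpoint: {url!r}")
        if not isinstance(cfg.get(f"{prefix}_IS_QUADSTORE"), bool):
            warnings.append(f"{prefix}_IS_QUADSTORE must be True or False")
    if (cfg.get("DATASET_DB_TEXT_INDEX_ENABLED")
            and cfg.get("DATASET_DB_TRIPLESTORE") not in TEXT_INDEX_TRIPLESTORES):
        warnings.append("DATASET_DB_TEXT_INDEX_ENABLED is only implemented for Virtuoso and Blazegraph")
    return warnings
```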
Schema and Display Configuration
These settings control the data model’s constraints and how it is presented in the user interface.
# Schema and Display
SHACL_PATH = shacl_path
DISPLAY_RULES_PATH = display_rules_path
Setting | Description | Default |
---|---|---|
SHACL_PATH | Path to the SHACL schema file that defines data validation rules. | resources/shacl.ttl |
DISPLAY_RULES_PATH | Path to the YAML display rules file that controls how entities are rendered. | resources/display_rules.yaml |
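The Schema and Display snippet refers to shacl_path and display_rules_path without showing how they are defined. Assuming they point to the default files listed in the table above, relative to BASE_HERITRACE_DIR, a hypothetical helper could build them and fail fast when a file is missing:

```python
import os
from typing import Dict


def resolve_schema_paths(base_dir: str) -> Dict[str, str]:
    """Build SHACL_PATH and DISPLAY_RULES_PATH from the documented defaults.

    Hypothetical helper: raises FileNotFoundError if either file is absent.
    """
    paths = {
        "SHACL_PATH": os.path.join(base_dir, "resources", "shacl.ttl"),
        "DISPLAY_RULES_PATH": os.path.join(base_dir, "resources", "display_rules.yaml"),
    }
    missing = [name for name, path in paths.items() if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"missing resource files for: {', '.join(missing)}")
    return paths
```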
Data Provenance and Versioning
These settings configure how the application records data provenance and versioning, following the W3C PROV Ontology (PROV-O).
# Provenance and Versioning
DATASET_GENERATION_TIME = '2024-09-16T00:00:00+02:00'
PRIMARY_SOURCE = 'https://doi.org/your-doi'
CHANGE_TRACKING_CONFIG = os.path.join(BASE_HERITRACE_DIR, 'change_tracking.json')
Setting | Description | Default |
---|---|---|
PRIMARY_SOURCE | Default primary source (DOI or URL), recorded with the prov:hadPrimarySource property. It is used for the initial dataset import and proposed as the default when creating or modifying entities. | None (required) |
DATASET_GENERATION_TIME | Creation timestamp of a pre-existing dataset, recorded as prov:generatedAtTime in the initial historical snapshot. | Current time |
CHANGE_TRACKING_CONFIG | Path to the change tracking configuration. This file is generated automatically by the underlying time-agnostic-library based on the provenance database endpoint. The default location is generally sufficient. | change_tracking.json |
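DATASET_GENERATION_TIME is an ISO 8601 string with a UTC offset. A sketch of how such a value could be checked, falling back to the current time as the table describes; the helper name is hypothetical, and rejecting timestamps without an offset is a design choice of this sketch rather than documented HERITRACE behavior:

```python
from datetime import datetime, timezone
from typing import Optional


def generation_time(value: Optional[str]) -> str:
    """Return an ISO 8601 timestamp suitable for prov:generatedAtTime."""
    if value is None:
        # Mirrors the documented default: fall back to the current time.
        return datetime.now(timezone.utc).isoformat()
    parsed = datetime.fromisoformat(value)  # Raises ValueError if malformed.
    if parsed.tzinfo is None:
        raise ValueError("include a UTC offset, e.g. '+02:00', to avoid ambiguity")
    return parsed.isoformat()
```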
URI Generation
The URI_GENERATOR and COUNTER_HANDLER settings work together to define how new, unique identifiers are created for entities within the system.
# URI Generation
URI_GENERATOR = meta_uri_generator
COUNTER_HANDLER = counter_handler
URI_GENERATOR
This setting holds the URI generator object responsible for creating new URIs. Whenever a new entity is created, HERITRACE calls the generate_uri method of the object assigned to this setting.
A valid URI generator class must inherit from the abstract base class heritrace.uri_generator.URIGenerator and implement the generate_uri(entity_type) method. The abstract class also defines an initialize_counters(sparql) method:
- generate_uri(entity_type): The core method; it returns a new, unique URI as a string based on the entity’s type.
- initialize_counters(sparql): Intended for URI strategies that use sequential counters. It is called at application startup to query the database and determine the last-used identifier, preventing duplicates. If your URI_GENERATOR does not rely on counters (e.g., if it uses UUIDs), this method can have an empty implementation (pass).
This structure allows you to implement custom URI generation schemes tailored to your project. HERITRACE’s default implementation is MetaURIGenerator, which creates URIs compliant with the OpenCitations Meta standard.
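As an example of a custom scheme that needs no counters, the sketch below mints UUID-based URIs. To keep it self-contained it defines a stand-in for heritrace.uri_generator.URIGenerator with the two methods described above; in a real config.py you would subclass the HERITRACE class instead. The class name UUIDURIGenerator and the base IRI are hypothetical.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any, Optional


class URIGenerator(ABC):
    """Stand-in for heritrace.uri_generator.URIGenerator (not the real class)."""

    @abstractmethod
    def generate_uri(self, entity_type: Optional[str] = None) -> str:
        """Return a new, unique URI."""

    @abstractmethod
    def initialize_counters(self, sparql: Any) -> None:
        """Restore counter state at startup (counter-based schemes only)."""


class UUIDURIGenerator(URIGenerator):
    """Hypothetical generator minting URIs like <base_iri>/<32 hex digits>."""

    def __init__(self, base_iri: str) -> None:
        self.base_iri = base_iri.rstrip("/")

    def generate_uri(self, entity_type: Optional[str] = None) -> str:
        # entity_type is ignored: a random UUID is unique regardless of type.
        return f"{self.base_iri}/{uuid.uuid4().hex}"

    def initialize_counters(self, sparql: Any) -> None:
        pass  # No counters, so nothing to restore at startup.
```

In config.py this might be wired up as URI_GENERATOR = UUIDURIGenerator('https://example.org/entity'). Because UUIDs do not depend on previously issued values, initialize_counters has nothing to do and no counter handler is needed.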
COUNTER_HANDLER
The COUNTER_HANDLER works alongside counter-based URI generators and manages the persistent state of their numerical counters. While optional, it is crucial for keeping URIs unique across application restarts when using a counter-based strategy.
A custom counter handler should implement the following methods:
- read_counter(entity_name: str) -> int: Returns the current counter value for a given entity type. Should return 0 if the counter doesn’t exist.
- set_counter(new_value: int, entity_name: str): Sets the counter for an entity type to a specific value.
- increment_counter(entity_name: str) -> int: Atomically increments the counter for an entity type by one and returns the new value.
- close(): Closes any open connections (e.g., to a database).
The default implementation, MetaCounterHandler, stores these counters in a Redis database, which provides a fast and reliable way to persist counter state.
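The sketch below implements the four methods listed above with an in-memory dictionary guarded by a lock, and pairs it with a minimal sequential URI generator. Both classes are hypothetical. Because the counters vanish when the process exits, this handler is only suitable for tests; persistence across restarts is exactly what MetaCounterHandler's Redis store provides.

```python
import threading
from typing import Dict


class InMemoryCounterHandler:
    """Hypothetical, non-persistent counter handler (for tests only)."""

    def __init__(self) -> None:
        self._counters: Dict[str, int] = {}
        self._lock = threading.Lock()

    def read_counter(self, entity_name: str) -> int:
        with self._lock:
            return self._counters.get(entity_name, 0)  # 0 when unknown.

    def set_counter(self, new_value: int, entity_name: str) -> None:
        with self._lock:
            self._counters[entity_name] = new_value

    def increment_counter(self, entity_name: str) -> int:
        # Holding the lock makes read-add-write one atomic step across threads.
        with self._lock:
            value = self._counters.get(entity_name, 0) + 1
            self._counters[entity_name] = value
            return value

    def close(self) -> None:
        pass  # No connections to release.


class SequentialURIGenerator:
    """Hypothetical counter-based generator: <base_iri>/<short_name>/<n>."""

    def __init__(self, base_iri: str, short_names: Dict[str, str],
                 counters: InMemoryCounterHandler) -> None:
        self.base_iri = base_iri.rstrip("/")
        self.short_names = short_names  # Maps an RDF class IRI to a URI segment.
        self.counters = counters

    def generate_uri(self, entity_type: str) -> str:
        name = self.short_names[entity_type]
        return f"{self.base_iri}/{name}/{self.counters.increment_counter(name)}"
```

A production generator would also implement initialize_counters(sparql), typically calling set_counter with the highest identifier already in the database so that numbering resumes without collisions.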
ORCID Integration
Basic Setup
# ORCID OAuth Configuration
ORCID_CLIENT_ID = 'your-client-id'
ORCID_CLIENT_SECRET = 'your-client-secret'
# ORCID Endpoints
ORCID_AUTHORIZE_URL = 'https://orcid.org/oauth/authorize'
ORCID_TOKEN_URL = 'https://orcid.org/oauth/token'
ORCID_API_URL = 'https://pub.orcid.org/v2.1'
ORCID_SCOPE = '/authenticate'
# Access Control
ORCID_WHITELIST = [
    'your-allowed-orcid-1',
    'https://orcid.org/your-allowed-orcid-2'  # Full URLs are also supported
]
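The endpoint settings correspond to the standard OAuth 2.0 authorization-code flow: the user is sent to ORCID_AUTHORIZE_URL, ORCID redirects back to the application's /auth/callback route with a one-time code, and the application exchanges that code at ORCID_TOKEN_URL. A minimal sketch of the first step follows; the helper name is hypothetical, and HERITRACE's own login code may build this request differently, for example through an OAuth library.

```python
from urllib.parse import urlencode

ORCID_AUTHORIZE_URL = "https://orcid.org/oauth/authorize"
ORCID_SCOPE = "/authenticate"


def build_orcid_login_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build an OAuth 2.0 authorization-code request URL for ORCID."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "scope": ORCID_SCOPE,
        "redirect_uri": redirect_uri,
        # Random value echoed back by ORCID; compare it on callback (CSRF protection).
        "state": state,
    })
    return f"{ORCID_AUTHORIZE_URL}?{query}"
```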
Setting Up ORCID
- Create an ORCID Account: If you don’t already have one, create an ORCID account.
- Get Credentials:
  - Go to ORCID Developer Tools.
  - Create a new application and note your Client ID and Client Secret.
- Configure Redirect URI:
  - In your ORCID application settings, add the redirect URI. The path is /auth/callback.
  - For local development: https://127.0.0.1:5000/auth/callback
  - For production: https://your-domain.com/auth/callback
- Add Credentials to config.py: Update your config.py with the credentials from ORCID.
  ORCID_CLIENT_ID = 'APP-XXXXXXXXXX'  # From ORCID registration
  ORCID_CLIENT_SECRET = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
- Whitelist Your ORCID iD: To enable access, you must add authorized ORCID iDs to ORCID_WHITELIST. The application automatically extracts the iD from full ORCID URLs, so you can use either format.
  # Allow specific researchers by ID or full URL
  ORCID_WHITELIST = [
      '0000-0002-1825-0097',  # Josiah Carberry (ID only)
      'https://orcid.org/0000-0001-5109-3700'  # Another researcher (full URL)
  ]
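The whitelist comparison can be sketched as follows: both the whitelist entries and the logged-in user's identifier are reduced to a bare iD before comparing. The helper names are hypothetical. has_valid_checksum implements ORCID's documented ISO 7064 MOD 11-2 check digit and is an optional extra that this page does not say HERITRACE performs.

```python
import re
from typing import Iterable

# A bare iD at the start of the string or right after a '/' (as in a full URL).
_ORCID_RE = re.compile(r"(?:^|/)(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def normalize_orcid(value: str) -> str:
    """Reduce an iD or a full https://orcid.org/ URL to the bare iD."""
    match = _ORCID_RE.search(value.strip().rstrip("/").upper())
    if not match:
        raise ValueError(f"not an ORCID iD: {value!r}")
    return match.group(1)


def has_valid_checksum(orcid: str) -> bool:
    """Check the final character against the ISO 7064 MOD 11-2 check digit."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return digits[-1] == ("X" if result == 10 else str(result))


def is_whitelisted(user_orcid: str, whitelist: Iterable[str]) -> bool:
    """Compare bare iDs, so whitelist entries may be iDs or full URLs."""
    allowed = {normalize_orcid(entry) for entry in whitelist}
    return normalize_orcid(user_orcid) in allowed
```

In this sketch, placeholder entries such as 'your-allowed-orcid-1' raise ValueError, which surfaces an unedited template early.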
Internationalization
# Language Support
LANGUAGES = ['en', 'it']
BABEL_TRANSLATION_DIRECTORIES = os.path.join(
    BASE_HERITRACE_DIR, 'babel', 'translations'
)
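LANGUAGES lists the interface languages, and BABEL_TRANSLATION_DIRECTORIES is the Flask-Babel setting that points to the compiled translation catalogs. How HERITRACE picks a language for each request is not described here; a common approach in Flask applications is to match the browser's Accept-Language header against LANGUAGES (Werkzeug offers request.accept_languages.best_match for this). A dependency-free sketch of that matching, with a hypothetical helper name:

```python
from typing import Sequence

LANGUAGES = ["en", "it"]


def best_language(accept_language: str, supported: Sequence[str] = LANGUAGES) -> str:
    """Return the supported language the client prefers most, else supported[0]."""
    ranked = []
    for position, part in enumerate(accept_language.split(",")):
        tag, _, params = part.strip().partition(";")
        tag, params = tag.strip(), params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # Malformed weight: ignore this entry.
        if tag and quality > 0:
            # Sort key: highest weight first, then the order the client sent.
            ranked.append((-quality, position, tag.lower()))
    for _, _, tag in sorted(ranked):
        primary = tag.split("-")[0]  # 'it-it' -> 'it'
        if primary in supported:
            return primary
    return supported[0]
```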
Entity Handling Strategies
HERITRACE provides configurable strategies for handling entities that are left without connections after a deletion. These strategies apply to two distinct types of entities: orphans and proxies.
# Entity Handling Strategies
ORPHAN_HANDLING_STRATEGY = OrphanHandlingStrategy.ASK
PROXY_HANDLING_STRATEGY = ProxyHandlingStrategy.DELETE
Strategy Options
ORPHAN_HANDLING_STRATEGY
An orphan is an entity that is no longer referenced by any other entity in the database. For example, if you remove the last author from a book, the author’s record becomes an orphan if no other books or entities refer to it.
This strategy controls what happens to these orphaned entities:
- ASK: (Default) Prompts the user for confirmation before deleting any orphaned entities.
- DELETE: Automatically deletes any entities that become orphans as a result of a deletion.
- KEEP: Keeps the orphaned entities in the database, even if they are no longer connected to anything.
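The orphan rule can be illustrated with triples held in memory: after a deletion, a former target is an orphan if no remaining triple points to it. The sketch below is a simplified illustration, not HERITRACE's implementation; it defines a stand-in OrphanHandlingStrategy enum with the three option names above, and the confirm callback is a hypothetical placeholder for the confirmation dialog.

```python
from enum import Enum
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class OrphanHandlingStrategy(Enum):
    """Stand-in with the documented option names (not HERITRACE's enum)."""
    ASK = "ask"
    DELETE = "delete"
    KEEP = "keep"


def find_orphans(triples: Set[Triple], candidates: Iterable[str]) -> Set[str]:
    """Return the candidates that no remaining triple uses as its object."""
    referenced = {o for _, _, o in triples}
    return {c for c in candidates if c not in referenced}


def entities_to_delete(orphans: Set[str], strategy: OrphanHandlingStrategy,
                       confirm: Callable[[Set[str]], bool]) -> Set[str]:
    """Apply the strategy; `confirm` is only consulted for ASK."""
    if strategy is OrphanHandlingStrategy.DELETE:
        return set(orphans)
    if strategy is OrphanHandlingStrategy.ASK and orphans and confirm(orphans):
        return set(orphans)
    return set()  # KEEP, or the user declined.
```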
PROXY_HANDLING_STRATEGY
A proxy (or intermediate) entity is a resource that links two other entities, often to add specific attributes to their relationship. For example, a pro:RoleInTime entity can connect a foaf:Person to a fabio:Book, defining their role as an author.
When the primary relationship (e.g., the book’s author entry) is deleted, this strategy determines what happens to the intermediate pro:RoleInTime entity:
- DELETE: (Default) Automatically deletes the intermediate entity.
- ASK: Prompts the user for confirmation before deleting.
- KEEP: Retains the intermediate entity in the database.
Defining a Proxy Relationship
A property is treated as a proxy relationship in resources/display_rules.yaml when its rule includes the intermediateRelation key. This key tells HERITRACE to create an intermediate entity connecting the subject to the target object.
You must specify two things under intermediateRelation:
- class: The RDF class of the intermediate entity (e.g., pro:RoleInTime).
- targetEntityType: The RDF class of the final entity you want to create and link to (e.g., foaf:Agent).
# In resources/display_rules.yaml
- property: "http://purl.org/spar/pro/isDocumentContextFor"
  displayName: "Author"
  intermediateRelation:
    class: "http://purl.org/spar/pro/RoleInTime"
    targetEntityType: "http://xmlns.com/foaf/0.1/Agent"
  displayRules:
    - shape: "http://schema.org/AuthorShape"
      displayName: "Author"
      # ...
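Once the YAML is loaded, identifying proxy properties amounts to looking for rules that carry intermediateRelation. The sketch below assumes the file has already been parsed (for example with PyYAML) into a list of dictionaries like the example above, with displayRules omitted; the helper name and the extra dcterms:title rule are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# The YAML example above after loading (displayRules omitted), plus a
# hypothetical ordinary property for contrast.
display_rules: List[Dict[str, Any]] = [
    {
        "property": "http://purl.org/spar/pro/isDocumentContextFor",
        "displayName": "Author",
        "intermediateRelation": {
            "class": "http://purl.org/spar/pro/RoleInTime",
            "targetEntityType": "http://xmlns.com/foaf/0.1/Agent",
        },
    },
    {"property": "http://purl.org/dc/terms/title", "displayName": "Title"},
]


def proxy_properties(rule_list: List[Dict[str, Any]]) -> Dict[str, Tuple[str, str]]:
    """Map each proxied property to (intermediate class, target entity type)."""
    result: Dict[str, Tuple[str, str]] = {}
    for rule in rule_list:
        relation = rule.get("intermediateRelation")
        if relation is None:
            continue  # Ordinary property: subject links to object directly.
        missing = {"class", "targetEntityType"} - set(relation)
        if missing:
            raise ValueError(f"{rule['property']}: intermediateRelation lacks {sorted(missing)}")
        result[rule["property"]] = (relation["class"], relation["targetEntityType"])
    return result
```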