Introduction to HERITRACE
HERITRACE (Heritage Enhanced Repository Interface for Tracing, Research, Archival Curation, and Engagement) is a semantic data editor designed for galleries, libraries, archives, and museums (GLAM) that bridges the gap between sophisticated semantic technologies and the practical needs of cultural heritage professionals.
The Challenge
The widespread adoption of Semantic Web technologies in the GLAM sector has created a paradoxical situation. While these technologies have made human intervention more critical due to the semantic interpretation of data that cannot be automated, they have simultaneously limited the number of curators to those who are experts in the Semantic Web. This creates challenges in workforce scalability and accessibility.
Many GLAM institutions face a choice between embracing Semantic Web technologies (requiring staff with advanced technical expertise) or avoiding them entirely to prevent curatorial complexities. HERITRACE resolves this dilemma by enabling non-technical domain experts to manage semantic data intuitively without losing its semantic integrity.
Purpose and Objectives
HERITRACE has been designed with five primary objectives:
- User-Friendly Interface: Providing an intuitive interface for domain experts to interact with semantic data without technical knowledge
- Comprehensive Provenance Management: Implementing detailed documentation of metadata modifications, including who made changes, when, and from what sources
- Robust Change-Tracking: Delivering efficient version control capabilities for reconstruction of previous data states
- Flexible Customization: Offering standardized configuration through SHACL and YAML rather than proprietary solutions
- Seamless Integration: Facilitating compatibility with pre-existing RDF data collections without modification
Target Users
End Users (GLAM Professionals)
- Librarians managing bibliographic records
- Archivists cataloguing collections
- Museum curators organizing metadata
- Academic researchers working with bibliographic data
Domain experts can enrich and edit metadata through an intuitive interface without knowing anything about the Semantic Web, while the system maintains complete semantic integrity behind the scenes.
Technical Users (Developers & System Administrators)
- System administrators deploying and maintaining HERITRACE
- Developers customizing schemas through SHACL definitions
- IT professionals configuring YAML display rules and database connections
Technical staff can customize HERITRACE through standardized languages (SHACL for data validation, YAML for interface configuration) rather than learning proprietary templating systems.
Key Features
Provenance Management and Change Tracking
HERITRACE employs the OpenCitations Data Model (OCDM), which extends the PROV Ontology (PROV-O), to implement comprehensive provenance management. Every modification is captured as a snapshot with complete metadata, including:
- Timestamp of creation/invalidation
- Responsible agent (individual, organization, or automated process)
- Primary data source
- Detailed list of modifications made
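As a rough illustration of the four items above, a snapshot can be pictured as a small record. The sketch below uses only the Python standard library; the `Snapshot` class, its field names, the `supersede` helper, and the `example.org` IRIs are all hypothetical and are not part of HERITRACE's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical provenance record; field names are illustrative only."""
    entity: str                                # IRI of the entity described
    generated_at: datetime                     # timestamp of creation
    responsible_agent: str                     # person, organization, or process
    primary_source: Optional[str] = None       # IRI of the primary data source
    invalidated_at: Optional[datetime] = None  # set once a newer snapshot exists
    modifications: List[str] = field(default_factory=list)  # what changed

    def is_current(self) -> bool:
        # A snapshot is current until something invalidates it.
        return self.invalidated_at is None


def supersede(old: Snapshot, new: Snapshot) -> None:
    """Invalidate `old` at the moment `new` was generated."""
    old.invalidated_at = new.generated_at


# Made-up example data.
first = Snapshot(
    entity="https://example.org/br/1",
    generated_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    responsible_agent="https://example.org/agent/1",
    primary_source="https://example.org/source/1",
    modifications=["Entity created"],
)
second = Snapshot(
    entity=first.entity,
    generated_at=datetime(2024, 6, 1, tzinfo=timezone.utc),
    responsible_agent="https://example.org/agent/2",
    modifications=["Title changed"],
)
supersede(first, second)
```

In PROV-O terms, these fields would roughly correspond to `prov:generatedAtTime`, `prov:invalidatedAtTime`, `prov:wasAttributedTo`, and `prov:hadPrimarySource`. How HERITRACE serializes them internally is not shown here.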
Change tracking is delta-based: only the differences between successive snapshots are stored, as SPARQL update queries. This keeps storage efficient and allows previous versions to be restored precisely.
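The delta idea can be sketched in a few lines of plain Python. This is a minimal illustration, not HERITRACE's implementation: triples are modeled as tuples of already-serialized terms, and the helper names `compute_delta` and `to_sparql_update` are hypothetical.

```python
from typing import Set, Tuple

# A triple of already-serialized terms, e.g. "<iri>" or '"literal"'.
Triple = Tuple[str, str, str]


def compute_delta(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, removed) between two states of an entity."""
    return new - old, old - new


def to_sparql_update(added: Set[Triple], removed: Set[Triple]) -> str:
    """Serialize a delta as a SPARQL 1.1 Update request."""
    def block(triples: Set[Triple]) -> str:
        # Sorting keeps the output deterministic.
        return " ".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))

    operations = []
    if removed:
        operations.append("DELETE DATA { " + block(removed) + " }")
    if added:
        operations.append("INSERT DATA { " + block(added) + " }")
    return " ; ".join(operations)


# Made-up example: the title of one bibliographic resource changes.
subject = "<https://example.org/br/1>"
title = "<http://purl.org/dc/terms/title>"
old_state = {(subject, title, '"Old title"')}
new_state = {(subject, title, '"New title"')}

added, removed = compute_delta(old_state, new_state)
update = to_sparql_update(added, removed)
```

Only the two changed triples end up in `update`; unchanged triples are never copied, which is where the storage saving of a delta-based approach comes from.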
Time Machine and Time Vault
The Time Machine feature provides a timeline interface for managing entity evolution, allowing users to view previous versions and restore earlier states with automatic adjustment of linked resources. The Time Vault serves as a specialized catalog for deleted entities, enabling recovery when needed.
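To give a sense of how an earlier state can be recovered from stored deltas, the sketch below walks backward from the current state and undoes each change in turn. It is a simplified, standard-library illustration: the `restore` function and the delta representation are assumptions, and it covers neither the adjustment of linked resources nor the API of the library HERITRACE actually uses for version reconstruction.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
# One forward step in time: (triples added, triples removed).
Delta = Tuple[Set[Triple], Set[Triple]]


def restore(current: Set[Triple], deltas: List[Delta], version: int) -> Set[Triple]:
    """Rebuild the state at `version`.

    `deltas[i]` leads from version i to version i + 1, and `current`
    is the state after every delta has been applied.
    """
    if not 0 <= version <= len(deltas):
        raise ValueError("unknown version")
    state = set(current)
    # Undo the newest changes first.
    for added, removed in reversed(deltas[version:]):
        state -= added     # drop what that step inserted
        state |= removed   # put back what that step deleted
    return state


# Made-up history: a title is replaced, then a second triple is added.
a = ("<https://example.org/br/1>", "<http://purl.org/dc/terms/title>", '"Draft"')
b = ("<https://example.org/br/1>", "<http://purl.org/dc/terms/title>", '"Final"')
c = ("<https://example.org/br/1>", "<http://purl.org/dc/terms/language>", '"en"')

history = [({b}, {a}), ({c}, set())]
latest = {b, c}

original = restore(latest, history, 0)
middle = restore(latest, history, 1)
```

Because each delta records both additions and removals, every step is reversible without storing a full copy of any intermediate state.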
Intelligent Metadata Management
- Real-time Validation: SHACL-based constraints ensure data consistency and integrity
- Smart Disambiguation: Automatic detection of similar entities during creation to prevent duplication
- Dynamic Field Configuration: Interface adapts based on entity type and defined schemas
- Semantic Relationship Management: Complex relationships between entities are handled with proper validation
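The text above does not say how similar entities are detected. Purely to illustrate the general idea behind duplicate detection at creation time, the sketch below compares a new label against existing ones using string similarity from Python's standard `difflib` module. The `find_similar` function, its threshold, and the sample data are all assumptions, not HERITRACE's actual matching logic.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def _normalize(text: str) -> str:
    # Case-fold and collapse whitespace so trivial differences are ignored.
    return " ".join(text.casefold().split())


def find_similar(
    candidate: str,
    existing: Dict[str, str],
    threshold: float = 0.85,
) -> List[Tuple[str, float]]:
    """Return (IRI, score) pairs whose label resembles `candidate`.

    `existing` maps entity IRIs to labels; results are sorted best-first.
    """
    matches = []
    for iri, label in existing.items():
        score = SequenceMatcher(None, _normalize(candidate), _normalize(label)).ratio()
        if score >= threshold:
            matches.append((iri, score))
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Made-up catalog of existing entities.
catalog = {
    "https://example.org/br/1": "The Name of the Rose",
    "https://example.org/br/2": "Foucault's Pendulum",
}
suggestions = find_similar("the name of  the rose", catalog)
```

A real system would likely compare more than labels (identifiers, authors, dates), but the flow is the same: score candidates before saving, then let the curator decide whether to reuse an existing entity.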
Seamless RDF Integration
HERITRACE works out of the box with existing RDF datasets. Simply connect it to your triplestore; no data transformation or special import procedure is required. The system automatically discovers and displays entities based on their RDF types.
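Type-based discovery boils down to grouping subjects by the object of their `rdf:type` statements. The following standard-library sketch shows that grouping over an in-memory list of triples, next to a SPARQL query expressing the same idea. The function name, the query constant, and the sample data are illustrative assumptions rather than HERITRACE code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The same idea as a SPARQL query that a triplestore could answer directly.
TYPES_QUERY = """
SELECT ?type (COUNT(DISTINCT ?entity) AS ?total)
WHERE { ?entity a ?type }
GROUP BY ?type
"""


def group_by_type(triples: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each RDF class IRI to the set of subjects declared with that type."""
    by_type: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            by_type[obj].add(subject)
    return dict(by_type)


# Made-up dataset with two classes.
data = [
    ("https://example.org/br/1", RDF_TYPE, "http://purl.org/spar/fabio/JournalArticle"),
    ("https://example.org/br/2", RDF_TYPE, "http://purl.org/spar/fabio/JournalArticle"),
    ("https://example.org/ra/1", RDF_TYPE, "http://xmlns.com/foaf/0.1/Agent"),
    ("https://example.org/br/1", "http://purl.org/dc/terms/title", "A title"),
]
entities = group_by_type(data)
```

Statements that are not type declarations, such as the title triple, are ignored, so the result lists only the classes present in the data and the entities belonging to each.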
Technical Foundation
HERITRACE is built using modern web technologies optimized for semantic data management:
- Backend: Python Flask framework, with RDFLib for RDF processing and the Time-agnostic Library for version reconstruction
- Database: Database-agnostic architecture supporting any RDF triplestore. Tested with Virtuoso and Blazegraph, but also compatible with GraphDB and Apache Jena. Virtuoso is recommended as it is open source and actively maintained, while Blazegraph is no longer maintained
- Frontend: Jinja2 templating engine, with React components for interactive elements
- Standards Compliance: Built on RDF, SPARQL, SHACL, and PROV-O standards
- Authentication: ORCID OAuth integration with access control
- Deployment: Docker and Docker Compose support for easy installation
- Customization: SHACL for data model definition, YAML for display rule configuration
Real-World Deployment
HERITRACE is currently deployed in the ParaText project at the University of Bologna, where it manages bibliographic metadata for textual resources. The system is also planned for adoption by OpenCitations, where it would handle large-scale, dynamic datasets in a production environment.