Introduction to HERITRACE

HERITRACE (Heritage Enhanced Repository Interface for Tracing, Research, Archival Curation, and Engagement) is a semantic data editor designed for galleries, libraries, archives, and museums (GLAM) that bridges the gap between sophisticated semantic technologies and the practical needs of cultural heritage professionals.

The widespread adoption of Semantic Web technologies in the GLAM sector has created a paradoxical situation. These technologies make human intervention more critical, because the semantic interpretation of data cannot be automated, yet they restrict the pool of curators to those with Semantic Web expertise. This creates challenges in workforce scalability and accessibility.

Many GLAM institutions face a choice between embracing Semantic Web technologies (requiring staff with advanced technical expertise) or avoiding them entirely to prevent curatorial complexities. HERITRACE resolves this dilemma by enabling non-technical domain experts to manage semantic data intuitively without losing its semantic integrity.

HERITRACE has been designed with five primary objectives:

  1. User-Friendly Interface: Providing an intuitive interface for domain experts to interact with semantic data without technical knowledge
  2. Comprehensive Provenance Management: Implementing detailed documentation of metadata modifications, including who made changes, when, and from what sources
  3. Robust Change-Tracking: Delivering efficient version control capabilities for reconstruction of previous data states
  4. Flexible Customization: Offering standardized configuration through SHACL and YAML rather than proprietary solutions
  5. Seamless Integration: Facilitating compatibility with pre-existing RDF data collections without modification

Domain Experts

  • Librarians managing bibliographic records
  • Archivists cataloguing collections
  • Museum curators organizing metadata
  • Academic researchers working with bibliographic data

Domain experts can enrich and edit metadata through an intuitive interface without knowing anything about the Semantic Web, while the system maintains complete semantic integrity behind the scenes.

Technical Users (Developers & System Administrators)

  • System administrators deploying and maintaining HERITRACE
  • Developers customizing schemas through SHACL definitions
  • IT professionals configuring YAML display rules and database connections

Technical staff can customize HERITRACE through standardized languages (SHACL for data validation, YAML for interface configuration) rather than learning proprietary templating systems.

HERITRACE employs the OpenCitations Data Model (OCDM), which extends the PROV Ontology (PROV-O), to implement comprehensive provenance management. Every modification is captured as a snapshot with complete metadata, including:

  • Timestamp of creation/invalidation
  • Responsible agent (individual, organization, or automated process)
  • Primary data source
  • Detailed list of modifications made
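
To make the snapshot metadata above concrete, here is a minimal Python sketch of one snapshot as an in-memory record. The `Snapshot` class and the `supersede` helper are hypothetical names introduced for illustration and are not HERITRACE's own code. The comments map each field to the PROV-O or Dublin Core property that, as far as we understand OCDM, carries the same information.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Snapshot:
    """Hypothetical in-memory view of one provenance snapshot."""
    entity: str                                # IRI of the entity the snapshot describes
    generated_at: datetime                     # prov:generatedAtTime
    responsible_agent: str                     # prov:wasAttributedTo
    primary_source: Optional[str] = None       # prov:hadPrimarySource
    invalidated_at: Optional[datetime] = None  # prov:invalidatedAtTime
    description: str = ""                      # dcterms:description (what was modified)

    def is_current(self) -> bool:
        # A snapshot is the current one until it has been invalidated.
        return self.invalidated_at is None


def supersede(old: Snapshot, new: Snapshot) -> None:
    """Invalidate the old snapshot at the moment the new one was generated."""
    old.invalidated_at = new.generated_at


# Example IRIs below are placeholders, not real identifiers.
first = Snapshot(
    entity="https://example.org/br/1",
    generated_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    responsible_agent="https://example.org/agent/1",
    description="The entity has been created.",
)
second = Snapshot(
    entity="https://example.org/br/1",
    generated_at=datetime(2024, 2, 1, tzinfo=timezone.utc),
    responsible_agent="https://example.org/agent/1",
    primary_source="https://example.org/source/1",
    description="The title has been modified.",
)
supersede(first, second)
```

After `supersede` runs, only the second snapshot is current, and the first keeps a closed validity interval from its creation to its invalidation.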

Change tracking is delta-based: only the differences between successive snapshots are stored, as SPARQL update queries, which keeps storage efficient and allows previous versions to be restored precisely.
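
The delta idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not HERITRACE's implementation: triples are modelled as tuples of strings already serialized in N-Triples syntax, and the helper names (`delta_as_update`, `restore_previous`) are hypothetical. The sketch computes the difference between two states, serializes it as a SPARQL 1.1 Update request, and shows that inverting the delta recovers the earlier state.

```python
from typing import Iterable, Set, Tuple

# A triple whose terms are already written in N-Triples syntax (assumption).
Triple = Tuple[str, str, str]


def _data_block(keyword: str, triples: Iterable[Triple]) -> str:
    """Build a 'DELETE DATA { ... }' or 'INSERT DATA { ... }' operation."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))
    return f"{keyword} DATA {{ {body} }}"


def delta_as_update(before: Set[Triple], after: Set[Triple]) -> str:
    """Serialize only the differences between two states as a SPARQL update."""
    removed = before - after
    added = after - before
    operations = []
    if removed:
        operations.append(_data_block("DELETE", removed))
    if added:
        operations.append(_data_block("INSERT", added))
    # SPARQL 1.1 Update separates operations in one request with ';'.
    return "; ".join(operations)


def restore_previous(current: Set[Triple], added: Set[Triple], removed: Set[Triple]) -> Set[Triple]:
    """Invert a delta: drop what it added and put back what it removed."""
    return (current - added) | removed


TITLE = "<http://purl.org/dc/terms/title>"
ENTITY = "<https://example.org/br/1>"  # placeholder IRI

v1 = {(ENTITY, TITLE, '"Old title"')}
v2 = {(ENTITY, TITLE, '"New title"')}

update = delta_as_update(v1, v2)
restored = restore_previous(v2, added=v2 - v1, removed=v1 - v2)
```

Storing `update` instead of a full copy of `v2` is what keeps the history compact: each snapshot only needs the triples that changed, and walking the inverted deltas backwards from the current state reaches any earlier version.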

[Figure: Resource editing interface showing metadata fields and editing options]

The Time Machine feature provides a timeline interface for managing entity evolution, allowing users to view previous versions and restore earlier states with automatic adjustment of linked resources. The Time Vault serves as a specialized catalog for deleted entities, enabling recovery when needed.

[Figure: Time Machine interface showing version history and timeline]

  • Real-time Validation: SHACL-based constraints ensure data consistency and integrity
  • Smart Disambiguation: Automatic detection of similar entities during creation to prevent duplication
  • Dynamic Field Configuration: Interface adapts based on entity type and defined schemas
  • Semantic Relationship Management: Handling of complex relationships between entities with proper validation
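
As a rough illustration of what a SHACL-style constraint check involves, the Python sketch below verifies cardinality rules comparable to `sh:minCount` and `sh:maxCount` against the values entered for one entity. It is a simplified stand-in, not a SHACL engine and not HERITRACE's validation code. `PropertyRule` and `check_cardinality` are hypothetical names, and the rules would normally be derived from the SHACL shapes rather than written by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PropertyRule:
    """Hypothetical cardinality rule, analogous to sh:minCount / sh:maxCount."""
    path: str                        # IRI of the constrained property
    min_count: int = 0
    max_count: Optional[int] = None  # None means "no upper bound"


def check_cardinality(values: Dict[str, List[str]], rules: List[PropertyRule]) -> List[str]:
    """Return one human-readable message per violated rule."""
    errors = []
    for rule in rules:
        found = len(values.get(rule.path, []))
        if found < rule.min_count:
            errors.append(f"{rule.path}: expected at least {rule.min_count} value(s), found {found}")
        if rule.max_count is not None and found > rule.max_count:
            errors.append(f"{rule.path}: expected at most {rule.max_count} value(s), found {found}")
    return errors


TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"

rules = [
    PropertyRule(path=TITLE, min_count=1, max_count=1),  # exactly one title
    PropertyRule(path=CREATOR, min_count=1),             # at least one creator
]

# Form data for one entity: two titles entered and no creator.
form_values = {TITLE: ["A title", "Another title"]}
errors = check_cardinality(form_values, rules)
```

Running a check like this on every change to a form is one way to surface violations while the user is still editing, rather than after the data has been saved.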

HERITRACE functions effectively out of the box with existing RDF datasets. Simply connect it to your triplestore; no data transformation or special import procedure is required. The system automatically discovers and displays entities based on their RDF types.
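
Type-based discovery can be pictured as grouping entities by their `rdf:type`. The Python sketch below shows a SPARQL query that a client could send to a triplestore for this purpose, together with an equivalent in-memory grouping over a handful of triples. Both the query text and the `count_entities_by_type` helper are illustrative assumptions and do not reproduce the queries HERITRACE actually issues.

```python
from collections import Counter
from typing import Iterable, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Query a client could run against the SPARQL endpoint (illustrative only).
CLASS_DISCOVERY_QUERY = """
SELECT ?type (COUNT(DISTINCT ?entity) AS ?count)
WHERE { ?entity a ?type }
GROUP BY ?type
ORDER BY DESC(?count)
"""


def count_entities_by_type(triples: Iterable[Tuple[str, str, str]]) -> Counter:
    """In-memory equivalent of the query: count distinct entities per class."""
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return Counter(cls for _, cls in typed)


ARTICLE = "http://purl.org/spar/fabio/JournalArticle"
AGENT = "http://xmlns.com/foaf/0.1/Agent"

# Placeholder data: two articles, one agent, and one untyped statement.
sample = [
    ("https://example.org/br/1", RDF_TYPE, ARTICLE),
    ("https://example.org/br/2", RDF_TYPE, ARTICLE),
    ("https://example.org/ra/1", RDF_TYPE, AGENT),
    ("https://example.org/br/1", "http://purl.org/dc/terms/title", "A title"),
]
counts = count_entities_by_type(sample)
```

Because the grouping relies only on `rdf:type` statements already present in the data, it needs no import step or schema migration, which is consistent with the out-of-the-box behaviour described above.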

HERITRACE is built using modern web technologies optimized for semantic data management:

  • Backend: Python Flask framework with RDFLib for RDF processing and Time Agnostic Library for version reconstruction
  • Database: Database-agnostic architecture supporting any RDF triplestore. Tested with Virtuoso and Blazegraph, but also compatible with GraphDB and Apache Jena. Virtuoso is recommended as it is open source and actively maintained, while Blazegraph is no longer maintained
  • Frontend: Jinja2 templating engine with React components for interactive elements requiring reactivity
  • Standards Compliance: Built on RDF, SPARQL, SHACL, and PROV-O standards
  • Authentication: ORCID OAuth integration with access control
  • Deployment: Docker and Docker Compose support for easy installation
  • Customization: SHACL for data model definition, YAML for display rule configuration

HERITRACE is currently deployed in the ParaText project at the University of Bologna, managing bibliographic metadata for textual resources. The system is also planned for adoption by OpenCitations, demonstrating its scalability for large-scale, dynamic datasets in real production environments.