Merge overview
The merge tools find duplicate entities and consolidate them, combining their data and updating all references.
Workflow
- Find duplicates - Scan RDF files to find entities sharing identifiers
- Group entities - Prepare for parallel processing
- Execute merge - Consolidate entities with provenance tracking
- Track history - Reconstruct what was merged (optional)
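As a rough illustration of the first two steps, the sketch below groups entities that share at least one identifier, directly or through a chain of shared identifiers, using a union-find structure. It is not the oc_meta implementation: the function name `find_duplicate_groups`, the in-memory dictionary input, and the sample entity names and identifiers are all assumptions made for the example.

```python
from collections import defaultdict


def find_duplicate_groups(entity_ids):
    """Group entities that share an identifier, directly or transitively.

    entity_ids maps an entity name to the set of identifiers attached to it
    (a hypothetical input shape, chosen only for this sketch).
    """
    parent = {entity: entity for entity in entity_ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so output is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_owner = {}  # identifier -> first entity seen carrying it
    for entity, identifiers in entity_ids.items():
        for identifier in identifiers:
            if identifier in first_owner:
                union(entity, first_owner[identifier])
            else:
                first_owner[identifier] = entity

    groups = defaultdict(list)
    for entity in entity_ids:
        groups[find(entity)].append(entity)
    # Only groups with more than one member are duplicates.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Made-up sample data: br/1 and br/3 share no identifier with each other,
# but both overlap with br/2, so all three land in one group.
entities = {
    "br/1": {"doi:10.1000/a"},
    "br/2": {"doi:10.1000/a", "pmid:111"},
    "br/3": {"pmid:111"},
    "br/4": {"doi:10.1000/b"},
}
groups = find_duplicate_groups(entities)
print(groups)
```

Because each resulting group is disjoint from the others, groups of this kind could in principle be handed to separate workers without two workers touching the same entity, which is the property the grouping step is meant to provide.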
Find duplicates:
uv run python -m oc_meta.run.find.duplicated_entities /data/rdf duplicates.csv br
Group for parallel processing:
uv run python -m oc_meta.run.merge.group_entities duplicates.csv groups/ meta_config.yaml
Merge:
uv run python -m oc_meta.run.merge.entities groups/ meta_config.yaml https://w3id.org/oc/meta/prov/pa/1
Optional - see what was merged:
uv run python -m oc_meta.run.find.merged_entities -c meta_config.yaml -o merged.csv --entity-type br
Available tools
| Tool | Purpose |
|---|---|
| Find duplicates | Scan RDF files for duplicate identifiers and entities |
| Group entities | Prepare duplicates for parallel merging |
| Merge entities | Execute merge operations |
| Verify merge | Check merge results and generate fix queries |
| Compact CSV | Extract completed merges into a single file |
| Merge history | Reconstruct merge history from provenance |
What happens during merge
When entity B is merged into entity A:
- Identifiers from B are added to A
- Metadata from B fills gaps in A (titles, dates, etc.)
- Relationships pointing to B are redirected to A
- Author/editor chains from A are kept (B’s chains are discarded)
- Provenance records the merge operation
- Entity B is marked as merged and invalidated
The surviving entity (A) becomes the canonical representation. The merged entity (B) is preserved in provenance for historical queries but is no longer active.
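The rules above can be sketched on a simplified in-memory model. This is only an illustration of the described behavior, not the oc_meta code: the `Entity` class, the `merge` function, the `references` mapping (source entity to target entity), and the shape of the provenance record are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    uri: str
    identifiers: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)
    contributors: list = field(default_factory=list)  # ordered author/editor chain
    merged_into: Optional[str] = None  # set once the entity has been merged away


def merge(a, b, references, provenance):
    """Merge entity b into entity a, following the rules listed above."""
    # Identifiers from B are added to A.
    a.identifiers |= b.identifiers
    # Metadata from B only fills gaps; values already present in A win.
    for key, value in b.metadata.items():
        if not a.metadata.get(key):
            a.metadata[key] = value
    # Relationships pointing to B are redirected to A.
    for source, target in list(references.items()):
        if target == b.uri:
            references[source] = a.uri
    # A's contributor chain is left untouched; B's chain is not copied.
    # Provenance records the operation, and B is marked as merged.
    provenance.append({"action": "merge", "survivor": a.uri, "merged": b.uri})
    b.merged_into = a.uri


# Made-up sample data.
a = Entity("br/1", {"doi:10.1000/a"}, {"title": "A study"}, ["ar/1"])
b = Entity("br/2", {"pmid:111"}, {"title": "A study (preprint)", "date": "2020"}, ["ar/9"])
references = {"br/7": "br/2", "br/8": "br/1"}
provenance = []

merge(a, b, references, provenance)
print(a)
print(references)
```

In this sketch B is kept as an object with `merged_into` set rather than deleted, mirroring the point that the merged entity remains available for historical queries while no longer being the active record.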