Merge history
This script reconstructs the history of merged entities by analyzing provenance data. It finds all entities that were merged and traces the chain of merges.
Usage
uv run python -m oc_meta.run.find.merged_entities -c <META_CONFIG> -o <OUTPUT_CSV> --entity-type <TYPE> [OPTIONS]
| Option | Default | Description |
|---|---|---|
| -c | - | Path to Meta config file |
| -o | - | Output CSV file |
| --entity-type | - | Entity type (e.g. br, ra) |
| --workers | 4 | Parallel workers |
Examples
Find all merged bibliographic resources:
uv run python -m oc_meta.run.find.merged_entities \
-c meta_config.yaml \
-o merged_brs.csv \
--entity-type br \
--workers 8
Find merged responsible agents:
uv run python -m oc_meta.run.find.merged_entities \
-c meta_config.yaml \
-o merged_ras.csv \
--entity-type ra
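Output format
Each row pairs a surviving entity with the entities merged into it. When several entities were merged into the same survivor, the merged_entities column lists them separated by "; ".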
surviving_entity,merged_entities
https://w3id.org/oc/meta/br/060/1,https://w3id.org/oc/meta/br/060/2; https://w3id.org/oc/meta/br/060/3
https://w3id.org/oc/meta/br/060/100,https://w3id.org/oc/meta/br/060/101
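The output can be consumed with Python's standard csv module. The sketch below is not part of oc_meta: load_merges is a hypothetical helper, and it assumes the merged_entities values are separated by semicolons exactly as in the sample rows above.

```python
from __future__ import annotations

import csv
import io

# Sample rows copied from the output format shown above.
SAMPLE = """surviving_entity,merged_entities
https://w3id.org/oc/meta/br/060/1,https://w3id.org/oc/meta/br/060/2; https://w3id.org/oc/meta/br/060/3
https://w3id.org/oc/meta/br/060/100,https://w3id.org/oc/meta/br/060/101
"""


def load_merges(handle) -> dict[str, list[str]]:
    """Map each surviving entity to the list of entities merged into it."""
    merges: dict[str, list[str]] = {}
    for row in csv.DictReader(handle):
        # merged_entities is one CSV field whose URIs are separated by "; ".
        merged = [uri.strip() for uri in row["merged_entities"].split(";") if uri.strip()]
        merges[row["surviving_entity"]] = merged
    return merges


if __name__ == "__main__":
    for survivor, merged in load_merges(io.StringIO(SAMPLE)).items():
        print(survivor, "<-", ", ".join(merged))
```

For a real output file, pass a handle opened with open("merged_brs.csv", newline="", encoding="utf-8") instead of the in-memory sample.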
How it works
The script scans provenance files (se.zip) and looks for snapshots whose prov:wasDerivedFrom points to two or more sources, which indicates a merge operation. For each merge snapshot, it:
- extracts the surviving entity from prov:specializationOf;
- extracts the merged entities from the prov:wasDerivedFrom sources, excluding the surviving entity itself.
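The merge test described above can be sketched as follows. This is an illustration, not the script's actual code: it assumes each snapshot has already been loaded from se.zip as an expanded JSON-LD node, and that snapshot IRIs end in /prov/se/<n>, so the entity IRI is recovered by stripping that suffix. The names find_merge, entity_of and ids are hypothetical.

```python
from __future__ import annotations

import re

PROV = "http://www.w3.org/ns/prov#"
# Assumption: snapshot IRIs look like <entity IRI>/prov/se/<number>.
SNAPSHOT_SUFFIX = re.compile(r"/prov/se/\d+$")


def entity_of(snapshot_iri: str) -> str:
    """Recover the entity IRI by stripping the assumed /prov/se/<n> suffix."""
    return SNAPSHOT_SUFFIX.sub("", snapshot_iri)


def ids(node: dict, prop: str) -> list[str]:
    """Collect the @id values of a PROV property on an expanded JSON-LD node."""
    return [value["@id"] for value in node.get(PROV + prop, []) if "@id" in value]


def find_merge(snapshot: dict) -> tuple[str, list[str]] | None:
    """Return (surviving entity, merged entities) if the snapshot records a merge."""
    sources = ids(snapshot, "wasDerivedFrom")
    specialized = ids(snapshot, "specializationOf")
    if len(sources) < 2 or not specialized:
        return None  # fewer than two sources: not a merge
    surviving = specialized[0]
    merged: list[str] = []
    for source in sources:
        entity = entity_of(source)
        # Skip the surviving entity's own previous snapshot and any duplicates.
        if entity != surviving and entity not in merged:
            merged.append(entity)
    return surviving, merged


if __name__ == "__main__":
    snapshot = {
        "@id": "https://w3id.org/oc/meta/br/060/1/prov/se/2",
        PROV + "specializationOf": [{"@id": "https://w3id.org/oc/meta/br/060/1"}],
        PROV + "wasDerivedFrom": [
            {"@id": "https://w3id.org/oc/meta/br/060/1/prov/se/1"},
            {"@id": "https://w3id.org/oc/meta/br/060/2/prov/se/1"},
        ],
    }
    print(find_merge(snapshot))
    # ('https://w3id.org/oc/meta/br/060/1', ['https://w3id.org/oc/meta/br/060/2'])
```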
It then reconstructs chains: if A was merged into B, and B was later merged into C, the script reports C as the final surviving entity for both A and B.
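Chain resolution amounts to following "merged into" pointers until reaching an entity that was never merged away. A minimal sketch, assuming the merge snapshots have already been reduced to (surviving entity, merged entities) pairs and that each entity is merged away at most once; resolve_chains is a hypothetical name, not a function from oc_meta.

```python
from __future__ import annotations

from collections import defaultdict


def resolve_chains(merges: list[tuple[str, list[str]]]) -> dict[str, list[str]]:
    """Group every merged entity under the entity that ultimately survived."""
    # Direct pointer from each merged entity to the entity it was merged into.
    merged_into: dict[str, str] = {}
    for surviving, merged in merges:
        for entity in merged:
            merged_into[entity] = surviving

    def final(entity: str) -> str:
        # Follow pointers to the end of the chain; `seen` stops accidental cycles.
        seen = set()
        while entity in merged_into and entity not in seen:
            seen.add(entity)
            entity = merged_into[entity]
        return entity

    result: dict[str, list[str]] = defaultdict(list)
    for entity in merged_into:
        result[final(entity)].append(entity)
    # Sort each group so the output is stable across runs.
    return {survivor: sorted(group) for survivor, group in result.items()}


if __name__ == "__main__":
    # A was merged into B, then B into C: C is reported for both.
    print(resolve_chains([("B", ["A"]), ("C", ["B"])]))  # {'C': ['A', 'B']}
```

Walking each chain separately is simple but repeats work on long chains; a union-find structure with path compression would be a reasonable alternative for very large inputs.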