Merge history
This script reconstructs the history of merged entities by analyzing provenance data. It finds all entities that were merged and traces the chain of merges.
uv run python -m oc_meta.run.find.merged_entities -c <META_CONFIG> -o <OUTPUT_CSV> --entity-type <TYPE> [OPTIONS]

| Option | Default | Description |
|---|---|---|
| -c, --config | - | Path to Meta config file |
| -o, --output | - | Output CSV file |
| --entity-type | - | Entity type: br, ra, id, ar, re |
| --workers | 4 | Parallel workers |
Examples
Find all merged bibliographic resources:
uv run python -m oc_meta.run.find.merged_entities \
  -c meta_config.yaml \
  -o merged_brs.csv \
  --entity-type br \
  --workers 8

Find merged responsible agents:

uv run python -m oc_meta.run.find.merged_entities \
  -c meta_config.yaml \
  -o merged_ras.csv \
  --entity-type ra

Output format
surviving_entity,merged_entities
https://w3id.org/oc/meta/br/060/1,https://w3id.org/oc/meta/br/060/2; https://w3id.org/oc/meta/br/060/3
https://w3id.org/oc/meta/br/060/100,https://w3id.org/oc/meta/br/060/101

How it works
The script scans provenance files (se.zip) and looks for snapshots whose prov:wasDerivedFrom points to two or more sources, which indicates a merge operation. For each merge snapshot, it:
- Extracts the surviving entity from prov:specializationOf
- Extracts merged entities from prov:wasDerivedFrom sources (excluding the surviving entity itself)
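The detection steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the script's actual implementation: it assumes the provenance triples have already been loaded as plain `(subject, predicate, object)` tuples, that `prov:wasDerivedFrom` points to snapshot URIs, and that snapshot URIs follow an `<entity>/prov/se/<n>` layout. The names `find_merges` and `entity_of` are hypothetical.

```python
from collections import defaultdict

PROV = "http://www.w3.org/ns/prov#"
WAS_DERIVED_FROM = PROV + "wasDerivedFrom"
SPECIALIZATION_OF = PROV + "specializationOf"


def entity_of(snapshot_uri):
    """Map a snapshot URI to its entity URI (assumed '<entity>/prov/se/<n>' layout)."""
    return snapshot_uri.split("/prov/se/")[0]


def find_merges(triples):
    """Return {surviving_entity: set of merged entities} for all merge snapshots."""
    derived_from = defaultdict(set)
    specialization = {}
    for subject, predicate, obj in triples:
        if predicate == WAS_DERIVED_FROM:
            derived_from[subject].add(obj)
        elif predicate == SPECIALIZATION_OF:
            specialization[subject] = obj

    merges = {}
    for snapshot, sources in derived_from.items():
        # Fewer than two sources means an ordinary update, not a merge.
        if len(sources) < 2 or snapshot not in specialization:
            continue
        survivor = specialization[snapshot]
        # Drop the survivor's own previous snapshot from the sources.
        merged = {entity_of(source) for source in sources} - {survivor}
        if merged:
            merges.setdefault(survivor, set()).update(merged)
    return merges


BASE = "https://w3id.org/oc/meta/br/060/"
triples = [
    # Snapshot 3 of entity 1 derives from two sources: a merge.
    (BASE + "1/prov/se/3", SPECIALIZATION_OF, BASE + "1"),
    (BASE + "1/prov/se/3", WAS_DERIVED_FROM, BASE + "1/prov/se/2"),
    (BASE + "1/prov/se/3", WAS_DERIVED_FROM, BASE + "2/prov/se/1"),
    # Snapshot 2 of entity 1 derives from one source: a plain update.
    (BASE + "1/prov/se/2", SPECIALIZATION_OF, BASE + "1"),
    (BASE + "1/prov/se/2", WAS_DERIVED_FROM, BASE + "1/prov/se/1"),
]
merges = find_merges(triples)
print(merges)
```

Here only snapshot 3 counts as a merge, so entity 2 is reported as merged into entity 1, while the single-source snapshot is ignored.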
It then reconstructs chains: if A was merged into B, and B was later merged into C, the script reports C as the final surviving entity for both A and B.
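Chain reconstruction can be illustrated with a small sketch. It assumes the individual merges are available as a hypothetical `merged_into` mapping from each merged entity to the entity it was merged into, and it formats rows like the sample output shown earlier (merged entities joined with `; `). The function names are illustrative, and the real script may resolve chains differently.

```python
def final_survivor(entity, merged_into):
    """Follow the merge chain from an entity to its final surviving entity."""
    seen = set()
    # The 'seen' set guards against an accidental cycle in the data.
    while entity in merged_into and entity not in seen:
        seen.add(entity)
        entity = merged_into[entity]
    return entity


def group_by_final_survivor(merged_into):
    """Return {final_survivor: sorted list of every entity merged into it}."""
    groups = {}
    for merged in merged_into:
        survivor = final_survivor(merged, merged_into)
        groups.setdefault(survivor, []).append(merged)
    return {survivor: sorted(merged) for survivor, merged in groups.items()}


# A was merged into B, B was later merged into C; X was merged into Y.
merged_into = {"A": "B", "B": "C", "X": "Y"}
groups = group_by_final_survivor(merged_into)

# One CSV-style row per final survivor.
rows = [
    f"{survivor},{'; '.join(merged)}"
    for survivor, merged in sorted(groups.items())
]
for row in rows:
    print(row)
```

Both A and B end up attributed to C, matching the behaviour described above, even though A was never merged into C directly.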