Meta entities
Counts bibliographic resources, agent roles (authors, publishers, editors), and venues in the dataset.
Usage
uv run python -m oc_meta.run.count.meta_entities <SPARQL_ENDPOINT> [OPTIONS]
Options
| Option | Description |
|---|---|
| `--csv` | Path to CSV dump directory (required for venue counting) |
| `--br` | Count bibliographic resources (fabio:Expression) |
| `--ar` | Count agent roles (pro:author, pro:publisher, pro:editor) |
| `--venues` | Count distinct venues (requires `--csv`) |
If none of the count options (--br, --ar, --venues) is specified, all counts are computed. Venue counting still needs --csv in that case.
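The fallback to "all counts" can be pictured with a small argparse sketch. This is a hypothetical parser that only mirrors the documented options, not the tool's actual code. The function name `parse_args` and the choice to exit with an error when venues are requested without `--csv` are assumptions made for illustration.

```python
import argparse


def parse_args(argv):
    """Hypothetical parser mirroring the documented options (illustration only)."""
    parser = argparse.ArgumentParser(prog="meta_entities")
    parser.add_argument("sparql_endpoint")
    parser.add_argument("--csv", help="Path to CSV dump directory")
    parser.add_argument("--br", action="store_true")
    parser.add_argument("--ar", action="store_true")
    parser.add_argument("--venues", action="store_true")
    args = parser.parse_args(argv)

    # No selection flag given: fall back to computing every count.
    if not (args.br or args.ar or args.venues):
        args.br = args.ar = args.venues = True

    # Assumption: venue counting reads the CSV dump, so a missing path is an error.
    if args.venues and not args.csv:
        parser.error("--venues requires --csv")
    return args
```

Under these assumptions, passing only the endpoint and `--csv` turns on all three counts, while `--br --ar` leaves venue counting off and needs no CSV path.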
Examples
All counts:
uv run python -m oc_meta.run.count.meta_entities http://localhost:8890/sparql --csv /path/to/csv/dump
Only bibliographic resources and roles:
uv run python -m oc_meta.run.count.meta_entities http://localhost:8890/sparql --br --ar
Only venues:
uv run python -m oc_meta.run.count.meta_entities http://localhost:8890/sparql --venues --csv /path/to/csv/dump
How it works
| Count | Method |
|---|---|
| Bibliographic resources | SPARQL query counting |
| Agent roles | SPARQL query counting |
| Venues | CSV dump parsing with disambiguation |
Venues are counted from CSV files because the SPARQL query for venue disambiguation can exhaust memory on large datasets.
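For the two SPARQL-based counts, the queries might look roughly like the sketch below. The exact queries the tool issues are not shown here, so the query text, the use of `pro:withRole` to select agent roles, and the helper names `br_count_query`, `role_count_query`, and `run_count` are all assumptions. The request uses only the standard library and the SPARQL protocol's form-encoded POST.

```python
import json
import urllib.parse
import urllib.request

PREFIXES = (
    "PREFIX fabio: <http://purl.org/spar/fabio/>\n"
    "PREFIX pro: <http://purl.org/spar/pro/>\n"
)


def br_count_query():
    """Assumed query: count entities typed as fabio:Expression."""
    return PREFIXES + (
        "SELECT (COUNT(DISTINCT ?br) AS ?count) "
        "WHERE { ?br a fabio:Expression }"
    )


def role_count_query(role):
    """Assumed query: count agent roles linked to pro:author, pro:publisher or pro:editor."""
    if role not in {"author", "publisher", "editor"}:
        raise ValueError(f"unsupported role: {role}")
    return PREFIXES + (
        "SELECT (COUNT(DISTINCT ?ar) AS ?count) "
        f"WHERE {{ ?ar pro:withRole pro:{role} }}"
    )


def run_count(endpoint, query):
    """Send the query as a form-encoded POST and return the integer count."""
    data = urllib.parse.urlencode({"query": query}).encode()
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    return int(results["results"]["bindings"][0]["count"]["value"])
```

A call such as `run_count("http://localhost:8890/sparql", br_count_query())` would return a single integer, assuming the endpoint is reachable and answers with SPARQL JSON results.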
Venue disambiguation
Venues are disambiguated based on identifiers:
- If a venue has only an OMID (no external identifiers such as ISSN or ISBN), all venues with the same name are counted as one.
- If a venue has external identifiers, it is counted by its OMID.
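The two rules above can be sketched as a key function: each venue maps to a key, and the number of distinct keys is the venue count. The sketch assumes a venue cell of the form `Name [omid:br/0601 issn:0028-0836]`, with space-separated identifiers in square brackets. That layout, the exact-match comparison of names, and the helper names `venue_key` and `count_venues` are assumptions, not the tool's confirmed behavior.

```python
import re

# Assumed cell layout: "Venue name [omid:br/0601 issn:0028-0836]"
VENUE_RE = re.compile(r"^(?P<name>.*?)\s*\[(?P<ids>[^\]]*)\]\s*$")


def venue_key(cell):
    """Return the key under which a venue cell is counted, or None if unparsable."""
    match = VENUE_RE.match(cell.strip())
    if match is None:
        return None
    name = match.group("name").strip()
    ids = match.group("ids").split()
    omid = next((i for i in ids if i.startswith("omid:")), None)
    has_external = any(not i.startswith("omid:") for i in ids)
    if has_external and omid is not None:
        # External identifiers present: the venue is counted by its OMID.
        return ("omid", omid)
    # OMID only: venues sharing a name collapse into one.
    return ("name", name)


def count_venues(cells):
    """Count distinct venue keys, skipping empty and unparsable cells."""
    keys = {venue_key(cell) for cell in cells if cell and cell.strip()}
    keys.discard(None)
    return len(keys)
```

With this keying, two OMID-only rows that share a name count once, while a row with the same name plus an ISBN counts separately under its own OMID.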