Meta entities

Counts bibliographic resources, agent roles (authors, publishers, editors), and venues in the dataset.

uv run python -m oc_meta.run.count.meta_entities <SPARQL_ENDPOINT> [OPTIONS]

| Option | Description |
|--------|-------------|
| --csv | Path to CSV dump directory (required for venue counting) |
| --br | Count bibliographic resources (fabio:Expression) |
| --ar | Count agent roles (pro:author, pro:publisher, pro:editor) |
| --venues | Count distinct venues (requires --csv) |

If none of --br, --ar, or --venues is specified, all counts are computed; venue counting still requires --csv.

All counts:

uv run python -m oc_meta.run.count.meta_entities http://localhost:8890/sparql --csv /path/to/csv/dump

Only bibliographic resources and roles:

uv run python -m oc_meta.run.count.meta_entities http://localhost:8890/sparql --br --ar

Only venues:

uv run python -m oc_meta.run.count.meta_entities http://localhost:8890/sparql --venues --csv /path/to/csv/dump

| Count | Method |
|-------|--------|
| Bibliographic resources | SPARQL query counting fabio:Expression entities |
| Agent roles | SPARQL query counting pro:RoleInTime by role type |
| Venues | CSV dump parsing with disambiguation |
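
The two SPARQL-based counts can be illustrated with a minimal sketch. This is not the code shipped in oc_meta: the function names, the exact query shapes, and the use of pro:withRole to select the role type are assumptions made for illustration, and the real queries may differ.

```python
import json
import urllib.parse
import urllib.request

# Namespace prefixes for the FaBiO and PRO ontologies.
PREFIXES = (
    "PREFIX fabio: <http://purl.org/spar/fabio/>\n"
    "PREFIX pro: <http://purl.org/spar/pro/>\n"
)


def br_count_query() -> str:
    """Build a query counting entities typed as fabio:Expression."""
    return PREFIXES + (
        "SELECT (COUNT(DISTINCT ?br) AS ?count)\n"
        "WHERE { ?br a fabio:Expression . }"
    )


def ar_count_query(role: str) -> str:
    """Build a query counting pro:RoleInTime entities for one role type.

    Assumption: the role type is attached through pro:withRole.
    """
    if role not in {"author", "publisher", "editor"}:
        raise ValueError(f"unsupported role: {role}")
    return PREFIXES + (
        "SELECT (COUNT(DISTINCT ?ar) AS ?count)\n"
        f"WHERE {{ ?ar a pro:RoleInTime ; pro:withRole pro:{role} . }}"
    )


def run_count(endpoint: str, query: str) -> int:
    """Send the query with an HTTP GET and read the ?count binding."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return int(data["results"]["bindings"][0]["count"]["value"])
```

Under these assumptions, `run_count("http://localhost:8890/sparql", ar_count_query("author"))` would return the number of author roles, provided an endpoint is listening at that address.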

Venues are counted from CSV files because the SPARQL query for venue disambiguation can exhaust memory on large datasets.

Venues are disambiguated based on identifiers:

  • If a venue has only an OMID (no external identifiers like ISSN/ISBN), venues with the same name are counted as one
  • If a venue has external identifiers, it’s counted by its OMID
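
The two rules above can be sketched as a small keying function. This is only an illustration under stated assumptions: the names `venue_key` and `count_distinct_venues` are hypothetical, venues are assumed to be already parsed from the CSV dump into (name, identifiers) pairs, and identifiers are assumed to be written as `scheme:value` strings such as `omid:br/1`.

```python
from typing import Iterable, Sequence, Tuple


def venue_key(name: str, ids: Sequence[str]) -> str:
    """Return the key under which a venue is counted.

    Assumption: identifiers look like 'omid:...', 'issn:...', 'isbn:...'.
    """
    omid = next((i for i in ids if i.startswith("omid:")), None)
    has_external = any(not i.startswith("omid:") for i in ids)
    if has_external and omid is not None:
        # A venue with external identifiers is counted by its OMID.
        return omid
    # A venue with only an OMID is counted by name, so same-named
    # OMID-only venues collapse into a single entry.
    return "name:" + name


def count_distinct_venues(venues: Iterable[Tuple[str, Sequence[str]]]) -> int:
    """Count venues after applying the disambiguation key."""
    return len({venue_key(name, ids) for name, ids in venues})


sample = [
    ("Journal A", ["omid:br/1"]),
    ("Journal A", ["omid:br/2"]),
    ("Journal A", ["omid:br/3", "issn:1234-5678"]),
    ("Journal B", ["omid:br/4", "issn:0000-0001"]),
    ("Journal B", ["omid:br/4", "issn:0000-0001"]),
]
print(count_distinct_venues(sample))  # prints 3
```

In the sample, the two OMID-only "Journal A" rows merge into one entry, the "Journal A" row with an ISSN stays separate under its OMID, and the repeated "Journal B" rows share one OMID.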