CSV format
This page documents the input CSV format and supported identifier schemas.
CSV format
Meta expects CSV files with these columns:
| Column | Required | Description |
|---|---|---|
| id | Yes | Space-separated identifiers in schema:value format (see supported schemas) |
| title | No | Title of the work |
| author | No | Semicolon-separated names in Surname, Name [identifier] format (see author/editor format) |
| pub_date | No | ISO 8601 date: YYYY-MM-DD, YYYY-MM, or YYYY (see date format) |
| venue | No | Container title with optional identifier in brackets (see venue format) |
| volume | No | Volume number |
| issue | No | Issue number |
| page | No | Page range (e.g., 50-75) |
| type | No | Resource type (see resource types) |
| publisher | No | Publisher name with optional identifier in brackets (see publisher format) |
| editor | No | Same format as author |
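
As an illustration of the column layout above, the following sketch reads such a file with Python's standard `csv` module. The helper name `read_rows` is hypothetical, and treating a missing column or an empty `id` as an error is this sketch's reading of the table, not a description of how Meta itself loads files.

```python
import csv
import io

# Column names copied from the table above.
COLUMNS = ["id", "title", "author", "pub_date", "venue", "volume",
           "issue", "page", "type", "publisher", "editor"]


def read_rows(handle):
    """Yield one dict per data row; raise if a column or an id is missing.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(handle)
    missing = [name for name in COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for index, row in enumerate(reader, start=1):
        # Assumption: "Required" means every row needs a non-empty id.
        if not (row["id"] or "").strip():
            raise ValueError(f"data row {index}: the id column is empty")
        yield row


# Shortened sample: one author, empty editor column.
sample = io.StringIO(
    "id,title,author,pub_date,venue,volume,issue,page,type,publisher,editor\n"
    'doi:10.1162/qss_a_00292,OpenCitations Meta,"Massari, Arcangelo",2024-01-22,'
    "Quantitative Science Studies [issn:2641-3337],5,1,50-75,journal article,"
    "MIT Press [crossref:281],\n"
)
rows = list(read_rows(sample))
print(rows[0]["id"], "|", rows[0]["type"])
```

Quoting the author field matters because names contain commas; `csv.DictReader` handles the quoted value as a single column.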
Example

    id,title,author,pub_date,venue,volume,issue,page,type,publisher,editor
    doi:10.1162/qss_a_00292,OpenCitations Meta,"Massari, Arcangelo [orcid:0000-0002-8420-0696]; Mariani, Fabio [orcid:0000-0002-8810-1564]; Heibi, Ivan [orcid:0000-0001-5366-5194]; Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David [orcid:0000-0001-5506-523X]",2024-01-22,Quantitative Science Studies [issn:2641-3337],5,1,50-75,journal article,MIT Press [crossref:281],

Identifier format
Identifiers use the format schema:value:

    doi:10.1162/qss_a_00292
    pmid:38034492
    orcid:0000-0002-8420-0696
    issn:2641-3337

Multiple identifiers are separated by spaces:

    doi:10.1162/qss_a_00292 pmid:38034492

Supported identifier schemas
Bibliographic resources
| Schema | Example | Description |
|---|---|---|
| doi | doi:10.1162/qss_a_00292 | Digital Object Identifier |
| pmid | pmid:38034492 | PubMed ID |
| pmcid | pmcid:PMC10927410 | PubMed Central ID |
| arxiv | arxiv:2302.03976 | arXiv identifier |
| isbn | isbn:978-3-030-00668-6 | International Standard Book Number |
| issn | issn:2641-3337 | International Standard Serial Number |
| url | url:https://opencitations.net | Web URL |
| wikidata | wikidata:Q107507571 | Wikidata entity |
| wikipedia | wikipedia:OpenCitations | Wikipedia article |
| openalex | openalex:W4390928828 | OpenAlex work ID |
Responsible agents
| Schema | Example | Description |
|---|---|---|
| orcid | orcid:0000-0002-8420-0696 | ORCID identifier |
| viaf | viaf:309649614 | VIAF identifier |
| crossref | crossref:281 | Crossref funder/member ID |
| wikidata | wikidata:Q30265034 | Wikidata entity |
| ror | ror:01111rn36 | Research Organization Registry |
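
The schema:value convention and the two schema tables above can be checked with a sketch like the following. The function name `parse_identifiers` and the two set constants are hypothetical names introduced here; the schema lists are copied from the tables, and the strict rejection of unknown schemas is an assumption of this sketch rather than documented Meta behavior.

```python
# Schema names copied from the two tables above.
BIBLIOGRAPHIC_SCHEMAS = {
    "doi", "pmid", "pmcid", "arxiv", "isbn", "issn",
    "url", "wikidata", "wikipedia", "openalex",
}
AGENT_SCHEMAS = {"orcid", "viaf", "crossref", "wikidata", "ror"}


def parse_identifiers(field, allowed):
    """Split a space-separated field into (schema, value) pairs.

    Hypothetical helper: raises ValueError on tokens that are not in
    schema:value form or whose schema is not in `allowed`.
    """
    pairs = []
    for token in field.split():
        # Split on the first colon only, so url:https://... keeps its value intact.
        schema, separator, value = token.partition(":")
        if not separator or not value:
            raise ValueError(f"not in schema:value form: {token!r}")
        if schema not in allowed:
            raise ValueError(f"unsupported schema: {schema!r}")
        pairs.append((schema, value))
    return pairs


print(parse_identifiers("doi:10.1162/qss_a_00292 pmid:38034492", BIBLIOGRAPHIC_SCHEMAS))
```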
Author/editor format
Authors and editors use the format:

    Surname, Given Name [identifier]

Multiple authors are separated by semicolons:

    Massari, Arcangelo [orcid:0000-0002-8420-0696]; Mariani, Fabio [orcid:0000-0002-8810-1564]; Heibi, Ivan [orcid:0000-0001-5366-5194]; Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David [orcid:0000-0001-5506-523X]

The identifier in brackets is optional.
Name parsing
The comma determines how names are interpreted:

With comma = Person

    Peroni, Silvio → Family: Peroni, Given: Silvio
    Massari, A. → Family: Massari, Given: A.
    Shotton, David M. → Family: Shotton, Given: David M.

Without comma = Organization

    MIT Press → Organization name
    World Health Organization → Organization name

If a name has no comma, Meta treats it as an organization, not a person.
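
The comma rule, together with the semicolon separator and the optional bracketed identifier, can be sketched as follows. The names `AGENT_RE`, `parse_agent`, and `parse_agents` and the returned dictionary layout are hypothetical; this is an illustration of the rule as stated on this page, not Meta's own parser.

```python
import re

# Hypothetical pattern: a name followed by an optional "[schema:value ...]" block.
AGENT_RE = re.compile(r"^(?P<name>.*?)\s*(?:\[(?P<ids>[^\]]*)\])?\s*$")


def parse_agent(text):
    """Classify one author/editor entry as a person or an organization."""
    match = AGENT_RE.match(text.strip())
    name = match.group("name")
    ids = (match.group("ids") or "").split()
    if "," in name:
        # With a comma: text before it is the family name, the rest the given name.
        family, given = (part.strip() for part in name.split(",", 1))
        return {"kind": "person", "family": family, "given": given, "ids": ids}
    # Without a comma: the whole string is treated as an organization name.
    return {"kind": "organization", "name": name, "ids": ids}


def parse_agents(field):
    """Split a semicolon-separated author or editor field into entries."""
    return [parse_agent(chunk) for chunk in field.split(";") if chunk.strip()]


for agent in parse_agents("Peroni, Silvio [orcid:0000-0003-0530-4305]; MIT Press"):
    print(agent)
```

A practical consequence of the rule is that a person written as "Silvio Peroni" would be classified as an organization, so the comma form matters when preparing input.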
Date format
Dates should use ISO 8601 format:
| Format | Example | Precision |
|---|---|---|
| YYYY-MM-DD | 2024-01-15 | Day |
| YYYY-MM | 2024-01 | Month |
| YYYY | 2024 | Year |
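
A quick way to pre-check dates against the three accepted shapes is sketched below. The names `DATE_RE` and `date_precision` are hypothetical, and the calendar check via `datetime.date` is an extra step added by this sketch; the page does not say whether Meta rejects impossible dates.

```python
import re
from datetime import date

# Accepts YYYY, YYYY-MM, or YYYY-MM-DD and nothing else.
DATE_RE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def date_precision(text):
    """Return 'year', 'month', or 'day' for a valid date string.

    Hypothetical helper: raises ValueError for any other input.
    """
    match = DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not YYYY, YYYY-MM, or YYYY-MM-DD: {text!r}")
    year, month, day = match.groups()
    # date() raises ValueError for impossible values such as month 13.
    date(int(year), int(month or 1), int(day or 1))
    if day:
        return "day"
    if month:
        return "month"
    return "year"


for value in ("2024-01-15", "2024-01", "2024"):
    print(value, "->", date_precision(value))
```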
Resource types
Supported values for the type column:
| Value | Description |
|---|---|
| journal article | Article in a journal |
| book | Complete book |
| book chapter | Chapter in a book |
| book part | Other part of a book |
| book section | Section of a book |
| book series | Series of books |
| book set | Set of books |
| edited book | Book with editors |
| reference book | Reference work |
| monograph | Single-author scholarly work |
| report | Technical or research report |
| report series | Series of reports |
| standard | Technical standard |
| standard series | Series of standards |
| journal | Complete journal |
| journal volume | Volume of a journal |
| journal issue | Issue of a journal |
| proceedings | Conference proceedings |
| proceedings article | Article in proceedings |
| proceedings series | Series of proceedings |
| reference entry | Entry in reference work |
| dissertation | Thesis or dissertation |
| peer review | Peer review document |
| data file | Dataset |
| dataset | Dataset |
| web content | Web page or content |
Venue format
Venues can include identifiers:

    Quantitative Science Studies [issn:2641-3337]
    Proceedings of the ACM/IEEE Joint Conference on Digital Libraries [isbn:978-1-4503-9822-4]

For book chapters, the venue is the containing book:

    The Semantic Web: Research and Applications [isbn:978-3-642-30283-1]

Publisher format
Publishers can include Crossref member IDs or ROR identifiers:

    MIT Press [crossref:281]
    Springer Nature [crossref:297]
    University of Bologna [ror:01111rn36]

Validation
Meta validates identifiers during curation using oc_ds_converter.oc_idmanager.
DOI
- Syntax check: Must match `^doi:10\.(\d{4,9}|[^\s/]+(\.[^\s/]+)*)/[^\s]+$`
- Normalization: Removes URL prefixes (`https://doi.org/`, `http://dx.doi.org/`), converts to lowercase
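
The DOI rules above can be sketched in a few lines. This is an independent illustration that reuses the pattern and the two URL prefixes quoted on this page; `normalize_doi` is a hypothetical name and the code is not taken from oc_ds_converter.

```python
import re

# Pattern and prefixes as quoted in the DOI rules above.
DOI_RE = re.compile(r"^doi:10\.(\d{4,9}|[^\s/]+(\.[^\s/]+)*)/[^\s]+$")
URL_PREFIXES = ("https://doi.org/", "http://dx.doi.org/")


def normalize_doi(raw):
    """Return a normalized 'doi:...' string, or None if the syntax check fails.

    Hypothetical helper for illustration only.
    """
    value = raw.strip().lower()
    if value.startswith("doi:"):
        value = value[len("doi:"):]
    for prefix in URL_PREFIXES:
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    candidate = "doi:" + value
    return candidate if DOI_RE.match(candidate) else None


print(normalize_doi("https://doi.org/10.1162/QSS_a_00292"))
```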
ORCID
- Syntax check: Must match `^orcid:([0-9]{4}-){3}[0-9]{3}[0-9X]$`
- Checksum: Validates using ISO/IEC 7064:2003 MOD 11-2
- Normalization: Removes non-digit characters, uppercases X, formats as `XXXX-XXXX-XXXX-XXXX`
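
The MOD 11-2 checksum named above works by doubling a running total after each of the first 15 digits and deriving the final character from the remainder modulo 11. A minimal sketch follows, with hypothetical function names; it is written from the published algorithm and is not the oc_ds_converter code.

```python
import re

# Syntax pattern as quoted in the ORCID rules above.
ORCID_RE = re.compile(r"^orcid:([0-9]{4}-){3}[0-9]{3}[0-9X]$")


def orcid_check_char(first_15_digits):
    """Compute the ISO/IEC 7064 MOD 11-2 check character."""
    total = 0
    for digit in first_15_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(raw):
    """Return 'orcid:XXXX-XXXX-XXXX-XXXX', or None if any check fails."""
    value = raw.strip()
    if value.lower().startswith("orcid:"):
        value = value[len("orcid:"):]
    # Keep only digits and X, uppercasing first so a lowercase x survives.
    chars = re.sub(r"[^0-9X]", "", value.upper())
    if len(chars) != 16:
        return None
    candidate = "orcid:" + "-".join(chars[i:i + 4] for i in range(0, 16, 4))
    if not ORCID_RE.match(candidate):
        return None
    if orcid_check_char(chars[:15]) != chars[15]:
        return None
    return candidate


print(normalize_orcid("0000000284200696"))
print(normalize_orcid("orcid:0000-0001-5506-523x"))
```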
ISSN
- Syntax check: Must match `^issn:[0-9]{4}-[0-9]{3}[0-9X]$`
- Checksum: Validates using modulo 11
- Special case: `0000-0000` is explicitly rejected
- Normalization: Removes non-digit characters, uppercases X, formats as `XXXX-XXXX`
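
For ISSN, the standard modulo 11 check weights the first seven digits from 8 down to 2. The sketch below, with hypothetical function names, also shows why the all-zero value needs a special case: its checksum is arithmetically valid, so only an explicit test rejects it.

```python
import re

# Syntax pattern as quoted in the ISSN rules above.
ISSN_RE = re.compile(r"^issn:[0-9]{4}-[0-9]{3}[0-9X]$")


def issn_check_char(first_7_digits):
    """Compute the ISSN check character (weights 8 down to 2, modulo 11)."""
    total = sum(int(d) * w for d, w in zip(first_7_digits, range(8, 1, -1)))
    result = (11 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_issn(raw):
    """Return 'issn:XXXX-XXXX', or None if any check fails."""
    value = raw.strip()
    if value.lower().startswith("issn:"):
        value = value[len("issn:"):]
    chars = re.sub(r"[^0-9X]", "", value.upper())
    if len(chars) != 8:
        return None
    candidate = f"issn:{chars[:4]}-{chars[4:]}"
    if not ISSN_RE.match(candidate):
        return None
    # 0000-0000 passes the checksum, so it has to be rejected explicitly.
    if chars == "00000000":
        return None
    if issn_check_char(chars[:7]) != chars[7]:
        return None
    return candidate


print(normalize_issn("26413337"))
print(normalize_issn("issn:0000-0000"))
```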
ISBN
- Syntax check: ISBN-13 must match `^isbn:97[89][0-9X]{10}$`, ISBN-10 must match `^isbn:[0-9X]{10}$`
- Checksum: Validates modulo 10 for ISBN-13, modulo 11 for ISBN-10
- Normalization: Removes non-digit characters, uppercases X
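
The two ISBN checksums differ in their weights: ISBN-13 alternates 1 and 3 and requires a sum divisible by 10, while ISBN-10 uses weights 10 down to 1, lets X stand for 10 in the last position, and requires a sum divisible by 11. The sketch below uses hypothetical function names and is stricter than the quoted ISBN-13 pattern in one respect, labeled in the code: it treats an X anywhere in an ISBN-13 as invalid.

```python
import re


def normalize_isbn(raw):
    """Strip the schema prefix and separators, keeping digits and X."""
    value = raw.strip()
    if value.lower().startswith("isbn:"):
        value = value[len("isbn:"):]
    return re.sub(r"[^0-9X]", "", value.upper())


def is_valid_isbn(raw):
    """Check length, prefix, and checksum for ISBN-13 or ISBN-10."""
    chars = normalize_isbn(raw)
    if len(chars) == 13:
        # Assumption of this sketch: an ISBN-13 containing X is rejected.
        if not chars.isdigit() or chars[:3] not in ("978", "979"):
            return False
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(chars))
        return total % 10 == 0
    if len(chars) == 10 and chars[:9].isdigit():
        values = [int(d) for d in chars[:9]]
        values.append(10 if chars[9] == "X" else int(chars[9]))
        total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
        return total % 11 == 0
    return False


print(is_valid_isbn("isbn:978-3-030-00668-6"))
print(is_valid_isbn("isbn:0-306-40615-2"))
```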
Other identifiers
Identifiers with other schemas (PMID, arXiv, Wikidata, etc.) are accepted without validation.