# CSV format
This page documents the input CSV format and supported identifier schemas.
## CSV format

Meta expects CSV files with these columns:

| Column | Required | Description |
|---|---|---|
| `id` | Yes | Space-separated identifiers in `schema:value` format |
| `title` | No | Title of the work |
| `author` | No | Semicolon-separated names (e.g., `Massari, Arcangelo [orcid:0000-0002-8420-0696]`) |
| `pub_date` | No | ISO 8601 date: `YYYY-MM-DD`, `YYYY-MM`, or `YYYY` |
| `venue` | No | Container title with optional identifier in brackets (see venue format) |
| `volume` | No | Volume number |
| `issue` | No | Issue number |
| `page` | No | Page range (e.g., `50-75`) |
| `type` | No | Resource type (see resource types) |
| `publisher` | No | Publisher name with optional identifier in brackets (see publisher format) |
| `editor` | No | Same format as `author` |
### Example

```csv
id,title,author,pub_date,venue,volume,issue,page,type,publisher,editor
doi:10.1162/qss_a_00292,OpenCitations Meta,"Massari, Arcangelo [orcid:0000-0002-8420-0696]; Mariani, Fabio [orcid:0000-0002-8810-1564]; Heibi, Ivan [orcid:0000-0001-5366-5194]; Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David [orcid:0000-0001-5506-523X]",2024-01-22,Quantitative Science Studies [issn:2641-3337],5,1,50-75,journal article,MIT Press [crossref:281],
```
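As a rough illustration of how the multi-valued columns unpack, the sketch below reads the example row with Python's standard `csv` module and splits `id` on whitespace and `author`/`editor` on semicolons. It is not Meta's own loader: `read_rows` is a hypothetical helper, and the splitting rules are only those stated in the column table.

```python
import csv
import io

# The example row from this page, held in memory instead of a file.
SAMPLE = (
    "id,title,author,pub_date,venue,volume,issue,page,type,publisher,editor\n"
    'doi:10.1162/qss_a_00292,OpenCitations Meta,"Massari, Arcangelo [orcid:0000-0002-8420-0696]; '
    "Mariani, Fabio [orcid:0000-0002-8810-1564]; Heibi, Ivan [orcid:0000-0001-5366-5194]; "
    "Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David [orcid:0000-0001-5506-523X]\","
    "2024-01-22,Quantitative Science Studies [issn:2641-3337],5,1,50-75,journal article,"
    "MIT Press [crossref:281],\n"
)


def read_rows(f):
    """Yield one dict per row, splitting the multi-valued columns into lists."""
    for row in csv.DictReader(f):
        # `id`: identifiers separated by spaces.
        row["id"] = (row["id"] or "").split()
        # `author` / `editor`: names separated by semicolons; empty cells become [].
        for col in ("author", "editor"):
            names = (row[col] or "").split(";")
            row[col] = [name.strip() for name in names if name.strip()]
        yield row


rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[0]["id"])           # ['doi:10.1162/qss_a_00292']
print(len(rows[0]["author"]))  # 5
print(rows[0]["editor"])       # []
```

Quoting matters here: because each author name contains a comma, the whole `author` cell must be wrapped in double quotes, as in the example row.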
## Identifier format

Identifiers use the format `schema:value`:

```
doi:10.1162/qss_a_00292
pmid:38034492
orcid:0000-0002-8420-0696
issn:2641-3337
```

Multiple identifiers are separated by spaces:

```
doi:10.1162/qss_a_00292 pmid:38034492
```
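A minimal sketch of splitting such a field into `(schema, value)` pairs follows. `parse_identifiers` is a hypothetical helper, not part of Meta. It splits each token on its first colon only, so a value that itself contains a colon is kept whole.

```python
def parse_identifiers(field):
    """Split a space-separated `id` value into (schema, value) pairs.

    Each token is split on its first colon only, so a value that itself
    contains a colon stays intact.
    """
    pairs = []
    for token in field.split():
        schema, sep, value = token.partition(":")
        if not sep or not schema or not value:
            raise ValueError(f"not in schema:value form: {token!r}")
        pairs.append((schema, value))
    return pairs


print(parse_identifiers("doi:10.1162/qss_a_00292 pmid:38034492"))
# [('doi', '10.1162/qss_a_00292'), ('pmid', '38034492')]
```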
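## Supported identifier schemas

### Bibliographic resources

| Schema | Example | Description |
|---|---|---|
| `doi` | `doi:10.1162/qss_a_00292` | Digital Object Identifier |
| `pmid` | `pmid:38034492` | PubMed ID |
| `pmcid` | | PubMed Central ID |
| `arxiv` | | arXiv identifier |
| `isbn` | `isbn:978-3-642-30283-1` | International Standard Book Number |
| `issn` | `issn:2641-3337` | International Standard Serial Number |
| `url` | | Web URL |
| `wikidata` | | Wikidata entity |
| `wikipedia` | | Wikipedia article |
| `openalex` | | OpenAlex work ID |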
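### Responsible agents

| Schema | Example | Description |
|---|---|---|
| `orcid` | `orcid:0000-0002-8420-0696` | ORCID identifier |
| `viaf` | | VIAF identifier |
| `crossref` | `crossref:281` | Crossref funder/member ID |
| `wikidata` | | Wikidata entity |
| `ror` | `ror:01111rn36` | Research Organization Registry |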
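The two lists can be kept as lookup sets when checking which identifiers belong in which column. In this sketch the lowercase prefixes not shown elsewhere on this page (`pmcid`, `arxiv`, `url`, `wikidata`, `wikipedia`, `openalex`, `viaf`) are assumptions, and `unsupported` is a hypothetical helper rather than part of Meta.

```python
# Assumed lowercase schema prefixes for the two lists above; several of these
# are not confirmed elsewhere on this page.
BR_SCHEMAS = frozenset({"doi", "pmid", "pmcid", "arxiv", "isbn", "issn",
                        "url", "wikidata", "wikipedia", "openalex"})
RA_SCHEMAS = frozenset({"orcid", "viaf", "crossref", "wikidata", "ror"})


def unsupported(identifiers, allowed):
    """Return the identifiers whose schema prefix is not in `allowed`."""
    return [i for i in identifiers if i.partition(":")[0] not in allowed]


print(unsupported(["doi:10.1162/qss_a_00292", "orcid:0000-0002-8420-0696"], BR_SCHEMAS))
# ['orcid:0000-0002-8420-0696']
print(unsupported(["crossref:281", "ror:01111rn36"], RA_SCHEMAS))
# []
```

Note that `wikidata` appears in both sets, so the schema alone does not tell you whether an identifier names a work or an agent; the column it sits in does.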
## Date format

Dates should use ISO 8601 format:

| Format | Example | Precision |
|---|---|---|
| `YYYY-MM-DD` | `2024-01-22` | Day |
| `YYYY-MM` | `2024-01` | Month |
| `YYYY` | `2024` | Year |
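A small sketch of checking a `pub_date` value against these three precisions is shown below. `date_precision` is a hypothetical helper; this page does not say how Meta itself validates dates. It first checks the exact shape, because `strptime` alone would also accept single-digit months, then uses `strptime` to reject impossible calendar dates.

```python
import re
from datetime import datetime

# Accept only the three ISO 8601 shapes: YYYY, YYYY-MM, YYYY-MM-DD.
_SHAPE = re.compile(r"[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?")
# Map each shape's length to a strptime format and a precision label.
_BY_LENGTH = {10: ("%Y-%m-%d", "day"), 7: ("%Y-%m", "month"), 4: ("%Y", "year")}


def date_precision(value):
    """Return 'day', 'month' or 'year' for a well-formed date, else None."""
    if not _SHAPE.fullmatch(value):
        return None
    fmt, precision = _BY_LENGTH[len(value)]
    try:
        # strptime rejects impossible dates such as month 13 or 2024-02-30.
        datetime.strptime(value, fmt)
    except ValueError:
        return None
    return precision


print(date_precision("2024-01-22"))  # day
print(date_precision("2024-01"))     # month
print(date_precision("2024-02-30"))  # None
```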
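## Resource types

Supported values for the `type` column:

| Value | Description |
|---|---|
| `journal article` | Article in a journal |
| `book` | Complete book |
| `book chapter` | Chapter in a book |
| `book part` | Other part of a book |
| `book section` | Section of a book |
| `book series` | Series of books |
| `book set` | Set of books |
| `edited book` | Book with editors |
| `reference book` | Reference work |
| `monograph` | Single-author scholarly work |
| `report` | Technical or research report |
| `report series` | Series of reports |
| `standard` | Technical standard |
| `standard series` | Series of standards |
| `journal` | Complete journal |
| `journal volume` | Volume of a journal |
| `journal issue` | Issue of a journal |
| `proceedings` | Conference proceedings |
| `proceedings article` | Article in proceedings |
| `proceedings series` | Series of proceedings |
| `reference entry` | Entry in reference work |
| `dissertation` | Thesis or dissertation |
| `peer review` | Peer review document |
| `data file` | Dataset |
| `dataset` | Dataset |
| `web content` | Web page or content |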
## Venue format

Venues can include identifiers:

```
Quantitative Science Studies [issn:2641-3337]
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries [isbn:978-1-4503-9822-4]
```

For book chapters, the venue is the containing book:

```
The Semantic Web: Research and Applications [isbn:978-3-642-30283-1]
```
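Venue strings follow a `Name [identifiers]` pattern with the bracketed part optional, and the publisher and author values on this page look the same. The sketch below separates the two parts with a regular expression. `split_name_and_ids` is a hypothetical helper, and it assumes that several identifiers inside one pair of brackets are space-separated, as in the `id` column.

```python
import re

# "Name [id1 id2]": the bracketed identifier list is optional.
_NAME_WITH_IDS = re.compile(r"^\s*(?P<name>.*?)\s*(?:\[(?P<ids>[^\]]*)\])?\s*$")


def split_name_and_ids(text):
    """Return (name, [identifiers]) for a single-line name field."""
    m = _NAME_WITH_IDS.match(text)
    ids = m.group("ids").split() if m.group("ids") else []
    return m.group("name"), ids


print(split_name_and_ids("Quantitative Science Studies [issn:2641-3337]"))
# ('Quantitative Science Studies', ['issn:2641-3337'])
print(split_name_and_ids("University of Bologna [ror:01111rn36]"))
# ('University of Bologna', ['ror:01111rn36'])
```

The lazy `name` group stops at the last position where an optional bracket list and end of line can still match, so names containing commas or colons (such as `The Semantic Web: Research and Applications`) are kept whole.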
## Publisher format

Publishers can include Crossref member IDs or ROR identifiers:

```
MIT Press [crossref:281]
Springer Nature [crossref:297]
University of Bologna [ror:01111rn36]
```
## Validation

Meta validates identifiers during curation using `oc_ds_converter.oc_idmanager`.
### DOI

- Syntax check: must match `^doi:10\.(\d{4,9}|[^\s/]+(\.[^\s/]+)*)/[^\s]+$`
- Normalization: removes URL prefixes (`https://doi.org/`, `http://dx.doi.org/`) and converts to lowercase
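The DOI rules above can be approximated in a few lines. This is a minimal sketch, not the `oc_idmanager` implementation: `normalise_doi` and `is_valid_doi` are hypothetical names, only the two listed URL prefixes plus a bare `doi:` prefix (an assumption) are stripped, and the pattern is the one quoted above.

```python
import re

# The syntax-check pattern quoted in this section.
DOI_PATTERN = re.compile(r"^doi:10\.(\d{4,9}|[^\s/]+(\.[^\s/]+)*)/[^\s]+$")
# The two resolver prefixes named above, plus a bare "doi:" schema prefix
# (an assumption of this sketch; other variants are not handled).
_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw):
    """Strip one known prefix, lowercase, and re-add the `doi:` schema."""
    value = raw.strip()
    for prefix in _PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return "doi:" + value.lower()


def is_valid_doi(raw):
    """Normalise, then apply the syntax check."""
    return DOI_PATTERN.match(normalise_doi(raw)) is not None


print(normalise_doi("https://doi.org/10.1162/QSS_A_00292"))  # doi:10.1162/qss_a_00292
print(is_valid_doi("doi:11.1162/qss_a_00292"))               # False
```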
### ORCID

- Syntax check: must match `^orcid:([0-9]{4}-){3}[0-9]{3}[0-9X]$`
- Checksum: validated using ISO/IEC 7064:2003 MOD 11-2
- Normalization: removes all characters except digits and `X`, uppercases `X`, and formats as `XXXX-XXXX-XXXX-XXXX`
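The MOD 11-2 check can be sketched as follows: walk the first 15 digits, adding each digit to a running total and doubling it, then derive the check character from the remainder modulo 11, where a value of 10 is written `X`. The function names are hypothetical, and this is an illustration of the published algorithm rather than Meta's code.

```python
import re

ORCID_PATTERN = re.compile(r"^orcid:([0-9]{4}-){3}[0-9]{3}[0-9X]$")


def normalise_orcid(raw):
    """Keep digits and X (uppercased), regroup as XXXX-XXXX-XXXX-XXXX."""
    chars = re.sub(r"[^0-9X]", "", raw.upper())
    return "orcid:" + "-".join(chars[i:i + 4] for i in range(0, len(chars), 4))


def orcid_check_digit(base15):
    """ISO/IEC 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(raw):
    value = normalise_orcid(raw)
    if not ORCID_PATTERN.match(value):
        return False
    digits = value[len("orcid:"):].replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-8420-0696"))                     # True
print(normalise_orcid("https://orcid.org/0000-0001-5506-523x"))  # orcid:0000-0001-5506-523X
```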
### ISSN

- Syntax check: must match `^issn:[0-9]{4}-[0-9]{3}[0-9X]$`
- Checksum: validated using modulo 11
- Special case: `0000-0000` is explicitly rejected, since it would otherwise pass the checksum
- Normalization: removes all characters except digits and `X`, uppercases `X`, and formats as `XXXX-XXXX`
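The ISSN modulo 11 check weights the first seven digits from 8 down to 2. The check character is `(11 - sum mod 11) mod 11`, written `X` when it is 10. The hypothetical sketch below applies that rule and also shows why `0000-0000` needs its own rejection: its weighted sum is 0, which yields check digit `0`.

```python
import re

ISSN_PATTERN = re.compile(r"^issn:[0-9]{4}-[0-9]{3}[0-9X]$")


def normalise_issn(raw):
    """Keep digits and X (uppercased), format as XXXX-XXXX."""
    chars = re.sub(r"[^0-9X]", "", raw.upper())
    return "issn:" + chars[:4] + "-" + chars[4:]


def issn_check_digit(base7):
    """Modulo 11 check character, weights 8, 7, ..., 2."""
    total = sum(int(d) * w for d, w in zip(base7, range(8, 1, -1)))
    result = (11 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_issn(raw):
    value = normalise_issn(raw)
    if not ISSN_PATTERN.match(value) or value == "issn:0000-0000":
        return False
    digits = value[len("issn:"):].replace("-", "")
    return issn_check_digit(digits[:7]) == digits[7]


print(issn_check_digit("2641333"))  # 7
print(issn_check_digit("0000000"))  # 0, so 0000-0000 passes the checksum
print(is_valid_issn("0000-0000"))   # False
```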
### ISBN

- Syntax check: ISBN-13 must match `^isbn:97[89][0-9X]{10}$`; ISBN-10 must match `^isbn:[0-9X]{10}$`
- Checksum: validated using modulo 10 for ISBN-13 and modulo 11 for ISBN-10
- Normalization: removes all characters except digits and `X`, and uppercases `X`
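Both ISBN checksums are weighted sums. ISBN-13 weights the digits alternately 1 and 3, and the total must be divisible by 10. ISBN-10 weights them 10 down to 1, with `X` standing for 10, and the total must be divisible by 11. The sketch below is hypothetical and not the `oc_idmanager` code. It also adds two rules not stated above: `X` is rejected anywhere in an ISBN-13 and anywhere but the last position of an ISBN-10.

```python
import re

ISBN13_PATTERN = re.compile(r"^isbn:97[89][0-9X]{10}$")
ISBN10_PATTERN = re.compile(r"^isbn:[0-9X]{10}$")


def normalise_isbn(raw):
    """Keep digits and X (uppercased)."""
    return "isbn:" + re.sub(r"[^0-9X]", "", raw.upper())


def is_valid_isbn(raw):
    value = normalise_isbn(raw)
    digits = value[len("isbn:"):]
    if ISBN13_PATTERN.match(value):
        if "X" in digits:
            return False  # sketch's own rule: ISBN-13 uses only 0-9
        # Weights alternate 1, 3, 1, 3, ...; the sum must be a multiple of 10.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
        return total % 10 == 0
    if ISBN10_PATTERN.match(value):
        if "X" in digits[:9]:
            return False  # sketch's own rule: only the check character may be X
        # Weights 10, 9, ..., 1; the sum must be a multiple of 11.
        values = [10 if d == "X" else int(d) for d in digits]
        return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0
    return False


print(is_valid_isbn("978-3-642-30283-1"))  # True
print(is_valid_isbn("0-306-40615-2"))      # True
```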
### Other identifiers
Identifiers with other schemas (PMID, arXiv, Wikidata, etc.) are accepted without validation.