
CSV format

This page documents the input CSV format and supported identifier schemas.

Meta expects CSV files with these columns:

| Column | Required | Description |
|---|---|---|
| id | Yes | Space-separated identifiers in schema:value format (see supported schemas) |
| title | No | Title of the work |
| author | No | Semicolon-separated names in Surname, Name [identifier] format (see author/editor format) |
| pub_date | No | ISO 8601 date: YYYY-MM-DD, YYYY-MM, or YYYY (see date format) |
| venue | No | Container title with optional identifier in brackets (see venue format) |
| volume | No | Volume number |
| issue | No | Issue number |
| page | No | Page range (e.g., 50-75) |
| type | No | Resource type (see resource types) |
| publisher | No | Publisher name with optional identifier in brackets (see publisher format) |
| editor | No | Same format as author |

```csv
id,title,author,pub_date,venue,volume,issue,page,type,publisher,editor
doi:10.1162/qss_a_00292,OpenCitations Meta,"Massari, Arcangelo [orcid:0000-0002-8420-0696]; Mariani, Fabio [orcid:0000-0002-8810-1564]; Heibi, Ivan [orcid:0000-0001-5366-5194]; Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David [orcid:0000-0001-5506-523X]",2024-01-22,Quantitative Science Studies [issn:2641-3337],5,1,50-75,journal article,MIT Press [crossref:281],
```
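
A sketch of loading such a file with Python's standard `csv` module, which takes care of the quoted author field (it contains commas). Splitting `id` on whitespace and `author`/`editor` on semicolons follows the column table; the helper `read_meta_csv` is hypothetical and not part of Meta.

```python
import csv
import io
from typing import Iterable


def read_meta_csv(lines: Iterable[str]) -> list[dict]:
    """Read Meta-style CSV rows and split the multi-valued columns.

    Hypothetical helper for illustration only: 'id' is split on whitespace,
    'author' and 'editor' on semicolons; all other columns stay strings.
    """
    records = []
    for row in csv.DictReader(lines):
        record = dict(row)
        record["id"] = (row.get("id") or "").split()
        for column in ("author", "editor"):
            cell = row.get(column) or ""  # missing or empty cell -> no entries
            record[column] = [part.strip() for part in cell.split(";") if part.strip()]
        records.append(record)
    return records


SAMPLE = (
    "id,title,author,pub_date,venue,volume,issue,page,type,publisher,editor\n"
    'doi:10.1162/qss_a_00292,OpenCitations Meta,"Peroni, Silvio [orcid:0000-0003-0530-4305]; '
    'Shotton, David [orcid:0000-0001-5506-523X]",2024-01-22,'
    "Quantitative Science Studies [issn:2641-3337],5,1,50-75,journal article,MIT Press [crossref:281],\n"
)

for record in read_meta_csv(io.StringIO(SAMPLE)):
    print(record["id"], record["author"], record["editor"])
```

To read a file from disk, pass a handle opened with `newline=""`, as the `csv` documentation recommends; UTF-8 encoding is an assumption here.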

Identifiers use the format schema:value:

doi:10.1162/qss_a_00292
pmid:38034492
orcid:0000-0002-8420-0696
issn:2641-3337

Multiple identifiers are separated by spaces:

doi:10.1162/qss_a_00292 pmid:38034492
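
Since a value can itself contain colons (as in `url:https://opencitations.net`), a parser should split each token at its first colon only. A minimal sketch with a hypothetical helper `parse_ids`:

```python
def parse_ids(field: str) -> list[tuple[str, str]]:
    """Split a space-separated identifier field into (schema, value) pairs.

    Hypothetical helper. Each token is split on its first colon only, so a
    value such as https://opencitations.net keeps its own colon.
    """
    pairs = []
    for token in field.split():
        schema, colon, value = token.partition(":")
        if not (schema and colon and value):
            raise ValueError(f"not in schema:value format: {token!r}")
        pairs.append((schema, value))
    return pairs


print(parse_ids("doi:10.1162/qss_a_00292 pmid:38034492 url:https://opencitations.net"))
# [('doi', '10.1162/qss_a_00292'), ('pmid', '38034492'), ('url', 'https://opencitations.net')]
```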

Supported schemas for bibliographic resources:

| Schema | Example | Description |
|---|---|---|
| doi | doi:10.1162/qss_a_00292 | Digital Object Identifier |
| pmid | pmid:38034492 | PubMed ID |
| pmcid | pmcid:PMC10927410 | PubMed Central ID |
| arxiv | arxiv:2302.03976 | arXiv identifier |
| isbn | isbn:978-3-030-00668-6 | International Standard Book Number |
| issn | issn:2641-3337 | International Standard Serial Number |
| url | url:https://opencitations.net | Web URL |
| wikidata | wikidata:Q107507571 | Wikidata entity |
| wikipedia | wikipedia:OpenCitations | Wikipedia article |
| openalex | openalex:W4390928828 | OpenAlex work ID |
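
Supported schemas for people and organizations (authors, editors, publishers):

| Schema | Example | Description |
|---|---|---|
| orcid | orcid:0000-0002-8420-0696 | ORCID identifier |
| viaf | viaf:309649614 | VIAF identifier |
| crossref | crossref:281 | Crossref funder/member ID |
| wikidata | wikidata:Q30265034 | Wikidata entity |
| ror | ror:01111rn36 | Research Organization Registry |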
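
To catch a schema used in the wrong place before submitting a file, the two tables above can be turned into lookup sets. A minimal sketch, assuming the first table covers the `id` and `venue` columns and the second covers `author`, `editor`, and `publisher`; the names `RESOURCE_SCHEMAS`, `AGENT_SCHEMAS`, and `unsupported_schemas` are illustrative, not part of Meta.

```python
# Schema sets transcribed from the two tables.
# Assumption: the first set applies to id/venue, the second to author/editor/publisher.
RESOURCE_SCHEMAS = frozenset({
    "doi", "pmid", "pmcid", "arxiv", "isbn", "issn",
    "url", "wikidata", "wikipedia", "openalex",
})
AGENT_SCHEMAS = frozenset({"orcid", "viaf", "crossref", "wikidata", "ror"})


def unsupported_schemas(identifiers: list[str], allowed: frozenset[str]) -> list[str]:
    """Return the identifiers whose schema (text before the first ':') is not allowed."""
    return [i for i in identifiers if i.partition(":")[0] not in allowed]


print(unsupported_schemas(["doi:10.1162/qss_a_00292", "orcid:0000-0002-8420-0696"], RESOURCE_SCHEMAS))
# ['orcid:0000-0002-8420-0696']
```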

Authors and editors use the format:

Surname, Given Name [identifier]

Multiple authors are separated by semicolons:

Massari, Arcangelo [orcid:0000-0002-8420-0696]; Mariani, Fabio [orcid:0000-0002-8810-1564]; Heibi, Ivan [orcid:0000-0001-5366-5194]; Peroni, Silvio [orcid:0000-0003-0530-4305]; Shotton, David [orcid:0000-0001-5506-523X]

The identifier in brackets is optional.
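
A sketch of splitting an `author` or `editor` cell into names and identifiers, assuming entries are separated by semicolons, the bracketed part (if any) comes last in each entry, and it may hold several space-separated identifiers. `parse_responsible_agents` is a hypothetical helper, not Meta's parser.

```python
import re

# A name, optionally followed by identifiers in square brackets at the end.
ENTRY_RE = re.compile(r"(?P<name>[^\[\]]*?)\s*(?:\[(?P<ids>[^\[\]]*)\])?\s*")


def parse_responsible_agents(cell: str) -> list[tuple[str, list[str]]]:
    """Split an author/editor cell into (name, identifiers) pairs.

    Hypothetical helper: entries are separated by ';' and the bracketed
    part, if present, is assumed to hold space-separated schema:value ids.
    """
    agents = []
    for entry in cell.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';' or an empty cell
        match = ENTRY_RE.fullmatch(entry)
        if match is None:
            raise ValueError(f"malformed entry: {entry!r}")
        ids = (match.group("ids") or "").split()
        agents.append((match.group("name"), ids))
    return agents


print(parse_responsible_agents("Massari, Arcangelo [orcid:0000-0002-8420-0696]; Mariani, Fabio"))
```

A venue or publisher cell such as `MIT Press [crossref:281]` yields a one-element list with the same structure.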

The comma determines how names are interpreted:

With comma = Person

Peroni, Silvio → Family: Peroni, Given: Silvio
Massari, A. → Family: Massari, Given: A.
Shotton, David M. → Family: Shotton, Given: David M.

Without comma = Organization

MIT Press → Organization name
World Health Organization → Organization name

A person's name written without a comma (for example, Silvio Peroni) is therefore treated as an organization, not a person.
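
The comma rule can be expressed as a small classifier. A sketch assuming the bracketed identifiers have already been removed from the name and that only the first comma separates family and given names (how Meta handles additional commas is not stated here); `ParsedName` and `classify_name` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedName:
    kind: str                      # "person" or "organization"
    family: Optional[str] = None   # set only for persons
    given: Optional[str] = None    # set only for persons
    name: Optional[str] = None     # set only for organizations


def classify_name(raw: str) -> ParsedName:
    """Apply the comma rule to a name whose [identifier] part is already removed."""
    family, comma, given = raw.partition(",")
    if comma:
        # Assumption: split on the first comma only.
        return ParsedName(kind="person", family=family.strip(), given=given.strip())
    return ParsedName(kind="organization", name=raw.strip())


print(classify_name("Peroni, Silvio"))
print(classify_name("MIT Press"))
```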

Dates should use ISO 8601 format:

| Format | Example | Precision |
|---|---|---|
| YYYY-MM-DD | 2024-01-15 | Day |
| YYYY-MM | 2024-01 | Month |
| YYYY | 2024 | Year |
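
A sketch of detecting the precision of a `pub_date` value according to the table, using Python's `datetime.date.fromisoformat` to reject impossible calendar dates. Whether Meta performs the same calendar check is an assumption; `date_precision` is a hypothetical helper.

```python
import re
from datetime import date

# (pattern, precision, padding that turns the value into a full YYYY-MM-DD date)
_FORMATS = [
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "day", ""),
    (re.compile(r"[0-9]{4}-[0-9]{2}"), "month", "-01"),
    (re.compile(r"[0-9]{4}"), "year", "-01-01"),
]


def date_precision(value: str) -> str:
    """Return 'day', 'month' or 'year' for a pub_date value, or raise ValueError."""
    for pattern, precision, padding in _FORMATS:
        if pattern.fullmatch(value):
            # fromisoformat rejects impossible dates such as 2024-13 or 2023-02-29.
            date.fromisoformat(value + padding)
            return precision
    raise ValueError(f"unsupported date format: {value!r}")


print(date_precision("2024-01-15"), date_precision("2024-01"), date_precision("2024"))
```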

Supported values for the type column:

| Value | Description |
|---|---|
| journal article | Article in a journal |
| book | Complete book |
| book chapter | Chapter in a book |
| book part | Other part of a book |
| book section | Section of a book |
| book series | Series of books |
| book set | Set of books |
| edited book | Book with editors |
| reference book | Reference work |
| monograph | Single-author scholarly work |
| report | Technical or research report |
| report series | Series of reports |
| standard | Technical standard |
| standard series | Series of standards |
| journal | Complete journal |
| journal volume | Volume of a journal |
| journal issue | Issue of a journal |
| proceedings | Conference proceedings |
| proceedings article | Article in proceedings |
| proceedings series | Series of proceedings |
| reference entry | Entry in reference work |
| dissertation | Thesis or dissertation |
| peer review | Peer review document |
| data file | Dataset |
| dataset | Dataset |
| web content | Web page or content |
web contentWeb page or content

Venues can include identifiers:

Quantitative Science Studies [issn:2641-3337]
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries [isbn:978-1-4503-9822-4]

For book chapters, the venue is the containing book:

The Semantic Web: Research and Applications [isbn:978-3-642-30283-1]

Publishers can include Crossref member IDs or ROR identifiers:

MIT Press [crossref:281]
Springer Nature [crossref:297]
University of Bologna [ror:01111rn36]

Meta validates identifiers during curation using oc_ds_converter.oc_idmanager.

DOI
  • Syntax check: Must match ^doi:10\.(\d{4,9}|[^\s/]+(\.[^\s/]+)*)/[^\s]+$
  • Normalization: Removes URL prefixes (https://doi.org/, http://dx.doi.org/), converts to lowercase

ORCID
  • Syntax check: Must match ^orcid:([0-9]{4}-){3}[0-9]{3}[0-9X]$
  • Checksum: Validates using ISO/IEC 7064:2003 MOD 11-2
  • Normalization: Uppercases X, removes all other characters except digits, formats as XXXX-XXXX-XXXX-XXXX

ISSN
  • Syntax check: Must match ^issn:[0-9]{4}-[0-9]{3}[0-9X]$
  • Checksum: Validates using modulo 11
  • Special case: 0000-0000 is explicitly rejected
  • Normalization: Uppercases X, removes all other characters except digits, formats as XXXX-XXXX
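
The checks for these three schemas can be sketched in Python as below. This is an independent illustration written from the rules listed above, not the source of `oc_ds_converter.oc_idmanager`: the function names are hypothetical, inputs are assumed to be bare values without the `schema:` prefix, and only the two DOI URL prefixes named above are stripped.

```python
import re

# Patterns copied from the rules above; bare input values get the schema
# prefix added before matching.
DOI_RE = re.compile(r"^doi:10\.(\d{4,9}|[^\s/]+(\.[^\s/]+)*)/[^\s]+$")
ORCID_RE = re.compile(r"^orcid:([0-9]{4}-){3}[0-9]{3}[0-9X]$")
ISSN_RE = re.compile(r"^issn:[0-9]{4}-[0-9]{3}[0-9X]$")
# Only the two prefixes named in the rules; the real list may be longer.
DOI_URL_PREFIXES = ("https://doi.org/", "http://dx.doi.org/")


def normalize_doi(raw: str) -> str:
    value = raw.strip()
    for prefix in DOI_URL_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return value.lower()


def is_valid_doi(raw: str) -> bool:
    return DOI_RE.match("doi:" + normalize_doi(raw)) is not None


def orcid_check_digit(base15: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(raw: str) -> str:
    chars = re.sub(r"[^0-9X]", "", raw.upper())
    return "-".join(chars[i:i + 4] for i in range(0, len(chars), 4))


def is_valid_orcid(raw: str) -> bool:
    value = normalize_orcid(raw)
    if ORCID_RE.match("orcid:" + value) is None:
        return False
    digits = value.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def issn_check_digit(base7: str) -> str:
    """Modulo 11 check character: weights 8 down to 2 on the first 7 digits."""
    total = sum(int(d) * w for d, w in zip(base7, range(8, 1, -1)))
    result = (11 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_issn(raw: str) -> str:
    chars = re.sub(r"[^0-9X]", "", raw.upper())
    return chars[:4] + "-" + chars[4:]


def is_valid_issn(raw: str) -> bool:
    value = normalize_issn(raw)
    if value == "0000-0000":  # passes the checksum, so it is rejected explicitly
        return False
    if ISSN_RE.match("issn:" + value) is None:
        return False
    digits = value.replace("-", "")
    return issn_check_digit(digits[:7]) == digits[7]
```

For example, `is_valid_issn("2641-3337")` returns True, while `is_valid_issn("0000-0000")` returns False even though its check digit is arithmetically correct.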

Identifiers with other schemas (PMID, arXiv, Wikidata, etc.) are accepted without validation.