<!--
SPDX-FileCopyrightText: 2026 Arcangelo Massari <arcangelo.massari@unibo.it>

SPDX-License-Identifier: CC-BY-4.0
-->

# Multi-source queries

RAMOSE can combine results from multiple SPARQL endpoints in a single operation. This is driven by directives: lines starting with `@@` inside the `#sparql` block.

When no directives are present, the query runs against the default endpoint as usual. When directives appear, RAMOSE splits the block into steps and executes them in sequence, building up an accumulator of rows.

Retries for failed SPARQL reads apply to each step individually, whether it is an HTTP SPARQL step or a SPARQL Anything read step. In an `@@foreach` block, RAMOSE retries only the failed iteration; it does not restart the whole multi-source pipeline. Write operations are not retried by this policy.

## Setup

Register named endpoints with `#sources` in the API section, as semicolon-separated `name=url` pairs:

```
#sources meta=https://opencitations.net/meta/sparql; index=https://opencitations.net/index/sparql
```
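
To make the `name=url; name=url` shape concrete, here is a minimal Python sketch that turns such a value into a mapping. `parse_sources` is a hypothetical helper written for illustration, not RAMOSE's own parser, and it assumes URLs never contain a `;`.

```python
def parse_sources(value: str) -> dict[str, str]:
    """Split a '#sources' value into a {name: url} mapping."""
    sources: dict[str, str] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only, so a URL with a query string stays intact.
        name, sep, url = entry.partition("=")
        name, url = name.strip(), url.strip()
        if not sep or not name or not url:
            raise ValueError(f"malformed source entry: {entry!r}")
        sources[name] = url
    return sources


sources = parse_sources(
    "meta=https://opencitations.net/meta/sparql; "
    "index=https://opencitations.net/index/sparql"
)
print(sources["index"])
```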

## Directive syntax

All directives follow the same grammar:

```
@@name <arg>... [param=value]...
```

Parameters can be passed positionally or by name using `key=value` syntax, like Python function arguments. Once a keyword argument appears, all subsequent arguments must also be keyword arguments. Optional parameters (those with defaults) are written with `key=value` syntax.

A token with `=` is treated as a keyword argument only if the key matches a known parameter name. This allows values containing `=` (such as URLs with query strings) to be passed positionally without ambiguity.

```
@@foreach ?br item wait=0.5
@@foreach ?br placeholder=item wait=0.5
@@foreach variable=?br placeholder=item wait=0.5
```

These three forms are equivalent.
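
The binding rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RAMOSE's actual parser: `parse_directive` and the `FOREACH_PARAMS` list are hypothetical names, tokens are split on whitespace only, and variadic directives such as `@@values` are not covered.

```python
def parse_directive(line: str, params: list[str]) -> tuple[str, dict[str, str]]:
    """Bind the arguments of '@@name ...' to an ordered list of parameter names."""
    tokens = line.split()
    if not tokens or not tokens[0].startswith("@@"):
        raise ValueError(f"not a directive: {line!r}")
    name = tokens[0][2:]
    bound: dict[str, str] = {}
    seen_keyword = False
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep and key in params:
            # Keyword form: only when the key is a known parameter name.
            if key in bound:
                raise ValueError(f"{key} given more than once")
            bound[key] = value
            seen_keyword = True
        elif seen_keyword:
            raise ValueError(f"positional argument after keyword: {token!r}")
        elif len(bound) >= len(params):
            raise ValueError(f"too many arguments: {token!r}")
        else:
            # Positional form: fill the next parameter in declaration order.
            bound[params[len(bound)]] = token
    return name, bound


# Hypothetical parameter list mirroring the @@foreach signature.
FOREACH_PARAMS = ["variable", "placeholder", "wait"]
forms = [
    "@@foreach ?br item wait=0.5",
    "@@foreach ?br placeholder=item wait=0.5",
    "@@foreach variable=?br placeholder=item wait=0.5",
]
parsed = [parse_directive(form, FOREACH_PARAMS) for form in forms]
print(parsed[0] == parsed[1] == parsed[2])
```

Because the keyword check looks the key up in the parameter list, a token such as `a=b` whose key is unknown falls through to the positional branch and is kept whole.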

## Directives

### @@with

Switch to a named source or explicit endpoint and, when needed, select the query engine for subsequent queries. Source names are declared in `#sources`; direct endpoint URLs can be used without declaring a source.

Syntax: `@@with [source=<source>|endpoint=<url>] [engine=<sparql|sparql-anything>]`

```
@@with index
SELECT ?citing ?cited WHERE { ... }
```

```
@@with endpoint=https://opencitations.net/index/sparql
SELECT ?citing ?cited WHERE { ... }
```

| Parameter | Required | Description |
|-----------|----------|-------------|
| `source` | yes for `engine=sparql` unless `endpoint` is set | Name declared in `#sources` |
| `endpoint` | yes for `engine=sparql` unless `source` is set | SPARQL endpoint URL |
| `engine` | no | `sparql` by default; use `sparql-anything` for SPARQL Anything queries |

### @@join

Join the next query's results with the current accumulator.

Syntax: `@@join <left_var> <right_var> [type=<inner|left>]`

```
@@join ?doi ?doi type=left
SELECT ?doi ?citation_count WHERE { ... }
```

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `left_var` | yes | | Join key from the accumulator |
| `right_var` | yes | | Join key from the next query |
| `type` | no | `inner` | `inner` keeps only matches; `left` preserves all accumulator rows |

Join keys are normalized (http/https unification, trailing slash removal) to handle minor URL differences between endpoints.

When a right-side column name collides with an existing column, it gets a `_r` suffix.
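
The following Python sketch illustrates these join semantics on rows represented as dictionaries. It is an approximation for illustration only: `normalize_key` and `join_rows` are hypothetical names, column names are written without the leading `?`, the right-hand join key is treated like any other column, and unmatched rows in a `left` join simply lack the right-hand columns. RAMOSE's real behaviour may differ in those details.

```python
def normalize_key(value: str) -> str:
    """Unify http/https and drop trailing slashes so near-identical URLs match."""
    if value.startswith("http://"):
        value = "https://" + value[len("http://"):]
    return value.rstrip("/")


def join_rows(left, right, left_var, right_var, how="inner"):
    """Join two lists of row dicts on normalized key values."""
    # Index the right-hand rows by their normalized join key.
    index = {}
    for row in right:
        index.setdefault(normalize_key(row[right_var]), []).append(row)

    result = []
    for row in left:
        matches = index.get(normalize_key(row[left_var]), [])
        if not matches and how == "left":
            result.append(dict(row))  # keep the accumulator row as-is
        for match in matches:
            merged = dict(row)
            for column, value in match.items():
                # A colliding right-side column gets the '_r' suffix.
                target = column + "_r" if column in merged else column
                merged[target] = value
            result.append(merged)
    return result


accumulator = [
    {"br": "http://example.org/br/1/", "title": "A"},
    {"br": "https://example.org/br/2", "title": "B"},
]
next_query = [
    {"cited": "https://example.org/br/1", "title": "A (index)", "count": "3"},
]
inner = join_rows(accumulator, next_query, "br", "cited")
left = join_rows(accumulator, next_query, "br", "cited", how="left")
print(len(inner), len(left))
```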

### @@values

Inject accumulated values into the next query as a SPARQL `VALUES` clause.

Syntax: `@@values <var>...`

```
@@values ?doi
SELECT ?doi ?abstract WHERE { ... }
```

Takes one or more `?variable` names. RAMOSE collects distinct values for the listed variables from the accumulator and inserts a `VALUES` block into the next query's `WHERE` clause. Literal values are quoted; IRIs (starting with `http://` or `https://`) are wrapped in angle brackets.
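
As a rough illustration of the clause this produces, the Python sketch below builds a `VALUES` block from accumulator rows. The helper names `format_term` and `build_values` are hypothetical, the escaping of quotes and backslashes is an assumption the text above does not describe, and the step that splices the clause into the `WHERE` block is left out.

```python
def format_term(value: str) -> str:
    """Render one value as a SPARQL term: IRI in angle brackets, else a quoted literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Assumption: escape backslashes and double quotes inside literals.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_values(rows, variables):
    """Build a VALUES clause from the distinct value combinations in rows."""
    # dict.fromkeys keeps first-appearance order while removing duplicates.
    combos = dict.fromkeys(
        tuple(row[var.lstrip("?")] for var in variables) for row in rows
    )
    header = " ".join(variables)
    body = " ".join(
        "(" + " ".join(format_term(value) for value in combo) + ")"
        for combo in combos
    )
    return f"VALUES ({header}) {{ {body} }}"


rows = [{"doi": "10.1/a"}, {"doi": "10.1/b"}, {"doi": "10.1/a"}]
clause = build_values(rows, ["?doi"])
print(clause)
```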

### @@foreach

Iterate the next query once per distinct value of a variable from the accumulator.

Syntax: `@@foreach <variable> <placeholder> [wait=<seconds>]`

```
@@foreach ?br item wait=0.5
SELECT ?result WHERE {
  BIND(<[[item]]> as ?br)
  ...
}
```

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `variable` | yes | | Column from the accumulator to iterate over (must start with `?`) |
| `placeholder` | yes | | Name used as `[[placeholder]]` in the query text |
| `wait` | no | `0` | Pause in seconds (float) between iterations |

Results from all iterations are concatenated.
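
The loop can be pictured with the Python sketch below. It is only a model of the behaviour described here: `run_foreach` is a hypothetical name, and `execute` stands in for whatever sends a query to the active endpoint and returns a list of rows.

```python
import time


def run_foreach(rows, variable, placeholder, query, execute, wait=0.0):
    """Run query once per distinct value of variable and concatenate the results."""
    column = variable.lstrip("?")
    # Distinct values in first-appearance order.
    values = list(dict.fromkeys(row[column] for row in rows))
    results = []
    for i, value in enumerate(values):
        if i > 0 and wait > 0:
            time.sleep(wait)  # pause between iterations, not before the first
        results.extend(execute(query.replace(f"[[{placeholder}]]", value)))
    return results


sent = []


def fake_execute(query):
    """Stub endpoint: record the query and return one row echoing it."""
    sent.append(query)
    return [{"result": query}]


accumulator = [{"br": "https://a/1"}, {"br": "https://a/2"}, {"br": "https://a/1"}]
template = "SELECT ?result WHERE { BIND(<[[item]]> as ?br) }"
output = run_foreach(accumulator, "?br", "item", template, fake_execute)
print(len(output))
```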

### @@remove

Drop columns from the accumulator.

Syntax: `@@remove <var>...`

```
@@remove ?batch_id ?temp_var
```

Takes one or more `?variable` names. Useful for cleaning up intermediate columns before the final output.

### @@page

Keep only one page of distinct values of a variable, so the steps that follow resolve only that page.

Syntax: `@@page <variable> [default_size=<N>] [max_size=<M>]`

```
@@page ?id default_size=10 max_size=100
```

The page number and page size come from the request's `page` and `page_size` parameters, the same ones used by RAMOSE's built-in pagination. When `page_size` is absent, the directive uses `default_size`; if neither is set, it does nothing and every row passes through. An explicit `page_size` above `max_size` is rejected with HTTP 422.

The directive counts the distinct values of `<variable>` in first-appearance order, so a preceding `ORDER BY` decides which values land on each page. It keeps the rows whose value belongs to the requested page and records the total count of distinct values, the current page, and the page size on the operation. The output converter reads these through the request URL to report the totals.

One way to use it: place `@@page` after a cheap query that returns just the variable to paginate (and any sort key), and before the queries that resolve full per-item data. The page is fixed first, so the expensive resolution runs for one page instead of every match.
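
The selection logic described in this section can be approximated in Python as follows. This is a sketch under assumptions: `page_rows` is a hypothetical name, pages are taken to be numbered from 1, and a `ValueError` stands in for the HTTP 422 response.

```python
def page_rows(rows, variable, page=1, page_size=None, default_size=None, max_size=None):
    """Keep the rows whose value of variable falls on the requested page."""
    if page_size is not None and max_size is not None and page_size > max_size:
        raise ValueError("page_size exceeds max_size")  # stand-in for HTTP 422
    size = page_size if page_size is not None else default_size
    if size is None:
        return rows, None  # no paging requested: every row passes through

    column = variable.lstrip("?")
    # Distinct values in first-appearance order, so a prior ORDER BY is respected.
    distinct = list(dict.fromkeys(row[column] for row in rows))
    start = (page - 1) * size  # assumption: pages are 1-based
    selected = set(distinct[start:start + size])
    kept = [row for row in rows if row[column] in selected]
    info = {"total": len(distinct), "page": page, "page_size": size}
    return kept, info


rows = [
    {"id": "a", "n": 1},
    {"id": "b", "n": 2},
    {"id": "a", "n": 3},
    {"id": "c", "n": 4},
]
first_page, info = page_rows(rows, "?id", page=1, default_size=2)
print(info)
```

Note that a page is counted in distinct values, not rows: the first page above holds three rows because `a` appears twice.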

## Full example

A query that fetches metadata from OpenCitations Meta and joins citation counts from the OpenCitations Index:

```
#sources meta=https://opencitations.net/meta/sparql; index=https://opencitations.net/index/sparql
```

```
#sparql
SELECT ?doi ?title WHERE {
  ?identifier literal:hasLiteralValue "[[doi]]"^^xsd:string ;
    datacite:usesIdentifierScheme datacite:doi ;
    ^datacite:hasIdentifier ?res .
  ?res dcterm:title ?title .
  BIND("[[doi]]"^^xsd:string as ?doi)
}
@@with index
@@join ?doi ?doi type=left
SELECT ?doi ?citation_count WHERE {
  BIND("[[doi]]" as ?doi)
  {
    SELECT (COUNT(?citing) as ?citation_count) WHERE {
      ?citing cito:cites ?cited .
      ?cited datacite:hasIdentifier/literal:hasLiteralValue "[[doi]]"^^xsd:string
    }
  }
}
```

This fetches the title from Meta, then joins the citation count from Index. The `left` join keeps the row even if the Index has no citation data for that DOI.

## SPARQL Anything

[SPARQL Anything](https://sparql-anything.cc/) lets you query non-RDF data sources (CSV, JSON, XML, etc.) using SPARQL. RAMOSE integrates it via [PySPARQL-Anything](https://pypi.org/project/pysparql-anything/).

This requires the optional extra:

```sh
pip install "ramose[sparql-anything]"
```

Use `@@with engine=sparql-anything` before the query:

```
@@with engine=sparql-anything
SELECT * WHERE {
  SERVICE <x-sparql-anything:location=https://example.org/data.csv> {
    ?s ?p ?o
  }
}
```
