Distributed Mode
Arneb supports distributed query execution using a coordinator-worker architecture. The coordinator accepts SQL queries, plans execution, and dispatches tasks to workers via Apache Arrow Flight RPC.
Roles
| Role | pgwire | Web UI | Flight RPC | Description |
|---|---|---|---|---|
standalone | yes | yes | yes | Single process, all-in-one (default) |
coordinator | yes | yes | yes | Accepts SQL, plans queries, dispatches to workers |
worker | no | no | yes | Executes plan fragments, serves data |
Coordinator Setup
Create coordinator.toml:
bind_address = "0.0.0.0"
port = 5432
[[tables]]
name = "lineitem"
path = "/data/lineitem.parquet"
format = "parquet"Start the coordinator:
cargo run --bin arneb -- --config coordinator.toml --role coordinatorThe coordinator listens on:
- Port
5432— pgwire (SQL clients) - Port
6432— Web UI - Port
9090— Flight RPC (worker communication)
Worker Setup
Create worker.toml:
bind_address = "0.0.0.0"
[[tables]]
name = "lineitem"
path = "/data/lineitem.parquet"
format = "parquet"
[cluster]
rpc_port = 9091
coordinator_address = "127.0.0.1:9090"
worker_id = "worker-1"Start the worker:
cargo run --bin arneb -- --config worker.toml --role workerWorkers do not expose pgwire or Web UI ports. They communicate with the coordinator via Flight RPC only.
TIP
Workers need the same table definitions as the coordinator. Each worker should have access to the same data files or object store paths.
Adding More Workers
Each worker needs a unique worker_id and rpc_port:
# worker-2.toml
bind_address = "0.0.0.0"
[[tables]]
name = "lineitem"
path = "/data/lineitem.parquet"
format = "parquet"
[cluster]
rpc_port = 9092
coordinator_address = "127.0.0.1:9090"
worker_id = "worker-2"Workers register with the coordinator automatically via a heartbeat protocol on startup.
How It Works
- A SQL query arrives at the coordinator via pgwire
- The coordinator parses, plans, and optimizes the query into a
LogicalPlan - The
PlanFragmentersplits the plan into fragments suitable for distributed execution - The
NodeSchedulerassigns fragments to available workers - Workers execute their fragments and return results via Flight RPC
- The coordinator assembles the final result and sends it to the client
Multi-Node Example
Start a coordinator and two workers on a single machine:
Terminal 1 — Coordinator:
cargo run --bin arneb -- --config coordinator.toml --port 5432 --role coordinatorTerminal 2 — Worker 1:
cargo run --bin arneb -- --config worker-1.toml --role workerTerminal 3 — Worker 2:
cargo run --bin arneb -- --config worker-2.toml --role workerTerminal 4 — Query:
psql -h 127.0.0.1 -p 5432 -c "SELECT count(*) FROM lineitem;"