Hive Connector

Arneb connects to Apache Hive Metastore (HMS) to discover and query tables managed by Hive catalogs. The connector supports HMS 4.x via an async Thrift client.

Configuration

toml
[[catalogs]]
name = "datalake"
type = "hive"
metastore_uri = "127.0.0.1:9083"
default_schema = "default"

[catalogs.storage.s3]
region = "us-east-1"
endpoint = "http://localhost:9000"
allow_http = true

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Catalog name used in three-part table references (catalog.schema.table) |
| type | string | yes | Must be "hive" |
| metastore_uri | string | yes | host:port of the Hive Metastore Thrift service (no scheme) |
| default_schema | string | no | Default schema to use when a query does not specify one |
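
The host:port rule for metastore_uri can be illustrated with a small check. This is a minimal sketch, not Arneb's actual validation: the function name `parse_metastore_uri` and its error messages are hypothetical, and it only mirrors the documented "no scheme" format.

```rust
/// Hypothetical helper (not part of Arneb): split a `metastore_uri` value
/// into host and port, following the documented "host:port, no scheme" rule.
fn parse_metastore_uri(uri: &str) -> Result<(String, u16), String> {
    // The documented format has no scheme, so reject e.g. "thrift://host:9083".
    if uri.contains("://") {
        return Err(format!("metastore_uri must not include a scheme: {}", uri));
    }
    // Split on the last ':' so the port is always the final component.
    let (host, port) = uri
        .rsplit_once(':')
        .ok_or_else(|| format!("metastore_uri must be host:port: {}", uri))?;
    if host.is_empty() {
        return Err(format!("metastore_uri has an empty host: {}", uri));
    }
    // A valid port must fit in a u16.
    let port: u16 = port
        .parse()
        .map_err(|_| format!("metastore_uri has an invalid port: {}", uri))?;
    Ok((host.to_string(), port))
}
```

With this sketch, "127.0.0.1:9083" is accepted, while "thrift://127.0.0.1:9083" and "127.0.0.1" are rejected.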

Three-Part Table References

Tables in Hive catalogs use three-part naming:

sql
SELECT * FROM datalake.demo.cities;
--            ^^^^^^^^ ^^^^ ^^^^^^
--            catalog  schema table
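
To make the naming concrete, the sketch below splits an unquoted three-part name into its components. It is illustrative only: `TableRef` and `parse_three_part` are hypothetical names, quoted identifiers are not handled, and Arneb's own SQL parser may behave differently (for example when default_schema applies).

```rust
/// A fully qualified table reference.
#[derive(Debug, PartialEq)]
struct TableRef {
    catalog: String,
    schema: String,
    table: String,
}

/// Hypothetical helper (not Arneb's parser): split an unquoted
/// `catalog.schema.table` name into its three parts.
fn parse_three_part(name: &str) -> Result<TableRef, String> {
    let parts: Vec<&str> = name.split('.').collect();
    match parts.as_slice() {
        // Exactly three non-empty components are required.
        [catalog, schema, table]
            if !catalog.is_empty() && !schema.is_empty() && !table.is_empty() =>
        {
            Ok(TableRef {
                catalog: catalog.to_string(),
                schema: schema.to_string(),
                table: table.to_string(),
            })
        }
        _ => Err(format!("expected catalog.schema.table, got: {}", name)),
    }
}
```

Here "datalake.demo.cities" yields catalog "datalake", schema "demo", and table "cities"; anything with fewer or more parts is reported as an error.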

Storage Configuration

Hive tables are typically stored in object stores. Configure storage credentials either globally or per-catalog:

toml
# Global storage (used by all catalogs unless overridden)
[storage.s3]
region = "us-east-1"

# Per-catalog override (place directly under the [[catalogs]] entry it applies to)
[catalogs.storage.s3]
region = "us-east-1"
endpoint = "http://localhost:9000"
allow_http = true
access_key_id = "minioadmin"
secret_access_key = "minioadmin"

Per-catalog settings are merged with the global [storage] settings; where both define the same key, the per-catalog value wins.
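
One way to picture the merge is as a field-by-field fallback. The sketch below assumes exactly that; the `S3Settings` struct and `merge_s3` function are hypothetical and only cover the keys shown on this page, so Arneb's real merge logic may differ in detail.

```rust
/// Hypothetical mirror of the S3 keys shown above; `None` means "not set".
#[derive(Debug, Clone, Default, PartialEq)]
struct S3Settings {
    region: Option<String>,
    endpoint: Option<String>,
    allow_http: Option<bool>,
    access_key_id: Option<String>,
    secret_access_key: Option<String>,
}

/// Assumed semantics: a value set on the catalog wins; otherwise the
/// global value (if any) is used.
fn merge_s3(global: &S3Settings, catalog: &S3Settings) -> S3Settings {
    S3Settings {
        region: catalog.region.clone().or_else(|| global.region.clone()),
        endpoint: catalog.endpoint.clone().or_else(|| global.endpoint.clone()),
        allow_http: catalog.allow_http.or(global.allow_http),
        access_key_id: catalog
            .access_key_id
            .clone()
            .or_else(|| global.access_key_id.clone()),
        secret_access_key: catalog
            .secret_access_key
            .clone()
            .or_else(|| global.secret_access_key.clone()),
    }
}
```

Under this assumption, a global region combined with a per-catalog endpoint produces settings that contain both, and a per-catalog region would replace the global one.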

Local Demo Walkthrough

Arneb includes a Docker Compose setup with HMS 4.2.0 and MinIO for local development.

Prerequisites

  • Docker and Docker Compose
  • Rust toolchain

Step 1: Start Services

bash
docker compose up -d

This starts:

  • MinIO — S3-compatible object store on port 9000 (API) and 9001 (console)
  • Hive Metastore — HMS 4.2.0 on port 9083

Step 2: Seed TPC-H Data

bash
docker compose run --rm tpch-seed

This creates the eight TPC-H tables in the tpch schema, writing the data to MinIO via Trino CTAS (CREATE TABLE AS SELECT).

Step 3: Start Arneb

bash
cargo run --bin arneb -- --config benchmarks/tpch/tpch-hive.toml

The config (benchmarks/tpch/tpch-hive.toml) connects to the local HMS and MinIO:

toml
bind_address = "127.0.0.1"
port = 5432

[storage.s3]
region = "us-east-1"
endpoint = "http://localhost:9000"
allow_http = true
access_key_id = "minioadmin"
secret_access_key = "minioadmin"

[[catalogs]]
name = "datalake"
type = "hive"
metastore_uri = "127.0.0.1:9083"
default_schema = "tpch"

Step 4: Run Queries

bash
psql -h 127.0.0.1 -p 5432 -c "SELECT COUNT(*) FROM datalake.tpch.nation;"
psql -h 127.0.0.1 -p 5432 -c "SELECT COUNT(*) FROM datalake.tpch.lineitem;"

Step 5: Tear Down

bash
docker compose down

HMS Compatibility

The Hive connector uses Thrift bindings auto-generated from the Hive 4.2.0 IDL. It communicates with HMS using plain TBinaryProtocol (buffered codec), which is compatible with standard HMS deployments.
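
For readers unfamiliar with the wire format, the sketch below hand-encodes a zero-argument call in the Thrift binary protocol. It is an illustration, not the connector's code: the generated bindings do this for you, the function name `encode_call_no_args` is hypothetical, and the strict (versioned) message header is assumed. The example method, get_all_databases, is taken from the HMS Thrift service definition.

```rust
/// Thrift message type for a request.
const MESSAGE_TYPE_CALL: u32 = 1;
/// Version marker used by the strict binary protocol header.
const VERSION_1: u32 = 0x8001_0000;

/// Hypothetical helper: encode a call to a method that takes no arguments.
/// Layout (all integers big-endian):
///   i32 version | message type, i32 name length, name bytes, i32 sequence id,
///   then the argument struct, which for zero arguments is a single stop byte.
fn encode_call_no_args(method: &str, seq_id: i32) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(VERSION_1 | MESSAGE_TYPE_CALL).to_be_bytes());
    buf.extend_from_slice(&(method.len() as i32).to_be_bytes());
    buf.extend_from_slice(method.as_bytes());
    buf.extend_from_slice(&seq_id.to_be_bytes());
    buf.push(0x00); // field-stop marker: empty argument struct
    buf
}
```

Calling `encode_call_no_args("get_all_databases", 1)` produces the bytes a client would write to the socket for that request on an unframed transport; a framed transport would additionally prefix the message with its length.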

The Thrift bindings live in the hive-metastore crate. To regenerate them after modifying the IDL:

bash
cargo run -p hive-metastore-thrift-build