Ingest

Four ways to get bytes into kyma. All of them coerce JSON-shaped input into Arrow batches, hand off to the same staging buffer, and commit through the same snapshot CAS — so what differs between them is the wire format and the failure model, not the storage shape they produce.

Pick the path closest to where your data already lives.

REST / NDJSON

POST /v1/ingest with an NDJSON body. Auto-creates the table, evolves the schema mid-batch, and returns the snapshot the rows are visible at. The default for application code, webhooks, and the entire quickstart path.
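A minimal sketch of building such a request with the Python standard library. The `/v1/ingest` path comes from the docs above; the query parameters, the `Idempotency-Key` header, and the base URL are illustrative assumptions, not confirmed kyma API details.

```python
import json
import urllib.request

def ndjson_body(rows):
    """Serialize a list of dicts to an NDJSON body: one JSON object per line."""
    return ("\n".join(json.dumps(r) for r in rows) + "\n").encode("utf-8")

def build_ingest_request(base_url, database, table, rows, idempotency_key=None):
    """Build (but do not send) a POST /v1/ingest request.
    Query parameters and the Idempotency-Key header are assumptions."""
    headers = {"Content-Type": "application/x-ndjson"}
    if idempotency_key:
        headers["Idempotency-Key"] = idempotency_key
    url = f"{base_url}/v1/ingest?database={database}&table={table}"
    return urllib.request.Request(url, data=ndjson_body(rows),
                                  headers=headers, method="POST")

req = build_ingest_request("http://localhost:8080", "app", "events",
                           [{"level": "info", "msg": "hello"}],
                           idempotency_key="batch-0001")
# urllib.request.urlopen(req) would send it; per the docs, the response
# reports the snapshot at which the rows become visible.
```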

OTLP gRPC

OpenTelemetry Protocol over gRPC on port 4317. Logs land in a fixed otel_logs table in the configured database. Phase A is logs-only — traces and metrics are tracked follow-ups.
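Any standard OpenTelemetry SDK or Collector can be pointed at this endpoint via the spec-defined OTLP exporter environment variables; a sketch, with an assumed hostname:

```shell
# Host is illustrative; port 4317 is from the docs above.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://kyma-host:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
# Phase A accepts logs only: trace and metric exports have nowhere to land yet.
```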

Kafka

Built-in consumer that maps one topic to one table. Subscribes, parses NDJSON message bodies, and commits Kafka offsets after each batch. Use it where Kafka is already the durability layer.
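One consumer iteration can be sketched as follows. This is a hypothetical stub, not the built-in consumer: `messages` stands in for a poll from a real Kafka client, and staging into the table buffer is elided. The point is the ordering the docs describe: offsets are recorded only after the whole batch is parsed and staged.

```python
import json

def process_batch(messages, table, committed):
    """Parse NDJSON message bodies into rows, then commit the Kafka
    offset for this topic/table only after the full batch is handled.
    `messages` is a list of (offset, value_bytes) pairs."""
    rows = []
    for offset, value in messages:
        for line in value.decode("utf-8").splitlines():
            if line.strip():
                rows.append(json.loads(line))
    # ... stage `rows` into the table's buffer here (elided) ...
    if messages:
        committed[table] = messages[-1][0] + 1  # next offset to consume
    return rows

committed = {}
batch = [(0, b'{"a": 1}\n{"a": 2}\n'), (1, b'{"a": 3}\n')]
rows = process_batch(batch, "events", committed)
```

Committing after the batch means a crash mid-batch replays messages, which is safe because the catalog-side idempotency described below deduplicates the replay.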

File-drop

Watcher polls an object-store prefix; each file's SHA256 is its idempotency key. Path convention {prefix}/{database}/{table}/... routes the file to a target table. Re-scans of the same file are no-ops.
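The routing convention and the hash-based no-op can be sketched in a few lines. The `seen` set is a stand-in for the catalog's record of committed idempotency keys; path layout follows the `{prefix}/{database}/{table}/...` convention above.

```python
import hashlib
from pathlib import PurePosixPath

def route(prefix, key):
    """Map an object key under {prefix}/{database}/{table}/... to its
    target (database, table). Error handling is elided."""
    parts = PurePosixPath(key).relative_to(prefix).parts
    return parts[0], parts[1]

def ingest_file(key, data, seen, prefix="drop"):
    """Idempotent file ingest: the file's SHA256 is its idempotency key,
    so re-scanning the same bytes is a no-op."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen:
        return None  # already ingested on a previous scan
    seen.add(digest)
    return route(prefix, key)

seen = set()
first = ingest_file("drop/app/events/2024/file.ndjson", b'{"a":1}\n', seen)
again = ingest_file("drop/app/events/2024/file.ndjson", b'{"a":1}\n', seen)
```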

Idempotency and coercion

Cross-cutting reference covering the JSON-to-Arrow type rules, mid-batch schema evolution, and the three idempotency-key shapes (REST header, file-drop SHA256, Kafka offsets). Read this once and the four pages above are mostly examples.
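The widening-only flavor of mid-batch schema evolution can be illustrated with a toy inference pass. The type lattice here (null < int < float < string) is an assumption for illustration; the authoritative coercion table lives in the reference page itself.

```python
def infer_schema(rows, schema=None):
    """Infer a widening-only schema across a batch of JSON rows:
    new columns are added, existing columns only ever widen.
    The lattice is illustrative, not kyma's actual coercion rules."""
    order = ("null", "int", "float", "string")
    def infer(v):
        if v is None: return "null"
        if isinstance(v, bool): return "string"  # illustrative placement
        if isinstance(v, int): return "int"
        if isinstance(v, float): return "float"
        return "string"
    schema = dict(schema or {})
    for row in rows:
        for col, v in row.items():
            # take the wider of the existing and observed types
            schema[col] = max(schema.get(col, "null"), infer(v),
                              key=order.index)
    return schema

s = infer_schema([{"a": 1}, {"a": 2.5, "b": "x"}])  # a widens int -> float
```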

What's the same across all four

  • One write path. Frontend bytes become Arrow batches; the staging buffer group-commits them; the commit coordinator publishes a new snapshot via Postgres CAS. See Extents and snapshots.
  • Schema only widens. New columns get added; old ones never get narrowed or deleted. Old extents stay readable through schema changes. See Schema model.
  • Idempotent by design. REST sends a key, file-drop hashes the bytes, Kafka tracks offsets. A replayed input never produces a duplicate extent at the catalog boundary.
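The snapshot CAS that all four paths commit through can be sketched as a compare-and-swap on the catalog's current snapshot id. The class below is a hypothetical in-memory stand-in for the Postgres-backed catalog, where the same check would be a single conditional `UPDATE`.

```python
import itertools

class Catalog:
    """Stand-in for the Postgres-backed catalog: publishing a new
    snapshot succeeds only if the expected parent is still current."""
    def __init__(self):
        self.current = 0
        self._ids = itertools.count(1)

    def publish(self, expected_parent):
        if self.current != expected_parent:
            return None  # lost the race: re-read the snapshot and retry
        self.current = next(self._ids)
        return self.current

cat = Catalog()
snap = cat.publish(expected_parent=0)   # wins: current was 0
stale = cat.publish(expected_parent=0)  # loses: current is now newer
```

Because every writer races through this one CAS, a replayed input that already committed sees its idempotency key in the catalog and publishes nothing, which is what makes the duplicate-extent guarantee hold at the catalog boundary.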