
Postgres

🚧 Roadmap. This connector ships in DB-M1. The design below is committed and stable; the implementation is in progress. See the spec for status.

The Postgres engine is the first of three operational-database connectors. It registers as a DataFusion catalog (federation), replays the source's logical replication into kyma extents (sync), or both at once. The same connection pool, the same schema introspection, the same status surface.

type: "postgres". Supported Postgres versions: 15 and 16. Gated behind the federation Cargo feature.

Modes

```bash
curl -sS -X POST http://localhost:8080/v1/connectors \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "name": "pg_prod",
  "type": "postgres",
  "mode": "both",
  "connection": {
    "url":         "postgres://app@prod-rds.example.com:5432/app",
    "secret_ref":  "$env:PG_PROD_PASSWORD",
    "tls":         "required",
    "pool_size":   10
  },
  "scope": {
    "include_schemas": ["public", "billing"],
    "exclude_tables":  ["public.audit_log"]
  },
  "sync": {
    "tables":      ["public.users", "billing.invoices"],
    "schedule_ms": null
  }
}
JSON
```
  • mode: "federation" registers the source as a DataFusion catalog only. Reads run live against the source; nothing is copied.
  • mode: "sync" takes an initial snapshot under REPEATABLE READ, then follows logical replication into kyma extents. Queries hit kyma, not the source.
  • mode: "both" does both at once. Bare references resolve to synced extents; live(table) opts into the federated path. See Multi-source data.

tls is "required" by default. "disabled" is honored only with an explicit override; even then, the connector writes a loud warning into last_error.
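The TLS gate above can be sketched as a small validation step. This is an illustrative sketch, not the connector's actual API; the function and the `allow_insecure` flag are hypothetical names.

```python
# Hypothetical sketch of the TLS gate: "required" is the default, and
# "disabled" only passes with an explicit override flag.
def validate_tls(connection: dict, allow_insecure: bool = False):
    """Return (ok, warning) for the connection's TLS setting."""
    tls = connection.get("tls", "required")  # "required" is the default
    if tls == "required":
        return True, None
    if tls == "disabled":
        if not allow_insecure:
            # Refuses to start without an explicit override.
            return False, "tls disabled without explicit override"
        # Starts, but a loud warning lands in last_error.
        return True, "TLS is disabled; traffic to the source is unencrypted"
    return False, f"unknown tls mode: {tls!r}"
```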

Federation pushdown

DataFusion calls FederatedTableProvider::scan(filters, projection, limit, sort); the shared PushdownPlanner reads the Postgres Capabilities and produces a PushedPlan::Sql plus a residual.

Always pushed:

  • Column projection.
  • Filters: =, !=, <, <=, >, >=, IN, IS NULL, LIKE (with translated wildcards), AND/OR/NOT trees.
  • LIMIT, ORDER BY, and the combined TopK shape.
  • Single-source aggregations: COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT), GROUP BY, provided every column referenced is on the same source.

Pushed when safe:

  • Joins where both sides are this same connection. The whole join becomes one SELECT to Postgres.
  • Common scalar functions with verified semantics: LOWER, UPPER, COALESCE, date_trunc.

Never pushed:

  • kyma-specific UDFs (cosine_distance, token-index functions, dynamic accessors).
  • Cross-source joins. Each side scans at its own source; DataFusion joins the streams.
  • Window functions (v1).

Every federated response carries a pushdown_summary array (one entry per FederatedScan) listing what was pushed, what stayed residual, and why. See Multi-source data.
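The planner split above can be sketched in miniature: each filter is either compiled into the pushed SQL or kept as a residual for DataFusion, and the decision is recorded in a pushdown_summary-style entry. The capability set and function names here are illustrative, not the PushdownPlanner's real interface.

```python
# Illustrative pushdown split. Filters the source can evaluate are pushed;
# everything else (e.g. kyma UDFs like cosine_distance) stays residual.
PUSHABLE_OPS = {"=", "!=", "<", "<=", ">", ">=", "IN", "IS NULL", "LIKE"}

def split_filters(filters):
    """filters: list of (column, operator, literal) triples."""
    pushed, residual = [], []
    for col, op, lit in filters:
        (pushed if op in PUSHABLE_OPS else residual).append((col, op, lit))
    return pushed, residual

def summarize(pushed, residual):
    """A pushdown_summary-shaped record of what went where."""
    return {
        "pushed":   [f"{c} {op}" for c, op, _ in pushed],
        "residual": [f"{c} {op}" for c, op, _ in residual],
    }
```

For example, `("embedding", "cosine_distance", 0.2)` lands in the residual and is evaluated by DataFusion after the scan returns.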

CDC sync

Two-phase pipeline per source-table:

  1. Initial snapshot. BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY, pg_create_logical_replication_slot('kyma_<connector_id>', 'pgoutput', true), SET TRANSACTION SNAPSHOT '<exported>'. Stream rows in batches; on the final batch, advance connector_cdc_state.phase to streaming and commit the cursor at the slot's LSN atomically with the kyma extent CAS.
  2. Streaming. START_REPLICATION SLOT kyma_<connector_id> LOGICAL <lsn> with proto_version=2. Group-commit batches to kyma extents. INSERT/UPDATE/DELETE events become rows tagged with _kyma_op and a tombstone row for deletes.
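The snapshot handshake in step 1 can be paraphrased as the command sequence the runner issues, in order. This is a sketch of the sequence described above, not the runner's actual code; the exported snapshot identifier is a parameter because its real value comes from the slot-creation step.

```python
def snapshot_commands(connector_id: str, exported_snapshot: str):
    """The step-1 sequence, in order: open the REPEATABLE READ read-only
    transaction, create the logical slot, then pin the transaction to the
    snapshot exported alongside the slot so batch reads and the slot's LSN
    describe the same table state."""
    slot = f"kyma_{connector_id}"
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY",
        f"SELECT pg_create_logical_replication_slot('{slot}', 'pgoutput', true)",
        f"SET TRANSACTION SNAPSHOT '{exported_snapshot}'",
        # ...then stream table rows in batches within this transaction...
    ]
```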

The exactly-once knot: the cursor advance is a payload in the same tables.current_snapshot_id CAS that already commits the extent manifest. Either both land or neither does. On kill -9 mid-batch, the runner reopens from connector_cdc_state.checkpoint; the source replays; kyma's idempotency layer dedupes any partway-through rows.
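The exactly-once coupling can be sketched as one compare-and-swap in which the CDC cursor rides along with the extent manifest. The class and field names below are illustrative stand-ins for the real state store; the point is only that both writes hang off a single guarded swap.

```python
# Illustrative CAS: the extent manifest and the CDC cursor commit together
# or not at all, guarded by one swap on current_snapshot_id.
class TableState:
    def __init__(self):
        self.current_snapshot_id = 0
        self.manifest = []   # committed extents
        self.cursor = None   # committed CDC LSN

    def cas_commit(self, expected_snapshot_id, new_extent, new_cursor):
        """Append an extent AND advance the cursor in one guarded step."""
        if self.current_snapshot_id != expected_snapshot_id:
            return False     # stale CAS: neither the extent nor the cursor lands
        self.manifest.append(new_extent)
        self.cursor = new_cursor
        self.current_snapshot_id += 1
        return True
```

After a crash, the runner reopens from the last committed cursor; a batch that lost the CAS is simply replayed and deduped.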

Type mapping

| Postgres type | kyma type | Notes |
| --- | --- | --- |
| smallint, int | int | 32-bit signed |
| bigint, oid | long | |
| real, double precision | real | 64-bit float |
| numeric(p, s) | real for p ≤ 15; else string | Force via connection.numeric_mode = "string" |
| boolean | bool | |
| text, varchar, char, name, uuid | string | UUIDs in canonical form |
| bytea | string (base64) | Bytes-typed columns post-v1 |
| timestamp, timestamptz, date, time | timestamp | Always UTC; loses sub-microsecond precision |
| json, jsonb | dynamic (CBOR) | Whole document; field-level inference inside JSONB is post-v1 |
| int[], text[], etc. | dynamic | Arrow List<T> mapping post-v1 |
| int4range, tstzrange, etc. | dynamic | {lower, upper, lower_inc, upper_inc} |
| geometry (PostGIS) | string (WKT) | Opt-in via scope.geometry_mode = "wkt" |
| enum | string | |
| hstore | dynamic | Map representation in CBOR |
| inet, cidr, macaddr | string | |
| vector (pgvector) | vector(N) | Dimension fixed; mismatch errors loudly |

Composite types and domains unwrap to their base type. Out-of-band types not in this table land in dynamic with a warning in last_error.
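The table above reads as a lookup plus two special cases: numeric precision and the numeric_mode override, with everything unrecognized falling through to dynamic. A partial sketch (only a few rows shown; the function name is illustrative):

```python
# Partial sketch of the mapping table. Unknown types fall through to
# dynamic, mirroring the out-of-band rule above.
SIMPLE = {
    "smallint": "int", "int": "int",
    "bigint": "long", "oid": "long",
    "real": "real", "double precision": "real",
    "boolean": "bool",
    "text": "string", "varchar": "string", "uuid": "string",
    "json": "dynamic", "jsonb": "dynamic",
}

def map_type(pg_type, precision=None, numeric_mode="auto"):
    if pg_type == "numeric":
        if numeric_mode == "string":
            return "string"  # forced via connection.numeric_mode = "string"
        return "real" if (precision or 0) <= 15 else "string"
    return SIMPLE.get(pg_type, "dynamic")  # out-of-band -> dynamic + warning
```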

System columns on synced tables

Every synced row has four extra columns kyma adds automatically:

| Column | Type | Meaning |
| --- | --- | --- |
| _kyma_pk | string | Source PK; for composite PKs, <col1>:<col2>:... in information_schema order. |
| _kyma_op | string | One of 'insert', 'update', 'delete'. |
| _kyma_lsn | string | Postgres LSN at commit time. |
| _kyma_event_at | timestamp | Wall-clock time the source emitted the event. |

A source table with no primary key is rejected at connector start with disabled_reason="table has no primary key - cannot CDC sync". CDC without a PK cannot dedupe replays or build tombstones correctly. Use mode: "federation" for that table instead, or add a PK on the source.
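The composite-PK encoding and the no-PK rejection can be sketched together; the function name is illustrative.

```python
def kyma_pk(row, pk_columns):
    """Build the _kyma_pk value: PK column values joined with ':' in
    information_schema column order."""
    if not pk_columns:
        # Mirrors the connector-start rejection for PK-less tables.
        raise ValueError("table has no primary key - cannot CDC sync")
    return ":".join(str(row[c]) for c in pk_columns)

kyma_pk({"org_id": 7, "user_id": 42}, ["org_id", "user_id"])  # -> "7:42"
```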

Failure modes

| Failure | Behavior |
| --- | --- |
| Source unreachable | Federation: 502 source_unreachable. Sync: stream reopens at last cursor. |
| TLS handshake fails | Connector disabled, disabled_reason="tls_failed: <detail>". |
| Replication slot dropped externally | Connector disabled, disabled_reason="replication_slot_missing". Does NOT auto-recreate. Operator decides: re-enable to re-snapshot, or restore externally. |
| Source DDL adds an unrepresentable type | Field routes to dynamic; warning in last_error. Sync continues. |
| Source DDL drops a column | Column null-fills going forward. Existing data preserved. |
| Source PK changes | Connector disabled, disabled_reason="pk_changed". Manual resync. |
| Pool exhausted | 503 pool_exhausted; visible in GET /v1/connectors/:id/status. |
| Pushdown produced incorrect SQL (bug) | Source SQL parse error; 5xx pushdown_failed; logs SQL + residual. CI property tests gate this. |
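The table implies a small operator decision procedure when a connector reports disabled. A hypothetical triage sketch over the disabled_reason strings above (the function and return strings are illustrative, not part of the API):

```python
# Illustrative triage of disabled_reason values from the failure table.
MANUAL_REASONS = ("replication_slot_missing", "pk_changed")

def triage(status):
    """Map a connector status payload to a suggested operator action."""
    reason = status.get("disabled_reason") or ""
    if reason.startswith("tls_failed"):
        return "fix TLS configuration, then re-enable"
    if any(reason.startswith(r) for r in MANUAL_REASONS):
        return "manual intervention: re-enable to re-snapshot or restore externally"
    if reason.startswith("table has no primary key"):
        return "switch the table to mode=federation or add a PK on the source"
    return "no action"
```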

Where to go next