MySQL

🚧 Roadmap. This connector ships in DB-M2, after Postgres. The design below is committed and stable; the implementation is in progress. See the spec for status.

The MySQL engine reuses everything DB-M1 delivered for Postgres: the ExternalSource trait, the federation crate, the CDC consumer, the schema evolver. What's MySQL-specific is the binlog reader, the GTID checkpoint, and one correctness-critical pushdown rule: collations.

type: "mysql". Versions: 8.0+. Behind the federation Cargo feature.

Configuration

bash
curl -sS -X POST http://localhost:8080/v1/connectors \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "name": "mysql_orders",
  "type": "mysql",
  "mode": "both",
  "connection": {
    "url":         "mysql://app@orders-rds.example.com:3306/orders",
    "secret_ref":  "$env:MYSQL_ORDERS_PASSWORD",
    "tls":         "required",
    "pool_size":   10
  },
  "scope": {
    "include_schemas": ["orders"],
    "exclude_tables":  ["orders.audit_log"]
  },
  "sync": {
    "tables": ["orders.line_items", "orders.shipments"]
  }
}
JSON

The mode, connection, scope, and sync shapes are the same as Postgres. tls: "required" is the default; disabling it requires an explicit override.

Collation safety

This is the trap MySQL sets that nothing else does. By default many MySQL columns use case-insensitive collations (utf8mb4_general_ci, utf8mb4_0900_ai_ci, …). A pushed equality or LIKE filter against such a column would return rows that DataFusion's own case-sensitive evaluation never would, quietly corrupting every federated query that joins on a string column.

The MySQL Capabilities struct carries string_collation_safe_columns: Set<TableQualifiedName>, populated at introspection time. The shared PushdownPlanner refuses to push any string equality or LIKE filter unless the column's collation is one of the explicitly verified case-sensitive variants (utf8mb4_bin, utf8mb4_0900_as_cs). Filters on case-insensitive columns evaluate above the scan, in DataFusion, where the semantics are predictable.

The pushdown_summary flags the residual with agg_residual_reason / filters_residual so an operator can see why a filter that "looks" pushable didn't. To force pushdown, change the column's collation on the source to a case-sensitive one. There is no override knob, intentionally. A wrong answer is worse than a slow one.
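
The rule above can be sketched in a few lines of Rust. This is an illustration only, not the connector's code: `FilterKind`, `Placement`, `safe_columns`, and `place_filter` are hypothetical names, and the safe-column set is modelled as a plain `HashSet<String>` of qualified column names rather than the real `Set<TableQualifiedName>`.

```rust
use std::collections::HashSet;

/// The two collations the text names as verified case-sensitive.
const SAFE_COLLATIONS: [&str; 2] = ["utf8mb4_bin", "utf8mb4_0900_as_cs"];

/// Hypothetical filter categories; the real planner's types are not shown in this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterKind {
    StringEq,
    StringLike,
    Other, // e.g. a numeric comparison, unaffected by collation
}

/// Where a filter ends up: pushed to MySQL, or evaluated above the scan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    Pushed,
    Residual,
}

fn is_safe_collation(collation: &str) -> bool {
    SAFE_COLLATIONS.contains(&collation)
}

/// Build the safe-column set from (qualified column, collation) pairs,
/// standing in for what introspection would report.
pub fn safe_columns(introspected: &[(&str, &str)]) -> HashSet<String> {
    introspected
        .iter()
        .filter(|(_, collation)| is_safe_collation(collation))
        .map(|(column, _)| column.to_string())
        .collect()
}

/// String equality and LIKE are pushed only for columns in the safe set;
/// everything else in this sketch is treated as pushable.
pub fn place_filter(kind: FilterKind, column: &str, safe: &HashSet<String>) -> Placement {
    match kind {
        FilterKind::StringEq | FilterKind::StringLike if !safe.contains(column) => {
            Placement::Residual
        }
        _ => Placement::Pushed,
    }
}
```

With a `utf8mb4_bin` column and a `utf8mb4_0900_ai_ci` column introspected, an equality filter on the first is pushed while a LIKE on the second stays residual. The set is an allowlist, so an unknown or newly added collation defaults to residual.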

CDC sync

Two-phase pipeline per source-table, mirroring the Postgres flow:

  1. Initial snapshot. START TRANSACTION WITH CONSISTENT SNAPSHOT, capture gtid_executed, stream rows, advance connector_cdc_state.phase to streaming with the captured GTID set as the cursor, atomically with the kyma extent CAS.
  2. Streaming. COM_BINLOG_DUMP_GTID with the executed-GTID set as the resume point. Row events become rows tagged with _kyma_op; deletes are tombstones.

Cursor checkpoints are GTID sets, stored as opaque JSON in connector_cdc_state.checkpoint. Reopen-from-checkpoint is the only recovery path; the source replays, kyma's idempotency layer dedupes.
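
A minimal sketch of the two phases and the replay-then-dedupe recovery path, under stated assumptions. `CdcState`, `Phase`, and `Deduper` are hypothetical names; the checkpoint is held as a plain string rather than opaque JSON; and keying the dedupe on (GTID, event ordinal within the transaction) is an assumption made for illustration, since this page does not describe how kyma's idempotency layer identifies an event.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Snapshot,
    Streaming,
}

/// Stand-in for a connector_cdc_state row (phase plus checkpoint).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CdcState {
    pub phase: Phase,
    /// GTID set used as the resume cursor once the snapshot has completed.
    pub checkpoint: Option<String>,
}

impl CdcState {
    pub fn new() -> Self {
        CdcState { phase: Phase::Snapshot, checkpoint: None }
    }

    /// Snapshot done: switch to streaming with the GTID set captured at
    /// snapshot start. The real transition is atomic with the extent CAS;
    /// that part is not modelled here.
    pub fn finish_snapshot(&mut self, captured_gtid_set: &str) {
        self.phase = Phase::Streaming;
        self.checkpoint = Some(captured_gtid_set.to_string());
    }

    /// Resume point for the binlog stream; None means the snapshot never completed.
    pub fn resume_point(&self) -> Option<&str> {
        match self.phase {
            Phase::Streaming => self.checkpoint.as_deref(),
            Phase::Snapshot => None,
        }
    }
}

/// Illustrative idempotency filter: an event replayed after a reopen is dropped.
#[derive(Default)]
pub struct Deduper {
    seen: HashSet<(String, u32)>,
}

impl Deduper {
    /// Returns true if the event is new and should be applied.
    pub fn admit(&mut self, gtid: &str, event_ordinal: u32) -> bool {
        self.seen.insert((gtid.to_string(), event_ordinal))
    }
}
```

Because the source may replay events at or after the checkpoint, correctness rests on the dedupe step rather than on the cursor being exact.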

Type mapping

| MySQL type | kyma type | Notes |
| --- | --- | --- |
| tinyint, smallint, mediumint, int | int | tinyint(1) → bool |
| bigint, bigint unsigned | long | bigint unsigned > i64 max → string with warning |
| decimal(p, s) | real for p ≤ 15; else string | |
| float, double | real | |
| bit | bool (width 1) or dynamic (wider) | |
| date, datetime, timestamp, time, year | timestamp | UTC; time and year stringified post-v1 |
| char, varchar, text, longtext, enum, set | string | set comma-joined |
| binary, varbinary, blob | string (base64) | |
| json | dynamic (CBOR) | Whole document; field-level inference is post-v1 |
| Spatial (geometry, point, …) | string (WKT) | |
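
The conditional rows of the mapping (tinyint(1), decimal precision, bit width, oversized bigint unsigned values) are easier to see as code. The following is a sketch under assumptions: `MySqlType`, `KymaType`, `Routed`, `map_type`, and `route_unsigned_bigint` are hypothetical names, and the MySQL type enum is deliberately coarse, folding each table row with a single target into one variant.

```rust
/// kyma-side types named in the mapping table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KymaType {
    Bool,
    Int,
    Long,
    Real,
    Str,
    Timestamp,
    Dynamic,
}

/// Hypothetical, simplified view of a MySQL column type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MySqlType {
    TinyInt { display_width: u32 },
    SmallInt,
    MediumInt,
    Int,
    BigInt,
    Decimal { precision: u32 },
    Float,
    Double,
    Bit { width: u32 },
    Temporal, // date, datetime, timestamp, time, year
    Text,     // char, varchar, text, longtext, enum, set
    Binary,   // binary, varbinary, blob (base64-encoded)
    Json,
    Spatial, // rendered as WKT
}

/// Column-level mapping, following the table row by row.
pub fn map_type(t: MySqlType) -> KymaType {
    match t {
        MySqlType::TinyInt { display_width: 1 } => KymaType::Bool,
        MySqlType::TinyInt { .. }
        | MySqlType::SmallInt
        | MySqlType::MediumInt
        | MySqlType::Int => KymaType::Int,
        MySqlType::BigInt => KymaType::Long,
        MySqlType::Decimal { precision } if precision <= 15 => KymaType::Real,
        MySqlType::Decimal { .. } => KymaType::Str,
        MySqlType::Float | MySqlType::Double => KymaType::Real,
        MySqlType::Bit { width: 1 } => KymaType::Bool,
        MySqlType::Bit { .. } => KymaType::Dynamic,
        MySqlType::Temporal => KymaType::Timestamp,
        MySqlType::Text | MySqlType::Binary | MySqlType::Spatial => KymaType::Str,
        MySqlType::Json => KymaType::Dynamic,
    }
}

/// Value-level routing for bigint unsigned.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Routed {
    Long(i64),
    Text(String),
}

/// Values that fit in i64 stay numeric; larger ones become strings.
/// The warning the table mentions is omitted from this sketch.
pub fn route_unsigned_bigint(v: u64) -> Routed {
    if v <= i64::MAX as u64 {
        Routed::Long(v as i64)
    } else {
        Routed::Text(v.to_string())
    }
}
```

The decimal cutoff at precision 15 matches the roughly 15 significant decimal digits a 64-bit float can represent exactly, which is presumably why wider decimals fall back to string.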

System columns on synced tables

Every synced row has four extra columns kyma adds automatically:

| Column | Type | Meaning |
| --- | --- | --- |
| _kyma_pk | string | Source PK; for composite PKs, <col1>:<col2>:... in information_schema order. |
| _kyma_op | string | One of 'insert', 'update', 'delete'. |
| _kyma_lsn | string | GTID at commit time. |
| _kyma_event_at | timestamp | Wall-clock time at which the source emitted the event. |

A source table with no primary key is rejected at connector start with disabled_reason="table has no primary key — cannot CDC sync". Use mode: "federation" for that table instead.
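
As a rough illustration of the `_kyma_pk` format and the no-primary-key rejection, the sketch below joins key column values with `:` in the order given. `encode_pk` and `PkError` are hypothetical names, and the caller is assumed to supply the values already in information_schema order. How a value that itself contains `:` is handled is not specified on this page, so the sketch does no escaping.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PkError {
    /// Mirrors the "table has no primary key" rejection described above.
    NoPrimaryKey,
}

/// Join primary-key column values into a single _kyma_pk string.
/// `pk_values` must already be in information_schema column order.
pub fn encode_pk(pk_values: &[&str]) -> Result<String, PkError> {
    if pk_values.is_empty() {
        return Err(PkError::NoPrimaryKey);
    }
    Ok(pk_values.join(":"))
}
```

A single-column key passes through unchanged, and a two-column key such as order 1001, line 3 becomes `1001:3`.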

Federation pushdown

Filters, projection, LIMIT, ORDER BY, single-source aggregations, opportunistic same-source joins. The list is identical to Postgres, minus the collation-unsafe filters described above, which always go residual.

Failure modes

| Failure | Behavior |
| --- | --- |
| Source unreachable | Federation: 502 source_unreachable. Sync: stream reopens at GTID. |
| TLS handshake fails | Connector disabled, disabled_reason="tls_failed: <detail>". |
| Binlog purged past the connector's GTID | Connector disabled, disabled_reason="binlog_purged". Manual resync. |
| tinyint(1) source column widens to tinyint(4) | New column added; old bool column preserved; reads union via coalesce. |
| Source DDL emits unrepresentable type | Field routes to dynamic; warning in last_error. Sync continues. |
| Source PK changes | Connector disabled, disabled_reason="pk_changed". Manual resync. |
| bigint unsigned value > 2^63 - 1 | Routed to string with warning. Existing typed data preserved. |
| Pool exhausted | 503 pool_exhausted; visible in GET /v1/connectors/:id/status. |

Where to go next