Dynamic and vectors
Kyma is column-aware first. Most data fits the typed columns from the schema model. Two column types handle what doesn't: dynamic for arbitrary structured data, and vector(N) for embeddings.
The dynamic column
dynamic is the catch-all. CBOR-encoded values per row, with two catalog-side indices that make queries against them fast.
Why CBOR (and not JSON): smaller, faster to parse, native binary support without base64. Every primitive type maps to a CBOR major type; nested maps and arrays are first-class.
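To make the size and binary-support argument concrete, here is a minimal sketch of a CBOR encoder for a handful of Python types, following the major-type layout in RFC 8949. It is illustrative only and is not Kyma's encoder; the sample row and its payload field are made up for the comparison.

```python
import base64
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus an unsigned argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("argument does not fit in 64 bits")


def cbor_encode(value) -> bytes:
    """Encode a small subset of Python values as CBOR."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):  # major 0 (unsigned) or major 1 (negative)
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):  # major 7, always as a 64-bit float here
        return b"\xfb" + struct.pack(">d", value)
    if isinstance(value, bytes):  # major 2: raw bytes, no base64 needed
        return _head(2, len(value)) + value
    if isinstance(value, str):  # major 3: UTF-8 text
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, (list, tuple)):  # major 4: array
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):  # major 5: map
        return _head(5, len(value)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical row: two attributes plus 32 bytes of raw binary.
row = {"http.method": "POST", "http.status_code": 500, "payload": bytes(32)}
cbor_size = len(cbor_encode(row))

# JSON has no byte-string type, so the payload must travel as base64 text.
json_row = dict(row, payload=base64.b64encode(row["payload"]).decode("ascii"))
json_size = len(json.dumps(json_row).encode("utf-8"))

print(cbor_size, json_size)
```

The gap comes mostly from the binary field and from CBOR's length-prefixed strings, which need no quoting or escaping.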
What goes in dynamic
Anything that's structurally not a single column:
- OTLP resource attributes: a map of arbitrary keys per log line.
- Mongo documents synced via the connector framework. Top-level fields flatten to dotted columns up to flatten_depth; deeper nesting and polymorphic fields land in dynamic.
- Postgres jsonb columns: the entire document as one dynamic value.
- Application-specific attributes that haven't stabilized into typed columns yet.
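The flattening rule for synced documents can be sketched as follows. This is a guess at the semantics, not the connector's code: it assumes flatten_depth counts nesting levels starting at 1 for top-level fields, and that arrays always go to dynamic. The function name split_document is hypothetical, and polymorphic-field detection (which needs to look across many documents) is left out.

```python
from typing import Any


def split_document(doc: dict[str, Any], flatten_depth: int):
    """Split one document into dotted-column values and dynamic leftovers."""
    columns: dict[str, Any] = {}
    dynamic: dict[str, Any] = {}

    def walk(node: dict[str, Any], path: tuple[str, ...], depth: int) -> None:
        for key, value in node.items():
            dotted = ".".join(path + (key,))
            if isinstance(value, dict) and depth < flatten_depth:
                # Still within the flattening budget: descend one level.
                walk(value, path + (key,), depth + 1)
            elif isinstance(value, (dict, list)):
                # Too deep, or an array: keep the whole subtree as dynamic.
                dynamic[dotted] = value
            else:
                # Scalar leaf: becomes a dotted column.
                columns[dotted] = value

    walk(doc, (), 1)
    return columns, dynamic


doc = {
    "user": {"name": "ada", "address": {"city": "Paris"}},
    "tags": ["a"],
    "n": 1,
}
columns, dynamic = split_document(doc, flatten_depth=2)
print(columns)
print(dynamic)
```

With flatten_depth=2, user.name and n come out as columns, while user.address and tags stay in dynamic.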
Path access in queries
You query dynamic with bracketed path syntax:
otel_logs
| where attributes["http.method"] == "POST"
| where attributes["http.status_code"] >= 500
| project _timestamp, attributes["http.url"], attributes["error.code"]
The same in SQL:
SELECT _timestamp,
attributes ->> 'http.url' AS url,
attributes ->> 'error.code' AS error_code
FROM otel_logs
WHERE attributes ->> 'http.method' = 'POST'
AND CAST(attributes ->> 'http.status_code' AS INTEGER) >= 500
KQL is the more ergonomic surface for dynamic access; SQL works but gets verbose with casts.
Why dynamic queries stay fast
Two indices, both at the extent level:
- Path bitmap records which paths were written to this extent. A query referencing a path the extent never saw skips it without reading any block bytes.
- Token index is a posting list over leaf strings. A predicate like attributes["error.code"] == "ECONNRESET" plans as a posting-list intersection, not a substring scan.
For a tour of how these fit into the broader pipeline, see The pruning cascade.
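The two indices can be pictured with a small in-memory model. This is a sketch under assumptions, not Kyma's data structures: a Python set stands in for the path bitmap, posting lists are keyed by (path, leaf string), and the Extent class and matching_rows function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    """Hypothetical per-extent index summary."""
    name: str
    # Stand-in for the path bitmap: every path written to this extent.
    paths: set[str] = field(default_factory=set)
    # Token index: (path, leaf string) -> ids of rows holding that value.
    tokens: dict[tuple[str, str], set[int]] = field(default_factory=dict)

    def add(self, row_id: int, attributes: dict[str, object]) -> None:
        for path, value in attributes.items():
            self.paths.add(path)
            if isinstance(value, str):
                self.tokens.setdefault((path, value), set()).add(row_id)


def matching_rows(extent: Extent, predicates: dict[str, str]) -> set[int]:
    """Row ids where every path equals its string; expects at least one predicate."""
    # Step 1: an extent that never saw one of the paths is skipped outright.
    if not all(path in extent.paths for path in predicates):
        return set()
    # Step 2: intersect one posting list per equality predicate.
    postings = [extent.tokens.get(item, set()) for item in predicates.items()]
    return set.intersection(*postings) if postings else set()


a = Extent("a")
a.add(0, {"error.code": "ECONNRESET", "http.method": "POST"})
a.add(1, {"error.code": "ETIMEDOUT", "http.method": "POST"})
a.add(2, {"error.code": "ECONNRESET", "http.method": "GET"})
b = Extent("b")
b.add(0, {"http.method": "POST"})

print(matching_rows(a, {"error.code": "ECONNRESET", "http.method": "POST"}))
print(matching_rows(b, {"error.code": "ECONNRESET"}))
```

Extent b is rejected at step 1 without touching its posting lists, which mirrors the "skips it without reading any block bytes" behavior described above.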
When to promote out of dynamic
A field that's appeared in ≥ 100 events with one consistent type within a 1000-event window is a candidate for promotion to a typed column. Manual promotion via kyma-cli:
kyma-cli alter-table otel_logs add-column \
--name "service_name" \
--type "string" \
--from-dynamic "attributes.service.name"
After promotion, new writes go to the typed column; old data stays in dynamic. Reads union the two via coalesce(). Connectors in sync mode do this promotion automatically.
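The candidate rule at the start of this section (at least 100 events with one consistent type inside a 1000-event window) can be sketched as a sliding-window counter. This is one possible reading of the rule, not the connector's implementation: PromotionTracker is a hypothetical name, and Python type names stand in for the stored value types.

```python
from collections import defaultdict, deque


class PromotionTracker:
    """Track which dynamic paths look stable enough to promote."""

    def __init__(self, window: int = 1000, min_events: int = 100):
        self.min_events = min_events
        # Each entry maps path -> type name for one event; old events fall off.
        self.events = deque(maxlen=window)

    def observe(self, attributes: dict) -> None:
        self.events.append(
            {path: type(value).__name__ for path, value in attributes.items()})

    def candidates(self) -> dict[str, str]:
        """Return path -> type for paths seen often enough with a single type."""
        seen = defaultdict(lambda: defaultdict(int))  # path -> type -> count
        for event in self.events:
            for path, type_name in event.items():
                seen[path][type_name] += 1
        return {
            path: next(iter(types))
            for path, types in seen.items()
            if len(types) == 1 and sum(types.values()) >= self.min_events
        }


tracker = PromotionTracker()
for i in range(150):
    # "retries" alternates between int and str, so it never qualifies.
    tracker.observe({"service.name": "checkout",
                     "retries": i if i % 2 else str(i)})
tracker.observe({"rare": 1.5})  # seen once: below the threshold
print(tracker.candidates())
```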
The vector(N) column
A fixed-dimension Float32 embedding column. The dimension N is set at table creation time and never changes.
kyma-cli create-table embeddings \
--schema '_timestamp:timestamp, doc_id:string, body:string, embedding:vector(384)'
Storage
Vectors are stored as Arrow FixedSizeList<Float32, N>. Per-extent column statistics include centroid, bounding box, and (when an ANN index is built — see roadmap) HNSW or IVF metadata.
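For readers unfamiliar with the two basic statistics, the sketch below computes a centroid and an axis-aligned bounding box over the vectors of one extent. The function name extent_vector_stats is hypothetical, and nothing here is meant to describe how Kyma lays the statistics out on disk.

```python
def extent_vector_stats(vectors: list[list[float]]):
    """Return (centroid, mins, maxs) for a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share one dimension")
    # Transpose so each tuple holds one coordinate across all vectors.
    columns = list(zip(*vectors))
    centroid = [sum(c) / len(c) for c in columns]  # per-coordinate mean
    mins = [min(c) for c in columns]               # lower corner of the box
    maxs = [max(c) for c in columns]               # upper corner of the box
    return centroid, mins, maxs


centroid, mins, maxs = extent_vector_stats([[0.0, 2.0], [2.0, 4.0]])
print(centroid, mins, maxs)
```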
Distance UDFs
Three distance functions registered in DataFusion:
SELECT doc_id,
cosine_distance(embedding, $query_vec) AS d
FROM embeddings
ORDER BY d ASC
LIMIT 5
Available UDFs:
| UDF | Distance |
|---|---|
| cosine_distance(a, b) | 1 - (a · b) / (‖a‖ ‖b‖) |
| l2_distance(a, b) | √(Σ (aᵢ − bᵢ)²) |
| inner_product(a, b) | −(a · b) (for ranking) |
Dimensions are checked at query time; mismatches fail loudly.
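The three formulas in the table translate directly into code. The reference sketch below is plain Python, not the registered DataFusion UDFs; it reuses the UDF names for readability, and the ValueError on mismatched lengths is an assumed stand-in for the "fail loudly" behavior. Zero-length vectors are not handled specially in cosine_distance.

```python
import math


def _dot(a, b):
    """Dot product with an explicit dimension check."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - (a . b) / (|a| |b|); divides by zero if either vector is all zeros."""
    dot_ab = _dot(a, b)
    return 1.0 - dot_ab / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))


def l2_distance(a, b):
    """Euclidean distance: sqrt(sum((a_i - b_i)^2))."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Negated dot product, so that smaller means more similar when sorting ASC."""
    return -_dot(a, b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(l2_distance([0.0, 0.0], [3.0, 4.0]))
print(inner_product([1.0, 2.0], [3.0, 4.0]))
```

Negating the inner product keeps all three functions usable with the same ORDER BY d ASC pattern shown in the query above.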
Without an ANN index
Today, vector search is exact: every candidate row gets a distance calculation. With time-range and metadata filters, this is usually fast enough — pruning eliminates most extents before any vector math runs.
For tables with millions of vectors and no good prefilter, exact search becomes the bottleneck. ANN indices (HNSW) land in a later milestone; the trait surface (SegmentFormat::vector_index) is already in place.
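Exact search with a prefilter amounts to "filter first, then score every survivor". The sketch below shows that shape on in-memory rows; it is not Kyma's execution plan. The row layout (timestamp, doc_id, embedding) and the function name exact_top_k are assumptions made for illustration.

```python
import heapq
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(rows, query, k, start, end):
    """Return the k nearest (distance, doc_id) pairs with start <= timestamp < end."""
    # The time filter runs first, so distance math only touches surviving rows.
    scored = (
        (cosine_distance(embedding, query), doc_id)
        for timestamp, doc_id, embedding in rows
        if start <= timestamp < end
    )
    # nsmallest keeps at most k items in memory and returns them sorted ascending.
    return heapq.nsmallest(k, scored)


rows = [
    (10, "a", [1.0, 0.0]),
    (20, "b", [0.0, 1.0]),
    (30, "c", [1.0, 1.0]),
    (99, "d", [1.0, 0.0]),  # outside the time range below
]
result = exact_top_k(rows, query=[1.0, 0.0], k=2, start=0, end=50)
print([doc_id for _, doc_id in result])
```

The cost is linear in the number of rows that survive the filter, which is why a selective time range or metadata predicate matters so much without an ANN index.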
Loading vectors
Two paths to populate a vector column:
- Compute outside, ingest as values. Generate embeddings with your model of choice; send them as Arrow FixedSizeList<Float32, 384> over the REST or OTLP path.
- Compute inside, on ingest. Configure an embedding backend (fastembed, ollama, OpenAI-compatible, Gemini) on the table; the ingest path runs body through the backend and writes the result to embedding automatically.
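For the compute-outside path, it helps to know what a fixed-size list of Float32 looks like in memory: the values of all rows sit back to back in one contiguous buffer, with row i occupying positions i*N through (i+1)*N - 1. The sketch below builds only that values buffer with the standard library and checks dimensions up front. It is not a complete Arrow message, and a real client would more likely use an Arrow library; pack_fixed_size_list is a hypothetical helper name.

```python
import struct


def pack_fixed_size_list(vectors, dim):
    """Pack vectors into one contiguous little-endian float32 values buffer."""
    flat = []
    for i, vec in enumerate(vectors):
        # Reject bad rows before anything is sent.
        if len(vec) != dim:
            raise ValueError(f"vector {i} has {len(vec)} values, expected {dim}")
        flat.extend(vec)
    # "<" = little-endian, "f" = 32-bit float; values are narrowed from float64.
    return struct.pack(f"<{len(flat)}f", *flat)


buf = pack_fixed_size_list([[0.0] * 384, [1.0] * 384], dim=384)
print(len(buf))  # 2 rows * 384 values * 4 bytes

# First value of row 1 starts at byte offset 384 * 4.
print(struct.unpack_from("<f", buf, 384 * 4)[0])
```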
Where to go next
- The agent endpoint, which uses vectors for schema RAG: The agent loop.
- KQL syntax for dynamic: Query.
- Schema evolution rules: Schema model.