The agent loop
/v1/agent/ask is kyma's natural-language surface. A user (or another agent) sends a question; kyma figures out which table to look at, runs the query, and streams the answer back as Server-Sent Events.
The shape is intentionally narrow. The agent doesn't replace KQL or SQL — it turns "show me the top errors" into the same KQL you'd have written yourself, runs it, and tells you what it found.
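For instance, "show me the top errors" might compile to a pipeline along these lines. This is a sketch: the otel_logs table, severity_text, and service_name come from the example response below, and the timestamp column name is an assumption.

// hypothetical KQL the agent might generate for "show me the top errors"
otel_logs
| where timestamp > ago(1h)
| where severity_text == "ERROR"
| summarize n = count() by service_name
| top 3 by n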
Request
curl -N -X POST http://localhost:8080/v1/agent/ask \
-H 'Content-Type: application/json' \
-d '{
"question": "which service errored most in the last hour?",
"database": "default"
}'
-N keeps curl from buffering — important for SSE.
Response
Server-Sent Events. Each event is a JSON envelope:
event: run_started
data: {"run_id": "01HZ...", "model": "gemma4:latest"}
event: thinking_delta
data: {"text": "Looking at the schema. otel_logs has severity_text and..."}
event: tool_call
data: {"tool": "run_sql", "args": {"query": "SELECT service_name, COUNT(*) ..."}, "call_index": 0}
event: tool_result
data: {"rows": [{"service_name": "payments-svc", "n": 412}, ...]}
event: answer_delta
data: {"text": "Over the last hour, payments-svc had the most errors (412), "}
event: answer_delta
data: {"text": "followed by auth-svc (198) and search-svc (89)."}
event: answer_final
data: {"text": "Over the last hour, payments-svc had the most errors..."}
event: run_finished
data: {"run_id": "01HZ...", "elapsed_ms": 1840, "usage": {"input_tokens": 940, "output_tokens": 320}, "tool_calls": 2}The answer_delta events let you render answers token-by-token while they generate. tool_call and tool_result give you full transparency into which queries were run — useful for debugging a wrong answer.
What the agent has access to
Eight built-in tools, plus a schema RAG layer:
- list_databases — returns every database the catalog knows about (kyma-native plus any registered external sources).
- describe_table — returns the schema of a specific table — columns, types, recent sample rows, present dynamic paths.
- run_sql — executes a SQL query and returns the result rows.
- run_kql — executes a KQL pipeline against the same engine — same Arrow result, just a different surface.
- sample_rows — pulls a small random sample for exploratory work.
- explore_schema — returns a one-shot "context graph" view of a database: all tables, their columns, and the foreign-key-shaped edges between them. Lets the agent reason about an unfamiliar schema in one tool call.
- find_references_to — traverses the catalog for tables that reference a given entity — e.g., "every table that has a user_id column."
- graph_traverse — walks the kyma graph layer. Wraps the KQL graph-traverse operator for multi-hop entity-relationship queries.
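Tool calls surface in the event stream the same way run_sql did above. For instance, a describe_table call might look like this; the result fields in this sketch are illustrative, not a fixed contract:

event: tool_call
data: {"tool": "describe_table", "args": {"database": "default", "table": "otel_logs"}, "call_index": 0}

event: tool_result
data: {"columns": [{"name": "service_name", "type": "string"}, {"name": "severity_text", "type": "string"}], "sample_rows": [...]}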
Schema RAG: every table in the catalog is embedded into the schema_embeddings pgvector table at create time. When the agent needs to find a table, it does a vector search over the embeddings rather than reading the entire catalog. This keeps prompt size bounded even on a catalog with thousands of tables.
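The lookup is conceptually a nearest-neighbour query. A sketch, assuming columns named table_name and embedding (only the schema_embeddings table name is given above; <-> is pgvector's distance operator):

-- $1 is the embedded form of the user's question
SELECT table_name
FROM schema_embeddings
ORDER BY embedding <-> $1
LIMIT 5;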
Why SSE and not WebSockets
SSE is one-way (server → client) over HTTP/1.1. That fits the request shape perfectly: the client asks one question, the server streams many events. No reconnection logic, no framing protocol, works through any HTTP proxy that handles streaming responses.
If you need bidirectional tool use — agent asks the client for permission to do something, etc. — that lands as a separate endpoint in a later milestone.
Run history
Every run is persisted to the agent_runs catalog table:
SELECT run_id, question, model_id, status, started_at, finished_at,
usage_json, trace_json
FROM agent_runs
ORDER BY started_at DESC
LIMIT 10
Replay a specific run with the full event trace via:
curl http://localhost:8080/v1/agent/runs/01HZABCDE...
Models
The agent's LLM is served by Ollama, configured at startup via two env vars:
- KYMA_AGENT_OLLAMA_HOST — Ollama daemon URL. Defaults to the host's local Ollama installation.
- KYMA_AGENT_MODEL — model name. Defaults to gemma4:latest.
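For example, pointing the agent at a local daemon and pinning the model (http://localhost:11434 is Ollama's own stock address, shown for illustration):

# illustrative values; the host URL is Ollama's default, not kyma's
export KYMA_AGENT_OLLAMA_HOST=http://localhost:11434
export KYMA_AGENT_MODEL=gemma4:latest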
The choice of Ollama is deliberate: the agent runs against your infrastructure, with no telemetry leaving the cluster, and you control which weights it sees. Other LLM backends (Anthropic Claude, OpenAI, Gemini) are tracked as a follow-up; the underlying ADK abstraction already supports them.
The embedding backend for schema RAG is configured separately — see Dynamic and vectors. Out of the box: fastembed (default, CPU), ollama, OpenAI-compatible, or Gemini. Embeddings power the schema search; the LLM consumes the result.
Limits
The agent is read-only by design. The exposed tools never write — no ingest_rows, no create_table. Write tools are tracked under the roadmap section of the spec; the constraint is intentional, since giving an LLM the ability to mutate production data without an explicit approval step is a bad default.
Where to go next
- Vectors and embedding backends: Dynamic and vectors.
- Calling the endpoint from your code: Query.
- The schema the agent reasons over: Schema model.