model layer
connecting…
auto-refresh
Try it
stream
raw
Send
thinking
result appears here
Machines
free resources per host — to see what can fit a new model
Models
+ Register backend
+ Deploy local
Register an OpenAI-compatible backend
name
type
external
llama
base_url
model_id
API key
API key env var
mode
sync
queued
concurrency
Register
Cancel
Deploy a local llama-server (launches + self-registers)
name
model_path
mode
queued
sync
concurrency
ctx
Deploy
Cancel
Deploy blocks while the model loads (can take 10–30s).
Edit connection —
base_url
model_id
API key
API key env var
mode
sync
queued
concurrency
tags
Load/unload control (optional) — set these to drive a native server via its node agent
agent_url
model_path
port
ctx
Save
Cancel
name
type
mode
conc
status
latency
last seen
endpoint
actions
Queue
job
model
status
enqueued
finished
Recent calls
time
model
backend
path
status
queued
latency
tokens (p/c)