Deployment Topologies

Srasta maps intent and hardware to the right operating shape.

A Srasta topology is not a one-size-fits-all Compose file. The installer asks what kind of environment the operator wants, probes the available hosts, recommends a feasible profile, explains the trade-offs, places services, and verifies the runtime before handoff.

Recommendation Flow

The installer makes topology decisions explainable.

The operator declares intent, the installer probes hardware, and Srasta recommends the highest feasible profile for that intent. When hardware does not satisfy the desired shape, the operator sees limitations and upgrade paths instead of a vague failure.
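The hardware probe step can be illustrated with a minimal sketch, assuming GPU discovery runs `nvidia-smi` on each host. The `--query-gpu` and `--format` flags are real `nvidia-smi` options; the helper names `parse_gpu_csv` and `probe_local_gpus` are hypothetical and are not Srasta's installer API. On a worker, the same query would presumably be run over SSH, but how Srasta actually transports its probe is not described here.

```python
import shutil
import subprocess
from typing import List, Tuple

# Real nvidia-smi flags: GPU name and total memory, CSV, no header, no units (MiB).
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_csv(text: str) -> List[Tuple[str, int]]:
    """Parse 'name, memory_mib' lines into (name, MiB) tuples (hypothetical helper)."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so only the final field is treated as memory.
        name, mem = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem)))
    return gpus


def probe_local_gpus() -> List[Tuple[str, int]]:
    """Return local NVIDIA GPUs, or [] on a CPU-only host (hypothetical helper)."""
    if shutil.which("nvidia-smi") is None:
        return []  # No NVIDIA driver tooling: treat as a CPU host.
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Keeping parsing separate from command execution makes the probe result testable without a GPU and lets the same parser handle output collected from remote hosts.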

Figure: Srasta topology recommendation path (intent · inventory · fit · placement · verification). Operator intent → hardware probe → profile fit check → service placement → deploy + verify. Also shown: trade-offs (capabilities, limits), access mode (LAN, nginx, tunnel), and operator handoff (Admin, health, next steps).
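The verify-before-handoff step can be pictured as a retry loop over named health checks. This is a minimal sketch, assuming each check is a callable returning `True` when a service is healthy; `verify_runtime` and its parameters are hypothetical, and the real checks Srasta runs (and their endpoints) are not specified here.

```python
import time
from typing import Callable, Dict, List

# Hypothetical health check: returns True once a service reports healthy.
HealthCheck = Callable[[], bool]


def verify_runtime(
    checks: Dict[str, HealthCheck],
    attempts: int = 5,
    delay_s: float = 2.0,
) -> List[str]:
    """Retry each named check until it passes or attempts run out.

    Returns the names of checks that never passed; an empty list means
    the runtime is ready for operator handoff.
    """
    failed = []
    for name, check in checks.items():
        for attempt in range(attempts):
            try:
                if check():
                    break  # Healthy: move on to the next check.
            except Exception:
                # Errors such as refused connections count as "not yet healthy".
                pass
            if attempt < attempts - 1:
                time.sleep(delay_s)
        else:
            # The loop ran out of attempts without a successful check.
            failed.append(name)
    return failed
```

Returning the failing check names, rather than a bare boolean, matches the goal of showing the operator what is wrong instead of a vague failure.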

Profiles

Topology profiles grow with operating maturity.

01

Trial

Single host, CPU inference through Ollama, full governance and RAG surface, no HA, no backup tier.

Best for evaluation and first proof.

02

Compact Single-Host

One GPU host runs control plane, stateful services, and bundled vLLM inference on the same machine.

Best for small teams with one strong box.

03

Compact 1+1

CPU control plane plus dedicated GPU worker. Inference is separated from stateful and app workloads.

A common starting point for regulated production.

04

Production HA

Three or more hosts with role separation, app-tier HA, isolated inference, observability, auth, and backup agent.

Best for production availability.

05

Production HA + DR

Five or more hosts, full role separation, dedicated backup target, stronger RTO/RPO, and single-rack failure posture.

Best for stricter recovery requirements.

06

External Inference Variant

Any profile can reduce its dependence on local GPUs when the operator chooses an external vLLM endpoint, NVIDIA NIM, a hosted API, or another provider.

Best when inference is already standardized elsewhere.
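The "highest feasible profile for the declared intent" rule can be sketched as follows. This assumes the minimum host counts and GPU needs read off the profile descriptions above (1, 1, 2, 3, and 5 hosts, with a local GPU required for every profile after Trial unless external inference is chosen); the `Host`, `Profile`, and `recommend` names are hypothetical, and Srasta's real fit check almost certainly weighs more than host and GPU counts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Host:
    name: str
    gpus: int = 0  # NVIDIA GPUs reported by the hardware probe


@dataclass(frozen=True)
class Profile:
    name: str
    min_hosts: int
    needs_local_gpu: bool


# Ordered smallest to largest; thresholds are illustrative assumptions.
PROFILES = [
    Profile("Trial", 1, False),
    Profile("Compact Single-Host", 1, True),
    Profile("Compact 1+1", 2, True),
    Profile("Production HA", 3, True),
    Profile("Production HA + DR", 5, True),
]


def gaps(profile: Profile, hosts: List[Host], external_inference: bool) -> List[str]:
    """List the reasons a profile does not fit the probed inventory."""
    reasons = []
    if len(hosts) < profile.min_hosts:
        reasons.append(f"needs at least {profile.min_hosts} host(s), found {len(hosts)}")
    gpu_hosts = sum(1 for h in hosts if h.gpus > 0)
    if profile.needs_local_gpu and not external_inference and gpu_hosts == 0:
        reasons.append("needs a GPU host or an external inference provider")
    return reasons


def recommend(
    desired: str, hosts: List[Host], external_inference: bool = False
) -> Tuple[Optional[str], List[str]]:
    """Return the highest feasible profile at or below the desired one, plus
    the gaps blocking the desired profile (its upgrade path)."""
    ceiling = [p.name for p in PROFILES].index(desired)  # ValueError if unknown
    blocked = gaps(PROFILES[ceiling], hosts, external_inference)
    for profile in reversed(PROFILES[: ceiling + 1]):
        if not gaps(profile, hosts, external_inference):
            return profile.name, blocked
    return None, blocked
```

For example, an operator asking for Production HA with one CPU host and one GPU host would be offered Compact 1+1, together with the host-count gap that stands between them and HA.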

Deployment Backends

Same platform story, different execution substrate.

Srasta supports simple Compose-based starts and platform-team Kubernetes deployments. The decision is usually less about product capability and more about the customer’s operating model.

Single-node Compose

Fastest path for trial, prototype, demo, and one-team evaluation. Everything runs on one Linux host.

Guided multi-host Compose

Installer reaches worker nodes by SSH, probes capability, places services, syncs config, and verifies each host.

Kubernetes / Helm

Uses existing cluster primitives: namespace, storage class, ingress, GPU nodes, RBAC, probes, services, and Helm values.

Cloud Kubernetes

Provider-oriented path for customers using managed clusters, GPU node pools, cloud load balancers, and cloud storage patterns.
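Because the backend choice follows the customer's operating model more than product capability, it reduces to a few intake questions. The sketch below is an illustration under that assumption only: `OperatingModel` and `choose_backend` are hypothetical, and the installer's actual questions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingModel:
    # Hypothetical intake answers, not the installer's real prompts.
    operates_kubernetes: bool
    managed_cloud_cluster: bool
    host_count: int


def choose_backend(model: OperatingModel) -> str:
    """Map an operating model to one of the four backends described above."""
    # Existing cluster operations take precedence over host count.
    if model.operates_kubernetes:
        return "Cloud Kubernetes" if model.managed_cloud_cluster else "Kubernetes / Helm"
    if model.host_count > 1:
        return "Guided multi-host Compose"
    return "Single-node Compose"
```

Checking for an existing cluster first reflects the point that a platform team already running Kubernetes should reuse its primitives, however many hosts are involved.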

Placement

Hardware capability drives service placement.

Srasta’s placement logic separates control-plane, app, stateful, observability, backup, and inference roles. GPU hosts should serve inference; CPU hosts should absorb stateful and app workloads when available.

Control plane

Installer, plan/run state, topology, placement, access URLs, and operator lifecycle coordination.

App tier

Srasta API, Admin, RAG API, Tool Gateway, service discovery, gateway, and related app workloads.

Stateful tier

Postgres, Milvus, MinIO, Valkey, audit volumes, backup metadata, and recovery state.

Inference tier

vLLM, Ollama fallback, LiteLLM routing, GPU model placement, and parser-aware runtime configuration.

Observability

Langfuse, metrics, traces, audit review, and operator visibility surfaces.

Backup / DR

Backup agents, isolated backup targets, restore plans, recovery readiness, and DR validation.
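The placement rule (GPU hosts serve inference; CPU hosts absorb the other roles when available) can be sketched as a simple assignment. This assumes a single replica per role, round-robin spreading, and a CPU inference fallback on the first host when no GPU is present. The `place` function and role labels are hypothetical, and the sketch does not model app-tier HA replicas or the isolated backup target that the larger profiles call for.

```python
from typing import Dict, List

# Non-inference roles from the list above (labels are illustrative).
ROLES = ["control-plane", "app", "stateful", "observability", "backup"]


def place(hosts: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign roles given {host_name: gpu_count}.

    Inference goes to every GPU host; other roles prefer CPU-only hosts,
    spread round-robin, and fall back to GPU hosts when none exist.
    """
    if not hosts:
        raise ValueError("no hosts to place services on")
    gpu_hosts = [h for h, gpus in hosts.items() if gpus > 0]
    cpu_hosts = [h for h, gpus in hosts.items() if gpus == 0]
    placement: Dict[str, List[str]] = {h: [] for h in hosts}
    # No GPU anywhere: run CPU inference (e.g. Ollama) on the first host.
    for h in gpu_hosts or [next(iter(hosts))]:
        placement[h].append("inference")
    pool = cpu_hosts or gpu_hosts
    for i, role in enumerate(ROLES):
        placement[pool[i % len(pool)]].append(role)
    return placement
```

With one CPU host and one GPU host (the Compact 1+1 shape), every non-inference role lands on the CPU host and the GPU host runs inference only.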

Access Modes

Topology also includes how operators and users reach the runtime.

Private access

  • LAN-only installs use private IP access and require no public ingress.
  • Multi-host installs need reliable inter-node network reachability.
  • WireGuard can provide a managed overlay when needed.

Public access

  • Cloudflare Tunnel avoids public inbound firewall changes.
  • Direct nginx + Let's Encrypt works with public DNS and open 80/443.
  • Kubernetes can use cluster ingress and TLS controls.
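The access options above can be summarized as a small decision function. This is a sketch under stated assumptions: `choose_access_mode` and its flags are hypothetical, the precedence (Kubernetes ingress before nginx, Cloudflare Tunnel as the fallback when inbound 80/443 or public DNS is unavailable) is one reasonable reading of the lists, and the WireGuard overlay for multi-host private networking is left out.

```python
def choose_access_mode(
    *,
    public_users: bool,
    on_kubernetes: bool,
    allow_inbound_80_443: bool,
    has_public_dns: bool,
) -> str:
    """Pick an access mode from the private and public options above."""
    if not public_users:
        return "LAN / private IP"  # No public ingress required.
    if on_kubernetes:
        return "Kubernetes ingress + TLS"
    if allow_inbound_80_443 and has_public_dns:
        return "nginx + Let's Encrypt"
    # Avoids opening public inbound firewall ports.
    return "Cloudflare Tunnel"
```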

Trade-Offs

The right profile depends on risk, not just scale.

A small team with regulated data may need stronger separation than a larger team running a low-risk internal assistant. Srasta keeps the trade-off visible: speed of install, local inference quality, availability, recovery posture, and operational complexity.

Evaluation speed

Trial and compact single-host minimize setup friction and help validate the first governed workflow quickly.

Inference isolation

Compact 1+1 moves GPU inference off the control plane and reduces I/O contention.

Availability

Production HA adds role separation and app-tier resilience, but stateful-tier recovery still depends on backup posture.

Recovery

HA + DR adds a dedicated backup target and stronger recovery goals at the cost of more hosts and operating discipline.

FAQ

Deployment Topologies FAQ

What topology should I start with?

Start with the smallest topology that can prove the workflow safely. Trial and compact single-host are good for evaluation. Compact 1+1 is the common production starting point when local GPU inference is required. Production HA and HA + DR are for stricter availability and recovery requirements.

Does Srasta require a GPU?

No. CPU-only trial inference is supported through Ollama, and external inference can remove the local GPU requirement. Local production inference usually needs at least one NVIDIA GPU host.

Can Srasta mix CPU and GPU architectures?

Yes. The control plane can run on one architecture while GPU workers run another, such as an amd64 control plane with arm64 NVIDIA GB10 workers, because Srasta images are published as multi-architecture manifests.

When should Kubernetes be used?

Use Kubernetes when the customer already operates clusters and wants Helm-managed workloads, storage classes, ingress, RBAC, and platform-team lifecycle controls. Single-node and guided multi-host Compose remain valid for simpler installs.

Next

Choose the operating shape before sizing hardware.

Once topology is clear, Srasta can guide model routing, GPU choice, access mode, verification, and day-2 operations without turning deployment into guesswork.
