RAG & Knowledge Ingestion

Srasta turns controlled knowledge into governed context.

Enterprise retrieval is not just “upload documents and search.” Srasta ingests approved sources into workspace-scoped collections, builds dense and sparse indexes, retrieves context through policy-aware APIs, and keeps document, vector, prompt, response, and audit data inside the customer perimeter.


Ingestion Path

Knowledge enters Srasta through an explicit pipeline.

The ingestion scripts read files on disk, filter eligible content, chunk text, generate embeddings, fit sparse retrieval parameters, tag workspace scope, create indexes, and insert batches into Milvus.

What Ingest Does

Documents become searchable evidence, not uncontrolled prompt stuffing.

Srasta’s default ingestion path is designed for repeatable enterprise operation. Operators can fully refresh a collection or run incremental updates based on git changes, while keeping workspace tagging and collection naming explicit.

File filtering

Eligible source types include code, markdown, YAML, JSON, Terraform, shell, text, Dart, TypeScript, TOML, HCL, and Nomad files.

Safe exclusions

Common generated or dependency directories are skipped, and credential-named files are excluded from ingest.
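The filtering and exclusion rules above can be pictured as a single eligibility check. The following is a minimal sketch, not Srasta's actual ingestion code: the suffix list, skipped directory names, and credential hints are hypothetical values chosen to illustrate the idea.

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration; the real lists are defined by the ingestion scripts.
ALLOWED_SUFFIXES = {
    ".py", ".md", ".yaml", ".yml", ".json", ".tf", ".sh",
    ".txt", ".dart", ".ts", ".toml", ".hcl", ".nomad",
}
SKIPPED_DIRS = {".git", "node_modules", "dist", "build", "__pycache__", ".terraform"}
CREDENTIAL_HINTS = ("secret", "credential", "password", ".env", "id_rsa")


def is_eligible(path: str) -> bool:
    """Return True if a repository-relative path should be ingested."""
    p = PurePosixPath(path)
    # Skip anything that lives under a generated or dependency directory.
    if any(part in SKIPPED_DIRS for part in p.parts[:-1]):
        return False
    # Skip files whose name suggests they hold credentials.
    name = p.name.lower()
    if any(hint in name for hint in CREDENTIAL_HINTS):
        return False
    # Keep only the source types the pipeline knows how to chunk.
    return p.suffix.lower() in ALLOWED_SUFFIXES
```

Checking the credential hint before the suffix means a file such as `config/secrets.yaml` is rejected even though YAML is otherwise an eligible type.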

Chunking

Text is chunked by configurable size and overlap, with stable chunk identifiers derived from file path and chunk index.
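As an illustration of size-and-overlap chunking with stable identifiers, here is a small sketch. It assumes character-based windows and a SHA-256 hash of `path:index`; the unit of size, the default values, and the exact ID format used by Srasta are not stated in this page, so treat these as assumptions.

```python
import hashlib


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


def chunk_id(path: str, index: int) -> str:
    """Derive a stable identifier from file path and chunk index (assumed scheme)."""
    return hashlib.sha256(f"{path}:{index}".encode("utf-8")).hexdigest()[:32]
```

Because the identifier depends only on path and index, re-ingesting an unchanged file yields the same IDs, which is what makes targeted deletes and updates possible later.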

Embedding

Dense vectors are generated by the engine-native embedding model (BGE-M3 on vLLM or host-native MLX), which the RAG API runtime calls over the OpenAI-compatible /v1/embeddings contract.
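A request against an OpenAI-style /v1/embeddings endpoint might look like the sketch below. The base URL and the served model name are placeholders, not Srasta defaults; only the request shape (`model`, `input`) and the response shape (`data[].embedding`, `data[].index`) follow the OpenAI embeddings contract.

```python
import json
import urllib.request


def build_embedding_request(base_url, model, texts, api_key=None):
    """Build a POST request for an OpenAI-compatible /v1/embeddings endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    url = base_url.rstrip("/") + "/v1/embeddings"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_embedding_response(payload):
    """Return vectors in input order, using the `index` field of each row."""
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


def embed(base_url, model, texts, api_key=None):
    """Send one batch of chunk texts and return one dense vector per text."""
    request = build_embedding_request(base_url, model, texts, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embedding_response(json.load(response))
```

A call such as `embed("http://localhost:8000", "BAAI/bge-m3", ["chunk text"])` would work against any server that implements this contract; both arguments here are illustrative.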

Sparse search

BM25 sparse vectors are fit for each corpus and saved for query-time use, preserving keyword precision for codes, IDs, and names.

Indexing

Milvus stores dense vectors, sparse vectors, source path, chunk ID, document text, and workspace scope with dense, sparse, and scope indexes.

Retrieval Path

Hybrid retrieval balances meaning and exactness.

Dense semantic retrieval

Finds conceptually similar content even when words differ.
Works well for policy, architecture, and natural-language questions.
Backed by vector embeddings stored in Milvus.

BM25 sparse retrieval

Preserves exact matches for identifiers, filenames, services, and config keys.
Reduces failures where semantic similarity misses literal tokens.
Combined with dense search through weighted ranking.

Srasta’s RAG API uses a hybrid search strategy, with default top-k controls and support for optional reranking and context compression before model inference.
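One common way to combine dense and sparse results by weight is to normalize each score list and take a weighted sum. The sketch below assumes min-max normalization and example weights of 0.6 and 0.4; the actual weights, normalization, and default top-k in Srasta are not specified here.

```python
def min_max(scores):
    """Rescale a {chunk_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (value - lo) / (hi - lo) for cid, value in scores.items()}


def weighted_hybrid(dense, sparse, dense_weight=0.6, sparse_weight=0.4, top_k=5):
    """Fuse dense and sparse scores; chunks missing from one list contribute 0 there."""
    d, s = min_max(dense), min_max(sparse)
    combined = {
        cid: dense_weight * d.get(cid, 0.0) + sparse_weight * s.get(cid, 0.0)
        for cid in set(d) | set(s)
    }
    # Sort by descending score, then by chunk ID for deterministic ties.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

Normalizing first matters because dense similarity scores and BM25 scores live on different scales; without it, the weights would not mean what they appear to mean.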

Scoping Controls

Retrieval scope is selected deliberately.

A single deployment can carry multiple collections. Operators and clients can scope retrieval by request header, model-name prefix, path-based URL, explicit environment configuration, or a release- or namespace-aware deployment strategy.

Header scoping

Programmatic clients can use request headers to choose one or more collections.

Model prefix

IDE and OpenAI-compatible clients can prefix the model name with a collection.

Path routing

Gateway paths can map URLs to collection-scoped RAG access.

Workspace-scoped collections

Production values support release or namespace-scoped Milvus collections for workspace isolation within the single-tenant install.
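The scoping options above can be thought of as a resolution order. The following sketch is purely illustrative: the header name `X-RAG-Collections`, the `collection:model` prefix separator, the `/rag/<collection>/` path layout, and the precedence between them are all hypothetical, since this page does not define them.

```python
# Hypothetical names for illustration only; check the deployment's configuration.
COLLECTIONS_HEADER = "x-rag-collections"
PATH_PREFIX = "/rag/"


def resolve_collections(headers, model, path, default):
    """Pick the collections to search: header, then model prefix, then path, then default."""
    lowered = {key.lower(): value for key, value in headers.items()}

    # 1. Header scoping: a comma-separated list of one or more collections.
    raw = lowered.get(COLLECTIONS_HEADER, "")
    from_header = [name.strip() for name in raw.split(",") if name.strip()]
    if from_header:
        return from_header

    # 2. Model prefix: "collection:model" as sent by IDE or OpenAI-compatible clients.
    if ":" in model:
        prefix = model.split(":", 1)[0].strip()
        if prefix:
            return [prefix]

    # 3. Path routing: "/rag/<collection>/..." mapped by the gateway.
    if path.startswith(PATH_PREFIX):
        segment = path[len(PATH_PREFIX):].split("/", 1)[0]
        if segment:
            return [segment]

    # 4. Fall back to the configured (or discovered) collection list.
    return list(default)
```

Whatever the real precedence is, making it explicit in one place keeps scope selection deliberate rather than accidental.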

Governance

Knowledge access follows the same control model as inference.

Authentication

Ingest can validate tokens against Srasta API before writing vectors.

Authorization

OIDC and role enforcement can require workspace scope match and role permissions.

Policy scanning

Compliance profiles can scan input and output for sensitive patterns.

Audit trail

Requests are recorded with actor, path, outcome, persona, session, workspace scope, roles, and timing.

Observability

Prometheus metrics and Langfuse traces help operators understand retrieval and generation behavior.

Customer perimeter

Documents, embeddings, responses, audit records, and backups stay inside customer-controlled infrastructure by default.

Operations

Operators need both full refresh and incremental paths.

Full replacement is useful for clean baselines. Incremental ingest is useful when a git-backed corpus changes frequently and operators want to delete stale chunks and insert only changed content.

Full refresh

Drops and recreates a collection, rebuilds dense and sparse indexes, and inserts a fresh corpus snapshot.

Incremental ingest

Uses git diff state to identify added, modified, deleted, and renamed files, then updates affected chunks.
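A sketch of how git diff state can drive an incremental update: parse the output of `git diff --name-status` into paths whose chunks should be deleted and paths that should be re-ingested. The function names and the delete-then-insert plan are assumptions for illustration, not Srasta's actual script.

```python
import subprocess


def plan_incremental(name_status_output):
    """Turn `git diff --name-status` output into (paths_to_delete, paths_to_ingest)."""
    delete, ingest = set(), set()
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0][:1]  # "R100" and similar collapse to "R"
        if status == "A":
            ingest.add(fields[1])
        elif status == "M":
            # Modified files: drop stale chunks, then insert fresh ones.
            delete.add(fields[1])
            ingest.add(fields[1])
        elif status == "D":
            delete.add(fields[1])
        elif status == "R":
            # Renames list the old path first, then the new path.
            delete.add(fields[1])
            ingest.add(fields[2])
    return sorted(delete), sorted(ingest)


def diff_between(repo_dir, old_rev, new_rev):
    """Run git and return the raw name-status listing between two revisions."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-status", "-M", old_rev, new_rev],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Treating a rename as a delete of the old path plus an ingest of the new path keeps chunk identifiers consistent when they are derived from the file path.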

Auto-discovery

When explicit collection lists are not set, the RAG API can discover available Milvus collections at query time.

Day-2 tuning

Operators can tune top-k, embedding host, model route, compression threshold, reranker, and policy profiles.

FAQ

RAG and Knowledge Ingestion FAQ

Where are ingested documents and vectors stored?

Ingested source files remain in the customer-controlled environment, while chunks and vectors are stored in Milvus inside the deployment. Optional object storage such as MinIO is used for platform storage and session persistence when configured.

How does Srasta retrieve knowledge?

Srasta combines dense semantic search with BM25 sparse keyword search in Milvus, then ranks results with a weighted hybrid strategy before injecting scoped context into the inference request.

How is access scoped?

Each Srasta install is single-tenant. Within it, the RAG API enforces role and workspace scoping with collection-level boundaries when OIDC authorization is enabled; release or namespace-scoped Milvus collections separate workspaces. The tenant fields remain as defensive scaffolding, not a multi-tenant boundary.

Can operators choose which knowledge source to search?

Yes. Collections can be scoped by request header, model prefix, path-based routing, or deployment configuration. If no explicit list is set, Srasta can auto-discover available Milvus collections.

Use retrieval as governed context, then measure the outcome.

Knowledge ingestion gives Srasta a controlled retrieval base. Measure Loop shows which documents, prompts, routes, policies, and workflows actually improve enterprise decisions.
