Question 1

What LLM platforms and technologies does VSHN work with?

Accepted Answer

VSHN deploys and operates LLM workloads on APPUiO (our managed Kubernetes platform), Red Hat OpenShift, enterprise private cloud infrastructure, and sovereign cloud partners. We work with [vLLM](https://www.vllm.ch/) for high-throughput inference, [LiteLLM](https://www.litellm.ch/) for multi-provider gateways, and [llm-d](https://www.llm-d.ch/) for distributed serving. All platforms run on Swiss or European data centers with up to 99.99% uptime SLA.

Question 2

Which cloud providers does VSHN operate on?

Accepted Answer

VSHN operates on multiple Swiss cloud providers including Exoscale and Cloudscale, as well as European sovereign cloud partners. For organizations that need GPU-accelerated workloads, we work with providers offering GPU instances in Swiss data centers on public and private cloud. All infrastructure is managed under a single SLA with 24/7 support from our operations team.

Question 3

How does VSHN scope and quote LLMOps consulting engagements?

Accepted Answer

Every engagement starts with a free architecture consultation where we assess your model serving needs, GPU requirements, and compliance constraints. VSHN then delivers a written scope document with a fixed-price or time-and-materials quote in CHF. Typical engagements cover cluster design, inference stack deployment, observability setup with Prometheus and Grafana, and backup automation for model artefacts and configuration data. There is no commitment at the scoping stage.

Question 4

How does VSHN handle GPU scheduling and scaling?

Accepted Answer

We configure Kubernetes GPU scheduling with NVIDIA device plugins, resource quotas, and pod priority classes so your inference workloads get the GPU time they need. Horizontal pod autoscaling adjusts replica counts based on request queue depth or latency targets. For batch training jobs, we set up preemptible scheduling to optimise cost without blocking interactive inference.

Question 5

Can VSHN set up vector databases for RAG pipelines?

Accepted Answer

Yes. VSHN operates PostgreSQL with the pgvector extension as a fully operated database through our Application Catalog. You get automated daily backups, point-in-time recovery, high-availability replicas, and up to 99.99% SLA as all our managed database services. We also support dedicated search indices for hybrid retrieval workflows.

Question 6

Why should we run our own inference instead of using cloud AI APIs?

Accepted Answer

Every API call to a hosted LLM provider incurs per-token costs that compound as agentic workflows scale. Agents calling agents, each burning tokens. Running your own inference on Kubernetes with vLLM or llm-d converts that variable cost into fixed infrastructure spend. You also gain full control over model selection, latency, data residency, and uptime. VSHN operates the GPU infrastructure 24/7 so you get the economics of ownership without the operational burden. For mixed workloads, LiteLLM gateways let you route some requests to self-hosted models and others to commercial APIs, optimizing cost per query.

Question 7

How does VSHN ensure data sovereignty for LLM workloads?

Accepted Answer

All infrastructure runs in Swiss data centers operated by Swiss or European sovereign cloud providers. Training data, model weights, vector embeddings, and inference logs never leave the chosen jurisdiction. All operational access is from Switzerland-based engineers, and we provide audit trails for compliance reporting. See our [sovereignty assessment](/sovereignty/) for details on how VSHN scores against the EU Cloud Sovereignty Framework.

Question 8

Does VSHN support open-source and commercial LLM models?

Accepted Answer

We support both. For open-source models such as Llama, Mistral, Falcon, and Apertus (the Swiss AI foundation model, Apache 2.0 licensed and EU AI Act compliant), we deploy Kubernetes-native serving infrastructure with [vLLM](https://www.vllm.ch/) or [llm-d](https://www.llm-d.ch/). For commercial APIs like Anthropic Claude or OpenAI, we set up [LiteLLM](https://www.litellm.ch/) gateways that route requests through Swiss infrastructure while enforcing budget controls and audit logging. VSHN handles the infrastructure layer so your data science team can focus on model quality.

Question 9

What monitoring and observability does VSHN provide?

Accepted Answer

VSHN integrates Prometheus and Grafana into every managed platform, with custom dashboards for LLM-specific metrics: inference latency (p50, p95, p99), tokens per second, GPU utilisation, queue depth, and estimated cost per request. Alerting rules notify your team and our 24/7 operations center when metrics breach thresholds, so performance issues are caught before they affect users.

Question 10

Can VSHN help if we don't have our own GPU infrastructure?

Accepted Answer

Yes. Many of our LLMOps engagements start with organisations that have a model or use case but no GPU infrastructure. VSHN provisions GPU-equipped Kubernetes clusters on Swiss cloud providers, deploys your inference stack with vLLM or llm-d, and operates it 24/7. You bring your model requirements; we handle the infrastructure from provisioning through day-two operations. For smaller workloads, shared GPU options let you start without committing to dedicated hardware. Contact us for a free architecture consultation.

Question 11

How do I get started with VSHN's LLMOps consulting?

Accepted Answer

Contact us through the form below for a free initial consultation. VSHN has operated production Kubernetes platforms since 2014 for banks, insurers, and SaaS companies across Switzerland - LLM operations builds on that same infrastructure and team. We assess your workloads, platform requirements, and compliance constraints, then propose an architecture with a clear scope and pricing. Most customers go from consultation to a running production platform in four to six weeks, with ongoing 24/7 operations support.

Question 12

Can consulting firms partner with VSHN for client LLM projects?

Accepted Answer

Yes. Consulting firms and AI agencies regularly partner with VSHN to deliver LLM infrastructure for their clients. You bring the model selection and application design; VSHN provisions and operates the GPU infrastructure, inference stack, and observability on Swiss cloud. Each client engagement runs on isolated infrastructure with dedicated resources. This lets your team focus on the AI solution while VSHN handles the operational complexity of running LLM workloads in production.

Run AI models in production. On your infrastructure, on your terms.

Architecture Review and Stack Selection

Inference Deployment on Kubernetes

Multi-Model Routing and API Management

Sovereign Cloud and Swiss Data Residency

Own Your Inference, Control AI Economics

Vector Database and RAG Pipelines

vLLM: High-Throughput Inference

LiteLLM: Unified AI Gateway

llm-d: Distributed Inference

How the Technologies Work Together

LLMOps Consulting FAQ

Book an LLMOps consultation