CLI Deployments
Ship code from the command line: Python files, Docker images, and LLM inference servers.
Commands
| Command | Description |
|---|---|
| `deploy` | Deploy a Python file or Docker image |
| `deploy ls` | List all deployments |
| `deploy status` | Get detailed deployment status |
| `deploy logs` | Stream deployment logs |
| `deploy scale` | Scale deployment replicas |
| `deploy delete` | Delete a deployment |
| `deploy vllm` | Deploy a vLLM inference server |
| `deploy sglang` | Deploy an SGLang inference server |
Deployment Sources
The deploy command accepts:
- Python files (`.py`): deployed with a configurable base image
- Docker images: deployed directly
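For illustration, the two source types might be deployed like this. Only the `deploy` command itself comes from the table above; the file name, image reference, and `--name` flag are placeholders, so check `deploy --help` for your version's exact options:

```shell
# Deploy a Python file; the base image is configurable
deploy app.py --name my-app

# Deploy a prebuilt Docker image directly
deploy registry.example.com/my-app:v1 --name my-app
```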
Health Checks
Configure liveness, readiness, and startup probes to ensure your application runs correctly. The CLI supports HTTP path probes with configurable timing.
Key concepts:
- Liveness probe: Restarts container if unhealthy
- Readiness probe: Controls traffic routing
- Startup probe: For slow-starting applications
Use `--health-path` as shorthand to configure all three probes at once, or configure each probe individually.
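A hedged sketch of both styles. Apart from `--health-path`, which the text above names, the per-probe flag names here are assumptions; consult `deploy --help` for the real ones:

```shell
# Shorthand: one HTTP path serves the liveness, readiness, and startup probes
deploy app.py --health-path /healthz

# Per-probe flags (illustrative names) for apps that need distinct endpoints
deploy app.py \
  --liveness-path /healthz \
  --readiness-path /ready \
  --startup-path /healthz
```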
LLM Inference Servers
vLLM
Deploy OpenAI-compatible inference servers with vLLM. Supports tensor parallelism, quantization (AWQ, GPTQ, FP8), and custom model configurations.
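A sketch of what such a deployment could look like. Only `deploy vllm` comes from the command table; the model name and the parallelism and quantization flags are assumptions modeled on vLLM's own engine arguments:

```shell
# Illustrative: 2-way tensor parallelism with FP8 quantization
deploy vllm meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2 \
  --quantization fp8
```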
SGLang
Alternative inference server with similar capabilities and SGLang-specific optimizations.
Both commands auto-detect GPU requirements based on model size when not specified.
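A matching SGLang sketch under the same caveats; the model name is a placeholder, and with no GPU options given, the CLI would size GPUs from the model as described above:

```shell
deploy sglang meta-llama/Llama-3.1-8B-Instruct
```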
Troubleshooting
Deployment stuck in Pending
- Check status with `--show-phases` for details
- Check logs for errors
- Verify GPU availability with `ls --compute citadel`
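Those checks might look like the following, where `my-app` is a placeholder deployment name and passing the name as a positional argument is an assumption:

```shell
# Show per-phase details for a pending deployment
deploy status my-app --show-phases

# Stream logs to look for startup errors
deploy logs my-app

# Confirm GPUs are available on the target compute pool
ls --compute citadel
```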
Container crash loop
Common causes:
- Application error on startup
- Missing environment variables
- Port binding issues
Check logs for specific errors.
Health check failures
- Ensure health endpoint responds with 200 OK
- Increase initial delay for slow-starting apps
- Verify port matches your application
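As a sanity check for the first point, the endpoint your probes target must answer with 200 OK. A minimal stdlib-only sketch in Python; the `/healthz` path and port 8080 are illustrative and must match your probe configuration:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)  # probes expect 200 OK
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        # Silence per-request logging so probe traffic doesn't flood stdout
        pass

def start_health_server(port=8080):
    """Serve the health endpoint in a background thread; returns the server."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Hitting `/healthz` with `curl` should print `ok` with status 200; any other path returns 404, which a probe would count as a failure.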
Next Steps
- LLM Inference: vLLM and SGLang deployment guides
- Account Management: Billing and API tokens
- Python SDK: Programmatic deployments