CLI Deployments
Ship code from the command line: Python files, Docker images, and LLM inference servers.
Commands
| Command | Description |
|---|---|
| `deploy` | Deploy a Python file or Docker image |
| `deploy ls` | List all deployments |
| `deploy status` | Get detailed deployment status |
| `deploy logs` | Stream deployment logs |
| `deploy scale` | Scale deployment replicas |
| `deploy delete` | Delete a deployment |
| `deploy vllm` | Deploy a vLLM inference server |
| `deploy sglang` | Deploy an SGLang inference server |
Deployment Sources
The deploy command accepts:
- Python files (`.py`): deployed with a configurable base image
- Docker images: deployed directly
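For illustration, the two source types might be deployed like this. Only the `deploy` command itself comes from the table above; the file name, image reference, and `--name` flag are placeholders, so check `deploy --help` for your version's exact options:

```shell
# Deploy a Python file; the base image is configurable
deploy app.py --name my-app

# Deploy a prebuilt Docker image directly
deploy registry.example.com/my-app:v1 --name my-app
```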
Health Checks
Configure liveness, readiness, and startup probes to ensure your application runs correctly. The CLI supports HTTP path probes with configurable timing.
Key concepts:
- Liveness probe: Restarts container if unhealthy
- Readiness probe: Controls traffic routing
- Startup probe: For slow-starting applications
Use `--health-path` as shorthand to configure all three probes at once, or configure each probe individually.
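A hedged sketch of both styles. Apart from `--health-path`, which the text above names, the per-probe flag names here are assumptions; consult `deploy --help` for the real ones:

```shell
# Shorthand: one HTTP path serves the liveness, readiness, and startup probes
deploy app.py --health-path /healthz

# Per-probe flags (illustrative names) for apps that need distinct endpoints
deploy app.py \
  --liveness-path /healthz \
  --readiness-path /ready \
  --startup-path /healthz
```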
LLM Inference Servers
vLLM
Deploy OpenAI-compatible inference servers with vLLM. Supports tensor parallelism, quantization (AWQ, GPTQ, FP8), and custom model configurations.
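A sketch of what such a deployment could look like. Only `deploy vllm` comes from the command table; the model name and the parallelism and quantization flags are assumptions modeled on vLLM's own engine arguments:

```shell
# Illustrative: 2-way tensor parallelism with FP8 quantization
deploy vllm meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2 \
  --quantization fp8
```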
SGLang
Alternative inference server with similar capabilities and SGLang-specific optimizations.
Both commands auto-detect GPU requirements based on model size when not specified.
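A matching SGLang sketch under the same caveats; the model name is a placeholder, and with no GPU options given, the CLI would size GPUs from the model as described above:

```shell
deploy sglang meta-llama/Llama-3.1-8B-Instruct
```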
Troubleshooting
Deployment stuck in Pending
- Check status with `--show-phases` for details
- Check logs for errors
- Verify GPU availability with `ls --compute citadel`
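Those checks might look like the following, where `my-app` is a placeholder deployment name and passing the name as a positional argument is an assumption:

```shell
# Show per-phase details for a pending deployment
deploy status my-app --show-phases

# Stream logs to look for startup errors
deploy logs my-app

# Confirm GPUs are available on the target compute pool
ls --compute citadel
```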
Container crash loop
Common causes:
- Application error on startup
- Missing environment variables
- Port binding issues
Check logs for specific errors.
Health check failures
- Ensure health endpoint responds with 200 OK
- Increase initial delay for slow-starting apps
- Verify port matches your application
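As a sanity check for the first point, the endpoint your probes target must answer with 200 OK. A minimal stdlib-only sketch in Python; the `/healthz` path and port 8080 are illustrative and must match your probe configuration:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)  # probes expect 200 OK
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        # Silence per-request logging so probe traffic doesn't flood stdout
        pass

def start_health_server(port=8080):
    """Serve the health endpoint in a background thread; returns the server."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Hitting `/healthz` with `curl` should print `ok` with status 200; any other path returns 404, which a probe would count as a failure.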
Next Steps
- LLM Inference: vLLM and SGLang deployment guides
- Account Management: Billing and API tokens
- Python SDK: Programmatic deployments