🐋 orca¶
Orchestration for Research Cloud Access
Overview¶
ORCA enables Kubernetes-to-AWS burst computing, seamlessly extending on-premises clusters to the cloud. When additional capacity is needed, workloads scale from local infrastructure to AWS without re-architecting applications or workflows.
ORCA = **O**rchestration for **R**esearch **C**loud **A**ccess
Key Features¶
Kubernetes Integration¶
- Virtual Kubelet - Extends K8s to AWS seamlessly
- Native Workloads - Run existing K8s workloads unchanged
- Pod Scheduling - Transparent scheduling to cloud resources
- API Compatibility - Standard Kubernetes API
Burst Computing¶
- On-Demand Scaling - Scale to cloud when on-prem full
- Cost-Efficient - Only pay for what you use, when you need it
- Automatic Fallback - Return to on-prem when capacity available
- Hybrid Workloads - Run some pods on-prem, some in AWS
Cloud Backend¶
- AWS Batch - Managed batch computing
- Spot Instances - Cost savings with spot
- Multiple Architectures - x86, ARM Graviton support
- EFA Networking - Low-latency MPI for HPC
Workload Support¶
- Batch Jobs - Scientific computing, simulations
- MPI Workloads - Multi-node parallel jobs
- GPU Workloads - ML training, rendering
- Containerized - Any container can burst to AWS
Use Cases¶
HPC Center Burst Capacity¶
Extend an on-premises Slurm or Kubernetes cluster to AWS during peak demand periods.
Research Lab Scaling¶
A small on-premises cluster with the ability to scale to hundreds of nodes in AWS for large experiments.
Cost Optimization¶
Maintain smaller on-premises infrastructure and burst to the cloud for occasional large workloads.
Hybrid Workflows¶
Keep sensitive data on-premises while running compute-intensive tasks in AWS.
Architecture¶
graph TB
Users[Researchers] -->|Submit Jobs| K8s[On-Prem Kubernetes]
K8s -->|Local Capacity| OnPrem[On-Prem Nodes]
K8s -->|Need More| VK[Virtual Kubelet<br/>ORCA]
VK -->|Schedule| Batch[AWS Batch]
Batch -->|Launch| EC2[EC2 Instances]
OnPrem -->|Access| NFS[On-Prem Storage]
EC2 -->|Mount| S3[S3 Storage]
S3 -.->|Sync| NFS
style VK fill:#e1e1ff
style Batch fill:#e1f5ff
style EC2 fill:#fff4e1
How It Works¶
1. Install ORCA Virtual Kubelet¶
ORCA runs as a virtual kubelet provider in your Kubernetes cluster.
# Deploy ORCA to your cluster
kubectl apply -f https://orcapod.io/deploy/latest.yaml
# Configure AWS credentials
kubectl create secret generic orca-aws \
  --from-literal=access-key=$AWS_ACCESS_KEY_ID \
  --from-literal=secret-key=$AWS_SECRET_ACCESS_KEY
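The same credentials can also be supplied declaratively. A minimal sketch, assuming ORCA reads the access-key and secret-key entries from a Secret named orca-aws, matching the command above; the placeholder values are illustrative:
apiVersion: v1
kind: Secret
metadata:
  name: orca-aws                 # Same name as the secret created above
type: Opaque
stringData:
  access-key: <AWS_ACCESS_KEY_ID>        # Placeholder; substitute real credentials
  secret-key: <AWS_SECRET_ACCESS_KEY>    # Placeholder; substitute real credentials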
2. Label Workloads for Burst¶
Add labels to pods that can burst to AWS.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-simulation
spec:
  template:
    metadata:
      labels:
        orca.researchcomputing.io/burst: "enabled"
    spec:
      containers:
      - name: simulation
        image: my-simulation:latest
        resources:
          requests:
            cpu: "16"
            memory: "64Gi"
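Burst labeling applies equally to GPU workloads (see Key Features). A minimal sketch, assuming GPUs are exposed through the standard nvidia.com/gpu device-plugin resource on both on-prem and burst nodes; the image name is a placeholder:
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-training
spec:
  template:
    metadata:
      labels:
        orca.researchcomputing.io/burst: "enabled"
    spec:
      containers:
      - name: trainer
        image: my-org/train:latest        # Placeholder image
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
          limits:
            nvidia.com/gpu: 1             # Standard device-plugin resource name
      restartPolicy: Never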
3. Automatic Scheduling¶
When on-premises capacity is full, ORCA automatically schedules pods to AWS.
# Submit workload
kubectl apply -f simulation-job.yaml
# Check scheduling
kubectl get pods -o wide
# NAME          NODE
# simulation-1  on-prem-node-1
# simulation-2  on-prem-node-2
# simulation-3  orca-aws-node
# simulation-4  orca-aws-node
Configuration¶
Burst Policies¶
apiVersion: orca.researchcomputing.io/v1
kind: BurstPolicy
metadata:
  name: default-burst
spec:
  # When to burst
  trigger:
    onPremCapacity: 80%      # Burst when on-prem 80% full
    queueWaitTime: 5m        # Or after 5 min in queue
  # Where to burst
  target:
    provider: aws
    region: us-east-1
    instanceTypes:
      - c7a.16xlarge
      - c7i.16xlarge
      - c8g.16xlarge
    spot: true               # Use spot instances
  # Cost controls
  maxCost: 1000              # Max $1000/day in AWS
  maxNodes: 50               # Max 50 burst nodes
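Separate policies can be defined for different workload classes; how ORCA matches a policy to a given pod is not covered here, so the pairing should be treated as an assumption. A sketch of a GPU-oriented policy that bursts later and uses on-demand capacity:
apiVersion: orca.researchcomputing.io/v1
kind: BurstPolicy
metadata:
  name: gpu-burst
spec:
  trigger:
    onPremCapacity: 90%      # Burst later for more expensive GPU capacity
    queueWaitTime: 15m
  target:
    provider: aws
    region: us-east-1
    instanceTypes:
      - g5.12xlarge          # AWS GPU instance types
      - p4d.24xlarge
    spot: false              # On-demand for long, non-interruptible training
  maxCost: 2000              # Max $2000/day in AWS
  maxNodes: 8                # Max 8 burst nodes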
Storage Sync¶
apiVersion: orca.researchcomputing.io/v1
kind: StorageSync
metadata:
  name: research-data
spec:
  source:
    type: nfs
    path: /mnt/research
  destination:
    type: s3
    bucket: research-data-sync
  sync:
    direction: bidirectional
    schedule: "*/15 * * * *"   # Every 15 minutes
Integration with ResearchComputing¶
Hybrid Workflows¶
# 1. Development on-premises (lens or cloudworkspaces)
# 2. Small tests on on-prem K8s
# 3. Large production runs burst to AWS via ORCA
# 4. Results stored in S3 and synced back
# 5. Archive with cargoship
ORCA + ATOM¶
ORCA can target ATOM workloads for architecture-optimized HPC, for example running ARM-optimized containers on Graviton instances.
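A minimal sketch, assuming ATOM publishes architecture-specific container images and that the standard kubernetes.io/arch node selector carries through to burst nodes; the image name and policy pairing are illustrative:
apiVersion: batch/v1
kind: Job
metadata:
  name: atom-hpc-run
spec:
  template:
    metadata:
      labels:
        orca.researchcomputing.io/burst: "enabled"
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64          # Target Graviton (e.g., c8g.16xlarge from the BurstPolicy)
      containers:
      - name: solver
        image: my-org/atom-solver:arm64    # Placeholder ATOM-optimized image
        resources:
          requests:
            cpu: "32"
            memory: "128Gi"
      restartPolicy: Never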
Account Management¶
- Provision AWS accounts via petri
- Cost tracking per project
- Budget alerts for burst spending
Getting Started¶
Prerequisites¶
- Kubernetes cluster (1.24+)
- AWS account
- kubectl configured
Installation¶
# Install ORCA CLI
brew install orca
# Or download from releases
curl -LO https://github.com/scttfrdmn/orca/releases/latest/download/orca-darwin-amd64
chmod +x orca-darwin-amd64
sudo mv orca-darwin-amd64 /usr/local/bin/orca
# Deploy to cluster
orca install \
  --aws-region us-east-1 \
  --aws-account-id 123456789012
# Verify installation
kubectl get nodes
# NAME                STATUS   ROLES
# on-prem-node-1      Ready    worker
# on-prem-node-2      Ready    worker
# orca-virtual-node   Ready    virtual
Submit Burst Job¶
# Create burst-enabled job
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: molecular-dynamics
spec:
  parallelism: 16
  template:
    metadata:
      labels:
        orca.researchcomputing.io/burst: "enabled"
        orca.researchcomputing.io/priority: "high"
    spec:
      containers:
      - name: md-simulation
        image: my-org/md-sim:latest
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
      restartPolicy: Never
EOF
# Monitor job
orca job status molecular-dynamics
# Check costs
orca cost summary --job molecular-dynamics
Cost Management¶
Cost Estimation¶
# Estimate burst costs
orca estimate \
  --cpus 256 \
  --memory 1024 \
  --duration 4h \
  --spot
# Output:
# Instance type: c7a.16xlarge (spot)
# Instances needed: 4
# Duration: 4 hours
# Estimated cost: $12.80
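For context on the arithmetic: a c7a.16xlarge provides 64 vCPUs, so 256 vCPUs map to 4 instances, and $12.80 across 4 instances × 4 hours implies an assumed spot rate of roughly $0.80 per instance-hour.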
Budget Controls¶
apiVersion: orca.researchcomputing.io/v1
kind: Budget
metadata:
  name: lab-budget
spec:
  limit: 5000              # $5000/month
  alerts:
    - threshold: 80%
      notify: lab-admin@university.edu
    - threshold: 100%
      action: pause-burst
Documentation¶
Technology Stack¶
- Language: Go
- Orchestration: Virtual Kubelet
- Cloud Provider: AWS (Batch, ECS, EC2)
- Storage Sync: S3 sync, FSx for Lustre
- Monitoring: Prometheus, CloudWatch
Project Status¶
Current Version: v0.6.0
Status: Beta
License: Apache 2.0
Contributing¶
Contributions welcome! See the contribution guide.
Support¶
- Documentation: orcapod.io
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Slack: Join #orca