🖥️ cloudworkstation¶
Interactive research workstations in the cloud
Visit cloudworkstation.io → | GitHub →
Overview¶
cloudworkstation provides pre-configured cloud workstations for data science, machine learning, and computational research. Launch GPU-powered environments in minutes with common tools, frameworks, and libraries pre-installed.
Key Features¶
Pre-Configured Environments¶
- Machine Learning - PyTorch, TensorFlow, JAX, scikit-learn
- Data Science - pandas, NumPy, Matplotlib, Seaborn
- Deep Learning - CUDA, cuDNN, GPU-optimized frameworks
- Development Tools - VSCode, Jupyter, terminals
GPU Support¶
- NVIDIA GPUs - T4, V100, A10G, A100 instances
- CUDA Ready - Pre-installed CUDA toolkit and drivers
- Multi-GPU - Support for multi-GPU training
- Cost Options - On-demand or spot instances
Workspace Management¶
- Quick Launch - Start workstations in under 2 minutes
- Persistent Storage - EBS volumes for data persistence
- Snapshots - Save workspace state for later resumption
- Auto-Shutdown - Automatic shutdown to control costs
Collaboration¶
- Shared Workspaces - Collaborate with team members
- Access Control - IAM-based permissions
- Session Sharing - Share terminal sessions
- Data Sharing - Shared S3 buckets for datasets
Use Cases¶
Interactive Machine Learning¶
Develop and train ML models with GPU acceleration without managing infrastructure.
Data Exploration¶
Analyze large datasets interactively with pre-installed data science tools.
Model Prototyping¶
Quickly prototype deep learning models with access to latest frameworks.
Teaching and Workshops¶
Provide students with consistent, pre-configured environments for courses.
Architecture¶
graph TB
User[Researcher] -->|Launch| CW[cloudworkspaces]
CW -->|Create| EC2[GPU EC2 Instance]
EC2 -->|Mount| EBS[Persistent EBS Volume]
EC2 -->|Access| S3[S3 Data Buckets]
EC2 -->|VSCode| Browser[Web Browser]
EC2 -->|Jupyter| Browser
EC2 -->|SSH| Terminal[Local Terminal]
style CW fill:#fff4e1
style EC2 fill:#e1f5ff Available Workstation Types¶
ML Development (ml.*)¶
- ml.g4dn.xlarge - 1x NVIDIA T4, 4 vCPU, 16 GB RAM
- ml.g5.xlarge - 1x NVIDIA A10G, 4 vCPU, 16 GB RAM
- ml.p3.2xlarge - 1x NVIDIA V100, 8 vCPU, 61 GB RAM
Deep Learning Training (dl.*)¶
- dl.g5.2xlarge - 1x A10G, 8 vCPU, 32 GB RAM
- dl.p3.8xlarge - 4x V100, 32 vCPU, 244 GB RAM
- dl.p4d.24xlarge - 8x A100, 96 vCPU, 1152 GB RAM
Data Science (ds.*)¶
- ds.m5.2xlarge - 8 vCPU, 32 GB RAM (CPU-only)
- ds.r5.4xlarge - 16 vCPU, 128 GB RAM (memory-optimized)
Development (dev.*)¶
- dev.t3.large - 2 vCPU, 8 GB RAM (cost-optimized)
Getting Started¶
Launch a Workstation¶
# Install cloudworkstation CLI
brew install cloudworkstation
# List available workstation types
cloudworkstation list-types
# Launch ML workstation
cloudworkstation launch \
--type ml.g4dn.xlarge \
--name my-ml-workspace \
--storage 100
# Connect via VSCode
cloudworkstation connect my-ml-workspace --vscode
# Or via SSH
cloudworkstation ssh my-ml-workspace
Manage Workstations¶
# List running workstations
cloudworkstation list
# Stop workstation (preserves data)
cloudworkstation stop my-ml-workspace
# Start stopped workstation
cloudworkstation start my-ml-workspace
# Create snapshot
cloudworkstation snapshot my-ml-workspace --name experiment-1
# Delete workstation
cloudworkstation delete my-ml-workspace
Pre-Installed Software¶
Python Ecosystem¶
- Python 3.11, 3.10, 3.9 (via conda)
- PyTorch 2.1 (GPU)
- TensorFlow 2.14 (GPU)
- JAX (GPU)
- scikit-learn, XGBoost, LightGBM
Development Tools¶
- VSCode Server
- Jupyter Lab
- Git, GitHub CLI
- Docker
Data Tools¶
- pandas, Polars
- DuckDB
- Apache Arrow
- Parquet tools
Utilities¶
- tmux, screen
- htop, nvtop
- AWS CLI
- S3 mount utilities
Integration with ResearchComputing¶
Data Sources¶
- S3 - Direct access to S3 buckets for datasets
- cargoship - Retrieve archived data for analysis
Workflow¶
- Launch cloudworkstation for interactive development
- Prototype and test on small datasets
- Scale to atom for production HPC runs
- Archive results with cargoship
Account Management¶
- Provision accounts via petri
- Cost tracking and budget alerts
Documentation¶
Technology Stack¶
- Language: Go (CLI), Python (utilities)
- Infrastructure: EC2, EBS, CloudFormation
- Networking: VPC, Security Groups
- Storage: EBS, S3
- Access: Systems Manager Session Manager, SSH
Project Status¶
Current Version: v1.2.0 Status: Production Ready License: Apache 2.0
Contributing¶
Contributions welcome! See the contribution guide.
Support¶
- Documentation: cloudworkstation.io
- Issues: GitHub Issues
- Discussions: GitHub Discussions