Moving Gen AI Workloads from Hyperscalers to Nebius

A step-by-step guide for migrating production gen AI workloads from AWS, GCP, or Azure to Nebius, covering planning, execution, optimization, and common challenges.

If you’re running gen AI workloads on AWS, GCP, or Azure, you’re likely experiencing the GPU availability crunch: multi-month waitlists for H100s, capacity reservations that require long-term commitments, and pricing that can reach $12+ per GPU-hour. Nebius offers immediate access to NVIDIA’s latest GPUs (H100, H200, B200, GB200) at $2.95/hour per GPU, with managed Kubernetes, PostgreSQL, and MLflow, and operates in US and EU regions.

The Hyperscaler GPU Bottleneck

The challenge with running AI workloads on traditional hyperscalers isn’t capability—it’s access and economics. AWS, GCP, and Azure all offer powerful GPU instances, but getting access to them requires navigating quota systems, capacity reservations, and often multi-year commitments. An 8-GPU H100 instance costs $98.32/hour on Azure, $88.49/hour on GCP, and requires capacity reservations or UltraClusters on AWS just to guarantee availability. Over a month of continuous usage, you’re looking at $64,000-$72,000 per instance.

Nebius provides immediate on-demand access to H100 instances at $2.95 per GPU-hour without sales calls or quota approvals. An 8-GPU cluster costs $23.60/hour, or about $17,200/month—a savings of roughly $47,000-$55,000 per month compared to hyperscaler on-demand pricing.

Unlike most GPU-focused providers, Nebius includes managed Kubernetes, PostgreSQL, MLflow, and Slurm services.

Understanding What Nebius Offers

GPU Hardware and Availability

Nebius provides access to NVIDIA’s latest GPU architectures: H100, H200, B200, and GB200. NVLink within each node and InfiniBand between nodes come standard on GPU instances. You can provision 16-32 H100 GPUs immediately on demand; larger clusters require commitment-based contracts, priced from the $2.95/hour baseline with discounts of up to 35% for longer commitments.

Nebius operates in US and EU regions. Instance configurations range from single GPU instances to multi-node clusters with InfiniBand for low-latency gradient synchronization.

Managed Services

Managed Kubernetes handles cluster provisioning, upgrades, and control plane management. You define GPU node pools through Terraform or the API, and Nebius manages the Kubernetes control plane, node scaling, and GPU driver installation. Your existing Kubernetes manifests, Helm charts, and operators work without modification.

Managed PostgreSQL provides high-availability database clusters with automated backups, point-in-time recovery, and connection pooling. If you’re using RDS, Cloud SQL, or Azure Database for PostgreSQL for ML metadata, experiment results, or application data, this service maps directly.

Managed MLflow gives you a hosted MLflow tracking server for experiment management, model registry, and artifact storage. This replaces SageMaker Experiments, Vertex AI Experiments, or self-hosted MLflow deployments.

Slurm workload manager is available as a managed service for teams migrating from HPC environments or needing batch job scheduling with advanced resource allocation capabilities beyond standard Kubernetes batch jobs.

Infrastructure Capabilities

Nebius provides VPC networking with security groups, private subnets, and network isolation similar to AWS VPC. You can configure software-defined networks, set up firewall rules, and establish site-to-site VPN connections to existing infrastructure. Public IPs are optional—you can run entirely on private networks and expose services through load balancers.

Storage options include S3-compatible object storage for datasets, model artifacts, and checkpoints, plus NFS file storage for high-throughput shared access across GPU nodes. The NFS implementation delivers up to 12 GBps read and 8 GBps write throughput per 8 GPU VM, significantly outperforming AWS EFS (1.5 GBps max) while being competitive with Azure Premium Files (10 GBps) and GCP Filestore (up to 25 GBps).

Automation and Infrastructure-as-Code

Nebius offers an official Terraform provider with comprehensive resource coverage. If you’re managing hyperscaler infrastructure through Terraform, your migration involves translating resource definitions rather than rewriting automation logic. They also provide a CLI, Python SDK, and gRPC APIs for scripting and automation.

Pre-Migration Assessment

Identify Migration Blockers and Dependencies

Tight coupling to hyperscaler-specific services: Code that uses AWS-specific APIs (boto3 calls to SageMaker, DynamoDB), GCP-specific libraries, or Azure SDKs needs adaptation.

Compliance and data residency requirements: If you’re subject to regulations requiring data to stay in specific regions or countries, verify that Nebius’s US and EU regions satisfy those requirements. That footprint covers most common data residency needs, but it is smaller than what the hyperscalers offer.

Existing commitments and contracts: If you’ve signed multi-year reserved instance contracts or enterprise agreements with current providers, calculate the financial impact of migrating before those commitments expire. Sometimes it makes sense to run hybrid for a period.

Team expertise: If you’re heavily reliant on managed platforms like SageMaker that abstract infrastructure, you’ll need Kubernetes and Terraform expertise.

Migration Planning

Choosing Your Migration Strategy

Lift-and-shift: If your workloads already run in containers on Kubernetes, you’re moving containers from one cluster to another. Works best with minimal hyperscaler-specific service dependencies.

Hybrid: Run compute on Nebius while keeping data and some services on your current hyperscaler. Gets you immediate GPU cost savings while deferring data migration. You’ll pay cross-cloud data transfer costs, but for compute-heavy workloads this can be net positive.

Full migration: Move everything—compute, data, and dependent services—to Nebius. Maximizes cost savings but requires more planning.

Setting Up Your Nebius Environment

Understanding Nebius IAM:

Resource Hierarchy:

  • Tenant: Top-level organizational unit (equivalent to AWS Organization)
  • Projects: Logical groupings within a tenant for organizing resources by team, environment, or application
  • Resources: Individual infrastructure components (VMs, Kubernetes clusters, storage buckets)

Access Control Model:

Nebius IAM uses a straightforward group-based role assignment model:

  1. Groups: Collections of users and service accounts (e.g., admins, developers, ml-engineers)
  2. Roles: 10 predefined permission sets (no custom policy authoring required)
  3. Assignment: Grant access by adding users/service accounts to groups that have specific roles

All Nebius IAM Roles:

General Roles:

  • auditor: View certain resource types without accessing data
  • viewer: View most resource types and access data within them
  • editor: View and manage most resource types plus their data
  • admin: View and manage all resource types and their data

Object Storage Roles:

  • storage.viewer: View buckets, list and download objects (but not upload)
  • storage.uploader: View buckets, upload objects (but not list or download)
  • storage.object-editor: View buckets, list, download, upload and manage objects
  • storage.editor: View and manage buckets plus all object operations

Specialized Roles:

  • mysterybox.payload-viewer: View payloads in MysteryBox secrets
  • dsr.admin: Manage Data Subject Requests and support tickets

The simplicity of 10 predefined roles (versus hundreds of AWS managed policies) makes Nebius IAM easier to understand and reduces misconfiguration risks. No JSON policy documents, no permission boundaries, no policy evaluation logic to debug.

Account setup and access management:

  1. Create your Nebius account: Sign up at nebius.com and create your tenant. The account owner automatically gets admin role in the default admins group.

  2. Install and configure the Nebius CLI:

# Install CLI (example for Linux/macOS)
curl -sSL https://storage.ai.nebius.cloud/nebius/install.sh | bash

# Initialize CLI configuration
nebius init

# Verify installation
nebius --version

  3. Set up user authentication: Add team members to your tenant and assign them to groups with appropriate roles. Nebius uses bearer token authentication where all API requests require an Authorization: Bearer <IAM-access-token> header.
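
For illustration, a raw API call carrying that header might look like the following sketch; the URL path is a placeholder, not a documented route:

import subprocess
import requests

# Obtain a short-lived IAM token from an authenticated CLI profile.
token = subprocess.run(
    ["nebius", "iam", "get-access-token"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Placeholder URL; every Nebius API request carries the bearer token this way.
resp = requests.get(
    "https://api.nebius.cloud/<service>/<resource>",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()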

  4. Create service accounts for automation:

# Create a service account
nebius iam service-account create \
  --name ml-automation \
  --description "Service account for ML pipeline automation"

# Get the service account ID
SA_ID=$(nebius iam service-account get ml-automation --format json | jq -r '.id')

  5. Generate authorized keys for CLI automation:
# Generate key pair and create CLI profile
nebius iam service-account create-authorized-key \
  --service-account-id $SA_ID \
  --output authorized-key.json

# Configure CLI profile to use service account
nebius config profile create sa-profile \
  --service-account-key-file authorized-key.json

# Use the service account profile
nebius --profile sa-profile iam get-access-token

  6. Create API keys for programmatic access:
# API keys are used for simplified authorization with service accounts
nebius iam api-key create \
  --service-account-id $SA_ID \
  --description "API key for Python SDK access"

# Use in Python SDK
from nebius import sdk

client = sdk.NebIAClient(
    token="<IAM-access-token>",
    endpoint="api.nebius.cloud:443"
)

  7. Configure static keys for specific services: Static keys are issued for specific Nebius services (like S3-compatible storage) and stored securely in your secrets manager.

  8. Assign roles and permissions: Grant service accounts the minimum required permissions using Nebius IAM roles:

# Grant editor role to service account (can manage compute resources)
nebius iam role grant \
  --service-account-id $SA_ID \
  --role editor \
  --tenant-id $TENANT_ID

  9. SAML/SSO integration (if required): Nebius supports SAML-based SSO for enterprise authentication. Configure your identity provider (Okta, Azure AD, etc.) to federate with Nebius IAM. Contact Nebius support for enterprise SSO setup guidance, as this typically requires tenant-level configuration.

  10. Set up CI/CD integration: Store service account credentials as secrets in your CI/CD platform:

GitHub Actions example:

- name: Authenticate with Nebius
  env:
    NEBIUS_SA_KEY: ${{ secrets.NEBIUS_SERVICE_ACCOUNT_KEY }}
  run: |
    echo "$NEBIUS_SA_KEY" > /tmp/sa-key.json
    nebius config profile create ci-profile \
      --service-account-key-file /tmp/sa-key.json
    export NEBIUS_PROFILE=ci-profile    

GitLab CI example:

before_script:
  - echo "$NEBIUS_SA_KEY" > /tmp/sa-key.json
  - nebius config profile create ci-profile --service-account-key-file /tmp/sa-key.json
  - export NEBIUS_PROFILE=ci-profile

VPC and network architecture: Nebius automatically creates a default network and subnet when you create a project. VMs and Kubernetes clusters use private IP addresses and can communicate privately without additional configuration. For multiple isolated networks or custom CIDR ranges, see the Nebius VPC documentation.

Storage configuration: Provision S3-compatible object storage buckets and NFS file storage using Terraform or the CLI. Configure lifecycle policies for automatic deletion of old experiment artifacts.
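
If the bucket’s S3-compatible API accepts standard lifecycle calls (verify against the Nebius storage documentation), a lifecycle rule can be set from boto3; the endpoint, bucket, and prefix below are placeholders:

import boto3

# Placeholder endpoint and bucket; assumes the S3-compatible API accepts
# standard lifecycle configuration requests.
s3 = boto3.client("s3", endpoint_url="https://storage.<region>.nebius.cloud")

s3.put_bucket_lifecycle_configuration(
    Bucket="ml-experiments",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-artifacts",
            "Filter": {"Prefix": "artifacts/"},
            "Status": "Enabled",
            "Expiration": {"Days": 30},  # delete old experiment artifacts after 30 days
        }]
    },
)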

Mapping Hyperscaler Services to Nebius Equivalents

Managed Kubernetes: EKS, GKE, and AKS map directly to Nebius Managed Kubernetes. Your workload manifests don’t change. The main adaptation is in cluster provisioning (Terraform resources) and node pool configuration for GPU instances.

ML platforms: SageMaker Training Jobs, Vertex AI Training, and Azure ML Jobs all become Kubernetes Jobs or custom operators (like Kubeflow Training Operator) on Nebius. Alternatively, use Saturn Cloud, an AI development platform that runs on Nebius and provides development workspaces, distributed training, and model deployment similar to SageMaker but optimized for Nebius infrastructure.

Databases: RDS, Cloud SQL, and Azure Database for PostgreSQL map to Nebius Managed PostgreSQL. For NoSQL databases (DynamoDB, Firestore, Cosmos DB), you’ll need to either keep them on the hyperscaler (acceptable for hybrid architecture) or migrate to open-source alternatives like MongoDB, Cassandra, or PostgreSQL with JSONB columns, running on Nebius compute or using third-party managed services.
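
As a rough sketch of the JSONB route (the connection string and table are illustrative), a single document column reproduces the key-value access pattern most DynamoDB tables use:

import psycopg2
from psycopg2.extras import Json

# Hypothetical DSN; point it at your Nebius Managed PostgreSQL cluster.
conn = psycopg2.connect("postgresql://app:secret@pg.internal.example:5432/metadata")

with conn, conn.cursor() as cur:
    # A key plus a JSONB document column covers many DynamoDB-style access patterns.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS items (
            pk  TEXT PRIMARY KEY,
            doc JSONB NOT NULL
        )
    """)
    cur.execute(
        """INSERT INTO items (pk, doc) VALUES (%s, %s)
           ON CONFLICT (pk) DO UPDATE SET doc = EXCLUDED.doc""",
        ("run#1234", Json({"status": "completed", "loss": 0.42})),
    )
    cur.execute("SELECT doc FROM items WHERE pk = %s", ("run#1234",))
    print(cur.fetchone()[0])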

Message queues: SQS, Pub/Sub, and Azure Queue Storage don’t have Nebius equivalents. Run your own message queue (RabbitMQ, Kafka, Redis Streams) on Nebius Kubernetes, use managed message queue providers that aren’t tied to specific clouds, or keep these services on your current hyperscaler during hybrid operation.

Object storage: S3, GCS, and Azure Blob Storage map to Nebius S3-compatible object storage. The S3 API compatibility means most code that uses boto3 or similar S3 clients works with minimal changes (updating endpoint URLs and credentials).
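
For example, a boto3 client typically only needs a different endpoint and credentials; the endpoint and key values below are placeholders for your region and static keys:

import boto3

# Same boto3 code path as with AWS S3; only the endpoint and keys change.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.<region>.nebius.cloud",  # placeholder endpoint
    aws_access_key_id="<nebius-static-key-id>",
    aws_secret_access_key="<nebius-static-key-secret>",
)
s3.upload_file("model.safetensors", "ml-training-data", "checkpoints/model.safetensors")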

Workflow orchestration: AWS Step Functions, GCP Cloud Composer (managed Airflow), and Azure Data Factory don’t have direct Nebius equivalents. Deploy Airflow, Prefect, Argo Workflows, or similar open-source orchestration tools on Nebius Kubernetes.

Step-by-Step Migration Process

Data Migration

Use rclone, s5cmd, or boto3 to transfer data:

# Using rclone for S3 to Nebius object storage
rclone copy s3:my-aws-bucket/training-data nebius:ml-training-data/training-data \
  --progress \
  --transfers 16 \
  --checkers 32

Migration strategies:

Direct network transfer: Use high-bandwidth network connections and parallel transfer tools. If your data is on AWS S3, GCP GCS, or Azure Blob Storage, you can stream directly to Nebius object storage. Tools like rclone, s5cmd, or custom scripts with boto3 can parallelize transfers across many files.
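
A sketch of that boto3 approach, streaming each object through the transfer host with a thread pool (bucket names and the Nebius endpoint are placeholders):

import boto3
from concurrent.futures import ThreadPoolExecutor

src = boto3.client("s3")  # AWS credentials come from the environment as usual
dst = boto3.client(
    "s3",
    endpoint_url="https://storage.<region>.nebius.cloud",  # placeholder Nebius endpoint
    aws_access_key_id="<nebius-static-key-id>",
    aws_secret_access_key="<nebius-static-key-secret>",
)

def copy_object(key: str) -> None:
    # Stream the object from AWS S3 and upload it to Nebius object storage.
    body = src.get_object(Bucket="my-aws-bucket", Key=key)["Body"]
    dst.upload_fileobj(body, "ml-training-data", key)

# List everything under the prefix, then copy in parallel.
pages = src.get_paginator("list_objects_v2").paginate(
    Bucket="my-aws-bucket", Prefix="training-data/"
)
keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]

with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(copy_object, keys))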

Incremental migration: Write new data to both your current storage and Nebius simultaneously. Backfill historical data over time.

Hybrid storage approach: Leave data on hyperscaler storage initially and access it from Nebius GPU instances. You’ll pay egress costs from AWS/GCP/Azure, but this lets you start using Nebius GPUs immediately. For workloads where compute costs dwarf data transfer costs, this can be net positive. As you iterate on models, gradually migrate frequently accessed datasets while leaving cold data on hyperscaler storage.

Workload Migration

Containerized training jobs: Update your Kubernetes manifests to point to Nebius clusters, update data paths to reference Nebius object storage or NFS, and update image registries if you’re moving those as well.

Model serving infrastructure: For inference, you have two main options on Nebius:

Self-hosted inference: Deploy your model serving framework (Triton Inference Server, TorchServe, TensorFlow Serving, or custom FastAPI services) on Nebius Kubernetes with Horizontal Pod Autoscaler and ingress controllers.

Nebius Token Factory: For LLM inference, consider Token Factory, a managed inference platform that serves 60+ open-source models (DeepSeek, Llama, Qwen, Mistral, etc.). Token Factory provides:

  • Sub-second response times with 99.9% uptime SLA
  • Transparent $/token pricing with no hidden infrastructure costs
  • Auto-scaling to handle hundreds of millions of tokens per minute
  • Support for custom fine-tuned models and LoRA adapters
  • Enterprise features including zero-retention mode, SOC 2 Type II, HIPAA, and ISO 27001 compliance

Token Factory is a useful alternative to SageMaker Endpoints or Vertex AI Prediction for serving open models.
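
As a hedged sketch, assuming an OpenAI-compatible endpoint (common for managed open-model platforms; check the Token Factory documentation for the actual base URL, key type, and model names), client code could look like:

from openai import OpenAI

# Base URL, API key, and model name are placeholders, not confirmed values.
client = OpenAI(
    base_url="https://<token-factory-endpoint>/v1",
    api_key="<nebius-api-key>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # use a model listed in Token Factory
    messages=[{"role": "user", "content": "Draft a three-bullet migration summary."}],
)
print(response.choices[0].message.content)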

Experiment tracking and MLOps tools: If you’re using SageMaker Experiments or Vertex AI Experiments, migrate to Nebius Managed MLflow. If you already use third-party experiment tracking (Weights & Biases, Neptune, Comet), keep using it; these SaaS tools work the same from Nebius as they do from the hyperscalers.

For feature stores, model registries, and other MLOps infrastructure, deploy open-source alternatives (Feast for feature stores, MLflow Model Registry, DVC for data versioning) on Nebius Kubernetes or continue using SaaS MLOps platforms that work across clouds.

Optimizing for Nebius

Taking Advantage of Nebius-Specific Features

NVLink and InfiniBand for distributed training: Nebius includes high-speed GPU interconnects standard on all GPU instances. For multi-node distributed training, ensure your training framework actually uses the InfiniBand fabric rather than falling back to TCP over Ethernet. PyTorch Distributed, Horovod, and DeepSpeed all support RDMA-capable networking.
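
For example, a minimal PyTorch DDP setup launched with torchrun only needs to initialize the NCCL process group; NCCL then selects the InfiniBand transport when the environment variables shown below are set (the model and launch values are placeholders):

# Launch example (values are illustrative):
#   torchrun --nnodes=2 --nproc-per-node=8 \
#     --rdzv-backend=c10d --rdzv-endpoint=<head-node>:29500 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")       # NCCL handles inter-node communication
local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).to(local_rank)  # placeholder for your model
model = DDP(model, device_ids=[local_rank])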

Verify that NCCL (NVIDIA Collective Communications Library) is using InfiniBand:

# Set environment variables for NCCL to prefer InfiniBand
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5
export NCCL_NET_GDR_LEVEL=5

For large model training where gradient synchronization is a bottleneck, InfiniBand’s lower latency versus standard Ethernet can reduce training time by 20-40%.

High-performance NFS for data loading: Nebius NFS delivers up to 12 GBps read throughput per 8 GPU VM, significantly faster than AWS EFS (1.5 GBps max) and competitive with Azure Premium Files (10 GBps) and GCP Filestore (25 GBps). Use NFS for training data that needs to be accessed from multiple GPU nodes simultaneously. Mount the NFS file system on all GPU nodes and load data directly rather than copying to local disk first.

For very large datasets where you’re reading different portions on each node (data parallelism), NFS performs well. For workloads where you’re reading the entire dataset on every node repeatedly, consider caching to local NVMe storage on GPU instances if available.
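
As an illustration, within an initialized torch.distributed job each rank can stream its shard straight from the shared mount; the /mnt/training-data path and file format are placeholders:

import glob
import torch
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

class NfsShardDataset(Dataset):
    """Reads pre-tokenized tensors straight from the shared NFS mount."""
    def __init__(self, root: str):
        self.files = sorted(glob.glob(f"{root}/*.pt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.load(self.files[idx])

# /mnt/training-data is a placeholder for wherever the Nebius file system is mounted.
dataset = NfsShardDataset("/mnt/training-data/tokens")

# DistributedSampler gives each rank a disjoint shard, so every node reads a
# different portion of the dataset rather than copying it all locally first.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=8, sampler=sampler,
                    num_workers=8, pin_memory=True, prefetch_factor=4)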

Managed MLflow for experiment tracking: Configure all training jobs to log metrics, parameters, and artifacts to MLflow:

Example integration:

import mlflow

mlflow.set_tracking_uri("https://mlflow.nebius.example.com")
mlflow.set_experiment("llm-training")

with mlflow.start_run():
    mlflow.log_params({
        "learning_rate": 1e-4,
        "batch_size": 32,
        "model_size": "7B"
    })

    for epoch in range(num_epochs):
        train_loss = train_epoch()
        mlflow.log_metric("train_loss", train_loss, step=epoch)

    mlflow.pytorch.log_model(model, "model")

Cost Optimization Strategies

Commitment-based discounts: For stable production workloads with predictable GPU usage, negotiate commitment-based pricing with Nebius. They offer discounts up to 35% for longer-term commitments. Calculate your baseline GPU usage (instances that run 24/7 or on predictable schedules) and commit to that capacity, while using on-demand for variable experimental workloads.

Performance Tuning

Network throughput optimization: For distributed training using Nebius’s InfiniBand networking, use NCCL tests to benchmark all-reduce operations across nodes:

# Run NCCL all-reduce benchmark across nodes
mpirun -np 16 --host node1:8,node2:8 \
  /usr/local/bin/all_reduce_perf -b 8 -e 8G -f 2 -g 1

If you see lower-than-expected bandwidth, verify InfiniBand configuration or adjust NCCL environment variables for your specific network topology.

Common Migration Challenges and Solutions

Dealing with Hyperscaler-Specific Dependencies

Problem: Code tightly coupled to AWS SageMaker APIs, GCP Vertex AI, or Azure ML services.

Solution: Abstract hyperscaler-specific APIs behind interfaces. For example, instead of calling SageMaker APIs directly throughout your codebase, create a training orchestration layer that calls SageMaker in your current implementation but can be swapped for Kubernetes Job submission when migrating to Nebius.
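
A minimal sketch of that abstraction (TrainingJobSpec, TrainingBackend, and KubernetesJobBackend are illustrative names, not SDK classes), assuming jobs are containerized and the Kubernetes Python client can reach the Nebius cluster:

# Illustrative abstraction layer; call sites depend only on TrainingBackend,
# so swapping SageMaker for Kubernetes Jobs is a configuration change.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from kubernetes import client, config

@dataclass
class TrainingJobSpec:
    name: str
    image: str            # container image holding your training code
    command: list[str]    # e.g. ["python", "train.py", "--epochs", "3"]
    gpus: int

class TrainingBackend(ABC):
    @abstractmethod
    def submit(self, spec: TrainingJobSpec) -> str:
        """Submit a training job and return an identifier for tracking."""

class KubernetesJobBackend(TrainingBackend):
    """Runs the job as a Kubernetes Job on any cluster in your kubeconfig."""
    def submit(self, spec: TrainingJobSpec) -> str:
        config.load_kube_config()  # use load_incluster_config() when running in-cluster
        container = client.V1Container(
            name=spec.name,
            image=spec.image,
            command=spec.command,
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": str(spec.gpus)}
            ),
        )
        job = client.V1Job(
            metadata=client.V1ObjectMeta(name=spec.name),
            spec=client.V1JobSpec(
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(containers=[container], restart_policy="Never")
                ),
                backoff_limit=0,
            ),
        )
        client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
        return spec.name

# A SageMakerBackend implementing the same interface (wrapping boto3
# create_training_job) can stay the default until cutover; call sites never change.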

If you’re using SageMaker Processing for data preprocessing, replace it with containerized preprocessing jobs on Kubernetes. SageMaker Training Jobs become Kubernetes Jobs or Kubeflow PyTorchJobs/TFJobs. SageMaker Endpoints become Triton or TorchServe deployments.

Problem: Code using AWS-specific services (DynamoDB, SQS, Step Functions).

Solution: Three options:

  1. Keep using these services from Nebius (hybrid architecture) via VPN or site-to-site connectivity.
  2. Replace with open-source equivalents running on Nebius (PostgreSQL instead of DynamoDB, RabbitMQ/Kafka instead of SQS, Airflow instead of Step Functions).
  3. Use cloud-agnostic managed services (MongoDB Atlas, Confluent Cloud, Astronomer for Airflow).

Handling Cross-Cloud Networking During Transition

Problem: High latency or egress costs accessing data and services across clouds during hybrid operation.

Solution: Establish VPN or dedicated interconnect between your hyperscaler VPC and Nebius VPC. For data-intensive workloads, stage frequently-accessed data in Nebius storage rather than accessing it from hyperscaler storage on every training run.

Managing Secrets and Credentials

Problem: Secrets stored in AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault need to be accessible from Nebius.

Solution: Migrate secrets to Kubernetes Secrets or use a cloud-agnostic secret management solution (HashiCorp Vault, Doppler, 1Password). For Kubernetes Secrets, use external-secrets operator to sync secrets from your existing secret manager during transition, then gradually migrate them to your target solution.
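
As a transitional sketch (the secret name and region are illustrative), application code can read values injected from Kubernetes Secrets and fall back to the old secret manager until the migration completes:

import os
import boto3

def get_secret(name: str) -> str:
    # Prefer a value injected from a Kubernetes Secret (mounted as an env var,
    # e.g. via the external-secrets operator or a plain secretKeyRef).
    value = os.environ.get(name.upper().replace("-", "_"))
    if value is not None:
        return value
    # Fall back to AWS Secrets Manager during the transition period.
    sm = boto3.client("secretsmanager", region_name="us-east-1")
    return sm.get_secret_value(SecretId=name)["SecretString"]

db_password = get_secret("ml-db-password")  # illustrative secret name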

When Nebius Makes Sense

Nebius is compelling when:

  • GPU availability is a bottleneck on your current platform
  • GPU compute costs are a significant portion of infrastructure spend
  • Your workloads are containerized and run on Kubernetes (easy to migrate)
  • You have engineering capacity to manage infrastructure and don’t require extensive managed services
  • You need access to latest GPUs (H100, H200, B200, GB200) without multi-year commitments

Staying on hyperscalers may be better if:

  • You’re deeply integrated with hyperscaler-specific services (SageMaker, Vertex AI) and migration effort is substantial
  • Your AI workloads are small and GPU costs aren’t a major concern
  • You require global presence beyond US and EU regions
  • Compliance requirements mandate specific hyperscaler certifications that Nebius doesn’t have
  • You lack operational capacity to manage Kubernetes and infrastructure yourself

Resources