Defragmentation

Cluster Defragmentation #

Cluster Defragmentation helps reduce infrastructure costs and improve efficiency by reorganizing workloads across Kubernetes nodes. This process replaces fragmented and inefficient nodes with optimized ones, ensuring better resource utilization and lower hourly costs — without compromising availability.

Over time, clusters naturally become fragmented — a normal byproduct of daily operations such as scaling, uneven pod distribution, node taints, or affinity rules. Additionally, cloud provider pricing can shift, making some instance types less cost-effective than when initially provisioned.

Defragmentation helps by:

  • Consolidating workloads onto fewer, better-utilized nodes
  • Replacing expensive or poorly matched nodes with cost-effective alternatives
  • Adapting to pricing changes to maintain optimal cost/performance balance
  • Freeing up capacity for future workloads, reducing unnecessary autoscaling

Defragmentation is initiated manually. Before any changes are made, you receive a detailed plan preview, including cost savings and a full before/after topology of your cluster.

Migration from Non-StackBooster Clusters #

When migrating an existing Kubernetes cluster that was not previously managed by StackBooster, you can perform an initial cluster “Roll” as the final step of the migration process.

What is a Cluster Roll? #

A cluster roll is the initial defragmentation operation performed when transitioning from a non-StackBooster configuration to a StackBooster-managed cluster. This process evaluates your existing nodes and workloads, then proposes an optimized configuration that:

  • Replaces on-demand nodes with appropriately sized spot instances (where applicable)
  • Consolidates workloads to reduce node count
  • Selects optimal instance families and sizes based on your actual resource usage
  • Significantly reduces compute costs while maintaining or improving resource availability

Roll During Onboarding #

During the StackBooster setup process, you’ll progress through several steps:

  1. Step 1: Connect Account - Link your cloud provider account
  2. Step 2: Connect Cluster - Integrate your Kubernetes cluster
  3. Step 3: Enable Full Agent Mode - Activate complete StackBooster functionality
  4. Step 4: Migration Template - Configure migration preferences
  5. Step 5: Roll Cluster - Review and execute the initial optimization
  6. Step 6: Configure Automation - Set up ongoing autoscaling policies

At Step 5: Roll Cluster, you’ll see:

  • Current configuration analysis: Your existing nodes with their instance types, resource utilization (CPU and Memory), hourly costs, and monthly costs
  • Optimized proposal: The recommended configuration showing how nodes will be replaced with more cost-effective alternatives
  • Projected savings: Both monthly and annual cost reductions

Example Migration Benefits:

Before and after roll:

  • 3 nodes (t3.medium, on-demand)
  • Total vCPU: 5.79 → Optimized: 3.66 (36.8% reduction)
  • Total Memory: 9.70 GB → Optimized: 6.15 GB (36.6% reduction)
  • Monthly cost: $96.60 → Optimized: $18.70
  • Monthly savings: $77.90 (80.6% reduction)
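The percentages in this example follow from a simple relative-reduction formula. The sketch below recomputes them in Python from the rounded figures listed above; the helper name `pct_reduction` and the 12-month annual projection are illustrative assumptions, not part of StackBooster. Because the inputs are already rounded, a recomputed value can differ from what the UI shows by about a tenth of a percentage point.

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Rounded figures taken from the example above.
vcpu = pct_reduction(5.79, 3.66)
memory = pct_reduction(9.70, 6.15)
monthly_savings = 96.60 - 18.70
cost = pct_reduction(96.60, 18.70)
annual_savings = monthly_savings * 12  # assumption: simple 12-month projection

print(f"vCPU reduction:   {vcpu:.1f}%")
print(f"Memory reduction: {memory:.1f}%")
print(f"Monthly savings:  ${monthly_savings:.2f} ({cost:.1f}%)")
print(f"Annual savings:   ${annual_savings:.2f}")
```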

Safety During Initial Roll #

The initial roll operation follows the same strict safety protocols as ongoing defragmentation:

  • New nodes are provisioned and verified as Ready before old nodes are drained
  • Workloads are gracefully migrated respecting Pod Disruption Budgets (PDBs)
  • Nodes running standalone pods or pods that use ephemeral local storage are excluded
  • The process runs in controlled batches to maintain cluster availability
  • Real-time progress tracking shows the status of each batch and node transition

Starting Your Initial Roll #

  1. Complete Steps 1-4 of the StackBooster onboarding process
  2. At Step 5, review the optimization plan showing:
    • Current cluster costs and configuration
    • Proposed optimized configuration
    • Estimated savings
  3. Click Start roll to begin the migration
  4. Monitor the progress as nodes are replaced in batches
  5. Once complete, proceed to Step 6 to configure automation

The initial roll typically completes within 10-30 minutes depending on cluster size, but timing may vary based on cloud provider provisioning speed and workload eviction policies.

How It Works #

When you initiate a defragmentation roll (either as an initial migration roll or an ongoing optimization), the system evaluates your current node and pod placement and proposes an optimized configuration. This includes:

  • Estimated hourly savings
  • Overview of nodes planned for removal and replacement, with detailed configuration of newly proposed nodes
  • Per-node breakdowns of utilization, instance type, and pricing
  • A summary view comparing current vs. optimized state

If the analysis shows that your cluster is already cost-efficient, you’ll see a message confirming that no action is needed.

If a roll is started, nodes are replaced in safe, controlled batches. Each new node is verified as Ready, and pods are confirmed to be running, before the corresponding old node is drained. This continues until every planned replacement has either completed or been skipped.
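The batch flow described above can be sketched as a small loop. This is an illustrative model only, not StackBooster's implementation: `roll_in_batches`, `RollResult`, and the three callbacks (`provision`, `verify_ready`, `drain`) are hypothetical names standing in for whatever the platform does internally.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RollResult:
    replaced: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def roll_in_batches(
    plan: List[Tuple[str, str]],                # (old node, new instance type) pairs
    batch_size: int,
    provision: Callable[[str], Optional[str]],  # hypothetical: new node name, or None on failure
    verify_ready: Callable[[str], bool],        # hypothetical: node is Ready and pods are running
    drain: Callable[[str], None],               # hypothetical: gracefully drains the old node
) -> RollResult:
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    result = RollResult()
    for start in range(0, len(plan), batch_size):
        batch = plan[start:start + batch_size]
        # Phase 1: provision replacements for the whole batch.
        pending: List[Tuple[str, str]] = []
        for old_node, instance_type in batch:
            new_node = provision(instance_type)
            if new_node is None:
                result.skipped.append(old_node)  # skipped without blocking the roll
            else:
                pending.append((old_node, new_node))
        # Phase 2: drain an old node only after its replacement is verified.
        for old_node, new_node in pending:
            if verify_ready(new_node):
                drain(old_node)
                result.replaced.append(old_node)
            else:
                result.skipped.append(old_node)
    return result


# Demo with fake callbacks; "no-capacity" simulates a provisioning failure.
plan = [("node-a", "t4g.medium"), ("node-b", "t4g.medium"), ("node-c", "no-capacity")]
drained: List[str] = []
result = roll_in_batches(
    plan,
    batch_size=2,
    provision=lambda t: None if t == "no-capacity" else f"new-{t}",
    verify_ready=lambda node: True,
    drain=drained.append,
)
print(result)
```

In this model an old node is never drained unless its replacement passed verification, so a failed provision leaves the original node in service.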

Safety and Reliability #

To ensure your workloads remain uninterrupted, defragmentation is performed with strict safety measures:

  • Nodes are only replaced when their workloads can be safely rescheduled.
  • Nodes are excluded from replacement when they run pods that cannot be safely evicted: pods whose Pod Disruption Budget (PDB) blocks eviction, pods using ephemeral local storage, or pods not managed by a controller (e.g. standalone pods).
  • Nodes are rolled out in controlled batches, and each new node must pass readiness checks before its corresponding old node is drained.
  • If a node fails to provision (e.g. due to lack of capacity), it is skipped and logged without blocking the process.
  • Draining respects pod eviction policies and timeouts, allowing for graceful transitions.

These safeguards maintain service stability while enabling cost-efficient optimization.
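As a rough illustration of the exclusion rules listed above, the following sketch flags a node as ineligible when any of its pods matches one of those conditions. The `Pod` fields and the `blocking_reasons` helper are hypothetical, and the real eligibility logic may weigh additional factors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    controller: Optional[str] = None            # e.g. "Deployment"; None for a standalone pod
    pdb_blocks_eviction: bool = False           # assumption: a PDB currently prevents eviction
    uses_local_ephemeral_storage: bool = False


def blocking_reasons(pods: List[Pod]) -> List[str]:
    """Return the reasons a node hosting `pods` would be excluded from replacement."""
    reasons: List[str] = []
    for pod in pods:
        if pod.controller is None:
            reasons.append(f"{pod.name}: not managed by a controller")
        if pod.pdb_blocks_eviction:
            reasons.append(f"{pod.name}: eviction blocked by a PDB")
        if pod.uses_local_ephemeral_storage:
            reasons.append(f"{pod.name}: uses ephemeral local storage")
    return reasons


# Illustrative pods on a single node.
node_pods = [
    Pod("api-7d9f", controller="Deployment"),
    Pod("debug-shell"),
    Pod("cache-0", controller="StatefulSet", uses_local_ephemeral_storage=True),
]
reasons = blocking_reasons(node_pods)
eligible = not reasons
print("eligible" if eligible else "excluded:", reasons)
```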

Launching a Defragmentation Roll #

For Existing StackBooster Clusters #

To initiate the defragmentation process for an existing StackBooster-managed cluster, go to the Defragmentation tab in your cluster view.

If the cluster is eligible for optimization, you will see:

  • Projected savings (monthly and annual)
  • Current node configuration: showing instance types, resource utilization, and costs
  • Optimized plan: a preview of the more cost-efficient configuration

To begin the operation:

  1. Review the optimization plan and projected benefits
  2. Click the Start saving now button

Once started, the UI will display real-time progress including batch execution status and node transitions.

If the system detects that your cluster is already in an optimal state, it displays a “The cluster is optimal” badge and a message confirming that no action is required.

For New Migrations #

For clusters being migrated to StackBooster for the first time, follow the onboarding wizard and initiate the roll at Step 5 as described in the Migration from Non-StackBooster Clusters section above.

Understanding Optimization Results #

After a defragmentation roll completes, you’ll see detailed before/after comparisons:

Metrics Displayed #

  • Cluster Price: Total hourly cluster cost and percentage reduction
  • Node Count: Number of nodes before and after optimization
  • Total vCPU: Allocated CPU resources and percentage change
  • Total Memory: Allocated memory resources and percentage change
  • Savings Breakdown:
    • Monthly savings amount
    • Annual savings projection
    • Current cost with StackBooster
    • Cost without StackBooster (unoptimized)
    • Percentage savings achieved

Per-Node Details #

For each node in both before and after states, you can view:

  • Family: Instance family and size (e.g., t3.medium, t4g.medium, c6g.large)
  • Spot: Whether the node is a spot or on-demand instance
  • CPU: CPU utilization percentage
  • Memory: Memory utilization percentage
  • Hourly cost: Cost per hour for the node
  • Monthly cost: Projected monthly cost for the node
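To show how the per-node fields relate to the cluster totals, here is a minimal sketch that derives monthly cost from hourly cost and sums it across nodes. The 730 hours-per-month factor is an assumption (8,760 hours divided by 12), the UI may use a different convention, and the prices and utilization values are made-up illustrative numbers.

```python
from dataclasses import dataclass
from typing import List

HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


@dataclass
class Node:
    family: str
    spot: bool
    cpu_pct: float
    memory_pct: float
    hourly_cost: float

    @property
    def monthly_cost(self) -> float:
        return self.hourly_cost * HOURS_PER_MONTH


def cluster_monthly_cost(nodes: List[Node]) -> float:
    return sum(node.monthly_cost for node in nodes)


# Illustrative values only, not real cloud prices.
before = [Node("t3.medium", False, 35.0, 40.0, 0.05) for _ in range(3)]
after = [Node("t4g.medium", True, 70.0, 75.0, 0.02) for _ in range(2)]

cost_before = cluster_monthly_cost(before)
cost_after = cluster_monthly_cost(after)
print(f"Before: ${cost_before:.2f}/month across {len(before)} nodes")
print(f"After:  ${cost_after:.2f}/month across {len(after)} nodes")
```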

Typical Performance #

Nodes replaced | Batches | Expected duration*
2-5            | 1-2     | ~5-10 min
5-10           | 2-3     | ~10-15 min
10-50          | 3-5     | ~15-30 min
50-200         | 5-10    | ~30-60 min

*Actual timing may vary based on workload readiness, provisioning delays from the cloud provider, and the time needed to gracefully drain workloads from old nodes as defined by your eviction policies and timeouts.
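For rough planning, the table above can be encoded as a lookup. This is only a sketch: `typical_duration` is a hypothetical helper, and because the table's ranges share their boundary values (5, 10, and 50 each appear in two rows), it assumes a boundary count belongs to the lower row.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]

# (upper bound of nodes replaced, batches range, duration range in minutes)
TYPICAL: List[Tuple[int, Range, Range]] = [
    (5, (1, 2), (5, 10)),
    (10, (2, 3), (10, 15)),
    (50, (3, 5), (15, 30)),
    (200, (5, 10), (30, 60)),
]


def typical_duration(nodes_replaced: int) -> Optional[Tuple[Range, Range]]:
    """Return (batches, minutes) ranges, or None outside the table's 2-200 span."""
    if nodes_replaced < 2:
        return None
    for upper, batches, minutes in TYPICAL:
        if nodes_replaced <= upper:
            return batches, minutes
    return None


estimate = typical_duration(25)
if estimate is not None:
    batches, minutes = estimate
    print(f"25 nodes: {batches[0]}-{batches[1]} batches, ~{minutes[0]}-{minutes[1]} min")
```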

Best Practices #

When to Run Defragmentation #

  • After initial cluster migration: As part of the onboarding process (Step 5)
  • After significant workload changes: When you’ve added or removed major applications
  • Following cloud provider price updates: When instance pricing changes in your region
  • Periodic optimization: Every 1-3 months to maintain cost efficiency
  • After prolonged scaling activity: When your cluster has experienced extensive autoscaling events

Monitoring During Defragmentation #

While defragmentation is running:

  • Watch the progress indicator: Shows percentage complete and current batch
  • Monitor node status: View which nodes are being added/removed in real-time
  • Check the bottom notification bar: Displays “Defragmentation In Progress” with real-time updates
  • Review batch completion: Track how many nodes have been successfully optimized

Post-Defragmentation #

After completion:

  • Verify all workloads are running normally
  • Review the final cost comparison
  • Check node distribution and resource utilization
  • Configure or update automation settings (Step 6) if needed

Troubleshooting #

Common Scenarios #

Defragmentation shows 0% savings

  • Your cluster is already optimally configured
  • No action is needed

Some nodes weren’t replaced

  • Nodes may contain workloads with ephemeral storage
  • Pods might not be managed by a controller
  • PDB constraints might prevent eviction
  • Cloud provider may lack capacity for the desired instance type

Roll takes longer than expected

  • Workload eviction respects PDB settings and drain timeouts
  • Cloud provider provisioning delays
  • Complex pod scheduling requirements

Getting Help #

If you encounter issues during defragmentation:

  1. Check the real-time progress logs in the UI
  2. Review the notification messages for specific errors
  3. Consult the StackBooster documentation
  4. Contact StackBooster support with your cluster details

Summary #

Cluster Defragmentation is a powerful cost optimization tool that safely reorganizes your Kubernetes infrastructure for maximum efficiency. Whether you’re migrating an existing cluster through the initial roll process or performing periodic optimization on a StackBooster-managed cluster, the operation ensures your workloads run on the most cost-effective nodes while maintaining full availability and respecting all safety constraints.

With potential savings of 60-85% on compute costs, regular defragmentation helps you maintain an optimal balance between performance and cost throughout your cluster’s lifecycle.