Version: 2.7.0

Auto Scaling

Overview

Auto-scaling is a vital pillar of load management. It empowers service operators to adjust the number of instances or resources allocated to a service automatically, based on current or anticipated demand and resource utilization. This way, auto-scaling ensures a service can handle incoming load while optimizing operational costs by allocating the appropriate number of resources.

In Aperture, service operators can configure auto-scaling policies based on different overload signals, such as load throttling, in addition to resource-utilization signals such as CPU usage, memory usage, and network I/O. This versatility enables service operators to fine-tune auto-scaling behavior according to their specific needs. Auto-scaling policies can be set up to add or remove instances or resources based on these signals, enabling dynamic scaling in response to changing traffic patterns.
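To make the idea of combining overload and utilization signals concrete, the sketch below shows a minimal scaling decision in Python. The signal names, thresholds, and function are illustrative assumptions, not Aperture's actual policy schema:

```python
from dataclasses import dataclass

# Illustrative sketch only: these signal names and thresholds are
# assumptions for explanation, not Aperture's policy language.

@dataclass
class Signals:
    load_throttling: float  # fraction of requests throttled (0.0 - 1.0)
    cpu_utilization: float  # average CPU utilization (0.0 - 1.0)

def scaling_decision(signals: Signals,
                     throttle_out: float = 0.05,
                     cpu_out: float = 0.8,
                     cpu_in: float = 0.3) -> int:
    """Return +1 to scale out, -1 to scale in, 0 to hold."""
    # Scale out if the service is shedding load or running hot.
    if signals.load_throttling > throttle_out or signals.cpu_utilization > cpu_out:
        return 1
    # Scale in only when no load is being throttled and CPU is low.
    if signals.load_throttling == 0.0 and signals.cpu_utilization < cpu_in:
        return -1
    return 0
```

Note how load throttling acts as an overload signal that triggers scale-out even when CPU looks healthy, while scale-in requires both signals to be quiet, which avoids flapping.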

```mermaid
flowchart LR
    classDef Controller fill:#F8773D,stroke:#000000,stroke-width:2px;
    classDef Signal fill:#EFEEED,stroke:#000000,stroke-width:1px;
    classDef Agent fill:#56AE89,stroke:#000000,stroke-width:2px;
    classDef Service fill:#56AE89,stroke:#000000,stroke-width:1px;
    Out("Scale Out Criteria") --> Controller
    In("Scale In Criteria") --> Controller
    Controller <-- "Decisions & Telemetry" --> Agent
    class Controller Controller
    Agent -- "k8s" --> Infra-API
    class Agent Agent
    class Infra-API Service
```

The diagram outlines the auto-scaling process: when scale-out or scale-in criteria are met, the controller dispatches a scaling decision to the agent, which interfaces with infrastructure APIs, such as Kubernetes, to perform the scaling.

Example Scenario

Imagine a task management application whose usage varies significantly between working days and holidays, and even fluctuates during different hours of a standard working day. To maintain responsive APIs while managing infrastructure costs effectively, the service can use load-based auto-scaling. This strategy ensures that service resources are dynamically adjusted in line with usage patterns, thereby optimizing both user experience and infrastructure expenditure.