Version: 2.6.0

Load-based Auto Scaling

Note: The following policy is based on the Service Protection with Average Latency Feedback blueprint.

Overview

Responding to fluctuating service demand is a common challenge for maintaining stable and responsive services. This policy introduces a mechanism to dynamically scale service resources based on observed load, optimizing resource allocation and ensuring that the service remains responsive even under high load.

This policy employs two key strategies: service protection and auto-scaling.

Service Protection: A Load Scheduler component protects the service from sudden traffic spikes based on the trend of observed latency. When the observed latency exceeds its long-term trend by more than a configured tolerance (10% in this policy, set by latency_tolerance_multiplier: 1.1), the Load Scheduler throttles the load on the service. This keeps the service responsive even under high load.
Auto-Scaling: The auto-scaling strategy builds on the throttling behavior of the service protection policy. An Auto Scaler component dynamically adjusts the number of service instances in response to changes in load. Scale-out is driven by a Controller that reads the Load Scheduler's signals: replicas are added while load is being throttled, so that resources match demand. During periods of low load, the policy attempts to scale in at periodic intervals to remove excess replicas.
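The overload condition used by the service protection strategy can be illustrated with a short Python sketch. It is not Aperture code: it assumes the long-term trend is an exponential moving average (EMA) of latency and that the percentage threshold is expressed as a multiplier (1.1, i.e. 10%, in this policy). The function names overload_threshold and is_overloaded are hypothetical.

```python
# Minimal sketch of the overload check (hypothetical helpers, not Aperture's API).
# Assumption: the "trend" is an EMA of latency and the threshold is
# EMA * latency_tolerance_multiplier.

LATENCY_TOLERANCE_MULTIPLIER = 1.1  # value used in the Configuration section


def overload_threshold(latency_ema_ms: float,
                       tolerance: float = LATENCY_TOLERANCE_MULTIPLIER) -> float:
    """Latency above which the service is treated as overloaded."""
    return latency_ema_ms * tolerance


def is_overloaded(current_latency_ms: float, latency_ema_ms: float,
                  tolerance: float = LATENCY_TOLERANCE_MULTIPLIER) -> bool:
    """True when observed latency exceeds the tolerated deviation from the trend."""
    return current_latency_ms > overload_threshold(latency_ema_ms, tolerance)


if __name__ == "__main__":
    # With an EMA of 50 ms, the threshold is 55 ms (rounded to hide float noise).
    print(round(overload_threshold(50.0), 6))  # 55.0
    print(is_overloaded(54.0, 50.0))  # False
    print(is_overloaded(60.0, 50.0))  # True
```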

By combining service protection with auto-scaling, this policy ensures that the number of service replicas is adjusted to match persistent changes in demand, maintaining service stability and responsiveness.

Configuration

This policy provides protection against overloads at the search-service.prod.svc.cluster.local service. Auto-scaling is applied to the Deployment search-service in the prod namespace, with a minimum of 1 and a maximum of 10 replicas.

To prevent frequent fluctuations in the replica count, the scale-in and scale-out cooldown periods are set to 40 and 30 seconds, respectively. A periodic scale-in interval of 60 seconds, with a scale-in percentage of 10%, is also set to remove excess replicas during periods of low load.
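How the cooldowns and the periodic scale-in interact can be sketched as a small Python simulation. This is an illustration under stated assumptions, not Aperture's implementation: it assumes a scale-out blocks scale-in for the scale-out cooldown, a scale-in blocks the next scale-in for the scale-in cooldown, and each periodic tick removes 10% of the current replicas (rounded down, but at least one) without going below min_replicas. ScalerState, can_scale_in and periodic_scale_in are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Values from the Configuration section.
MIN_REPLICAS = 1
SCALE_IN_COOLDOWN_S = 40.0
SCALE_OUT_COOLDOWN_S = 30.0
SCALE_IN_PERIOD_S = 60
SCALE_IN_PERCENTAGE = 10


@dataclass
class ScalerState:
    replicas: int
    last_scale_in_s: Optional[float] = None
    last_scale_out_s: Optional[float] = None


def can_scale_in(state: ScalerState, now_s: float) -> bool:
    """Assumed rules: a scale-out blocks scale-in for 30 s,
    and a scale-in blocks the next scale-in for 40 s."""
    if (state.last_scale_out_s is not None
            and now_s - state.last_scale_out_s < SCALE_OUT_COOLDOWN_S):
        return False
    if (state.last_scale_in_s is not None
            and now_s - state.last_scale_in_s < SCALE_IN_COOLDOWN_S):
        return False
    return True


def periodic_scale_in(state: ScalerState, now_s: float) -> int:
    """Run one periodic scale-in tick and return the resulting replica count."""
    if not can_scale_in(state, now_s):
        return state.replicas
    # Assumption: remove 10% of current replicas (at least one), floored at min_replicas.
    step = max(1, state.replicas * SCALE_IN_PERCENTAGE // 100)
    new_replicas = max(MIN_REPLICAS, state.replicas - step)
    if new_replicas < state.replicas:
        state.replicas = new_replicas
        state.last_scale_in_s = now_s
    return state.replicas


if __name__ == "__main__":
    state = ScalerState(replicas=10, last_scale_out_s=0.0)
    print(periodic_scale_in(state, 20.0))  # 10: still inside the scale-out cooldown
    for t in range(SCALE_IN_PERIOD_S, 601, SCALE_IN_PERIOD_S):
        periodic_scale_in(state, float(t))
    print(state.replicas)  # 1: one replica removed per tick until min_replicas
```

Because the 60-second period is longer than the 40-second scale-in cooldown, the cooldown never blocks a periodic tick in this sketch; it only matters if scale-in is triggered more often than the period.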

aperturectl values.yaml

# yaml-language-server: $schema=../../../../../../blueprints/policies/service-protection/average-latency/gen/definitions.json
# Generated values file for policies/service-protection/average-latency blueprint
# Documentation/Reference for objects and parameters can be found at:
# https://docs.fluxninja.com/reference/blueprints/policies/service-protection/average-latency

policy:
  # Name of the policy.
  # Type: string
  # Required: True
  policy_name: load-based-auto-scale
  service_protection_core:
    adaptive_load_scheduler:
      load_scheduler:
        # The selectors determine the flows that are protected by this policy.
        # Type: []aperture.spec.v1.Selector
        # Required: True
        selectors:
        - control_point: ingress
          service: search-service.prod.svc.cluster.local
  latency_baseliner:
    # Tolerance factor beyond which the service is considered to be in overloaded state. E.g. if EMA of latency is 50ms and if Tolerance is 1.1, then service is considered to be in overloaded state if current latency is more than 55ms.
    # Type: float64
    latency_tolerance_multiplier: 1.1
    # Flux Meter defines the scope of latency measurements.
    # Type: aperture.spec.v1.FluxMeter
    # Required: True
    flux_meter:
      selectors:
      - control_point: ingress
        service: search-service.prod.svc.cluster.local
  components:
  - auto_scale:
      auto_scaler:
        scaling_backend:
          # Kubernetes replicas scaling backend.
          # Type: aperture.spec.v1.AutoScalerScalingBackendKubernetesReplicas
          # Required: True
          kubernetes_replicas:
            # Kubernetes object selector.
            # Type: aperture.spec.v1.KubernetesObjectSelector
            # Required: True
            kubernetes_object_selector:
              agent_group: default
              api_version: apps/v1
              kind: Deployment
              name: search-service
              namespace: prod
            # Minimum number of replicas.
            # Type: string
            # Required: True
            min_replicas: "1"
            # Maximum number of replicas.
            # Type: string
            # Required: True
            max_replicas: "10"
        # Dry run mode ensures that no scaling is invoked by this auto scaler.
        # Type: bool
        dry_run: false
        scale_in_controllers:
        - alerter:
            alert_name: Auto-scaler is scaling in
          controller:
            periodic:
              period: "60s"
              scale_in_percentage: 10
        scale_out_controllers:
        - alerter:
            alert_name: Auto-scaler is scaling out
          controller:
            gradient:
              in_ports:
                setpoint:
                  constant_signal:
                    value: 1
                signal:
                  signal_name: DESIRED_LOAD_MULTIPLIER
              parameters:
                slope: -1
        # Parameters that define the scaling behavior.
        # Type: aperture.spec.v1.AutoScalerScalingParameters
        # Required: True
        scaling_parameters:
          scale_in_alerter:
            alert_name: "Auto-scaler is scaling in"
          scale_in_cooldown: "40s"
          scale_out_alerter:
            alert_name: "Auto-scaler is scaling out"
          scale_out_cooldown: "30s"
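The scale-out controller configured above is a gradient controller with a constant setpoint of 1, the DESIRED_LOAD_MULTIPLIER signal, and a slope of -1. The Python sketch below shows how such a controller could turn the load multiplier into a replica target. It assumes the gradient is computed as (signal / setpoint) ** slope and multiplied by the current replica count, and that the result is rounded up and clamped to min_replicas and max_replicas; the function names are hypothetical and this is not Aperture's implementation.

```python
import math

# Replica bounds from values.yaml.
MIN_REPLICAS = 1
MAX_REPLICAS = 10


def gradient(signal: float, setpoint: float, slope: float,
             min_gradient: float = -math.inf,
             max_gradient: float = math.inf) -> float:
    """Assumed gradient formula: (signal / setpoint) ** slope, then clamped."""
    g = (signal / setpoint) ** slope
    return min(max(g, min_gradient), max_gradient)


def desired_replicas_on_scale_out(current_replicas: int,
                                  desired_load_multiplier: float) -> int:
    """Replica target for a hypothetical scale-out controller (setpoint 1, slope -1)."""
    g = gradient(desired_load_multiplier, setpoint=1.0, slope=-1.0)
    target = math.ceil(current_replicas * g)  # assumption: round up
    target = max(target, current_replicas)    # scale-out never removes replicas
    return min(max(target, MIN_REPLICAS), MAX_REPLICAS)


if __name__ == "__main__":
    print(desired_replicas_on_scale_out(2, 0.5))   # 4: half the load admitted, double replicas
    print(desired_replicas_on_scale_out(5, 1.0))   # 5: no throttling, no scale-out
    print(desired_replicas_on_scale_out(8, 0.25))  # 10: capped at max_replicas
```

With a slope of -1, the replica target scales inversely with the multiplier, so a multiplier below 1 (load being throttled) is what drives scale-out.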

Generated Policy

apiVersion: fluxninja.com/v1alpha1
kind: Policy
metadata:
  annotations:
    fluxninja.com/blueprint-name: policies/service-protection-with-load-based-pod-auto-scaler/average-latency
    fluxninja.com/blueprints-uri: local
    fluxninja.com/values:
      '{"policy": {"auto_scaling": {"dry_run": false, "periodic_decrease":
      {"period": "60s", "scale_in_percentage": 10}, "scaling_backend": {"kubernetes_replicas":
      {"kubernetes_object_selector": {"agent_group": "default", "api_version": "apps/v1",
      "kind": "Deployment", "name": "search-service", "namespace": "prod"}, "max_replicas":
      "10", "min_replicas": "1"}}, "scaling_parameters": {"scale_in_alerter": {"alert_name":
      "Auto-scaler is scaling in"}, "scale_in_cooldown": "40s", "scale_out_alerter":
      {"alert_name": "Auto-scaler is scaling out"}, "scale_out_cooldown": "30s"}},
      "latency_baseliner": {"flux_meter": {"selectors": [{"control_point": "ingress",
      "service": "search-service.prod.svc.cluster.local"}]}, "latency_tolerance_multiplier":
      1.1000000000000001}, "policy_name": "load-based-auto-scale", "service_protection_core":
      {"adaptive_load_scheduler": {"load_scheduler": {"selectors": [{"control_point":
      "ingress", "service": "search-service.prod.svc.cluster.local"}]}}}}}'
  labels:
    fluxninja.com/validate: "true"
  name: load-based-auto-scale
spec:
  circuit:
    components:
      - flow_control:
          adaptive_load_scheduler:
            dry_run: false
            dry_run_config_key: dry_run
            in_ports:
              overload_confirmation:
                constant_signal:
                  value: 1
              setpoint:
                signal_name: SETPOINT
              signal:
                signal_name: SIGNAL
            out_ports:
              desired_load_multiplier:
                signal_name: DESIRED_LOAD_MULTIPLIER
              observed_load_multiplier:
                signal_name: OBSERVED_LOAD_MULTIPLIER
            parameters:
              alerter:
                alert_name: Load Throttling Event
              gradient:
                max_gradient: 1
                min_gradient: 0.1
                slope: -1
              load_multiplier_linear_increment: 0.0025
              load_scheduler:
                selectors:
                  - control_point: ingress
                    service: search-service.prod.svc.cluster.local
              max_load_multiplier: 2
      - auto_scale:
          auto_scaler:
            dry_run: false
            dry_run_config_key: dry_run
            scale_in_controllers:
              - alerter:
                  alert_name: Periodic scale in intended
                controller:
                  periodic:
                    period: 60s
                    scale_in_percentage: 10
            scale_out_controllers:
              - alerter:
                  alert_name: Load based scale out intended
                controller:
                  gradient:
                    in_ports:
                      setpoint:
                        constant_signal:
                          value: 1
                      signal:
                        signal_name: DESIRED_LOAD_MULTIPLIER
                    parameters:
                      slope: -1
            scaling_backend:
              kubernetes_replicas:
                kubernetes_object_selector:
                  agent_group: default
                  api_version: apps/v1
                  kind: Deployment
                  name: search-service
                  namespace: prod
                max_replicas: "10"
                min_replicas: "1"
            scaling_parameters:
              scale_in_alerter:
                alert_name: Auto-scaler is scaling in
              scale_in_cooldown: 40s
              scale_out_alerter:
                alert_name: Auto-scaler is scaling out
              scale_out_cooldown: 30s
      - query:
          promql:
            evaluation_interval: 10s
            out_ports:
              output:
                signal_name: MAX_EMA
      - ema:
          in_ports:
            input:
              signal_name: SIGNAL
            max_envelope:
              signal_name: MAX_EMA
          out_ports:
            output:
              signal_name: SIGNAL_EMA
          parameters:
            correction_factor_on_max_envelope_violation: 0.95
            ema_window: 1500s
            warmup_window: 60s
      - arithmetic_combinator:
          in_ports:
            lhs:
              signal_name: SIGNAL_EMA
            rhs:
              constant_signal:
                value: 1.1
          operator: mul
          out_ports:
            output:
              signal_name: SETPOINT
    evaluation_interval: 10s
  resources:
    flow_control:
      classifiers: []
      flux_meters:
        load-based-auto-scale:
          selectors:
            - control_point: ingress
              service: search-service.prod.svc.cluster.local
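The ema and arithmetic_combinator components above build the latency setpoint: SIGNAL is smoothed into SIGNAL_EMA, which is then multiplied by 1.1 to give SETPOINT. The Python sketch below approximates that pipeline using the parameters from the generated policy (ema_window 1500s, warmup_window 60s, evaluation_interval 10s, correction factor 0.95). It assumes a standard EMA smoothing factor of 2 / (N + 1), where N is the number of evaluation ticks in the window, that the EMA is seeded with the plain average of the warmup readings, and that the output is multiplied by the correction factor whenever it exceeds the max envelope (MAX_EMA). LatencyEMA and setpoint are hypothetical names.

```python
from typing import List, Optional

# Values taken from the generated policy.
EVALUATION_INTERVAL_S = 10.0
EMA_WINDOW_S = 1500.0
WARMUP_WINDOW_S = 60.0
CORRECTION_FACTOR = 0.95
LATENCY_TOLERANCE_MULTIPLIER = 1.1


class LatencyEMA:
    """Hypothetical EMA with a warmup phase and a max-envelope correction."""

    def __init__(self, ema_window_s: float = EMA_WINDOW_S,
                 warmup_window_s: float = WARMUP_WINDOW_S,
                 interval_s: float = EVALUATION_INTERVAL_S,
                 correction_factor: float = CORRECTION_FACTOR) -> None:
        # Number of evaluation ticks in the EMA window (1500 s / 10 s = 150).
        n = ema_window_s / interval_s
        # Standard EMA smoothing factor.
        self.alpha = 2.0 / (n + 1.0)
        # Ticks collected before the EMA is seeded (60 s / 10 s = 6).
        self.warmup_ticks = max(1, int(warmup_window_s // interval_s))
        self.correction_factor = correction_factor
        self._warmup: List[float] = []
        self.value: Optional[float] = None

    def update(self, sample: float,
               max_envelope: Optional[float] = None) -> Optional[float]:
        """Feed one latency reading; returns None while still warming up."""
        if self.value is None:
            self._warmup.append(sample)
            if len(self._warmup) < self.warmup_ticks:
                return None
            # Seed the EMA with the plain average of the warmup readings.
            self.value = sum(self._warmup) / len(self._warmup)
        else:
            self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        # Pull the output down if it rises above the max envelope (MAX_EMA).
        if max_envelope is not None and self.value > max_envelope:
            self.value *= self.correction_factor
        return self.value


def setpoint(ema_value: float) -> float:
    """SETPOINT = SIGNAL_EMA * 1.1, mirroring the arithmetic_combinator."""
    return ema_value * LATENCY_TOLERANCE_MULTIPLIER
```

The long 1500-second window makes the baseline slow to follow latency changes, so a sudden spike shows up as a large gap between SIGNAL and SETPOINT rather than being absorbed into the trend.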


Circuit Diagram for this policy.

Policy in Action

During transient load spikes, the response latency of the service increases, and the service protection policy queues a proportion of the incoming requests. The Auto Scaler makes a scale-out decision as the DESIRED_LOAD_MULTIPLIER, the signal read by the scale-out Controller, falls below 1, and it then scales out the Deployment. With the additional replicas, the service is better equipped to handle the increased load, and the DESIRED_LOAD_MULTIPLIER rises back to 1 or above as the service meets the heightened demand. As a result, the response latency returns to a normal range, and the Load Scheduler stops throttling.

After the scale-out cooldown period, the periodic scale-in function is triggered, which reduces the number of replicas in response to decreased load.
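The throttling behavior described in this section can be approximated with one evaluation tick of a load-multiplier update, using the adaptive_load_scheduler parameters from the generated policy (slope -1, min_gradient 0.1, max_gradient 1, load_multiplier_linear_increment 0.0025, max_load_multiplier 2). This Python sketch is an assumption-based illustration, not Aperture's implementation: it assumes that under overload the current multiplier is scaled by the clamped gradient (signal / setpoint) ** slope, and that otherwise it recovers linearly up to max_load_multiplier. next_load_multiplier is a hypothetical name.

```python
# Parameters copied from the adaptive_load_scheduler in the generated policy.
MIN_GRADIENT = 0.1
MAX_GRADIENT = 1.0
SLOPE = -1.0
LINEAR_INCREMENT = 0.0025
MAX_LOAD_MULTIPLIER = 2.0


def next_load_multiplier(current_multiplier: float, latency_ms: float,
                         setpoint_ms: float) -> float:
    """One evaluation tick of a hypothetical load-multiplier update."""
    if latency_ms > setpoint_ms:
        # Overloaded: gradient = (signal / setpoint) ** slope, clamped to [0.1, 1].
        gradient = (latency_ms / setpoint_ms) ** SLOPE
        gradient = min(max(gradient, MIN_GRADIENT), MAX_GRADIENT)
        return current_multiplier * gradient
    # Not overloaded: recover linearly, capped at max_load_multiplier.
    return min(current_multiplier + LINEAR_INCREMENT, MAX_LOAD_MULTIPLIER)


if __name__ == "__main__":
    print(next_load_multiplier(1.0, 110.0, 55.0))   # 0.5: admit about half the load
    print(next_load_multiplier(1.0, 1000.0, 55.0))  # 0.1: gradient clamped at min_gradient
    print(next_load_multiplier(1.999, 40.0, 55.0))  # 2.0: capped at max_load_multiplier
```

In this sketch, a latency of twice the setpoint halves the multiplier, which is the kind of sub-1 value that leads the scale-out Controller to request more replicas; the slow linear recovery avoids readmitting a burst of traffic immediately after the overload clears.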
