Load Scheduling with Average Latency Feedback
Introduction
This policy detects traffic overloads and cascading failure build-up by comparing real-time latency with its exponential moving average. A gradient controller calculates a proportional response to limit accepted concurrency. The concurrency is reduced by a multiplicative factor when the service is overloaded, and increased by an additive factor once the service is no longer overloaded.
At a high level, this policy works as follows (a simplified sketch of the control law follows the list):
- Latency EMA-based overload detection: A Flux Meter is used to gather latency metrics from a service control point. The latency signal is fed into an Exponential Moving Average (EMA) component to establish a long-term trend that can be compared with the current latency to detect overloads.
- Gradient Controller: The setpoint latency and current latency signals are fed to the gradient controller, which calculates the proportional response to adjust the accepted concurrency (the Control Variable).
- Integral Optimizer: When the service is detected to be in the normal state, an integral optimizer additively increases the concurrency of the service in each execution cycle of the circuit. This design allows warming up a service from an initial inactive state. It also protects applications from sudden spikes in traffic, as it sets an upper bound on the concurrency allowed on a service in each execution cycle of the circuit, based on the observed incoming concurrency.
- Load Scheduler and Actuator: The accepted concurrency at the service is throttled by a weighted-fair queuing scheduler. The adjustments to accepted concurrency made by the gradient controller and optimizer are translated into a load multiplier that is synchronized with Aperture Agents through etcd. The load multiplier adjusts (increases or decreases) the token bucket fill rates based on the incoming concurrency observed at each agent.
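The interaction between the EMA baseline, the gradient controller, and the integral optimizer can be summarized with the following simplified control law. This is a sketch derived from the description above, not the exact circuit: the actual controller also clamps the gradient and consults overload confirmations before throttling.

$$
\begin{aligned}
\text{setpoint} &= T \cdot \mathrm{EMA}(\ell) \\
\text{overload} &\iff \ell > \text{setpoint} \\
\text{overloaded:}\quad C &\leftarrow \frac{\text{setpoint}}{\ell} \cdot C \\
\text{normal:}\quad C &\leftarrow C + \alpha
\end{aligned}
$$

Here $\ell$ is the current (short-term) latency, $T$ is the latency tolerance multiplier, $C$ is the accepted concurrency, and $\alpha$ is the optimizer's per-cycle increment. Under overload, $\ell > \text{setpoint}$, so the factor $\text{setpoint}/\ell$ is less than one, which gives the multiplicative decrease; in the normal state the additive term slowly restores concurrency.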
See the reference for the AdaptiveLoadScheduler component that is used within this blueprint.
See the Adaptive Service Protection with Average Latency Feedback and Workload Prioritization use cases to see this blueprint in action.
Configuration
Blueprint name: `load-scheduling/average-latency`
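Before diving into the parameter reference, a minimal sketch of a values file for this blueprint may help. The policy name, service, and control point below are hypothetical placeholders; only the required field and the two selectors most deployments need are shown:

```yaml
# Minimal values for the load-scheduling/average-latency blueprint (sketch).
# The service name and control point are hypothetical placeholders.
policy:
  policy_name: checkout-service-protection
  latency_baseliner:
    flux_meter:
      selectors:
        - control_point: ingress
          service: checkout.prod.svc.cluster.local
  service_protection_core:
    adaptive_load_scheduler:
      load_scheduler:
        selectors:
          - control_point: ingress
            service: checkout.prod.svc.cluster.local
```

A file like this is typically rendered into a policy with `aperturectl blueprints generate`.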
Parameters
policy
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `policy.components` | List of additional circuit components. | Array of Object (`aperture.spec.v1.Component`) | |
| `policy.policy_name` | Name of the policy. | `string` | `__REQUIRED_FIELD__` |
| `policy.resources` | Additional resources. | Object (`aperture.spec.v1.Resources`) | |
| `policy.evaluation_interval` | The interval between successive evaluations of the Circuit. | `string` | `10s` |
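For instance, the circuit's evaluation cadence can be overridden in the values file; the value below is illustrative, not a recommendation:

```yaml
policy:
  evaluation_interval: "30s"  # evaluate the circuit every 30s instead of the default 10s
```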
policy.service_protection_core
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `policy.service_protection_core.adaptive_load_scheduler` | Parameters for the Adaptive Load Scheduler. | Object (`aperture.spec.v1.AdaptiveLoadSchedulerParameters`) | |
| `policy.service_protection_core.dry_run` | Default configuration for setting dry run mode on the Load Scheduler. In dry run mode, the Load Scheduler acts as a passthrough and does not throttle flows. This configuration can be updated at runtime without restarting the policy. | `Boolean` | `false` |
| `policy.service_protection_core.kubelet_overload_confirmations` | Overload confirmation signals from the kubelet. | Object (`kubelet_overload_confirmations`) | |
| `policy.service_protection_core.overload_confirmations` | List of overload confirmation criteria. The Load Scheduler throttles flows only when all of the specified overload confirmation criteria are met. | Array of Object (`overload_confirmation`) | |
policy.latency_baseliner
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `policy.latency_baseliner.flux_meter` | Flux Meter defines the scope of latency measurements. | Object (`aperture.spec.v1.FluxMeter`) | |
| `policy.latency_baseliner.latency_tolerance_multiplier` | Tolerance factor beyond which the service is considered to be in an overloaded state. For example, if the long-term average latency is `L` and the tolerance is `T`, the service is considered overloaded when the short-term average latency exceeds `L*T`. | Number (`double`) | `1.25` |
| `policy.latency_baseliner.long_term_query_interval` | Interval for the long-term latency query, that is, how far back in time the query looks. The value should be a string representing the duration in seconds. | `string` | `1800s` |
| `policy.latency_baseliner.long_term_query_periodic_interval` | Periodic interval for the long-term latency query, that is, how often the query is run. The value should be a string representing the duration in seconds. | `string` | `30s` |
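For example, to make overload detection less sensitive and base the long-term trend on a larger window, these parameters can be overridden in the values file; the numbers below are illustrative, not recommendations:

```yaml
policy:
  latency_baseliner:
    latency_tolerance_multiplier: 1.5         # overload only when short-term latency > 1.5x the long-term average
    long_term_query_interval: "3600s"         # look back one hour for the long-term baseline
    long_term_query_periodic_interval: "60s"  # re-run the long-term query every minute
```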
dashboard
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `dashboard.extra_filters` | Additional filters to pass to each query to the Grafana datasource. | Object (`map[string]string`) | |
| `dashboard.refresh_interval` | Refresh interval for dashboard panels. | `string` | `15s` |
| `dashboard.time_from` | Start of the dashboard time range. | `string` | `now-15m` |
| `dashboard.time_to` | End of the dashboard time range. | `string` | `now` |
| `dashboard.title` | Name of the main dashboard. | `string` | `Aperture Service Protection` |
dashboard.datasource
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `dashboard.datasource.filter_regex` | Datasource filter regex. | `string` | |
| `dashboard.datasource.name` | Datasource name. | `string` | `$datasource` |
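To point the generated Grafana dashboard at a specific datasource and adjust its defaults, the `dashboard` section can be overridden. The datasource name below is a hypothetical example:

```yaml
dashboard:
  title: "Checkout Service Protection"
  refresh_interval: "30s"
  time_from: "now-1h"
  time_to: "now"
  datasource:
    name: "prometheus-prod"  # hypothetical datasource name
    filter_regex: ""
```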
Schemas
driver_criteria
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `enabled` | Enables the driver. | `Boolean` | `__REQUIRED_FIELD__` |
| `threshold` | Threshold for the driver. | Number (`double`) | `__REQUIRED_FIELD__` |
overload_confirmation_driver
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `pod_cpu` | The driver for using CPU usage as overload confirmation. | Object (`driver_criteria`) | |
| `pod_memory` | The driver for using memory usage as overload confirmation. | Object (`driver_criteria`) | |
kubelet_overload_confirmations
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `criteria` | Criteria for overload confirmation. | Object (`overload_confirmation_driver`) | `__REQUIRED_FIELD__` |
| `infra_context` | Kubernetes selector for scraping metrics. | Object (`aperture.spec.v1.KubernetesObjectSelector`) | `__REQUIRED_FIELD__` |
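Putting the two schemas above together, a kubelet-based overload confirmation might look like the following sketch. The namespace and workload name are hypothetical placeholders, and the selector fields shown are assumed to follow `aperture.spec.v1.KubernetesObjectSelector`:

```yaml
policy:
  service_protection_core:
    kubelet_overload_confirmations:
      criteria:
        pod_cpu:
          enabled: true
          threshold: 0.9  # illustrative threshold for the CPU usage signal
        pod_memory:
          enabled: true
          threshold: 0.8  # illustrative threshold for the memory usage signal
      infra_context:
        namespace: prod        # hypothetical namespace
        api_version: apps/v1
        kind: Deployment
        name: checkout         # hypothetical workload name
```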
overload_confirmation
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `operator` | The operator for the overload confirmation criteria. One of: `gt`, `lt`, `gte`, `lte`, `eq`, `neq`. | `string` | |
| `query_string` | The Prometheus query to be run. Must return a scalar or a vector with a single element. | `string` | |
| `threshold` | The threshold for the overload confirmation criteria. | Number (`double`) | |
Dynamic Configuration
The following configuration parameters can be dynamically configured at runtime, without reloading the policy.
Parameters
| Parameter | Description | Type | Default Value |
| --- | --- | --- | --- |
| `dry_run` | Dynamic configuration for setting dry run mode at runtime without restarting this policy. In dry run mode, the scheduler acts as a passthrough for all flows and does not queue them. This is useful for observing the behavior of the Load Scheduler without disrupting any real traffic. | `Boolean` | `__REQUIRED_FIELD__` |
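For example, to run the policy in observation-only mode, the dynamic configuration can set `dry_run`; this is applied at runtime, without reloading the policy:

```yaml
# Dynamic configuration (sketch): observe scheduler decisions without throttling.
dry_run: true
```

Setting `dry_run: false` again restores normal throttling behavior.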