Service Protection for PostgreSQL
Introduction
This policy detects traffic overloads and cascading failure build-up on PostgreSQL by tracking, in real time, the current number of PostgreSQL connections as a percentage of the maximum allowed connections.
It also uses the CPU utilization ratio of the PostgreSQL pod to confirm traffic overloads and cascading failure build-up. The CPU utilization ratio is the CPU used by the PostgreSQL pod divided by the total CPU available to the pod.
All PostgreSQL-related metrics are collected by the PostgreSQL OpenTelemetry Collector. If the system under observation requires different metrics for overload confirmation, the list of available metrics can be used to configure the policy.
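The two signals above reduce to simple arithmetic. The Python sketch below illustrates them, assuming hypothetical sample values in place of the `postgresql_backends`, `postgresql_connection_max`, and pod CPU series. In the policy itself, the connection percentage comes from `policy.promql_query` and the CPU ratio from the CPU overload confirmation query. The helper names are made up for illustration.

```python
# Minimal sketch of the two overload signals described above.
# Assumption: the numbers are hypothetical stand-ins for the metric series;
# the real policy evaluates these with PromQL, not Python.
from collections.abc import Sequence


def connection_percentage(backends: Sequence[float], connection_max: Sequence[float]) -> float:
    """Mirror (sum(postgresql_backends) / sum(postgresql_connection_max)) * 100."""
    total_max = sum(connection_max)
    if total_max <= 0:
        raise ValueError("maximum connections must be positive")
    return sum(backends) / total_max * 100


def cpu_utilization_ratio(cpu_used: float, cpu_available: float) -> float:
    """CPU used by the pod divided by the total CPU available to the pod."""
    if cpu_available <= 0:
        raise ValueError("available CPU must be positive")
    return cpu_used / cpu_available


# Hypothetical sample: backends from two databases on one server
# whose connection limit is 100.
pct = connection_percentage([45, 30], [100])
ratio = cpu_utilization_ratio(1.5, 2.0)
print(f"connections: {pct:.1f}% of max, CPU ratio: {ratio:.2f}")
```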
A gradient controller calculates a proportional response to limit accepted concurrency. The concurrency is reduced by a multiplicative factor when the service is overloaded and increased by an additive factor once the service is no longer overloaded.
Please see the reference for the AdaptiveLoadScheduler component that is used within this blueprint.
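The multiplicative-decrease, additive-increase behavior described above can be illustrated with a small Python sketch. It is a simplified stand-in, not the AdaptiveLoadScheduler's actual gradient controller: the overload flag is passed in directly, and `decrease_factor`, `increase_step`, and the limits are hypothetical values chosen for illustration.

```python
# Simplified illustration of multiplicative decrease / additive increase.
# This is NOT the AdaptiveLoadScheduler implementation; all parameter
# names and values are hypothetical.
from dataclasses import dataclass


@dataclass
class ConcurrencyLimiter:
    limit: float                  # currently accepted concurrency
    min_limit: float              # floor so the limit never collapses to zero
    max_limit: float              # ceiling, e.g. the unthrottled capacity
    decrease_factor: float = 0.8  # multiply by this while overloaded
    increase_step: float = 1.0    # add this once no longer overloaded

    def update(self, overloaded: bool) -> float:
        """Apply one evaluation tick and return the new limit."""
        if overloaded:
            self.limit = max(self.min_limit, self.limit * self.decrease_factor)
        else:
            self.limit = min(self.max_limit, self.limit + self.increase_step)
        return self.limit


limiter = ConcurrencyLimiter(limit=100, min_limit=10, max_limit=100)
for overloaded in [True, True, False, False]:
    print(round(limiter.update(overloaded), 2))  # 80.0, 64.0, 65.0, 66.0
```

The multiplicative cut sheds load quickly during an overload, while the additive step probes for spare capacity slowly, which avoids oscillating back into overload.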
Configuration
Blueprint name: policies/service-protection/postgresql
Parameters
policy
Parameter | policy.components |
Description | List of additional circuit components. |
Type | Array of Object (aperture.spec.v1.Component) |
Default Value | |
Parameter | policy.evaluation_interval |
Description | The interval between successive evaluations of the Circuit. |
Type | string |
Default Value | 10s |
Parameter | policy.policy_name |
Description | Name of the policy. |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Parameter | policy.promql_query |
Description | PromQL query to detect PostgreSQL overload. |
Type | string |
Default Value | (sum(postgresql_backends) / sum(postgresql_connection_max)) * 100 |
Parameter | policy.resources |
Description | Additional resources. |
Type | Object (aperture.spec.v1.Resources) |
Default Value | |
Parameter | policy.setpoint |
Description | Setpoint for the connection percentage returned by `policy.promql_query`. |
Type | Number (double) |
Default Value | __REQUIRED_FIELD__ |
Parameter | policy.postgresql |
Description | Configuration for the PostgreSQL OpenTelemetry receiver. Refer to https://docs.fluxninja.com/integrations/metrics/postgresql for more information. |
Type | Object (postgresql) |
Default Value | |
policy.service_protection_core
Parameter | policy.service_protection_core.adaptive_load_scheduler |
Description | Parameters for Adaptive Load Scheduler. |
Type | Object (aperture.spec.v1.AdaptiveLoadSchedulerParameters) |
Default Value | |
Parameter | policy.service_protection_core.dry_run |
Description | Default configuration for setting dry run mode on Load Scheduler. In dry run mode, the Load Scheduler acts as a passthrough and does not throttle flows. This config can be updated at runtime without restarting the policy. |
Type | Boolean |
Default Value | false |
Parameter | policy.service_protection_core.overload_confirmations |
Description | List of overload confirmation criteria. The Load Scheduler can throttle flows only when all of the specified overload confirmation criteria are met. |
Type | Array of Object (overload_confirmation) |
Default Value | |
policy.service_protection_core.cpu_overload_confirmation
Parameter | policy.service_protection_core.cpu_overload_confirmation.operator |
Description | The operator for the overload confirmation criteria. oneof: `gt | lt | gte | lte | eq | neq`. |
Type | string |
Default Value | gte |
Parameter | policy.service_protection_core.cpu_overload_confirmation.query_string |
Description | The Prometheus query to be run to get the PostgreSQL CPU utilization. Must return a scalar or a vector with a single element. |
Type | string |
Default Value | avg(k8s_pod_cpu_utilization_ratio{k8s_statefulset_name="__REQUIRED_FIELD__"}) |
Parameter | policy.service_protection_core.cpu_overload_confirmation.threshold |
Description | Threshold value for CPU utilization when it is used as an overload confirmation. |
Type | Number (double) |
Default Value | |
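The default CPU query leaves `k8s_statefulset_name` as `__REQUIRED_FIELD__`, so it has to be filled in with the name of the PostgreSQL StatefulSet. The hypothetical Python helper below shows how the three `cpu_overload_confirmation` fields fit together; the StatefulSet name `postgresql` and the `0.8` threshold are assumptions for illustration, and the query shape is the documented default.

```python
# Hypothetical helper showing how the cpu_overload_confirmation fields fit
# together. The StatefulSet name and threshold used below are made up.

ALLOWED_OPERATORS = {"gt", "lt", "gte", "lte", "eq", "neq"}


def cpu_overload_confirmation(statefulset: str, threshold: float, operator: str = "gte") -> dict:
    """Build a cpu_overload_confirmation block using the default query shape."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    # Doubled braces in the f-string produce the literal { } of the PromQL selector.
    query = f'avg(k8s_pod_cpu_utilization_ratio{{k8s_statefulset_name="{statefulset}"}})'
    return {"operator": operator, "query_string": query, "threshold": threshold}


print(cpu_overload_confirmation("postgresql", 0.8))
```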
dashboard
Parameter | dashboard.extra_filters |
Description | Additional filters to pass to each query to Grafana datasource. |
Type | Object (map[string]string) |
Default Value | |
Parameter | dashboard.refresh_interval |
Description | Refresh interval for dashboard panels. |
Type | string |
Default Value | 15s |
Parameter | dashboard.time_from |
Description | Start of the dashboard time range. |
Type | string |
Default Value | now-15m |
Parameter | dashboard.time_to |
Description | End of the dashboard time range. |
Type | string |
Default Value | now |
Parameter | dashboard.title |
Description | Name of the main dashboard. |
Type | string |
Default Value | Aperture Service Protection for PostgreSQL |
dashboard.datasource
Parameter | dashboard.datasource.filter_regex |
Description | Datasource filter regex. |
Type | string |
Default Value | |
Parameter | dashboard.datasource.name |
Description | Datasource name. |
Type | string |
Default Value | $datasource |
Schemas
overload_confirmation
Parameter | operator |
Description | The operator for the overload confirmation criteria. oneof: `gt | lt | gte | lte | eq | neq` |
Type | string |
Default Value | |
Parameter | query_string |
Description | The Prometheus query to be run. Must return a scalar or a vector with a single element. |
Type | string |
Default Value | |
Parameter | threshold |
Description | The threshold for the overload confirmation criteria. |
Type | Number (double) |
Default Value | |
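The `overload_confirmation` schema pairs a query result with an operator and a threshold, and throttling is allowed only when every criterion holds. The Python sketch below illustrates that rule under stated assumptions: query results are passed in as a dict instead of being fetched from Prometheus, and the query keys (`cpu_query`, `latency_query`) and numbers are hypothetical placeholders.

```python
# Minimal sketch of evaluating a list of overload_confirmation entries.
# Assumption: query results are supplied as a dict keyed by query_string;
# the keys and values used below are hypothetical.
import operator as op
from collections.abc import Iterable, Mapping
from dataclasses import dataclass

# Map the documented operator names to Python comparison functions.
OPERATORS = {
    "gt": op.gt,
    "lt": op.lt,
    "gte": op.ge,
    "lte": op.le,
    "eq": op.eq,
    "neq": op.ne,
}


@dataclass
class OverloadConfirmation:
    operator: str
    query_string: str
    threshold: float

    def is_met(self, value: float) -> bool:
        """Compare the query result against the threshold."""
        return OPERATORS[self.operator](value, self.threshold)


def overload_confirmed(confirmations: Iterable[OverloadConfirmation],
                       query_results: Mapping[str, float]) -> bool:
    """True only when every criterion is met (an empty list confirms trivially here)."""
    return all(c.is_met(query_results[c.query_string]) for c in confirmations)


confirmations = [
    OverloadConfirmation("gte", "cpu_query", 0.8),
    OverloadConfirmation("gt", "latency_query", 200.0),
]
print(overload_confirmed(confirmations, {"cpu_query": 0.9, "latency_query": 250.0}))  # True
print(overload_confirmed(confirmations, {"cpu_query": 0.9, "latency_query": 150.0}))  # False
```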
postgresql
Parameter | agent_group |
Description | Name of the Aperture Agent group. |
Type | string |
Default Value | default |
Parameter | collection_interval |
Description | Interval at which this receiver collects metrics. |
Type | string |
Default Value | |
Parameter | database |
Description | The list of databases for which the receiver will attempt to collect statistics. |
Type | Array of string |
Default Value | |
Parameter | endpoint |
Description | Endpoint of the PostgreSQL server. |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Parameter | initial_delay |
Description | Defines how long this receiver waits before starting. |
Type | string |
Default Value | |
Parameter | password |
Description | Password used to connect to the PostgreSQL server. |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Parameter | transport |
Description | The transport protocol used to connect to PostgreSQL. Available options are tcp and unix. |
Type | string |
Default Value | |
Parameter | username |
Description | Username used to connect to the PostgreSQL server. |
Type | string |
Default Value | __REQUIRED_FIELD__ |
tls
Parameter | ca_file |
Description | A set of certificate authorities used to validate the database server SSL certificate. |
Type | string |
Default Value | |
Parameter | cert_file |
Description | A certificate used for client authentication, if necessary. |
Type | string |
Default Value | |
Parameter | insecure |
Description | Whether to disable client transport security (TLS) for the PostgreSQL connection. |
Type | Boolean |
Default Value | |
Parameter | insecure_skip_verify |
Description | Whether to skip validation of the server name and certificate when client transport security is enabled. |
Type | Boolean |
Default Value | |
Parameter | key_file |
Description | An SSL key used for client authentication, if necessary. |
Type | string |
Default Value | |
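Several parameters above default to `__REQUIRED_FIELD__` and must be supplied before the policy can be used. The Python sketch below mirrors the nested parameter names from the tables as a dict and lists any fields still set to that placeholder. The endpoint, credentials, names, and numbers are made-up examples, and the `missing_required` helper is hypothetical, not part of Aperture's tooling.

```python
# Hypothetical blueprint values mirroring the parameter tables above,
# plus a check for fields still set to the __REQUIRED_FIELD__ placeholder.
# All concrete values are made-up examples.
REQUIRED = "__REQUIRED_FIELD__"

values = {
    "policy": {
        "policy_name": "postgres-protection",
        "setpoint": 70,
        "postgresql": {
            "agent_group": "default",
            "endpoint": "postgresql.default.svc.cluster.local:5432",
            "username": "postgres",
            "password": REQUIRED,  # deliberately left unset in this example
            "transport": "tcp",
            "tls": {"insecure": True},
        },
        "service_protection_core": {
            "cpu_overload_confirmation": {
                "query_string": 'avg(k8s_pod_cpu_utilization_ratio{k8s_statefulset_name="postgresql"})',
                "threshold": 0.8,
            },
        },
    },
}


def missing_required(node, path=""):
    """Yield dotted paths whose string value still contains the placeholder."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from missing_required(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from missing_required(child, f"{path}[{i}]")
    elif isinstance(node, str) and REQUIRED in node:
        yield path


print(list(missing_required(values)))  # ['policy.postgresql.password']
```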
Dynamic Configuration
The following configuration parameters can be dynamically configured at runtime, without reloading the policy.
Parameters
Parameter | dry_run |
Description | Dynamic configuration for setting dry run mode at runtime without restarting this policy. In dry run mode, the scheduler acts as a passthrough for all flows and does not queue them. This is useful for observing the behavior of the Load Scheduler without disrupting real traffic. |
Type | Boolean |
Default Value | __REQUIRED_FIELD__ |
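Dry run mode can be pictured as a scheduler that still tracks what it would have done but admits every flow. The Python sketch below illustrates that idea only; the class and method names are hypothetical, not Aperture's API, and queuing is omitted for brevity (over-limit flows are simply rejected when dry run is off).

```python
# Simplified sketch of dry-run behavior: in dry run mode every flow is
# admitted, but the scheduler still counts flows it would have rejected.
# Class and method names are hypothetical; real queuing is omitted.
from dataclasses import dataclass


@dataclass
class LoadSchedulerSketch:
    limit: int
    dry_run: bool = False
    in_flight: int = 0
    would_reject: int = 0

    def set_dry_run(self, enabled: bool) -> None:
        """Runtime toggle, analogous to the dynamic dry_run parameter."""
        self.dry_run = enabled

    def admit(self) -> bool:
        """Admit a flow if under the limit, or unconditionally in dry run."""
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True
        if self.dry_run:
            self.would_reject += 1  # observe the decision, but let the flow through
            self.in_flight += 1
            return True
        return False

    def release(self) -> None:
        """Mark one in-flight flow as finished."""
        if self.in_flight > 0:
            self.in_flight -= 1


scheduler = LoadSchedulerSketch(limit=2)
print([scheduler.admit() for _ in range(3)])  # [True, True, False]
scheduler.set_dry_run(True)
print(scheduler.admit(), scheduler.would_reject)  # True 1
```

Counting `would_reject` while admitting everything is what makes dry run useful for observation: it shows how often throttling would have kicked in without affecting traffic.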