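Overview
I previously wrote about deploying redis exporter on k8s to monitor all Redis instances, with a brief look at deploying and configuring Redis monitoring and alerting.
This article combines that setup with ConsulManager: a single redis exporter is deployed to monitor all Redis instances.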
Deploying redis exporter
Two deployment methods are provided here. I use the k8s method; choose whichever fits your needs:
- Deploy with docker-compose
- Deploy with k8s
Deploying the exporter with docker-compose
Create a docker-compose.yml with the following content:
version: "3.2"
services:
  redis-exporter:
    image: oliver006/redis_exporter
    container_name: redis-exporter
    restart: unless-stopped
    command:
      - "-redis.password-file=/redis_passwd.json"
    volumes:
      - /usr/share/zoneinfo/PRC:/etc/localtime
      - /data/redis-exporter/redis_passwd.json:/redis_passwd.json
    expose:
      - 9121
    network_mode: "host"
Create a file holding the Redis instance addresses and passwords, /data/redis-exporter/redis_passwd.json:
{
  "redis://:6379":"",
  "redis://:6379":"q1azw2sx"
}
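- Adjust the local path of the config file mounted in docker-compose to match your environment.
- The config file is JSON, with one instance per line in the form "redis://host:port":"password".
- For the instance address and port, see the instance field in the cloud Redis list or in the self-hosted Redis management page.
- If a Redis instance has no password, keep an empty pair of double quotes: "".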
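The file format described above can be generated from a plain list of instances. The following is a minimal sketch, not part of the original setup: `redis_passwd_json` is a hypothetical helper, and it assumes addresses and passwords contain no spaces, double quotes, or backslashes.

```shell
# Hypothetical helper: build redis_passwd.json content from lines of
# "host:port password" read on stdin (the password may be omitted).
# Assumes no spaces, double quotes, or backslashes in either field.
redis_passwd_json() {
  printf '{\n'
  sep=''
  while read -r addr passwd || [ -n "$addr" ]; do
    [ -n "$addr" ] || continue            # skip blank lines
    printf '%s  "redis://%s":"%s"' "$sep" "$addr" "$passwd"
    sep=',
'
  done
  printf '\n}\n'
}

# Usage: prints the JSON to stdout; redirect it to the mounted file as needed.
redis_passwd_json <<'EOF'
192.168.10.2:6379 test@2000
192.168.10.4:6379
EOF
```

An instance listed without a password ends up with an empty string value, matching the rule for password-less instances.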
Start it:
docker-compose up -d
For more details, refer to the official documentation.
Deploying the exporter with k8s
Create a file redis-exporter.yaml with the following content:
cat > redis-exporter.yaml <<EOF
---
# ConfigMap holding the Redis instance addresses and passwords
apiVersion: v1
data:
  redis_passwd.json: |
    {
      "redis://192.168.10.2:6379":"test@2000",
      "redis://192.168.10.3:6379":"test@2000",
      "redis://192.168.10.4:6379":""
    }
kind: ConfigMap
metadata:
  name: redis-passwd-cm
  namespace: kubesphere-monitoring-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter-prod
  template:
    metadata:
      labels:
        app: redis-exporter-prod
    spec:
      containers:
        - name: redis-exporter
          image: oliver006/redis_exporter:latest
          env:
            - name: TZ
              value: "Asia/Shanghai"
          args:
            - "-redis.password-file=/opt/redis_passwd.json"
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - name: http-metrics
              containerPort: 9121
              protocol: TCP
          volumeMounts:
            # The ConfigMap is mounted at /opt, so the file is /opt/redis_passwd.json
            - name: redis-passwd-conf-map
              mountPath: "/opt"
      volumes:
        - name: redis-passwd-conf-map
          configMap:
            name: redis-passwd-cm
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: redis-exporter-prod
  name: redis-exporter-prod
  namespace: kubesphere-monitoring-system
spec:
  ports:
    - name: http-metrics
      protocol: TCP
      port: 9121
      targetPort: 9121
  selector:
    app: redis-exporter-prod
EOF
Deploy the exporter with the following command:
kubectl apply -f redis-exporter.yaml
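Once the exporter is running, one way to check it is to query its /scrape endpoint for a single Redis target, the same path Prometheus uses in the scrape job. This is a sketch under assumptions: `scrape_url` is a hypothetical helper, and the port-forward and curl lines are left commented out because they need access to the cluster.

```shell
# Hypothetical helper: print the exporter URL that scrapes one Redis target.
# $1 = exporter host:port, $2 = Redis host:port as listed in redis_passwd.json
scrape_url() {
  printf 'http://%s/scrape?target=redis://%s\n' "$1" "$2"
}

# Example (requires cluster access, so it is not executed here):
#   kubectl -n kubesphere-monitoring-system port-forward svc/redis-exporter-prod 9121:9121 &
#   curl -s "$(scrape_url 127.0.0.1:9121 192.168.10.2:6379)" | grep '^redis_up'
```

If the target is reachable and the password matches, the output should include a redis_up sample for that instance.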
Prometheus service discovery configuration
A sample is provided below; the configuration can also be generated in ConsulManager:
# EOF is quoted so the shell does not expand $1 and $2 in the relabel rule
cat > prometheus-additional.yaml << 'EOF'
- job_name: redis_exporter
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /scrape
  consul_sd_configs:
    - server: '192.168.10.60:8500' ## Consul server address and port
      token: 'fe48c9a4-364e-af23-81df-9f28303012af'
      refresh_interval: 30s
      services: ['selfredis_exporter']
  relabel_configs:
    # Drop services tagged OFF
    - source_labels: [__meta_consul_tags]
      regex: .*OFF.*
      action: drop
    # Pass the Redis instance address and port as the ?target= parameter
    - source_labels: [__meta_consul_service_address,__meta_consul_service_port]
      regex: ([^:]+)(?::\d+)?;(\d+)
      target_label: __param_target
      replacement: $1:$2
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: redis-exporter-prod.kubesphere-monitoring-system.svc:9121 ## redis exporter service address and port
    - source_labels: ['__meta_consul_service_metadata_vendor']
      target_label: vendor
    - source_labels: ['__meta_consul_service_metadata_region']
      target_label: region
    - source_labels: ['__meta_consul_service_metadata_group']
      target_label: group
    - source_labels: ['__meta_consul_service_metadata_account']
      target_label: account
    - source_labels: ['__meta_consul_service_metadata_name']
      target_label: name
    - source_labels: ['__meta_consul_service_metadata_iid']
      target_label: iid
    - source_labels: ['__meta_consul_service_metadata_mem']
      target_label: mem
    - source_labels: ['__meta_consul_service_metadata_itype']
      target_label: itype
    - source_labels: ['__meta_consul_service_metadata_ver']
      target_label: ver
    - source_labels: ['__meta_consul_service_metadata_exp']
      target_label: exp
EOF
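To see what the __param_target relabel rule in the sample produces, the sketch below emulates it with sed for a single discovered service. It is an illustration only: `relabel_target` is a hypothetical helper, and the RE2 pattern is rewritten in POSIX extended syntax, so the port capture becomes group 3 instead of group 2.

```shell
# Emulates the __param_target rule for one service.
# Prometheus joins the source label values with ";" and anchors the regex.
# RE2's (?::\d+)? is written here as the ERE group (:[0-9]+)?, hence \3.
# $1 = __meta_consul_service_address, $2 = __meta_consul_service_port
relabel_target() {
  printf '%s;%s\n' "$1" "$2" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

relabel_target 192.168.10.2 6379        # prints 192.168.10.2:6379
relabel_target 192.168.10.2:6380 6379   # prints 192.168.10.2:6379
```

In the second call, a port embedded in the service address is discarded and the Consul service port is used instead.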
Load the discovery configuration above. In my environment Prometheus picks up the change without a manual reload:
kubectl delete secret additional-configs -n kubesphere-monitoring-system
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n kubesphere-monitoring-system
Grafana dashboard
Grafana dashboard details; a sample:
Alert rules
A sample follows:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: consul-redis-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.3.1
    prometheus: k8s
    role: alert-rules
  name: consul-redis-exporter-rules
  namespace: kubesphere-monitoring-system
spec:
  groups:
    - name: REDIS-Alert
      rules:
        - alert: RedisDown
          expr: redis_up == 0
          for: 0m
          labels:
            severity: 紧急  # "critical"
          annotations:
            summary: Redis down (instance {{ $labels.instance }})
            description: "Redis instance is down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisMissingMaster
          expr: (count(redis_instance_info{role="master"}) by (name,region,vendor,instance)) < 1
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis missing master (instance {{ $labels.instance }})
            description: "Redis cluster has no node marked as master.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisTooManyMasters
          expr: count(redis_instance_info{role="master"}) by (name,region,vendor,instance) > 1
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis too many masters (instance {{ $labels.instance }})
            description: "Redis cluster has too many nodes marked as master.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisDisconnectedSlaves
          expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis disconnected slaves (instance {{ $labels.instance }})
            description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisReplicationBroken
          expr: delta(redis_connected_slaves[2m]) < 0
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis replication broken (instance {{ $labels.instance }})
            description: "Redis instance lost a slave\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisClusterFlapping
          expr: changes(redis_connected_slaves[1m]) > 1
          for: 2m
          labels:
            severity: 紧急
          annotations:
            summary: Redis cluster flapping (instance {{ $labels.instance }})
            description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisMissingBackup
          expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis missing backup (instance {{ $labels.instance }})
            description: "Redis has not been backed up for 24 hours\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        # The exporter must be started with --include-system-metrics flag or REDIS_EXPORTER_INCL_SYSTEM_METRICS=true environment variable.
        - alert: RedisOutOfSystemMemory
          expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
          for: 2m
          labels:
            severity: 警告  # "warning"
          annotations:
            summary: Redis out of system memory (instance {{ $labels.instance }})
            description: "Redis is running out of system memory (> 90%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        #- alert: RedisOutOfConfiguredMaxmemory
        #  expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
        #  for: 2m
        #  labels:
        #    severity: 警告
        #  annotations:
        #    summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
        #    description: "Redis is running out of configured maxmemory (> 90%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisTooManyConnections
          expr: redis_connected_clients > 100
          for: 2m
          labels:
            severity: 警告
          annotations:
            summary: Redis too many connections (instance {{ $labels.instance }})
            description: "Redis instance has too many connections\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisNotEnoughConnections
          expr: redis_connected_clients < 1
          for: 2m
          labels:
            severity: 警告
          annotations:
            summary: Redis not enough connections (instance {{ $labels.instance }})
            description: "Redis instance should have at least one connection\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisRejectedConnections
          expr: increase(redis_rejected_connections_total[2m]) > 0
          for: 0m
          labels:
            severity: 紧急
          annotations:
            summary: Redis rejected connections (instance {{ $labels.instance }})
            description: "Some connections to Redis have been rejected\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
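As a worked example of the arithmetic in one of the rules, the sketch below mirrors the RedisMissingBackup expression, where 60 * 60 * 24 is 86400 seconds (24 hours). `backup_overdue` is a hypothetical helper for illustration and the timestamps are made up; Prometheus evaluates the real expression itself.

```shell
# Mirrors: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
# $1 = current Unix time, $2 = last RDB save time (Unix seconds)
# Returns success (exit status 0) when the alert condition would hold.
backup_overdue() {
  [ $(( $1 - $2 )) -gt $(( 60 * 60 * 24 )) ]
}

# 90000 seconds since the last save exceeds 86400, so the condition holds.
if backup_overdue 1700090000 1700000000; then
  echo "RedisMissingBackup would fire"
fi
```

Because the rule uses for: 0m, the alert fires on the first evaluation where the condition is true.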
DingTalk alerts
Samples of an alert and an alert recovery:
Alert sample:
Alert recovery sample:
References
Deploying a redis exporter to monitor all Redis instances