
Deploying a Redis cluster with a StatefulSet: fixing the abnormal cluster state caused by Pod IP changes

news 2025/12/16 17:45:19

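Background

1. When a Pod in the Redis cluster is recreated, its Pod IP changes. Redis does not detect this and does not update the corresponding node's IP address in the cluster node configuration file (nodes.conf), so the cluster nodes can no longer communicate with each other and the cluster ends up in an abnormal state.
2. Initializing the Redis cluster with domain names for the nodes fails with the error below, so the headless Service cannot be used for this step: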

/redis/client.rb:126:in `call’: ERR Invalid node address specified: redis-cluster. redis-service.redis-app-0.svc.cluster.local:6379 (Redis::CommandError)
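Since the error indicates that a node address must be an IP, one possible workaround is to keep the headless Service for discovery and resolve each Pod's stable DNS name to an IP right before creating the cluster. The following is a minimal sketch under several assumptions: the Pod, Service, and namespace names are the ones from the manifests in this article, the cluster DNS domain is `cluster.local`, `getent` is available in the container, and the cluster is created with `redis-cli --cluster create` (Redis 5 and later) rather than the Ruby tool that produced the error. The function names are hypothetical.

```shell
#!/bin/sh
# Build the stable DNS name of one StatefulSet Pod behind the headless Service.
# Pattern: <pod>.<service>.<namespace>.svc.cluster.local
pod_fqdn() {
  # $1 = Pod ordinal; the names come from the manifests in this article
  echo "redis-app-$1.redis-service.redis-cluster.svc.cluster.local"
}

# Resolve a host name to its first IP address (assumes getent is available).
resolve_ip() {
  getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# Print "ip:6379" for ordinals 0..(n-1), separated by spaces.
# $1 = number of Pods, $2 = resolver function (defaults to resolve_ip).
node_list() {
  n=$1
  resolver=${2:-resolve_ip}
  i=0
  list=""
  while [ "$i" -lt "$n" ]; do
    ip=$("$resolver" "$(pod_fqdn "$i")")
    [ -n "$ip" ] || { echo "cannot resolve $(pod_fqdn "$i")" >&2; return 1; }
    list="$list${list:+ }${ip}:6379"
    i=$((i + 1))
  done
  echo "$list"
}

# Usage inside any Pod of the cluster (not run here):
#   redis-cli --cluster create $(node_list 6) --cluster-replicas 1
```

Passing the resolver as a parameter keeps the list-building logic testable outside a cluster; inside a Pod the default `resolve_ip` would be used.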

Cluster topology

[Figure: cluster topology diagram]

Cluster state files

Redis cluster configuration ConfigMap

kubectl apply -f redis-configmap.yaml -n redis-cluster

apiVersion: v1
kind: ConfigMap
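metadata:
  name: redis-conf
  namespace: redis-cluster
data:
  fix-pod-ip.sh: |
    #!/bin/sh
    CLUSTER_CONFIG="/var/lib/redis/nodes.conf"
    if [ -f ${CLUSTER_CONFIG} ]; then
      # If the new Pod IP cannot be determined, print a message; an alert could also be sent here
      if [ -z "${POD_IP}" ]; then
        echo "Unable to determine Pod IP address!"
        ### do something here ###
        exit 1
      fi
      # Replace the old Pod IP on this node's own ("myself") line of the cluster node file
      sed -i -e '/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/'${POD_IP}'/' ${CLUSTER_CONFIG}
      ### Check whether the IP was replaced
      count=`grep -c ${POD_IP} ${CLUSTER_CONFIG}`
      # POSIX test instead of [[ ... ]], because the script runs under /bin/sh
      if [ "$count" -gt 0 ]; then
        echo " Updating my Ip to ${POD_IP} in ${CLUSTER_CONFIG}"
      fi
    fi
    exec "$@"
  redis.conf: |
    # Paste the contents of your redis.conf here,
    # or generate it with kubectl create configmap --dry-run -o yaml ...
    # Enable cluster mode
    cluster-enabled yes
    # Listen address
    bind 0.0.0.0
    port 6379
    # Protected mode
    protected-mode no
    # Run Redis in the background
    #daemonize yes
    # Client idle timeout (seconds), so idle connections do not hold resources for long
    timeout 300
    # Timeout for communication between cluster nodes (milliseconds)
    cluster-node-timeout 5000
    # Cluster configuration file
    cluster-config-file /var/lib/redis/nodes.conf
    # Data directory
    dir /var/lib/redis
    # Maximum number of simultaneously connected clients
    maxclients 10000
    # Log verbosity level
    loglevel notice
    ###############
    # Maximum amount of memory Redis may use; helps prevent crashes from memory exhaustion
    maxmemory 1000mb
    # Eviction policy; choose one that fits the workload
    maxmemory-policy volatile-lru
    # Persistence
    appendonly yes
    # AOF sync policy, e.g. everysec (sync once per second), to balance performance and data safety
    appendfsync everysec
    # AOF file name
    appendfilename "appendonly.aof"
    # Conditions for automatic AOF rewrite
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
    # Allow incremental fsync during AOF rewrite; this helps reduce latency and the impact of fsync on application performance
    aof-rewrite-incremental-fsync yes
    # Memory usage optimization
    hash-max-ziplist-entries 512
    hash-max-ziplist-value 64kb
    # Frequency of background tasks
    # A higher value shortens Redis response time and improves performance, but also increases resource (CPU) usage
    # Best tuned by testing against monitoring data such as CPU, memory, and network
    # It can also be set according to the performance requirements of the workload
    hz 10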
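The sed expression in fix-pod-ip.sh only rewrites the line flagged `myself`, leaving the addresses of the other nodes untouched. The sketch below reproduces that behaviour on a tiny nodes.conf-style file. The node IDs and IP addresses are made up for illustration, the function name is hypothetical, and the file is rewritten through a temporary copy instead of `sed -i` only to keep the demo portable.

```shell
#!/bin/sh
# Replace the first IPv4 address on the "myself" line of a nodes.conf-style file.
# $1 = file, $2 = new IP. Uses the same sed expression as fix-pod-ip.sh.
rewrite_myself_ip() {
  sed -e '/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/'"$2"'/' "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Demo data: node IDs and addresses are invented for this example.
demo=$(mktemp)
cat > "$demo" <<'EOF'
aaa111 10.244.1.5:6379@16379 myself,slave bbb222 0 0 1 connected
bbb222 10.244.2.7:6379@16379 master - 0 0 1 connected 0-5460
EOF

rewrite_myself_ip "$demo" 10.244.3.9
cat "$demo"
```

After the call, only the first line carries the new address; the master's line is unchanged.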

Redis cluster headless Service

kubectl apply  -f  headless-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: redis-cluster
  labels:
    app: redis
spec:
  ports:
  - name: redis-port
    port: 6379
  - name: redis-cluster-port
    port: 16379
  clusterIP: None
  selector:
    app: redis
    appCluster: redis-cluster

Exposing an entry point for accessing the cluster

kubectl apply  -f  redis-cluster-access-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: redis-cluster-access-service
  namespace: redis-cluster
  labels:
    app: redis
spec:
  ports:
  - name: redis-cluster-port
    protocol: TCP
    port: 6379
    targetPort: 6379
    nodePort: 34476
  selector:
    app: redis
    appCluster: redis-cluster
  type: NodePort
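Note that `nodePort: 34476` lies outside the default Kubernetes NodePort range of 30000-32767, so the Service is only accepted on clusters whose API server was started with a wider `--service-node-port-range`. A small sketch of that check follows; the function names are hypothetical.

```shell
#!/bin/sh
# Succeed if the port is inside the default Kubernetes NodePort range.
in_default_nodeport_range() {
  [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

# Print a one-line verdict for a given nodePort value.
check_nodeport() {
  if in_default_nodeport_range "$1"; then
    echo "nodePort $1: inside default range 30000-32767"
  else
    echo "nodePort $1: outside default range, needs a custom --service-node-port-range"
  fi
}

check_nodeport 34476
```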

Redis cluster StatefulSet manifest

kubectl apply -f redis-statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-app
  namespace: redis-cluster
spec:
  serviceName: "redis-service"
  selector:
    matchLabels:
      app: redis
  replicas: 6
  template:
    metadata:
      labels:
        app: redis
        appCluster: redis-cluster
    spec:
      terminationGracePeriodSeconds: 20
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - redis
              topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis:6.2.12
        command: ["/etc/redis/fix-pod-ip.sh", "redis-server", "/etc/redis/redis.conf"]
        resources:
          requests:
            cpu: "100m"      # adjust to the workload
            memory: "100Mi"  # adjust to the workload
        env:
        - name: POD_IP  # the Pod's current IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        ports:
        - name: redis
          containerPort: 6379
          protocol: "TCP"
        - name: cluster
          containerPort: 16379
          protocol: "TCP"
        volumeMounts:
        - name: "redis-conf"
          mountPath: "/etc/redis"
          readOnly: false
        - name: "redis-data"
          mountPath: "/var/lib/redis"
          readOnly: false
      volumes:
      - name: "redis-conf"
        configMap:
          name: "redis-conf"
          defaultMode: 0755
  volumeClaimTemplates:
  - metadata:
      name: redis-data
      namespace: redis-cluster
    spec:
      accessModes: [ "ReadWriteMany" ]
      resources:
        requests:
          storage: 1000M
      storageClassName: nfs
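The container `command` starts fix-pod-ip.sh and passes `redis-server /etc/redis/redis.conf` as its arguments; the script does its preparation and then ends with `exec "$@"`, so the shell process is replaced by Redis. The sketch below illustrates this entrypoint-wrapper pattern with a harmless command. The function name is hypothetical, and a subshell is used only so that the demo does not replace the calling shell.

```shell
#!/bin/sh
# Entrypoint-wrapper pattern: run a preparation step, then hand over to the
# command given as arguments. In fix-pod-ip.sh the hand-over is `exec "$@"`,
# which replaces the script process with redis-server.
run_wrapped() {
  echo "pre-start step done" >&2   # stands in for the nodes.conf rewrite
  ( exec "$@" )                    # execute the remaining arguments as the command
}

# Here "echo" plays the role of the real binary.
run_wrapped echo redis-server /etc/redis/redis.conf
```

Because the wrapper hands over with `exec`, Redis becomes the container's main process and receives termination signals directly when the Pod is stopped.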

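Testing and results

Simulated test

Manually make one replica node fail and then restart. The failure is induced by changing the image of that node's Pod: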

kubectl edit -n redis-cluster po redis-app-2 -o yaml


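Test results

Then watch the Pod's log output.
[Screenshot: Pod log output]
The log shows that after the Pod restarted with a new IP, fix-pod-ip.sh wrote the new IP into the cluster node configuration file /var/lib/redis/nodes.conf. The node then rejoined the cluster with its new Pod IP and synchronized with its master until replication completed successfully.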
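Besides reading the logs, the cluster state can be checked from the output of `CLUSTER NODES`. The sketch below counts nodes whose flags field contains `fail` or `fail?`. The sample output is made up for illustration, and the function name is hypothetical.

```shell
#!/bin/sh
# Count nodes whose flags (3rd column of CLUSTER NODES output) include
# "fail" or the tentative "fail?". Flags are compared exactly, so a flag
# such as "nofailover" is not counted.
count_failed_nodes() {
  awk '{
    n_flags = split($3, f, ",")
    for (i = 1; i <= n_flags; i++)
      if (f[i] == "fail" || f[i] == "fail?") { bad++; break }
  } END { print bad + 0 }'
}

# Usage against the live cluster (not run here):
#   kubectl exec -n redis-cluster redis-app-0 -- redis-cli cluster nodes | count_failed_nodes

# Demo with invented output
sample='aaa111 10.244.3.9:6379@16379 myself,slave bbb222 0 0 1 connected
bbb222 10.244.2.7:6379@16379 master - 0 0 1 connected 0-5460
ccc333 10.244.1.8:6379@16379 master,fail - 0 0 2 disconnected 5461-10922'
printf '%s\n' "$sample" | count_failed_nodes
```

A result of 0 after the restarted Pod has rejoined would be consistent with the recovery described above.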

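Conclusion

1. fix-pod-ip.sh handles the case where a Pod in the Redis cluster is recreated and its Pod IP changes: it replaces the old Pod IP with the new one in /var/lib/redis/nodes.conf. Without this, the cluster ends up in an abnormal state.
2. The update script is stored in the ConfigMap and mounted into the Pod. After the Pod changes state and before the Redis instance starts, the script rewrites the nodes.conf cluster configuration file with the correct IP address, which avoids the cluster failure in most cases.
3. This approach is not a perfect solution: the update can still fail. Add alerting or monitoring to the script so that an abnormal cluster Pod is noticed immediately and someone can step in.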
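The alerting mentioned in the last point could hook into the "do something here" branch of fix-pod-ip.sh. The following is only a sketch: `ALERT_WEBHOOK_URL` is a hypothetical environment variable that would have to be injected into the container, the JSON fields are invented, and `curl` is assumed to be installed, which is not guaranteed for the stock Redis image.

```shell
#!/bin/sh
# Build a small JSON alert body. Assumes the values contain no quotes or
# backslashes, which holds for Pod names and IPv4 addresses.
alert_payload() {
  printf '{"pod":"%s","ip":"%s","reason":"%s"}\n' "$1" "$2" "$3"
}

# Post the alert to a webhook; silently skip when no URL is configured.
# $1 = pod name, $2 = pod IP, $3 = reason
send_alert() {
  [ -n "${ALERT_WEBHOOK_URL:-}" ] || { echo "ALERT_WEBHOOK_URL not set, skipping alert" >&2; return 0; }
  curl -fsS -m 5 -H 'Content-Type: application/json' \
    -d "$(alert_payload "$@")" "$ALERT_WEBHOOK_URL" >/dev/null
}

# Possible call from fix-pod-ip.sh before "exit 1" (not run here):
#   send_alert "$(hostname)" "${POD_IP:-unknown}" "pod ip missing"
```

Keeping the payload builder separate from the network call makes the message format easy to check without a webhook endpoint.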
