Background:
In Kubernetes, high availability for a component is achieved by running multiple replicas of it, e.g. multiple copies of the apiserver, scheduler, and controller-manager. The apiserver is stateless, so every replica can serve at the same time; the scheduler and controller-manager are stateful, with only one replica allowed to be active at any moment, so they need leader election.
Kubernetes implements this high availability through leaderelection. Among Kubernetes' own components, kube-scheduler and kube-controller-manager both use leader election; this election mechanism is how Kubernetes guarantees their high availability. Under normal conditions only one of the replicas of kube-scheduler or kube-controller-manager is running the business logic, while the others continuously try to acquire the lock and compete for leadership until one of them becomes the leader. If the running leader exits for some reason, or loses the lock, the remaining replicas compete for a new leader; whoever acquires leadership then executes the business logic.
This election strategy is not limited to Kubernetes' own components: services we define ourselves can use the same algorithm for leader election. The client-go package exposes an interface for this, under client-go/tools/leaderelection.
Retrofit plan:
Stateless components: simply adjust the replica count; since they are stateless, scaling up is enough.
Stateful components: use leader election, so that only one replica runs the business logic while the others keep trying to acquire the lock.
The election implementation in detail:
Reference code for the election:
/*
Copyright 2018 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package main

import (
	"context"
	"flag"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/google/uuid"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func buildConfig(kubeconfig string) (*rest.Config, error) {
	if kubeconfig != "" {
		cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
		if err != nil {
			return nil, err
		}
		return cfg, nil
	}

	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	klog.InitFlags(nil)

	var kubeconfig string
	var leaseLockName string
	var leaseLockNamespace string
	var id string

	flag.StringVar(&kubeconfig, "kubeconfig", "", "absolute path to the kubeconfig file")
	flag.StringVar(&id, "id", uuid.New().String(), "the holder identity name")
	flag.StringVar(&leaseLockName, "lease-lock-name", "", "the lease lock resource name")
	flag.StringVar(&leaseLockNamespace, "lease-lock-namespace", "", "the lease lock resource namespace")
	flag.Parse()

	if leaseLockName == "" {
		klog.Fatal("unable to get lease lock resource name (missing lease-lock-name flag).")
	}
	if leaseLockNamespace == "" {
		klog.Fatal("unable to get lease lock resource namespace (missing lease-lock-namespace flag).")
	}

	// leader election uses the Kubernetes API by writing to a
	// lock object, which can be a LeaseLock object (preferred),
	// a ConfigMap, or an Endpoints (deprecated) object.
	// Conflicting writes are detected and each client handles those actions
	// independently.
	// build the client config from the kubeconfig flag, falling back to
	// the in-cluster config
	config, err := buildConfig(kubeconfig)
	if err != nil {
		klog.Fatal(err)
	}
	client := clientset.NewForConfigOrDie(config)

	// run holds the business logic executed once we become the leader
	run := func(ctx context.Context) {
		// complete your controller loop here
		klog.Info("Controller loop...")

		select {}
	}

	// use a Go context so we can tell the leaderelection code when we
	// want to step down
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// listen for interrupts or the Linux SIGTERM signal and cancel
	// our context, which the leader election code will observe and
	// step down
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-ch
		klog.Info("Received termination, signaling shutdown")
		cancel()
	}()

	// we use the Lease lock type since edits to Leases are less common
	// and fewer objects in the cluster watch "all Leases".
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      leaseLockName,
			Namespace: leaseLockNamespace,
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: id,
		},
	}

	// start the leader election code loop
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock: lock,
		// IMPORTANT: you MUST ensure that any code you have that
		// is protected by the lease must terminate **before**
		// you call cancel. Otherwise, you could have a background
		// loop still running and another process could
		// get elected before your background loop finished, violating
		// the stated goal of the lease.
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// we're notified when we start - this is where you would
				// usually put your code: logic to run after becoming leader
				run(ctx)
			},
			OnStoppedLeading: func() {
				// we can do cleanup here: logic to run after losing leadership
				klog.Infof("leader lost: %s", id)
				os.Exit(0)
			},
			OnNewLeader: func(identity string) {
				// we're notified when a new leader is elected
				if identity == id {
					// I just got the lock
					return
				}
				klog.Infof("new leader elected: %s", identity)
			},
		},
	})
}
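With the flags defined above, the election can be exercised locally by running two competing copies of the example against the same Lease (the lock name and namespace here are arbitrary):

$ go run main.go -kubeconfig=$HOME/.kube/config -lease-lock-name=example -lease-lock-namespace=default -id=1
$ go run main.go -kubeconfig=$HOME/.kube/config -lease-lock-name=example -lease-lock-namespace=default -id=2

Killing the process that currently holds the lock should make the other one acquire the Lease and log "Controller loop...".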
Embedding in the business code:
The usual approach is to embed the election logic directly in the business code, as in the example above: start serving only after being elected leader, and stop serving once leadership is lost.
Pros: low resource usage, no sidecar needed; only one business container serves at a time.
Cons: the business code has to be modified, although with the official library the change should be small.
Sidecar pattern 1:
Leave the business code untouched and deploy a leader-election sidecar alongside the business container.
Sidecar implementation idea (a Go sketch follows the status output below):
When the sidecar becomes leader it starts a web server listening on port 8080; when it is not the leader it stops the web server.
In the Deployment, point the sidecar container's readinessProbe at port 8080, so pod readiness tracks the web server's state (see the Deployment fragment below).
Pros: no changes to the business code; just an extra sidecar container that runs the election.
Cons: only one sidecar is ever Ready, so the update strategy has to be changed to Recreate; on restart you must wait for the old instances to exit before new ones start, which also degrades availability.
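For reference, a minimal sketch of the relevant Deployment fields for this pattern; the image names and container layout are illustrative assumptions, not the actual manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: election-example
  namespace: default
spec:
  replicas: 2
  strategy:
    type: Recreate        # required: with only one Ready replica, a RollingUpdate would stall
  selector:
    matchLabels:
      app: election-example
  template:
    metadata:
      labels:
        app: election-example
    spec:
      serviceAccountName: election-example   # see the RBAC section below
      containers:
      - name: business                 # unchanged business container
        image: business-app:latest     # hypothetical image
      - name: elector                  # leader-election sidecar
        image: election-sidecar:latest # hypothetical image
        readinessProbe:                # Ready only while the sidecar serves, i.e. is the leader
          httpGet:
            path: /
            port: 8080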
$ kubectl get pod -n default | grep election-example
election-example-789687f864-mvtkn 1/2 Running 0 89s
election-example-789687f864-rb5gt 2/2 Running 0 3m22s
Check the Deployment status; only one replica is ready:
$ kubectl get deployments -n default election-example
NAME READY UP-TO-DATE AVAILABLE AGE
election-example 0/2 2 0 34d
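A minimal Go sketch of such a sidecar's election callbacks, reusing the LeaseLock setup from the full example above; runSidecar and the answer-everything handler are illustrative assumptions, not a complete implementation:

package elector

import (
	"context"
	"net/http"
	"sync"
	"time"

	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

// runSidecar blocks running the election; lock and id are built exactly
// as in the full example above.
func runSidecar(ctx context.Context, lock *resourcelock.LeaseLock, id string) {
	var mu sync.Mutex
	var srv *http.Server

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// leader: start the web server so the readinessProbe on
				// port 8080 succeeds and this pod becomes Ready
				mu.Lock()
				srv = &http.Server{
					Addr: ":8080",
					// answer 200 on every path, including the probe's
					Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
						w.Write([]byte("leader"))
					}),
				}
				s := srv
				mu.Unlock()
				klog.Info("became leader, starting web server")
				if err := s.ListenAndServe(); err != nil && err != http.ErrServerClosed {
					klog.Error(err)
				}
			},
			OnStoppedLeading: func() {
				// no longer leader: stop the web server so the
				// readinessProbe fails and the pod leaves Ready
				klog.Infof("leader lost: %s", id)
				mu.Lock()
				if srv != nil {
					srv.Shutdown(context.Background())
				}
				mu.Unlock()
			},
		},
	})
}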
Sidecar pattern 2:
Make a small change to the business code and deploy a leader-election sidecar alongside it.
Idea:
The sidecar runs a web server listening on port 8080 and exposes an HTTP endpoint that reports whether it is currently the leader (sketched below).
The business container does not implement any election code itself, but it periodically checks whether its sidecar is the leader, and only serves while it is.
This is a compromise for services where embedding the election code directly is difficult.
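A minimal sketch of the pattern 2 sidecar's status endpoint, assuming a /leader path and an atomic flag flipped by the election callbacks (the path and the response format are illustrative assumptions, not a fixed interface; atomic.Bool needs Go 1.19+):

package main

import (
	"net/http"
	"sync/atomic"
)

// isLeader is flipped by the election callbacks:
// OnStartedLeading stores true, OnStoppedLeading stores false.
var isLeader atomic.Bool

func main() {
	// the business container polls this endpoint periodically and
	// serves only while it gets back a 200 / "true"
	http.HandleFunc("/leader", func(w http.ResponseWriter, r *http.Request) {
		if isLeader.Load() {
			w.Write([]byte("true"))
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte("false"))
	})
	http.ListenAndServe(":8080", nil)
}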
Access control:
To run the election, the workload needs the corresponding RBAC permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: election-example
  namespace: default
rules:
- apiGroups:
  - "coordination.k8s.io"
  resources:
  - leases
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  verbs:
  - get
  - create
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: election-example
  namespace: default
subjects:
- kind: ServiceAccount
  name: election-example
roleRef:
  kind: Role
  name: election-example
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: election-example
  namespace: default
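Note that these permissions only take effect if the pod actually runs as this ServiceAccount; the Deployment's pod template must reference it, e.g.:

spec:
  template:
    spec:
      serviceAccountName: election-example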