During recent maintenance of a Kubernetes cluster, I noticed that multiple Pods using the same image kept being scheduled onto one fixed node, leaving node resources across the cluster unevenly utilized. After enabling the scheduler's scoring logs, it became clear that the ImageLocality scoring plugin was the cause: only one node already had the image for these Pods, so the scheduler always gave that node the highest score. Disabling the plugin resolved the uneven scheduling. I'm sharing the steps here in the hope that they are useful to others.
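Before changing anything, it is worth confirming the symptom. The namespace, label, and image below are only placeholders taken from my environment; adjust them to your own workload.

# Hypothetical check: see which nodes the replicas of one workload ended up on
kubectl get pods -n test-5 -l app=mysql-server -o wide
# On each node, check whether the image is already present in the local cache
# (run on the node itself; requires crictl)
crictl images | grep mysql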
1. Write a custom scheduler configuration file
# vi /etc/kubernetes/config.yaml
# Disable the ImageLocality scoring plugin here (set its weight to 0)
...
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: ImageLocality
        weight: 0
...
- (Optional) A second way to disable the scoring plugin, by removing ImageLocality from the score extension point; use whichever variant fits your environment.
# vi /etc/kubernetes/config.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      disabled:
      - name: ImageLocality
2. Configure kube-scheduler
# Edit kube-scheduler.yaml
# Add the --config flag to spec.containers[0].command, and mount the configuration file
vi /etc/kubernetes/manifests/kube-scheduler.yaml
...
- --config=/etc/kubernetes/config.yaml
...
    - mountPath: /etc/kubernetes/config.yaml    # add this mount
      name: config
      readOnly: true
...
  - hostPath:
      path: /etc/kubernetes/config.yaml
      type: FileOrCreate
    name: config
...
# Wait for kube-scheduler to restart automatically
kubectl get pod -n kube-system
# The complete kube-scheduler.yaml looks like this:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --config=/etc/kubernetes/config.yaml
    - --v=10
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.31.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/config.yaml
      name: config
      readOnly: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/config.yaml
      type: FileOrCreate
    name: config
status: {}
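kubelet watches the static Pod manifests, so saving the file is normally enough for kube-scheduler to restart on its own. If it does not come back, the commands below are a common workaround for a kubeadm-style setup (manifest directory /etc/kubernetes/manifests); treat them as a sketch rather than a required step.

# Confirm the scheduler restarted with the new --config flag
kubectl -n kube-system get pod -l component=kube-scheduler
# If it did not restart, move the manifest out and back:
# kubelet removes the Pod when the file disappears and recreates it when it returns
mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
sleep 5
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
# As a last resort, restart kubelet on the control-plane node
systemctl restart kubelet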
3. Verify that the custom configuration took effect
# How to enable verbose kube-scheduler logging is covered in another post:
# https://blog.csdn.net/mm1234556/article/details/148686859
# Dump the loaded configuration from the scheduler log
kubectl logs -n kube-system kube-scheduler-master |grep -A50 apiVersion:
# Manually create a Pod and inspect the scoring details in the kube-scheduler log
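Any throwaway workload will do for this test. The Pod below is only an illustration; its name, image, and resource requests are placeholders (the namespace test-5 matches the one that shows up in my logs).

# scoring-test.yaml - hypothetical Pod used only to trigger one scheduling decision
apiVersion: v1
kind: Pod
metadata:
  name: scoring-test
  namespace: test-5
spec:
  containers:
  - name: app
    image: nginx:1.27
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
# kubectl apply -f scoring-test.yaml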
kubectl logs kube-scheduler-master -n kube-system |grep -A10 score
# Detailed log output below: ImageLocality no longer appears among the scoring plugins, and the Pod is finally scheduled to node04
I0623 01:28:10.258309 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node02: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3200 memory:5674337280] ,score 94,
I0623 01:28:10.258309 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node01: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3820 memory:6427770880] ,score 94,
I0623 01:28:10.258312 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node04: NodeResourcesBalancedAllocation, map of allocatable resources map[cpu:8000 memory:32122888192], map of requested resources map[cpu:1050 memory:1468006400] ,score 91,
I0623 01:28:10.258334 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node02: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3200 memory:5674337280] ,score 85,
I0623 01:28:10.258339 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node04: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:8000 memory:32122888192], map of requested resources map[cpu:1050 memory:1468006400] ,score 90,
I0623 01:28:10.258338 1 resource_allocation.go:78] mysql-server-68468bcd96-j66bj -> node01: NodeResourcesLeastAllocated, map of allocatable resources map[cpu:28000 memory:32772333568], map of requested resources map[cpu:3820 memory:6427770880] ,score 83,
I0623 01:28:10.258375 1 generic_scheduler.go:504] Plugin NodePreferAvoidPods scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 1000000} {node04 1000000} {node02 1000000}]
I0623 01:28:10.258384 1 generic_scheduler.go:504] Plugin PodTopologySpread scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258389 1 generic_scheduler.go:504] Plugin TaintToleration scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 100} {node04 100} {node02 100}]
I0623 01:28:10.258393 1 generic_scheduler.go:504] Plugin NodeResourcesBalancedAllocation scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 94} {node04 91} {node02 94}]
I0623 01:28:10.258396 1 generic_scheduler.go:504] Plugin InterPodAffinity scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258400 1 generic_scheduler.go:504] Plugin NodeResourcesLeastAllocated scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 83} {node04 90} {node02 85}]
I0623 01:28:10.258404 1 generic_scheduler.go:504] Plugin NodeAffinity scores on test-5/mysql-server-68468bcd96-j66bj => [{node01 0} {node04 0} {node02 0}]
I0623 01:28:10.258409 1 generic_scheduler.go:560] Host node01 => Score 1000277
I0623 01:28:10.258412 1 generic_scheduler.go:560] Host node04 => Score 1000281
I0623 01:28:10.258414 1 generic_scheduler.go:560] Host node02 => Score 1000279
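With ImageLocality out of the scoring phase, new replicas should stop piling up on the single node that happens to have the image cached. A quick spot check (again, the Deployment name and namespace are placeholders from my environment):

kubectl -n test-5 scale deployment mysql-server --replicas=3
kubectl -n test-5 get pods -o wide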