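This article addresses the following problems:
- Pods cannot reach a ClusterIP
- DNS lookups from busybox
- Pod-to-pod traffic and access to external networks
Problem 1: pods cannot reach a ClusterIP
step0 - kube-proxy logs
The kube-proxy log contains the warning Unknown proxy mode "", assuming iptables proxy, which means no proxy mode is configured and kube-proxy falls back to iptables: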
$ kubectl logs -n kube-system kube-proxy-5n29r | more
W0720 03:22:47.942827 1 server_others.go:559] Unknown proxy mode "", assuming iptables proxy
I0720 03:22:48.245820 1 node.go:136] Successfully retrieved node IP: 10.160.18.183
I0720 03:22:48.245876 1 server_others.go:186] Using iptables Proxier.
I0720 03:22:48.246253 1 server.go:583] Version: v1.18.3
I0720 03:22:48.395170 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0720 03:22:48.395210 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0720 03:22:48.395578 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0720 03:22:48.414004 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0720 03:22:48.414067 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0720 03:22:48.533638 1 config.go:315] Starting service config controller
I0720 03:22:48.533673 1 shared_informer.go:223] Waiting for caches to sync for service config
I0720 03:22:48.533997 1 config.go:133] Starting endpoints config controller
I0720 03:22:48.534016 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
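To check the active proxier without reading the whole log, the relevant line can be extracted with a small filter. This is only a sketch: the helper name proxier_from_log is hypothetical, and it assumes the log keeps the "Using <mode> Proxier." wording shown above.

```shell
# Print the proxier that a kube-proxy log reports (e.g. "iptables" or "ipvs").
# Reads the log on stdin, so it works on saved logs as well as live output.
proxier_from_log() {
  # Keep only the mode word from lines like "... Using iptables Proxier."
  sed -n 's/.*Using \([a-z]*\) Proxier\..*/\1/p' | tail -n 1
}

# Usage against the pod from the log above:
# kubectl logs -n kube-system kube-proxy-5n29r | proxier_from_log
```

If the function prints nothing, the log did not contain a matching line and should be read by hand.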
step1 - Install the required packages
# On every node
$ apt install ipset ipvsadm
step2 - Load the kernel modules
$ modprobe -- ip_vs
$ modprobe -- ip_vs_rr
$ modprobe -- ip_vs_wrr
$ modprobe -- ip_vs_sh
$ modprobe -- nf_conntrack
# Note: many posts say nf_conntrack_ipv4, but on Ubuntu 20.04 that module is now nf_conntrack
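# Check the loaded modules (the module is named ip_vs, so the ipvs pattern matches nothing here; every line below matches nf_conntrack)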
$ lsmod | grep -e ipvs -e nf_conntrack
nf_conntrack_netlink 45056 0
nfnetlink 16384 3 nf_conntrack_netlink,ip_set
nf_conntrack 139264 5 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE,ip_vs
nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs
nf_defrag_ipv4 16384 1 nf_conntrack
libcrc32c 16384 3 nf_conntrack,nf_nat,ip_vs
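# Note: on Ubuntu 20.04 these modules may already be loaded by default
Note: if the IPVS modules are not loaded by default, arrange for them to be loaded at boot as well (for example with a startup script), because modules loaded with modprobe do not persist across reboots.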
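One way to load the modules at boot, instead of a custom script, is a modules-load.d file. The sketch below assumes a systemd-based system such as Ubuntu 20.04, where files matching /etc/modules-load.d/*.conf are read at boot. The function name and the file name ipvs.conf are illustrative choices, not something the original setup prescribes.

```shell
# Write the module list from step2 to the given file, one module per line.
write_ipvs_modules_conf() {
  target="$1"
  printf '%s\n' ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack > "$target"
}

# Usage: generate the file as a normal user, then install it as root.
# write_ipvs_modules_conf /tmp/ipvs.conf
# sudo install -m 0644 /tmp/ipvs.conf /etc/modules-load.d/ipvs.conf
```

Generating the file in a temporary location first keeps the function free of sudo and makes the result easy to inspect before it is installed.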
step3 - Edit the kube-proxy configuration
Set the mode field in the kube-proxy ConfigMap to ipvs:
$ kubectl edit configmap kube-proxy -n kube-system
...
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: "ipvs"
nodePortAddresses: null
...
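The same change can be made without an interactive editor by piping the ConfigMap through a filter. This is a rough sketch rather than the procedure used above: set_ipvs_mode is a hypothetical helper, and it assumes the mode line has the form mode: "..." as in the excerpt.

```shell
# Rewrite any line of the form `mode: "<anything>"` to `mode: "ipvs"`,
# preserving the original indentation. All other lines pass through unchanged.
set_ipvs_mode() {
  sed 's/^\([[:space:]]*\)mode: *"[^"]*"$/\1mode: "ipvs"/'
}

# Usage: fetch the ConfigMap, patch the mode, and write it back.
# kubectl get configmap kube-proxy -n kube-system -o yaml \
#   | set_ipvs_mode \
#   | kubectl replace -f -
```

Inspect the output of the first two pipeline stages before piping it into kubectl replace, since the filter rewrites every line that matches the pattern.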
step4 - Restart kube-proxy
You can delete the kube-proxy pods one at a time and let Kubernetes recreate them, or delete them all at once:
$ kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'
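The bulk delete can also be split into a name filter and a single delete call, which is easier to dry-run. A minimal sketch, assuming the pod names start with kube-proxy- as in the logs in this article. The helper name kube_proxy_pods is made up for illustration.

```shell
# Print the names of kube-proxy pods from `kubectl get pod` output on stdin.
kube_proxy_pods() {
  awk '$1 ~ /^kube-proxy-/ { print $1 }'
}

# Usage: list the pods first, then delete them all in one kubectl call.
# kubectl get pod -n kube-system --no-headers | kube_proxy_pods
# kubectl get pod -n kube-system --no-headers | kube_proxy_pods \
#   | xargs kubectl delete pod -n kube-system
```

Running the filter on its own first shows exactly which pods will be deleted.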
Check the kube-proxy log:
$ kubectl logs -n kube-system kube-proxy-44zw5
I0720 05:37:30.026304 1 node.go:136] Successfully retrieved node IP: 10.160.18.181
I0720 05:37:30.026349 1 server_others.go:259] Using ipvs Proxier.
W0720 05:37:30.026600 1 proxier.go:429] IPVS scheduler not specified, use rr by default
I0720 05:37:30.026814 1 server.go:583] Version: v1.18.3
I0720 05:37:30.027200 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0720 05:37:30.027452 1 config.go:133] Starting endpoints config controller
I0720 05:37:30.027474 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
I0720 05:37:30.027507 1 config.go:315] Starting service config controller
I0720 05:37:30.027529 1 shared_informer.go:223] Waiting for caches to sync for service config
I0720 05:37:30.127736 1 shared_informer.go:230] Caches are synced for endpoints config
I0720 05:37:30.127790 1 shared_informer.go:230] Caches are synced for service config
The line Using ipvs Proxier confirms that IPVS is now in use.
You can now start a busybox container and ping the CoreDNS ClusterIP.
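Besides the log line, the IPVS table on a node can confirm that kube-proxy has actually programmed services. The following sketch counts the virtual services in the output of ipvsadm -Ln (installed in step1). The helper name is hypothetical, and it assumes the usual listing format in which each service line begins with its protocol.

```shell
# Count virtual services in `ipvsadm -Ln` output read from stdin.
# Service lines begin with the protocol (TCP, UDP or SCTP); the real-server
# lines beneath them are indented and begin with "->", so they are skipped.
count_ipvs_services() {
  grep -c -E '^(TCP|UDP|SCTP)[[:space:]]'
}

# Usage on a node (ipvsadm needs root):
# sudo ipvsadm -Ln | count_ipvs_services
```

A count of zero after the restart would suggest that kube-proxy is not running in IPVS mode on that node.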
Problem 2: DNS lookups from busybox fail
step0 - Symptoms
With the pod now able to reach the DNS ClusterIP, busybox still could not resolve the IP of a service:
$ kubectl run busybox -i --tty --image busybox