Multi-Cluster Networking: Submariner Integration

lomtom

Introduction

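In today's cloud and container environments, organizations often run several Kubernetes clusters to meet different business needs. For example, production and test environments may run in separate clusters so that testing cannot affect production, or data centers in different regions may each run their own cluster to improve processing efficiency and response time.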

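These clusters usually still need to talk to each other in order to share resources and call services. A microservice application may spread its services across clusters that must call one another, or an organization may need to synchronize and back up data between data centers in different regions.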

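Submariner addresses this cross-cluster networking problem. It lets Pods and Services in different clusters communicate as if they were in the same cluster. The rest of this article walks through installing and configuring Submariner.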

Installation

Prerequisites

Before installing Submariner, make sure the following prerequisites are met. Submariner depends on each of them to work correctly.

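  1. Prepare at least two Kubernetes clusters: each cluster has at least one node, and the clusters can reach each other. Submariner's job is cross-cluster networking, so it cannot work if the clusters cannot reach each other at all.
  2. Keep the Pod CIDRs and Service CIDRs of the clusters non-overlapping where possible; otherwise Globalnet is required. These CIDRs are the ranges from which a cluster allocates IP addresses, and overlapping ranges can cause address conflicts that break communication.
  3. Each cluster already has a network plugin installed, for example Calico (used in this article) or Flannel. The plugin handles networking inside the cluster, and Submariner builds on it for cross-cluster traffic. With Calico, two more conditions apply:
    • Install the Calico API Server: Submariner needs the ippools.projectcalico.org/v3 CRD rather than ippools.crd.projectcalico.org/v1, and the v3 resource is provided by the Calico API Server.
    • Change Calico's encapsulation mode to VXLAN (the default is IPIP).
  4. Change the kube-proxy mode to iptables (the clusters used here defaulted to ipvs).
  5. Disable nodelocaldns: Submariner does not support it. nodelocaldns is a node-local DNS cache for Kubernetes; the way Submariner modifies the nodelocaldns configuration file is buggy, so disabling it avoids the problem.
  6. Add a label: in each cluster, pick one node and label it submariner.io/gateway=true. Submariner uses these nodes as Gateway nodes, which forward traffic between clusters.
  7. Add an annotation (only when the private networks cannot reach each other): if the two clusters cannot reach each other over their private networks, annotate the Gateway node with its public IP: kubectl annotate node --all gateway.submariner.io/public-ip=ipv4:<public-ip> (replace <public-ip> with the real address; note that --all annotates every node, so pass the Gateway node's name instead to target only that node)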
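
Prerequisite 2 is easy to get wrong when checking by eye. The following is a minimal POSIX shell sketch for testing whether two IPv4 CIDRs overlap. It is not part of Submariner: the function names ip_to_int, cidr_range, and cidr_overlap are made up for this illustration, and IPv6 is not handled. The sample ranges are the ones that appear in the subctl show all output later in this article.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# The ( ... ) body runs in a subshell, so the IFS change stays local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print "start end" (as integers) for an IPv4 CIDR such as 10.96.0.0/22.
cidr_range() (
  ip=${1%/*}
  prefix=${1#*/}
  base=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  start=$(( base & mask ))
  end=$(( start | (~mask & 0xFFFFFFFF) ))
  echo "$start $end"
)

# Return 0 if the two CIDRs share at least one address, 1 otherwise.
cidr_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # $1/$2 = start/end of the first range, $3/$4 = start/end of the second
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Report one pair of CIDRs.
check_pair() {
  if cidr_overlap "$1" "$2"; then
    echo "overlap: $1 $2"
  else
    echo "ok: $1 $2"
  fi
}

check_pair 100.64.0.0/10 100.128.0.0/10   # Pod CIDRs of cluster and cluster1
check_pair 10.96.0.0/22 10.96.4.0/22      # Service CIDRs of cluster and cluster1
```

If any pair is reported as overlapping, the clusters need different CIDRs or Globalnet.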

Planning

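Before the actual installation, plan the clusters: record each cluster's context (kubeconfig) path, node, role, and public IP so that the later installation and configuration steps go smoothly. The plan used here is: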

Cluster    Context path       Node    Role                 public-ip
cluster    ~/.kube/config     node1   broker && operator   10.53.23.11
cluster1   ~/.kube/config-1   node2   operator             10.54.10.7

Note: unless stated otherwise, the following steps are run on node1 of cluster. The Submariner version is v0.20.0.
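
With the plan fixed, the gateway label from the prerequisites can be applied to the planned nodes. This is only a sketch: label_gateway is a hypothetical helper, and the KUBECTL override is an assumption added here so that the command can be previewed with KUBECTL=echo before touching a cluster.

```shell
# Assumption: KUBECTL may be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL=${KUBECTL:-kubectl}

# label_gateway <kubeconfig> <node>
# Marks one node as the Submariner gateway for the cluster behind <kubeconfig>.
label_gateway() {
  "$KUBECTL" --kubeconfig "$1" label node "$2" submariner.io/gateway=true --overwrite
}

# Per the plan table:
#   label_gateway "$HOME/.kube/config"   node1
#   label_gateway "$HOME/.kube/config-1" node2
```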

Install submariner-broker (Helm)

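With the plan in place, install submariner-broker first.

submariner-broker is one of Submariner's core components. It holds the connection metadata (such as the Cluster and Endpoint resources) that the participating clusters exchange with each other.

Installing with Helm makes the Submariner components easy to deploy and manage. The commands are: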

# add repo
helm repo add submariner-latest https://submariner-io.github.io/submariner-charts/charts

# export env
export BROKER_NS=submariner-k8s-broker

# install
helm install "${BROKER_NS}" submariner-latest/submariner-k8s-broker \
             --create-namespace \
             --namespace "${BROKER_NS}" \
             --version 0.20.0 

Install submariner-operator (Helm)

After submariner-broker is installed, install submariner-operator.

submariner-operator manages and monitors the Submariner components and keeps them running. It has to be installed on both clusters, cluster and cluster1.

Install on cluster:

# cluster
export BROKER_NS=submariner-k8s-broker
export SUBMARINER_NS=submariner-operator
# The PSK can be pinned to a fixed value; every cluster must use the same PSK
export SUBMARINER_PSK=$(LC_CTYPE=C tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 64 | head -n 1)
# The command above does not work on macOS; pin a fixed value instead
export SUBMARINER_PSK='yxywUMWl85AHqVi0aoVbzPlLEiBb2EnLmZNCF5HxqNHbT44PPnSmOpTHpqyR5nN9'

# broker param
export KUBECONFIG=~/.kube/config
# URL of the broker cluster's API server
export SUBMARINER_BROKER_URL=$(kubectl -n default get endpoints kubernetes -o jsonpath="{.subsets[0].addresses[0].ip}:{.subsets[0].ports[?(@.name=='https')].port}")
# CA certificate of the broker cluster's API server
export SUBMARINER_BROKER_CA=$(kubectl -n "${BROKER_NS}" get secrets submariner-k8s-broker-client-token -o jsonpath="{.data['ca\.crt']}")
# Token for the broker cluster's API server
export SUBMARINER_BROKER_TOKEN=$(kubectl -n "${BROKER_NS}" get secrets submariner-k8s-broker-client-token -o jsonpath="{.data.token}"|base64 --decode)

export KUBECONFIG=~/.kube/config
# set cluster id
export CLUSTER_ID=cluster
# get current cluster pod cidr and service cidr
export CLUSTER_CIDR=$(kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' | grep podSubnet | awk '{print $2}')
export SERVICE_CIDR=$(kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' | grep serviceSubnet | awk '{print $2}')

# install: make sure every variable holds the correct value before running this
helm install submariner-operator submariner-latest/submariner-operator \
        --version 0.20.0 \
        --create-namespace \
        --namespace "${SUBMARINER_NS}" \
        --set ipsec.psk="${SUBMARINER_PSK}" \
        --set broker.server="${SUBMARINER_BROKER_URL}" \
        --set broker.token="${SUBMARINER_BROKER_TOKEN}" \
        --set broker.namespace="${BROKER_NS}" \
        --set broker.ca="${SUBMARINER_BROKER_CA}" \
        --set broker.insecure=true \
        --set submariner.clusterId="${CLUSTER_ID}" \
        --set submariner.clusterCidr="${CLUSTER_CIDR}" \
        --set submariner.serviceCidr="${SERVICE_CIDR}" \
        --set submariner.natEnabled="true" \
        --set submariner.images.repository="swr.cn-east-3.myhuaweicloud.com/lomtom-common" \
        --set operator.image.repository="swr.cn-east-3.myhuaweicloud.com/lomtom-common/submariner-operator"

Install on cluster1:

# cluster1
export BROKER_NS=submariner-k8s-broker
export SUBMARINER_NS=submariner-operator
# The PSK must be identical on every cluster: reuse the exact value used for "cluster"
# (do not generate a new random value here, or the tunnel cannot be established)
export SUBMARINER_PSK='yxywUMWl85AHqVi0aoVbzPlLEiBb2EnLmZNCF5HxqNHbT44PPnSmOpTHpqyR5nN9'

# broker param
export KUBECONFIG=~/.kube/config
# URL of the broker cluster's API server
export SUBMARINER_BROKER_URL=$(kubectl -n default get endpoints kubernetes -o jsonpath="{.subsets[0].addresses[0].ip}:{.subsets[0].ports[?(@.name=='https')].port}")
# CA certificate of the broker cluster's API server
export SUBMARINER_BROKER_CA=$(kubectl -n "${BROKER_NS}" get secrets submariner-k8s-broker-client-token -o jsonpath="{.data['ca\.crt']}")
# Token for the broker cluster's API server
export SUBMARINER_BROKER_TOKEN=$(kubectl -n "${BROKER_NS}" get secrets submariner-k8s-broker-client-token -o jsonpath="{.data.token}"|base64 --decode)

export KUBECONFIG=~/.kube/config-1
# set cluster id
export CLUSTER_ID=cluster1
# get current cluster pod cidr and service cidr
export CLUSTER_CIDR=$(kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' | grep podSubnet | awk '{print $2}')
export SERVICE_CIDR=$(kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' | grep serviceSubnet | awk '{print $2}')

# install: make sure every variable holds the correct value before running this
helm install submariner-operator submariner-latest/submariner-operator \
        --version 0.20.0 \
        --create-namespace \
        --namespace "${SUBMARINER_NS}" \
        --set ipsec.psk="${SUBMARINER_PSK}" \
        --set broker.server="${SUBMARINER_BROKER_URL}" \
        --set broker.token="${SUBMARINER_BROKER_TOKEN}" \
        --set broker.namespace="${BROKER_NS}" \
        --set broker.ca="${SUBMARINER_BROKER_CA}" \
        --set broker.insecure=true \
        --set submariner.clusterId="${CLUSTER_ID}" \
        --set submariner.clusterCidr="${CLUSTER_CIDR}" \
        --set submariner.serviceCidr="${SERVICE_CIDR}" \
        --set submariner.natEnabled="true" \
        --set submariner.images.repository="swr.cn-east-3.myhuaweicloud.com/lomtom-common" \
        --set operator.image.repository="swr.cn-east-3.myhuaweicloud.com/lomtom-common/submariner-operator"
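
Both install blocks ask you to confirm every variable before running helm install, and an empty value (for example when the kubeadm-config lookup returns nothing) is easy to miss. A small guard like the following can catch that. It is a sketch, and require_vars is a hypothetical helper, not something shipped with Helm or Submariner.

```shell
# require_vars NAME...
# Prints each variable that is unset or empty and returns non-zero if any is.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, run just before `helm install submariner-operator ...`:
#   require_vars SUBMARINER_PSK SUBMARINER_BROKER_URL SUBMARINER_BROKER_CA \
#                SUBMARINER_BROKER_TOKEN CLUSTER_ID CLUSTER_CIDR SERVICE_CIDR || exit 1
```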

Install the subctl tool

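subctl is Submariner's command-line tool and makes managing a Submariner deployment easier. If you do not need subctl, skip this step.

There are two ways to install subctl:

Method 1: automatic install

# Method 1: auto install tool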
curl https://get.submariner.io | VERSION=0.20.0 bash
export PATH=$PATH:~/.local/bin
echo export PATH=\$PATH:~/.local/bin >> ~/.profile

Method 2: manual install

# Method 2: manual install tool
# amd64 
wget https://github.com/submariner-io/releases/releases/download/v0.20.0/subctl-v0.20.0-linux-amd64.tar.gz
tar -xvf subctl-v0.20.0-linux-amd64.tar.gz
cp subctl-v0.20.0/subctl /usr/local/bin/

# arm64 
wget https://github.com/submariner-io/releases/releases/download/v0.20.0/subctl-v0.20.0-linux-arm64.tar.gz
tar -xvf subctl-v0.20.0-linux-arm64.tar.gz
cp subctl-v0.20.0/subctl /usr/local/bin/

Verifying the installation

After all the steps above, verify that Submariner is working. There are two things to check: the broker and the operator.

  1. Verify the broker
  • Check that every cluster has joined the broker cluster
kubectl -n submariner-k8s-broker get clusters.submariner.io
NAME       AGE
cluster    35s
cluster1   39s
  • Check that the CRDs were created
# kubectl get crds | grep -iE 'submariner|multicluster.x-k8s.io'
clusters.submariner.io                                2025-05-15T08:09:35Z
endpoints.submariner.io                               2025-05-15T08:09:35Z
gateways.submariner.io                                2025-05-15T08:09:35Z
serviceexports.multicluster.x-k8s.io                  2025-05-15T08:09:35Z
serviceimports.multicluster.x-k8s.io                  2025-05-15T08:09:35Z
  2. Verify the operator
  • Check that the operator pods are running
kubectl get pod -n submariner-operator  
NAME                                             READY   STATUS    RESTARTS   AGE
submariner-gateway-p27px                         1/1     Running   0          21s
submariner-lighthouse-agent-5b678544d4-prm9k     1/1     Running   0          21s
submariner-lighthouse-coredns-56db555d7b-4m4c2   1/1     Running   0          20s
submariner-lighthouse-coredns-56db555d7b-9k6ld   1/1     Running   0          20s
submariner-metrics-proxy-pp5hd                   1/1     Running   0          21s
submariner-operator-785df79474-4k8fc             1/1     Running   0          39s
submariner-routeagent-g9cn5                      1/1     Running   0          21s
  • Run subctl show all to view the connections between the clusters. The output looks like this:
# subctl show all
Cluster "kubernetes"
 Detecting broker(s)
 No brokers found

 Showing Connections
GATEWAY CLUSTER     REMOTE IP    NAT   CABLE DRIVER   SUBNETS                        STATUS      RTT avg.
node1   cluster1    10.54.10.7   yes   libreswan      10.96.4.0/22, 100.128.0.0/10   connected              

 Showing Endpoints
CLUSTER     ENDPOINT IP      PUBLIC IP     CABLE DRIVER   TYPE     
cluster     192.168.23.22    10.53.23.11   libreswan      local    
cluster1    192.168.80.221   10.54.10.7    libreswan      remote   

 Showing Gateways
NODE    HA STATUS   SUMMARY                               
node    active      All connections (1) are established   

 Showing Network details
    Discovered network details via Submariner:
        Network plugin:  calico
        Service CIDRs:   [10.96.0.0/22]
        Cluster CIDRs:   [100.64.0.0/10]

 Showing versions 
COMPONENT                       REPOSITORY                                      CONFIGURED   RUNNING                     ARCH    
submariner-gateway              swr.cn-east-3.myhuaweicloud.com/lomtom-common   0.20.0       release-0.20-f0a5355cabfc   amd64   
submariner-routeagent           swr.cn-east-3.myhuaweicloud.com/lomtom-common   0.20.0       release-0.20-f0a5355cabfc   amd64   
submariner-metrics-proxy        swr.cn-east-3.myhuaweicloud.com/lomtom-common   0.20.0       release-0.20-8fde9372397b   amd64   
submariner-operator             swr.cn-east-3.myhuaweicloud.com/lomtom-common   0.20.0       release-0.20-44970648cf5c   amd64   
submariner-lighthouse-agent     swr.cn-east-3.myhuaweicloud.com/lomtom-common   0.20.0       release-0.20-c9e76a4aee91   amd64   
submariner-lighthouse-coredns   swr.cn-east-3.myhuaweicloud.com/lomtom-common   0.20.0       release-0.20-c9e76a4aee91   amd64  

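A few things to check:

  • Under Showing Connections, STATUS is connected, meaning the inter-cluster connection is established
  • Under Showing Endpoints, TYPE is local or remote, ENDPOINT IP is the node's IP, and PUBLIC IP is the node's public IP
  • Under Showing Gateways, HA STATUS is active, and the connection count in SUMMARY equals the number of clusters other than this one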
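
The first check can be scripted when the verification has to be repeated. The sketch below reads the connections table on stdin and succeeds only if every row contains the word connected. all_connected is a hypothetical helper, and the table layout it expects (a header line starting with GATEWAY, rows ending at the first blank line) is an assumption taken from the sample output above, so it may need adjusting for other subctl versions.

```shell
# all_connected: reads `subctl show connections` style output on stdin.
# Exit status 0 only if there is at least one row and every row is "connected".
all_connected() {
  awk '
    /^GATEWAY/          { in_table = 1; next }   # header line starts the table
    in_table && NF == 0 { in_table = 0 }         # a blank line ends the table
    in_table {
      rows++
      ok = 0
      for (i = 1; i <= NF; i++) if ($i == "connected") ok = 1
      if (!ok) bad++
    }
    END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
  '
}

# Example:
#   subctl show connections | all_connected && echo "all tunnels up"
```

Matching a whole field rather than a substring keeps states such as connecting from being counted as success.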

Verifying service-to-service access

  1. Deploy an nginx service
export KUBECONFIG=~/.kube/config

# create namespace
NAMESPACE=nginx-test
kubectl create namespace $NAMESPACE

# create deployment and service
kubectl -n $NAMESPACE create deployment nginx --image=swr.cn-east-3.myhuaweicloud.com/lomtom-common/nginx-unprivileged:stable-alpine
kubectl -n $NAMESPACE expose deployment nginx --port 8080

# expose service
subctl export service --namespace $NAMESPACE nginx
  2. Access it from the other cluster
export KUBECONFIG=~/.kube/config-1

# create namespace
NAMESPACE=nginx-test
kubectl create namespace $NAMESPACE

# create pod
kubectl run tmp-shell --rm -i --tty --image swr.cn-east-3.myhuaweicloud.com/lomtom-common/nettest:0.20.0 -- /bin/bash

# exec command
curl nginx.nginx-test.svc.clusterset.local:8080 
dig nginx.nginx-test.svc.clusterset.local
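
For setups that manage resources declaratively, the export step can also be expressed as a ServiceExport object from the Multi-Cluster Services API, which is the resource kind behind the serviceexports.multicluster.x-k8s.io CRD listed earlier. The sketch below only prints the manifest and the clusterset DNS name. The helper names service_export_manifest and clusterset_fqdn are made up for this example, and treating the manifest as equivalent to subctl export service is an assumption worth checking against your Submariner version.

```shell
# service_export_manifest <namespace> <service>
# Prints a ServiceExport manifest for an existing Service.
service_export_manifest() {
  cat <<EOF
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: $2
  namespace: $1
EOF
}

# clusterset_fqdn <namespace> <service>
# Prints the DNS name under which other clusters resolve the exported Service.
clusterset_fqdn() {
  echo "$2.$1.svc.clusterset.local"
}

# Example:
#   service_export_manifest nginx-test nginx | kubectl apply -f -
#   curl "$(clusterset_fqdn nginx-test nginx):8080"
```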

Troubleshooting

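If access does not work after installation, troubleshoot along these lines:

  1. Check that every prerequisite is met
  2. Run subctl diagnose all and make sure every check passes. The output looks like this: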
# subctl diagnose all
Cluster "kubernetes"
 Checking Submariner support for the Kubernetes version
 Kubernetes version "v1.24.0" is supported

 Non-Globalnet deployment detected - checking that cluster CIDRs do not overlap
 Checking DaemonSet "submariner-gateway"
 Checking DaemonSet "submariner-routeagent"
 Checking DaemonSet "submariner-metrics-proxy"
 Checking Deployment "submariner-lighthouse-agent"
 Checking Deployment "submariner-lighthouse-coredns"
 Checking the status of all Submariner pods
 Checking that gateway metrics are accessible from non-gateway nodes
 Skipping this check as it's a single node cluster

 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("calico") is supported
 ✓ Calico CNI detected, checking if the Submariner IPPool pre-requisites are configured
 ✓ Checking gateway connections
 ✓ Checking route agent connections
 ✓ There are no remote endpoint connections on route agent "node"
 ✓ Checking Submariner support for the kube-proxy mode 
 ✓ The kube-proxy mode is supported
 ✓ Checking that firewall configuration allows intra-cluster VXLAN traffic
 ✓ Skipping this check as it's a single node cluster

 Checking that services have been exported properly


Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.
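  3. If the service is reachable by Pod IP or Service IP but not by its service domain name, check that the CoreDNS configuration is correct and whether another DNS service is in use
  4. Check the gateway logs
    • The broker may be unreachable because of a broker certificate error
    • The clusters may be unreachable because their private IPs cannot reach each other and no public IP was specified
  5. Check the route agent logs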
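
For the last two steps, the logs can be pulled from the DaemonSets that the diagnose output names (submariner-gateway and submariner-routeagent). This is a sketch: submariner_logs is a hypothetical wrapper, the KUBECTL override is an assumption that allows a dry run with KUBECTL=echo, and the tail length of 200 lines is arbitrary.

```shell
# Assumption: KUBECTL may be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL=${KUBECTL:-kubectl}

# submariner_logs <component> <kubeconfig>
# <component> is "gateway" or "routeagent".
submariner_logs() {
  "$KUBECTL" --kubeconfig "$2" -n submariner-operator \
    logs "daemonset/submariner-$1" --tail=200
}

# Example:
#   submariner_logs gateway    "$HOME/.kube/config"
#   submariner_logs routeagent "$HOME/.kube/config-1"
```

Note that kubectl logs on a DaemonSet reads from a single pod, so on multi-node clusters the route agent logs of a specific node are better fetched by pod name.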

Uninstall

Method 1: one-command uninstall

subctl uninstall

Method 2: manual uninstall

# delete submariner
kubectl delete submariners.submariner.io -n submariner-operator submariner

# delete operator
helm uninstall submariner-operator -n submariner-operator
# delete broker
helm uninstall submariner-k8s-broker -n submariner-k8s-broker

# delete crd
for CRD in `kubectl get crds | grep -iE 'submariner|multicluster.x-k8s.io'| awk '{print $1}'`; do kubectl delete crd $CRD; done

# delete clusterrole and clusterrolebinding
roles="submariner-operator submariner-operator-globalnet submariner-lighthouse submariner-networkplugin-syncer"
kubectl delete clusterrole,clusterrolebinding $roles --ignore-not-found

# delete namespace
kubectl delete namespace submariner-k8s-broker submariner-operator

Changing the Calico network mode

# kubectl edit installations.operator.tigera.io default 
# Change encapsulation to VXLAN
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    bgp: Disabled
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: all()

Changing kube-proxy

# kubectl edit configmap -n kube-system kube-proxy
# Change mode to iptables, then restart kube-proxy
# Inside the pod, run: kube-proxy --cleanup
# On the host, run: sudo ipvsadm --clear
# Restart kubelet
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    bindAddressHardFail: false
    mode: iptables
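
To see which mode kube-proxy is currently configured for before editing, the mode field can be pulled out of the same ConfigMap. A minimal sketch: kube_proxy_mode is a hypothetical helper that only parses the text it receives on stdin. An empty or missing mode is reported as unset, meaning kube-proxy falls back to its built-in default.

```shell
# kube_proxy_mode: reads the kube-proxy config.conf on stdin and prints its mode.
kube_proxy_mode() {
  awk '
    $1 == "mode:" { gsub(/"/, "", $2); mode = $2 }   # strip quotes, e.g. mode: ""
    END { print (mode == "" ? "unset" : mode) }
  '
}

# Example:
#   kubectl -n kube-system get configmap kube-proxy \
#     -o jsonpath='{.data.config\.conf}' | kube_proxy_mode
```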

Changing nodelocaldns

# kubectl edit configmap -n kube-system nodelocaldns
# At install time, specify nodelocaldns and add the bind parameter; remove lighthouse.server: | and change it to the following
apiVersion: v1
kind: ConfigMap
metadata:
  name: nodelocaldns
  namespace: kube-system
data:
  Corefile: |
    clusterset.local:53 {
        bind 169.254.25.10 # fixed nodelocaldns address
        forward . 10.234.30.44 # generated at runtime
    }
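
Because the forward address is generated at runtime, it has to be looked up for each cluster instead of being copied from this article. The sketch below renders the stanza from two parameters. clusterset_stanza is a hypothetical helper, and the assumption that the forward target is the ClusterIP of a Service named submariner-lighthouse-coredns in the submariner-operator namespace should be verified in your cluster first.

```shell
# clusterset_stanza <bind-ip> <forward-ip>
# Prints the Corefile block that sends clusterset.local queries to Lighthouse.
clusterset_stanza() {
  cat <<EOF
clusterset.local:53 {
    bind $1
    forward . $2
}
EOF
}

# Example (service name is an assumption, see above):
#   ip=$(kubectl -n submariner-operator get svc submariner-lighthouse-coredns \
#        -o jsonpath='{.spec.clusterIP}')
#   clusterset_stanza 169.254.25.10 "$ip"
```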

Title: Multi-Cluster Networking: Submariner Integration

Author: lomtom

Link: https://lomtom.cn/d0arkpowk6jct