
Quick Start

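This page keeps only the key commands on the reproduction path. For the full background, the issue log, and the roadmap to the complete setup, see plan.md in the repository root.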

0. Check tools

terraform version
ansible-playbook --version
ansible-inventory --version
jq --version
kubectl version --client
helm version
cilium version --client
ssh -V
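The version commands above stop at the first missing binary. A minimal sketch that reports every missing tool in one pass could look like this; `check_tools` is a hypothetical helper, and it only checks that each binary is on PATH, not which version it is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every tool in the argument list that is not on PATH.
# Presence only; run the version commands above to see actual versions.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the tool list from this step:
#   check_tools terraform ansible-playbook ansible-inventory jq kubectl helm cilium ssh
```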

Prepare an SSH key:

ssh-keygen -t ed25519 -f ~/.ssh/hybrid-k8s -C "hybrid-k8s"

Set the cloud credentials:

export ALICLOUD_ACCESS_KEY="<your-access-key>"
export ALICLOUD_SECRET_KEY="<your-secret-key>"
export ALICLOUD_REGION="cn-guangzhou"

export TENCENTCLOUD_SECRET_ID="<your-secret-id>"
export TENCENTCLOUD_SECRET_KEY="<your-secret-key>"
export TENCENTCLOUD_REGION="ap-guangzhou"
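A missing credential usually only surfaces later as a provider error during plan or apply. A small sketch that checks the six variables are non-empty before any make target runs; `require_env` is a hypothetical name, it relies on bash indirect expansion, and it never prints the secret values themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any named environment variable is unset or empty.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable whose
    # name is stored in $name (empty if unset). Values are checked, never printed.
    if [ -z "${!name:-}" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_env ALICLOUD_ACCESS_KEY ALICLOUD_SECRET_KEY ALICLOUD_REGION \
#     TENCENTCLOUD_SECRET_ID TENCENTCLOUD_SECRET_KEY TENCENTCLOUD_REGION
```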

1. Prepare variables

cp aliyun/terraform.tfvars.example aliyun/terraform.tfvars
cp tencent/terraform.tfvars.example tencent/terraform.tfvars

Edit both real variable files (aliyun/terraform.tfvars and tencent/terraform.tfvars):

ssh_public_key_path = "~/.ssh/hybrid-k8s.pub"
admin_cidrs = ["<your-public-ip>/32"]
master_count = 1
worker_count = 1
system_disk_size = 40
enable_vpn_gateway = false

Do not commit the real variable files.
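Two easy mistakes at this step are leaving an example placeholder such as `<your-public-ip>` in place and accidentally tracking the real file in git. A minimal sketch of a pre-flight check, assuming placeholders always use the `<...>` form shown in the examples; `check_tfvars_filled` is a hypothetical helper and the pattern is only a heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to continue while a tfvars file is missing or still
# contains angle-bracket placeholders copied from terraform.tfvars.example.
check_tfvars_filled() {
  local file=$1
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # grep -n prints each offending line with its line number (sent to stderr).
  if grep -nE '<[^>]+>' "$file" >&2; then
    echo "$file still contains placeholders" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_tfvars_filled aliyun/terraform.tfvars && check_tfvars_filled tencent/terraform.tfvars
#   git check-ignore -q aliyun/terraform.tfvars || echo "aliyun/terraform.tfvars is NOT ignored by git"
```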

2. Terraform init and plan

make aliyun-init
make tencent-init
make aliyun-plan
make tencent-plan

make aliyun-plan and make tencent-plan run:

terraform fmt -recursive
terraform validate
terraform plan -out=tfplan
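For debugging without make, the same three commands can be chained with Terraform's `-chdir` option. This is a sketch assuming the Makefile runs them inside the aliyun/ or tencent/ directory; `tf_plan` is a hypothetical name. Note that `terraform fmt` rewrites files in place, and `&&` stops the chain at the first failing step.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `make <cloud>-plan`: format, validate, then save a
# plan file inside the given directory, stopping at the first failure.
tf_plan() {
  local dir=$1
  terraform -chdir="$dir" fmt -recursive &&
    terraform -chdir="$dir" validate &&
    terraform -chdir="$dir" plan -out=tfplan
}

# Usage:
#   tf_plan aliyun && tf_plan tencent
```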

3. Create resources and export outputs

make aliyun-apply
make aliyun-output
make tencent-apply
make tencent-output
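The output targets produce the JSON files read by the jq checks in this step. A sketch of an equivalent, assuming the target simply writes `terraform output -json` to generated/<cloud>/terraform-output.json (the path used by those checks); `export_outputs` is hypothetical. Writing through a temporary file keeps a failed run from leaving a truncated JSON file that later steps would trust.

```shell
#!/usr/bin/env bash
# Hypothetical equivalent of `make <cloud>-output`.
export_outputs() {
  local cloud=$1
  local out="generated/$cloud/terraform-output.json"
  mkdir -p "generated/$cloud" || return 1
  # Write to a temp file first, then move it into place only on success.
  if terraform -chdir="$cloud" output -json > "$out.tmp"; then
    mv "$out.tmp" "$out"
  else
    rm -f "$out.tmp"
    return 1
  fi
}

# Usage:
#   export_outputs aliyun && export_outputs tencent
```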

Check the outputs:

jq '.master_public_ips.value' generated/aliyun/terraform-output.json
jq '.worker_public_ips.value' generated/aliyun/terraform-output.json
jq '.master_public_ips.value' generated/tencent/terraform-output.json
jq '.worker_public_ips.value' generated/tencent/terraform-output.json
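The four queries above can be folded into one listing per cloud, which is convenient input for the SSH checks that follow. A sketch assuming `master_public_ips` and `worker_public_ips` are lists of strings, as the `.value` queries suggest (adjust the filter if they turn out to be maps); `list_node_ips` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<role> <ip>" for every public IP in one
# terraform-output.json file, masters first.
list_node_ips() {
  jq -r '
    (.master_public_ips.value[] | "master \(.)"),
    (.worker_public_ips.value[] | "worker \(.)")
  ' "$1"
}

# Usage:
#   list_node_ips generated/aliyun/terraform-output.json
#   list_node_ips generated/tencent/terraform-output.json
```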

4. Verify SSH

ssh -i ~/.ssh/hybrid-k8s <aliyun-user>@<aliyun-master-public-ip> 'hostname && ip addr'
ssh -i ~/.ssh/hybrid-k8s <aliyun-user>@<aliyun-worker-public-ip> 'hostname && ip addr'
ssh -i ~/.ssh/hybrid-k8s <tencent-user>@<tencent-master-public-ip> 'hostname && ip addr'
ssh -i ~/.ssh/hybrid-k8s <tencent-user>@<tencent-worker-public-ip> 'hostname && ip addr'

If login fails, try root and ubuntu separately, then set ssh_user in terraform.tfvars to whichever one works.
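Trying both users by hand can be scripted. A minimal sketch, with `detect_ssh_user` as a hypothetical name: `BatchMode=yes` makes ssh fail instead of prompting for a password, `ConnectTimeout` keeps an unreachable host from hanging, and `StrictHostKeyChecking=accept-new` (OpenSSH 7.6 and later) trusts a freshly created host's key on first use, since batch mode would otherwise reject the unknown key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first of root/ubuntu that can log in
# non-interactively with the hybrid-k8s key; return 1 if neither works.
detect_ssh_user() {
  local ip=$1 user
  for user in root ubuntu; do
    if ssh -i ~/.ssh/hybrid-k8s -o BatchMode=yes -o ConnectTimeout=5 \
        -o StrictHostKeyChecking=accept-new "$user@$ip" true >/dev/null 2>&1; then
      echo "$user"
      return 0
    fi
  done
  return 1
}

# Usage:
#   detect_ssh_user <aliyun-master-public-ip>
```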

5. Generate the Ansible inventory

make ansible-inventory

Generated file:

ansible/inventory/generated.yml

This file is rendered from the Terraform outputs; it does not read tfstate directly.
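To illustrate rendering from outputs rather than tfstate, here is a minimal sketch that builds a YAML inventory from the two terraform-output.json files with jq. It is not the repository's renderer: the group names (`aliyun_masters`, `tencent_workers`, and so on) and the bare-IP host entries are hypothetical, and real inventories would also carry variables such as the SSH user.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print a YAML inventory with one group per cloud and role,
# using the public IPs from generated/<cloud>/terraform-output.json as host names.
render_inventory() {
  local cloud role
  echo "all:"
  echo "  children:"
  for cloud in aliyun tencent; do
    for role in master worker; do
      echo "    ${cloud}_${role}s:"
      echo "      hosts:"
      # Each IP becomes a host key with no per-host variables.
      jq -r --arg key "${role}_public_ips" '.[$key].value[] | "        \(.):"' \
        "generated/$cloud/terraform-output.json" || return 1
    done
  done
}

# Usage:
#   render_inventory > /tmp/inventory.yml && ansible-inventory -i /tmp/inventory.yml --graph
```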

6. Bootstrap the OS, configure WireGuard, and run kubeadm

make ansible-bootstrap
make ansible-underlay
make ansible-kubeadm
make check-underlay

If this fails, check the following in order:

nc -vzu <node-public-ip> 51820
ssh -i ~/.ssh/hybrid-k8s <user>@<node-public-ip> 'sudo wg show'
ssh -i ~/.ssh/hybrid-k8s <user>@<node-public-ip> 'ip addr show wg0'
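The three checks can be wrapped so they run in order and stop at the first failure, with the exit code naming the failed step. A sketch with a hypothetical `check_wg_node`; the `-w 3` timeout is an addition. Keep in mind that a "successful" UDP probe from nc only means no ICMP port-unreachable came back, so it is weak evidence; a recent handshake in `wg show` is the stronger signal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the WireGuard checks against one node in order.
# Returns 1, 2 or 3 for the step that failed, 0 if all pass.
check_wg_node() {
  local user=$1 ip=$2
  nc -vzu -w 3 "$ip" 51820 ||
    { echo "step 1: UDP 51820 probe failed" >&2; return 1; }
  ssh -i ~/.ssh/hybrid-k8s "$user@$ip" 'sudo wg show' ||
    { echo "step 2: wg show failed" >&2; return 2; }
  ssh -i ~/.ssh/hybrid-k8s "$user@$ip" 'ip addr show wg0' ||
    { echo "step 3: wg0 interface missing" >&2; return 3; }
}

# Usage:
#   check_wg_node <user> <node-public-ip>
```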

Check the kubeconfigs:

ls generated/aliyun/kubeconfig generated/tencent/kubeconfig
make merge-kubeconfigs
kubectl --kubeconfig generated/kubeconfig --context aliyun-guangzhou get nodes -o wide
kubectl --kubeconfig generated/kubeconfig --context tencent-guangzhou get nodes -o wide
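Note that kubeadm nodes normally report NotReady until a CNI is installed, so at this point NotReady is expected; the check is meaningful after Cilium is installed in step 7. A sketch that lists only the nodes whose Ready condition is not "True", using kubectl's JSONPath output; `not_ready_nodes` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<node> <status>" for nodes in one context whose
# Ready condition is not "True". Returns 0 if all nodes are Ready, 1 if some
# are not, 2 if kubectl fails or returns no nodes.
not_ready_nodes() {
  local ctx=$1 out
  out=$(kubectl --kubeconfig generated/kubeconfig --context "$ctx" get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}') ||
    { echo "$ctx: kubectl failed" >&2; return 2; }
  if [ -z "$out" ]; then
    echo "$ctx: no nodes returned" >&2
    return 2
  fi
  # Column 2 is the Ready condition status: "True", "False" or "Unknown".
  printf '%s\n' "$out" | awk 'BEGIN { bad = 0 } $2 != "True" { print; bad = 1 } END { exit bad }'
}

# Usage:
#   not_ready_nodes aliyun-guangzhou; not_ready_nodes tencent-guangzhou
```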

7. Render the Cilium configuration and install Cilium

make render-configs
make install-cilium

Check:

KUBECONFIG=generated/kubeconfig cilium status --context aliyun-guangzhou --wait
KUBECONFIG=generated/kubeconfig cilium status --context tencent-guangzhou --wait

8. Enable Cluster Mesh

make enable-clustermesh
make check-clusters
make check-clustermesh

Expected:

kubectl get nodes shows every node as Ready
cilium status reports a healthy installation
cilium clustermesh status shows the clusters as connected
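The Cilium side of these expectations can be checked for both contexts in one go. A sketch with a hypothetical `check_mesh`, using the `cilium status` and `cilium clustermesh status` subcommands with `--context` and `--wait` (which blocks until the status is ready or the CLI's wait timeout expires). It runs all four checks and returns non-zero if any of them failed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run cilium status and cilium clustermesh status against
# both contexts in generated/kubeconfig; return 1 if any check fails.
check_mesh() {
  local ctx rc=0
  for ctx in aliyun-guangzhou tencent-guangzhou; do
    KUBECONFIG=generated/kubeconfig cilium status --context "$ctx" --wait || rc=1
    KUBECONFIG=generated/kubeconfig cilium clustermesh status --context "$ctx" --wait || rc=1
  done
  return "$rc"
}

# Usage:
#   check_mesh && echo "Cilium and Cluster Mesh look healthy in both clusters"
```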

9. Clean up

make cleanup-test-k8s
make destroy-aliyun
make destroy-tencent

After destroying, check both cloud consoles to confirm that no instances, load balancers, or VPN resources are left behind.

Notes on recovering after a destroy:

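  • terraform -chdir=aliyun state list and terraform -chdir=tencent state list should both print nothing.
  • The old outputs, inventory, and kubeconfigs under generated/ must not be reused for recovery; regenerate them only after running apply and output again.
  • If your local terraform.tfvars still has master_count = 3 and worker_count = 1, the next apply rebuilds the HA layout.
  • When recovering, run the steps in this order: Terraform output → Ansible inventory → ping → bootstrap → underlay → kubeadm.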
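The empty-state check can be scripted for both directories. A sketch with a hypothetical `assert_state_empty`: `terraform state list` prints one resource address per line, so any output means Terraform still tracks resources there, and a failing command is reported separately. An empty state only covers Terraform-managed resources, so the console check after destroy is still needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the aliyun and tencent Terraform states are empty.
assert_state_empty() {
  local dir out rc=0
  for dir in aliyun tencent; do
    if ! out=$(terraform -chdir="$dir" state list); then
      echo "$dir: terraform state list failed" >&2
      rc=1
      continue
    fi
    if [ -n "$out" ]; then
      echo "$dir: resources still tracked in state:" >&2
      printf '%s\n' "$out" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage, after make destroy-aliyun and make destroy-tencent:
#   assert_state_empty
```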