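Quick Start
This page keeps only the key commands on the reproduction path. For fuller background, the issue log, and the roadmap to the full build, see plan.md in the repository root.
0. Check tools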
terraform version
ansible-playbook --version
ansible-inventory --version
jq --version
kubectl version --client
helm version
cilium version --client
ssh -V
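The version checks above can also be wrapped in a loop that reports every missing tool in one pass. This is a minimal sketch; the function name missing_tools is hypothetical and not part of the repository.

```shell
# Print the name of each command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of this repository.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call (prints nothing when every tool is installed):
#   missing_tools terraform ansible-playbook ansible-inventory jq kubectl helm cilium ssh
```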
Prepare an SSH key:
ssh-keygen -t ed25519 -f ~/.ssh/hybrid-k8s -C "hybrid-k8s"
Set the cloud credentials:
export ALICLOUD_ACCESS_KEY="<your-access-key>"
export ALICLOUD_SECRET_KEY="<your-secret-key>"
export ALICLOUD_REGION="cn-guangzhou"
export TENCENTCLOUD_SECRET_ID="<your-secret-id>"
export TENCENTCLOUD_SECRET_KEY="<your-secret-key>"
export TENCENTCLOUD_REGION="ap-guangzhou"
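Before running Terraform, it may help to confirm that none of the six variables above is empty. The sketch below assumes a POSIX shell; unset_vars is a hypothetical helper name.

```shell
# Print the name of each listed environment variable that is unset or empty.
# unset_vars is a hypothetical helper, not part of this repository.
unset_vars() {
  for name in "$@"; do
    # Indirect lookup: copy the value of the variable called $name into $value.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Example call (prints nothing when all credentials are set):
#   unset_vars ALICLOUD_ACCESS_KEY ALICLOUD_SECRET_KEY ALICLOUD_REGION \
#     TENCENTCLOUD_SECRET_ID TENCENTCLOUD_SECRET_KEY TENCENTCLOUD_REGION
```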
1. Prepare variables
cp aliyun/terraform.tfvars.example aliyun/terraform.tfvars
cp tencent/terraform.tfvars.example tencent/terraform.tfvars
Edit both working variable files:
ssh_public_key_path = "~/.ssh/hybrid-k8s.pub"
admin_cidrs = ["<your-public-ip>/32"]
master_count = 1
worker_count = 1
system_disk_size = 40
enable_vpn_gateway = false
Do not commit the working variable files.
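One way to check that the working variable files stay out of version control is to ask Git whether each path is ignored. This sketch assumes the repository relies on ignore rules for that purpose and that it is run from the repository root; unignored_tfvars is a hypothetical helper name.

```shell
# Print each working tfvars path that no Git ignore rule covers.
# unignored_tfvars is a hypothetical helper, not part of this repository.
# Note: git check-ignore skips files that are already tracked, so a file
# committed by mistake is reported here as not ignored.
unignored_tfvars() {
  for f in aliyun/terraform.tfvars tencent/terraform.tfvars; do
    git check-ignore -q "$f" || printf '%s\n' "$f"
  done
}
```

An empty result means both files are covered by an ignore rule.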
2. Terraform init and plan
make aliyun-init
make tencent-init
make aliyun-plan
make tencent-plan
make aliyun-plan and make tencent-plan run:
terraform fmt -recursive
terraform validate
terraform plan -out=tfplan
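For readers who want to run the same sequence without make, the three commands can be chained so that a failure stops the run. This is a sketch of the behavior described above, not a copy of the Makefile; tf_plan is a hypothetical name and the use of -chdir is an assumption.

```shell
# Run the three plan-stage commands against one Terraform directory,
# stopping at the first failure. tf_plan is a hypothetical stand-in for
# what the make targets are described as doing.
tf_plan() {
  dir=$1
  terraform -chdir="$dir" fmt -recursive &&
    terraform -chdir="$dir" validate &&
    terraform -chdir="$dir" plan -out=tfplan
}

# Example calls:
#   tf_plan aliyun
#   tf_plan tencent
```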
3. Create resources and export outputs
make aliyun-apply
make aliyun-output
make tencent-apply
make tencent-output
Check the outputs:
jq '.master_public_ips.value' generated/aliyun/terraform-output.json
jq '.worker_public_ips.value' generated/aliyun/terraform-output.json
jq '.master_public_ips.value' generated/tencent/terraform-output.json
jq '.worker_public_ips.value' generated/tencent/terraform-output.json
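The four queries above can be folded into one helper that prints every public IP in a file, one per line, which is convenient for the SSH checks in the next step. This sketch assumes each output value is a list of strings; list_public_ips is a hypothetical name.

```shell
# Print every master and worker public IP from one exported outputs file,
# one per line. list_public_ips is a hypothetical helper; the output names
# master_public_ips and worker_public_ips come from the commands above.
list_public_ips() {
  jq -r '.master_public_ips.value[], .worker_public_ips.value[]' "$1"
}

# Example calls:
#   list_public_ips generated/aliyun/terraform-output.json
#   list_public_ips generated/tencent/terraform-output.json
```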
4. Verify SSH
ssh -i ~/.ssh/hybrid-k8s <aliyun-user>@<aliyun-master-public-ip> 'hostname && ip addr'
ssh -i ~/.ssh/hybrid-k8s <aliyun-user>@<aliyun-worker-public-ip> 'hostname && ip addr'
ssh -i ~/.ssh/hybrid-k8s <tencent-user>@<tencent-master-public-ip> 'hostname && ip addr'
ssh -i ~/.ssh/hybrid-k8s <tencent-user>@<tencent-worker-public-ip> 'hostname && ip addr'
If login fails, try root and ubuntu separately, then update ssh_user in terraform.tfvars to match the user that works.
5. Generate the Ansible inventory
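Trying the two candidate users can be scripted. The sketch below is an assumption about how one might automate it, not a repository script; find_ssh_user is a hypothetical name, and StrictHostKeyChecking=accept-new requires a reasonably recent OpenSSH client.

```shell
# Print the first candidate user that can log in to the host non-interactively.
# find_ssh_user is a hypothetical helper; the key path matches the one above.
find_ssh_user() {
  host=$1
  for user in root ubuntu; do
    # BatchMode disables prompts; accept-new trusts a host key seen for the
    # first time but still rejects a changed one.
    if ssh -i ~/.ssh/hybrid-k8s -o BatchMode=yes -o ConnectTimeout=5 \
        -o StrictHostKeyChecking=accept-new "$user@$host" true 2>/dev/null; then
      printf '%s\n' "$user"
      return 0
    fi
  done
  return 1
}

# Example call:
#   find_ssh_user <aliyun-master-public-ip>
```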
make ansible-inventory
Generated file:
ansible/inventory/generated.yml
This file is rendered from the exported Terraform outputs; it does not read tfstate directly.
6. Bootstrap the systems, configure WireGuard, and run kubeadm
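A quick sanity check on the rendered inventory is to confirm that Ansible can parse it and then reach every host with the ping module. This is a sketch under the assumption that the inventory already carries the correct SSH user and key; check_inventory is a hypothetical name.

```shell
# Parse the inventory, then run the Ansible ping module against every host.
# check_inventory is a hypothetical helper, not part of this repository.
check_inventory() {
  inv=${1:-ansible/inventory/generated.yml}
  ansible-inventory -i "$inv" --list >/dev/null &&
    ansible all -i "$inv" -m ping
}
```

Calling check_inventory with no argument uses the generated path shown above.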
make ansible-bootstrap
make ansible-underlay
make ansible-kubeadm
make check-underlay
If a step fails, check in this order:
nc -vzu <node-public-ip> 51820
ssh -i ~/.ssh/hybrid-k8s <user>@<node-public-ip> 'sudo wg show'
ssh -i ~/.ssh/hybrid-k8s <user>@<node-public-ip> 'ip addr show wg0'
Check the kubeconfigs:
ls generated/aliyun/kubeconfig generated/tencent/kubeconfig
make merge-kubeconfigs
kubectl --kubeconfig generated/kubeconfig --context aliyun-guangzhou get nodes -o wide
kubectl --kubeconfig generated/kubeconfig --context tencent-guangzhou get nodes -o wide
7. Render the Cilium configuration and install
make render-configs
make install-cilium
Check:
KUBECONFIG=generated/kubeconfig cilium status --context aliyun-guangzhou --wait
KUBECONFIG=generated/kubeconfig cilium status --context tencent-guangzhou --wait
8. Enable Cluster Mesh
make enable-clustermesh
make check-clusters
make check-clustermesh
Expected:
kubectl get nodes shows every node as Ready
cilium status reports a healthy state
cilium clustermesh status shows connected
9. Wrap up
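The first expectation can be checked mechanically by filtering the node list for anything that is not Ready. This is a minimal sketch; not_ready_nodes is a hypothetical name, and it treats a combined status such as Ready,SchedulingDisabled as not ready.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print the name of
# every node whose STATUS column is not exactly "Ready".
# not_ready_nodes is a hypothetical helper, not part of this repository.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Example call (prints nothing when all nodes are Ready):
#   kubectl --kubeconfig generated/kubeconfig --context aliyun-guangzhou \
#     get nodes --no-headers | not_ready_nodes
```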
make cleanup-test-k8s
make destroy-aliyun
make destroy-tencent
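After destroying, confirm in both cloud consoles that no instances, load balancers, or VPN resources are left behind.
Notes on recovering after a destroy: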
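- terraform -chdir=aliyun state list and terraform -chdir=tencent state list should both return nothing.
- The old outputs, inventory, and kubeconfig under generated/ cannot be reused for recovery; regenerate them only after a fresh apply and output.
- If the local terraform.tfvars still has master_count = 3 and worker_count = 1, the next create rebuilds at the HA spec.
- When recovering, run the steps in this order: Terraform output → Ansible inventory → ping → bootstrap → underlay → kubeadm.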
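The empty-state check from the first note can be scripted as follows. This is a sketch; state_is_empty is a hypothetical name, and a failing terraform command (for example, when no state file exists) is treated as "not confirmed empty".

```shell
# Succeed only when the Terraform state in the given directory lists no resources.
# state_is_empty is a hypothetical helper, not part of this repository.
state_is_empty() {
  resources=$(terraform -chdir="$1" state list) || return 1
  [ -z "$resources" ]
}

# Example call:
#   state_is_empty aliyun && state_is_empty tencent && echo "both states are empty"
```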