Too lazy to format this properly here; read the Feishu doc for the original:
https://paraparty.feishu.cn/docx/JQtkdMWu7oSsjwxwcpYcww7Inyf
Code: https://github.com/continuedev/continue/pull/2439
Blog post: https://blog.hylstudio.cn/archives/1350
Feishu doc: https://paraparty.feishu.cn/docx/GEwVdulCgoLX7dxSHMecy4mBnch
A quick trial of Argo CD. Argo CD positions itself as a CD tool, so I am skeptical it can double as a general k8s dashboard.
This copy is kept as a backup; for the best reading experience use the Feishu doc.
mkdir argocd
cd argocd
wget https://raw.githubusercontent.com/argoproj/argo-cd/v2.9.3/manifests/ha/install.yaml
sudo docker pull ghcr.io/dexidp/dex:v2.37.0
sudo docker tag ghcr.io/dexidp/dex:v2.37.0 harbor.hylstudio.local/dexidp/dex:v2.37.0
sudo docker push harbor.hylstudio.local/dexidp/dex:v2.37.0
sudo docker pull redis:7.0.11-alpine
sudo docker tag redis:7.0.11-alpine harbor.hylstudio.local/library/redis:7.0.11-alpine
sudo docker push harbor.hylstudio.local/library/redis:7.0.11-alpine
sudo docker pull quay.io/argoproj/argocd:v2.9.3
sudo docker tag quay.io/argoproj/argocd:v2.9.3 harbor.hylstudio.local/argoproj/argocd:v2.9.3
sudo docker push harbor.hylstudio.local/argoproj/argocd:v2.9.3
kubectl create namespace argocd
kubectl apply -n argocd -f install.yaml
kubectl -n argocd port-forward service/argocd-server :80
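Once the port-forward is up you can log in as admin; per the getting-started guide linked below, the initial password sits in a secret:

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo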
https://argo-cd.readthedocs.io/en/stable/getting_started/
Blog post: https://blog.hylstudio.cn/archives/1343
Feishu doc: https://paraparty.feishu.cn/docx/FTM7d1TIcoxL83xfRWzc34omnFe
The last step of installing k8s (the KubeSphere setup) is extremely slow. I had traced it halfway before; today I finished tracing it to figure out what was actually going on.
Too lazy to format; for the best reading experience see the Feishu doc, this is a backup.
From the ansible-runner run() documentation:
verbosity (int) – Control how verbose the output of ansible-playbook is
_input (io.FileIO) – An optional file or file-like object for use as input in a streaming pipeline
_output (io.FileIO) – An optional file or file-like object for use as output in a streaming pipeline
docker pull harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker run -it --entrypoint /bin/bash harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
cp /hooks/kubesphere/installRunner.py /hooks/kubesphere/installRunner.py.bak
vi /hooks/kubesphere/installRunner.py
# snippets added to installRunner.py; the last line goes into the ansible_runner call
import io
ansible_log = io.FileIO('/home/kubesphere/ansible.log', 'w')
_output=ansible_log, verbosity=5
docker commit 5274b25c35d5 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.3
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.3
./vm.sh -c 4 -m 8 -d 80 -p on k8s1
./vm.sh -c 2 -m 4 -d 40 -p on k8s2
./vm.sh -c 2 -m 4 -d 40 -p on k8s3
# dns config
192.168.2.206 k8s-control.hylstudio.local
192.168.2.206 k8s1.k8s.local
192.168.2.177 k8s2.k8s.local
192.168.2.203 k8s3.k8s.local
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.2
# fake a v3.3.2 tag to fool the version check
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2-bak
docker image rm harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.3 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
namespace/kubesphere-system unchanged
serviceaccount/ks-installer unchanged
customresourcedefinition.apiextensions.k8s.io/clusterconfigurations.installer.kubesphere.io unchanged
clusterrole.rbac.authorization.k8s.io/ks-installer unchanged
clusterrolebinding.rbac.authorization.k8s.io/ks-installer unchanged
deployment.apps/ks-installer unchanged
clusterconfiguration.installer.kubesphere.io/ks-installer created
13:55:05 UTC success: [k8s1]
Please wait for the installation to complete: >>--->
ks-installer-566ffb8f44-ml9gm:/kubesphere$ ps aux
PID   USER     TIME  COMMAND
    1 kubesphe  0:00 /shell-operator start
   56 kubesphe  0:06 python3 /hooks/kubesphere/installRunner.py
 2501 kubesphe  1:21 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubespher
 4348 kubesphe  0:01 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubespher
 5261 kubesphe  0:00 /bin/sh -c /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700921243.
 5262 kubesphe  0:00 /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700921243.7008557-434
 5263 kubesphe  0:00 /usr/local/bin/kubectl apply -f /kubesphere/kubesphere/ks-core/crds/iam.kubesphere.io_
 5287 kubesphe  0:00 bash
 5299 kubesphe  0:00 ps aux
Start installing monitoring
Start installing multicluster
Start installing openpitrix
Start installing network
**************************************************
Waiting for all tasks to be completed ...
task network status is successful (1/4)
task openpitrix status is successful (2/4)
task multicluster status is successful (3/4)
So the only remaining suspect is monitoring.
docker cp xxx:/hooks/kubesphere/installRunner.py .
FROM harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2-bak
RUN rm -rf /hooks/kubesphere/installRunner.py
COPY installRunner.py /hooks/kubesphere/
mkdir imgbuild
mv installRunner.py DockerFile imgbuild
cd imgbuild
docker build -f DockerFile -t harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.4 .
docker image rm harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.4 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.4
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
Redeploy with --skip-push-images so the images already pushed to Harbor are not overwritten.
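For instance (a hedged sketch: the config file name is illustrative, and this assumes the cluster was set up with kubekey, which is where the flag above comes from):

# re-run the deployment without re-pushing images to Harbor
./kk create cluster -f config-sample.yaml --skip-push-images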
Start installing monitoring
Start installing multicluster
Start installing openpitrix
Start installing network
**************************************************
Waiting for all tasks to be completed ...
readyToEnabledList = [
    'monitoring',
    'multicluster',
    'openpitrix',
    'network']
ks-installer-566ffb8f44-zft9h:/hooks/kubesphere$ ps aux|more
PID   USER     TIME  COMMAND
    1 kubesphe  0:00 /shell-operator start
   18 kubesphe  5:49 python3 /hooks/kubesphere/installRunner.py
 2171 kubesphe  0:00 bash
 4053 kubesphe  1:54 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubesphere/config/ks-config.json -e @/kubesphere/config/ks-status.json -e @/kubesphere/results/env/extravars /kubesphere/playbooks/monitoring.yaml
 8876 kubesphe  0:00 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubesphere/config/ks-config.json -e @/kubesphere/config/ks-status.json -e @/kubesphere/results/env/extravars /kubesphere/playbooks/monitoring.yaml
 8899 kubesphe  0:00 /bin/sh -c /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700926548.0542264-8876-155728454178142/AnsiballZ_command.py && sleep 0
 8900 kubesphe  0:01 /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700926548.0542264-8876-155728454178142/AnsiballZ_command.py
 8926 kubesphe  0:00 /usr/local/bin/kubectl apply -f /kubesphere/kubesphere/prometheus/alertmanager
 8957 kubesphe  0:00 ps aux
 8958 kubesphe  0:00 more
watch -n 1 'ps aux|more'
shows the execution in real time. It was never actually stuck; the install is just slow.
def generateTaskLists():
    readyToEnabledList, readyToDisableList = getComponentLists()
    tasksDict = {}
    for taskName in readyToEnabledList:
        playbookPath = os.path.join(playbookBasePath, str(taskName) + '.yaml')
        artifactDir = os.path.join(privateDataDir, str(taskName))
        if os.path.exists(artifactDir):
            shutil.rmtree(artifactDir)
        tasksDict[str(taskName)] = component(
            playbook=playbookPath,
            private_data_dir=privateDataDir,
            artifact_dir=artifactDir,
            ident=str(taskName),
            quiet=False,
            rotate_artifacts=1
        )
    return tasksDict

def installRunner(self):
    installer = ansible_runner.run_async(
        playbook=self.playbook,
        private_data_dir=self.private_data_dir,
        artifact_dir=self.artifact_dir,
        ident=self.ident,
        quiet=self.quiet,
        rotate_artifacts=self.rotate_artifacts,
        verbosity=5
    )
    task_name = self.ident
    thread = installer[0]
    log_file = open('/tmp/' + task_name + '.debug.log', 'w')
    thread.stdout = log_file
    return installer[1]
--- a/installRunner.py.bak
+++ b/installRunner.py
@@ -90,8 +90,13 @@ class component():
             artifact_dir=self.artifact_dir,
             ident=self.ident,
             quiet=self.quiet,
-            rotate_artifacts=self.rotate_artifacts
+            rotate_artifacts=self.rotate_artifacts,
+            verbosity=5
         )
+        task_name = self.ident
+        thread = installer[0]
+        log_file = open('/tmp/'+task_name+'.debug.log', 'w')
+        thread.stdout = log_file
         return installer[1]
@@ -263,7 +268,7 @@ def generateTaskLists():
             private_data_dir=privateDataDir,
             artifact_dir=artifactDir,
             ident=str(taskName),
-            quiet=True,
+            quiet=False,
             rotate_artifacts=1
         )
mkdir imgbuild
mv installRunner.py DockerFile imgbuild
cd imgbuild
docker build -f DockerFile -t harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.7 .
docker image rm harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.7 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.7
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker image ls --digests|grep installer
The Image ID is a Docker client-side concept, so the registry does not care about it. It turns out thread.stdout = log_file produces no output at all. According to https://github.com/ansible/ansible-runner/blob/e0371d634426dfbdb9d3bfacb20e2dd4b039b499/src/ansible_runner/runner.py#L155C28-L155C48, ansible-runner writes to the stdout and stderr files under self.config.artifact_dir only when self.config.suppress_output_file is false. Here artifact_dir comes from /kubesphere/results/{task_name}, and both suppress_output_file and suppress_ansible_output default to False. So the plan: change quiet=False, rebuild, redeploy with --skip-push-images, then look under /kubesphere/results for stdout and stderr files.
%s/quiet=True/quiet=False/g
--- a/installRunner.py.bak
+++ b/installRunner.py
@@ -85,6 +85,7 @@ class component():
     def installRunner(self):
         installer = ansible_runner.run_async(
+            verbosity=5,
             playbook=self.playbook,
             private_data_dir=self.private_data_dir,
             artifact_dir=self.artifact_dir,
@@ -263,7 +264,7 @@ def generateTaskLists():
             private_data_dir=privateDataDir,
             artifact_dir=artifactDir,
             ident=str(taskName),
-            quiet=True,
+            quiet=False,
             rotate_artifacts=1
         )
@@ -341,6 +342,7 @@ def preInstallTasks():
     for task, paths in preInstallTasks.items():
         pretask = ansible_runner.run(
+            verbosity=5,
             playbook=paths[0],
             private_data_dir=privateDataDir,
             artifact_dir=paths[1],
@@ -353,11 +355,12 @@ def preInstallTasks():
 def resultInfo(resultState=False, api=None):
     ks_config = ansible_runner.run(
+        verbosity=5,
         playbook=os.path.join(playbookBasePath, 'ks-config.yaml'),
         private_data_dir=privateDataDir,
         artifact_dir=os.path.join(privateDataDir, 'ks-config'),
         ident='ks-config',
-        quiet=True
+        quiet=False
     )

     if ks_config.rc != 0:
@@ -365,11 +368,12 @@ def resultInfo(resultState=False, api=None):
         exit()

     result = ansible_runner.run(
+        verbosity=5,
         playbook=os.path.join(playbookBasePath, 'result-info.yaml'),
         private_data_dir=privateDataDir,
         artifact_dir=os.path.join(privateDataDir, 'result-info'),
         ident='result',
-        quiet=True
+        quiet=False
     )

     if result.rc != 0:
@@ -380,6 +384,7 @@ def resultInfo(resultState=False, api=None):
     if "migration" in resource['status']['core'] and resource['status']['core']['migration'] and resultState == False:
         migration = ansible_runner.run(
+            verbosity=5,
             playbook=os.path.join(playbookBasePath, 'ks-migration.yaml'),
             private_data_dir=privateDataDir,
             artifact_dir=os.path.join(privateDataDir, 'ks-migration'),
@@ -395,11 +400,12 @@ def resultInfo(resultState=False, api=None):
     logging.info(info)

     telemeter = ansible_runner.run(
+        verbosity=5,
         playbook=os.path.join(playbookBasePath, 'telemetry.yaml'),
         private_data_dir=privateDataDir,
         artifact_dir=os.path.join(privateDataDir, 'telemetry'),
         ident='telemetry',
-        quiet=True
+        quiet=False
     )

     if telemeter.rc != 0:
drwxr-xr-x 3 kubesphe kubesphe 4.0K Dec  2 13:37 common
drwxr-xr-x 1 kubesphe kubesphe 4.0K Feb  3  2023 env
drwxr-xr-x 3 kubesphe kubesphe 4.0K Dec  2 13:41 ks-core
drwxr-xr-x 3 kubesphe kubesphe 4.0K Dec  2 13:37 metrics_server
drwxr-xr-x 3 kubesphe kubesphe 4.0K Dec  2 13:37 preinstall
namespace/kubesphere-system unchanged
serviceaccount/ks-installer unchanged
customresourcedefinition.apiextensions.k8s.io/clusterconfigurations.installer.kubesphere.io unchanged
clusterrole.rbac.authorization.k8s.io/ks-installer unchanged
clusterrolebinding.rbac.authorization.k8s.io/ks-installer unchanged
deployment.apps/ks-installer unchanged
clusterconfiguration.installer.kubesphere.io/ks-installer created
13:35:26 UTC success: [k8s1]
Please wait for the installation to complete: >>--->
Start installing monitoring
Start installing multicluster
Start installing openpitrix
Start installing network
**************************************************
Waiting for all tasks to be completed ...
PLAY RECAP *********************************************************************
localhost                  : ok=24   changed=22   unreachable=0    failed=0    skipped=24   rescued=0    ignored=0

task monitoring status is successful (4/4)
changed: [localhost] => (item=kubesphere-config.yaml) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "/usr/local/bin/kubectl apply -f /kubesphere/kubesphere/kubesphere-config.yaml",
    "delta": "0:00:07.874273",
    "end": "2023-12-02 14:09:16.591793",
    "failed_when_result": false,
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/kubectl apply -f /kubesphere/kubesphere/kubesphere-config.yaml",
            "_uses_shell": true,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "item": "kubesphere-config.yaml",
    "rc": 0,
    "start": "2023-12-02 14:09:08.717520",
    "stderr": "",
    "stderr_lines": [],
    "stdout": "configmap/kubesphere-config created",
    "stdout_lines": [
        "configmap/kubesphere-config created"
    ]
}
To wrap up, some lessons that do not need to stay confidential:
Watching for code changes
Containerizing the build environment
Graceful shutdown
Health checks
Traffic routing
Multi-environment builds
Configuration center
Each independent step is usually called an action, and one complete end-to-end flow a pipeline; a pipeline is an orchestration of multiple actions.
A pipeline's trigger conditions can be chosen freely: manual, scheduled, git events, and so on.
Actions can be combined serially or in parallel in any arrangement, and each action node can be set to run automatically, on a schedule, manually, or behind a manual confirmation; the exact node types differ slightly between platforms.
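A toy sketch of that serial/parallel composition (action names invented, not tied to any particular CI product):

#!/bin/bash
# stages run serially; actions inside a stage run in parallel
run_action() { echo "[action] $1 start"; sleep 1; echo "[action] $1 done"; }

run_action checkout                        # a serial action
run_action lint & run_action unit-test &   # two parallel actions
wait                                       # join both before the next stage
run_action build
read -r -p "deploy to prod? [y/N] " ok     # a manual-confirmation gate
[ "$ok" = "y" ] && run_action deploy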
Code-change watching is normally done with git hooks configured on the git server; they can watch events including but not limited to push, merge requests, and tags, and call back a designated API.
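A minimal sketch of that idea as a server-side post-receive hook (the CI endpoint URL is made up):

#!/bin/bash
# git feeds one "<old-sha> <new-sha> <ref>" line per updated ref on stdin
while read -r old new ref; do
  # call back the CI/CD system with what changed
  curl -s -X POST http://ci.example.local/api/trigger \
       -d "ref=$ref&old=$old&new=$new"
done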
This usually works together with branch and tag conventions to manage the whole development-to-release flow, e.g. a three-tier feature -> dev -> master structure or a finer-grained multi-tier scheme. Different branch types trigger different pipelines: a feature branch might additionally trigger a code-check pipeline, dev an extra unit-test pipeline, and master (or a tag) the release pipeline.
Builds usually run not on development machines but on dedicated build machines whose environment, barring special needs, is managed uniformly through docker images; typically a docker-in-docker image serves as the base build environment, on top of which the build dependencies are customized and the dependency-cache directories shared, keeping the build environment stable and controllable. A private image registry (such as docker registry) and a private package repository (such as JFrog) usually accompany this to speed up builds and remove network flakiness.
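For example, a hedged sketch of running a build inside a pinned builder image with the dependency cache mounted from the host (image name and paths invented):

# run the build in a fixed image; share the package cache across builds
docker run --rm \
  -v "$PWD":/workspace -w /workspace \
  -v /data/cache/m2:/root/.m2 \
  harbor.example.local/build/maven:3.9-jdk17 \
  mvn -B package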
Important services need more than rolling restarts: graceful shutdown must be implemented to keep the rolling restart stable. The usual approach is for the program to implement the shutdown routine itself, triggered by listening for signals such as SIGTERM (SIGKILL cannot be caught and only serves as the forced-kill fallback) or through a dedicated http or rpc endpoint. The release pipeline, controller, or upstream scheduler first shifts traffic away, then tells the node to run its graceful shutdown and exit on its own, killing it forcibly after a timeout; this is why graceful shutdown usually works together with health checks and traffic routing.
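A minimal container-entrypoint sketch of the signal path (the service binary name is hypothetical):

#!/bin/bash
# forward SIGTERM to the app so it can drain and exit on its own;
# SIGKILL cannot be trapped, it is only the timeout fallback
app_pid=0
on_term() {
  kill -TERM "$app_pid" 2>/dev/null   # ask the app to shut down gracefully
  wait "$app_pid"                     # let it finish draining
  exit 0
}
trap on_term TERM
./my-server &                         # hypothetical service binary
app_pid=$!
wait "$app_pid"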
Health checks are likewise usually implemented by the application itself and come in two kinds, reachability checks and availability checks: a reachability check only probes whether the network path is open and does not care whether the program itself is usable, while an availability check additionally requires that some definition of "available" is met.
What counts as available differs by scenario. For a web service a simple definition is that it accepts an http request and responds successfully; a stricter one also requires the dependent database, cache, and queue services to be up simultaneously. But the strict definition means that when an underlying service fails, a program that has degradation logic never gets the chance to degrade, and the service becomes entirely unavailable; so it is rarely adopted, and instead the program itself chooses whether to degrade or to declare itself unavailable.
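As a sketch, the two checks might look like this from the prober's side (host and endpoints invented; the app decides internally what /healthz means):

# reachability: only proves the network path and the process answer at all
curl -sf --max-time 2 http://app.local:8080/ping > /dev/null && echo reachable

# availability: the app weighs DB/cache/queue state and its degradation
# logic, then reports whether it considers itself usable
curl -sf --max-time 2 http://app.local:8080/healthz > /dev/null && echo available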
Traffic routing governs where traffic flows. The path summarized in the previous post is usually called north-south (vertical) traffic, while traffic between applications is called east-west (horizontal) traffic.
For a given service cluster, precise control over traffic is achieved through network-layer routing, client-side or server-side service discovery, and load balancing. In particular, cloud-native service meshes deal mainly with east-west traffic.
The same codebase has to run in multiple datacenters, multiple companies, and multiple environments, so something has to distinguish the runtime environment. Broadly there are two strategies: ship a single artifact and differentiate through a configuration center, config files, or environment variables that change what the program perceives; or, at build time, directly produce a separate artifact per environment combination.
With the configuration-center / config-file / environment-variable approach, pay extra attention to the startup parameters and the configuration-center settings; centralized management is convenient but accidents are easy.
With the multiple-artifacts approach you need not worry about wrong environment parameters, but you must additionally consider keeping the environment parameters, i.e. the artifacts themselves, confidential. A sketch of the single-artifact flavor follows.
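A sketch of the single-artifact approach with the environment injected at startup (variable and file names invented):

# the deploy pipeline injects DEPLOY_ENV; the artifact itself is env-agnostic
ENV="${DEPLOY_ENV:-dev}"
exec ./my-server --config "conf/${ENV}.yaml"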
Summing up some experience that does not need to stay confidential.
A typical web system is composed of an access-layer network plus a service-layer network; the traffic path from the user's client to the entrance of the business services is what people usually mean by the access-layer network.
In between, the network layers usually include DNS and layer-X load balancing.
The physical path usually includes the big three carriers' access networks, smaller ISPs' access networks, inter-datacenter leased lines, and core-switch interconnects.
Related techniques include virtual IPs and layer-4/layer-7 packet parsing.
A typical web-service access layer consists of the DNS system, layer-4 load balancers, and layer-7 load balancers.
At every layer the core idea is the same: make many identical things look "like one" from the outside. Understand that and most of the access-layer logic follows.
At the DNS layer, DNS can do per-region and per-carrier multi-IP resolution, primary/backup policies, BGP datacenter policies, health checks, and disaster failover.
At layer 4 of the network stack, multiple IP+port backends can be exposed as a single IP+port; the multiple IPs are generally called L4 RS and the single IP the L4 VIP.
This layer also carries load-balancing policies, health checks, and automatic disaster failover; a minimal sketch follows.
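For concreteness, a minimal LVS/ipvsadm sketch of one L4 VIP fronting two RS (addresses invented; NAT mode, round-robin):

ipvsadm -A -t 10.0.0.100:80 -s rr                 # create the VIP (virtual service)
ipvsadm -a -t 10.0.0.100:80 -r 10.0.1.11:80 -m    # attach RS #1
ipvsadm -a -t 10.0.0.100:80 -r 10.0.1.12:80 -m    # attach RS #2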
At layer 7 of the network stack, multiple host+port backends can be exposed as a single host+port; the exposed host is generally called a virtual host, and the backend host+port set is called L7 RS, service IPs, or real IPs.
This layer likewise carries load-balancing policies, health checks, and automatic disaster failover.
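In nginx terms (the same config style that appears later in this archive for Web Station), a minimal virtual-host sketch with invented names:

upstream app_rs {                      # the L7 RS pool
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}
server {
    listen 80;
    server_name app.example.local;     # the virtual host
    location / {
        proxy_pass http://app_rs;
    }
}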
There is also a special path where the L4 VIP goes straight to the service IPs; IM-style applications, for instance, do not always go through a layer-7 load balancer.
A typical topology diagram follows (sensitive information blurred; the image is in the Feishu/blog version).
KubeSphere has a small bug I want to fix, so I am setting up a local environment.
The latest kubekey can already set up a Harbor with a custom domain as the image registry, and with that registry you can install the KubeSphere and k8s versions from the current official docs offline, or at least faster.
After the install finished I went into the installer pod to look around and found /shell-operator running. I plan to start from that to figure out KubeSphere's install and runtime logic, and then patch the existing deployment.
ubuntu@k8s1:~$ kubectl exec -it -n kubesphere-system ks-installer-566ffb8f44-l769h -- bash
ks-installer-566ffb8f44-l769h:/kubesphere$ pwd
/kubesphere
ks-installer-566ffb8f44-l769h:/kubesphere$ ls
config  installer  kubesphere  playbooks  results
ks-installer-566ffb8f44-l769h:/kubesphere$ ps aux
PID   USER     TIME  COMMAND
    1 kubesphe  0:01 /shell-operator start
13488 kubesphe  0:00 bash
13495 kubesphe  0:00 ps aux
Prerequisites: kubectl, shell-operator, Python, Ansible.
From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/Dockerfile.complete#L8 we can see the main entrypoint comes from https://github.com/flant/shell-operator, a tool for running event-driven scripts in a k8s cluster.
According to https://github.com/flant/shell-operator/blob/55ca7a92c873cccfdad0c7591048eaeb8cf0dd4b/docs/src/HOOKS.md?plain=1#L11, shell-operator by default scans the hooks directory recursively and treats each file's standard output as its declaration of which events to watch.
According to https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/Dockerfile#L5, the files under controller/ are placed into /hooks as hooks, installRunner.py among them.
From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/controller/installRunner.py#L30C12-L30C32 we can see installRunner.py is triggered when a ClusterConfiguration is created or updated.
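To make the hook mechanism concrete, a minimal shell-operator hook sketch (not the real ks-installer hook, which is the Python file above):

#!/usr/bin/env bash
# shell-operator calls every hook with --config at startup; the YAML printed
# here declares which k8s events re-trigger the hook later
if [[ "$1" == "--config" ]]; then
  cat <<'EOF'
configVersion: v1
kubernetes:
- apiVersion: installer.kubesphere.io/v1alpha1
  kind: ClusterConfiguration
  executeHookOnEvent: ["Added", "Modified"]
EOF
else
  # on a real event, do the work (ks-installer launches ansible here)
  echo "ClusterConfiguration changed"
fi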
TODO: check in detail whether the YAML delivered with the trigger event contains version parameters.
From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/controller/installRunner.py#L338C41-L338C53 we can see the actual execution is done by ansible.
My experiments require creating VMs and installing the OS over and over, which is tedious, so I decided to build a low-budget automated VM manager for the home lab.
ESXi itself has APIs and SDKs for automation in many languages; plenty of material can be found on the official GitHub.
Ubuntu automated installation is also mature; the common approach is cloud-init with user-data, see the official docs.
The govc CLI is enough. The script below is adapted from https://github.com/vmware/govmomi/blob/main/scripts/vcsa/create-esxi-vm.sh, with the parts I do not need removed and a few parameters added for convenience; for full documentation see https://github.com/vmware/govmomi/tree/main/govc#usage
#!/bin/bash -e
# Create a VM and boot it from cdrom/iso, modified from https://github.com/vmware/govmomi/blob/main/scripts/vcsa/create-esxi-vm.sh
# GOVC_* environment variables also apply, see https://github.com/vmware/govmomi/tree/main/govc#usage
# If GOVC_USERNAME is set, it is used to login ESXi
# If GOVC_PASSWORD is set, the account password will be set to this value
# 6.7 U3 https://docs.vmware.com/en/VMware-vSphere/6.7/rn/vsphere-esxi-vcenter-server-67-release-notes.html

export GOVC_URL=1.2.3.4            # ESXi IP
export GOVC_INSECURE=1
export GOVC_USERNAME=ESXI_USER     # ESXi username
export GOVC_PASSWORD=ESXI_PASSWORD # ESXi password
export GOVC_DATASTORE=datastore1
export GOVC_NETWORK=ESXI_NET       # default vSwitch network; I use "test"
#. ~/.govcrc

set -o pipefail

usage() {
  cat <<'EOF'
Usage: $0 [-c CORE] [-m MEM_GB] [-d DISK_GB] [-t DISK_THICK] [-i ISO] [-p POWER_STATE] VM_NAME

Example1: $0 vmname
Example2: $0 -c 2 -m 2 -d 40 -t true -i ubuntu-22.04.1-live-server-amd64.iso -p on vmname
EOF
}

core=2
mem=2
disk=40
thick=true
iso=ubuntu-22.04.1-live-server-amd64.iso
power=off

while getopts c:m:d:t:i:p:h flag
do
  echo "flag=$flag"
  echo "val=$OPTARG"
  case $flag in
    c) core=$OPTARG ;;
    m) mem=$OPTARG ;;
    d) disk=$OPTARG ;;
    t) thick=$OPTARG ;;
    i) iso=$OPTARG ;;
    p) power=$OPTARG ;;
    h) usage
       exit ;;
    *) usage 1>&2
       exit 1 ;;
  esac
done

shift $((OPTIND-1))

if [ $# -ne 1 ] ; then
  usage
  exit 1
fi

# if [[ "$iso" == *"-Installer-"* ]] ; then
#   echo "Invalid iso name (need stateless, not installer): $iso" 1>&2
#   exit 1
# fi

network=${GOVC_NETWORK:-"VM Network"}
username=$GOVC_USERNAME
password=$GOVC_PASSWORD
guest=${GUEST:-"ubuntu64Guest"}

name=$1

echo -n "Checking govc version..."
govc version -require 0.15.0

# vm
echo "Creating vm ${name}..."
govc vm.create -version 6.7 -on=false -net "$network" -m $((mem*1024)) -c $core -g "$guest" -net.adapter=vmxnet3 -disk.controller pvscsi "$name"

# cdrom
echo "Adding cdrom device to ${name}..."
id=$(govc device.cdrom.add -vm "$name")
boot="sysimg/$iso"

# upload iso
# if ! govc datastore.ls "$boot" > /dev/null 2>&1 ; then
#   govc datastore.upload "$iso" "$boot"
# fi

echo "Inserting $boot into $name cdrom device..."
govc device.cdrom.insert -vm "$name" -device "$id" "$boot"

# disk
echo "Creating disk for use by $name..."
diskname=$name
govc vm.disk.create -vm "$name" -name "$name"/"$diskname" -size "${disk}G" -thick=$thick

# change boot seq
echo "Change bios boot seq for $name..."
govc device.boot -vm "$name" -order disk,ethernet,cdrom

# get mac address
echo "Powering on $name VM..."
govc vm.power -on "$name"
mac=""
while [ x$mac = x"" ] || [ x$mac = x"null" ]
do
  mac=$(govc vm.info -json $name | jq -r ".VirtualMachines[0].Config.Hardware.Device[]|select(.Backing.DeviceName == \"$network\")|.MacAddress")
  sleep 1
done
echo "Get vm macaddress succ, $mac"

# power on
if [ x$power = x"on" ] ; then
  echo "Waiting for $name IP..."
  vm_ip=$(govc vm.ip "$name")
  ! govc events -n 100 "vm/$name" | grep -E 'warning|error'
  echo "$name IP get succ"
  echo $vm_ip
  echo "Create VM Done: $name, $vm_ip"
else
  govc vm.power -off "$name"
  echo "Create VM Done: $name"
  echo "You can power on it by:"
  echo govc vm.power -on "$name"
fi

exit 0
I tested the official cloud-image.ova several times and could not get govc to pass the -options parameters properly; the user-data specified in the JSON never took effect, no idea why. So first a semi-automatic route: edit the boot parameters by hand to test.
First, following the official docs, prepare a directory, put user-data and meta-data in it, then start an HTTP server.
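Roughly like this (the directory name matches the URL used below; for the NoCloud datasource meta-data can be an empty file):

mkdir -p ubuntu1
touch ubuntu1/meta-data          # empty is fine for NoCloud
cp user-data ubuntu1/
python3 -m http.server 80        # port 80 needs root; serves http://<host>/ubuntu1/user-data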
The user-data content is as follows; a few points need attention:
#cloud-config
autoinstall:
  version: 1
  refresh-installer:
    update: yes
  identity:
    hostname: ubuntu1
    username: ubuntu
    password: "$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0"
  package_upgrade: false
  package_update: true
  packages:
    - net-tools
  apt:
    disable_suites: [security]
    primary:
      - arches: [default]
        uri: https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
  ssh:
    install-server: true
    allow-pw: false
    authorized-keys:
      - ssh-rsa <your ssh public key here>
  storage:
    layout:
      name: lvm
      sizing-policy: all
  runcmd:
    - echo test
On 20.04.5's graphical installer, pressing F6 and then Esc lets you edit the boot command; change it as follows to load user-data from the network.
Incidentally, I do not know why pasting a lot of text into VMware sometimes silently drops part of the content.
/casper/vmlinuz initrd=/casper/initrd autoinstall ds=nocloud-net;s=http://192.168.2.159/ubuntu1/
Pasting into this (GRUB) command line in VMware works fine:
set gfxpayload=keep;linux /casper/vmlinuz autoinstall ds=nocloud-net\;s=http://192.168.2.159/ubuntu1/ ---;initrd /casper/initrd;boot
1. Make OpenWrt the DHCP server and point it at a TFTP service.
https://openwrt.org/docs/guide-user/services/tftp.pxe-server
https://forum.openwrt.org/t/solved-dhcp-config-pxe-boot-from-external-tftp-server/5880
Configured as in the screenshot (the image is in the Feishu/blog version).
2. TFTP is handled by OpenWrt, the jumpbox, or Synology (the Synology TFTP test failed, probably because my Synology is not on the test subnet, so for now I use OpenWrt's built-in TFTP).
The TFTP payload was assembled as follows, adapted from the OpenWrt docs. I boot with legacy BIOS; example below.
cd autoinstall
mkdir tftp
cd tftp
mkdir syslinux
cd syslinux
wget --no-check-certificate https://www.kernel.org/pub/linux/utils/boot/syslinux/syslinux-6.03.tar.gz
tar -zxvf syslinux-6.03.tar.gz
cd syslinux-6.03/bios
cp core/pxelinux.0 com32/elflink/ldlinux/ldlinux.c32 com32/menu/vesamenu.c32 com32/lib/libcom32.c32 com32/libutil/libutil.c32 ../../../../tftp

# pxelinux.cfg/default:
DEFAULT vesamenu.c32
PROMPT 0
TIMEOUT 30
MENU TITLE Hyl PXE-Boot Menu

# LABEL install
#   MENU LABEL Ubuntu Live 22.04 64-Bit
#   KERNEL casper/vmlinuz
#   INITRD casper/initrd
#   APPEND root=/dev/ram0 cloud-config-url=/dev/null ramdisk_size=1500000 ip=dhcp url=http://192.168.2.159/iso/ubuntu-22.04.1-live-server-amd64.iso
#   TEXT HELP
#     Starts the Ubuntu 22.04 64-Bit
#   ENDTEXT

LABEL autoinstallcd
  MENU LABEL Ubuntu Live 22.04 64-Bit CD ROM
  KERNEL /casper/vmlinuz
  APPEND cloud-config-url=/dev/null autoinstall ds=nocloud-net;s=http://192.168.2.159/ubuntu1/
  INITRD /casper/initrd
  TEXT HELP
    Starts the Ubuntu 22.04 64-Bit autoinstall from CD ROM
  ENDTEXT

LABEL autoinstall
  MENU LABEL Ubuntu Live 22.04 64-Bit autoinstall
  KERNEL casper/vmlinuz
  APPEND root=/dev/ram0 cloud-config-url=/dev/null ramdisk_size=1500000 ip=dhcp url=http://192.168.2.159/iso/ubuntu-22.04.1-live-server-amd64.iso autoinstall ds=nocloud-net;s=http://192.168.2.159/ubuntu1/
  INITRD casper/initrd
  TEXT HELP
    Starts the Ubuntu 22.04 64-Bit autoinstall from http
  ENDTEXT
Note: as described in https://askubuntu.com/questions/1235723/automated-20-04-server-installation-using-pxe-and-live-server-image,
using cloud-config-url=/dev/null lowers the memory requirement.
3. The autoinstall user-data and the ISO files are served by the jumpbox or OMV.
As the parameters above show, I temporarily used Python to serve http://192.168.2.159/iso/ubuntu-22.04.1-live-server-amd64.iso over HTTP, and may switch to OMV or Synology later. In theory this stage happens after DHCP, so it should not be constrained to the test subnet.
Also tried: mount the ISO through the CD drive but use PXE to adjust the kernel parameters so the automated install runs from the CD.
https://www.cnblogs.com/xyshun/p/9427472.html
https://www.dddns.icu/posts/pxe/
https://github.com/pypxe/PyPXE
https://github.com/netbootxyz/netboot.xyz
1. ESXi automation is handled by govc; after creating the VM you can obtain its NIC MAC address.
For VM creation see the script earlier.
Installing from cdrom requires changing the BIOS boot order to disk > net > cd-rom: disk comes first so a successful install does not re-run the installer, and net sits before cd-rom so the machine never lands in the interactive CD menu, which would break the automation.
govc device.boot -vm "$name" -order disk,ethernet,cdrom
2. With the MAC address, OpenWrt's TFTP can return boot options keyed by MAC.
Fetching the MAC automatically:
mac=""
while [ x$mac = x"" ] || [ x$mac = x"null" ]
do
  mac=$(govc vm.info -json $name | jq -r ".VirtualMachines[0].Config.Hardware.Device[]|select(.Backing.DeviceName == \"$network\")|.MacAddress")
  sleep 1
done
echo "Get vm macaddress succ, $mac"
A script then generates the boot menu from the MAC and hostname and uploads it to the TFTP server:
pxecfgname=01-${mac//:/-}
cat <<-EOF | tee /tmp/$pxecfgname
DEFAULT vesamenu.c32
PROMPT 0
TIMEOUT 30
MENU TITLE Hyl PXE-Boot Menu
LABEL autoinstallcd
MENU LABEL Ubuntu Live 22.04 64-Bit CD auto
KERNEL /casper/vmlinuz
APPEND cloud-config-url=/dev/null autoinstall ds=nocloud-net;s=http://192.168.0.6:5003/$name/
INITRD /casper/initrd
TEXT HELP
Starts the Ubuntu 22.04 64-Bit autoinstall from CD ROM
ENDTEXT
EOF
echo "Generated menu on pxelinux.cfg/$pxecfgname:"
# cat /tmp/$pxecfgname
scp /tmp/$pxecfgname root@192.168.2.1:/root/tftp/pxelinux.cfg/
rm /tmp/$pxecfgname
Note the file name must start with 01- (the ARP hardware-type prefix for Ethernet); see https://wiki.syslinux.org/wiki/index.php?title=PXELINUX
You can also use the hex-encoded IP address; the general rule is:
After attempting the file as specified in the DHCP or hardcoded options, PXELINUX will probe the following paths, prefixed with “pxelinux.cfg/“, under the initial Working Directory.
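For example, for MAC 88:99:aa:bb:cc:dd and IP 192.168.2.91 (C0A8025B in hex), the probe order would be:

pxelinux.cfg/01-88-99-aa-bb-cc-dd
pxelinux.cfg/C0A8025B
pxelinux.cfg/C0A8025
pxelinux.cfg/C0A802
pxelinux.cfg/C0A80
pxelinux.cfg/C0A8
pxelinux.cfg/C0A
pxelinux.cfg/C0
pxelinux.cfg/C
pxelinux.cfg/default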
3. The ISO in the boot entry is served by Synology Web Station for now.
A static site; nothing more to say.
(No longer needed for now: the ramdisk route has problems. Installing after loading the ISO into a ramdisk is very slow, probably because I gave the VM too little memory.)
4. The autoinstall config files are also served by Web Station.
Synology's Web Station supports PHP, so the plan is to use PHP to emulate a rewrite rule,
making /prefix/{hostname}/user-data return my template with the hostname dynamically substituted.
Find Web Station's main nginx config at /var/tmp/nginx/app.d/server.webstation-vhost.conf; it contains an include:
server {
    ............
    include /usr/local/etc/nginx/conf.d/blablablabla/user.conf*;
}
Create a user.conf under that path, in my case:
/usr/local/etc/nginx/conf.d/a9d1c5c8-082a-482c-8fe1-73afc670ff6c/user.conf
Thanks to eric for the reference:
location ~ ^/(.+)/user-data$ {
    try_files $uri $uri/ /index.php?hostname=$1;
}
location ~ ^/(.+)/meta-data$ {
    return 200 "";
}
index.php is as follows:
<?php
$hostname = $_GET['hostname'];
$file = fopen("template/user-data", "r");
$buffer = "";  // initialize to avoid an undefined-variable notice
while (!feof($file)) {
    $buffer = $buffer . fgets($file, 4096);
}
fclose($file);
$buffer = str_replace("\${hostname}", $hostname, $buffer);
// echo $hostname;
header('Content-Type: application/octet-stream');
echo $buffer;
https://ubuntu.com/server/docs/install/autoinstall
https://github.com/hedzr/pxe-server-and-focal
https://hedzr.com/devops/linux/build-pxe-server-and-autoinstall-ubuntu-server
https://github.com/vmware/govmomi/blob/main/govc/README.md
https://blog.amrom.tk/2022/2022-03-03-esxi-cloud-init/
https://cloudinit.readthedocs.io/en/latest/topics/examples.html
https://nickhowell.uk/2020/05/01/Automating-Ubuntu2004-Images/
https://discourse.ubuntu.com/t/automated-server-installer-config-file-reference/16613/42
http://hmli.ustc.edu.cn/doc/linux/ubuntu-autoinstall/ubuntu-autoinstall.html