Uncategorized – Page 3 – HYL Studio

20241002 Loading continue in a Projector environment and related issues

I didn't bother formatting this here; please read the Feishu doc directly.

https://paraparty.feishu.cn/docx/JQtkdMWu7oSsjwxwcpYcww7Inyf

Code: https://github.com/continuedev/continue/pull/2439

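20231202 Trying out an argocd setup

0. Background

Blog post: https://blog.hylstudio.cn/archives/1350

Feishu doc: https://paraparty.feishu.cn/docx/GEwVdulCgoLX7dxSHMecy4mBnch

I tried out argocd. It is positioned as a CD tool, and I doubted whether it could also serve as a k8s dashboard.

Conclusion: it cannot be used as a k8s dashboard, but it is fairly friendly to developers who are new to k8s.

This post is a backup; for the best reading experience, see the Feishu doc.

1. Installation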


mkdir argocd
cd argocd
wget https://raw.githubusercontent.com/argoproj/argo-cd/v2.9.3/manifests/ha/install.yaml

Extract the image references and mirror them to the local registry:


sudo docker pull ghcr.io/dexidp/dex:v2.37.0
sudo docker tag ghcr.io/dexidp/dex:v2.37.0 harbor.hylstudio.local/dexidp/dex:v2.37.0
sudo docker push harbor.hylstudio.local/dexidp/dex:v2.37.0

sudo docker pull redis:7.0.11-alpine
sudo docker tag redis:7.0.11-alpine harbor.hylstudio.local/library/redis:7.0.11-alpine
sudo docker push harbor.hylstudio.local/library/redis:7.0.11-alpine

sudo docker pull quay.io/argoproj/argocd:v2.9.3
sudo docker tag quay.io/argoproj/argocd:v2.9.3 harbor.hylstudio.local/argoproj/argocd:v2.9.3
sudo docker push harbor.hylstudio.local/argoproj/argocd:v2.9.3

Replace the image references in install.yaml with the local registry paths.
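The replacement can also be scripted. Below is a minimal sketch, not what I actually ran: it assumes images appear as unquoted `image: <ref>` lines in the manifest and that the local registry keeps the upstream namespace (with `library/` for official Docker Hub images), which matches the three push commands above. The function names `localize_image` and `rewrite_manifest` are made up for illustration.

```python
import re

# Assumption: images appear as unquoted `image: <ref>` lines in the manifest.
IMAGE_LINE = re.compile(r"^([ \t]*(?:-[ \t]+)?image:[ \t]*)(\S+)[ \t]*$", re.MULTILINE)


def localize_image(image: str, registry: str = "harbor.hylstudio.local") -> str:
    """Map an upstream image reference onto the local registry."""
    parts = image.split("/")
    # A first component containing "." or ":" (or equal to "localhost") is a registry host.
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0] or parts[0] == "localhost"):
        parts = parts[1:]
    # Official Docker Hub images have no namespace; the post pushes them under library/.
    if len(parts) == 1:
        parts = ["library"] + parts
    return "/".join([registry] + parts)


def rewrite_manifest(text: str, registry: str = "harbor.hylstudio.local") -> str:
    """Rewrite every `image:` line so it points at the local registry."""
    return IMAGE_LINE.sub(lambda m: m.group(1) + localize_image(m.group(2), registry), text)
```

Reading install.yaml, passing its text through `rewrite_manifest`, and writing the result back would cover the three images used here; quoted image values or digests would need extra handling.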


kubectl create namespace argocd
kubectl apply -n argocd -f install.yaml

The container list after installation is shown in the figure.

Resource usage is shown in the figure.

2. Port forwarding with kubectl


kubectl -n argocd port-forward service/argocd-server :80

The default password is in the argocd-initial-admin-secret secret; the username is admin.
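The value stored in the secret is base64-encoded, like every Kubernetes Secret value. A small sketch of decoding it, assuming the JSON comes from `kubectl -n argocd get secret argocd-initial-admin-secret -o json` and that the password sits under the `password` key; the helper name is made up for illustration.

```python
import base64
import json


def initial_admin_password(secret_json: str) -> str:
    """Extract the admin password from `kubectl get secret ... -o json` output."""
    secret = json.loads(secret_json)
    # Kubernetes stores Secret values base64-encoded under .data
    return base64.b64decode(secret["data"]["password"]).decode("utf-8")
```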

3. Local CLI setup

mkdir -p /usr/local/argocd/bin
cd /usr/local/argocd/bin
wget https://localhost:43819/download/argocd-linux-amd64
mv argocd-linux-amd64 argocd
chmod +x argocd
cd /usr/local/bin
ln -s /usr/local/argocd/bin/argocd

4. First demo

kubectl config set-context --current --namespace=argocd
argocd app create guestbook --repo https://github.com/argoproj/argocd-example-apps.git --path guestbook --dest-server https://kubernetes.default.svc --dest-namespace default
argocd app get guestbook
argocd app sync guestbook

Reference

https://argo-cd.readthedocs.io/en/stable/getting_started/

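20231125 Tracing ks-installer

0. Background

Blog post: https://blog.hylstudio.cn/archives/1343

Feishu doc: https://paraparty.feishu.cn/docx/FTM7d1TIcoxL83xfRWzc34omnFe

The last step of installing k8s is always extremely slow. I had traced it halfway before; today I finished the trace to find out what actually happens.

I didn't bother formatting this here; for the best reading experience see the Feishu doc. This post is a backup.

1. Tracing process

Prerequisites: kubectl, shell-operator, python, ansible, helm

From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/Dockerfile.complete#L8 we can see that the main entrypoint program comes from https://github.com/flant/shell-operator , a tool that runs event-driven scripts inside a k8s cluster.

According to https://github.com/flant/shell-operator/blob/55ca7a92c873cccfdad0c7591048eaeb8cf0dd4b/docs/src/HOOKS.md?plain=1#L11 , shell-operator by default recursively scans the files under the hooks directory and treats each file's standard output as the declaration of the events it listens for.

From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/Dockerfile#L5 we can see that the files under controller are placed under /hooks as listeners, including installRunner.py.

From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/controller/installRunner.py#L30C12-L30C32 we can see that installRunner.py is triggered when a ClusterConfiguration is created or updated.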
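To make the hook mechanism concrete, here is a minimal sketch of what such a hook could look like in Python. It is not the code in installRunner.py: the field names follow shell-operator's v1 hook configuration as I understand it (the real file may use a different config version), and the binding name is made up. The operator calls the hook with `--config` to collect the declaration, and later passes the triggering events through the file named by `BINDING_CONTEXT_PATH`.

```python
import json
import os

# Assumption: field names follow shell-operator's v1 hook configuration.
HOOK_CONFIG = {
    "configVersion": "v1",
    "kubernetes": [
        {
            "name": "watch-clusterconfiguration",  # hypothetical binding name
            "apiVersion": "installer.kubesphere.io/v1alpha1",
            "kind": "ClusterConfiguration",
            "executeHookOnEvent": ["Added", "Modified"],
        }
    ],
}


def main(argv, environ=os.environ):
    if "--config" in argv:
        # Discovery phase: the operator reads the hook's stdout as its event declaration.
        return json.dumps(HOOK_CONFIG)
    # Event phase: the operator passes a JSON file describing what triggered the hook.
    with open(environ["BINDING_CONTEXT_PATH"]) as f:
        contexts = json.load(f)
    return "\n".join(f"triggered by binding {c.get('binding')}" for c in contexts)

# A real hook file would end with: print(main(sys.argv[1:]))
```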

The action that triggers it can be found at https://github.com/kubesphere/kubekey/blob/ffaa19f430a6b13aa219c7ec699fc2ea705c3a93/cmd/kk/pkg/kubesphere/modules.go#L101

TODO: look in detail at how the yaml in the trigger action passes the kubesphere version information

From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/controller/installRunner.py#L338C41-L338C53 we can see that the work is actually performed by ansible, through https://ansible.readthedocs.io/projects/runner/en/stable/index.html

Documentation for the run method: https://ansible.readthedocs.io/projects/runner/en/stable/ansible_runner/#ansible_runner.interface.run


verbosity (int) – Control how verbose the output of ansible-playbook is
_input (io.FileIO) – An optional file or file-like object for use as input in a streaming pipeline
_output (io.FileIO) – An optional file or file-like object for use as output in a streaming pipeline

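TODO: find which variable in the ansible playbook holds the kubesphere version
TODO: confirm the data flow that carries the version information from kubekey to the ansible playbook

Following https://www.cnblogs.com/sylvia-liu/p/14933776.html , force-override the entrypoint to get into the container and modify the python script: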


docker pull harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker run -it --entrypoint /bin/bash harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
cp /hooks/kubesphere/installRunner.py /hooks/kubesphere/installRunner.py.bak
vi /hooks/kubesphere/installRunner.py

Inside the shell, modify the python script:


# added at the top of installRunner.py
import io
ansible_log=io.FileIO('/home/kubesphere/ansible.log', 'w')

            # added to the arguments of the ansible_runner.run(...) call
            _output=ansible_log,
            verbosity=5

Build a new image, v3.3.3:


docker commit 5274b25c35d5 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.3
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.3

Create the VMs:


./vm.sh -c 4 -m 8 -d 80 -p on k8s1
./vm.sh -c 2 -m 4 -d 40 -p on k8s2
./vm.sh -c 2 -m 4 -d 40 -p on k8s3

Assign hostnames:


#dns config
192.168.2.206 k8s-control.hylstudio.local
192.168.2.206 k8s1.k8s.local
192.168.2.177 k8s2.k8s.local
192.168.2.203 k8s3.k8s.local

Forcing the version in ClusterConfiguration to the new one makes kk report an error, so switch it back to 3.3.2 and install that first:


apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.2

Try replacing the contents of v3.3.2 with those of v3.3.3; back up v3.3.2 before replacing.


# Fake a 3.3.2 image to get past the version check
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2-bak
docker image rm harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.3 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2

When the installation reaches its last step, enter the ks-installer pod:


namespace/kubesphere-system unchanged
serviceaccount/ks-installer unchanged
customresourcedefinition.apiextensions.k8s.io/clusterconfigurations.installer.kubesphere.io unchanged
clusterrole.rbac.authorization.k8s.io/ks-installer unchanged
clusterrolebinding.rbac.authorization.k8s.io/ks-installer unchanged
deployment.apps/ks-installer unchanged
clusterconfiguration.installer.kubesphere.io/ks-installer created
13:55:05 UTC success: [k8s1]
Please wait for the installation to complete:    >>--->

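Looking at the python file shows that the running image is not the one I pushed locally, yet describe shows the correct address and tag. The sha on harbor also looks right, so I suspect some other logic modifies the python file at startup.

The relevant processes are: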


ks-installer-566ffb8f44-ml9gm:/kubesphere$ ps aux
PID   USER     TIME  COMMAND
    1 kubesphe  0:00 /shell-operator start
   56 kubesphe  0:06 python3 /hooks/kubesphere/installRunner.py
 2501 kubesphe  1:21 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubespher
 4348 kubesphe  0:01 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubespher
 5261 kubesphe  0:00 /bin/sh -c /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700921243.
 5262 kubesphe  0:00 /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700921243.7008557-434
 5263 kubesphe  0:00 /usr/local/bin/kubectl apply -f /kubesphere/kubesphere/ks-core/crds/iam.kubesphere.io_
 5287 kubesphe  0:00 bash
 5299 kubesphe  0:00 ps aux

The log shows:


Start installing monitoring
Start installing multicluster
Start installing openpitrix
Start installing network
**************************************************
Waiting for all tasks to be completed ...
task network status is successful  (1/4)
task openpitrix status is successful  (2/4)
task multicluster status is successful  (3/4)

The only remaining suspect is monitoring.

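But when run from outside, the entrypoint turned out to be wrong already (docker commit saves the container's overridden /bin/bash entrypoint into the image), so rebuild the image with a Dockerfile for testing instead of using docker commit.

Manually copy out the modified python file: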


docker cp xxx:/hooks/kubesphere/installRunner.py .

Write the DockerFile:


FROM harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2-bak
RUN rm -rf /hooks/kubesphere/installRunner.py
COPY installRunner.py /hooks/kubesphere/

Build a new image. Note: create a new folder to keep the docker build context small.


mkdir imgbuild
mv installRunner.py DockerFile imgbuild
cd imgbuild
docker build -f DockerFile -t harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.4 .
docker image rm harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.4 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.4
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2

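Tried again and found that kubekey touches the tag on harbor, so the image that finally gets used is not the replaced one.

Try applying the installer without the -a flag, and watch whether the tag on harbor stays correct.

Manually delete the image on the nodes and retry. ansible.log had no effect; I'll look at that later.

The stdout output is now indeed at -vvvvv level; wait and see the final result.

There are some cleaner approaches:

~~Before kubekey modifies harbor and before applying the yaml, manually crictl pull the correct image on the machines, so that the image found locally is not overwritten by the wrong remote one~~
~~Install in two passes: install k8s first, crictl pull manually, then install kubesphere~~
~~Trace the kubekey code and bypass the check to install with the correct version number directly~~
In the end, from the source at https://github.com/kubesphere/kubekey/blob/e755baf67198d565689d7207378174f429b508ba/cmd/kk/cmd/create/cluster.go#L141C43-L141C59 I learned that --skip-push-images prevents the images on harbor from being overwritten.

From the detailed output above, we can confirm that it is not stuck in the ansible task invoked by the run method modified at the beginning.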


Start installing monitoring
Start installing multicluster
Start installing openpitrix
Start installing network
**************************************************
Waiting for all tasks to be completed ...

From the log output, the actual task can be found at https://github.com/kubesphere/ks-installer/blob/ef79beead3285698cdce559dd5505c79fe11dbff/controller/installRunner.py#L277C10-L277C20

Component list


    readyToEnabledList = [
        'monitoring',
        'multicluster',
        'openpitrix',
        'network']

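From the code, what gets executed is https://github.com/kubesphere/ks-installer/blob/ef79beead3285698cdce559dd5505c79fe11dbff/playbooks/monitoring.yaml

The instantiation call is at https://github.com/kubesphere/ks-installer/blob/ef79beead3285698cdce559dd5505c79fe11dbff/controller/installRunner.py#L57

The arguments come from https://github.com/kubesphere/ks-installer/blob/ef79beead3285698cdce559dd5505c79fe11dbff/controller/installRunner.py#L261

You can see the hard-coded quiet=True at https://github.com/kubesphere/ks-installer/blob/ef79beead3285698cdce559dd5505c79fe11dbff/controller/installRunner.py#L266C3-L266C24

Next, find out what makes the monitoring install so slow:

~~Statically analyze the monitoring install flow~~
~~Keep modifying this spot to add the same verbose argument and inspect the output? Where does the log of an asynchronous ansible call go?~~
~~The _output parameter appears to have no effect; how to debug it? This parameter is meant for the streaming pipeline~~
~~status_handler looks usable~~

TODO: why does kubekey run these 4 through asynchronous ansible calls without printing intermediate progress?

Because quiet=True, stdout shows nothing of the process, so check the processes currently running in ks-installer to determine how far it has got: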


ks-installer-566ffb8f44-zft9h:/hooks/kubesphere$ ps aux|more
PID   USER     TIME  COMMAND
    1 kubesphe  0:00 /shell-operator start
   18 kubesphe  5:49 python3 /hooks/kubesphere/installRunner.py
 2171 kubesphe  0:00 bash
 4053 kubesphe  1:54 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubesphere/config/ks-config.json -e @/kubesphere/config/ks-status.json -e @/kubesphere/results/env/extravars /kubesphere/playbooks/monitoring.yaml
 8876 kubesphe  0:00 {ansible-playboo} /usr/local/bin/python /usr/local/bin/ansible-playbook -e @/kubesphere/config/ks-config.json -e @/kubesphere/config/ks-status.json -e @/kubesphere/results/env/extravars /kubesphere/playbooks/monitoring.yaml
 8899 kubesphe  0:00 /bin/sh -c /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700926548.0542264-8876-155728454178142/AnsiballZ_command.py && sleep 0
 8900 kubesphe  0:01 /usr/local/bin/python /home/kubesphere/.ansible/tmp/ansible-tmp-1700926548.0542264-8876-155728454178142/AnsiballZ_command.py
 8926 kubesphe  0:00 /usr/local/bin/kubectl apply -f /kubesphere/kubesphere/prometheus/alertmanager
 8957 kubesphe  0:00 ps aux
 8958 kubesphe  0:00 more

The monitoring tasks are defined in https://github.com/kubesphere/ks-installer/blob/ef79beead3285698cdce559dd5505c79fe11dbff/roles/ks-monitor/tasks/main.yaml

Running watch -n 1 'ps aux|more' shows the execution progress; it is not actually stuck, the installation is just slow.
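If eyeballing the ps output gets tedious, the current step can be pulled out with a few lines of Python. This is only a sketch that assumes the four-column busybox layout shown above (PID, USER, TIME, COMMAND) and that the interesting step is whichever kubectl command is running; `kubectl_steps` is a made-up helper name.

```python
def kubectl_steps(ps_output: str) -> list:
    """Return the kubectl commands currently running, as seen in `ps aux` output."""
    steps = []
    for line in ps_output.splitlines():
        # busybox ps columns: PID USER TIME COMMAND
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[3].startswith("/usr/local/bin/kubectl"):
            steps.append(fields[3].strip())
    return steps
```

Feeding it the output captured every second gives a rough timeline of which manifests take the longest to apply.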

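From https://github.com/ansible/ansible-runner/blob/e81b02cae85f7c3e402fcb1cc1512da5ee3bcf35/src/ansible_runner/interface.py#L228C28-L228C28 we can see that the first element of the return value is the thread running the job asynchronously, so modify the code to try redirecting its output to a log file.

Also set quiet = False: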


def generateTaskLists():
    readyToEnabledList, readyToDisableList = getComponentLists()
    tasksDict = {}
    for taskName in readyToEnabledList:
        playbookPath = os.path.join(playbookBasePath, str(taskName) + '.yaml')
        artifactDir = os.path.join(privateDataDir, str(taskName))
        if os.path.exists(artifactDir):
            shutil.rmtree(artifactDir)

        tasksDict[str(taskName)] = component(
            playbook=playbookPath,
            private_data_dir=privateDataDir,
            artifact_dir=artifactDir,
            ident=str(taskName),
            quiet=False,
            rotate_artifacts=1
        )

    return tasksDict
    

    def installRunner(self):
        installer = ansible_runner.run_async(
            playbook=self.playbook,
            private_data_dir=self.private_data_dir,
            artifact_dir=self.artifact_dir,
            ident=self.ident,
            quiet=self.quiet,
            rotate_artifacts=self.rotate_artifacts,
            verbosity=5
        )
        task_name = self.ident
        thread = installer[0]
        log_file = open('/tmp/'+task_name+'.debug.log', 'w')
        thread.stdout = log_file
        return installer[1]

Final diff:


--- a/installRunner.py.bak
+++ b/installRunner.py
@@ -90,8 +90,13 @@ class component():
             artifact_dir=self.artifact_dir,
             ident=self.ident,
             quiet=self.quiet,
-            rotate_artifacts=self.rotate_artifacts
+            rotate_artifacts=self.rotate_artifacts,
+            verbosity=5
         )
+        task_name = self.ident
+        thread = installer[0]
+        log_file = open('/tmp/'+task_name+'.debug.log', 'w')
+        thread.stdout = log_file
         return installer[1]


@@ -263,7 +268,7 @@ def generateTaskLists():
             private_data_dir=privateDataDir,
             artifact_dir=artifactDir,
             ident=str(taskName),
-            quiet=True,
+            quiet=False,
             rotate_artifacts=1
         )

Build the image with the minimal change:


mkdir imgbuild
mv installRunner.py DockerFile imgbuild
cd imgbuild
docker build -f DockerFile -t harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.7 .
docker image rm harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker tag harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.7 harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.7
docker push harbor.hylstudio.local/kubesphereio/ks-installer:v3.3.2
docker image ls --digests|grep installer

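Force-delete the kubesphere monitor namespace, then force-delete the ks-installer pod and the installer image on the host. This forces k8s to recreate the pod, and the image is then pulled again according to the tag on harbor.

harbor does not show the image ID. The image ID is a docker client-side concept, so the registry does not track it = =

https://github.com/goharbor/harbor/issues/10293

https://github.com/goharbor/harbor/issues/2469

~~Next time, after kubekey applies the installer, manually edit the image to the desired tag and leave everything else alone~~

Setting thread.stdout = log_file produces no output. According to https://github.com/ansible/ansible-runner/blob/e0371d634426dfbdb9d3bfacb20e2dd4b039b499/src/ansible_runner/runner.py#L155C28-L155C48

output is written to the stdout and stderr files under self.config.artifact_dir only when self.config.suppress_output_file is false. artifact_dir comes from /kubesphere/results/{task_name}, and both suppress_output_file and suppress_ansible_output default to False.

Add --skip-push-images, set quiet=False, then check under /kubesphere/results for stdout and stderr: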


%s/quiet=True/quiet=False/g


--- a/installRunner.py.bak
+++ b/installRunner.py
@@ -85,6 +85,7 @@ class component():

     def installRunner(self):
         installer = ansible_runner.run_async(
+            verbosity=5,
             playbook=self.playbook,
             private_data_dir=self.private_data_dir,
             artifact_dir=self.artifact_dir,
@@ -263,7 +264,7 @@ def generateTaskLists():
             private_data_dir=privateDataDir,
             artifact_dir=artifactDir,
             ident=str(taskName),
-            quiet=True,
+            quiet=False,
             rotate_artifacts=1
         )

@@ -341,6 +342,7 @@ def preInstallTasks():

     for task, paths in preInstallTasks.items():
         pretask = ansible_runner.run(
+            verbosity=5,
             playbook=paths[0],
             private_data_dir=privateDataDir,
             artifact_dir=paths[1],
@@ -353,11 +355,12 @@ def preInstallTasks():

 def resultInfo(resultState=False, api=None):
     ks_config = ansible_runner.run(
+        verbosity=5,
         playbook=os.path.join(playbookBasePath, 'ks-config.yaml'),
         private_data_dir=privateDataDir,
         artifact_dir=os.path.join(privateDataDir, 'ks-config'),
         ident='ks-config',
-        quiet=True
+        quiet=False
     )

     if ks_config.rc != 0:
@@ -365,11 +368,12 @@ def resultInfo(resultState=False, api=None):
         exit()
         
     result = ansible_runner.run(
+        verbosity=5,
         playbook=os.path.join(playbookBasePath, 'result-info.yaml'),
         private_data_dir=privateDataDir,
         artifact_dir=os.path.join(privateDataDir, 'result-info'),
         ident='result',
-        quiet=True
+        quiet=False
     )

     if result.rc != 0:
@@ -380,6 +384,7 @@ def resultInfo(resultState=False, api=None):

     if "migration" in resource['status']['core'] and resource['status']['core']['migration'] and resultState == False:
         migration = ansible_runner.run(
+            verbosity=5,
             playbook=os.path.join(playbookBasePath, 'ks-migration.yaml'),
             private_data_dir=privateDataDir,
             artifact_dir=os.path.join(privateDataDir, 'ks-migration'),
@@ -395,11 +400,12 @@ def resultInfo(resultState=False, api=None):
             logging.info(info)

     telemeter = ansible_runner.run(
+        verbosity=5,
         playbook=os.path.join(playbookBasePath, 'telemetry.yaml'),
         private_data_dir=privateDataDir,
         artifact_dir=os.path.join(privateDataDir, 'telemetry'),
         ident='telemetry',
-        quiet=True
+        quiet=False
     )

     if telemeter.rc != 0:

The live log of the currently running task can be seen under /kubesphere/results:


drwxr-xr-x    3 kubesphe kubesphe    4.0K Dec  2 13:37 common
drwxr-xr-x    1 kubesphe kubesphe    4.0K Feb  3  2023 env
drwxr-xr-x    3 kubesphe kubesphe    4.0K Dec  2 13:41 ks-core
drwxr-xr-x    3 kubesphe kubesphe    4.0K Dec  2 13:37 metrics_server
drwxr-xr-x    3 kubesphe kubesphe    4.0K Dec  2 13:37 preinstall

In summary, when kubekey prints Please wait for the installation to complete, control has been handed over to ks-installer:


namespace/kubesphere-system unchanged
serviceaccount/ks-installer unchanged
customresourcedefinition.apiextensions.k8s.io/clusterconfigurations.installer.kubesphere.io unchanged
clusterrole.rbac.authorization.k8s.io/ks-installer unchanged
clusterrolebinding.rbac.authorization.k8s.io/ks-installer unchanged
deployment.apps/ks-installer unchanged
clusterconfiguration.installer.kubesphere.io/ks-installer created
13:35:26 UTC success: [k8s1]
Please wait for the installation to complete:    >>--->

When ks-installer prints the following, go to /kubesphere/results to keep watching the status of the remaining 4 tasks, which run in parallel:


Start installing monitoring
Start installing multicluster
Start installing openpitrix
Start installing network
**************************************************
Waiting for all tasks to be completed ...

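Judging by earlier runs, /kubesphere/results/monitoring/monitoring is the slowest, so focus on that one. Checking by hand, the other three have already succeeded and only monitoring is still installing.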
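Watching these files by hand gets old quickly. A minimal sketch of a helper that locates and tails a task's log, assuming the layout seen here (/kubesphere/results/<task>/<task>/stdout, because artifact_dir and ident are both the task name); both function names are made up.

```python
import os
from collections import deque


def task_stdout_path(task: str, results_dir: str = "/kubesphere/results") -> str:
    # Layout assumed from the post: <results_dir>/<artifact_dir>/<ident>/stdout,
    # where artifact_dir and ident are both the task name.
    return os.path.join(results_dir, task, task, "stdout")


def tail(path: str, n: int = 20) -> list:
    """Return the last n lines of a log file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Calling `tail(task_stdout_path("monitoring"))` in a loop inside the ks-installer pod gives roughly the same view as watching the file by hand.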

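I suddenly noticed that the installation even uses helm. It is nesting dolls all the way down, and the tech stack is extremely complex.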


PLAY RECAP *********************************************************************
localhost                  : ok=24   changed=22   unreachable=0    failed=0    skipped=24   rescued=0    ignored=0

task monitoring status is successful  (4/4)

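For a task as complex as monitoring, I don't know why the maintainers chose to hide the installation process completely and give no progress indication. Only after monitoring finishes does the kubesphere-config apply below happen, so before that point the kubesphere web UI is unreachable: this configmap contains the important jwt token used to access the kubesphere api, and the containers keep erroring while they wait for it to be initialized. That answers all my earlier questions.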


changed: [localhost] => (item=kubesphere-config.yaml) => {
    "ansible_loop_var": "item",
    "changed": true,
    "cmd": "/usr/local/bin/kubectl apply -f /kubesphere/kubesphere/kubesphere-config.yaml",
    "delta": "0:00:07.874273",
    "end": "2023-12-02 14:09:16.591793",
    "failed_when_result": false,
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/kubectl apply -f /kubesphere/kubesphere/kubesphere-config.yaml",
            "_uses_shell": true,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "item": "kubesphere-config.yaml",
    "rc": 0,
    "start": "2023-12-02 14:09:08.717520",
    "stderr": "",
    "stderr_lines": [],
    "stdout": "configmap/kubesphere-config created",
    "stdout_lines": [
        "configmap/kubesphere-config created"
    ]
}

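TODO Collect the JSON of every event under the results folder and draw a scatter plot to inspect the timing.

TODO Add a parameter to kubekey that prints detailed output. After reading the parameter, the change can follow the code referenced above.

TODO Monitoring is probably not a hard dependency for starting the KubeSphere UI. Sort out the dependencies and add a parameter that installs monitoring asynchronously so KubeSphere becomes usable earlier. This needs to be confirmed by reading the ansible scripts. According to the logs, the telemetry installation is completely independent of the KubeSphere installation and can serve as a reference.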

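20231003 Summary of DevOps-related features

0. Background

A write-up of some experience that does not need to be kept confidential.

20230610 Summary of enterprise web development features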

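1. Summary of DevOps-related features

Listening for code changes
Containerized build environment
Graceful shutdown
Health checks
Traffic routing
Multi-environment builds

Configuration center

Each independent step is usually called an action, and one complete run is called a pipeline. A pipeline is an orchestration of multiple actions.

The trigger of a pipeline can be configured: manual, scheduled, git events, and so on.

Actions can be combined freely in series or in parallel. An action node can be set to run automatically, run on a schedule, run manually, or require a manual second confirmation. The available node types differ slightly between platforms.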

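2. Listening for code changes

Code changes are usually detected with git hooks configured on the git server. They can listen for events including but not limited to push, merge requests, and tags, and call back a specified API.

This is normally combined with branch and tag conventions to manage the whole flow from development to release, for example a three-level feature branch -> dev branch -> master branch structure, or a finer multi-level one. Different kinds of branches trigger different pipelines: a feature branch may additionally trigger a code-check pipeline, the dev branch a unit-test pipeline, and the master branch or a tag the release pipeline.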
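
The branch-to-pipeline mapping described above can be sketched as a small dispatcher that a hook callback might call. This is only an illustration: the ref patterns and the pipeline names (build, lint, unit-test, release) are assumptions, not taken from any particular platform.

```shell
#!/usr/bin/env bash
# Map a pushed git ref to the pipelines it should trigger.
# Ref patterns and pipeline names are illustrative assumptions.
pipelines_for_ref() {
  case "$1" in
    refs/heads/feature/*) echo "build lint" ;;       # feature branch: extra code check
    refs/heads/dev)       echo "build unit-test" ;;  # dev branch: extra unit tests
    refs/heads/master)    echo "build release" ;;    # master: release pipeline
    refs/tags/*)          echo "build release" ;;    # tags: release pipeline
    *)                    echo "" ;;                 # anything else triggers nothing
  esac
}

pipelines_for_ref "refs/heads/feature/login"   # prints: build lint
```

Keeping the mapping in one function makes the branch convention explicit and easy to review when it changes.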

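3. Containerized build environment

Builds usually run on a dedicated build machine rather than on a developer machine. Unless there are special requirements, the build environment runs in Docker containers and is managed uniformly through images. The base is usually a Docker image that itself contains a Docker environment, customized with the dependencies each build needs and with a sharing policy for the dependency cache directories, which keeps the build environment stable and controllable. A private image registry (such as docker registry) and a private package repository (such as jfrog) are usually used alongside it to speed up builds and remove the influence of the network.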

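4. Graceful shutdown

For important services a rolling restart is not enough; graceful shutdown is also required to keep the service stable while it restarts. The usual approach is for the program to implement its own graceful-shutdown procedure and let it be triggered by a signal such as SIGTERM or SIGINT, or by a dedicated HTTP or RPC endpoint. SIGKILL cannot be caught by the process, so it is only useful for the forced kill. The release pipeline, controller, or upstream scheduler first moves network traffic away, then tells the node to run its graceful-shutdown procedure and exit by itself, and kills it forcibly if it has not exited after a timeout. For this reason graceful shutdown is normally used together with health checks and traffic routing.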
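
A minimal sketch of the process side of this, assuming a worker that handles one unit of work per loop iteration. The function name run_worker and the 0.1 second stand-in for work are made up for illustration; a real service would drain connections instead.

```shell
#!/usr/bin/env bash
# Worker loop that stops taking new work after SIGTERM/SIGINT and then exits by itself.
# SIGKILL cannot be trapped; a supervisor would send it only after its timeout expires.
draining=0

run_worker() {
  handled=0
  # Install the handler inside the function so it is also active when run in the background.
  trap 'draining=1' TERM INT
  while [ "$draining" -eq 0 ]; do
    handled=$((handled + 1))   # stand-in for handling one request
    sleep 0.1
  done
  echo "drained after $handled requests"
}
```

A supervisor would start it with `run_worker &`, send `kill -TERM` to the recorded PID once traffic has been moved away, and then `wait` for it to exit.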

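5. Health checks

Health checks are usually implemented by the application as well. They can be divided into reachability checks and availability checks. A reachability check only probes whether the network path is open and does not care whether the program itself is usable. An availability check must, on top of reachability, also satisfy the definition of "available".

What "available" means differs by scenario. For a web service, the simplest definition is that it can accept an HTTP request and respond successfully. A stricter one requires the database, cache, and queue services it depends on to be available at the same time. The strict definition, however, prevents a program that has fallback logic from degrading when an underlying service fails, and makes the whole service unavailable instead. It is therefore not adopted lightly; the program is left to decide whether to run its fallback logic or to report itself unavailable.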
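
The difference between the strict and the lenient definition can be sketched as follows. The three states (ok, degraded, unavailable) and the idea of passing probes as command names are assumptions made for this example, not part of any framework.

```shell
#!/usr/bin/env bash
# health_status SELF_PROBE [DEPENDENCY_PROBE...]
# Each probe is the name of a command or shell function that succeeds when healthy.
# Prints "unavailable" if the service itself fails, "degraded" if only a dependency
# fails (the program can still run its fallback logic), and "ok" otherwise.
health_status() {
  local self_probe=$1 dep
  shift
  "$self_probe" >/dev/null 2>&1 || { echo "unavailable"; return 1; }
  for dep in "$@"; do
    "$dep" >/dev/null 2>&1 || { echo "degraded"; return 0; }
  done
  echo "ok"
}

# Hypothetical probes for illustration; true/false stand in for real checks.
probe_http()  { true; }
probe_cache() { false; }
health_status probe_http probe_cache   # prints: degraded
```

Reporting "degraded" with a success exit code keeps the node in rotation, which is the lenient behaviour the paragraph above argues for.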

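6. Traffic routing

20230725 Summary of the access-layer network

Traffic routing decides where traffic flows. Traffic on the path summarized in the previous post is usually called north-south (vertical) traffic, and communication between applications is usually called east-west (horizontal) traffic.

For a given service cluster, precise control over traffic flow is usually achieved through network-layer routing, client-side or server-side service discovery, and load balancing. Notably, a cloud-native service mesh mainly handles east-west traffic.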

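7. Multi-environment builds

The same code has to run in multiple data centers, for multiple companies, and in multiple environments, so something must distinguish the runtime environment. There are generally two strategies. One is to ship a single artifact and use a configuration center, configuration files, or environment variables to change the environment the program perceives. The other is to generate, at build time, a separate artifact for each required combination of environments.

With a configuration center, configuration files, or environment variables, extra care is needed with the start-up parameters and the configuration-center settings. This is convenient to manage centrally but very easy to get wrong.

With one artifact per environment there is no risk of wrong environment parameters, but the confidentiality of those parameters, and therefore of the artifacts, has to be considered.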
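
For the single-artifact strategy, one way to reduce the risk of a wrong start-up parameter is to fail fast on anything unexpected. The sketch below assumes three environment names and a config/app-ENV.conf naming scheme, both invented for illustration.

```shell
#!/usr/bin/env bash
# Resolve the config file for the environment name passed at start-up.
# Environment names and the file naming scheme are illustrative assumptions.
config_file_for_env() {
  case "${1:-}" in
    dev|test|prod)
      echo "config/app-$1.conf"
      ;;
    *)
      # Refuse to start with an unknown or missing environment instead of guessing.
      echo "unknown environment: '${1:-}'" >&2
      return 1
      ;;
  esac
}

config_file_for_env prod   # prints: config/app-prod.conf
```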

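20230725 Summary of the access-layer network

0. Background

A write-up of some experience that does not need to be kept confidential.

20230610 Summary of enterprise web development features

1. Access-layer network

A typical web system consists of an access-layer network plus a business-layer network. The traffic path from the user's client to the entry point of the business-layer services is what people usually call the access-layer network.

At the network level, this path usually includes DNS and load balancers at layer 4 or layer 7.

On the physical links, it usually includes access networks of the three major carriers, access networks of smaller carriers, dedicated lines between data centers, and interconnected core switches.

Related techniques include virtual IPs and layer-4/layer-7 packet parsing.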

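2. Summary

A typical web service access-layer network is made up of a DNS system, layer-4 load balancing, and layer-7 load balancing.

At every layer the core idea is the same: make several identical things look like one from the outside. Once that is understood, most of the access-layer logic follows.

At the DNS layer, DNS can resolve to multiple IPs by region and by carrier, and supports primary/backup policies, BGP data-center policies, health checks, and disaster-recovery switchover.

At layer 4 of the network stack, multiple IP+port pairs can be exposed as a single IP+port. The multiple IPs are usually called L4RS, and the single IP is usually called the L4VIP.

This layer also includes load-balancing policies, health checks, and automatic disaster-recovery switchover.

At layer 7 of the network stack, multiple host+port pairs can be exposed as a single host+port. Where the host lives is usually called a virtual host, and the multiple host+port pairs are usually called L7RS, serviceIP, or RealIP.

This layer also includes load-balancing policies, health checks, and automatic disaster-recovery switchover.

There is also a special path where the L4VIP goes straight to the serviceIPs; special applications such as IM do not always use layer-7 load balancing.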
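
The "many behind one" idea can be sketched as a round-robin picker that skips unhealthy real servers. It is a toy model of what a load balancer does behind a VIP; the function name, the request counter, and the health-check callback are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# pick_backend COUNTER HEALTH_CHECK BACKEND...
# Chooses a real server for the COUNTER-th request, round-robin, and falls through
# to the next server when HEALTH_CHECK (a command taking the backend) fails.
pick_backend() {
  local counter=$1 is_healthy=$2 total skip first candidate
  shift 2
  total=$#
  [ "$total" -gt 0 ] || return 1
  # Rotate the list so the server whose turn it is comes first.
  skip=$((counter % total))
  while [ "$skip" -gt 0 ]; do
    first=$1
    shift
    set -- "$@" "$first"
    skip=$((skip - 1))
  done
  for candidate in "$@"; do
    if "$is_healthy" "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  return 1   # no healthy real server left
}

all_up() { true; }
pick_backend 1 all_up 10.0.0.1:80 10.0.0.2:80 10.0.0.3:80   # prints: 10.0.0.2:80
```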

A typical topology is shown below, with sensitive information blurred.

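20230701 Plan for fixing KubeSphere bugs

0. Background

Blog post https://blog.hylstudio.cn/archives/1258

Feishu doc https://y5eh3nr7kr.feishu.cn/docx/WofxdG39IomQZAxpg83c2LranGd

KubeSphere has a few small bugs I would like to fix, so I am setting up a local environment.

The latest kubekey can already set up a harbor with a custom domain as the image registry. With this registry, the KubeSphere and k8s versions in the current official docs can be installed offline or faster.

After the installation finished, I looked inside the pod responsible for it and found that /shell-operator is what runs there. I plan to start from this to understand how KubeSphere is installed and run, and then patch the existing KubeSphere.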


ubuntu@k8s1:~$ kubectl exec -it  -n kubesphere-system ks-installer-566ffb8f44-l769h -- bash
ks-installer-566ffb8f44-l769h:/kubesphere$ pwd
/kubesphere
ks-installer-566ffb8f44-l769h:/kubesphere$ ls
config      installer   kubesphere  playbooks   results
ks-installer-566ffb8f44-l769h:/kubesphere$ ps aux
PID   USER     TIME  COMMAND
    1 kubesphe  0:01 /shell-operator start
13488 kubesphe  0:00 bash
13495 kubesphe  0:00 ps aux

1. First, understand how the automatic installation works

Prerequisites: kubectl, shell-operator, python, ansible

The trigger can be found at https://github.com/kubesphere/kubekey/blob/ffaa19f430a6b13aa219c7ec699fc2ea705c3a93/cmd/kk/pkg/kubesphere/modules.go#L101

TODO Check in detail whether the yaml in the trigger includes a version parameter

From https://github.com/kubesphere/ks-installer/blob/20a6daa18adf10410a385b48ab2769e55d8bdee2/controller/installRunner.py#L338C41-L338C53 it is clear that ansible is what actually performs the work

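2. Modification plan

From the above, the control chain is kubekey->ks-installer->shell-operator->ansible->kubectl->KubeSphere service components

Fork the official code, modify it, and rebuild the image to get an independent docker image address + tag
Update the image tag of the KubeSphere service component. The options, ordered by distance from KubeSphere, are:
1. Modify the corresponding Pod declaration
2. Modify the ClusterConfiguration to trigger shell-operator indirectly
3. Modify the kubekey configuration file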

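3. Bug list

With multi-cluster management, the toolkit only shows the kubectl config of the master cluster. https://github.com/kubesphere/kubesphere/issues/5766
The toolkit tooltips are not matched against the current user's identity, so buttons the user has no permission for still show a tooltip
After filtering resources by a namespace, the filter is lost when navigating to pvc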

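20230623 Automatically creating VMs on esxi + installing ubuntu

0. Background

Experiments require creating VMs and installing the OS over and over, which is a bit tedious. I plan to build a low-budget automated VM management setup for my home cloud.

1. Evaluating options

esxi has APIs and SDKs for automation in various languages, and plenty of material can be found on the official github.

Automated ubuntu installation is also fairly mature. The common approach is to provide user-data through cloud-init; the official documentation is a good starting point.

2. Automatic VM creation

The govc command line is enough. The script is based on https://github.com/vmware/govmomi/blob/main/scripts/vcsa/create-esxi-vm.sh with the parts I don't need removed and a few parameters added for my own use. For the full documentation see https://github.com/vmware/govmomi/tree/main/govc#usage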


#!/bin/bash -e
# Create a VM and boot vm from cdrom/iso, modified from https://github.com/vmware/govmomi/blob/main/scripts/vcsa/create-esxi-vm.sh
# GOVC_* environment variables also apply, see https://github.com/vmware/govmomi/tree/main/govc#usage
# If GOVC_USERNAME is set, it is used to login ESXi
# If GOVC_PASSWORD is set, the account password will be set to this value
# 6.7 U3 https://docs.vmware.com/en/VMware-vSphere/6.7/rn/vsphere-esxi-vcenter-server-67-release-notes.html
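export GOVC_URL=1.2.3.4 # ESXi IP address
export GOVC_INSECURE=1
export GOVC_USERNAME=ESXI_USER # ESXi username
export GOVC_PASSWORD=ESXI_PASSWORD # ESXi password
export GOVC_DATASTORE=datastore1
export GOVC_NETWORK=ESXI_NET # default vSwitch network, mine is called test (the space before '#' is required, otherwise the comment becomes part of the value)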
#. ~/.govcrc
set -o pipefail
usage() {
  cat <<EOF
Usage: $0 [-c CORE] [-m MEM_GB] [-d DISK_GB] [-t DISK_THICK] [-i ISO] [-p POWER_STATE] VM_NAME
Example1: $0 vmname
Example2: $0 -c 2 -m 2 -d 40 -t true -i ubuntu-22.04.1-live-server-amd64.iso -p on vmname
EOF
}

core=2
mem=2
disk=40
thick=true
iso=ubuntu-22.04.1-live-server-amd64.iso
power=off
while getopts c:m:d:t:i:p:h flag
do
  echo "flag=$flag"
  echo "val=$OPTARG"
  case $flag in
    c)
      core=$OPTARG
      ;;
    m)
      mem=$OPTARG
      ;;
    d)
      disk=$OPTARG
      ;;
    t)
      thick=$OPTARG
      ;;
    i)
      iso=$OPTARG
      ;;
    p)
      power=$OPTARG
      ;;
    h)
      usage
      exit
      ;;
    *)
      usage 1>&2
      exit 1
      ;;
  esac
done

shift $((OPTIND-1))

if [ $# -ne 1 ] ; then
  usage
  exit 1
fi

# if [[ "$iso" == *"-Installer-"* ]] ; then
#   echo "Invalid iso name (need stateless, not installer): $iso" 1>&2
#   exit 1
# fi
network=${GOVC_NETWORK:-"VM Network"}
username=$GOVC_USERNAME
password=$GOVC_PASSWORD
guest=${GUEST:-"ubuntu64Guest"}
name=$1
echo -n "Checking govc version..."
govc version -require 0.15.0
#vm
echo "Creating vm ${name}..."
govc vm.create -version 6.7 -on=false -net "$network" -m $((mem*1024)) -c $core -g "$guest" -net.adapter=vmxnet3 -disk.controller pvscsi "$name"
#cdrom
echo "Adding cdrom device to ${name}..."
id=$(govc device.cdrom.add -vm "$name")
boot="sysimg/$iso"
# upload iso
# if ! govc datastore.ls "$boot" > /dev/null 2>&1 ; then
#   govc datastore.upload "$iso" "$boot"
# fi
echo "Inserting $boot into $name cdrom device..."
govc device.cdrom.insert -vm "$name" -device "$id" "$boot"
#disk
echo "Creating disk for use by $name..."
diskname=$name
govc vm.disk.create -vm "$name" -name "$name"/"$diskname" -size "${disk}G" -thick=$thick
#change boot seq
echo "Change bios boot seq for $name..."
govc device.boot -vm "$name" -order disk,ethernet,cdrom
#power on
echo "Powering on $name VM..."
govc vm.power -on "$name"
#get mac address
mac=""
while [ -z "$mac" ] || [ "$mac" = "null" ]
do
  mac=$(govc vm.info -json "$name" | jq -r ".VirtualMachines[0].Config.Hardware.Device[]|select(.Backing.DeviceName == \"$network\")|.MacAddress")
  sleep 1
done
echo "Get vm macaddress succ, $mac"
#wait for ip, or power off again
if [ x$power = x"on" ] ; then
    echo "Waiting for $name IP..."
    vm_ip=$(govc vm.ip "$name")
    ! govc events -n 100 "vm/$name" | grep -E 'warning|error'
    echo "$name IP get succ"
    echo $vm_ip
    echo "Create VM Done: $name, $vm_ip"
else
    govc vm.power -off "$name"
    echo "Create VM Done: $name"
    echo "You can power on it by:"
    echo govc vm.power -on "$name"
fi
exit 0

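3. Automated ubuntu installation

I tested the official cloud-image.ova several times and could not get govc to pass the -options parameter properly. The user-data specified in the json never took effect, and I don't know why. So I first tried a semi-automatic test by editing the boot parameters by hand.

3.1. Preparing user-data

First, following the official documentation, prepare a directory, put user-data and meta-data in it, and start an http server.

The content of user-data is shown below. A few points to note:

Do not delete the comment on the first line
On ubuntu 22.04.1, leaving out refresh-installer makes the lvm sizing-policy ineffective, and a 40G system disk only uses 20G by default.
If security is not added to disable_suites, local experiments keep going online to install security updates. I turned it off to save time; do not turn it off in production
If packages has any entries, the package sources are refreshed once regardless of the value of package_update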


#cloud-config
autoinstall:
  version: 1
  refresh-installer:
    update: yes
  identity:
    hostname: ubuntu1
    username: ubuntu
    password: "$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0"
  package_upgrade: false
  package_update: true
  packages:
    - net-tools
  apt:
    disable_suites: [security]
    primary:
      - arches: [default]
        uri: https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
  ssh:
    install-server: true
    allow-pw: false
    authorized-keys:
      - ssh-rsa YOUR_SSH_PUBLIC_KEY
  storage:
    layout:
      name: lvm
      sizing-policy: all
  runcmd:
    - echo test
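
Before serving the seed directory, a quick sanity check can catch the two mistakes that are easiest to make: a missing meta-data file and a user-data file whose first line is not the #cloud-config marker. The function name check_seed_dir is made up for this sketch.

```shell
#!/usr/bin/env bash
# Sanity-check a NoCloud seed directory before serving it over HTTP.
check_seed_dir() {
  local dir=$1
  [ -f "$dir/user-data" ] || { echo "missing $dir/user-data" >&2; return 1; }
  # meta-data may be empty, but the file has to exist.
  [ -f "$dir/meta-data" ] || { echo "missing $dir/meta-data" >&2; return 1; }
  # The first line must be the cloud-config marker.
  if [ "$(head -n 1 "$dir/user-data")" != "#cloud-config" ]; then
    echo "first line of $dir/user-data must be #cloud-config" >&2
    return 1
  fi
}
```

After the check passes, a throwaway server such as `python3 -m http.server 80 --directory DIR` is enough for local tests, assuming Python 3.7 or later is available.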

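3.2. Loading user-data remotely

On the graphical boot menu of 20.04.5, pressing F6 and then esc lets you edit the boot command. Changing it as follows loads user-data from a remote server.

For some reason, pasting a lot of text into vmware sometimes drops part of the content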


/casper/vmlinuz initrd=/casper/initrd autoinstall ds=nocloud-net;s=http://192.168.2.159/ubuntu1/

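22.04.1 no longer has that graphical menu. Press c to enter the grub command line and run the boot commands by hand as follows. Note that the semicolon after nocloud-net must be escaped with a backslash

Pasting into vmware works without any problem on this command line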


set gfxpayload=keep;linux /casper/vmlinuz autoinstall ds=nocloud-net\;s=http://192.168.2.159/ubuntu1/ ---;initrd /casper/initrd;boot
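
The escaping difference is easy to forget when the same seed URL is used from both boot loaders. A small helper, sketched here with a made-up function name, can build the datasource argument for either one: on the grub command line an unescaped semicolon ends the command, while the syslinux-style lines in this post take it literally.

```shell
#!/usr/bin/env bash
# nocloud_arg grub|syslinux SEED_URL
# Prints the ds= kernel argument, escaping ';' only where the boot loader needs it.
nocloud_arg() {
  case "$1" in
    grub)     printf 'ds=nocloud-net\\;s=%s\n' "$2" ;;
    syslinux) printf 'ds=nocloud-net;s=%s\n' "$2" ;;
    *)        echo "unknown boot loader: $1" >&2; return 1 ;;
  esac
}

nocloud_arg grub http://192.168.2.159/ubuntu1/
# prints: ds=nocloud-net\;s=http://192.168.2.159/ubuntu1/
```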

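3.3. Booting from the network with custom kernel parameters

1. Configure openwrt as the DHCP server and point it at the tftp service

https://openwrt.org/docs/guide-user/services/tftp.pxe-server

https://forum.openwrt.org/t/solved-dhcp-config-pxe-boot-from-external-tftp-server/5880

The configuration is shown in the figure below

2. tftp is served by op, the jumpbox, or the Synology NAS (tftp on the Synology failed in testing, probably because my Synology is not in the test subnet, so for now I use the tftp that comes with op)

The tftp content is prepared as follows, adapted from the openwrt documentation. I use legacy BIOS boot; an example is shown below.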


cd autoinstall
mkdir tftp
cd tftp
mkdir syslinux
cd syslinux
wget --no-check-certificate https://www.kernel.org/pub/linux/utils/boot/syslinux/syslinux-6.03.tar.gz
tar -zxvf syslinux-6.03.tar.gz
cd syslinux-6.03/bios
cp core/pxelinux.0 com32/elflink/ldlinux/ldlinux.c32 com32/menu/vesamenu.c32 com32/lib/libcom32.c32 com32/libutil/libutil.c32 ../../../../tftp

DEFAULT vesamenu.c32
PROMPT 0
TIMEOUT 30
MENU TITLE Hyl PXE-Boot Menu
# LABEL install
# 	MENU LABEL Ubuntu Live 22.04 64-Bit
# 	KERNEL casper/vmlinuz
# 	INITRD casper/initrd
# 	APPEND root=/dev/ram0  cloud-config-url=/dev/null ramdisk_size=1500000 ip=dhcp url=http://192.168.2.159/iso/ubuntu-22.04.1-live-server-amd64.iso
#     TEXT HELP
#             Starts the Ubuntu 22.04 64-Bit
#     ENDTEXT
LABEL autoinstallcd
	MENU LABEL Ubuntu Live 22.04 64-Bit CD ROM
	KERNEL /casper/vmlinuz
    APPEND cloud-config-url=/dev/null autoinstall ds=nocloud-net;s=http://192.168.2.159/ubuntu1/
	INITRD /casper/initrd
    TEXT HELP
            Starts the Ubuntu 22.04 64-Bit autoinstall from CD ROM
    ENDTEXT
LABEL autoinstall
	MENU LABEL Ubuntu Live 22.04 64-Bit autoinstall
    KERNEL casper/vmlinuz
    APPEND root=/dev/ram0 cloud-config-url=/dev/null ramdisk_size=1500000 ip=dhcp url=http://192.168.2.159/iso/ubuntu-22.04.1-live-server-amd64.iso autoinstall ds=nocloud-net;s=http://192.168.2.159/ubuntu1/
    INITRD casper/initrd
    TEXT HELP
            Starts the Ubuntu 22.04 64-Bit autoinstall from http
    ENDTEXT

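Note that, according to https://askubuntu.com/questions/1235723/automated-20-04-server-installation-using-pxe-and-live-server-image

using cloud-config-url=/dev/null lowers the memory requirement

3. The autoinstall user-data and the iso file are served by the jumpbox or omv

As the parameters above show, I temporarily started an http server with python and put the iso at http://192.168.2.159/iso/ubuntu-22.04.1-live-server-amd64.iso. Later I may switch to omv or the Synology. In theory the DHCP phase is already over at this point, so this should not be limited by the test subnet

Try mounting the iso on the cdrom while using pxe to change the kernel parameters, so the automated installation runs from the optical drive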

https://www.cnblogs.com/xyshun/p/9427472.html

https://www.dddns.icu/posts/pxe/

https://github.com/pypxe/PyPXE

https://github.com/netbootxyz/netboot.xyz

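4. Full automation

1. esxi automation is handled by govc, which returns the NIC's mac address after the VM is created

For VM creation, see the script above

Installing from cdrom requires changing the bios boot order to disk>net>cd-rom. disk comes first so the installation does not run again after it succeeds, and net comes before cd-rom so the VM does not boot straight into the CD menu, which cannot be automated.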


govc device.boot -vm "$name" -order disk,ethernet,cdrom

2. With the mac address, the tftp on openwrt can return boot options per mac address

The mac is obtained automatically as follows


mac=""
# poll until the VM reports a mac address on the chosen network
while [ -z "$mac" ] || [ "$mac" = "null" ]
do
  mac=$(govc vm.info -json "$name" | jq -r ".VirtualMachines[0].Config.Hardware.Device[]|select(.Backing.DeviceName == \"$network\")|.MacAddress")
  sleep 1
done
echo "Get vm macaddress succ, $mac"

Write a script that generates the boot menu from the mac and hostname and uploads it to tftp


# PXELINUX looks for 01-<mac> in lowercase with dash separators
pxecfgname=01-$(echo "$mac" | tr 'A-F:' 'a-f-')
cat <<-EOF | tee /tmp/$pxecfgname
DEFAULT vesamenu.c32
PROMPT 0
TIMEOUT 30
MENU TITLE Hyl PXE-Boot Menu
LABEL autoinstallcd
	MENU LABEL Ubuntu Live 22.04 64-Bit CD auto
	KERNEL /casper/vmlinuz
    APPEND cloud-config-url=/dev/null autoinstall ds=nocloud-net;s=http://192.168.0.6:5003/$name/
	INITRD /casper/initrd
    TEXT HELP
            Starts the Ubuntu 22.04 64-Bit autoinstall from CD ROM
    ENDTEXT
EOF
echo "Generated menu on pxelinux.cfg/$pxecfgname:"
# cat /tmp/$pxecfgname
scp /tmp/$pxecfgname root@192.168.2.1:/root/tftp/pxelinux.cfg/
rm /tmp/$pxecfgname

Note that the file name should start with 01-. For details see https://wiki.syslinux.org/wiki/index.php?title=PXELINUX

The IP address in hexadecimal can also be used. The general rule is

After attempting the file as specified in the DHCP or hardcoded options, PXELINUX will probe the following paths, prefixed with "pxelinux.cfg/", under the initial Working Directory.

1. The client UUID, if provided by the PXE stack. Note that some BIOSes do not have a valid UUID, and it might end up reporting something like all 1's. This value is represented in the standard UUID format using lowercase hexadecimal digits, e.g. "b8945908-d6a6-41a9-611d-74a6ab80b83d".
2. The hardware type (using its ARP "htype" code) and address, all in lowercase hexadecimal with dash separators. For example, for an Ethernet (i.e. ARP hardware type "1") with address "88:99:AA:BB:CC:DD", it would search for the filename "01-88-99-aa-bb-cc-dd".
3. The client's own IPv4 address in uppercase hexadecimal, followed by removing hex characters, one at a time, from the end. For example, "192.168.2.91" → "C0A8025B". The included program, "gethostip", can be used to compute the hexadecimal IP address for any host.
4. Lowercase "default".
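
The naming rules quoted above can be reproduced with a few lines of shell, which helps when checking which file a given client will actually request. The function names are made up for this sketch; the UUID step is left out.

```shell
#!/usr/bin/env bash
# 01-<mac>: ARP hardware type 1 (Ethernet), lowercase hex, dash separators.
pxe_name_for_mac() {
  printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# IPv4 address as eight uppercase hex digits.
pxe_name_for_ip() {
  local IFS=.
  set -- $1   # split the dotted address into four octets
  printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# IP-based names in probing order: drop one hex digit at a time, then "default".
pxe_ip_fallbacks() {
  local name
  name=$(pxe_name_for_ip "$1")
  while [ -n "$name" ]; do
    echo "$name"
    name=${name%?}
  done
  echo "default"
}

pxe_name_for_mac "88:99:AA:BB:CC:DD"   # prints: 01-88-99-aa-bb-cc-dd
pxe_name_for_ip "192.168.2.91"         # prints: C0A8025B
```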

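3. The iso in the boot options is served by Synology Web Station for now

A static site, nothing more to say

No longer needed for now because ramdisk is a problem. Installing after loading the iso into a ramdisk is very slow, probably because I gave the VM too little memory.

4. The autoinstall configuration files in the boot options are also served by Web Station

Synology's built-in Web Station supports PHP, so I plan to try URL rewriting with PHP

so that /prefix/{hostname}/user-data returns the template I specify, with hostname replaced dynamically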
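
The substitution itself is simple enough to try offline first. The sketch below is a shell stand-in for the same idea, not the PHP used in this post: it replaces the literal ${hostname} placeholder in a template file and rejects anything that does not look like a hostname. The function name and the allowed character set are assumptions.

```shell
#!/usr/bin/env bash
# render_user_data TEMPLATE_FILE HOSTNAME
# Prints the template with every literal ${hostname} replaced by HOSTNAME.
render_user_data() {
  case "$2" in
    ''|*[!A-Za-z0-9.-]*)
      # Only letters, digits, dots and dashes are accepted, so the value is safe in sed.
      echo "invalid hostname: $2" >&2
      return 1
      ;;
  esac
  sed "s/[\$]{hostname}/$2/g" "$1"
}
```

For example, `render_user_data template/user-data ubuntu1 > ubuntu1/user-data` would pre-render a static file for one host.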

Find Web Station's main nginx config file, /var/tmp/nginx/app.d/server.webstation-vhost.conf. It contains an include


server {
............
    include /usr/local/etc/nginx/conf.d/blablablabla/user.conf*;

}

Create a new file at that path, /usr/local/etc/nginx/conf.d/blablablabla/user.conf. In my case it is

/usr/local/etc/nginx/conf.d/a9d1c5c8-082a-482c-8fe1-73afc670ff6c/user.conf

Thanks to eric for the reference


location ~ ^/(.+)/user-data$ {
        try_files $uri $uri/ /index.php?hostname=$1;
}
location ~ ^/(.+)/meta-data$ {
        return 200 "";
}

Write index.php as follows


<?php
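// hostname captured by the nginx rule; keep only hostname-safe characters
$hostname = preg_replace('/[^A-Za-z0-9.-]/', '', $_GET['hostname'] ?? '');
// read the whole template at once instead of appending to an uninitialized $buffer
$buffer = file_get_contents("template/user-data");
// replace the literal ${hostname} placeholder in the template
$buffer = str_replace("\${hostname}", $hostname, $buffer);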
// echo $hostname;
header('Content-Type: application/octet-stream');
echo $buffer;