Installing Rancher 2.6.5 (single-node Docker)

Prepare the image registry

The registry must use HTTPS, and the SSL certificate must contain a SAN (Subject Alternative Names) entry; otherwise you get the error x509: certificate relies on legacy Common Name field. This is related to Go > 1.15, which deprecated the CN (CommonName) field.

Create a new, valid certificate that includes the subjectAltName attribute; when creating the self-signed SSL certificate with openssl, add it directly via the -addext flag:

openssl req -x509 -sha256 -nodes -days 36500 -newkey rsa:2048 -keyout harbor.key -out harbor.crt -subj "/CN=harbor.xxx.cn" -addext "subjectAltName = DNS:harbor.xxx.cn"
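
To confirm the SAN was actually embedded, the certificate can be inspected (a quick check against the harbor.crt generated above):

openssl x509 -noout -text -in harbor.crt | grep -A1 "Subject Alternative Name"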

Images required

rancher/rancher:v2.6.5
rancher/shell:v0.1.16
rancher/rancher-webhook:v0.2.5
rancher/fleet:v0.3.9
rancher/gitjob:v0.1.26
rancher/fleet-agent:v0.3.9
rancher/rke-tools:v0.1.80
rancher/hyperkube:v1.23.6-rancher1
rancher/mirrored-coreos-etcd:v3.5.3
rancher/mirrored-pause:3.6
rancher/mirrored-calico-cni:v3.22.0
rancher/mirrored-calico-pod2daemon-flexvol:v3.22.0
rancher/kube-api-auth:v0.1.8
rancher/mirrored-calico-node:v3.22.0
rancher/mirrored-flannelcni-flannel:v0.17.0
rancher/mirrored-cluster-proportional-autoscaler:1.8.5
rancher/mirrored-metrics-server:v0.6.1
rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
rancher/mirrored-coredns-coredns:1.9.0
rancher/mirrored-calico-kube-controllers:v3.22.0
rancher/nginx-ingress-controller:nginx-1.2.0-rancher1
rancher/rancher-agent:v2.6.5
# List the system images for a given version
rke config --system-images --all
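
To push the listed images into the private registry, a loop along these lines can be used (a sketch; harbor.xxx.cn:4443 is the example registry used later in this document, and images.txt is assumed to contain the list above, one image per line):

while read -r img; do
  docker pull "$img"
  docker tag "$img" "harbor.xxx.cn:4443/$img"
  docker push "harbor.xxx.cn:4443/$img"
done < images.txt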

Docker command

Put the generated certificates (*.key, *.crt) under /root/harbor/cert, then map that directory to /container/certs inside the container.

docker run -d --name rancher2.6.5 --restart=unless-stopped -e CATTLE_SYSTEM_CATALOG=bundled -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs -p 3280:80 -p 3443:443 --privileged  --add-host harbor.xx.cn:10.xx.xx.205 rancher/rancher:v2.6.5

Base command

docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  --privileged \
  rancher/rancher:latest

Parameter details

Load the system charts. They are already bundled in the Rancher image by default; this variable tells Rancher to use the local copies instead of trying to fetch them from GitHub.

-e CATTLE_SYSTEM_CATALOG=bundled

Custom CA Root Certificates: see the Docker configuration in that reference. This is where the self-signed certificates of services Rancher needs to reach (e.g. Harbor) are configured; otherwise you get the error x509: certificate signed by unknown authority.

-e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs

Configure the private registry

Following the Private Registry Configuration documentation, go into the container and configure it:

docker exec -it rancher2.6.5 bash

vim /etc/rancher/k3s/registries.yaml

registries.yaml

mirrors:
  "docker.io":
    endpoint:
      - "https://harbor.xxx.cn:4443"
configs:
  "docker.io":
    auth:
      username: admin
      password: Harbor12345
    tls:
      key_file: /container/certs/harbor.key
      cert_file: /container/certs/harbor.crt
      #ca_file: /container/certs/ca.crt
      insecure_skip_verify: true

Restart the container

docker restart rancher2.6.5

Re-enter the container and add a hosts entry, otherwise the registry domain cannot be resolved (this is separate from the CoreDNS configuration):

echo "xxx.xxx.xxx.205 harbor.xxx.cn" >> /etc/hosts

After the restart, wait for containerd to start, then check the configuration containerd has regenerated:

cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml

Test pulling an image

crictl pull rancher/shell:v0.1.16

Configure system-default-registry

(Set in the Rancher UI; screenshot omitted.)

High-availability Rancher

Install k3s

The k3s-images.txt file on https://github.com/k3s-io/k3s/releases/tag/v1.23.6%2Bk3s1 lists the images k3s needs.
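
If the nodes have no internet access, the images can be pre-loaded instead of pulled (a sketch, assuming the release's k3s-airgap-images-amd64.tar asset has been copied to every node; Docker is the container runtime here because --docker is used below):

docker load -i k3s-airgap-images-amd64.tar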

Create the cluster with v1.23.6+k3s1

K3S_TOKEN='ccd0f7ee6cf2f50f8563e434767b6488' INSTALL_K3S_SKIP_DOWNLOAD=true INSTALL_K3S_EXEC='server  --tls-san [IP] --node-external-ip [IP] --docker --cluster-init' INSTALL_K3S_VERSION='v1.23.6+k3s1' ./install.sh
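
For HA, additional server nodes can later join the first one using the same token and version (a sketch; [FIRST_SERVER_IP] is a placeholder for the address of the node installed above):

K3S_TOKEN='ccd0f7ee6cf2f50f8563e434767b6488' INSTALL_K3S_SKIP_DOWNLOAD=true INSTALL_K3S_EXEC='server --server https://[FIRST_SERVER_IP]:6443 --docker' INSTALL_K3S_VERSION='v1.23.6+k3s1' ./install.sh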

install.sh

#!/bin/sh
set -e
set -o noglob

# Usage:
#   curl ... | ENV_VAR=... sh -
#       or
#   ENV_VAR=... ./install.sh
#
# Example:
#   Installing a server without traefik:
#     curl ... | INSTALL_K3S_EXEC="--disable=traefik" sh -
#   Installing an agent to point at a server:
#     curl ... | K3S_TOKEN=xxx K3S_URL=https://server-url:6443 sh -
#
# Environment variables:
#   - K3S_*
#     Environment variables which begin with K3S_ will be preserved for the
#     systemd service to use. Setting K3S_URL without explicitly setting
#     a systemd exec command will default the command to "agent", and we
#     enforce that K3S_TOKEN or K3S_CLUSTER_SECRET is also set.
#
#   - INSTALL_K3S_SKIP_DOWNLOAD
#     If set to true will not download k3s hash or binary.
#
#   - INSTALL_K3S_FORCE_RESTART
#     If set to true will always restart the K3s service
#
#   - INSTALL_K3S_SYMLINK
#     If set to 'skip' will not create symlinks, 'force' will overwrite,
#     default will symlink if command does not exist in path.
#
#   - INSTALL_K3S_SKIP_ENABLE
#     If set to true will not enable or start k3s service.
#
#   - INSTALL_K3S_SKIP_START
#     If set to true will not start k3s service.
#
#   - INSTALL_K3S_VERSION
#     Version of k3s to download from github. Will attempt to download from the
#     stable channel if not specified.
#
#   - INSTALL_K3S_COMMIT
#     Commit of k3s to download from temporary cloud storage.
#     * (for developer & QA use)
#
#   - INSTALL_K3S_BIN_DIR
#     Directory to install k3s binary, links, and uninstall script to, or use
#     /usr/local/bin as the default
#
#   - INSTALL_K3S_BIN_DIR_READ_ONLY
#     If set to true will not write files to INSTALL_K3S_BIN_DIR, forces
#     setting INSTALL_K3S_SKIP_DOWNLOAD=true
#
#   - INSTALL_K3S_SYSTEMD_DIR
#     Directory to install systemd service and environment files to, or use
#     /etc/systemd/system as the default
#
#   - INSTALL_K3S_EXEC or script arguments
#     Command with flags to use for launching k3s in the systemd service, if
#     the command is not specified will default to "agent" if K3S_URL is set
#     or "server" if not. The final systemd command resolves to a combination
#     of EXEC and script args ($@).
#
#     The following commands result in the same behavior:
#       curl ... | INSTALL_K3S_EXEC="--disable=traefik" sh -s -
#       curl ... | INSTALL_K3S_EXEC="server --disable=traefik" sh -s -
#       curl ... | INSTALL_K3S_EXEC="server" sh -s - --disable=traefik
#       curl ... | sh -s - server --disable=traefik
#       curl ... | sh -s - --disable=traefik
#
#   - INSTALL_K3S_NAME
#     Name of systemd service to create, will default from the k3s exec command
#     if not specified. If specified the name will be prefixed with 'k3s-'.
#
#   - INSTALL_K3S_TYPE
#     Type of systemd service to create, will default from the k3s exec command
#     if not specified.
#
#   - INSTALL_K3S_SELINUX_WARN
#     If set to true will continue if k3s-selinux policy is not found.
#
#   - INSTALL_K3S_SKIP_SELINUX_RPM
#     If set to true will skip automatic installation of the k3s RPM.
#
#   - INSTALL_K3S_CHANNEL_URL
#     Channel URL for fetching k3s download URL.
#     Defaults to 'https://update.k3s.io/v1-release/channels'.
#
#   - INSTALL_K3S_CHANNEL
#     Channel to use for fetching k3s download URL.
#     Defaults to 'stable'.

GITHUB_URL=https://github.com/k3s-io/k3s/releases
STORAGE_URL=https://storage.googleapis.com/k3s-ci-builds
DOWNLOADER=
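# NOTE: this copy of install.sh is modified from upstream: the k3s binary is
# fetched from an internal MinIO mirror via MINIO_BASE_URL (which must be set in
# the environment beforehand), and the release-version/hash/SELinux steps further
# down are commented out accordingly.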
BIN_URL="${MINIO_BASE_URL}/agent/k3s/${INSTALL_K3S_VERSION}/k3s"

# --- helper functions for logs ---
info()
{
    echo '[INFO] ' "$@"
}
warn()
{
    echo '[WARN] ' "$@" >&2
}
fatal()
{
    echo '[ERROR] ' "$@" >&2
    exit 1
}

# --- fatal if no systemd or openrc ---
verify_system() {
    if [ -x /sbin/openrc-run ]; then
        HAS_OPENRC=true
        return
    fi
    if [ -x /bin/systemctl ] || type systemctl > /dev/null 2>&1; then
        HAS_SYSTEMD=true
        return
    fi
    fatal 'Can not find systemd or openrc to use as a process supervisor for k3s'
}

# --- add quotes to command arguments ---
quote() {
    for arg in "$@"; do
        printf '%s\n' "$arg" | sed "s/'/'\\''/g;1s/^/'/;\$s/\$/'/"
    done
}

# --- add indentation and trailing slash to quoted args ---
quote_indent() {
    printf ' \\\n'
    for arg in "$@"; do
        printf '\t%s \\\n' "$(quote "$arg")"
    done
}

# --- escape most punctuation characters, except quotes, forward slash, and space ---
escape() {
    printf '%s' "$@" | sed -e 's/\([][!#$%&()*;<=>?\_`{|}]\)/\/g;'
}

# --- escape double quotes ---
escape_dq() {
    printf '%s' "$@" | sed -e 's/"/\"/g'
}

# --- ensures $K3S_URL is empty or begins with https://, exiting fatally otherwise ---
verify_k3s_url() {
    case "${K3S_URL}" in
        "")
            ;;
        https://*)
            ;;
        *)
            fatal "Only https:// URLs are supported for K3S_URL (have ${K3S_URL})"
            ;;
    esac
}

# --- define needed environment variables ---
setup_env() {
    # --- use command args if passed or create default ---
    case "$1" in
        # --- if we only have flags discover if command should be server or agent ---
        (-*|"")
            if [ -z "${K3S_URL}" ]; then
                CMD_K3S=server
            else
                if [ -z "${K3S_TOKEN}" ] && [ -z "${K3S_TOKEN_FILE}" ] && [ -z "${K3S_CLUSTER_SECRET}" ]; then
                    fatal "Defaulted k3s exec command to 'agent' because K3S_URL is defined, but K3S_TOKEN, K3S_TOKEN_FILE or K3S_CLUSTER_SECRET is not defined."
                fi
                CMD_K3S=agent
            fi
        ;;
        # --- command is provided ---
        (*)
            CMD_K3S=$1
            shift
        ;;
    esac

    verify_k3s_url

    CMD_K3S_EXEC="${CMD_K3S}$(quote_indent "$@")"

    # --- use systemd name if defined or create default ---
    if [ -n "${INSTALL_K3S_NAME}" ]; then
        SYSTEM_NAME=k3s-${INSTALL_K3S_NAME}
    else
        if [ "${CMD_K3S}" = server ]; then
            SYSTEM_NAME=k3s
        else
            SYSTEM_NAME=k3s-${CMD_K3S}
        fi
    fi

    # --- check for invalid characters in system name ---
    valid_chars=$(printf '%s' "${SYSTEM_NAME}" | sed -e 's/[][!#$%&()*;<=>?\_`{|}/[:space:]]/^/g;' )
    if [ "${SYSTEM_NAME}" != "${valid_chars}"  ]; then
        invalid_chars=$(printf '%s' "${valid_chars}" | sed -e 's/[^^]/ /g')
        fatal "Invalid characters for system name:
            ${SYSTEM_NAME}
            ${invalid_chars}"
    fi

    # --- use sudo if we are not already root ---
    SUDO=sudo
    if [ $(id -u) -eq 0 ]; then
        SUDO=
    fi

    # --- use systemd type if defined or create default ---
    if [ -n "${INSTALL_K3S_TYPE}" ]; then
        SYSTEMD_TYPE=${INSTALL_K3S_TYPE}
    else
        if [ "${CMD_K3S}" = server ]; then
            SYSTEMD_TYPE=notify
        else
            SYSTEMD_TYPE=exec
        fi
    fi

    # --- use binary install directory if defined or create default ---
    if [ -n "${INSTALL_K3S_BIN_DIR}" ]; then
        BIN_DIR=${INSTALL_K3S_BIN_DIR}
    else
        # --- use /usr/local/bin if root can write to it, otherwise use /opt/bin if it exists
        BIN_DIR=/usr/local/bin
        if ! $SUDO sh -c "touch ${BIN_DIR}/k3s-ro-test && rm -rf ${BIN_DIR}/k3s-ro-test"; then
            if [ -d /opt/bin ]; then
                BIN_DIR=/opt/bin
            fi
        fi
    fi

    # --- use systemd directory if defined or create default ---
    if [ -n "${INSTALL_K3S_SYSTEMD_DIR}" ]; then
        SYSTEMD_DIR="${INSTALL_K3S_SYSTEMD_DIR}"
    else
        SYSTEMD_DIR=/etc/systemd/system
    fi

    # --- set related files from system name ---
    SERVICE_K3S=${SYSTEM_NAME}.service
    UNINSTALL_K3S_SH=${UNINSTALL_K3S_SH:-${BIN_DIR}/${SYSTEM_NAME}-uninstall.sh}
    KILLALL_K3S_SH=${KILLALL_K3S_SH:-${BIN_DIR}/k3s-killall.sh}

    # --- use service or environment location depending on systemd/openrc ---
    if [ "${HAS_SYSTEMD}" = true ]; then
        FILE_K3S_SERVICE=${SYSTEMD_DIR}/${SERVICE_K3S}
        FILE_K3S_ENV=${SYSTEMD_DIR}/${SERVICE_K3S}.env
    elif [ "${HAS_OPENRC}" = true ]; then
        $SUDO mkdir -p /etc/rancher/k3s
        FILE_K3S_SERVICE=/etc/init.d/${SYSTEM_NAME}
        FILE_K3S_ENV=/etc/rancher/k3s/${SYSTEM_NAME}.env
    fi

    # --- get hash of config & exec for currently installed k3s ---
    PRE_INSTALL_HASHES=$(get_installed_hashes)

    # --- if bin directory is read only skip download ---
    if [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ]; then
        INSTALL_K3S_SKIP_DOWNLOAD=true
    fi

    # --- setup channel values
    INSTALL_K3S_CHANNEL_URL=${INSTALL_K3S_CHANNEL_URL:-'https://update.k3s.io/v1-release/channels'}
    INSTALL_K3S_CHANNEL=${INSTALL_K3S_CHANNEL:-'stable'}
}

# --- check if skip download environment variable set ---
can_skip_download() {
    if [ "${INSTALL_K3S_SKIP_DOWNLOAD}" != true ]; then
        return 1
    fi
}

# --- verify an executable k3s binary is installed ---
verify_k3s_is_executable() {
    if [ ! -x ${BIN_DIR}/k3s ]; then
        fatal "Executable k3s binary not found at ${BIN_DIR}/k3s"
    fi
}

# --- set arch and suffix, fatal if architecture not supported ---
setup_verify_arch() {
    if [ -z "$ARCH" ]; then
        ARCH=$(uname -m)
    fi
    case $ARCH in
        amd64)
            ARCH=amd64
            SUFFIX=
            ;;
        x86_64)
            ARCH=amd64
            SUFFIX=
            ;;
        arm64)
            ARCH=arm64
            SUFFIX=-${ARCH}
            ;;
        s390x)
            ARCH=s390x
            SUFFIX=-${ARCH}
            ;;
        aarch64)
            ARCH=arm64
            SUFFIX=-${ARCH}
            ;;
        arm*)
            ARCH=arm
            SUFFIX=-${ARCH}hf
            ;;
        *)
            fatal "Unsupported architecture $ARCH"
    esac
}

# --- verify existence of network downloader executable ---
verify_downloader() {
    # Return failure if it doesn't exist or is no executable
    [ -x "$(command -v $1)" ] || return 1

    # Set verified executable as our downloader program and return success
    DOWNLOADER=$1
    return 0
}

# --- create temporary directory and cleanup when done ---
setup_tmp() {
    TMP_DIR=$(mktemp -d -t k3s-install.XXXXXXXXXX)
    TMP_HASH=${TMP_DIR}/k3s.hash
    TMP_BIN=${TMP_DIR}/k3s.bin
    cleanup() {
        code=$?
        set +e
        trap - EXIT
        rm -rf ${TMP_DIR}
        exit $code
    }
    trap cleanup INT EXIT
}

# --- use desired k3s version if defined or find version from channel ---
get_release_version() {
    if [ -n "${INSTALL_K3S_COMMIT}" ]; then
        VERSION_K3S="commit ${INSTALL_K3S_COMMIT}"
    elif [ -n "${INSTALL_K3S_VERSION}" ]; then
        VERSION_K3S=${INSTALL_K3S_VERSION}
    else
        info "Finding release for channel ${INSTALL_K3S_CHANNEL}"
        version_url="${INSTALL_K3S_CHANNEL_URL}/${INSTALL_K3S_CHANNEL}"
        case $DOWNLOADER in
            curl)
                VERSION_K3S=$(curl -w '%{url_effective}' -L -s -S ${version_url} -o /dev/null | sed -e 's|.*/||')
                ;;
            wget)
                VERSION_K3S=$(wget -SqO /dev/null ${version_url} 2>&1 | grep -i Location | sed -e 's|.*/||')
                ;;
            *)
                fatal "Incorrect downloader executable '$DOWNLOADER'"
                ;;
        esac
    fi
    info "Using ${VERSION_K3S} as release"
}

# --- download from github url ---
download() {
    [ $# -eq 2 ] || fatal 'download needs exactly 2 arguments'

    case $DOWNLOADER in
        curl)
            curl -o $1 -sfL $2
            ;;
        wget)
            wget -qO $1 $2
            ;;
        *)
            fatal "Incorrect executable '$DOWNLOADER'"
            ;;
    esac

    # Abort if download command failed
    [ $? -eq 0 ] || fatal 'Download failed'
}

# --- download hash from github url ---
download_hash() {
    if [ -n "${INSTALL_K3S_COMMIT}" ]; then
        HASH_URL=${STORAGE_URL}/k3s${SUFFIX}-${INSTALL_K3S_COMMIT}.sha256sum
    else
        HASH_URL=${GITHUB_URL}/download/${VERSION_K3S}/sha256sum-${ARCH}.txt
    fi
    info "Downloading hash ${HASH_URL}"
    download ${TMP_HASH} ${HASH_URL}
    HASH_EXPECTED=$(grep " k3s${SUFFIX}$" ${TMP_HASH})
    HASH_EXPECTED=${HASH_EXPECTED%%[[:blank:]]*}
}

# --- check hash against installed version ---
installed_hash_matches() {
    if [ -x ${BIN_DIR}/k3s ]; then
        HASH_INSTALLED=$(sha256sum ${BIN_DIR}/k3s)
        HASH_INSTALLED=${HASH_INSTALLED%%[[:blank:]]*}
        if [ "${HASH_EXPECTED}" = "${HASH_INSTALLED}" ]; then
            return
        fi
    fi
    return 1
}

# --- download binary from github url ---
download_binary() {
#    if [ -n "${INSTALL_K3S_COMMIT}" ]; then
#        BIN_URL=${STORAGE_URL}/k3s${SUFFIX}-${INSTALL_K3S_COMMIT}
#    else
#        BIN_URL=${GITHUB_URL}/download/${VERSION_K3S}/k3s${SUFFIX}
#    fi
    info "Downloading binary ${BIN_URL}"
    download ${TMP_BIN} ${BIN_URL}
}

# --- verify downloaded binary hash ---
verify_binary() {
    info "Verifying binary download"
    HASH_BIN=$(sha256sum ${TMP_BIN})
    HASH_BIN=${HASH_BIN%%[[:blank:]]*}
    if [ "${HASH_EXPECTED}" != "${HASH_BIN}" ]; then
        fatal "Download sha256 does not match ${HASH_EXPECTED}, got ${HASH_BIN}"
    fi
}

# --- setup permissions and move binary to system directory ---
setup_binary() {
    chmod 755 ${TMP_BIN}
    info "Installing k3s to ${BIN_DIR}/k3s"
    $SUDO chown root:root ${TMP_BIN}
    $SUDO mv -f ${TMP_BIN} ${BIN_DIR}/k3s
}

# --- setup selinux policy ---
setup_selinux() {
    case ${INSTALL_K3S_CHANNEL} in
        *testing)
            rpm_channel=testing
            ;;
        *latest)
            rpm_channel=latest
            ;;
        *)
            rpm_channel=stable
            ;;
    esac

    rpm_site="rpm.rancher.io"
    if [ "${rpm_channel}" = "testing" ]; then
        rpm_site="rpm-testing.rancher.io"
    fi

    [ -r /etc/os-release ] && . /etc/os-release
    if [ "${ID_LIKE%%[ ]*}" = "suse" ]; then
        rpm_target=sle
        rpm_site_infix=microos
        package_installer=zypper
    elif [ "${VERSION_ID%%.*}" = "7" ]; then
        rpm_target=el7
        rpm_site_infix=centos/7
        package_installer=yum
    else
        rpm_target=el8
        rpm_site_infix=centos/8
        package_installer=yum
    fi

    if [ "${package_installer}" = "yum" ] && [ -x /usr/bin/dnf ]; then
        package_installer=dnf
    fi

    policy_hint="please install:
    ${package_installer} install -y container-selinux
    ${package_installer} install -y https://${rpm_site}/k3s/${rpm_channel}/common/${rpm_site_infix}/noarch/k3s-selinux-0.4-1.${rpm_target}.noarch.rpm
"

    if [ "$INSTALL_K3S_SKIP_SELINUX_RPM" = true ] || can_skip_download || [ ! -d /usr/share/selinux ]; then
        info "Skipping installation of SELinux RPM"
    elif  [ "${ID_LIKE:-}" != coreos ] && [ "${VARIANT_ID:-}" != coreos ]; then
        install_selinux_rpm ${rpm_site} ${rpm_channel} ${rpm_target} ${rpm_site_infix}
    fi

    policy_error=fatal
    if [ "$INSTALL_K3S_SELINUX_WARN" = true ] || [ "${ID_LIKE:-}" = coreos ] || [ "${VARIANT_ID:-}" = coreos ]; then
        policy_error=warn
    fi

    if ! $SUDO chcon -u system_u -r object_r -t container_runtime_exec_t ${BIN_DIR}/k3s >/dev/null 2>&1; then
        if $SUDO grep '^\s*SELINUX=enforcing' /etc/selinux/config >/dev/null 2>&1; then
            $policy_error "Failed to apply container_runtime_exec_t to ${BIN_DIR}/k3s, ${policy_hint}"
        fi
    elif [ ! -f /usr/share/selinux/packages/k3s.pp ]; then
        if [ -x /usr/sbin/transactional-update ]; then
            warn "Please reboot your machine to activate the changes and avoid data loss."
        else
            $policy_error "Failed to find the k3s-selinux policy, ${policy_hint}"
        fi
    fi
}

install_selinux_rpm() {
    if [ -r /etc/redhat-release ] || [ -r /etc/centos-release ] || [ -r /etc/oracle-release ] || [ "${ID_LIKE%%[ ]*}" = "suse" ]; then
        repodir=/etc/yum.repos.d
        if [ -d /etc/zypp/repos.d ]; then
            repodir=/etc/zypp/repos.d
        fi
        set +o noglob
        $SUDO rm -f ${repodir}/rancher-k3s-common*.repo
        set -o noglob
        if [ -r /etc/redhat-release ] && [ "${3}" = "el7" ]; then
            $SUDO yum install -y yum-utils
            $SUDO yum-config-manager --enable rhel-7-server-extras-rpms
        fi
        $SUDO tee ${repodir}/rancher-k3s-common.repo >/dev/null << EOF
[rancher-k3s-common-${2}]
name=Rancher K3s Common (${2})
baseurl=https://${1}/k3s/${2}/common/${4}/noarch
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://${1}/public.key
EOF
        case ${3} in
        sle)
            rpm_installer="zypper --gpg-auto-import-keys"
            if [ "${TRANSACTIONAL_UPDATE=false}" != "true" ] && [ -x /usr/sbin/transactional-update ]; then
                rpm_installer="transactional-update --no-selfupdate -d run ${rpm_installer}"
                : "${INSTALL_K3S_SKIP_START:=true}"
            fi
            ;;
        *)
            rpm_installer="yum"
            ;;
        esac
        if [ "${rpm_installer}" = "yum" ] && [ -x /usr/bin/dnf ]; then
            rpm_installer=dnf
        fi
        # shellcheck disable=SC2086
        $SUDO ${rpm_installer} install -y "k3s-selinux"
    fi
    return
}

# --- download and verify k3s ---
download_and_verify() {
    if can_skip_download; then
       info 'Skipping k3s download and verify'
       verify_k3s_is_executable
       return
    fi

    setup_verify_arch
    verify_downloader curl || verify_downloader wget || fatal 'Can not find curl or wget for downloading files'
    setup_tmp
#    get_release_version
#    download_hash

    if installed_hash_matches; then
        info 'Skipping binary downloaded, installed k3s matches hash'
        return
    fi

    download_binary
#    verify_binary
    setup_binary
}

# --- add additional utility links ---
create_symlinks() {
    [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ] && return
    [ "${INSTALL_K3S_SYMLINK}" = skip ] && return

    for cmd in kubectl crictl ctr; do
        if [ ! -e ${BIN_DIR}/${cmd} ] || [ "${INSTALL_K3S_SYMLINK}" = force ]; then
            which_cmd=$(command -v ${cmd} 2>/dev/null || true)
            if [ -z "${which_cmd}" ] || [ "${INSTALL_K3S_SYMLINK}" = force ]; then
                info "Creating ${BIN_DIR}/${cmd} symlink to k3s"
                $SUDO ln -sf k3s ${BIN_DIR}/${cmd}
            else
                info "Skipping ${BIN_DIR}/${cmd} symlink to k3s, command exists in PATH at ${which_cmd}"
            fi
        else
            info "Skipping ${BIN_DIR}/${cmd} symlink to k3s, already exists"
        fi
    done
}

# --- create killall script ---
create_killall() {
    [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ] && return
    info "Creating killall script ${KILLALL_K3S_SH}"
    $SUDO tee ${KILLALL_K3S_SH} >/dev/null << \EOF
#!/bin/sh
[ $(id -u) -eq 0 ] || exec sudo $0 $@

for bin in /var/lib/rancher/k3s/data/**/bin/; do
    [ -d $bin ] && export PATH=$PATH:$bin:$bin/aux
done

set -x

for service in /etc/systemd/system/k3s*.service; do
    [ -s $service ] && systemctl stop $(basename $service)
done

for service in /etc/init.d/k3s*; do
    [ -x $service ] && $service stop
done

pschildren() {
    ps -e -o ppid= -o pid= | \
    sed -e 's/^\s*//g; s/\s\s*/\t/g;' | \
    grep -w "^$1" | \
    cut -f2
}

pstree() {
    for pid in $@; do
        echo $pid
        for child in $(pschildren $pid); do
            pstree $child
        done
    done
}

killtree() {
    kill -9 $(
        { set +x; } 2>/dev/null;
        pstree $@;
        set -x;
    ) 2>/dev/null
}

getshims() {
    ps -e -o pid= -o args= | sed -e 's/^ *//; s/\s\s*/\t/;' | grep -w 'k3s/data/[^/]*/bin/containerd-shim' | cut -f1
}

killtree $({ set +x; } 2>/dev/null; getshims; set -x)

do_unmount_and_remove() {
    set +x
    while read -r _ path _; do
        case "$path" in $1*) echo "$path" ;; esac
    done < /proc/self/mounts | sort -r | xargs -r -t -n 1 sh -c 'umount "$0" && rm -rf "$0"'
    set -x
}

do_unmount_and_remove '/run/k3s'
do_unmount_and_remove '/var/lib/rancher/k3s'
do_unmount_and_remove '/var/lib/kubelet/pods'
do_unmount_and_remove '/var/lib/kubelet/plugins'
do_unmount_and_remove '/run/netns/cni-'

# Remove CNI namespaces
ip netns show 2>/dev/null | grep cni- | xargs -r -t -n 1 ip netns delete

# Delete network interface(s) that match 'master cni0'
ip link show 2>/dev/null | grep 'master cni0' | while read ignore iface ignore; do
    iface=${iface%%@*}
    [ -z "$iface" ] || ip link delete $iface
done
ip link delete cni0
ip link delete flannel.1
ip link delete flannel-v6.1
ip link delete kube-ipvs0
rm -rf /var/lib/cni/
iptables-save | grep -v KUBE- | grep -v CNI- | grep -v flannel | iptables-restore
ip6tables-save | grep -v KUBE- | grep -v CNI- | grep -v flannel | ip6tables-restore
EOF
    $SUDO chmod 755 ${KILLALL_K3S_SH}
    $SUDO chown root:root ${KILLALL_K3S_SH}
}

# --- create uninstall script ---
create_uninstall() {
    [ "${INSTALL_K3S_BIN_DIR_READ_ONLY}" = true ] && return
    info "Creating uninstall script ${UNINSTALL_K3S_SH}"
    $SUDO tee ${UNINSTALL_K3S_SH} >/dev/null << EOF
#!/bin/sh
set -x
[ \$(id -u) -eq 0 ] || exec sudo \$0 \$@

${KILLALL_K3S_SH}

if command -v systemctl; then
    systemctl disable ${SYSTEM_NAME}
    systemctl reset-failed ${SYSTEM_NAME}
    systemctl daemon-reload
fi
if command -v rc-update; then
    rc-update delete ${SYSTEM_NAME} default
fi

rm -f ${FILE_K3S_SERVICE}
rm -f ${FILE_K3S_ENV}

remove_uninstall() {
    rm -f ${UNINSTALL_K3S_SH}
}
trap remove_uninstall EXIT

if (ls ${SYSTEMD_DIR}/k3s*.service || ls /etc/init.d/k3s*) >/dev/null 2>&1; then
    set +x; echo 'Additional k3s services installed, skipping uninstall of k3s'; set -x
    exit
fi

for cmd in kubectl crictl ctr; do
    if [ -L ${BIN_DIR}/\$cmd ]; then
        rm -f ${BIN_DIR}/\$cmd
    fi
done

rm -rf /etc/rancher/k3s
rm -rf /run/k3s
rm -rf /run/flannel
rm -rf /var/lib/rancher/k3s
rm -rf /var/lib/kubelet
rm -f ${BIN_DIR}/k3s
rm -f ${KILLALL_K3S_SH}

if type yum >/dev/null 2>&1; then
    yum remove -y k3s-selinux
    rm -f /etc/yum.repos.d/rancher-k3s-common*.repo
elif type zypper >/dev/null 2>&1; then
    uninstall_cmd="zypper remove -y k3s-selinux"
    if [ "\${TRANSACTIONAL_UPDATE=false}" != "true" ] && [ -x /usr/sbin/transactional-update ]; then
        uninstall_cmd="transactional-update --no-selfupdate -d run \$uninstall_cmd"
    fi
    \$uninstall_cmd
    rm -f /etc/zypp/repos.d/rancher-k3s-common*.repo
fi
EOF
    $SUDO chmod 755 ${UNINSTALL_K3S_SH}
    $SUDO chown root:root ${UNINSTALL_K3S_SH}
}

# --- disable current service if loaded --
systemd_disable() {
    $SUDO systemctl disable ${SYSTEM_NAME} >/dev/null 2>&1 || true
    $SUDO rm -f /etc/systemd/system/${SERVICE_K3S} || true
    $SUDO rm -f /etc/systemd/system/${SERVICE_K3S}.env || true
}

# --- capture current env and create file containing k3s_ variables ---
create_env_file() {
    info "env: Creating environment file ${FILE_K3S_ENV}"
    $SUDO touch ${FILE_K3S_ENV}
    $SUDO chmod 0600 ${FILE_K3S_ENV}
    sh -c export | while read x v; do echo $v; done | grep -E '^(K3S|CONTAINERD)_' | $SUDO tee ${FILE_K3S_ENV} >/dev/null
    sh -c export | while read x v; do echo $v; done | grep -Ei '^(NO|HTTP|HTTPS)_PROXY' | $SUDO tee -a ${FILE_K3S_ENV} >/dev/null
}

# --- write systemd service file ---
create_systemd_service_file() {
    info "systemd: Creating service file ${FILE_K3S_SERVICE}"
    $SUDO tee ${FILE_K3S_SERVICE} >/dev/null << EOF
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=${SYSTEMD_TYPE}
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-${FILE_K3S_ENV}
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=${BIN_DIR}/k3s \
    ${CMD_K3S_EXEC}

EOF
}

# --- write openrc service file ---
create_openrc_service_file() {
    LOG_FILE=/var/log/${SYSTEM_NAME}.log

    info "openrc: Creating service file ${FILE_K3S_SERVICE}"
    $SUDO tee ${FILE_K3S_SERVICE} >/dev/null << EOF
#!/sbin/openrc-run

depend() {
    after network-online
    want cgroups
}

start_pre() {
    rm -f /tmp/k3s.*
}

supervisor=supervise-daemon
name=${SYSTEM_NAME}
command="${BIN_DIR}/k3s"
command_args="$(escape_dq "${CMD_K3S_EXEC}")
    >>${LOG_FILE} 2>&1"

output_log=${LOG_FILE}
error_log=${LOG_FILE}

pidfile="/var/run/${SYSTEM_NAME}.pid"
respawn_delay=5
respawn_max=0

set -o allexport
if [ -f /etc/environment ]; then source /etc/environment; fi
if [ -f ${FILE_K3S_ENV} ]; then source ${FILE_K3S_ENV}; fi
set +o allexport
EOF
    $SUDO chmod 0755 ${FILE_K3S_SERVICE}

    $SUDO tee /etc/logrotate.d/${SYSTEM_NAME} >/dev/null << EOF
${LOG_FILE} {
        missingok
        notifempty
        copytruncate
}
EOF
}

# --- write systemd or openrc service file ---
create_service_file() {
    [ "${HAS_SYSTEMD}" = true ] && create_systemd_service_file
    [ "${HAS_OPENRC}" = true ] && create_openrc_service_file
    return 0
}

# --- get hashes of the current k3s bin and service files
get_installed_hashes() {
    $SUDO sha256sum ${BIN_DIR}/k3s ${FILE_K3S_SERVICE} ${FILE_K3S_ENV} 2>&1 || true
}

# --- enable and start systemd service ---
systemd_enable() {
    info "systemd: Enabling ${SYSTEM_NAME} unit"
    $SUDO systemctl enable ${FILE_K3S_SERVICE} >/dev/null
    $SUDO systemctl daemon-reload >/dev/null
}

systemd_start() {
    info "systemd: Starting ${SYSTEM_NAME}"
    $SUDO systemctl restart ${SYSTEM_NAME}
}

# --- enable and start openrc service ---
openrc_enable() {
    info "openrc: Enabling ${SYSTEM_NAME} service for default runlevel"
    $SUDO rc-update add ${SYSTEM_NAME} default >/dev/null
}

openrc_start() {
    info "openrc: Starting ${SYSTEM_NAME}"
    $SUDO ${FILE_K3S_SERVICE} restart
}

# --- startup systemd or openrc service ---
service_enable_and_start() {
    if [ -f "/proc/cgroups" ] && [ "$(grep memory /proc/cgroups | while read -r n n n enabled; do echo $enabled; done)" -eq 0 ];
    then
        info 'Failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)'
    fi

    [ "${INSTALL_K3S_SKIP_ENABLE}" = true ] && return

    [ "${HAS_SYSTEMD}" = true ] && systemd_enable
    [ "${HAS_OPENRC}" = true ] && openrc_enable

    [ "${INSTALL_K3S_SKIP_START}" = true ] && return

    POST_INSTALL_HASHES=$(get_installed_hashes)
    if [ "${PRE_INSTALL_HASHES}" = "${POST_INSTALL_HASHES}" ] && [ "${INSTALL_K3S_FORCE_RESTART}" != true ]; then
        info 'No change detected so skipping service start'
        return
    fi

    [ "${HAS_SYSTEMD}" = true ] && systemd_start
    [ "${HAS_OPENRC}" = true ] && openrc_start
    return 0
}

# --- re-evaluate args to include env command ---
eval set -- $(escape "${INSTALL_K3S_EXEC}") $(quote "$@")

# --- run the install process --
{
    verify_system
    setup_env "$@"
    download_and_verify
#    setup_selinux
    create_symlinks
    create_killall
    create_uninstall
    systemd_disable
    create_env_file
    create_service_file
    service_enable_and_start
}

Create a kubeconfig, otherwise helm will fail:

mkdir -p ~/.kube
cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
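
A quick check that the kubeconfig works:

kubectl get nodes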

Configure a persistent CoreDNS entry for Harbor

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom  # the name must be exactly this
  namespace: kube-system
data:
    xxx.server: | # the key must end with .server
          example1.org example3.org  {
              hosts {
                   127.0.0.1 example1.org example3.org
                   fallthrough
              }
          }
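
Apply it (assuming the manifest above is saved as coredns-custom.yaml):

kubectl apply -f coredns-custom.yaml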

Restart CoreDNS so it picks up the configuration:

kubectl rollout restart deployment  coredns -n kube-system

Install helm

https://helm.sh/docs/intro/install/

Install cert-manager

cert-manager images: docker load them onto every node, otherwise the cert-manager helm chart has to be modified.

quay.io/jetstack/cert-manager-cainjector:v1.7.1
quay.io/jetstack/cert-manager-controller:v1.7.1
quay.io/jetstack/cert-manager-ctl:v1.7.1
quay.io/jetstack/cert-manager-webhook:v1.7.1
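
One way to distribute them to every node (a sketch; node1 is a placeholder hostname and SSH access is assumed):

docker save \
  quay.io/jetstack/cert-manager-cainjector:v1.7.1 \
  quay.io/jetstack/cert-manager-controller:v1.7.1 \
  quay.io/jetstack/cert-manager-ctl:v1.7.1 \
  quay.io/jetstack/cert-manager-webhook:v1.7.1 \
  -o cert-manager-v1.7.1.tar
scp cert-manager-v1.7.1.tar node1:/tmp/
ssh node1 docker load -i /tmp/cert-manager-v1.7.1.tar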

Download (see the reference):

# If you have installed the CRDs manually instead of with the `--set installCRDs=true` option added to your Helm install command, you should upgrade your CRD resources before upgrading the Helm chart:

# Download the CRDs
https://github.com/jetstack/cert-manager/releases/download/v1.7.1/cert-manager.crds.yaml

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

helm fetch jetstack/cert-manager --version 1.7.1

Install

kubectl apply -f cert-manager.crds.yaml

helm install cert-manager cert-manager-v1.7.1.tgz \
  --namespace cert-manager \
  --create-namespace \
  --version v1.7.1

Verify

kubectl get pods --namespace cert-manager

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c6866597-zw7kh               1/1     Running   0          2m
cert-manager-cainjector-577f6d9fd7-tr77l   1/1     Running   0          2m
cert-manager-webhook-787858fcdb-nlzsq      1/1     Running   0          2m

Install Rancher

Download

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable

helm repo update

helm fetch rancher-stable/rancher --version 2.6.5

Install; the options are explained below:

kubectl create  namespace cattle-system

helm install rancher rancher-2.6.5.tgz \
  --namespace cattle-system \
  --set hostname=205.xxx.cn \
  --set bootstrapPassword=fafafa \
  --set replicas=1 \
  --set useBundledSystemChart=true \
  --set additionalTrustedCAs=true

  • hostname: the Rancher domain name

  • bootstrapPassword: the login password

  • replicas: the number of Rancher replicas

  • useBundledSystemChart: whether to use the system charts packaged with the Rancher server

  • additionalTrustedCAs: trust an additional third-party (Harbor) certificate, used together with:

    kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem
    

    ca-additional.pem is Harbor's self-signed certificate; it must be renamed to ca-additional.pem.
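
    For example, using the harbor.crt generated earlier (a minimal sketch), before running the command above:

    cp harbor.crt ca-additional.pem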

Installing with a self-signed SAN certificate

The certificate must be signed by a CA, otherwise registering the agent fails with:

Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://ums.xxx.cn:31393\": x509: certificate signed by unknown authority

Prepare openssl.conf; for certificate generation see https://www.golinuxcloud.com/openssl-subject-alternative-name/

[req]
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no

[req_distinguished_name]
CN = ums.xxx.cn

[req_ext]
subjectAltName = @alt_names

[alt_names]
IP.1 = 192.168.25.99
DNS.1 = ums.xxx.cn
DNS.2 = *.xxx.cn

When using a self-signed certificate, cert-manager is not needed; make sure hostname, CN, and subjectAltName are consistent.

# Generate the CA
openssl req -newkey rsa:2048 -nodes -keyout ca.key -x509 -days 36500 -out ca.crt -subj "/C=xx/ST=x/L=x/O=x/OU=x/CN=ca/emailAddress=x/"

# Generate the server key/CSR and sign it with the CA
openssl genrsa -out tls.key 2048
openssl req -new -key tls.key -out tls.csr -config openssl.conf
openssl x509 -req -days 36500 -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out tls.crt -extensions req_ext -extfile openssl.conf

# Verify the SAN
openssl x509 -text -noout -in tls.crt | grep -A 1 "Subject Alternative Name"
kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=tls.crt \
  --key=tls.key
  
# If you get Error from server (AlreadyExists): secrets "tls-rancher-ingress" already exists, delete the secret first and recreate it
kubectl -n cattle-system delete secret tls-rancher-ingress


mv ca.crt cacerts.pem
kubectl -n cattle-system create secret generic tls-ca \
  --from-file=cacerts.pem=./cacerts.pem

Self-signed certificate install (Private CA signed certificate): add --set privateCA=true to the command:

helm install rancher rancher-2.6.5.tgz \
  --namespace cattle-system \
  --set hostname=205.xxx.cn \
  --set bootstrapPassword=fafafa \
  --set replicas=1 \
  --set useBundledSystemChart=true \
  --set additionalTrustedCAs=true \
  --set ingress.tls.source=secret \
  --set privateCA=true

Verify

kubectl -n cattle-system get pod

NAME                              READY   STATUS    RESTARTS   AGE
rancher-6f7df66cf7-2fnw5          1/1     Running   0          3d23h
rancher-webhook-6994b4677-tpvf8   1/1     Running   0          3d23h

Backup and restore: Rancher Backups (2.1.2)

Reference

Images

rancher/backup-restore-operator:v2.1.2
rancher/kubectl:v1.21.9

Install

Install it as needed; backups can be stored on a PV or in S3.

  • On single-node Docker installs you may hit an error caused by the Kubernetes version being too new for the rancher-backup version:

    no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
    

    In that case, docker cp a compatible version of the chart into the container and install it with helm_v3, e.g. along the lines sketched below.
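
    A sketch of that workaround (the chart file names and <version> are placeholders; rancher2.6.5 is the container name used in this document, and helm_v3 is the helm binary the document says is used inside the Rancher container):

    docker cp rancher-backup-crd-<version>.tgz rancher2.6.5:/tmp/
    docker cp rancher-backup-<version>.tgz rancher2.6.5:/tmp/
    docker exec -it rancher2.6.5 bash
    helm_v3 install rancher-backup-crd /tmp/rancher-backup-crd-<version>.tgz -n cattle-resources-system --create-namespace
    helm_v3 install rancher-backup /tmp/rancher-backup-<version>.tgz -n cattle-resources-system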

Verify

kubectl -n cattle-resources-system get pod

NAME                              READY   STATUS    RESTARTS   AGE
rancher-backup-7f9ff4c6cb-68jd5   1/1     Running   0          5d

After refreshing, additional backup and restore options appear in the UI (screenshot omitted).

MinIO configuration: HTTPS is required

  • MinIO itself must use HTTPS

    In the volume directory mounted at /root/.minio inside the container, create a certs directory and rename the key and certificate to private.key and public.crt. Use a full-chain SAN certificate that includes the Docker container IP (visible via docker logs) and the server IP.

    docker run -d -p 19000:9000 -p 15000:5000 --name minio -e "MINIO_ROOT_USER=admin" -e "MINIO_ROOT_PASSWORD=12345678"  -v /data/minio/data:/data   -v /data/minio/config:/root/.minio minio/minio server --console-address ":5000" /data
    

    If it starts successfully:

    docker restart minio
      
    docker logs -f minio
      
    WARNING: Detected Linux kernel version older than 4.0.0 release, there are some known potential performance problems with this kernel version. MinIO recommends a minimum of 4.x.x linux kernel version for best performance
    API: https://172.17.0.3:9000  https://127.0.0.1:9000
      
    Console: https://172.17.0.3:5000 https://127.0.0.1:5000
      
    Documentation: https://docs.min.io
    Finished loading IAM sub-system (took 0.0s of 0.0s to load data).
    
  • Credentials

    Create the secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: creds
    type: Opaque
    data:
      accessKey: <Enter your base64-encoded access key>
      secretKey: <Enter your base64-encoded secret key>
    
  • Filling in the configuration

    The endpoint CA field must contain the base64-encoded certificate content, not the plain-text certificate.
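
    A quick way to produce that value (assuming MinIO's public.crt is at hand):

    base64 -w0 public.crt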

Migrating Rancher

Assume cluster A is being migrated to cluster B. A must be backed up first. Keep the Kubernetes versions of A and B as close as possible, because apiVersions differ between versions.

Back up first

Whichever way Rancher was installed (Docker or on a cluster), the backup must be taken with the rancher-backup operator rather than the Docker backup procedure; otherwise restoring the backup file with the operator panics (see "Rancher backup panics when it encounters an invalid tarball"):

goroutine 403 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1691bc0, 0xc0005a10c0)
        /go/pkg/mod/k8s.io/apimachinery@v0.18.0/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.18.0/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1691bc0, 0xc0005a10c0)
        /usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/rancher/backup-restore-operator/pkg/controllers/restore.getGVR(0xc0000aa000, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/rancher/backup-restore-operator/pkg/controllers/restore/controller.go:677 +0x2d4
github.com/rancher/backup-restore-operator/pkg/controllers/restore.(*handler).loadDataFromFile(0xc0000892c0, 0xc0003a3180, 0xc000248a00, 0x13d, 0x200, 0xc0003b8630, 0xc000291380, 0x0, 0xc0004ac120)
        /go/src/github.com/rancher/backup-restore-operator/pkg/controllers/restore/download.go:109 +0x17d
github.com/rancher/backup-restore-operator/pkg/controllers/restore.(*handler).LoadFromTarGzip(0xc0000892c0, 0xc0004ac120, 0x2f, 0xc0003b8630, 0xc000291380, 0x0, 0x0)

In all cases, follow the "Backup and restore: Rancher Backups (2.1.2)" section above.

Problems you may hit with MinIO

Compare the time zones of MinIO and the cluster with date; otherwise downloading the backup data from MinIO fails with: The difference between the request time and the server's time is too large., requeuing

# Use timedatectl to compare the time zone and RTC
root@d-ecs-38357230:~/af# timedatectl
                      Local time: Tue 2022-06-07 17:17:30 CST
                  Universal time: Tue 2022-06-07 09:17:30 UTC
                        RTC time: Tue 2022-06-07 16:58:33
                       Time zone: Asia/Shanghai (CST, +0800)
       System clock synchronized: no
systemd-timesyncd.service active: yes
                 RTC in local TZ: yes


# Commands you may need to adjust the time zone and RTC
timedatectl set-timezone Asia/Shanghai
timedatectl set-local-rtc 1

# Set the time; watch -n 1 date can be used to watch the MinIO server's time in real time
date -s "2022-06-07 17:15:40"

Migration procedure

Download the rancher-backup charts

helm repo add rancher-charts https://charts.rancher.io
helm repo update

helm fetch rancher-charts/rancher-backup-crd --version 2.1.2
helm fetch rancher-charts/rancher-backup --version 2.1.2

Install on B. Before installing, the nodes can be cleaned with the referenced cleanup script.

helm install rancher-backup-crd rancher-backup-crd-2.1.2.tgz -n cattle-resources-system --create-namespace

# docker pull the images matching this version, then point the chart at them via values.yaml
helm install rancher-backup rancher-backup-2.1.2.tgz -n cattle-resources-system  -f values.yaml

values.yaml

image:
  repository: harbor.xxx.cn:4443/rancher/backup-restore-operator
  tag: v2.1.2
global:
  kubectl:
    repository: harbor.xxx.cn:4443/rancher/kubectl
    tag: v1.21.9

Verify

kubectl get pod -n cattle-resources-system
NAME                             READY   STATUS    RESTARTS   AGE
rancher-backup-94944dc7b-b87z9   1/1     Running   0          171m

On B, create the same MinIO secret as on A

apiVersion: v1
data:
  accessKey: YWRtaW4=
  secretKey: MTIzNDU2Nzg=
kind: Secret
metadata: 
  name: s3minio
  namespace: default
type: Opaque
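
The base64 values above can be generated like this (they correspond to the example credentials admin / 12345678):

echo -n admin | base64      # YWRtaW4=
echo -n 12345678 | base64   # MTIzNDU2Nzg=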

Create the Restore custom resource. In the Restore resource, prune must be set to false, and endpointCA must contain the base64-encoded certificate content.

apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-migration
spec:
  backupFilename: minio-59ae5e34-b0b5-484a-9c51-d4df16766257-2022-06-06T07-19-59Z.tar.gz
  prune: false
  storageLocation:
    s3:
      bucketName: rancher-backup
      credentialSecretName: s3minio
      credentialSecretNamespace: default
      endpoint: xxx.xxx.xxx.205:9000
      endpointCA: xxxx
      insecureTLSSkipVerify: true

Verify

vim restore-apply.yaml
kubectl apply -f restore-apply.yaml
restore.resources.cattle.io/restore-migration created

# Check the restore status
kubectl get restore
NAME                BACKUP-SOURCE   BACKUP-FILE                                                              AGE   STATUS
restore-migration                   minio-59ae5e34-b0b5-484a-9c51-d4df16766257-2022-06-06T07-19-59Z.tar.gz   8s


# Check the logs
kubectl get pods -n cattle-resources-system
NAME                             READY   STATUS    RESTARTS   AGE
rancher-backup-94944dc7b-b87z9   1/1     Running   0          3h5m

kubectl logs -n cattle-resources-system --tail 100 -f rancher-backup-94944dc7b-b87z9



# Once the restore has completed, Rancher can be installed
kubectl get restore
NAME                BACKUP-SOURCE   BACKUP-FILE                                                              AGE     STATUS
restore-migration   S3              minio-59ae5e34-b0b5-484a-9c51-d4df16766257-2022-06-06T07-19-59Z.tar.gz   4m52s   Completed

Install HA Rancher 2.6.5 as described above; the hostname in the Rancher install command must stay the same as the one used on A.

Configuration changes after the migration

Check that server-url matches reality and is identical to the hostname used during the HA install.

When migrating from a Docker install to an HA cluster install, the agent connection typically changes from an IP address to a domain name, so the downstream cluster agents also need to be reconfigured.

  • Obtain the kubeconfig (see reference)

    Imported clusters need no special handling; for clusters created by Rancher there are two cases:

    • If a kubeconfig was previously downloaded from the UI, you can switch contexts:

      kubectl config get-contexts
      kubectl config use-context [context-name]
      

      Example:

      CURRENT   NAME                        CLUSTER                     AUTHINFO     NAMESPACE
      *         my-cluster                  my-cluster                  user-46tmn
                my-cluster-controlplane-1   my-cluster-controlplane-1   user-46tmn
      

      In this example, when you use the first context, my-cluster, kubectl authenticates through the Rancher server.

      With the second context, my-cluster-controlplane-1, you authenticate via the authorized cluster endpoint and talk to the downstream RKE cluster directly.

    • If the connection has already been cut, the kubeconfig can no longer be downloaded; instead, generate one from a downstream node that has the controlplane role.

      See https://gist.github.com/superseb/b14ed3b5535f621ad3d2aa6a4cd6443b

      docker run --rm --net=host -v $(docker inspect kubelet --format '')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
          
      

      This requires installing jq, which is cumbersome. The approach can be split up instead:

      docker exec -it kube-apiserver bash
      export KUBECONFIG=/etc/kubernetes/ssl/kubecfg-kube-node.yaml
          
      kubectl get configmap -n kube-system full-cluster-state -o json > full-cluster-state.json
          
      

      After obtaining the JSON, run:

       cat full-cluster-state.json|jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_" > kubeconfig_admin.yaml
          
      
  • Modify CoreDNS

    kubectl edit cm coredns -n kube-system
    

    Modify the hosts block:

    data:
      Corefile: |
        .:53 {
            errors
            health {
              lameduck 5s
            }
            hosts {
              xxxx 201.xxxx.cn
              fallthrough
            }
            ready
            kubernetes cluster.local in-addr.arpa ip6.arpa {
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
            }
            prometheus :9153
            forward . "/etc/resolv.conf"
            cache 30
            loop
            reload
            loadbalance
        } # STUBDOMAINS - Rancher specific change
    kind: ConfigMap
    

    Delete the corresponding CoreDNS pods so the hosts entries take effect:

    kubectl rollout restart deployment  coredns -n kube-system
    
  • Modify the agent configuration

    kubectl edit deploy cattle-cluster-agent -n cattle-system
    

    Change CATTLE_SERVER to the new domain name, and note the name of the mounted secret:

          containers:
          - env:
           ...
            - name: CATTLE_IS_RKE
              value: "true"
            - name: CATTLE_SERVER
              value: https://xxx.xxx.cn  # change this to the new domain
            .....
            image: rancher/rancher-agent:v2.6.5
              
            .....
          volumes:
          - name: cattle-credentials
            secret:
              defaultMode: 320
              secretName: cattle-credentials-cdcb52a  # note this secret name
    

    Then run:

    kubectl edit secret -n cattle-system cattle-credentials-cdcb52a
    

    Modify url, filling in the base64-encoded value of CATTLE_SERVER:

    apiVersion: v1
    data:
      namespace: xxxx
      token: xxx
      url: aHR0cHM6Ly8yMDEudWlpbi5jbg==  # change this to the base64 of the new URL
    kind: Secret
    .....
    type: Opaque
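
    The base64 value can be produced with (using the placeholder domain from this document):

    echo -n "https://xxx.xxx.cn" | base64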
    

    Finally, restart the cattle-cluster-agent deployment:

    kubectl rollout restart deployment  cattle-cluster-agent -n cattle-system
    

Upgrading Rancher from 2.5.5 to 2.6.5

Single-node (Docker)

Reference

The upgrade mainly relies on --volumes-from to share data between containers.

# Stop the old container
docker stop <OLD_RANCHER_CONTAINER_NAME>

# Back up
docker create --volumes-from <OLD_RANCHER_CONTAINER_NAME> --name rancher-data rancher/rancher:<OLD_RANCHER_CONTAINER_TAG>

# Create a tarball of the Rancher data in the current directory
docker run --volumes-from rancher-data -v $PWD:/backup busybox tar zcvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz /var/lib/rancher

# Start the new Rancher server
docker run -d --privileged --volumes-from rancher-data \
  --restart=unless-stopped  --name rancher2.6.5 \
  -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  -p 1080:80 -p 1443:443 \
    rancher/rancher:v2.6.5



docker run -d --privileged --volumes-from rancher-data \
  --restart=unless-stopped  --name rancher2.6.5 \
  -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  --add-host harbor.xxx.cn:10.xx.xx.205 \
  -p 1080:80 -p 1443:443 \
    rancher/rancher:v2.6.5


docker run -d --privileged --volumes-from rancher-data \
  --restart=unless-stopped  --name rancher2.6.5 \
  -e SSL_CERT_DIR="/container/certs" -v /root/harbor/cert:/container/certs \
  -e CATTLE_SYSTEM_CATALOG=bundled \
  --add-host harbor.xxx.cn:10.81.25.149    --add-host gitlab.xxx.cn:10.41.31.1 \
  -p 1080:80 -p 1443:443 \
    rancher/rancher:v2.6.5

# If using custom certificates
docker run -d --privileged --restart=unless-stopped \
    -p 80:80 -p 443:443 \
    -v /<CERT_DIRECTORY>/<FULL_CHAIN.pem>:/etc/rancher/ssl/cert.pem \
    -v /<CERT_DIRECTORY>/<PRIVATE_KEY.pem>:/etc/rancher/ssl/key.pem \
    -v /<CERT_DIRECTORY>/<CA_CERTS.pem>:/etc/rancher/ssl/cacerts.pem \
    -e CATTLE_SYSTEM_DEFAULT_REGISTRY=<REGISTRY.YOURDOMAIN.COM:PORT> \ # Set a default private registry to be used in Rancher
    -e CATTLE_SYSTEM_CATALOG=bundled \ #Available as of v2.3.0,use the packaged Rancher system charts
    <REGISTRY.YOURDOMAIN.COM:PORT>/rancher/rancher:<RANCHER_VERSION_TAG>
    
    
# Remove the old container
docker rm -f <OLD_RANCHER_CONTAINER_NAME>

Rollback

# Extract the backup data back into rancher-data

docker run  --volumes-from rancher-data \
-v $PWD:/backup busybox sh -c "rm /var/lib/rancher/* -rf \
&& tar zxvf /backup/rancher-data-backup-<RANCHER_VERSION>-<DATE>.tar.gz"

# Run the old Rancher version
docker run -d --volumes-from rancher-data \
 --restart=unless-stopped \
 -p 80:80 -p 443:443 \
 --privileged \
 rancher/rancher:<PRIOR_RANCHER_VERSION>

High availability

Rancher depends on cert-manager's CRDs, so before upgrading Rancher, find the cert-manager version required by the target Rancher version; cert-manager has to be upgraded as well.

For example, Rancher 2.6.5 depends on cert-manager 1.7; see "Install/Upgrade Rancher on a Kubernetes Cluster".

Back up

For the backup, refer to the section above or https://docs.rancher.cn/docs/rancher2.5/backups/back-up-rancher/_index/

# Get the values of the old release
helm get values rancher -n cattle-system -o yaml > values.yaml

# Uninstall Rancher
helm delete rancher -n cattle-system

Upgrade cert-manager

Reference

Back up the existing resources

kubectl get -o yaml --all-namespaces \
issuer,clusterissuer,certificates,certificaterequests > cert-manager-backup.yaml
# Uninstall the existing deployment
helm uninstall cert-manager -n cert-manager
kubectl delete namespace cert-manager

# Delete the old CRDs
kubectl delete -f old-version-crd.yaml

# Install the new cert-manager
kubectl apply -f new-verson-crds.yaml

helm install cert-manager cert-manager-v1.7.1.tgz \
  --namespace cert-manager \
  --create-namespace \
  --version v1.7.1

Restore the backed-up resources

kubectl apply -f cert-manager-backup.yaml


# Verify
kubectl get pods --namespace cert-manager

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c6866597-zw7kh               1/1     Running   0          2m
cert-manager-cainjector-577f6d9fd7-tr77l   1/1     Running   0          2m
cert-manager-webhook-787858fcdb-nlzsq      1/1     Running   0          2m

Upgrade Rancher

# Install the new Rancher version
helm install rancher rancher-2.6.5.tgz -n cattle-system -f values.yaml

Re-importing a custom cluster

According to the docs, deleting a custom cluster directly in the UI also deletes its Kubernetes components, so this must not be done through the UI.

Scenario: a custom downstream cluster of Rancher A is to be moved to Rancher B (A and B may be the same).

  • On B, first create an imported cluster to obtain the deployment command line and the corresponding YAML.

  • kubectl delete -f xxx.yaml removes the connection to A; after deletion, kubectl stops working:

    kubectl get pods -A
    error: You must be logged in to the server (Unauthorized)
    
  • Re-obtain the kubeconfig; see the section above about extracting a kubeconfig from the kube-apiserver container.

  • Using the new kubeconfig, run B's import command.

Managing the cluster with RKE again

Obtain the kubeconfig as referenced above; the full-cluster-state ConfigMap holds the information for the whole cluster.

Use the content of the full-cluster-state key to create cluster.rkestate, which the rke command needs in order to operate on the cluster:

kubectl get configmap -n kube-system full-cluster-state -o jsonpath='{.data.full-cluster-state}' > cluster.rkestate

Write cluster.yml based on the cluster information (see the full example in the reference):

nodes:
  - address: xxxx
    user: root
    role: ["controlplane", "etcd", "worker"]
    ssh_key_path: /root/.ssh/id_rsa
    port: 22
    hostname_override: "" # match the existing node name; if it is missing, the same IP may show up as two nodes

kubernetes_version: v1.19.16-rancher1-5  # must match kubernetesVersion in cluster.rkestate, this is very important
cluster_name: xxxx  # match the existing cluster name

Try to find an rke version that supports this kubernetesVersion; matching the v1.19.16-rancher1 prefix is enough.
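
To check which Kubernetes versions a given rke binary supports (along the lines of the rke config usage shown earlier):

rke config --list-version --all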

cluster.rkestate and cluster.yml must sit in the same directory. Once both files are in place, the cluster can be operated with rke; also make sure cluster.rkestate genuinely reflects the current state of the cluster nodes:

rke up -config cluster.yml

Problems encountered

  • On CentOS, root cannot be used as the SSH user

    Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access the Docker socket (/var/run/docker.sock). Please check if the configured user can execute `docker ps` on the node, and if the SSH server version is at least version 6.7 or higher. If you are using RedHat/CentOS, you can't use the user `root`. Please refer to the documentation for more instructions. Error: ssh: rejected: administratively prohibited (open failed) 
    

    Add a new user:

    # On every machine: add a rancher user and add it to the docker group (an rke security restriction)
    useradd rancher -G docker
    echo "123456" | passwd --stdin rancher
      
    ssh-keygen
    ssh-copy-id -i ~/.ssh/id_rsa.pub rancher@192.168.0.22
    

    根据节点修改cluster.yml的ssh_key_path和user

    nodes:
      - address: xxxx
        user: rancher
        role: ["controlplane", "etcd", "worker"]
        ssh_key_path: /home/rancher/.ssh/id_rsa
        port: 22
      
    

Port preparation

Rancher port requirements

K3s

Inbound rules

| Protocol | Port | Source | Description |
| --- | --- | --- | --- |
| TCP | 80 | Load balancer/proxy that does external SSL termination | Rancher UI/API when external SSL termination is used |
| TCP | 443 | Server nodes; agent nodes; hosted/registered Kubernetes; any source that needs to be able to use the Rancher UI or API | Rancher agent, Rancher UI/API, kubectl |
| TCP | 6443 | K3s server nodes | Kubernetes API |
| UDP | 8472 | K3s server and agent nodes | Required only for Flannel VXLAN |
| TCP | 10250 | K3s server and agent nodes | kubelet |

Outbound rules

| Protocol | Port | Destination | Description |
| --- | --- | --- | --- |
| TCP | 22 | Any node IP from a node created using Node Driver | SSH provisioning of nodes using Node Driver |
| TCP | 443 | git.rancher.io | Rancher catalog |
| TCP | 2376 | Any node IP from a node created using Node Driver | Docker daemon TLS port used by Docker Machine |
| TCP | 6443 | Hosted/Imported Kubernetes API | Kubernetes API server |

RKE

Node-to-node traffic rules

| Protocol | Port | Description |
| --- | --- | --- |
| TCP | 443 | Rancher agents |
| TCP | 2379 | etcd client requests |
| TCP | 2380 | etcd peer communication |
| TCP | 6443 | Kubernetes apiserver |
| TCP | 8443 | Nginx Ingress's validating webhook |
| UDP | 8472 | Canal/Flannel VXLAN overlay networking |
| TCP | 9099 | Canal/Flannel livenessProbe/readinessProbe |
| TCP | 10250 | Metrics server communication with all nodes |
| TCP | 10254 | Ingress controller livenessProbe/readinessProbe |

Inbound rules

| Protocol | Port | Source | Description |
| --- | --- | --- | --- |
| TCP | 22 | RKE CLI | SSH provisioning of node by RKE |
| TCP | 80 | Load balancer/reverse proxy | HTTP traffic to Rancher UI/API |
| TCP | 443 | Load balancer/reverse proxy; IPs of all cluster nodes and other API/UI clients | HTTPS traffic to Rancher UI/API |
| TCP | 6443 | Kubernetes API clients | HTTPS traffic to Kubernetes API |

Outbound rules

| Protocol | Port | Destination | Description |
| --- | --- | --- | --- |
| TCP | 443 | 35.160.43.145, 35.167.242.46, 52.33.59.17 | Rancher catalog (git.rancher.io) |
| TCP | 22 | Any node created using a node driver | SSH provisioning of node by node driver |
| TCP | 2376 | Any node created using a node driver | Docker daemon TLS port used by node driver |
| TCP | 6443 | Hosted/Imported Kubernetes API | Kubernetes API server |
| TCP | Provider dependent | Port of the Kubernetes API endpoint in hosted cluster | Kubernetes API |

RKE2

Inbound rules

| Protocol | Port | Source | Description |
| --- | --- | --- | --- |
| TCP | 9345 | RKE2 agent nodes | Kubernetes API |
| TCP | 6443 | RKE2 agent nodes | Kubernetes API |
| UDP | 8472 | RKE2 server and agent nodes | Required only for Flannel VXLAN |
| TCP | 10250 | RKE2 server and agent nodes | kubelet |
| TCP | 2379 | RKE2 server nodes | etcd client port |
| TCP | 2380 | RKE2 server nodes | etcd peer port |
| TCP | 30000-32767 | RKE2 server and agent nodes | NodePort port range |
| TCP | 5473 | Calico-node pod connecting to typha pod | Required when deploying with Calico |
| HTTP | 8080 | Load balancer/proxy that does external SSL termination | Rancher UI/API when external SSL termination is used |
| HTTPS | 8443 | Hosted/registered Kubernetes; any source that needs to be able to use the Rancher UI or API | Rancher agent, Rancher UI/API, kubectl. Not needed if the load balancer does TLS termination. |

Outbound traffic is typically allowed in full.

Summary

Commonly used ports

TCP Ports
22, 80, 443, 2376, 2379, 2380, 6443, 9099, 9796, 10250, 10254, 30000-32767
UDP Ports
8472, 30000-32767

CNI plugin ports (Canal is the default):

  • Weave plugin: TCP 6783, UDP 6783-6784

  • Calico plugin: TCP 179, 5473; UDP 4789

  • Cilium plugin: TCP 8472, 4240
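A sketch of opening the common ports with firewalld on CentOS/RHEL; the list below only covers the common ports above, so add the ports of whichever CNI plugin is actually in use:

# Open the common Rancher/RKE ports (firewalld assumed)
for p in 22 80 443 2376 2379 2380 6443 9099 9796 10250 10254; do
  firewall-cmd --permanent --add-port=${p}/tcp
done
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --permanent --add-port=30000-32767/udp
firewall-cmd --reload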

Port connectivity check script

#!/bin/bash

TCP_PORTS="22088 22 80 443 2376 2379 2380 6443 9099 9796 10250 10254"
UDP_PORTS="8472"
REMOTE_HOST=$1
TIMEOUT_SEC=5
LOCAL_IP=$(hostname -I | cut -d' ' -f1)
function check() {
    res=$1
    PORT=$2
    proto=$3
    if [[ $res -eq 0 ]]
    then
        echo "$proto $PORT OPEN"
    elif [[ $res -eq 1 ]]
    then
        echo "$proto $PORT OPEN BUT NOT LISTEN"
    elif [[ $res -eq 124 ]]
    then
        echo "$proto $PORT NOT OPEN"
    else
        echo "$proto $PORT UNKONWN ERROR"
    fi
}
echo "check $LOCAL_IP -----> $REMOTE_HOST port"
for PORT in $TCP_PORTS
do
    timeout $TIMEOUT_SEC bash -c "</dev/tcp/$REMOTE_HOST/$PORT" &>/dev/null; res=$?
    check $res $PORT "tcp"
done

for PORT in $UDP_PORTS
do
    # UDP is connectionless, so the result below is only indicative
    timeout $TIMEOUT_SEC bash -c "</dev/udp/$REMOTE_HOST/$PORT" &>/dev/null; res=$?
    check $res $PORT "udp"
done
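Usage is a single argument, the remote host to test; the script filename below is just an example:

./check-ports.sh 192.168.0.22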




Port testing

Test commands

# Add a rule to block access to tcp/9099
iptables -A INPUT -p tcp --dport 9099 -j DROP
# Remove the rule
iptables -D INPUT -p tcp --dport 9099 -j DROP

  • tcp/6443

    image-20220921151546035

  • tcp/2380,2379

    kubectl get pods -A
    Error from server: etcdserver: request timed out
    

    image-20220921151516344

  • udp/8472

    Pods look normal, but DNS resolution and access to pod IPs both fail:

    / # nslookup default-http-backend
    ;; connection timed out; no servers could be reached
      
      
    / # wget 10.43.85.213
    Connecting to 10.43.85.213 (10.43.85.213:80)
    wget: can't connect to remote host (10.43.85.213): Operation timed out
    
  • tcp/10254: nginx-ingress-controller fails to start

    kubectl get pods -n  ingress-nginx
    NAME                                    READY   STATUS             RESTARTS   AGE
    default-http-backend-6db58c58cd-bfk2h   1/1     Running            0          4h27m
    nginx-ingress-controller-lrx64          1/1     Running            0          4h13m
    nginx-ingress-controller-ztww2          0/1     CrashLoopBackOff   6          4h27m
    
  • tcp/9099: canal pods do not run properly

    kubectl get pods -n kube-system
    NAME                                       READY   STATUS      RESTARTS   AGE
    calico-kube-controllers-5898bd695c-cgl6f   1/1     Running     0          4h35m
    canal-2mwgg                                1/2     Running     1          4h35m
    canal-ffn6k                                2/2     Running     1          4h21m
    
  • tcp/10250 Metrics server communication with all nodes

    Node CPU and memory show as N/A, and container logs cannot be viewed from Rancher:

    kubectl logs -f --tail 200 xxx-5768967f5c-wmc2k -n xxx
    Error from server: Get "https://xxx:10250/containerLogs/xxx/xxx-5768967f5c-wmc2k/xxx?follow=true&tailLines=200": dial tcp xxxxx:10250: connect: no route to host
    

Rancher error handling

Failed to pull image “xxx”: rpc error: code = Unknown desc = Error response from daemon: pull access denied for xxx, repository does not exist or may require ‘docker login’

Pods inexplicably fail to pull images even though the host can pull them; the suspicion is a cluster-level issue, and the cluster relies on the kubelet to manage containers.

cp ~/.docker/config.json /var/lib/kubelet/config.json

docker restart kubelet
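An alternative not covered in the original note is to attach an imagePullSecret at the namespace level instead of relying on the kubelet's Docker config; a sketch with placeholder registry and credentials:

# Create registry credentials as a secret and attach them to the namespace's default service account
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=mypassword \
  -n default
kubectl patch serviceaccount default -n default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'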

template system-library-rancher-monitoring does not match the kubeVersion

template system-library-rancher-monitoring incompatible with rancher version or cluster’s [xxx] kubernetes version

image-20220713172222587

参考 https://github.com/rancher/rancher/issues/37039#issuecomment-1176320933

  • Migration-related: check whether Monitoring V1 is enabled; newer Rancher versions use V2

    # Cluster settings to check
    enable_cluster_alerting: false
    enable_cluster_monitoring: false
    

    Alternatively, use this script to check whether a migration exists; add the --insecure flag when using a custom certificate.

    Output:

    The Monitoring V1 operator does not appear to exist in cluster *******. Migration to Monitoring V2 should be possible.
    

    So this should not be the problem.

  • Check the content of the system-library-rancher-monitoring CR

    kubectl edit catalogtemplates system-library-rancher-monitoring 
    

    Modify the first entry under versions

    spec:
      catalogId: system-library
      defaultVersion: 0.3.2
      description: Provides monitoring for Kubernetes which is maintained by Rancher 2.
      displayName: rancher-monitoring
      folderName: rancher-monitoring
      icon: https://coreos.com/sites/default/files/inline-images/Overview-prometheus_0.png
      projectURL: https://github.com/coreos/prometheus-operator
      versions:
      - digest: 08fbaee28d5a0efb79db02d9372629e2
        externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: < 1.22.0-0  # change this to '>=1.21.0-0'
        rancherMinVersion: 2.6.1-alpha1
        version: 0.3.2
        versionDir: charts/rancher-monitoring/v0.3.2
        versionName: rancher-monitoring
    

Template system-library-rancher-monitoring incompatible with rancher version or cluster’s [local] kubernetes version

In the end I fixed it by editing the error recorded in the conditions of the clusters.management.cattle.io CR, because the error persisted even after uninstalling system-monitor.

Example

kubectl edit  clusters.management.cattle.io/local


conditions:
  - status: "True"
    type: Ready
  - lastUpdateTime: "2022-05-31T09:00:06Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2022-05-31T09:00:09Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2022-05-31T09:00:09Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2022-05-31T09:00:07Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2022-05-31T09:00:08Z"

Get the relevant parameters from system-library-rancher-monitoring

kubectl edit catalogtemplates system-library-rancher-monitoring -n cattle-global-data

  versions:
  - digest: 08fbaee28d5a0efb79db02d9372629e2
    externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: =1.23.6+k3s1
    rancherMinVersion: 2.6.1-alpha1
    version: 0.3.2
    versionDir: charts/rancher-monitoring/v0.3.2
    versionName: rancher-monitoring
  - digest: 08fbaee28d5a0efb79db02d9372629e2
    externalId: catalog://?catalog=system-library&template=rancher-monitoring&version=0.3.2
    kubeVersion: '>=1.22.0-0'
    rancherMinVersion: 2.6.1-alpha1
    version: 0.3.2
    versionDir: charts/rancher-monitoring/v0.3.2
    versionName: rancher-monitoring

After reading the source code and doing some additional debugging, the error should not actually be reproducible, so editing the cluster CR directly does not cause it to come back.

func (m *Manager) LatestAvailableTemplateVersion(template *v3.CatalogTemplate, clusterName string) (*v32.TemplateVersionSpec, error) {
	versions := template.DeepCopy().Spec.Versions
	if len(versions) == 0 {
		return nil, errors.New("empty catalog template version list")
	}

	sort.Slice(versions, func(i, j int) bool {
		val1, err := semver.ParseTolerant(versions[i].Version)
		if err != nil {
			return false
		}

		val2, err := semver.ParseTolerant(versions[j].Version)
		if err != nil {
			return false
		}

		return val2.LT(val1)
	})

	for _, templateVersion := range versions {
		catalogTemplateVersion := &v3.CatalogTemplateVersion{
			Spec: templateVersion,
		}

		if err := m.ValidateChartCompatibility(catalogTemplateVersion, clusterName, ""); err == nil {
			return &templateVersion, nil
		}
	}

	return nil, errors.Errorf("template %s incompatible with rancher version or cluster's [%s] kubernetes version", template.Name, clusterName)
}

could not find tenant ID context deadline exceeded

image

An error occurs when creating Azure cloud credentials even though the credentials are valid and usable; the failure happens on POST /meta/aksCheckCredentials.

{"error":"could not find tenant ID: Request failed: subscriptions.Client#Get: Failure sending request: StatusCode=0 -- Original Error: context deadline exceeded"}

Following the error message into the source code: goCtx controls the request timeout, so the error is indeed time-related.

func FindTenantID(ctx context.Context, env azure.Environment, subscriptionID string) (string, error) {
	goCtx, cancel := context.WithTimeout(ctx, findTenantIDTimeout)
	defer cancel()
....
}

image-20220716094424410

What the function does

  • Calls the Azure SDK against the Azure HTTP API to validate the credential.

Troubleshooting directions

  • Is it really a network timeout? Through debugging, the URLs and parameters being requested were roughly identified; issuing the requests manually showed they were reachable and responded quickly.
  • Debugging pointed suspicion at a timezone issue.

The system default timezone is UTC. Setting it to the local timezone and rebooting the machine resolved the problem; merely restarting the Rancher pod had no effect.

root@xxx:~# timedatectl
                      Local time: Tue 2022-07-19 03:20:00 UTC
                  Universal time: Tue 2022-07-19 03:20:00 UTC
                        RTC time: Tue 2022-07-19 03:20:01
                       Time zone: UTC (UTC, +0000)
       System clock synchronized: yes
systemd-timesyncd.service active: yes
                 RTC in local TZ: no
                 
# Change the timezone
sudo timedatectl set-timezone Asia/Shanghai

Re-importing a downstream cluster: Cluster agent is not connected

Neither the agent nor Rancher reports a concrete error; the agent is stuck at Connecting to proxy:

INFO: Environment: CATTLE_ADDRESS=10.42.1.43 
....
time="2022-09-28T08:22:36Z" level=info msg="Rancher agent version v2.6.5 is starting"
time="2022-09-28T08:22:36Z" level=info msg="Listening on /tmp/log.sock"
time="2022-09-28T08:22:36Z" level=info msg="Connecting to wss://rancher.xxx.cn:31628/v3/connect/register with token starting with 2b2ch2cnsp6wxkzdm5djjbttr55"
time="2022-09-28T08:22:36Z" level=info msg="Connecting to proxy" url="wss://rancher.xxx.cn:31628/v3/connect/register"

Solution: modify the AgentDeployed condition in the Rancher cluster object

# Example YAML URL used by the Rancher import command; c-zqpcc is the cluster ID
https://xxxx/v3/import/xxxx_c-zqpcc.yaml

Edit the cluster object

kubectl edit clusters.management.cattle.io c-zqpcc

Reset the AgentDeployed status

  conditions:
  ...
  - lastUpdateTime: "2022-02-16T07:16:07Z"
    status: "True"      # change this to "False"
    type: AgentDeployed
    
    

Remove the original agent, then re-run the import command.
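The old agent resources live in the cattle-system namespace of the downstream cluster; a sketch of removing them using the default Rancher resource names (the node-agent DaemonSet may not exist on imported clusters):

kubectl -n cattle-system delete deployment cattle-cluster-agent
kubectl -n cattle-system delete daemonset cattle-node-agent --ignore-not-found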

Migrating to 2.6.7

Required images

rancher/rancher-webhook:v0.2.6
rancher/shell:v0.1.18
rancher/rancher:v2.6.7
rancher/gitjob:v0.1.30
rancher/fleet:v0.3.10
rancher/fleet-agent:v0.3.10
rancher/rancher-agent:v2.6.7
rancher/mirrored-pause:3.6
rancher/rke-tools:v0.1.87
rancher/hyperkube:v1.24.2-rancher1
rancher/mirrored-coreos-etcd:v3.4.16-rancher1
rancher/mirrored-calico-cni:v3.23.1
rancher/mirrored-calico-pod2daemon-flexvol:v3.23.1
rancher/kube-api-auth:v0.1.8
rancher/mirrored-calico-node:v3.23.1
rancher/mirrored-flannelcni-flannel:v0.17.0
rancher/mirrored-cluster-proportional-autoscaler:1.8.5
rancher/mirrored-metrics-server:v0.6.1
rancher/mirrored-ingress-nginx-kube-webhook-certgen:v1.1.1
rancher/mirrored-coredns-coredns:1.9.3
rancher/mirrored-calico-kube-controllers:v3.23.1
rancher/nginx-ingress-controller:nginx-1.2.1-rancher1

Helm upgrade

# helm
helm upgrade --reuse-values  rancher rancher-2.6.7.tgz -n cattle-system
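After the upgrade, a quick check that the release and the pods are at the expected version:

helm ls -n cattle-system
kubectl -n cattle-system get pods -l app=rancher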

Docker upgrade

For the Docker-based upgrade from Rancher 2.5.5 to 2.6.5, see the reference above.

After the upgrade

For re-importing and managing clusters, see the reference above.

When changing the agent configuration, only the secret needs to be modified, not the agent deployment.