The cloudstack-agent does not work after upgrading from 4.17.2.0 to 4.18.0.0 #7742

@xuanyuanaosheng

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • 4.18.0.0
CLOUDSTACK VERSION
  1. cloudstack-management
# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

# rpm -qa | grep cloudstack
cloudstack-management-4.18.0.0-1.el7.x86_64
cloudstack-common-4.18.0.0-1.el7.x86_64
  2. cloudstack-agent
# cat /etc/os-release 
NAME="Oracle Linux Server"
VERSION="8.8"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:8:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.8
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.8

# rpm -qa | grep cloudstack
cloudstack-common-4.18.0.0-1.x86_64
cloudstack-agent-4.18.0.0-1.x86_64
SUMMARY

After I upgraded the cloudstack-agent from 4.17.2.0 to 4.18.0.0, the agent no longer works: the service exits with status 1 and keeps restarting.

The cloudstack-agent service status:

# systemctl status cloudstack-agent
● cloudstack-agent.service - CloudStack Agent
   Loaded: loaded (/usr/lib/systemd/system/cloudstack-agent.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2023-06-30 12:16:48 CST; 3s ago
     Docs: http://www.cloudstack.org/
  Process: 113054 ExecStart=/usr/bin/java $JAVA_OPTS $JAVA_DEBUG -cp $CLASSPATH $JAVA_CLASS (code=exited, status=1/FAILURE)
 Main PID: 113054 (code=exited, status=1/FAILURE)

The detailed error log is:

2023-07-13 16:40:59,072 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/versions.sh 
2023-07-13 16:40:59,074 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 1800000
2023-07-13 16:40:59,091 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful.
2023-07-13 16:40:59,102 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: sudo grep InitiatorName= /etc/iscsi/initiatorname.iscsi 
2023-07-13 16:40:59,103 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 3600000
2023-07-13 16:40:59,119 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful.
2023-07-13 16:40:59,121 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2023-07-13 16:40:59,124 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion
2023-07-13 16:40:59,124 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Opening a new libvirtd connection to: qemu:///system
2023-07-13 16:40:59,126 DEBUG [utils.script.Script] (Agent-Handler-1:null) (logid:) Executing: qemu-img --help 
2023-07-13 16:40:59,127 DEBUG [utils.script.Script] (Agent-Handler-1:null) (logid:) Executing while with timeout : 3600000
2023-07-13 16:40:59,131 DEBUG [utils.script.Script] (Agent-Handler-1:null) (logid:) Execution is successful.
2023-07-13 16:40:59,134 DEBUG [utils.script.Script] (Agent-Handler-1:null) (logid:) Executing: cryptsetup --usage 
2023-07-13 16:40:59,134 DEBUG [utils.script.Script] (Agent-Handler-1:null) (logid:) Executing while with timeout : 3600000
2023-07-13 16:40:59,138 DEBUG [utils.script.Script] (Agent-Handler-1:null) (logid:) Execution is successful.
2023-07-13 16:40:59,138 DEBUG [utils.script.Script] (Agent-Handler-1:null) (logid:) Usage: cryptsetup [-?Vvyrq] [-?|--help] [--usage] [-V|--version]
        [-v|--verbose] [--debug] [--debug-json] [-c|--cipher=STRING]
        [-h|--hash=STRING] [-y|--verify-passphrase] [-d|--key-file=STRING]
        [--master-key-file=STRING] [--dump-master-key] [-s|--key-size=BITS]
        [-l|--keyfile-size=bytes] [--keyfile-offset=bytes]
        [--new-keyfile-size=bytes] [--new-keyfile-offset=bytes]
        [-S|--key-slot=INT] [-b|--size=SECTORS] [--device-size=bytes]
        [-o|--offset=SECTORS] [-p|--skip=SECTORS] [-r|--readonly]
        [-q|--batch-mode] [-t|--timeout=secs] [--progress-frequency=secs]
        [-T|--tries=INT] [--align-payload=SECTORS]
        [--header-backup-file=STRING] [--use-random] [--use-urandom]
        [--shared] [--uuid=STRING] [--allow-discards] [--header=STRING]
        [--test-passphrase] [--tcrypt-hidden] [--tcrypt-system]
        [--tcrypt-backup] [--veracrypt] [--veracrypt-pim=INT]
        [--veracrypt-query-pim] [-M|--type=STRING] [--force-password]
        [--perf-same_cpu_crypt] [--perf-submit_from_crypt_cpus]
        [--perf-no_read_workqueue] [--perf-no_write_workqueue] [--deferred]
        [--serialize-memory-hard-pbkdf] [-i|--iter-time=msecs]
        [--pbkdf=STRING] [--pbkdf-memory=kilobytes]
        [--pbkdf-parallel=threads] [--pbkdf-force-iterations=LONG]
        [--priority=STRING] [--disable-locks] [--disable-keyring]
        [-I|--integrity=STRING] [--integrity-no-journal]
        [--integrity-no-wipe] [--integrity-legacy-padding] [--token-only]
        [--token-id=INT] [--key-description=STRING] [--sector-size=INT]
        [--iv-large-sectors] [--persistent] [--label=STRING]
        [--subsystem=STRING] [--unbound] [--json-file=STRING]
        [--luks2-metadata-size=bytes] [--luks2-keyslots-size=bytes]
        [--refresh] [--keyslot-key-size=BITS] [--keyslot-cipher=STRING]
        [--encrypt] [--decrypt] [--init-only] [--resume-only]
        [--reduce-device-size=bytes] [--hotzone-size=bytes]
        [--resilience=STRING] [--resilience-hash=STRING]
        [--active-name=STRING] [OPTION...] <action> <action-specific>

2023-07-13 16:40:59,141 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 37ca62bd-24fd-4bd5-b569-0890fce8fe90 (Filesystem) in libvirt
2023-07-13 16:40:59,141 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2023-07-13 16:40:59,142 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool 37ca62bd-24fd-4bd5-b569-0890fce8fe90, using it.
2023-07-13 16:40:59,143 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 37ca62bd-24fd-4bd5-b569-0890fce8fe90 from libvirt
2023-07-13 16:40:59,143 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2023-07-13 16:40:59,156 DEBUG [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Successfully refreshed pool 37ca62bd-24fd-4bd5-b569-0890fce8fe90 Capacity: (49.98 GB) 53660876800 Used: (11.11 GB) 11928494080 Available: (38.87 GB) 41732382720
2023-07-13 16:40:59,203 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing: hostname 
2023-07-13 16:40:59,204 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing while with timeout : 500
2023-07-13 16:40:59,205 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Execution is successful.
2023-07-13 16:40:59,206 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing: hostname 
2023-07-13 16:40:59,206 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing while with timeout : 500
2023-07-13 16:40:59,207 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Execution is successful.
2023-07-13 16:40:59,228 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Sending Startup: Seq 0-0:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":"2","cpus":"56","speed":"2600","memory":"809151959040","dom0MinMemory":"1073741824","poolSync":"false","supportsClonedVolumes":"false","caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"5.4.17-2136.321.4.el8uek.x86_64","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"8.8","host.volume.encryption":"true","secured":"true","Host.OS":"Oracle Linux Server"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"10","pod":"9","cluster":"10","guid":"5d98d8a3-57ae-3c08-b2a3-6666a8fefc1e-LibvirtComputingResource","name":"ljzdckvm001.cn.prod","id":"0","version":"4.18.0.0","iqn":"iqn.1988-12.com.oracle:acefa7ff557d","publicIpAddress":"10.67.128.1","publicNetmask":"255.255.255.0","publicMacAddress":"dc:f4:01:dd:a9:8f","privateIpAddress":"10.67.128.1","privateMacAddress":"dc:f4:01:dd:a9:8f","privateNetmask":"255.255.255.0","storageIpAddress":"10.67.128.1","storageNetmask":"255.255.255.0","storageMacAddress":"dc:f4:01:dd:a9:8f","resourceName":"LibvirtComputingResource","gatewayIpAddress":"10.67.128.254","msHostList":"10.25.2.173@static","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":"(0 bytes) 0","poolInfo":{"uuid":"37ca62bd-24fd-4bd5-b569-0890fce8fe90","host":"10.67.128.1","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":"(49.98 GB) 53660876800","availableBytes":"(38.87 GB) 
41732382720"},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"10","pod":"9","guid":"5d98d8a3-57ae-3c08-b2a3-6666a8fefc1e-LibvirtComputingResource","name":"ljzdckvm001.cn.prod","id":"0","version":"4.18.0.0","resourceName":"LibvirtComputingResource","msHostList":"10.25.2.173@static","wait":"0","bypassHostMaintenance":"false"}}] }
2023-07-13 16:40:59,228 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Startup task created
2023-07-13 16:40:59,311 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:) Request:Seq -1--1:  { Cmd , MgmtId: -1, via: -1, Ver: v1, Flags: 111, [{"com.cloud.agent.api.ReadyCommand":{"_details":"java.lang.IllegalArgumentException: Can't add host: 10.67.128.1 with hostOS: Oracle Linux Server into a cluster,in which there are Red hosts added","wait":"0","bypassHostMaintenance":"false"}}] }
2023-07-13 16:40:59,311 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Received response: Seq 0-0:  { Ans: , MgmtId: 345052215515, via: -1, Ver: v1, Flags: 100000, [{"com.cloud.agent.api.StartupAnswer":{"hostId":"0","pingInterval":"60","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2023-07-13 16:40:59,311 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:) Processing command: com.cloud.agent.api.ReadyCommand
2023-07-13 16:40:59,312 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup task cancelled
2023-07-13 16:40:59,312 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Process agent startup answer, agent id = 0
2023-07-13 16:40:59,312 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Set agent id 0
2023-07-13 16:40:59,312 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:) Not ready to connect to mgt server: java.lang.IllegalArgumentException: Can't add host: 10.67.128.1 with hostOS: Oracle Linux Server into a cluster,in which there are Red hosts added
2023-07-13 16:40:59,314 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
2023-07-13 16:40:59,315 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Sending shutdown to management server
2023-07-13 16:40:59,319 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Adding a watch list
2023-07-13 16:40:59,319 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup Response Received: agent id = 0
2023-07-13 16:40:59,323 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-1:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms 
2023-07-13 16:40:59,324 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-1:null) (logid:) Executing while with timeout : 1800000
2023-07-13 16:40:59,398 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-1:null) (logid:) Execution is successful.
2023-07-13 16:40:59,401 DEBUG [kvm.resource.LibvirtConnection] (UgentTask-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2023-07-13 16:40:59,404 DEBUG [cloud.agent.Agent] (UgentTask-1:null) (logid:) Sending ping: Seq 0-1:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{},"_gatewayAccessible":"true","_vnetAccessible":"true","hostType":"Routing","hostId":"0","wait":"0","bypassHostMaintenance":"false"}}] }
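Most of the excerpt above is DEBUG noise, so a small filter can help when reading it. The following is a minimal sketch, not part of CloudStack: the function name `suspicious_lines` and the regular expression are illustrative assumptions based only on the line layout visible in this log. It keeps lines logged at ERROR or WARN and any line whose message mentions a Java exception, and it skips continuation lines such as the cryptsetup usage text.

```python
import re

# Line layout as seen in the excerpt above (an assumption, not an official format):
# "2023-07-13 16:40:59,124 ERROR [logger] (thread) (logid:) message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+\(logid:[^)]*\)\s+(?P<msg>.*)$"
)


def suspicious_lines(log_text):
    """Return (timestamp, level, logger, message) tuples for ERROR/WARN lines
    and for any line whose message mentions a Java exception."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation line without a timestamp prefix
        if m["level"] in ("ERROR", "WARN") or "Exception" in m["msg"]:
            hits.append((m["ts"], m["level"], m["logger"], m["msg"]))
    return hits


# Four lines copied from the log excerpt above.
sample = """\
2023-07-13 16:40:59,121 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2023-07-13 16:40:59,124 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion
2023-07-13 16:40:59,312 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:) Not ready to connect to mgt server: java.lang.IllegalArgumentException: Can't add host: 10.67.128.1 with hostOS: Oracle Linux Server into a cluster,in which there are Red hosts added
2023-07-13 16:40:59,314 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
"""

if __name__ == "__main__":
    for ts, level, logger, msg in suspicious_lines(sample):
        print(ts, level, logger, msg, sep=" | ")
```

To run it against the real file, read the agent log (typically `/var/log/cloudstack/agent/agent.log`, assuming the default location) and pass its contents to `suspicious_lines`. On the sample it keeps two lines: the libvirtd ERROR at 16:40:59,124 and the DEBUG line carrying the `IllegalArgumentException` returned by the management server just before the agent stops.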

I checked #3509. The agent environment is:

# rpm -qa | grep libvirt
libvirt-daemon-driver-nodedev-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-config-nwfilter-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-lock-sanlock-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-gluster-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-docs-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-libs-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-logical-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-core-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-config-network-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-scsi-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-devel-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-gobject-3.0.0-1.el8.x86_64
libvirt-daemon-driver-interface-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-iscsi-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-client-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-secret-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-mpath-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-nss-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
python3-libvirt-8.0.0-2.module+el8.8.0+20990+60c1530a.x86_64
libvirt-dbus-1.3.0-2.module+el8.8.0+20990+60c1530a.x86_64
libvirt-daemon-driver-network-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-disk-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-gconfig-3.0.0-1.el8.x86_64
libvirt-daemon-driver-storage-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-wireshark-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
pcp-pmda-libvirt-5.3.7-16.0.2.el8.x86_64
libvirt-daemon-driver-storage-iscsi-direct-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-storage-rbd-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-glib-3.0.0-1.el8.x86_64
libvirt-daemon-driver-nwfilter-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-driver-qemu-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64
libvirt-daemon-kvm-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64

# rpm -qa | grep qemu
qemu-guest-agent-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-ui-opengl-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-ui-spice-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-hw-usbredir-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-block-ssh-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-docs-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-block-rbd-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-common-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-block-curl-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-core-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-img-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-block-iscsi-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
qemu-kvm-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
ipxe-roms-qemu-20181214-11.git133f4c47.el8.noarch
qemu-kvm-block-gluster-6.2.0-32.module+el8.8.0+21044+01700444.x86_64
libvirt-daemon-driver-qemu-8.0.0-19.0.2.module+el8.8.0+21112+1cc1a24b.x86_64

# uname -a
Linux test001 5.4.17-2136.321.4.el8uek.x86_64 #2 SMP Mon Jun 26 18:17:37 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux

I restarted the agent and rebooted the host, but it still does not work.

I think it is a connection problem between libvirtd and CloudStack. Could you give some advice? I configured libvirtd following: https://www.sbarjatiya.com/notes_wiki/index.php/CentOS_8.x_Cloudstack_4.15_Setup_KVM_host

The agent node that has not been upgraded (4.17.2.0) still works fine.
