The issue surfaced after introducing #3917. From the support bundle, the network status in both cases was unstable. The replicas that were created or updated did not have the longhornvolume label, which prevented the listReplicas function from detecting and listing them and led to the replicas becoming unmanageable.
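To illustrate the effect, here is a minimal sketch (not Longhorn's actual code): lookups of the listReplicas kind select replicas by the volume label, so an object created or updated without that label never matches the selector and silently drops out of management. The label keys and values below are assumptions for illustration only.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Hypothetical selector the control plane would use to find a volume's replicas.
	selector := labels.SelectorFromSet(labels.Set{"longhornvolume": "vol-1"})

	labeledReplica := labels.Set{"longhornvolume": "vol-1", "longhornnode": "node-1"}
	orphanedReplica := labels.Set{"longhornnode": "node-1"} // longhornvolume label missing

	fmt.Println(selector.Matches(labeledReplica))  // true  -> listed and managed
	fmt.Println(selector.Matches(orphanedReplica)) // false -> skipped, unmanageable
}
```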
The root cause has not yet been identified, but we can enhance resilience by doing the following (see the sketch after this list):
Validating the labels of resources, since they play a critical role in the control plane
Always mutating (re-applying) the labels when updating resources
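A hedged sketch of both ideas, using a simplified stand-in type rather than Longhorn's real CRD or webhook code: a mutating step that always re-applies the volume label on create/update, and a validating step that rejects objects whose label is missing or inconsistent. Type, field, and label names are assumptions.

```go
package webhook

import "fmt"

// Replica is a simplified stand-in for the real replica CRD object.
type Replica struct {
	Labels     map[string]string
	VolumeName string // the volume this replica belongs to
}

// MutateLabels always (re)applies the control-plane labels on create and
// update, regardless of what the client sent.
func MutateLabels(r *Replica) {
	if r.Labels == nil {
		r.Labels = map[string]string{}
	}
	r.Labels["longhornvolume"] = r.VolumeName
}

// ValidateLabels rejects the object if a critical label is missing or does
// not match the spec, so a broken object never reaches the control plane.
func ValidateLabels(r *Replica) error {
	if r.Labels["longhornvolume"] != r.VolumeName {
		return fmt.Errorf("replica for volume %q: longhornvolume label missing or inconsistent", r.VolumeName)
	}
	return nil
}
```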
Log or Support bundle
If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.
Environment
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
Number of management nodes in the cluster:
Number of worker nodes in the cluster:
Node config
OS type and version:
CPU per node:
Memory per node:
Disk type (e.g. SSD/NVMe):
Network bandwidth between the nodes:
Underlying infrastructure (e.g. on AWS/GCE, EKS/GKE, VMware/KVM, bare metal):
Number of Longhorn volumes in the cluster:
Workaround
You can delete the stopped replicas whose labels contain longhornnode: "null" with a kubectl command.
These stopped replicas are not managed by the Longhorn system because of #5762. Due to the missing label, Longhorn cannot recognize and delete them.
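The workaround is a manual kubectl delete by label selector. The sketch below does the equivalent with client-go's dynamic client; the group/version (longhorn.io/v1beta2), the longhorn-system namespace, and the longhornnode=null selector are assumptions based on a default install, and you should confirm the matched replicas are really the stopped, unmanaged ones before deleting.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (assumed default location).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed group/version/resource for the Longhorn replica CRD.
	replicaGVR := schema.GroupVersionResource{
		Group: "longhorn.io", Version: "v1beta2", Resource: "replicas",
	}
	replicas := client.Resource(replicaGVR).Namespace("longhorn-system")

	// Same selection a manual `kubectl delete -l longhornnode=null` would make.
	list, err := replicas.List(context.TODO(), metav1.ListOptions{
		LabelSelector: "longhornnode=null",
	})
	if err != nil {
		panic(err)
	}
	for _, item := range list.Items {
		fmt.Println("deleting unmanaged replica:", item.GetName())
		if err := replicas.Delete(context.TODO(), item.GetName(), metav1.DeleteOptions{}); err != nil {
			panic(err)
		}
	}
}
```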