The issue surfaced after introducing #3917. From the support bundle, the network status in both cases was unstable. The replicas that were created or updated did not have the longhornvolume label, which prevented the listReplicas function from detecting and listing them and led to the replicas becoming unmanageable.
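To illustrate the effect, here is a minimal sketch (not Longhorn's actual code): lookups of the listReplicas kind select replicas by the volume label, so an object created or updated without that label never matches the selector and silently drops out of management. The label keys and values below are assumptions for illustration only.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Hypothetical selector the control plane would use to find a volume's replicas.
	selector := labels.SelectorFromSet(labels.Set{"longhornvolume": "vol-1"})

	labeledReplica := labels.Set{"longhornvolume": "vol-1", "longhornnode": "node-1"}
	orphanedReplica := labels.Set{"longhornnode": "node-1"} // longhornvolume label missing

	fmt.Println(selector.Matches(labeledReplica))  // true  -> listed and managed
	fmt.Println(selector.Matches(orphanedReplica)) // false -> skipped, unmanageable
}
```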
The root cause has not yet been identified, but we can enhance resilience by doing the following (see the sketch after this list):
Validating the labels of resources, since they play a critical role in the control plane
Always mutating (re-applying) the labels when updating resources
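A hedged sketch of both ideas, using a simplified stand-in type rather than Longhorn's real CRD or webhook code: a mutating step that always re-applies the volume label on create/update, and a validating step that rejects objects whose label is missing or inconsistent. Type, field, and label names are assumptions.

```go
package webhook

import "fmt"

// Replica is a simplified stand-in for the real replica CRD object.
type Replica struct {
	Labels     map[string]string
	VolumeName string // the volume this replica belongs to
}

// MutateLabels always (re)applies the control-plane labels on create and
// update, regardless of what the client sent.
func MutateLabels(r *Replica) {
	if r.Labels == nil {
		r.Labels = map[string]string{}
	}
	r.Labels["longhornvolume"] = r.VolumeName
}

// ValidateLabels rejects the object if a critical label is missing or does
// not match the spec, so a broken object never reaches the control plane.
func ValidateLabels(r *Replica) error {
	if r.Labels["longhornvolume"] != r.VolumeName {
		return fmt.Errorf("replica for volume %q: longhornvolume label missing or inconsistent", r.VolumeName)
	}
	return nil
}
```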
Log or Support bundle
If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.
Environment
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
Number of management nodes in the cluster:
Number of worker nodes in the cluster:
Node config
OS type and version:
CPU per node:
Memory per node:
Disk type (e.g. SSD/NVMe):
Network bandwidth between the nodes:
Underlying infrastructure (e.g. on AWS/GCE, EKS/GKE, VMware/KVM, bare metal):
Number of Longhorn volumes in the cluster:
Workaround
You can delete the stopped replicas whose labels contain longhornnode: "null" with a kubectl command.
These stopped replicas are not managed by the Longhorn system because of #5762. Due to the missing label, Longhorn cannot recognize and delete them.
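The workaround is a manual kubectl delete by label selector. The sketch below does the equivalent with client-go's dynamic client; the group/version (longhorn.io/v1beta2), the longhorn-system namespace, and the longhornnode=null selector are assumptions based on a default install, and you should confirm the matched replicas are really the stopped, unmanaged ones before deleting.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (assumed default location).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed group/version/resource for the Longhorn replica CRD.
	replicaGVR := schema.GroupVersionResource{
		Group: "longhorn.io", Version: "v1beta2", Resource: "replicas",
	}
	replicas := client.Resource(replicaGVR).Namespace("longhorn-system")

	// Same selection a manual `kubectl delete -l longhornnode=null` would make.
	list, err := replicas.List(context.TODO(), metav1.ListOptions{
		LabelSelector: "longhornnode=null",
	})
	if err != nil {
		panic(err)
	}
	for _, item := range list.Items {
		fmt.Println("deleting unmanaged replica:", item.GetName())
		if err := replicas.Delete(context.TODO(), item.GetName(), metav1.DeleteOptions{}); err != nil {
			panic(err)
		}
	}
}
```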