SUMMARY
Fixes#869
During an OpenShift installation, one of the checks to see that the cluster is ready to proceed with configuration is to check to ensure that the Cluster Operators are in an Available: True Degraded: False Progressing: False state. While you can currently use the k8s_info module to get a json response, the resulting json needs to be iterated over several times to get the appropriate status.
This PR adds functionality into waiter.py which loops over all resource instances of the cluster operators. If any of them is not ready, waiter returns False and the task false. If the task returns, you can assume that all the cluster operators are healthy.
ISSUE TYPE
Feature Pull Request
COMPONENT NAME
waiter.py
ADDITIONAL INFORMATION
A simple playbook will trigger the waiter.py to watch the ClusterOperator object
---
- name: get operators
hosts: localhost
gather_facts: false
tasks:
- name: Get cluster operators
kubernetes.core.k8s_info:
api_version: v1
kind: ClusterOperator
kubeconfig: "/home/ocp/one/auth/kubeconfig"
wait: true
wait_timeout: 30
register: cluster_operators
This will produce the simple response if everything is functioning properly:
PLAY [get operators] *************************************************************************************************
TASK [Get cluster operators] *****************************************************************************************
ok: [localhost]
PLAY RECAP ***********************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
If the timeout is reached:
PLAY [get operators] *************************************************************************************************
TASK [Get cluster operators] *****************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible_collections.kubernetes.core.plugins.module_utils.k8s.exceptions.CoreException: Failed to gather information about ClusterOperator(s) even after waiting for 30 seconds
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to gather information about ClusterOperator(s) even after waiting for 30 seconds"}
PLAY RECAP ***********************************************************************************************************
localhost : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
UNSOLVED: How to know which Operators are failing
Reviewed-by: Mandar Kulkarni <mandar242@gmail.com>
Reviewed-by: Bikouo Aubin
* Cleanup gha
* test by removing matrix excludes
* Rename sanity tests
* trigger integration tests
* Fix ansible-lint workflow
* Fix concurrency
* Add ansible-lint config
* Add ansible-lint config
* Fix integration and lint issues
* integration wf
* fix yamllint issues
* fix yamllint issues
* update readme and add ignore-2.16.txt
* fix ansible-doc
* Add version
* Use /dev/random to generate random data
The GHA environment has difficultly generating entropy. Trying to read
from /dev/urandom just blocks forever. We don't care if the random data
is cryptographically secure; it's just garbage data for the test. Read
from /dev/random, instead. This is only used during the k8s_copy test
target.
This also removes the custom test module that was being used to generate
the files. It's not worth maintaining this for two task that can be
replaced with some simple command/shell tasks.
* Fix saniry errors
* test github_action fix
* Address review comments
* Remove default types
* review comments
* isort fixes
* remove tags
* Add setuptools to venv
* Test gh changes
* update changelog
* update ignore-2.16
* Fix indentation in inventory plugin example
* Update .github/workflows/integration-tests.yaml
* Update integration-tests.yaml
---------
Co-authored-by: Mike Graves <mgraves@redhat.com>
Co-authored-by: Bikouo Aubin <79859644+abikouo@users.noreply.github.com>
This primarily moves the diff and wait logic from the various service
methods to perform_action to eliminate code duplication. I also moved
the diff_objects function out of the service object and moved most of
the find_resource logic to a new resource client method. We ended up
with several modules creating a service object just to use one of these
methods, so it seemed to make sense to make these more accessible.
K8sService class
SUMMARY
This refactors the perform_action() logic from common.py into a separate K8sService class.
TODO:
Unit tests.
ISSUE TYPE
New Module Pull Request
COMPONENT NAME
service.py
Reviewed-by: Abhijeet Kasurde <None>
Reviewed-by: Mike Graves <mgraves@redhat.com>
Reviewed-by: Alina Buzachis <None>
Reviewed-by: None <None>
Add new waiter
SUMMARY
This refactors the waiter logic from common.py into a separate module.
ISSUE TYPE
COMPONENT NAME
ADDITIONAL INFORMATION
Reviewed-by: None <None>
Reviewed-by: Alina Buzachis <None>
Reviewed-by: None <None>