StarlingX/Containers/FAQ

History

 * February 11, 2019: Initial FAQ Setup: WIP until cut-over

What services are being containerized?
See Containerizing the StarlingX Infrastructure Initiative for information on the implementation. This describes the containerization infrastructure and what you can expect to see running in pods under Kubernetes (K8S). These pods are deployed via service specific Helm charts as described by an Armada "stx-openstack application" manifest.

Where can I get additional information on how to interact with the underlying technologies used to deploy the containerized services?

 * Docker: Use the docker command line
 * Kubernetes: kubectl Cheat Sheet
 * Helm: Using the Helm package manager
 * Armada: Armada documentation

How to Add New Armada App in StarlingX?
https://wiki.openstack.org/wiki/StarlingX/Containers/HowToAddNewArmadaAppInSTX

Armada App Code Structure
https://wiki.openstack.org/wiki/StarlingX/Containers/ArmadaAppCodeStructure

How to Add New FluxCD App in StarlingX?
https://wiki.openstack.org/wiki/StarlingX/Containers/HowToAddNewFluxCDAppInSTX

Converting Armada Applications to FluxCD
https://wiki.openstack.org/wiki/StarlingX/Containers/ConvertingArmadaAppsToFluxCD

Generate the stx-openstack application tarball
There are currently several application tarball variants, covering the dev/stable and latest/versioned combinations. Unless you have a good reason to use something else, you should probably be using the "stable/versioned" tarball. The stx-openstack application tarballs are generated with each build on the CENGN mirror.

Alternatively, in a development environment, run the following command to construct the application tarballs:

$MY_REPO_ROOT_DIR/cgcs-root/build-tools/build-helm-charts.sh

The resulting tarballs can be found under $MY_WORKSPACE/std/build-helm/stx.

If the build-helm-charts.sh command is unable to find the charts, run build-pkgs to build the chart RPMs, then re-run build-helm-charts.sh.

Stage application for deployment
Transfer the stx-openstack-1.0-16-centos-stable-versioned.tgz application tarball onto your active controller.

Use sysinv to upload the application tarball.

source /etc/platform/openrc
system application-upload stx-openstack stx-openstack-1.0-16-centos-stable-versioned.tgz
system application-list

Bring Up Services
Use sysinv to apply the application:

system application-apply stx-openstack

How do I monitor the application actions and look for problems?
After executing an application-xxxx action from the CLI, perform one or more of the following to help determine whether the requested application action is taking effect.

Watch for application state changes
You can monitor the progress of the application action by watching the application list:

watch -n 5 system application-list

Observe the system inventory logs directly
For more detailed information than what is reported via the CLI, run the following:

tail -f /var/log/sysinv.log

Observe the Armada logs
Examine the Armada execution logs.

On the host:

tail -f /var/log/armada/stx-openstack-apply.log

Inside the container:

sudo docker exec armada_service tail -f /logs/stx-openstack-apply.log

Watch for pod changes
Watch for pod state changes with:

kubectl get pods --all-namespaces -o wide -w

This will provide basic information with regards to pod state transitions from the various Init states to Running/Completed, and will help identify pods that are stuck in a CrashLoopBackOff state.

What do I do if I can't re-run an application action?
This can happen due to an unexpected error (usually on bleeding edge code in the development stream on master). Typically, this means that the application action failed and the failure was not communicated correctly, leaving the sysinv DB out of sync with reality.

First, try to re-apply the application. You may see:

[wrsroot@controller-0 ~(keystone_admin)]$ system application-apply stx-openstack
Application-apply rejected: install/update is already in progress.

Then try to remove the application:

[wrsroot@controller-0 ~(keystone_admin)]$ system application-remove stx-openstack
Application-remove rejected: operation is not allowed while the current status is applying.

If both operations are rejected, you will need to reset the application status in the sysinv DB:

[wrsroot@controller-0 ~(keystone_admin)]$ sudo -u postgres psql -d sysinv -c "update kube_app set status='apply-failed' where name='stx-openstack';"
UPDATE 1

With the status reset, an application-apply can be executed again. If the application was successfully applied previously, then no changes should occur on the re-apply.
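The recovery decision above can be summarized as a small helper; the status values mirror the sysinv kube_app states quoted in this answer, but the helper and its messages are purely illustrative:

```shell
# Illustrative sketch: map a stuck kube_app status to the recovery step
# described above. The function and messages are not part of StarlingX.
next_step() {
    case "$1" in
        applying)     echo "reset status in sysinv DB to apply-failed" ;;
        apply-failed) echo "retry: system application-apply stx-openstack" ;;
        applied)      echo "nothing to do" ;;
        *)            echo "inspect /var/log/sysinv.log" ;;
    esac
}

next_step applying
next_step apply-failed
```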

How do I change a configuration parameter in a deployed service?
To update a parameter associated with a given deployed service, use the system helm-override-xxx commands. For example, to update the number of glance workers, you would execute:

system helm-override-update glance openstack --set conf.glance.DEFAULT.workers=2

and then re-apply the application:

system application-apply stx-openstack

Execute the following kubectl command to observe the glance pods restarting:

kubectl get pods --all-namespaces -o wide -w | grep glance

What is the order of precedence for helm chart overrides in StarlingX?
There are four locations from which a given helm chart for a service can have values specified. If the values occur in more than one location, an order of precedence is applied:
 * User supplied (Highest)
   * Established via the system helm-override-xxx commands
   * Allows the user to override existing values and add new values previously not specified. Known values for a deployment can be seen with system helm-override-show
 * Dynamic overrides
   * Generated by sysinv and based on the contents of the system inventory
   * Resulting files are located in /opt/platform/helm/19.01/
 * Static overrides
   * These are defined in the application's armada manifest located in /opt/platform/armada/19.01/
   * These are the optimal operational values based on the testing done across all the supported StarlingX provisioned platforms
 * Chart values.yaml (Lowest)
   * These are the values provided by the helm chart.
   * These charts are packaged with the application and installed on the controller helm repo.
   * They can be examined by executing:

helm repo update
helm inspect starlingx/glance
helm inspect starlingx/glance | less
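The precedence rules can be sketched with plain key=value files standing in for the four sources; the file names, keys, and merge helper below are illustrative and do not reflect the real override file formats:

```shell
# Illustrative sketch of override precedence: sources are applied from lowest
# to highest precedence, and for any given key the highest-precedence value wins.
merge_overrides() {
    # Arguments are files ordered lowest precedence first; later values override.
    awk -F= '{ v[$1] = $2 } END { for (k in v) print k "=" v[k] }' "$@"
}

d=$(mktemp -d)
printf 'workers=1\ndebug=false\n' > "$d/chart_values"       # chart values.yaml (lowest)
printf 'workers=4\n'              > "$d/static_overrides"   # static overrides
printf 'debug=true\n'             > "$d/dynamic_overrides"  # dynamic overrides
printf 'workers=2\n'              > "$d/user_overrides"     # user supplied (highest)

# workers comes from the user override, debug from the dynamic override.
merge_overrides "$d/chart_values" "$d/static_overrides" \
                "$d/dynamic_overrides" "$d/user_overrides" | sort
```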

What are the current set of platform workarounds needed to deploy the services?
Any platform workarounds are contained in deployment instructions for the specific platform configurations. See:
 * One node configuration  ( AIO-SX )
 * All in One Duplex configuration  ( AIO-DX )
 * Standard, non storage, configuration ( Standard 2+2 )
 * Standard, storage, configuration ( Standard 2+2+2 )

As compared to the previously running native services, what changes in behavior can be expected?
The following items currently do not work, are not supported, or behave differently from the native services:
 * Neutron agent rescheduling
 * There are two separate Horizon instances, one for the platform on bare-metal and one running containerized for OpenStack.
 * The database operations for OpenStack services are now handled by a containerized mysqld server running as part of a galera cluster.

How do I check the health of service pods?
After executing system application-apply stx-openstack you should check the health of your deployed pods in the K8S cluster.

A healthy deployment will have all pods in either a Running or a Completed state. This can be checked with:

kubectl get pods --all-namespaces -o wide
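A quick way to surface only the unhealthy pods from that listing is to filter on the STATUS column; the sample listing below is illustrative, and on a live system you would pipe in the real output of kubectl get pods --all-namespaces --no-headers instead:

```shell
# Sketch: print namespace/name and status for any pod that is not
# Running or Completed. Field 1 is NAMESPACE, 2 is NAME, 4 is STATUS.
unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Illustrative sample of `kubectl get pods --all-namespaces --no-headers` output.
sample='openstack    glance-api-6bb54f8d9c-abcde             1/1   Running            0   1h
openstack    nova-compute-compute-0-75ea0372-nmtz2   1/1   CrashLoopBackOff   7   1h
kube-system  coredns-78d4b8b69d-fghij                1/1   Completed          0   2h'

printf '%s\n' "$sample" | unhealthy_pods
```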

What should I do if I see a pod is not in a Running state?
(Note that it's normal for initialization pods to be in the "Completed" state.)

First check the pod events to see if a dependency may not have been met. For example, to check the events of an ailing nova compute pod, run the following command and examine the contents of the Events: section. Note: The pod name will be unique to your deployed system.

kubectl describe pods -n openstack nova-compute-compute-0-75ea0372-nmtz2

Then check the logs for that pod with:

kubectl logs -n openstack nova-compute-compute-0-75ea0372-nmtz2

Based on data observed from these commands, you can typically start your debugging investigation which may require you to update overrides and redeploy the application.

How do I access the logs for the service pods?
The logs for a given pod can be checked with:

kubectl logs -n openstack <pod_name>

The above command allows you to access logs running on any host in the cluster. As an alternative, you can ssh to a given host and examine the logs in /var/log/pods and /var/log/containers. These will contain log information specific to pods and containers running only on that host.

How do I gain shell access to a pod so I can examine the contents of the deployed container?
Execute the following command:

kubectl exec -it -n openstack <pod_name> -- bash

Note: This typically works for most images, but depending on how the docker image is built this may not be supported. All StarlingX images will support this as do most non-StarlingX images that are pulled by the helm charts.

How do I make changes to the code or configuration in a pod for debugging purposes?
Sometimes you may want to make code or configuration changes in a pod for debugging purposes, without rebuilding the images from source. This can be done by modifying a running container, saving a new image and then updating the application to use the new image.

'''The following must be done from a controller. '''

First connect to a shell in a running container:

kubectl -n openstack exec -it <pod_name> -- bash

For example:

kubectl -n openstack exec -it nova-compute-compute-0-31b0f4b3-2rqgf -- bash

From the shell, make whatever config file or source code (e.g. python) changes you like. Then exit the shell with CTRL-D.

Find the container ID for the container you just modified (look for the Container ID field in the pod description):

kubectl -n openstack describe pod <pod_name>

For example:

kubectl -n openstack describe pod nova-compute-compute-0-31b0f4b3-2rqgf

'''The following must be done from the host where the container is running, logged in as the root user. '''

Create a new image from the container you just modified:

docker commit <container_id> <image_name>:<tag>

For example:

docker commit 12341234 stx-nova:test-1

Now tag the image for upload to the registry on the controller:

docker tag <image_name>:<tag> registry.local:9001/<image_name>:<tag>

For example:

docker tag stx-nova:test-1 registry.local:9001/stx-nova:test-1

Then push the image to the registry on the controller. First log in to the docker registry (user: admin, password: the system admin password):

docker login registry.local:9001

Then push the updated image:

docker push registry.local:9001/<image_name>:<tag>

For example:

docker push registry.local:9001/stx-nova:test-1

'''The following must be done from the active controller. '''

Update the helm overrides to point to the new image. First show the existing overrides to view the existing image tags:

system helm-override-show <app_name> <chart_name> <namespace>

For example:

system helm-override-show stx-openstack nova openstack

Then update the override for the image:

system helm-override-update <app_name> <chart_name> <namespace> --set <image_tag_path>=registry.local:9001/<image_name>:<tag>

For example:

system helm-override-update stx-openstack nova openstack --set images.tags.nova_compute=registry.local:9001/stx-nova:test-1

Finally, re-apply the application to restart the pods with the new image:

system application-apply stx-openstack
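The whole sequence above can be reviewed end to end as a dry-run script; run() only echoes each command rather than executing it, and the pod name, container ID, and image tag are the illustrative values used in the examples:

```shell
# Dry-run sketch of the modify-and-redeploy sequence. Nothing is executed;
# each step is echoed so the ordering can be reviewed offline.
run() { echo "+ $*"; }

POD=nova-compute-compute-0-31b0f4b3-2rqgf   # illustrative pod name
CONTAINER_ID=12341234                       # illustrative container ID
IMAGE=stx-nova:test-1                       # illustrative image:tag

run kubectl -n openstack exec -it "$POD" -- bash        # edit files, then exit
run docker commit "$CONTAINER_ID" "$IMAGE"              # snapshot the container
run docker tag "$IMAGE" "registry.local:9001/$IMAGE"    # tag for local registry
run docker login registry.local:9001
run docker push "registry.local:9001/$IMAGE"
run system helm-override-update stx-openstack nova openstack \
    --set "images.tags.nova_compute=registry.local:9001/$IMAGE"
run system application-apply stx-openstack              # restart pods on new image
```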

Is there a way to do a quick-and-dirty modification of a single running container?
Sometimes you just want to make a little change in a container for debugging purposes, without going through all the steps to create a new image. This can often be done by modifying a running container and restarting it.

'''The following must be done from a controller. '''

Get a shell in the container:

kubectl exec -it -n openstack <pod_name> [-c <container_name>] -- bash

Then edit the desired files in the container and exit.

Now find the container ID for the container you just modified (look for the Container ID field in the pod description):

kubectl -n openstack describe pod <pod_name>

For example:

kubectl -n openstack describe pod nova-compute-compute-0-31b0f4b3-2rqgf

'''The following must be done from the host where the container is running, logged in as the root user. '''

Restart the container you just modified:

docker restart <container_id>

For example:

docker restart 12341234

Is there a way to look at container logs if all you've got is collected tarballs?
Sometimes if you're investigating a reported bug all you've got are the collected tarballs for each node but you want to see the logs for a particular container. The logs for the various containers that ran on each node are available in the captured logs under "var/log/containers". The captured logs are named according to the following pattern:

<pod_name>_<namespace>_<container_name>-<container_id>.log
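Assuming that naming pattern, a captured log file name can be split back into its components with plain shell parameter expansion; the example file name below is illustrative:

```shell
# Sketch: split <pod_name>_<namespace>_<container_name>-<container_id>.log
# into its parts. Assumes the pattern described above.
parse_log_name() {
    local base=${1%.log}
    local pod=${base%%_*};  base=${base#*_}
    local ns=${base%%_*};   base=${base#*_}
    local container=${base%-*}   # strip the trailing -<container_id>
    echo "pod=$pod namespace=$ns container=$container"
}

parse_log_name "nova-compute-compute-0-75ea0372-nmtz2_openstack_nova-compute-0123abcd.log"
```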