StarlingX/Containers/FAQ

Revision as of 20:41, 25 June 2019 by Cbf123 (Stage application for deployment)

History

  • February 11, 2019: Initial FAQ Setup: WIP until cut-over

FAQ: General Overview

What services are being containerized?

See Containerizing the StarlingX Infrastructure Initiative for information on the implementation. It describes the containerization infrastructure and what you can expect to see running in pods under Kubernetes (K8S). These pods are deployed via service-specific Helm charts as described by an Armada "stx-openstack application" manifest.

Where can I get additional information on how to interact with the underlying technologies used to deploy the containerized services?

FAQ: OpenStack Application Life-cycle

How do I start the Containerized services?

Generate the stx-openstack application tarball

There are currently several application tarballs for dev/stable and latest/versioned. Unless you have a good reason to use something else, you should probably be using the "stable/versioned" tarball. The stx-openstack application tarballs are generated with each build on the CENGN mirror.

Alternatively, in a development environment, run the following command to construct the application tarballs.

$MY_REPO_ROOT_DIR/cgcs-root/build-tools/build-helm-charts.sh

The resulting tarballs can be found under $MY_WORKSPACE/std/build-helm/stx.

If the build-helm-charts.sh command is unable to find the charts, run "build-pkgs" to build the chart rpms and re-run the build-helm-charts.sh command.

Stage application for deployment

Transfer the stx-openstack-1.0-16-centos-stable-versioned.tgz application tarball onto your active controller.

Use sysinv to upload the application tarball.

source /etc/platform/openrc
system application-upload stx-openstack stx-openstack-1.0-16-centos-stable-versioned.tgz
system application-list

Bring Up Services

Use sysinv to apply the application.

system application-apply stx-openstack
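A script that issues the apply may want to block until it finishes. The following is a minimal sketch, not part of the StarlingX tooling: it assumes that `system application-list` prints a table with a column headed "status", and that a successful apply ends in the status "applied" while failures end in a status with the suffix "-failed". The function names and the POLL_INTERVAL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait until an application reaches a terminal status.

# Print the "status" column for application $1 from `system application-list`
# table text read on stdin. The column is located by its header name.
app_status() {
    awk -F'|' -v app="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        !col { for (i = 1; i <= NF; i++) if (trim($i) == "status") col = i }
        col && trim($2) == app { print trim($col); exit }
    '
}

# Poll until the status is "applied" (return 0) or ends in "-failed" (return 1).
# POLL_INTERVAL (seconds) is a hypothetical knob, not a StarlingX setting.
wait_for_apply() {
    local app="$1" status
    while true; do
        status="$(system application-list | app_status "$app")"
        echo "status of ${app}: ${status:-unknown}"
        case "$status" in
            applied)  return 0 ;;
            *-failed) return 1 ;;
        esac
        sleep "${POLL_INTERVAL:-5}"
    done
}
```

After sourcing these functions, `system application-apply stx-openstack && wait_for_apply stx-openstack` would return once the apply either completes or fails. The loop has no timeout, so an apply that never leaves the applying state keeps it running.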

How do I monitor the application actions and look for problems?

After executing an application-xxxx action from the CLI, perform one or more of the following steps to help determine whether the requested application action is taking effect.

Watch for application state changes

You can monitor the progress of the application action by watching the output of system application-list:

watch -n 5 system application-list

Observe the system inventory logs directly

For more detailed information than what is reported via the CLI, run the following:

tail -f /var/log/sysinv.log

Observe the Armada logs

Examine the Armada execution logs.

On the host with:

tail -f /var/log/armada/stx-openstack-apply.log

Inside the container with:

sudo docker exec armada_service tail -f /logs/stx-openstack-apply.log

Watch for pod changes

Watch for pod state changes with:

kubectl get pods --all-namespaces -o wide -w 

This provides basic information about pod state transitions from the various Init states to Running/Completed, and helps identify pods that are stuck in CrashLoopBackOff.
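When many pods are changing state at once, a per-status count can be easier to read than the full listing. The sketch below is illustrative only: the function name is hypothetical, and it relies on STATUS being the fourth column of `kubectl get pods --all-namespaces` output (NAMESPACE, NAME, READY, STATUS).

```shell
#!/usr/bin/env bash
# Sketch: count pods per status.
# Input on stdin: `kubectl get pods --all-namespaces --no-headers` output,
# where the fourth whitespace-separated column is STATUS.
pod_status_summary() {
    awk '{ count[$4]++ } END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}
```

A possible invocation is `kubectl get pods --all-namespaces --no-headers | pod_status_summary`, repeated as needed while the apply is in progress.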

What do I do if I can't re-run an application action?

This can happen due to an unexpected error (usually on bleeding edge code in the development stream on master). Typically, this means that the application action failed and the failure was not communicated correctly, leaving the sysinv DB out of sync with reality. First, try to re-apply the application. You may see:

[wrsroot@controller-0 ~(keystone_admin)]$ system application-apply stx-openstack
Application-apply rejected: install/update is already in progress.

Then try to remove the application:

[wrsroot@controller-0 ~(keystone_admin)]$ system application-remove stx-openstack
Application-remove rejected: operation is not allowed while the current status is applying.

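If this occurs, you will need to reset the application status directly in the sysinv database. The following command sets it to apply-failed, which allows the action to be retried.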

[wrsroot@controller-0 ~(keystone_admin)]$ sudo -u postgres psql -d sysinv -c"update kube_app set status='apply-failed' where name='stx-openstack';"
UPDATE 1

With this state reset an application-apply can be executed again. If the application was successfully applied previously, then no changes should occur on the re-apply.
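Because this edits the sysinv database by hand, it can help to generate the statement rather than type it. The following sketch builds the same update for a given application name. The function name is hypothetical, and the extra `status='applying'` condition is an added safeguard that is not part of the original command: it limits the update to rows that are still stuck in the applying state.

```shell
#!/usr/bin/env bash
# Sketch: build the SQL that marks a stuck application as apply-failed.
# Only rows still in 'applying' are touched (an added guard, see the text).
build_reset_sql() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*)
            # Refuse names that could break out of the quoted SQL string.
            echo "invalid application name: $1" >&2
            return 1
            ;;
    esac
    printf "update kube_app set status='apply-failed' where name='%s' and status='applying';\n" "$1"
}
```

The generated statement can be reviewed first and then passed to the same psql invocation, for example `sudo -u postgres psql -d sysinv -c "$(build_reset_sql stx-openstack)"`.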

How do I change a configuration parameter in a deployed service?

To update a parameter associated with a given deployed service, use the system helm-override-xxx commands. For example, to update the number of glance workers, you would execute:

system helm-override-update glance openstack --set conf.glance.DEFAULT.workers=2

and then re-apply the application:

system application-apply stx-openstack

Execute the following kubectl command to observe the glance pods restarting:

kubectl get pods --all-namespaces -o wide -w | grep glance
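The update and re-apply steps can be wrapped in one helper so that they are always run together. This is a minimal sketch using only the commands shown in this FAQ; the function name and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set one helm override, show the result, and re-apply the application.
# Usage: set_override_and_apply <chart> <namespace> <key=value> [application]
set_override_and_apply() {
    local chart="$1" namespace="$2" setting="$3" app="${4:-stx-openstack}"
    if [ -z "$chart" ] || [ -z "$namespace" ] || [ -z "$setting" ]; then
        echo "usage: set_override_and_apply <chart> <namespace> <key=value> [application]" >&2
        return 2
    fi
    # Record the user override, then display it for a quick visual check.
    system helm-override-update "$chart" "$namespace" --set "$setting" || return 1
    system helm-override-show "$chart" "$namespace" || return 1
    # Re-apply so the affected pods restart with the new value.
    system application-apply "$app"
}
```

For the glance example above, the call would be `set_override_and_apply glance openstack conf.glance.DEFAULT.workers=2`.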

What is the order of precedence for helm chart overrides in StarlingX?

There are four locations from which values for a given service's helm chart can be specified. If a value occurs in more than one location, the following order of precedence is applied:

  • User supplied (Highest)
    • Established via the system helm-override-xxx commands
    • Allows the user to override existing values and add new values previously not specified. Known values for a deployment can be seen with system helm-override-show
  • Dynamic overrides
    • Generated by sysinv and based on the contents of the system inventory
    • Resulting files are located in /opt/platform/helm/19.01/
  • Static overrides
    • These are defined in the application's armada manifest located in /opt/platform/armada/19.01/
    • These are the optimal operational values based on the testing done across all the supported StarlingX provisioned platforms
  • Chart values.yaml (Lowest)
    • These are the values provided by the helm chart.
    • These charts are packaged with the application and installed on the controller helm repo.
    • They can be examined by executing:
helm repo update
helm inspect starlingx/glance
helm inspect starlingx/glance | less
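The precedence order above can be illustrated for a single key. The sketch below is a simplification and not how sysinv or helm merge values internally: it takes the candidate values for one key, highest precedence first, and prints the first one that is set. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick the effective value for ONE key from the four sources.
# Arguments, highest precedence first: user, dynamic, static, chart values.yaml.
# An empty argument means "not set at that level".
effective_value() {
    local value
    for value in "$@"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1  # the key is not set anywhere
}

# No user or dynamic override; the static override (2) beats values.yaml (1).
effective_value "" "" "2" "1"   # prints 2
```

If a user override were added, for instance `effective_value "4" "" "2" "1"`, the result would be 4, matching the "User supplied (Highest)" rule.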
    

What is the current set of platform workarounds needed to deploy the services?

Any platform workarounds are contained in deployment instructions for the specific platform configurations. See:

As compared to the previously running native services, what changes in behavior can be expected?

The following limitations and behavior changes currently apply:

  • Neutron agent rescheduling is not supported.
  • There are two separate Horizon instances, one for the platform on bare-metal and one running containerized for OpenStack.
  • The database operations for OpenStack services are now handled by a containerized mysqld server running as part of a galera cluster.

FAQ: Service Debugging

How do I check the health of service pods?

After executing system application-apply stx-openstack you should check the health of your deployed pods in the K8S cluster.

A healthy deployment will have all pods in either a Running or a Completed state. This can be checked with:

kubectl get pods --all-namespaces -o wide
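The health rule above (every pod either Running or Completed) can be checked mechanically. The following sketch filters the pod listing down to the pods that break that rule. The function name is hypothetical, and it assumes STATUS is the fourth column, as it is when `--all-namespaces` is used.

```shell
#!/usr/bin/env bash
# Sketch: print pods whose STATUS is neither Running nor Completed.
# Input on stdin: `kubectl get pods --all-namespaces --no-headers` output.
# Returns 1 if at least one such pod is found, 0 otherwise.
unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4; bad = 1 }
         END { exit bad + 0 }'
}
```

A possible invocation is `kubectl get pods --all-namespaces --no-headers | unhealthy_pods && echo "all pods healthy"`. Note that this only looks at the STATUS column; a Running pod whose containers are not all ready is not flagged.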

What should I do if I see a pod is not in a Running state?

(Note that it's normal for initialization pods to be in the "Completed" state.)

First check the pod events to see if a dependency may not have been met. For example, to check the events of an ailing nova compute pod, run the following command and examine the contents of the Events: section. Note: The pod name will be unique to your deployed system.

kubectl describe pods -n openstack nova-compute-compute-0-75ea0372-nmtz2

Then check the logs for that pod with:

kubectl logs -n openstack nova-compute-compute-0-75ea0372-nmtz2
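The two checks can be combined so that only the Events: section and the most recent log lines are shown. This is a sketch under a few assumptions: the function names and the TAIL_LINES variable are hypothetical, and the pod is assumed to have a single container (otherwise `kubectl logs` needs a `-c <container_name>` argument).

```shell
#!/usr/bin/env bash
# Sketch: show the Events section and recent logs for one pod.

# Print everything from the "Events:" line onward from
# `kubectl describe pod` output read on stdin.
pod_events() {
    awk '/^Events:/ { show = 1 } show'
}

# Usage: inspect_pod <namespace> <pod name>
# TAIL_LINES is a hypothetical knob for this sketch (default 50 lines).
inspect_pod() {
    local namespace="$1" pod="$2"
    kubectl describe pod -n "$namespace" "$pod" | pod_events
    kubectl logs -n "$namespace" "$pod" --tail="${TAIL_LINES:-50}"
}
```

For the pod used above, the call would be `inspect_pod openstack nova-compute-compute-0-75ea0372-nmtz2`.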

Based on data observed from these commands, you can typically start your debugging investigation which may require you to update overrides and redeploy the application.

How do I access the logs for the service pods?

The logs for a given pod can be checked with:

kubectl logs -n openstack <pod name>

The above command allows you to access the logs of pods running on any host in the cluster. As an alternative, you can ssh to a given host and examine the logs in /var/log/pods and /var/log/containers. These contain log information specific to the pods and containers running on that host only.
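On a host, the files for one pod can be picked out of the log directory by name. The sketch below assumes that file names under var/log/containers begin with `<pod_name>_<namespace>_`. The function name and the root argument are hypothetical; the root argument allows the same helper to be pointed at `/` on a live host or at a directory holding a copy of a node's logs.

```shell
#!/usr/bin/env bash
# Sketch: list container log files for pods whose name starts with a prefix.
# $1: root directory that contains var/log/containers ("/" on a live host)
# $2: pod name prefix
# $3: namespace (defaults to openstack)
find_container_logs() {
    local root="$1" pod_prefix="$2" namespace="${3:-openstack}"
    find "$root/var/log/containers" -name "${pod_prefix}*_${namespace}_*.log" | sort
}
```

For example, `find_container_logs / nova-compute` would list the log files of all nova-compute pods scheduled on the current host.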

How do I gain shell access to a pod so I can examine the contents of the deployed container?

Execute the following command:

kubectl exec -it -n openstack <pod name> -- bash

Note: This works for most images, but it may not be supported depending on how the docker image is built. All StarlingX images support this, as do most non-StarlingX images that are pulled by the helm charts.

How do I make changes to the code or configuration in a pod for debugging purposes?

Sometimes you may want to make code or configuration changes in a pod for debugging purposes, without rebuilding the images from source. This can be done by modifying a running container, saving a new image and then updating the application to use the new image.

The following must be done from a controller.

First connect to a shell in a running container:

kubectl -n openstack exec -it <pod name> -- bash
# For example:
kubectl -n openstack exec -it nova-compute-compute-0-31b0f4b3-2rqgf -- bash

From the shell, make whatever config file or source code (e.g. python) changes you like. Then exit the shell with CTRL-D.

Find the container ID for the container you just modified (look for the Container ID associated with the container):

kubectl -n openstack describe pod <pod name>
# For example:
kubectl -n openstack describe pod nova-compute-compute-0-31b0f4b3-2rqgf
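Rather than reading the ID out of the describe output by eye, it can be extracted with a small filter. This sketch assumes the describe output contains lines of the form `Container ID:  docker://<id>`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract container IDs from `kubectl describe pod` output on stdin.
# Matches lines such as "    Container ID:  docker://<id>" and strips
# everything up to and including the runtime prefix ("docker://").
# Containers that have not started yet have an empty ID and are skipped.
container_ids() {
    sed -n 's|^ *Container ID: *[a-z]*://||p'
}
```

A possible invocation is `kubectl -n openstack describe pod <pod name> | container_ids`. A pod with init containers or several containers prints one ID per container, in the order they appear in the describe output.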

The following must be done from the host where the container is running, logged in as the root user.

Create a new image from the container you just modified:

docker commit <container ID> <image name>:<tag>
# For example:
docker commit 12341234 stx-nova:test-1

Now tag the image for upload to the registry on the controller:

docker tag <image name>:<tag> <controller management IP>:9001/<image name>:<tag>
# For example:
docker tag stx-nova:test-1 192.168.204.2:9001/stx-nova:test-1

Then push the image to the registry on the controller:

# Log in to docker registry (user: admin, password: system admin password)
docker login <controller management IP>:9001
# For example:
docker login 192.168.204.2:9001

# Push updated image
docker push <controller management IP>:9001/<image name>:<tag>
# For example:
docker push 192.168.204.2:9001/stx-nova:test-1
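The commit, tag, and push steps always use the same image reference, so they can be scripted together. The following is a sketch only: the function names are hypothetical, port 9001 is taken from the examples above, and it assumes the docker login step has already been completed on this host.

```shell
#!/usr/bin/env bash
# Sketch: commit a modified container and push it to the controller registry.

# Compose the registry reference used in the tag and push steps.
# $1: controller management IP, $2: image name, $3: tag
registry_ref() {
    printf '%s:9001/%s:%s\n' "$1" "$2" "$3"
}

# Usage: publish_debug_image <container ID> <controller management IP> <image name> <tag>
# Assumes `docker login <controller management IP>:9001` was already done.
# The last line printed is the reference to use in the helm override.
publish_debug_image() {
    local container_id="$1" controller_ip="$2" image="$3" tag="$4" ref
    ref="$(registry_ref "$controller_ip" "$image" "$tag")"
    docker commit "$container_id" "${image}:${tag}" || return 1
    docker tag "${image}:${tag}" "$ref" || return 1
    docker push "$ref" || return 1
    echo "$ref"
}
```

With the example values used above, the call would be `publish_debug_image 12341234 192.168.204.2 stx-nova test-1`, and the final line of output would be `192.168.204.2:9001/stx-nova:test-1`.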

The following must be done from the active controller.

Update the helm overrides to point to the new image:

# First show the existing overrides to view the existing image tags
system helm-override-show <chart name> <namespace>
# For example:
system helm-override-show nova openstack

# Then update the override for the image
system helm-override-update <chart name> <namespace> --set <override>=<controller management IP>:9001/<image name>:<tag>
# For example:
system helm-override-update nova openstack --set images.tags.nova_compute=192.168.204.2:9001/stx-nova:test-1

Finally, re-apply the application to restart the pods with the new image:

system application-apply stx-openstack

Is there a way to do a quick-and-dirty modification of a single running container?

Sometimes you just want to make a little change in a container for debugging purposes, without going through all the steps to create a new image. This can often be done by modifying a running container and restarting it.

The following must be done from a controller.

Get a shell in the container:

kubectl exec -it -n openstack <pod name> [-c <container_name>] -- bash

Then edit the desired files in the container and exit.

Now find the container ID for the container you just modified (look for the Container ID associated with the container):

kubectl -n openstack describe pod <pod name>
# For example:
kubectl -n openstack describe pod nova-compute-compute-0-31b0f4b3-2rqgf

The following must be done from the host where the container is running, logged in as the root user.

Restart the container you just modified:

docker restart <container ID>
# For example:
docker restart 12341234

Is there a way to look at container logs if all you've got is collected tarballs?

Sometimes, when investigating a reported bug, all you have are the collected tarballs for each node, but you want to see the logs for a particular container. The logs for the various containers that ran on each node are available in the captured logs under "var/log/containers". The captured logs are named according to the following pattern:

<pod_name>_<namespace>_<container_name>_<container_ID>.log