Heat/TroubleShooting

= Instances can't connect to the internet =

If your instances can't connect to the internet, ensure you have the following nova configuration settings:

flat_interface = em1
public_interface = em1

In this example, the "em1" interface is used for an all-in-one OpenStack install on Fedora, using the wired interface em1.

It may be necessary to adjust the interface name (e.g. to wlan0 or eth0), depending on your OS and network configuration.

These options are required: without them, nova won't create the required iptables rules for the instance when it is created, so the instance won't be able to access the internet or other network resources.
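A quick way to confirm that both options are present is to read them back from the config file. The following is a minimal sketch, not part of nova: the helper names nova_option and nova_interfaces_configured are hypothetical, and it assumes the options are written in "name = value" form as shown above.

```shell
# Hypothetical helper: print the last value assigned to an option in an
# INI-style file. $1 = option name, $2 = config file path.
nova_option() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Hypothetical helper: succeed only if both interface options are set.
nova_interfaces_configured() {
    [ -n "$(nova_option flat_interface "$1")" ] &&
        [ -n "$(nova_option public_interface "$1")" ]
}

# Usage (adjust the path to your install):
#   nova_interfaces_configured /etc/nova/nova.conf || echo "interface options missing"
```

Commented-out lines are ignored because the match is anchored at the start of the line.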

Also ensure IP forwarding is enabled (make this persistent, e.g. via /etc/sysctl.conf):

echo 1 > /proc/sys/net/ipv4/ip_forward
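The echo command above only lasts until the next reboot. To check whether the setting has also been made persistent, something like the following sketch can be used. The function name ip_forward_persisted is hypothetical, and the check is deliberately simple: it only looks for an uncommented "net.ipv4.ip_forward = 1" line and does not account for a later line overriding it.

```shell
# Hypothetical helper: succeed if a sysctl-style file enables IPv4 forwarding.
# $1 = file to check (defaults to /etc/sysctl.conf).
ip_forward_persisted() {
    conf="${1:-/etc/sysctl.conf}"
    # Match an uncommented "net.ipv4.ip_forward = 1" line, spaces optional.
    grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf"
}

# Usage:
#   ip_forward_persisted && echo "persistent" || echo "not persistent"
```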

= OpenStack installation reports error of "unable to write random state" =

If you are executing the openstack script as a non-root user (as it is designed to be), ensure that ~/.rnd is owned by that user.
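The ownership check can be sketched as follows. The function name owned_by_me is hypothetical, and the sketch assumes GNU stat (as found on Fedora); the chown in the usage comment is one possible fix, not something the openstack script does itself.

```shell
# Hypothetical helper: succeed if the given file is owned by the current user.
owned_by_me() {
    # stat -c %u prints the numeric owner ID (GNU coreutils).
    [ "$(stat -c %u "$1" 2>/dev/null)" = "$(id -u)" ]
}

# Usage:
#   owned_by_me ~/.rnd || sudo chown "$(id -un)" ~/.rnd
```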

= jeos_create fails with a timeout error during customization =

The developers have found that running oz repeatedly will eventually wedge the libvirt network interface in some way. Oz needs a working network to update the VM with cloud-init and various other updates. See libvirt bug [#813853](https://bugzilla.redhat.com/show_bug.cgi?id=813853). One workaround while upstream fixes the bug is to restart the network interface for libvirt:

sudo virsh net-destroy default

sudo virsh net-start default

If that doesn't work, also check whether there are zombied dnsmasq processes that need to be cleaned up.

Note that, using virsh, you can log into the VM during oz customization with the credentials root / ozrootpw (unless a specific rootpw has been defined in the TDL).

= I didn't set a parameter correctly in heat and now the template I ran can't be deleted. =

Unfortunately, the error checking in current heat needs a bit of work. Because of a bug in heat, the templates are stored in the database before they are executed. This makes sense conceptually; however, it causes problems when there are exceptions on create. We will be fixing this bug shortly, but in the meantime it is necessary to drop the heat database and recreate it:

killall -9 heat-api

killall -9 heat-engine

tools/heat-db-drop -r

heat-manage db_sync

As an alternative to dropping the whole DB, you can try marking the resource that shows DELETE_FAILED in 'heat resource-list ' as COMPLETE, then retry deleting the stack (the SELECT below shows the type of the resource and its uuid, so that you can delete it manually).

Connect to the heat DB. At the mysql host, do:

mysql -uroot -r heat

At the mysql> prompt, do:

BEGIN;
SELECT distinct s.name as s_name, r.name as resource_type, r.nova_instance as physical_resource_id, r.uuid as r_uuid, r.action as r_action, r.status as r_status from resource join (stack as s, resource as r) on r.stack_id=s.id where s.action='DELETE' and s.status='FAILED' and r.action='DELETE' and r.status='FAILED';
UPDATE resource set status='COMPLETE' where action='DELETE' and status='FAILED';
-- Verify that the number of updated rows matches the output from: heat resource-list, then:
COMMIT;

Retry deleting the problematic stack:

heat resource-list
heat stack-delete

= I get a vhost-net error when running jeos_create =

An example of the failure we have seen:

sudo -E heat jeos_create F16 x86_64 cfntools
Creating JEOS image (F16-x86_64-cfntools) - this takes approximately 10 minutes.
ERROR: internal error process exited while connecting to monitor: qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net support is not compiled in
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net requested but could not be initialized
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: Device 'tap' could not be initialized
(use -d3 to get the full backtrace)
oz-install did not create the image, check your oz installation.

This happens when virtualization is not enabled in the BIOS.

= OZ takes 30 minutes to create a JEOS =

OZ does take a while to run. Fortunately, it only has to be run once. But if you're a developer, this may be irritating, especially as the JEOS image changes. To speed up OZ operation, it is safe to add the following directives to the /etc/oz.cfg file. Note that these directives cause the system to use more disk space.

[cache]
original_media = yes
modified_media = yes
jeos = yes

= You get "Quota exceeded: code=InstanceLimitExceeded (HTTP 413)" =

First make sure there are no un-deleted resources:

nova list
nova volume-list

If that is not the problem, you might just need to increase your quota limits. To display your current quotas:

nova-manage project quota admin

To increase the number of instances:

nova-manage project quota admin --key=instances --value=100

= Endpoint not found for heat =

If you receive an error as follows:

[root@bigiron .openstack]# heat list
ERROR:Failed to list. Got error:
ERROR:Response from Keystone does not contain a Heat endpoint.

This error indicates a problem with the Keystone configuration. It can be caused by not running heat-keystone-create (for F16/F17), not running heat-keystone-create-devstack (for U12), or not having sourced the keystone credentials before running those two scripts.

= Non-specific error with backtrace from heat-engine =

If you receive the error:

[root@bigiron tools]# heat list
ERROR:Failed to list. Got error:
ERROR:Internal Server error: Internal Server Error
ERROR:

With a backtrace that looks like:

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 281, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (OperationalError) (1045, "Access denied for user 'heat'@'localhost' (using password: YES)") None None

This error indicates that the heat-db-setup script was not run.

= Malformed query response KeyName not registered =

If, after creating a template with heat create, you receive the following error:

DEBUG:Debug level logging enabled
Malformed Query Response {'Error': 'Provided KeyName is not registered with nova'}

This problem indicates the SSH key specified in the create command was not registered with nova. Have a look at the quickstart guide for registration instructions.

= I edited a template and it now doesn't work =

It's easy to introduce JSON syntax errors when editing templates, so the following can be useful for identifying what is broken and where:

cat foo.template | python -m json.tool
Expecting , delimiter: line 107 column 20 (char 4579)

= Nova starts creating instances which immediately go to ERROR state =

== Scheduler problem & workaround ==

If you suddenly find instances aren't being created and the nova list output indicates ERROR state, check the scheduler log:

==> /var/log/nova/scheduler.log <==
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Failed to schedule_run_instance: No valid host was found.
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Setting instance 18165ff9-25ae-4d01-8761-f414c86a0a64 to ERROR state.

The workaround seems to be to add "scheduler_default_filters=AllHostsFilter" to /etc/nova/nova.conf. See: https://answers.launchpad.net/nova/+question/192511

== Mysterious OOM behavior ==

If you see an error like this in the nova compute logs, and the instances go straight to ERROR state, it means that qemu failed to launch the instance. In my case it was due to insufficient memory, but this is not made at all obvious by nova:

==> /var/log/nova/compute.log <==
2012-08-02 16:18:18 TRACE nova.rpc.amqp libvirtError: Unable to read from monitor: Connection reset by peer

[root@heatlt heat]# tail -n2 /var/log/libvirt/qemu/instance-00000003.log
Failed to allocate 17179869184 B: Cannot allocate memory
2012-08-02 15:18:18.101+0000: shutting down
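The failed allocation in the qemu log above, 17179869184 B, is 16 GiB. The following sketch compares such a request against the memory the host reports as available. The function names are hypothetical, and it assumes the kernel exposes a MemAvailable line (in kB) in /proc/meminfo; if that line is missing, the sketch treats the available memory as 0.

```shell
# Hypothetical helper: convert a byte count into whole GiB.
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Hypothetical helper: print MemAvailable in bytes.
# $1 = meminfo file (defaults to /proc/meminfo); the value there is in kB.
mem_available_bytes() {
    kb=$(awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}")
    echo $(( ${kb:-0} * 1024 ))
}

# Hypothetical helper: succeed if a request of $1 bytes fits in available memory.
# $2 = optional meminfo file, passed through to mem_available_bytes.
fits_in_memory() {
    [ "$1" -le "$(mem_available_bytes "$2")" ]
}

# Usage:
#   bytes_to_gib 17179869184
#   fits_in_memory 17179869184 || echo "not enough memory for this instance"
```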

= Yum update fails with dependency problems related to the "oz" package =

If you built the git version of oz as described in the getting started guide, you may find that yum update will fail with dependency problems when the OS python packages are updated. This is because the locally built oz RPM needs updating to match the new python version.

The workaround for this problem is to remove the oz package, update, then rebuild the oz package against the updated python version:

sudo yum remove oz
sudo yum update

Rebuild OZ as detailed in the getting started guide:

cd ~/git/oz/
git pull
rm -f ~/rpmbuild/RPMS/noarch/oz-*
make rpm
sudo yum localinstall ~/rpmbuild/RPMS/noarch/oz-*.rpm

= qpidd fails to start =

As of qpid-cpp-server 0.16-5, the service scripts have been moved into the qpid-cpp-server-daemon package.

If you "yum update" from an earlier qpid-cpp-server version, starting openstack (via tools/openstack) will fail with an error like this:

[root@heatlt heat]# ./tools/openstack restart
Failed to issue method call: Unit qpidd.service failed to load: No such file or directory. See system logs and 'systemctl status qpidd.service' for details.

The fix is to install qpid-cpp-server-daemon and restart openstack:

yum install qpid-cpp-server-daemon
tools/openstack restart

= OpenStack daemons can't connect to qpidd =

The error looks like this:

2012-10-31 22:54:11   DEBUG [qpid.messaging.io.raw] OPEN[216d758]: localhost:5672
2012-10-31 22:54:11 WARNING [qpid.messaging] recoverable error[attempt 1]: [Errno -9] Address family for hostname not supported
2012-10-31 22:54:11 WARNING [qpid.messaging] sleeping 1 seconds

Edit /etc/hosts and comment out the "::1" entry. It seems the lo interface doesn't have an IPv6 address.
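To see whether /etc/hosts still contains an active "::1" entry, a check along these lines can help. This is only a sketch: the function name has_ipv6_loopback_entry is hypothetical, and it simply looks for an uncommented line that starts with "::1".

```shell
# Hypothetical helper: succeed if the hosts file has an uncommented ::1 entry.
# $1 = hosts file (defaults to /etc/hosts).
has_ipv6_loopback_entry() {
    grep -Eq '^[[:space:]]*::1([[:space:]]|$)' "${1:-/etc/hosts}"
}

# Usage:
#   has_ipv6_loopback_entry && echo "::1 is still active in /etc/hosts"
```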

= Ubuntu guests can't receive DHCP assignments from Fedora/RHEL hosts =

An iptables rule may be required for some guests to receive their DHCP assignments, according to "Checksum correction for older DHCP clients":

iptables -A POSTROUTING -t mangle -p udp --dport 68 -j CHECKSUM --checksum-fill