Heat/TroubleShooting
<<TableOfContents()>>
Instances can't connect to the internet
If your instances can't connect to the internet, ensure you have the following nova configuration settings:
flat_interface = em1
public_interface = em1
This example is from an all-in-one OpenStack install on Fedora that uses the wired interface em1.
It may be necessary to adjust the interface name (e.g. to wlan0 or eth0), depending on your OS and network configuration.
These config file options are required. Without them, nova won't create the required iptables rules for the instance when it is created, so the instance won't be able to access the internet or other network resources.
Also ensure IP forwarding is enabled (make this persistent, e.g. via /etc/sysctl.conf):
echo 1 > /proc/sys/net/ipv4/ip_forward
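The echo above only lasts until the next reboot. One way to make it persistent is a net.ipv4.ip_forward = 1 line in /etc/sysctl.conf. The following is a minimal Python sketch of that edit, working on the text of the file only. The function name ensure_ip_forward is hypothetical (it is not part of nova or heat), and reading and writing the real file as root is left out on purpose.

```python
import re

# Matches an active (uncommented) net.ipv4.ip_forward line.
_FORWARD_LINE = re.compile(
    r"^[ \t]*net\.ipv4\.ip_forward[ \t]*=.*$", re.MULTILINE
)
SETTING = "net.ipv4.ip_forward = 1"


def ensure_ip_forward(conf_text):
    """Return sysctl.conf text in which IPv4 forwarding is enabled."""
    if _FORWARD_LINE.search(conf_text):
        # An active setting exists: rewrite it in place.
        return _FORWARD_LINE.sub(SETTING, conf_text)
    # No active setting: append one, keeping the file newline-terminated.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + SETTING + "\n"


if __name__ == "__main__":
    sample = "# example sysctl.conf\nnet.ipv4.ip_forward = 0\n"
    print(ensure_ip_forward(sample), end="")
```

After the file has been updated, running sysctl -p as root reloads the settings without a reboot.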
OpenStack installation reports error of "unable to write random state"
If you run the openstack script as a non-root user (as it is designed to be run), ensure that ~/.rnd is owned by that user.
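A quick way to check the ownership is sketched below in Python. The helper name rnd_owned_by_current_user is hypothetical and the sketch assumes a Unix-like system where os.getuid is available. It only reports the state and changes nothing.

```python
import os


def rnd_owned_by_current_user(path=None):
    """Return True/False for ownership, or None if the file does not exist."""
    if path is None:
        path = os.path.expanduser("~/.rnd")
    if not os.path.exists(path):
        return None
    # Compare the file's owner with the user running this script.
    return os.stat(path).st_uid == os.getuid()


if __name__ == "__main__":
    print(rnd_owned_by_current_user())
```

If this prints False, changing the owner with chown (for example sudo chown $USER ~/.rnd) should resolve the error.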
jeos_create fails with a timeout error during customization
The developers have found that running oz repeatedly will eventually wedge the libvirt network interface. See libvirt bug [#813853](https://bugzilla.redhat.com/show_bug.cgi?id=813853). One workaround while upstream fixes the bug is to restart the network interface for libvirt:
virsh net-destroy default
virsh net-start default
If that doesn't work, also check whether there are zombie dnsmasq processes that need to be cleaned up.
Note that, using virsh, you can log into the VM during oz customization with the credentials root / ozrootpw (unless a specific rootpw has been defined in the TDL).
I didn't set a parameter correctly in heat and now the template I ran can't be deleted.
Unfortunately, error checking in the current heat needs a bit of work. Heat stores templates in the database before they are executed. This makes sense conceptually, but because of a bug it causes problems when there are exceptions on create. We will be fixing this bug shortly, but in the meantime it is necessary to drop the heat database and recreate it:
killall -9 heat-api
killall -9 heat-engine
tools/heat-db-drop
/usr/bin/heat-db-setup-fedora
I get a vhost-net error when running jeos_create
An example of the failure we have seen:
sudo -E heat jeos_create F16 x86_64 cfntools
Creating JEOS image (F16-x86_64-cfntools) - this takes approximately 10 minutes.
ERROR: internal error process exited while connecting to monitor: qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net support is not compiled in
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net requested but could not be initialized
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: Device 'tap' could not be initialized
(use -d3 to get the full backtrace)
oz-install did not create the image, check your oz installation.
This is caused when virtualization is not enabled in the BIOS.
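As a first check, you can look for the hardware virtualization CPU flags (vmx on Intel, svm on AMD) in /proc/cpuinfo. The Python sketch below does this. The function name virtualization_flags is hypothetical, and the result is only a hint: a missing flag strongly suggests virtualization is unavailable or switched off, but on some kernels the flag may still be listed when the BIOS has disabled the feature, so also check whether /dev/kvm exists.

```python
def virtualization_flags(cpuinfo_text):
    """Return the subset of {'vmx', 'svm'} present in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            # Each "flags" line holds a space-separated list of CPU features.
            found |= {"vmx", "svm"} & set(value.split())
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = virtualization_flags(handle.read())
        print(flags or "no vmx/svm flag found")
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```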
OZ takes 30 minutes to create a JEOS
OZ does take a while to run. Fortunately, it only has to be run once. If you're a developer, though, this may be irritating, especially as the JEOS image changes. To speed up OZ, it is safe to add the following directives to /etc/oz.cfg. Note that these directives increase the disk space used on the system.
[cache]
original_media = yes
modified_media = yes
jeos = yes
You get "Quota exceeded: code=InstanceLimitExceeded (HTTP 413)"
First make sure there are no un-deleted resources:
nova list
nova volume-list
If that is not the problem, you may just need to increase your quota limits. To display your current quotas:
nova-manage project quota admin
To increase the number of instances:
nova-manage project quota admin --key=instances --value=100
Endpoint not found for heat
If you receive the following error:
[root@bigiron .openstack]# heat list
ERROR:Failed to list. Got error:
ERROR:Response from Keystone does not contain a Heat endpoint.
This error indicates a problem with the Keystone configuration. It can be caused by not running heat-keystone-create (for F16/F17), not running heat-keystone-create-devstack (for U12), or not having sourced the keystone credentials before running those two scripts.
Non-specific error with backtrace from heat-engine
If you receive the error:
[root@bigiron tools]# heat list
ERROR:Failed to list. Got error:
ERROR:Internal Server error: Internal Server Error
ERROR:
With a backtrace that looks like:
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 281, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (OperationalError) (1045, "Access denied for user 'heat'@'localhost' (using password: YES)") None None
. ----------------------------------------
This problem indicates the heat-db-setup script was not run.
Malformed query response KeyName not registered
If, after creating a template with heat create, you receive the following error:
DEBUG:Debug level logging enabled
<CreateStackResult>
  <ValidateTemplateResult>
    <Description>Malformed Query Response {'Error': 'Provided KeyName is not registered with nova'}</Description>
    <Parameters/>
  </ValidateTemplateResult>
</CreateStackResult>
This problem indicates the SSH key specified in the create command was not registered with nova. Have a look at the quickstart guide for registration instructions.
I edited a template and it now doesn't work
It's easy to introduce JSON syntax errors when editing templates. The following command helps identify what is broken and where:
cat foo.template | python -m json.tool
Expecting , delimiter: line 107 column 20 (char 4579)
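The same check can be scripted with the standard library json module, which reports the line and column of the first syntax error. This is a minimal sketch assuming Python 3.5 or later (where json.JSONDecodeError exists). The function name find_json_error and the sample template text are illustrative only.

```python
import json


def find_json_error(text):
    """Return (line, column, message) for the first syntax error, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return (err.lineno, err.colno, err.msg)
    return None


if __name__ == "__main__":
    # A comma is missing after the "Parameters" entry.
    broken = '{\n  "Parameters": {}\n  "Resources": {}\n}'
    print(find_json_error(broken))
```

To check a real template, read the file contents and pass them in, for example find_json_error(open("foo.template").read()).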
Nova starts creating instances which immediately go to ERROR state
Scheduler problem & workaround
If you suddenly find instances aren't being created and the nova list output indicates ERROR state, check the scheduler log:
==> /var/log/nova/scheduler.log <==
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Failed to schedule_run_instance: No valid host was found.
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Setting instance 18165ff9-25ae-4d01-8761-f414c86a0a64 to ERROR state.
The workaround seems to be to add "scheduler_default_filters=AllHostsFilter" to /etc/nova/nova.conf.
See: https://answers.launchpad.net/nova/+question/192511
Mysterious OOM behavior
If you see an error like this in the nova compute logs, and the instances go straight to ERROR state, it means that qemu failed to launch the instance. In my case it was due to insufficient memory, but this is not made at all obvious by nova:
==> /var/log/nova/compute.log <==
2012-08-02 16:18:18 TRACE nova.rpc.amqp libvirtError: Unable to read from monitor: Connection reset by peer
[root@heatlt heat]# tail -n2 /var/log/libvirt/qemu/instance-00000003.log
Failed to allocate 17179869184 B: Cannot allocate memory
2012-08-02 15:18:18.101+0000: shutting down
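The qemu log reports the failed allocation in bytes, which is hard to read at a glance. The short Python sketch below extracts the number and converts it to GiB so it can be compared with the memory actually free on the host (for example as shown by free -m). The function name failed_allocation_gib is hypothetical.

```python
import re


def failed_allocation_gib(log_text):
    """Return the size in GiB of the first failed qemu allocation, or None."""
    match = re.search(r"Failed to allocate (\d+) B", log_text)
    if match is None:
        return None
    return int(match.group(1)) / 2**30


if __name__ == "__main__":
    line = "Failed to allocate 17179869184 B: Cannot allocate memory"
    print(failed_allocation_gib(line))  # 17179869184 / 2**30 = 16.0
```

In the log above, the instance asked for 16 GiB, which points to a flavor that is too large for the host.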
yum update fails with oz dependency problems
If you built the git version of oz as described in the getting started guide, you may find that yum update fails with dependency problems when the OS python packages are updated. This is because the locally built oz RPM needs to be rebuilt to match the new python version.
The workaround is to remove the oz package, update, then rebuild the oz package against the updated python version:
sudo yum remove oz
sudo yum update
# rebuild OZ as detailed in the getting started guide
cd ~/git/oz/
git pull
rm -f ~/rpmbuild/RPMS/noarch/oz-*
make rpm
sudo yum localinstall ~/rpmbuild/RPMS/noarch/oz-*.rpm
qpidd fails to start
As of qpid-cpp-server 0.16-5, the service scripts have been moved into the qpid-cpp-server-daemon package.
If you "yum update" from an earlier qpid-cpp-server version, starting openstack (via tools/openstack) will fail with an error like this:
[root@heatlt heat]# ./tools/openstack restart
Failed to issue method call: Unit qpidd.service failed to load: No such file or directory. See system logs and 'systemctl status qpidd.service' for details.
The fix is to install qpid-cpp-server-daemon and restart openstack:
yum install qpid-cpp-server-daemon
tools/openstack restart
Openstack daemons can't connect to qpidd
The error looks like this:
2012-10-31 22:54:11 DEBUG [qpid.messaging.io.raw] OPEN[216d758]: localhost:5672
2012-10-31 22:54:11 WARNING [qpid.messaging] recoverable error[attempt 1]: [Errno -9] Address family for hostname not supported
2012-10-31 22:54:11 WARNING [qpid.messaging] sleeping 1 seconds
To fix this, edit /etc/hosts and comment out the "::1" entry. It seems the lo interface doesn't have an IPv6 address.
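The /etc/hosts change can be sketched in Python as a small text transformation. The function name comment_out_ipv6_loopback is hypothetical, the sample hosts content is made up for illustration, and writing the result back to /etc/hosts (as root, after taking a backup) is left to the reader.

```python
def comment_out_ipv6_loopback(hosts_text):
    """Return /etc/hosts text with active '::1' entries commented out."""
    lines = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Only touch active entries whose address field is exactly ::1.
        if fields and fields[0] == "::1":
            line = "#" + line
        lines.append(line)
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    sample = "127.0.0.1   localhost\n::1         localhost\n"
    print(comment_out_ipv6_loopback(sample), end="")
```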