Heat/TroubleShooting

= Instances can't connect to the internet =
If your instances can't connect to the internet, ensure your nova configuration (/etc/nova/nova.conf) contains the following settings:
flat_interface = em1
public_interface = em1
In this example, the "em1" interface is used for an all-in-one OpenStack install on Fedora, using the wired interface em1.
You may need to adjust the interface name (e.g. to wlan0 or eth0, depending on your OS and network configuration).
These options are required: without them nova will not create the iptables rules the instance needs when it is created, so the instance cannot reach the internet or other network resources.
Also ensure IP forwarding is enabled, and make the setting persistent (e.g. via /etc/sysctl.conf). As root:
echo 1 > /proc/sys/net/ipv4/ip_forward
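The `echo` above only lasts until the next reboot. A minimal sketch of the persistent step, assuming the standard sysctl key `net.ipv4.ip_forward` in /etc/sysctl.conf; the function name `ensure_ip_forward` is hypothetical, and the file path is a parameter so the helper can be tried on a scratch copy first:

```shell
# Hypothetical helper: make IP forwarding persistent by appending
# "net.ipv4.ip_forward = 1" to a sysctl config file (default /etc/sysctl.conf)
# unless an enabling line is already there. Safe to run repeatedly.
ensure_ip_forward() {
    conf=${1:-/etc/sysctl.conf}
    # Already enabled: nothing to do.
    if grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf" 2>/dev/null; then
        return 0
    fi
    # Start on a fresh line if the file does not end with a newline.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    # sysctl -p applies entries in order, so this line overrides an earlier
    # "net.ipv4.ip_forward = 0" in the same file.
    printf '%s\n' 'net.ipv4.ip_forward = 1' >> "$conf"
}
```

Run it as root against /etc/sysctl.conf, then load the file with `sysctl -p`.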
= [[OpenStack]] installation reports the error "unable to write random state" =
If you run the openstack script as a non-root user (as it is designed to be run), ensure that ~/.rnd is owned by that user.
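A quick way to check this before running the script is sketched below; the function name `check_rnd_owner` is hypothetical, and it simply uses the shell's `-O` test (file owned by the current user):

```shell
# Hypothetical check: succeed if the random-state file is absent or owned by
# the current user; otherwise print a suggested chown and fail.
check_rnd_owner() {
    rnd=${1:-$HOME/.rnd}
    if [ -e "$rnd" ] && [ ! -O "$rnd" ]; then
        echo "$rnd is not owned by $(id -un); try: sudo chown $(id -un) $rnd" >&2
        return 1
    fi
    return 0
}
```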
= jeos_create fails with a timeout error during customization =
The developers have found that running oz many times will eventually wedge the libvirt network interface.  Oz needs a working network to update the VM with cloud-init and various other updates.  See libvirt bug [https://bugzilla.redhat.com/show_bug.cgi?id=813853 #813853].  Until upstream fixes the bug, one workaround is to restart the libvirt network:
sudo virsh net-destroy default
sudo virsh net-start default
If that doesn't work, also check whether there are zombie dnsmasq processes that need to be cleaned up.
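A minimal sketch for spotting them: the hypothetical filter `find_zombie_dnsmasq` reads `ps` output in the form `pid ppid state command` and prints the PID and parent PID of every dnsmasq process in state Z (defunct). A defunct process cannot be killed directly; it disappears once its parent reaps it, so the parent PID is the one to investigate.

```shell
# Hypothetical filter: read "pid ppid stat comm" lines on stdin and print
# "pid ppid" for each dnsmasq process whose state starts with Z (defunct).
find_zombie_dnsmasq() {
    awk '$3 ~ /^Z/ && $4 == "dnsmasq" { print $1, $2 }'
}
```

Usage: `ps -eo pid=,ppid=,stat=,comm= | find_zombie_dnsmasq` (the trailing `=` on each column suppresses the header line).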
Note that during oz customization you can log into the VM using virsh, with the credentials root / ozrootpw (unless a specific rootpw has been defined in the TDL).
= I didn't set a parameter correctly in heat and now the template I ran can't be deleted. =
Unfortunately, error checking in the current heat still needs work.  Because of a bug in heat, templates are stored in the database before they are executed.  This makes sense conceptually, but it causes problems when an exception is raised during create.  Until this bug is fixed, it is necessary to drop the heat database and recreate it:
killall -9 heat-api
killall -9 heat-engine
tools/heat-db-drop -r <mysql root password>
heat-manage db_sync
As an alternative to dropping the whole database, you can try marking as COMPLETE the resources that 'heat resource-list <stackname>' shows as DELETE_FAILED, then retry deleting the stack. The SELECT below shows the name, physical resource ID and UUID of each failed resource, so that you can delete the underlying resource manually.
On the MySQL host, do:
# Connect to the heat DB
mysql -uroot -p<mysql root password> heat
-- At the mysql> prompt, do:
SELECT s.name AS s_name, r.name AS resource_name, r.nova_instance AS physical_resource_id, r.uuid AS r_uuid, r.action AS r_action, r.status AS r_status FROM resource AS r JOIN stack AS s ON r.stack_id = s.id WHERE s.action='DELETE' AND s.status='FAILED' AND r.action='DELETE' AND r.status='FAILED';
UPDATE resource set status='COMPLETE' where action='DELETE' and status='FAILED';
-- Verify that the number of updated rows matches the number of DELETE_FAILED resources shown by 'heat resource-list <stackname>', then exit mysql.
# Retry deleting the problematic stack:
heat resource-list <stackname>
heat stack-delete <stackname>
= I get a vhost-net error when running jeos_create =
An example of the failure we have seen:
sudo -E heat jeos_create F16 x86_64 cfntools
Creating JEOS image (F16-x86_64-cfntools) - this takes approximately 10 minutes.
ERROR: internal error process exited while connecting to monitor: qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net support is not compiled in
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net requested but could not be initialized
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: Device 'tap' could not be initialized
(use -d3 to get the full backtrace)
oz-install did not create the image, check your oz installation.
This error occurs when hardware virtualization is not enabled in the BIOS.
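Two quick checks can help confirm this diagnosis; the function names are hypothetical. The CPU flags vmx (Intel VT-x) or svm (AMD-V) show whether the processor supports hardware virtualization, and /dev/kvm only exists once the kvm kernel module has loaded, which it will not do while virtualization is disabled in the firmware. On some kernels the vmx/svm flags are listed even when the feature is disabled in the BIOS, so the /dev/kvm check is the more telling of the two.

```shell
# Hypothetical check: does the CPU advertise vmx (Intel VT-x) or svm (AMD-V)?
# The cpuinfo path is a parameter so the check can be exercised on a sample.
has_virt_flags() {
    grep -qwE 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# Hypothetical check: /dev/kvm is a character device that only appears after
# the kvm module loads, which fails when virtualization is disabled in the BIOS.
has_kvm_device() {
    [ -c "${1:-/dev/kvm}" ]
}
```

For example, `has_kvm_device || echo "/dev/kvm missing: check the BIOS virtualization setting"`.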
= OZ takes 30 minutes to create a JEOS =
Oz does take a while to run.  Fortunately it only has to be run once, but for developers this can be irritating, especially as the JEOS image changes.  To speed up Oz, it is safe to enable its caches by adding the following directives to the [cache] section of /etc/oz.cfg.  Note that caching increases disk usage.
original_media = yes
modified_media = yes
jeos = yes
= You get "Quota exceeded: code=InstanceLimitExceeded (HTTP 413)" =
First make sure there are no leftover resources that should have been deleted:
nova list
nova volume-list
If that is not the problem, you may need to increase your quota limits.
To display your current quotas:
nova-manage project quota admin
To increase the number of instances:
nova-manage project quota admin --key=instances --value=100
= Endpoint not found for heat =
If you receive the following error:
[root@bigiron .openstack]# heat list
ERROR:Failed to list. Got error:
ERROR:Response from Keystone does not contain a Heat endpoint.
This indicates a problem with the Keystone configuration.  It can be caused by not running heat-keystone-create (for F16/F17), not running heat-keystone-create-devstack (for U12), or not having sourced the keystone credentials before running either script.
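Before re-running those scripts, it can help to confirm that the credentials really are in the environment. A minimal sketch, assuming the usual keystone client variables OS_USERNAME, OS_PASSWORD, OS_TENANT_NAME and OS_AUTH_URL (check your credentials file for the exact names it exports); the function name `check_keystone_env` is hypothetical:

```shell
# Hypothetical check: report any of the assumed keystone credential variables
# that are not exported in the current shell. Returns non-zero if any are missing.
check_keystone_env() {
    missing=0
    for var in OS_USERNAME OS_PASSWORD OS_TENANT_NAME OS_AUTH_URL; do
        # printenv exits non-zero when the variable is not in the environment.
        if ! printenv "$var" >/dev/null; then
            echo "not set: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```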
= Non-specific error with backtrace from heat-engine =
If you receive the error:
[root@bigiron tools]# heat list
ERROR:Failed to list. Got error:
ERROR:Internal Server error: Internal Server Error
with a backtrace from heat-engine that looks like:
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 281, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (OperationalError) (1045, "Access denied for user 'heat'@'localhost' (using password: YES)") None None
This problem indicates the heat-db-setup script was not run.
= Malformed query response: KeyName not registered =
If, after creating a stack from a template with heat create, you receive the following error:
DEBUG:Debug level logging enabled
    <Description>Malformed Query Response {'Error': 'Provided KeyName is not registered with nova'}</Description>
This problem indicates the SSH key specified in the create command was not registered with nova.  Have a look at the quickstart guide for registration instructions.
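A minimal sketch of a check before running heat create, assuming `nova keypair-list` prints its usual ASCII table with the key name in the first column; the helper `keypair_registered` and the key name heat_key are hypothetical:

```shell
# Hypothetical filter: read "nova keypair-list" table output on stdin and
# succeed if a keypair with the given name appears in the first column.
keypair_registered() {
    awk -F'|' -v want="$1" '
        NF > 2 {
            name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the cell padding
            if (name == want) found = 1
        }
        END { exit !found }'
}
```

Usage: `nova keypair-list | keypair_registered heat_key || nova keypair-add --pub-key ~/.ssh/id_rsa.pub heat_key`.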
= I edited a template and it now doesn't work =
It's easy to introduce JSON syntax errors when editing templates; Python's json.tool module reports where the syntax breaks:
cat foo.template | python -m json.tool
Expecting , delimiter: line 107 column 20 (char 4579)
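When several templates have been edited, a small loop can check them all in one go. A sketch, assuming a python3 interpreter on the PATH (`python3 -m json.tool FILE` exits non-zero and prints the parse error when the file is not valid JSON); the function name `check_templates` is hypothetical:

```shell
# Hypothetical helper: validate each template file given as an argument with
# python3's json.tool module. Prints the name of every file that fails to
# parse and returns non-zero if any did.
check_templates() {
    status=0
    for template in "$@"; do
        if ! python3 -m json.tool "$template" >/dev/null; then
            echo "invalid JSON: $template" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_templates *.template`.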
= Nova starts creating instances which immediately go to ERROR state =
== Scheduler problem & workaround ==
If you suddenly find instances aren't being created and the nova list output indicates ERROR state, check the scheduler log:
==> /var/log/nova/scheduler.log <==
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Failed to schedule_run_instance: No valid host was found.
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Setting instance 18165ff9-25ae-4d01-8761-f414c86a0a64 to ERROR state.
A workaround appears to be adding "scheduler_default_filters=AllHostsFilter" to /etc/nova/nova.conf. AllHostsFilter passes every host, so this effectively disables scheduler filtering.
See: https://answers.launchpad.net/nova/+question/192511
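To see which instances the scheduler has put into ERROR state, the "Setting instance ... to ERROR state" lines can be pulled out of the log. A minimal sketch; the function name `errored_instances` is hypothetical and the pattern assumes the message format shown in the scheduler log excerpt above:

```shell
# Hypothetical helper: print the UUID of every instance the scheduler moved to
# ERROR state, assuming the message format from the scheduler log excerpt.
errored_instances() {
    sed -n 's/.*Setting instance \([0-9a-f-]*\) to ERROR state.*/\1/p' \
        "${1:-/var/log/nova/scheduler.log}"
}
```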
== Mysterious OOM behavior ==
If you see an error like this in the nova compute log, and instances go straight to ERROR state, qemu failed to launch the instance.  In my case the cause was insufficient memory, but nova does not make this at all obvious; the instance's log under /var/log/libvirt/qemu/ shows the real error:
    ==> /var/log/nova/compute.log <==
    2012-08-02 16:18:18 TRACE nova.rpc.amqp libvirtError: Unable to read from monitor: Connection reset by peer
[root@heatlt heat]# tail -n2 /var/log/libvirt/qemu/instance-00000003.log
Failed to allocate 17179869184 B: Cannot allocate memory
2012-08-02 15:18:18.101+0000: shutting down
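The failed allocation in the qemu log is given in bytes (17179869184 B is 16 GiB). A minimal sketch for comparing it with the memory the host has available; both function names are hypothetical, and `available_mib` relies on the MemAvailable field of /proc/meminfo, which needs Linux 3.14 or later:

```shell
# Hypothetical helper: print, in MiB, each allocation that qemu reported as
# failed in an instance log ("Failed to allocate <bytes> B: ...").
failed_alloc_mib() {
    sed -n 's/^Failed to allocate \([0-9]*\) B.*/\1/p' "$1" |
    while read -r bytes; do
        echo $(( bytes / 1048576 ))
    done
}

# Hypothetical helper: print the kernel's MemAvailable estimate in MiB.
available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}
```

For example, `failed_alloc_mib /var/log/libvirt/qemu/instance-00000003.log` prints 16384 for the log above; if that exceeds the output of `available_mib`, the instance cannot start on this host.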
= Yum update fails with dependency problems related to the "oz" package =
If you built the git version of oz as described in the getting started guide, you may find that yum update will fail with dependency problems when the OS python packages are updated.  This is because the locally built oz RPM needs updating to match the new python version.
The workaround is to remove the oz package, update, then rebuild the oz package against the updated python version:
sudo yum remove oz
sudo yum update
# rebuild OZ as detailed in the getting started guide
cd ~/git/oz/
git pull
rm -f ~/rpmbuild/RPMS/noarch/oz-*
make rpm
sudo yum localinstall ~/rpmbuild/RPMS/noarch/oz-*.rpm
= qpidd fails to start =
As of qpid-cpp-server 0.16-5, the service scripts have been moved into the qpid-cpp-server-daemon package.
If you "yum update" from an earlier qpid-cpp-server version, starting openstack (via tools/openstack) will fail with an error like this:
[root@heatlt heat]# ./tools/openstack restart
Failed to issue method call: Unit qpidd.service failed to load: No such file or directory. See system logs and 'systemctl status qpidd.service' for details.
The fix is to install qpid-cpp-server-daemon and restart openstack:
yum install qpid-cpp-server-daemon
tools/openstack restart
= OpenStack daemons can't connect to qpidd =
If the daemon logs show errors like the following:
2012-10-31 22:54:11    DEBUG [qpid.messaging.io.raw] OPEN[216d758]: localhost:5672
2012-10-31 22:54:11  WARNING [qpid.messaging] recoverable error[attempt 1]: [Errno -9] Address family for hostname not supported
2012-10-31 22:54:11  WARNING [qpid.messaging] sleeping 1 seconds
The lo interface appears to have no IPv6 address, so connecting to localhost via "::1" fails. The workaround is to edit /etc/hosts and comment out the "::1" entry.
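A minimal sketch of that /etc/hosts edit, which keeps a backup copy with a .bak suffix; the function name `comment_out_ipv6_localhost` is hypothetical. Running `ip -6 addr show dev lo` first confirms whether lo really lacks an IPv6 address.

```shell
# Hypothetical helper: comment out every ::1 entry in a hosts file, keeping a
# backup of the original as <file>.bak (GNU sed -i syntax).
comment_out_ipv6_localhost() {
    hosts_file=${1:-/etc/hosts}
    sed -i.bak 's/^[[:space:]]*::1[[:space:]]/#&/' "$hosts_file"
}
```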
= Ubuntu guests can't receive dhcp assignments from Fedora/RHEL hosts =
Some guests may need an iptables rule to receive their DHCP assignments, according to
[https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/ch11s02.html#id3067547 Checksum correction for older DHCP clients]:
iptables -A POSTROUTING -t mangle -p udp --dport 68 -j CHECKSUM --checksum-fill

Latest revision as of 17:31, 29 May 2019