
Difference between revisions of "Heat/TroubleShooting"


Revision as of 16:17, 2 January 2013


Instances can't connect to the internet

If your instances can't connect to the internet, ensure you have the following nova configuration settings:


flat_interface = em1
public_interface = em1


In this example, the wired interface "em1" is used for an all-in-one OpenStack install on Fedora.

It may be necessary to adjust the interface name (e.g. to wlan0 or eth0, depending on your OS and network configuration).

These config file options are required. Without them, nova won't create the required iptables rules when the instance is created, so the instance won't be able to access the internet or other network resources.

Also ensure IP forwarding is enabled (make this persistent, e.g. via /etc/sysctl.conf):


echo 1 > /proc/sys/net/ipv4/ip_forward
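
The check and the persistent setting can be sketched as follows. This is only an illustration: the helper name `ip_forward_enabled` and its optional file argument are not part of any tool, while `net.ipv4.ip_forward` is the standard Linux sysctl key behind the /proc entry above.

```shell
# Return success if IPv4 forwarding is enabled.
# The optional argument (a path) is hypothetical and exists only so the
# check can be tried against a copy of the /proc entry.
ip_forward_enabled() {
    state_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$state_file" 2>/dev/null)" = "1" ]
}

# To persist the setting across reboots, add this line to /etc/sysctl.conf:
#   net.ipv4.ip_forward = 1
# and reload the file as root with:
#   sysctl -p
```

Running `ip_forward_enabled || echo "forwarding is off"` on the host shows whether the echo above still needs to be applied.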


OpenStack installation reports error of "unable to write random state"

If you run the openstack script as a non-root user (as it is designed to be run), ensure that ~/.rnd is owned by that user.
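
A minimal sketch of the ownership check, assuming GNU coreutils `stat` (as shipped on Fedora). The helper name `owned_by_me` is made up for this example.

```shell
# Return success if the given file is owned by the current user.
# Compares numeric user IDs, using GNU `stat -c %u`.
owned_by_me() {
    [ "$(stat -c %u "$1")" = "$(id -u)" ]
}

# Typical use; fix the ownership if the check fails:
#   owned_by_me ~/.rnd || sudo chown "$(id -un)" ~/.rnd
```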

jeos_create fails with a timeout error during customization

The developers have found that running oz many times will eventually wedge the libvirt network interface in some way. See libvirt bug [#813853](https://bugzilla.redhat.com/show_bug.cgi?id=813853). Until upstream fixes the bug, one workaround is to restart the libvirt default network:


virsh net-destroy default

virsh net-start default

If the above doesn't work, also check whether there are zombied dnsmasq processes that need to be cleaned up.

Note that, using virsh, you can log into the VM during oz customization with the credentials root / ozrootpw (unless a specific rootpw has been defined in the TDL).

I didn't set a parameter correctly in heat and now the template I ran can't be deleted.

Unfortunately, the error checking in current heat needs a bit of work. Because of a bug in heat, templates are stored in the database before they are executed. This makes sense conceptually, but it causes problems when there are exceptions on create. We will be fixing this bug shortly; in the meantime, it is necessary to drop the heat database and recreate it:


killall -9 heat-api

killall -9 heat-engine

tools/heat-db-drop 

/usr/bin/heat-db-setup-fedora


I get a vhost-net error when running jeos_create

An example of the failure we have seen:


sudo -E heat jeos_create F16 x86_64 cfntools
 
Creating JEOS image (F16-x86_64-cfntools) - this takes approximately 10 minutes.
 
ERROR: internal error process exited while connecting to monitor: qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net support is not compiled in
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net requested but could not be initialized
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: Device 'tap' could not be initialized
 
 
(use -d3 to get the full backtrace)
 
oz-install did not create the image, check your oz installation.


This is caused when virtualization is not enabled in the BIOS.
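
One quick way to see whether the host offers hardware virtualization is to look for the vmx (Intel) or svm (AMD) CPU flags. The sketch below uses a hypothetical helper name. The flag can be present even when the firmware has the feature switched off, so it is worth also checking that /dev/kvm exists.

```shell
# Return success if the CPU advertises hardware virtualization
# (vmx = Intel VT-x, svm = AMD-V).
# The optional path argument is only there for trying the check on a sample file.
cpu_has_virt_flags() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# A follow-up check once the kvm modules are loaded:
#   [ -c /dev/kvm ] || echo "/dev/kvm missing: check the BIOS virtualization setting"
```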

OZ takes 30 minutes to create a JEOS

OZ does take a while to run. Fortunately it only has to be run once. But if you're a developer, this may be irritating, especially as the JEOS image changes. To speed up OZ operation, it is safe to add some directives to the /etc/oz/oz.cfg file. Note that these directives increase disk usage on the system.

[cache]
original_media = yes
modified_media = yes
jeos = yes


You get "Quota exceeded: code=InstanceLimitExceeded (HTTP 413)"

First make sure there are no un-deleted resources:

nova list
nova volume-list

If that is not the problem, you might just need to increase your quota limits. To display your current quotas:

nova-manage project quota admin

To increase the number of instances:

nova-manage project quota admin --key=instances --value=100


Endpoint not found for heat

If you receive an error as follows:

[root@bigiron .openstack]# heat list
ERROR:Failed to list. Got error:
ERROR:Response from Keystone does not contain a Heat endpoint.


This error indicates a problem with the Keystone configuration. It can be caused by not running heat-keystone-create (for F16/F17), not running heat-keystone-create-devstack (for U12), or not having sourced the keystone credentials before running those two scripts.
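
To rule out the last cause, it can help to confirm that the credentials are actually present in the current shell before running the scripts. The sketch below is an assumption-laden illustration: the helper name is made up, and the variable names (OS_USERNAME, OS_PASSWORD, OS_TENANT_NAME, OS_AUTH_URL) are the ones commonly found in OpenStack credential files, which may differ from yours.

```shell
# Print the names of any unset or empty credential variables.
# Succeeds only if none are missing.
# The variable list is an assumption based on common OpenStack rc files.
missing_keystone_vars() {
    missing=""
    for name in OS_USERNAME OS_PASSWORD OS_TENANT_NAME OS_AUTH_URL; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}
```

If the function reports missing variables, source the keystone credentials file again and re-run the relevant heat-keystone-create script.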

Non-specific error with backtrace from heat-engine

If you receive the error:

[root@bigiron tools]# heat list
ERROR:Failed to list. Got error:
ERROR:Internal Server error: Internal Server Error
ERROR:


with a backtrace that looks like:


  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 281, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
OperationalError: (OperationalError) (1045, "Access denied for user 'heat'@'localhost' (using password: YES)") None None
.
----------------------------------------

This problem indicates the heat-db-setup script was not run.

Malformed query response KeyName not registered

If after creating a template with heat create, you receive the following error:

DEBUG:Debug level logging enabled
<CreateStackResult>
  <ValidateTemplateResult>
    <Description>Malformed Query Response {'Error': 'Provided KeyName is not registered with nova'}</Description>
    <Parameters/>
  </ValidateTemplateResult>
</CreateStackResult>

This problem indicates the SSH key specified in the create command was not registered with nova. Have a look at the quickstart guide for registration instructions.

I edited a template and it now doesn't work

It's easy to introduce JSON syntax errors when editing templates, so this can be useful to identify what/where is broken:


cat foo.template | python -m json.tool
Expecting , delimiter: line 107 column 20 (char 4579)
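
For scripted use, the same check can be wrapped in a small function. This is a sketch that assumes a `python3` interpreter on the PATH (the example above uses `python`); the function name is hypothetical.

```shell
# Return success if the file parses as JSON.
# On failure, the parser's error message (with line and column) goes to stderr.
validate_template() {
    python3 -m json.tool "$1" > /dev/null
}

# Typical use:
#   validate_template foo.template && echo "foo.template is valid JSON"
```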


Nova starts creating instances which immediately go to ERROR state

Scheduler problem & workaround

If you suddenly find instances aren't being created and the nova list output indicates ERROR state, check the scheduler log:


==> /var/log/nova/scheduler.log <==
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Failed to schedule_run_instance: No valid host was found. 
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Setting instance 18165ff9-25ae-4d01-8761-f414c86a0a64 to ERROR state.


The workaround seems to be to add "scheduler_default_filters=AllHostsFilter" to /etc/nova/nova.conf. See: https://answers.launchpad.net/nova/+question/192511
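
Before editing the file, it may be useful to see whether the option is already set. A minimal sketch with a hypothetical helper name; it only reads the file and prints the configured value, if any.

```shell
# Print the value of scheduler_default_filters from a nova.conf-style file.
# Prints nothing if the option is not set.
# The path argument defaults to /etc/nova/nova.conf.
get_scheduler_filters() {
    sed -n 's/^scheduler_default_filters[[:space:]]*=[[:space:]]*//p' \
        "${1:-/etc/nova/nova.conf}"
}
```

Keep in mind that AllHostsFilter disables host filtering, so the scheduler may place instances on hosts that lack capacity for them.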

Mysterious OOM behavior

If you see an error like this in the nova compute logs, and the instances go straight to ERROR state, it means that qemu failed to launch the instance. In my case it was due to insufficient memory, but this is not made at all obvious by nova:


    ==> /var/log/nova/compute.log <==
    2012-08-02 16:18:18 TRACE nova.rpc.amqp libvirtError: Unable to read from monitor: Connection reset by peer


[root@heatlt heat]# tail -n2 /var/log/libvirt/qemu/instance-00000003.log 
Failed to allocate 17179869184 B: Cannot allocate memory
2012-08-02 15:18:18.101+0000: shutting down
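
The allocation size in the qemu log above, 17179869184 bytes, is 16 GiB, which is worth comparing with the memory the host actually has. A small sketch of that comparison, with hypothetical helper names:

```shell
# Convert a byte count to whole GiB using integer shell arithmetic.
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Print total host memory in whole GiB, read from a /proc/meminfo-style file
# (the MemTotal value is reported in kB).
host_mem_gib() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}
```

If `bytes_to_gib 17179869184` is larger than `host_mem_gib`, the instance flavor is asking for more memory than the host can provide.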


Yum update fails with dependency problems related to the "oz" package

If you built the git version of oz as described in the getting started guide, you may find that yum update will fail with dependency problems when the OS python packages are updated. This is because the locally built oz RPM needs updating to match the new python version.

The workaround is to remove the oz package, update, then rebuild the oz package against the updated python version:


sudo yum remove oz
sudo yum update

# rebuild OZ as detailed in the getting started guide
cd ~/git/oz/
git pull
rm -f ~/rpmbuild/RPMS/noarch/oz-*
make rpm

sudo yum localinstall ~/rpmbuild/RPMS/noarch/oz-*.rpm


qpidd fails to start

As of qpid-cpp-server 0.16-5, the service scripts have been moved into the qpid-cpp-server-daemon package.

If you "yum update" from an earlier qpid-cpp-server version, starting openstack (via tools/openstack) will fail with an error like this:


[root@heatlt heat]# ./tools/openstack restart
Failed to issue method call: Unit qpidd.service failed to load: No such file or directory. See system logs and 'systemctl status qpidd.service' for details.


The fix is to install qpid-cpp-server-daemon and restart openstack:


yum install qpid-cpp-server-daemon
tools/openstack restart


Openstack daemons can't connect to qpidd

The logs show an error like this:

2012-10-31 22:54:11    DEBUG [qpid.messaging.io.raw] OPEN[216d758]: localhost:5672
2012-10-31 22:54:11  WARNING [qpid.messaging] recoverable error[attempt 1]: [Errno -9] Address family for hostname not supported
2012-10-31 22:54:11  WARNING [qpid.messaging] sleeping 1 seconds

Edit /etc/hosts and comment out the "::1" entry. It seems the lo interface doesn't have an IPv6 address.
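
The edit can be previewed with sed before touching the real file. This is only a sketch: the helper name is hypothetical, and it writes the modified content to stdout rather than changing /etc/hosts in place.

```shell
# Print the given hosts file with any line starting with "::1" commented out.
# Output goes to stdout so the result can be reviewed before replacing /etc/hosts.
comment_out_ipv6_loopback() {
    sed 's/^::1/#::1/' "${1:-/etc/hosts}"
}
```

Once the output looks right, save it to a temporary file and copy it over /etc/hosts as root, keeping a backup of the original.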