
Heat/TroubleShooting

__NOTOC__
 
 
= Instances can't connect to the internet =
 
  
If your instances can't connect to the internet, make sure you have the following settings in /etc/nova/nova.conf:
 
 
 
<pre><nowiki>
 
flat_interface = em1
 
public_interface = em1
 
</nowiki></pre>
 
 
 
In this example, em1 is the wired interface of an all-in-one OpenStack install on Fedora. Adjust the interface name to match your OS and network configuration (e.g. eth0 or wlan0).


These options are required; without them, nova won't create the iptables rules the instance needs when the instance is created.


On Folsom, it also seems to be necessary to add some iptables rules so that traffic is correctly forwarded back to the instance. Previously, I think this was only required for EIP (Elastic IP) addresses to work.
 
 
Also ensure IP forwarding is enabled. This command enables it until the next reboot (to make it persistent, set net.ipv4.ip_forward = 1 in /etc/sysctl.conf):
 
 
 
<pre><nowiki>
 
echo 1 > /proc/sys/net/ipv4/ip_forward
 
</nowiki></pre>
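The nova settings and the forwarding setting can be checked and persisted with two small shell functions. This is a sketch under assumptions: check_nova_interfaces and persist_ip_forward are hypothetical helper names, the default paths are the usual locations but may differ on your install, and sed -i assumes GNU sed.

```shell
# Sketch: check the nova interface options and make IP forwarding persistent.
# Paths default to the usual locations; pass another path to try a copy first.

# Report any required interface option missing from nova.conf.
check_nova_interfaces() {
    conf="${1:-/etc/nova/nova.conf}"
    rc=0
    for key in flat_interface public_interface; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
            echo "missing in $conf: $key"
            rc=1
        fi
    done
    return $rc
}

# Set net.ipv4.ip_forward = 1 in a sysctl config file, replacing any
# existing value (e.g. "= 0") or appending the line if it is absent.
persist_ip_forward() {
    conf="${1:-/etc/sysctl.conf}"
    if grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=' "$conf"; then
        sed -i -E 's/^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=.*/net.ipv4.ip_forward = 1/' "$conf"
    else
        echo 'net.ipv4.ip_forward = 1' >> "$conf"
    fi
}
```

Run as root, e.g. check_nova_interfaces && persist_ip_forward && sysctl -p (sysctl -p reloads /etc/sysctl.conf, so the change applies without a reboot).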
 
 
 
= [[OpenStack]] installation reports the error "unable to write random state" =
 
 
The openstack script is designed to be run as a non-root user. If you run it that way, make sure ~/.rnd is owned by that user.
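OpenSSL keeps its random state in ~/.rnd, and this error appears when it cannot write that file. A quick ownership check, as a hedged sketch: check_rnd_owner is a hypothetical helper, and stat -c assumes GNU coreutils.

```shell
# Sketch: warn if the OpenSSL random-state file is not owned by the user
# running the script. Defaults to ~/.rnd.
check_rnd_owner() {
    rnd="${1:-$HOME/.rnd}"
    # No file yet: nothing to fix.
    [ -e "$rnd" ] || return 0
    owner=$(stat -c %U "$rnd")   # user name of the file's owner
    me=$(id -un)                 # user name of the current user
    if [ "$owner" != "$me" ]; then
        echo "$rnd is owned by $owner, not $me; fix with: sudo chown $me $rnd"
        return 1
    fi
    return 0
}
```

Run it as the same non-root user that runs the openstack script.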
 
 
= jeos_create fails with a timeout error during customization =
 
 
The developers have found that running oz many times eventually wedges the libvirt network interface in some way. See libvirt bug [https://bugzilla.redhat.com/show_bug.cgi?id=813853 #813853]. Until the bug is fixed upstream, one workaround is to restart libvirt's default network:
 
 
 
<pre><nowiki>
 
virsh net-destroy default
 
 
virsh net-start default
 
</nowiki></pre>
 
 
If that doesn't work, also check for zombie dnsmasq processes that need to be cleaned up.
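One way to look for defunct dnsmasq processes, sketched as a hypothetical helper (zombie_dnsmasq) that filters ps output. A zombie cannot be killed directly; it goes away once its parent reaps it or exits, so the parent PID tells you which process to deal with.

```shell
# Sketch: report dnsmasq processes in zombie (Z) state with their parent PID.
# Expects "pid ppid stat comm" lines on stdin, as produced by:
#   ps -eo pid=,ppid=,stat=,comm=
zombie_dnsmasq() {
    awk '$4 == "dnsmasq" && $3 ~ /^Z/ { print "pid=" $1, "ppid=" $2 }'
}
```

Usage: ps -eo pid=,ppid=,stat=,comm= | zombie_dnsmasq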
 
 
Note that during oz customization you can log into the VM using virsh with the credentials root / ozrootpw, unless a specific rootpw has been defined in the TDL.
 
 
= I didn't set a parameter correctly in heat and now the template I ran can't be deleted =
 
 
Unfortunately, error checking in the current heat needs a bit of work. Templates are stored in the database before they are executed. This makes sense conceptually, but because of a bug in heat it causes problems when create raises an exception. We will be fixing this bug shortly; in the meantime, the workaround is to stop heat, drop the heat database and recreate it:
 
 
 
<pre><nowiki>
 
killall -9 heat-api
 
 
killall -9 heat-engine
 
 
tools/heat-db-drop
 
 
/usr/bin/heat-db-setup-fedora
 
</nowiki></pre>
 
 
 
= I get a vhost-net error when running jeos_create =
 
 
An example of the failure we have seen:
 
 
 
<pre><nowiki>
 
sudo -E heat jeos_create F16 x86_64 cfntools
 
 
Creating JEOS image (F16-x86_64-cfntools) - this takes approximately 10 minutes.
 
 
ERROR: internal error process exited while connecting to monitor: qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net support is not compiled in
 
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: vhost-net requested but could not be initialized
 
qemu-system-x86_64: -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31: Device 'tap' could not be initialized
 
 
 
(use -d3 to get the full backtrace)
 
 
oz-install did not create the image, check your oz installation.
 
</nowiki></pre>
 
 
 
This error occurs when hardware virtualization (Intel VT-x or AMD-V) is not enabled in the BIOS.
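To check this on the host before rerunning jeos_create, something like the following sketch can help. It assumes KVM is built as loadable modules (kvm_intel / kvm_amd) and reads /proc/cpuinfo and /proc/modules; check_hw_virt is a hypothetical name. The module check matters because the CPU flag may still be present when the BIOS has virtualization disabled.

```shell
# Sketch: rough check that KVM hardware virtualization is usable.
# Reads /proc/cpuinfo and /proc/modules by default; pass other paths to test.
check_hw_virt() {
    cpuinfo="${1:-/proc/cpuinfo}"
    modules="${2:-/proc/modules}"
    # vmx = Intel VT-x, svm = AMD-V
    if ! grep -Eqw 'vmx|svm' "$cpuinfo"; then
        echo "no vmx/svm CPU flag: virtualization unsupported or hidden"
        return 1
    fi
    # The vendor module usually refuses to load when the BIOS disables
    # virtualization, even though the CPU flag may still be present.
    if ! grep -Eq '^kvm_(intel|amd) ' "$modules"; then
        echo "kvm_intel/kvm_amd not loaded: check the BIOS and 'dmesg | grep -i kvm'"
        return 1
    fi
    echo "hardware virtualization looks available"
}
```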
 
 
= Oz takes 30 minutes to create a JEOS =

Oz does take a while to run. Fortunately it normally only has to be run once, but if you're a developer this can be irritating, especially as the JEOS image changes. To speed Oz up, it is safe to add the following caching directives to /etc/oz/oz.cfg. Note that caching makes the system use more disk space.
 
 
<pre><nowiki>
 
[cache]
 
original_media = yes
 
modified_media = yes
 
jeos = yes
 
</nowiki></pre>
 
 
 
= You get "Quota exceeded: code=InstanceLimitExceeded (HTTP 413)" =
 
First, make sure there are no leftover undeleted instances or volumes:
 
 
<pre><nowiki>
 
nova list
 
nova volume-list
 
</nowiki></pre>
 
 
If that is not the problem, you may just need to increase your quota limits.

To display your current quotas (here for the admin project):
 
 
<pre><nowiki>
 
nova-manage project quota admin
 
</nowiki></pre>
 
 
To increase the number of instances:
 
 
<pre><nowiki>
 
nova-manage project quota admin --key=instances --value=100
 
</nowiki></pre>
 
 
 
= Endpoint not found for heat =
 
If you receive an error like the following:
 
 
<pre><nowiki>
 
[root@bigiron .openstack]# heat list
 
ERROR:Failed to list. Got error:
 
ERROR:Response from Keystone does not contain a Heat endpoint.
 
</nowiki></pre>
 
 
 
This indicates a problem with the Keystone configuration. Possible causes are not having run heat-keystone-create (on Fedora 16/17) or heat-keystone-create-devstack (on Ubuntu 12), or not having sourced the Keystone credentials before running either script.
 
 
= Non-specific error with backtrace from heat-engine =
 
If heat list fails with this error:
 
 
<pre><nowiki>
 
[root@bigiron tools]# heat list
 
ERROR:Failed to list. Got error:
 
ERROR:Internal Server error: Internal Server Error
 
ERROR:
 
</nowiki></pre>
 
 
 
and heat-engine logs a backtrace ending like this:
 
 
<pre><nowiki>
 
 
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 80, in connect
 
    return dialect.connect(*cargs, **cparams)
 
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 281, in connect
 
    return self.dbapi.connect(*cargs, **cparams)
 
  File "/usr/lib64/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
 
    return Connection(*args, **kwargs)
 
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
 
    super(Connection, self).__init__(*args, **kwargs2)
 
OperationalError: (OperationalError) (1045, "Access denied for user 'heat'@'localhost' (using password: YES)") None None
 
.
 
----------------------------------------
 
</nowiki></pre>
 
 
This indicates that the heat-db-setup script was not run, so MySQL rejects the heat user's credentials.
 
 
= Malformed query response: KeyName not registered =
 
If you receive the following error after running heat create on a template:
 
 
<pre><nowiki>
 
DEBUG:Debug level logging enabled
 
<CreateStackResult>
 
  <ValidateTemplateResult>
 
    <Description>Malformed Query Response {'Error': 'Provided KeyName is not registered with nova'}</Description>
 
    <Parameters/>
 
  </ValidateTemplateResult>
 
</CreateStackResult>
 
</nowiki></pre>
 
 
This indicates that the SSH key named in the create command has not been registered with nova. See the quickstart guide for registration instructions.
 
 
= I edited a template and now it doesn't work =
 
 
It's easy to introduce JSON syntax errors when editing templates. Python's json.tool module is a quick way to find where the first error is:
 
 
 
<pre><nowiki>
 
cat foo.template | python -m json.tool
 
Expecting , delimiter: line 107 column 20 (char 4579)
 
</nowiki></pre>
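When several templates are involved, a small wrapper around the same json.tool check reports each file in turn. This is a sketch: validate_templates is a hypothetical helper, and it defaults to python3 (set PYTHON=python on systems that only have Python 2).

```shell
# Sketch: validate JSON template files with Python's json.tool and
# report the first syntax error in each. Returns non-zero if any fail.
validate_templates() {
    py="${PYTHON:-python3}"
    status=0
    for t in "$@"; do
        # Keep the error message (stderr), discard the pretty-printed JSON
        if err=$("$py" -m json.tool "$t" 2>&1 >/dev/null); then
            echo "OK   $t"
        else
            echo "FAIL $t: $err"
            status=1
        fi
    done
    return $status
}
```

For example: validate_templates *.template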
 
 
 
= Nova starts creating instances which immediately go to ERROR state =
 
 
== Scheduler problem & workaround ==
 
 
If instances suddenly aren't being created and nova list shows them in ERROR state, check the scheduler log:
 
 
 
<pre><nowiki>
 
==> /var/log/nova/scheduler.log <==
 
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Failed to schedule_run_instance: No valid host was found.
 
2012-08-02 15:29:34 WARNING nova.scheduler.manager [req-f7ea2e26-3c92-49a4-9610-c59216bb8111 af787dc6ab8a48a392aa5ddbbef38073 bf80a27b120e46bda2cb64e0123fea27] Setting instance 18165ff9-25ae-4d01-8761-f414c86a0a64 to ERROR state.
 
</nowiki></pre>
 
 
 
The workaround seems to be to add scheduler_default_filters=AllHostsFilter to /etc/nova/nova.conf and restart the nova scheduler. AllHostsFilter does no filtering, so every host remains a candidate.

See: https://answers.launchpad.net/nova/+question/192511
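To see which instances the scheduler gave up on, the instance IDs can be pulled out of log lines like the ones shown for scheduler.log. A sketch, assuming the log format in that example; errored_instances is a hypothetical helper.

```shell
# Sketch: print the IDs of instances the scheduler moved to ERROR state.
errored_instances() {
    log="${1:-/var/log/nova/scheduler.log}"
    # Match "Setting instance <uuid> to ERROR state" and print the uuid
    grep -o 'Setting instance [0-9a-f-]* to ERROR state' "$log" | awk '{ print $3 }'
}
```

For example: for id in $(errored_instances); do nova show "$id"; done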
 
 
== Mysterious OOM behavior ==
 
 
If you see an error like the following in the nova compute log and instances go straight to ERROR state, qemu failed to launch the instance. In my case the cause was insufficient memory, but nova does not make this at all obvious:
 
 
 
<pre><nowiki>
 
==> /var/log/nova/compute.log <==

2012-08-02 16:18:18 TRACE nova.rpc.amqp libvirtError: Unable to read from monitor: Connection reset by peer
 
</nowiki></pre>
 
 
 
 
<pre><nowiki>
 
[root@heatlt heat]# tail -n2 /var/log/libvirt/qemu/instance-00000003.log
 
Failed to allocate 17179869184 B: Cannot allocate memory
 
2012-08-02 15:18:18.101+0000: shutting down
 
</nowiki></pre>
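The qemu log gives the size it failed to allocate in bytes: 17179869184 B is 16 GiB (17179869184 / 1024^3 = 16), presumably the RAM requested for the instance. A small sketch to extract and convert such sizes for comparison with the host's free memory (e.g. from free -g); qemu_alloc_failures is a hypothetical helper, and sed -E assumes GNU sed.

```shell
# Sketch: list failed memory allocations in a qemu instance log, in GiB.
qemu_alloc_failures() {
    log="$1"   # e.g. /var/log/libvirt/qemu/instance-00000003.log
    # sed keeps only N from lines like "Failed to allocate N B: ...";
    # awk converts N to GiB (1 GiB = 1024^3 = 1073741824 bytes).
    sed -n -E 's/.*Failed to allocate ([0-9]+) B.*/\1/p' "$log" |
        awk '{ printf "%.0f bytes = %.1f GiB\n", $1, $1 / 1073741824 }'
}
```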
 
 
 
= Yum update fails with dependency problems related to the "oz" package =
 
 
If you built the git version of oz as described in the getting started guide, yum update may fail with dependency problems when the OS python packages are updated. This is because the locally built oz RPM needs rebuilding against the new python version.


The workaround is to remove the oz package, run the update, then rebuild the oz package against the updated python version:
 
 
 
<pre><nowiki>
 
sudo yum remove oz
 
sudo yum update
 
 
# rebuild OZ as detailed in the getting started guide
 
cd ~/git/oz/
 
git pull
 
rm -f ~/rpmbuild/RPMS/noarch/oz-*
 
make rpm
 
 
sudo yum localinstall ~/rpmbuild/RPMS/noarch/oz-*.rpm
 
</nowiki></pre>
 
 
 
= qpidd fails to start =
 
 
As of qpid-cpp-server 0.16-5, the service scripts have been moved into the qpid-cpp-server-daemon package.
 
 
If you run yum update from an earlier qpid-cpp-server version, starting OpenStack (via tools/openstack) will fail with an error like this:
 
 
 
<pre><nowiki>
 
[root@heatlt heat]# ./tools/openstack restart
 
Failed to issue method call: Unit qpidd.service failed to load: No such file or directory. See system logs and 'systemctl status qpidd.service' for details.
 
</nowiki></pre>
 
 
 
The fix is to install qpid-cpp-server-daemon and restart OpenStack:
 
 
 
<pre><nowiki>
 
yum install qpid-cpp-server-daemon
 
tools/openstack restart
 
</nowiki></pre>
 
 
 
= OpenStack daemons can't connect to qpidd =
 
If the daemons log errors like this when connecting to qpidd:
 
 
<pre><nowiki>
 
2012-10-31 22:54:11    DEBUG [qpid.messaging.io.raw] OPEN[216d758]: localhost:5672
 
2012-10-31 22:54:11  WARNING [qpid.messaging] recoverable error[attempt 1]: [Errno -9] Address family for hostname not supported
 
2012-10-31 22:54:11  WARNING [qpid.messaging] sleeping 1 seconds
 
</nowiki></pre>
 
 
Edit /etc/hosts and comment out the "::1" entry.

The cause seems to be that the lo interface doesn't have an IPv6 address, so connecting to localhost via ::1 fails.
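The /etc/hosts workaround can be applied only when the loopback really lacks an IPv6 address. A sketch under assumptions: both helper names are hypothetical, /proc/net/if_inet6 is read to find lo's IPv6 addresses (if the file is missing, IPv6 is disabled entirely), and sed -i.bak assumes GNU sed and keeps a backup of the hosts file.

```shell
# Sketch: succeed if the loopback device has an IPv6 address.
# /proc/net/if_inet6 lists one address per line; the last field is the device.
lo_has_ipv6() {
    if_inet6="${1:-/proc/net/if_inet6}"
    [ -r "$if_inet6" ] && awk '$NF == "lo" { found = 1 } END { exit !found }' "$if_inet6"
}

# Comment out hosts entries for the IPv6 loopback address ::1,
# keeping a backup copy with a .bak suffix.
comment_out_ipv6_loopback() {
    hosts="${1:-/etc/hosts}"
    sed -i.bak -E 's/^(::1[[:space:]])/#\1/' "$hosts"
}
```

As root: lo_has_ipv6 || comment_out_ipv6_loopback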
 

Latest revision as of 17:31, 29 May 2019