Documentation/HypervisorTuningGuide

About the Hypervisor Tuning Guide

The goal of the Hypervisor Tuning Guide (HTG) is to provide cloud operators with detailed instructions and settings to get the best performance out of their hypervisors.

This guide is broken into four major sections:

  • CPU
  • Memory
  • Network
  • Disk

Each section has tuning information for the following areas:

  • Symptoms of being (CPU, Memory, Network, Disk) bound
  • General hardware recommendations
  • Operating System configuration
  • Hypervisor configuration
  • OpenStack configuration
  • Instance and Image configuration
  • Validation, benchmarking, and reporting

How to Contribute

Simply add your knowledge to this wiki page! The HTG does not yet have a formal documentation repository; it is still very much in its initial stages.

Understanding Your Workload

This section is expected to be the most theoretical and high-level part of the entire guide.

References

CPU

Introduction to CPU tuning.

Symptoms of Being CPU Bound

  • Raw CPU utilization sustained past 80%
  • Idle percentage below 20%
  • When load is very high, the cause is usually disk I/O rather than CPU
  • Load average can be tricky to interpret (a quick check is sketched below)
  • Steal time: when high in a guest, it is an indication that the hypervisor is busy
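
As a quick first check, both iowait and steal time are visible from inside a guest; a minimal sketch using vmstat:

    # Sample CPU counters every 5 seconds from inside the guest.
    # Relevant columns: us (user), sy (system), wa (iowait), st (steal).
    vmstat 5
    # A sustained "st" of more than a few percent suggests the hypervisor
    # itself is CPU bound.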

General Hardware Recommendations

Hyperthreading

Notable CPU flags

  • nested CPU flag, for virtualization within a guest (enabling it under KVM is sketched below)
    • may have issues with older kernel versions: nested VMs would lock up
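
A sketch of checking and enabling the nested flag for KVM (module name is kvm_intel on Intel, kvm_amd on AMD):

    # Check whether nested virtualization is currently enabled:
    cat /sys/module/kvm_intel/parameters/nested
    # Enable it persistently; takes effect once the module is reloaded:
    echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm_intel.conf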

Operating System Configuration

Linux

  • exclude cores: dedicate cores / CPUs specifically to certain OS tasks
    • isolcpus
    • see the Red Hat blog post below
  • a reasonable increase in performance has been reported from compiling your own kernel
  • turn off CPU frequency scaling - run at full frequency (sketched below)
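
A sketch of both ideas, assuming cores 0-3 are the ones to be set aside for host tasks:

    # Kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub):
    # keep the scheduler off cores 0-3 so they can be dedicated explicitly.
    isolcpus=0-3

    # Disable frequency scaling by forcing the performance governor:
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"
    done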

Windows

  • virtio drivers

Hypervisor Configuration

KVM / libvirt

Xen

VMWare

Hyper-V

  • Hyper-V has NUMA spanning enabled by default; it should be disabled for performance, with the caveat that instances must be restarted for the change to apply

OpenStack Configuration

  • host-passthrough is always faster than host-model or custom (the nova.conf setting is sketched below)
    • Warning: live migrations will be impossible if non-identical compute nodes are added later
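
With the libvirt driver this is set in nova.conf on each compute node; a minimal sketch:

    [libvirt]
    cpu_mode = host-passthrough
    # host-model is the safer choice if heterogeneous hardware (and thus
    # live migration between differing CPUs) is expected later.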

Overcommitting

  • Generally, it is safe to overcommit CPUs. It has been reported that the main reason not to overcommit CPU is to avoid overcommitting memory along with it. (The scheduler knob is sketched below.)
  • RAM overcommit, particularly with KSM, incurs a CPU cost as well
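
The ratio is controlled in nova.conf; the long-standing default is shown:

    # nova.conf - schedule up to 16 virtual CPUs per physical core:
    cpu_allocation_ratio = 16.0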

Instance and Image Configuration

  • CPU quotas and shares (a flavor example is sketched below)
    • Reported use-case: a default quota of 80% on all flavors; do not do this if workloads are very CPU-heavy.
  • Hyper-V enlightenment features
  • Hyper-V gen 2 VMs have been seen to be faster than gen 1 - reason unknown
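
A sketch of the 80% use-case with the libvirt driver (the flavor name m1.example is hypothetical; quota and period are in microseconds):

    # Hard-cap instances of this flavor at roughly 80% of one core:
    nova flavor-key m1.example set quota:cpu_quota=80000 quota:cpu_period=100000
    # Alternatively, weight instances relative to each other with shares:
    nova flavor-key m1.example set quota:cpu_shares=1024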

Validation, Benchmarking, and Reporting

General Tools

  • top
  • vmstat
  • htop

Benchmarking Tools

  • phoronix

Metrics

System
  • CPU: user, system, iowait, irq, soft irq
Instance
  • nova diagnostics
  • Do not record per-process stats (TODO: explain why)
  • overlaying cputime vs allocated cpu

Memory

Symptoms of Being Memory Bound

  • OOM Killer
  • Out of swap

General Hardware Recommendations

  • ensure NUMA distribution is balanced
  • memory speeds vary by chip

Operating System Configuration

Linux

Kernel Tunables
  • Transparent Hugepages can go either way depending on workload
KSM
  • Can often cause performance (CPU) problems; it is generally better to turn it off (sketched below)
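
A sketch of inspecting and flipping both tunables at runtime:

    # Transparent Hugepages: the current mode is shown in brackets.
    cat /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    # KSM: 2 stops the daemon and un-merges already-shared pages.
    echo 2 > /sys/kernel/mm/ksm/run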

Windows

Hypervisor Configuration

KVM / libvirt

  • nova enables ballooning but doesn't actually use it
  • reserved_host_memory_mb (default: 512 MB, which is too low for the real world; see the sketch below)
  • Turn on/off EPT (see blog post)
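
For example (the 4096 figure is illustrative, not a recommendation):

    # nova.conf - reserve memory for the host itself rather than the
    # 512 MB default:
    reserved_host_memory_mb = 4096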

Xen

VMWare

Hyper-V

OpenStack Configuration

Overcommitting

  • Memory overcommit & the cost of swapping (the nova knob is sketched below)
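
Setting the ratio to 1.0 avoids memory overcommit entirely (nova ships with 1.5):

    # nova.conf - do not overcommit memory on the scheduler:
    ram_allocation_ratio = 1.0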

Instance and Image Configuration

  • ensure ballooning is enabled / available (a quick check is sketched below)
  • guests cannot see memory speed - it is not exposed the way CPU flags are
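
A quick way to confirm the balloon device is attached to a given guest (INSTANCE_NAME is a placeholder):

    # Look for the virtio balloon device in the libvirt domain XML:
    virsh dumpxml INSTANCE_NAME | grep memballoon
    # Expected output resembles:
    #   <memballoon model='virtio'>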

Validation, Benchmarking, and Reporting

General Tools

  • free

Benchmarking

  • STREAM

Metrics

System
  • page in, page out, page scans per second, `free`
Instance
  • nova diagnostics
  • virsh

Network

Symptoms of Being Network Bound

  • from the guest: soft IRQ will be high
  • high I/O wait for network-backed instance disks
  • discards on the switch

General Hardware Recommendations

  • Bonding
    • LACP vs balance-tlb vs balance-alb
  • VXLAN offload

Operating System Configuration

Linux

  • pin send/recv to specific cores
  • IP forwarding: disable GRO on the kernel module (NIC driver)
  • PCI Passthrough
  • SR-IOV?
    • NUMA locality of SR-IOV (and passthrough) devices (you get this essentially for free if you are using NUMATopologyFilter and have a chipset that has locality)
  • Jumbo frames? 9000 MTU https://paste.fedoraproject.org/284011/14459359/ - for VLANs - source https://access.redhat.com/solutions/1417133

Kernel Tunables
  • net.ipv4.tcp_keepalive_time, net.core.somaxconn, net.nf_conntrack_max
  • Different queueing algorithms: fq_codel, etc. (a sysctl sketch follows)
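
A sketch of setting these persistently (all values are illustrative, not prescriptive):

    # /etc/sysctl.d/99-net-tuning.conf (hypothetical file name)
    net.ipv4.tcp_keepalive_time = 600
    net.core.somaxconn = 4096
    net.nf_conntrack_max = 1048576
    # fq_codel as the default queueing discipline (kernel 3.12 or newer):
    net.core.default_qdisc = fq_codel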


Windows

Hypervisor Configuration

KVM / libvirt

  • vhost-net (on by default on most modern distros?)
  • virtio
    • virtio multiqueue (sketched below)
  • OVS acceleration (DPDK)
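
A sketch of a libvirt interface definition combining virtio, vhost-net, and multiqueue (the bridge name br0 and the queue count are illustrative):

    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='virtio'/>
      <!-- vhost-net backend with 4 queues; the guest must enable them,
           e.g. "ethtool -L eth0 combined 4" -->
      <driver name='vhost' queues='4'/>
    </interface>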

Xen

VMWare

Hyper-V

OpenStack Configuration

Instance and Image Configuration

  • PCI pass-through
  • Network IO quotas and shares
    • the built-in support is not advanced enough
    • some operators use libvirt hooks instead
  • 1500 MTU
  • Make sure the instance is actually using vhost-net (load the kernel module; a check is sketched below)
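
A minimal check on the compute node:

    # Load vhost_net if it is not already present:
    lsmod | grep -q vhost_net || modprobe vhost_net
    # Guests that use it spawn per-instance vhost kernel threads:
    ps -ef | grep '\[vhost'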

Validation, Benchmarking, and Reporting

General Tools

  • iftop

Benchmarking

  • iperf
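
For example, a basic node-to-node run (SERVER_IP is a placeholder):

    # On one node:
    iperf -s
    # On the other: 10 parallel streams for 30 seconds
    iperf -c SERVER_IP -P 10 -t 30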

Metrics

System
  • bytes in/out, packets in/out, irqs, pps
  • /proc/net/protocols
Instance
  • nova diagnostics
  • virsh
  • virtual nic stats

Disk

Symptoms of Being Disk Bound

  • Artificially high load with high CPU idle
  • iowait

General Hardware Recommendations

Spindle vs SSD

  • separate SSD for logs
  • reports of too many faulty SSD disks
  • SSD TRIM: trim requests from the guest are not passed through to the hypervisor
  • bcache with SSD
    • dm-cache was reported to be less good

Hardware RAID, Software RAID, no RAID?

  • RAID 0 over individual disks, passed through
  • durability: hardware RAID 5
  • ensure writes match the stripe size of the RAID, at the filesystem level
  • RAID 1 for the OS, JBOD for ephemeral storage
  • battery backup: when the battery fails, the controller switches from write-back to write-through, with a performance hit at that time

Operating System Configuration

Linux

  • XFS barriers: turn off for performance, but not for database workloads (sketched below)
  • XFS on RAID: tunables
  • XFS or ext4
  • LVM?
  • cfq instead of deadline - workload specific
  • tuned
  • File system recommendations and benefits
  • Caching and in-memory file systems?
  • bcache, see notes above
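
A sketch of the barrier and scheduler tweaks (device and mount point are hypothetical; only disable barriers when the controller cache is battery-backed):

    # /etc/fstab - XFS without write barriers for the ephemeral disk store:
    /dev/sdb1  /var/lib/nova/instances  xfs  noatime,nobarrier  0 0

    # Switch the I/O scheduler on a specific disk (workload specific):
    echo deadline > /sys/block/sdb/queue/scheduler
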
Kernel Tunables

Windows

Hypervisor Configuration

KVM / libvirt

  • ignore sync calls from the guest (cache mode "unsafe") - dangerous, but fast
  • write-through vs write-back caching (sketched below)
  • defaults are usually safe
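
These map to the cache attribute of the libvirt disk driver; a sketch (paths are illustrative):

    <disk type='file' device='disk'>
      <!-- cache='none' is a common safe choice; 'writeback' is faster but
           riskier; 'unsafe' ignores guest sync calls entirely -->
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/example/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>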

Xen

VMWare

Hyper-V

OpenStack Configuration

  • Base images, copy on write

Image Formats

  • qcow2: smaller images, copy-on-write (the nova.conf options are sketched below)
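
In nova.conf, copy-on-write behaviour is controlled as follows (shown with the settings that keep thin qcow2 overlays):

    # nova.conf - qcow2 overlays on top of a shared, cached base image:
    use_cow_images = True
    # Do not flatten downloaded images to raw, so overlays stay small:
    force_raw_images = False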

Overcommit

  • overcommitting ephemeral storage (the scheduler knob is sketched below)
  • implications for migration
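
The scheduler-side knob (default 1.0; values above 1.0 overcommit ephemeral disk):

    # nova.conf - allow 1.5x overcommit of ephemeral disk (illustrative):
    disk_allocation_ratio = 1.5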

Instance and Image Configuration

  • tuned
  • IDE vs SCSI: SCSI did not show a performance increase
  • I/O scheduler in the guest: noop
  • Disk IO quotas and shares
    • available for Cinder volumes
    • open question on how to use them effectively
  • turn off mlocate and prelinking (guest-side tweaks are sketched below)
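
A sketch of the guest-side tweaks (paths are RHEL/CentOS-flavoured and vary by distro):

    # Kernel command line inside the guest - let the hypervisor do the
    # real I/O scheduling:
    #   elevator=noop
    # Stop the daily updatedb run for mlocate:
    chmod -x /etc/cron.daily/mlocate
    # Disable prelinking:
    sed -i 's/^PRELINKING=yes/PRELINKING=no/' /etc/sysconfig/prelink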

Validation, Benchmarking, and Reporting

Benchmarking

  • fio (extensive; an example run follows)
  • bonnie++ (quick)
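
For example, a 60-second 4k random-read fio run against the instance store (path and sizes are illustrative):

    fio --name=randread --directory=/var/lib/nova/instances \
        --rw=randread --bs=4k --size=1g --numjobs=4 \
        --time_based --runtime=60 --group_reporting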

Metrics

System
  • iowait
  • iops
  • iostat
  • vmstat
  • sysstat (sar metrics)
Instance
  • nova diagnostics
  • virsh

References