Archive for the ‘VMware’ Category

CPU Performance monitoring on ESX and VMs

October 11th, 2010 5 comments

One of the key resources of a VM is CPU. When performance becomes a bottleneck, the cause can be storage, network, memory or CPU.

Physical CPU load is generated by

  1. Guest OS running inside the VM
  2. Application running in the VM
  3. Agents running on ESX Server on pCPU0

The majority of the load on ESX/ESXi servers is attributable to the sources above. A good understanding of the applications, and of how to use the VI Client or esxtop, goes a long way when troubleshooting performance problems.

Factors that Affect CPU Performance

  1. CPU Affinity – Pinning a VM to specific physical CPUs can have a positive effect for that VM. For the system as a whole, however, CPU affinity constrains the scheduler and can cause an improperly balanced load.
  2. Idle Virtual Machines
  3. SMP virtual machines
  4. If CPU contention exists, the scheduler forces vCPUs of lower priority VM to queue their CPU requests in deference to higher-priority VMs.

CPU Metrics to Monitor

  1. Host CPU Usage –  Amount of time that the host's CPU (physical CPU) was used during normal sampling period
  2. VM CPU used –  Amount of time that the virtual machine’s CPU (vCPU) was actively using the physical CPU. For virtual SMP virtual machines, this can be displayed as an aggregate of all vCPUs in the virtual machine or per vCPU.
  3. VM CPU Ready Time –  Amount of time that the virtual machine’s CPU (vCPU) was ready but could not get scheduled to run on the physical CPU. CPU ready time is dependent on the number of virtual machines on the host and their CPU loads.

What is a World

A VM is a collection of worlds, with some worlds being virtual CPUs (vCPUs) and other threads doing additional work. For example, a VM has a world that controls the mouse, keyboard and screen (MKS). The VM also has a world for its virtual machine monitor (VMM). There are non–virtual machine worlds as well. These are VMkernel worlds and are used to perform various system tasks. Examples include the idle, driver, and vmotion worlds.

ESXTOP

We can run esxtop from the ESX console, or run resxtop remotely via the RCLI. Both behave in a similar manner. By default, the vSphere Client uses a sampling interval of 20 seconds (20,000 milliseconds) and resxtop/esxtop uses a sampling interval of 5 seconds. To convert a time value to a percentage, divide the time value by the sampling interval and multiply by 100. For example, 2,000 ms of ready time in a 20-second vSphere Client sample is 2,000 / 20,000 = 10 percent.
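The conversion above can be sketched in a few lines of Python. This is only an illustration: the function name and constants are my own, and the only values taken from the text are the two default sampling intervals.

```python
# Default sampling intervals mentioned above, in milliseconds.
VSPHERE_CLIENT_INTERVAL_MS = 20_000
ESXTOP_INTERVAL_MS = 5_000


def time_to_percent(value_ms, interval_ms=VSPHERE_CLIENT_INTERVAL_MS):
    """Convert a time counter (ms) into a percentage of the sampling interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    # Multiply first so round values stay exact in floating point.
    return value_ms * 100.0 / interval_ms


# 2,000 ms of ready time reported by the vSphere Client (20 s sample).
print(time_to_percent(2000))
# 500 ms of ready time in a 5 s esxtop sample.
print(time_to_percent(500, ESXTOP_INTERVAL_MS))
```

Both calls describe the same relative amount of ready time; only the sampling interval differs, which is why the interval has to be known before comparing numbers from the two tools.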

To start esxtop from the command line on the ESX server, type: esxtop

By default, the display starts with CPU resource utilization. It is a bit cryptic; knowledge of the physical hardware (CPU and memory) is helpful here, but not necessary.

Esxtop display

The above image gives us interesting information about the CPU utilization of this ESX Server

  1. The current time on the system
  2. The uptime, i.e. the number of days since this ESX box was last rebooted
  3. Number of worlds on this box. (A world is an ESX Server VMkernel schedulable entity, similar to a process or thread in other operating systems. A group contains multiple worlds.)
  4. The average CPU load over the last 1, 5 and 15 minutes. A value of 1 in any of these three fields indicates that the physical CPU is fully utilized, and a value greater than 1 indicates that the host requires more physical CPU capacity than is available. For example, a value of 2 means that the host potentially needed twice the amount of physical CPU resources that were available at the indicated time
  5. The PCPU (%) shows utilization for the Physical processor on the host by core. It provides a real time view of the host's Physical CPU utilization. If these figures are consistently showing between 90% and 100% for prolonged periods of time, this indicates that the CPUs on the host are probably oversubscribed
  6. %USED –  CPU Utilization.  The virtual machines are reported here with the names specified at the time they were created. As with the idle world’s row, utilization for each virtual machine can exceed 100 percent. A virtual machine that uses two virtual CPUs, for example, can show up to 200 percent CPU utilization
  7. %SYS –  Percentage of time in the ESX VMkernel on behalf of the world/resource pool to process interrupts and to perform other system activities
  8. %RDY –  Percentage of time the group was ready to run but was not provided CPU resources on which to execute
  9. %WAIT –  Percentage of time the group spent in the blocked or busy wait state. This includes the percentage of time the group was idle.
  10. %CSTP –  Percentage of time the vCPUs of a VM spent in the co-stopped state, waiting to be co-started. This gives an indication of the co-scheduling overhead incurred by the VM. If this value is low, then any performance problems should be attributed to other issues and not to the co-scheduling of the VM's vCPUs
  11. %MLMTD –  Percentage of time the VMkernel did not run the resource pool/world because that would violate the resource pool's or world's limit setting
  12. NWLD –  Number of worlds associated with a given group

Expanding the Group (Showing worlds associated with each VM)

You can expand the group data for a VM you want to examine in more detail. To do so, press “e” then enter the group ID number (shown in the GID column) for the VM.

For example, in the image below, we will expand the VM with GID 16.

Gid 16

After expanding gid 16

Expand gid 16

The above screen capture includes the following information. Each VM will comprise at least the following basic worlds.

  1. vCPU-# –  This world assists the VM with its vCPU. One vCPU world is allocated for each vCPU assigned to the VM. For example, a dual vCPU VM would have two vCPU worlds, vcpu-0 and vcpu-1.
  2. VMware-VMX –  The vmx world assists the vCPU world outlined in the preceding point
  3. MKS –  This world assists in mouse, keyboard and screen
  4. VMAssistant –  This is a new world that handles memory scanning and TPS (Transparent Page Sharing)

To View CPU metrics per VM

If you want to view CPU metrics per VM –  Type “V” 

V display VM

Basic Troubleshooting Flow for CPU Saturation

  1. Check for Host CPU Saturation
  2. Check for Guest CPU Saturation
  3. Check for using only one vCPU in an SMP VM
  4. Check for low guest CPU utilization
  5. Check for high utilization on PCPU0

ESXTOP values for Ready time

Using resxtop/esxtop values for ready time, here is how to interpret the value:

  1. If ready time <= 5 percent, this is normal. Very small single-digit numbers result in minimal impact to users.
  2. If ready time is between 5 and 10 percent, ready time is starting to be worth watching.
  3. If ready time is > 10 percent, though some systems continue to meet expectations, double-digit ready time percentages often mean action is required to address performance issues.
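As a rough sketch, the three bands above can be turned into a small helper for scanning a list of VMs. The function name, the band labels, and the sample VM names and values are all made up for illustration; only the 5 and 10 percent thresholds come from the list above.

```python
def classify_ready_time(ready_percent):
    """Map a %RDY value to the three bands described above."""
    if ready_percent < 0:
        raise ValueError("ready_percent cannot be negative")
    if ready_percent <= 5:
        return "normal"
    if ready_percent <= 10:
        return "worth watching"
    return "action likely required"


# Hypothetical %RDY readings, e.g. copied by hand from the esxtop CPU screen.
samples = {"web01": 2.3, "db01": 7.8, "batch01": 14.5}
for vm, rdy in samples.items():
    print(f"{vm}: %RDY={rdy} -> {classify_ready_time(rdy)}")
```

Note that for an SMP VM esxtop reports %RDY for the whole group, so it may be fairer to divide by the number of vCPUs before applying the bands; that normalization is left out here.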

Spotting CPU Overcommitment

Remember, high %RDY values combined with high %USED values are a strong indicator that your CPU resources are overcommitted.

Guest CPU saturation

Guest CPU saturation occurs when the application and the OS running within a VM use all of the CPU resources that the ESX host is providing to that VM. The occurrence of guest CPU saturation does not necessarily indicate that a performance problem exists; compute-intensive applications commonly use all available CPU resources.

When it is a problem, there are two approaches to solving it:

  1. Increase the CPU resources provided to the application, by adding  more vCPU
  2. Increase the efficiency with which the VM uses CPU resources

Single vCPU vs Multiple vCPUs

Take a cautious approach to allocating multiple vCPUs to a VM, as you can create extra overhead by scheduling two or more vCPUs in a VM, which would be detrimental to the performance of the VM.

To determine whether multiple vCPUs assigned to your VM are causing poor performance:

  1. Open a console prompt on the ESX host or initiate an SSH connection to it.
  2. Type esxtop and press Enter.
  3. On the CPU screen, check the %CSTP value. If this number is higher than 100, the performance issues may be caused by the vCPU count. Try lowering the vCPU count of the virtual machine by 1.

    Note: The %CSTP value represents the amount of time a virtual machine with multiple virtual CPUs is waiting to be scheduled on multiple cores on the physical host. The higher the value, the longer it waits and the worse its performance.  Lowering the number of vCPUs reduces the scheduling wait time.

To lower the vCPU count:
Note: The virtual machine must be powered off to perform these steps.
  1. Right-click on the virtual machine and click Edit Settings.
  2. Click CPUs.
  3. Use the Number of virtual processor drop-down to lower the vCPU count by 1.
  4. Click OK
  5. If your virtual machine still experiences performance issues, and if its kernel or HAL can handle switching to a single vCPU, lower the vCPU count to 1.

    Warning: If your virtual machine's kernel or HAL cannot handle switching to a single vCPU, unexpected behaviour may occur.

                          The above information is from VMware's website –  http://kb.vmware.com/kb/1005362

Low Guest CPU Utilization

If the average usage is below 75 percent, guest CPU utilization is low. There can be various reasons behind this:

  1. High storage response time
  2. Poor application or OS tuning
  3. Application pinned to cores in the guest OS
  4. Too many configured vCPUs
  5. Restrictive resource allocations

High utilization on PCPU0

If usage on PCPU0 is above 75 percent and is more than 20 percent greater than overall host usage, PCPU0 may be overutilized. To solve this:

  1. Reduce the number of management agents
  2. Reduce the amount of work performed by management agents
  3. Ensure management agents are up to date
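The PCPU0 rule of thumb above is easy to express as a check. In this sketch I read "20 percent greater" as 20 percentage points above overall host usage, which is an assumption on my part; the function name and the sample numbers are also just for illustration.

```python
def pcpu0_is_hot(pcpu0_usage, host_usage, floor=75.0, margin=20.0):
    """Return True when PCPU0 looks overloaded relative to the whole host.

    Both usages are percentages. `margin` is interpreted as percentage
    points above overall host usage (an assumption, see the text above).
    """
    return pcpu0_usage > floor and (pcpu0_usage - host_usage) > margin


print(pcpu0_is_hot(90.0, 60.0))  # high on PCPU0, well above the host average
print(pcpu0_is_hot(80.0, 70.0))  # high, but the whole host is busy
print(pcpu0_is_hot(70.0, 30.0))  # far above the average, but under 75 percent
```

Only the first case satisfies both conditions; the other two fail one condition each, which is the point of combining an absolute floor with a relative margin.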

 Most of the above information is gathered from VMware's Knowledge base, VMworld and from forums

Categories: Esxtop, VMware, vSphere

ROI on your Virtual Environment

August 18th, 2010 No comments

Recently, I came across a nice tool on VMware’s website that can help management get a good understanding of their ROI.

Link –  VMware’s Web site

Roi calculator

Categories: VMware

Check actual disk space usage for Thindisk

August 11th, 2010 No comments

If we have a thin disk and want to know how much space it actually uses, we can run the “du” command from the ESX console. I am sure there are other ways too, but I like to do things from the command line. Run it against the disk's data file (the -flat.vmdk file) rather than the small descriptor file:

du -h thindisk-flat.vmdk

The above command reports the space actually allocated to the disk, not its provisioned size.
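For anyone curious what du is doing here, the following is a minimal Python sketch of the same comparison between a file's apparent size and the space actually allocated to it. It assumes a POSIX-style system where `os.stat` exposes `st_blocks`; the helper name and the temporary stand-in file are mine, and I have not run this against a VMFS datastore.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    apparent_bytes is what `ls -l` shows; allocated_bytes is what `du` counts.
    """
    st = os.stat(path)
    # On POSIX systems st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


# Stand-in for a thin disk: a sparse 10 MB file with no data written to it.
with tempfile.NamedTemporaryFile(suffix=".vmdk") as f:
    f.truncate(10 * 1024 * 1024)
    f.flush()
    apparent, allocated = disk_usage(f.name)
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a filesystem that supports sparse files the allocated figure should be much smaller than the apparent one, which mirrors the gap between a thin disk's provisioned size and its real usage.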

 

Categories: VMware, vSphere

Check Disk (Thin/Thick from cmd line)

August 11th, 2010 No comments

I was wondering: if there is no vCenter Server access and all I have is ESX console access, how do I find out whether a VM has a thick disk or a thin disk? After some research I found this, which I would like to share with everybody.

The descriptor file for the VM's disk has an entry that indicates whether the disk is thin or thick.

Thin disk

After the same VM's disk is converted to thick, via the command line or the VI Client, the descriptor no longer contains the thinProvisioned = “1” parameter.

Thick disk
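The check described above can also be scripted. Below is a small sketch that scans descriptor text for a thinProvisioned entry. The function name is mine, and the sample descriptor lines are a made-up excerpt for illustration only; in the descriptors I have seen the key appears with a ddb. prefix, so the code matches any key ending in thinProvisioned.

```python
def is_thin_provisioned(descriptor_text):
    """Return True if a VMDK descriptor has a thinProvisioned entry set to 1."""
    for line in descriptor_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key = value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip().endswith("thinProvisioned"):
            return value.strip().strip('"') == "1"
    # No entry at all is treated as thick, as described above.
    return False


# Illustrative descriptor excerpts (not copied from a real VM).
thin_sample = '# Disk DescriptorFile\nddb.adapterType = "lsilogic"\nddb.thinProvisioned = "1"\n'
thick_sample = '# Disk DescriptorFile\nddb.adapterType = "lsilogic"\n'

print(is_thin_provisioned(thin_sample))
print(is_thin_provisioned(thick_sample))
```

To use it on a real disk, read the small descriptor .vmdk file (not the -flat data file) and pass its contents to the function.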

 

 

Categories: VMware

Microsoft Clusters in Virtual Environment

August 9th, 2010 6 comments

Clustering virtual machines can reduce hardware costs compared with building the same cluster on physical machines.

Cluster Components

   1) Virtual machines
   2) Network cards
      a. One private card for cluster communication
      b. One public card
   3) Shared disks
   4) Supported operating system
   5) Supported disk format

Types of Cluster

There are three types of Cluster configuration supported by VMware.

   1) Clustering Virtual Machine on a single Host

Cluster in a host

Figure1: This figure is taken from VMware’s MSCS Guide

   2) Clustering Virtual Machines Across Physical Hosts

VM clustered across hosts

Figure2: This figure is taken from VMware’s MSCS Guide

   3) Clustering Physical Machines with Virtual Machines

Clustering Physical and Virtual Machines

Figure3: This figure is taken from VMware’s MSCS Guide

 

Hardware, Software & Storage requirement for MSCS

No matter which type of cluster you use, there are a few common settings that you have to apply to make VMs work in a cluster environment.

Component                          Requirement
Virtual SCSI adapter               LSI Logic Parallel for Windows 2003; LSI Logic SAS for Windows 2008
Operating system                   Windows 2003 SP2; Windows 2008 R2
Virtual NIC                        Use the default type for all guest operating systems. Add the NIC before adding the other disks used in the cluster.
I/O timeout                        Set to 60 seconds or more by modifying HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue. If you reconfigure your cluster, the system might reset this I/O timeout value; you will have to set it again in that case.
Disk format                        Use the eagerzeroedthick format, either by selecting "Support clustering features such as Fault Tolerance" or by using the vmkfstools command.
NTP server                         Synchronize domain controllers and cluster nodes with a common NTP server, and disable host-based time synchronization when using clustering in the guest.
VM hardware version                7
Additional disks for the cluster   Always use a new SCSI controller, for example SCSI (1:0).

Storage Requirement

Storage Type                                        Cluster in a Box    Cluster Across Boxes   Standby Clustering
Virtual disks                                       Yes (recommended)   No                     No
Pass-through RDM (physical compatibility mode)      No                  Yes (recommended)      Yes
Non-pass-through RDM (virtual compatibility mode)   Yes                 Yes                    No
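If you want to keep this matrix handy in a script, for example to sanity-check a planned design, it can be written down as a small lookup. The dictionary keys and the helper name are my own shorthand; the Yes/No values are copied from the table above and nothing else is encoded (for instance, the Windows 2008 restriction mentioned under Caution is not).

```python
# Storage support per cluster type, transcribed from the table above.
SUPPORT = {
    "virtual disk": {
        "cluster in a box": "yes (recommended)",
        "cluster across boxes": "no",
        "standby clustering": "no",
    },
    "pass-through RDM": {
        "cluster in a box": "no",
        "cluster across boxes": "yes (recommended)",
        "standby clustering": "yes",
    },
    "non-pass-through RDM": {
        "cluster in a box": "yes",
        "cluster across boxes": "yes",
        "standby clustering": "no",
    },
}


def supported_storage(cluster_type):
    """List the storage types marked 'yes' for the given cluster type."""
    return [
        storage
        for storage, row in SUPPORT.items()
        if row[cluster_type].startswith("yes")
    ]


for cluster_type in ("cluster in a box", "cluster across boxes", "standby clustering"):
    print(cluster_type, "->", supported_storage(cluster_type))
```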

MSCS Setup Limitations

The following environments and functions are not supported for MSCS setups with this release of vSphere:

  1. Clustering on iSCSI, FCoE, and NFS disks.
  2. Use of MSCS in conjunction with VMware Fault Tolerance.
  3. Migration with vMotion of clustered virtual machines.
  4. N-Port ID Virtualization (NPIV)
  5. With native multipathing (NMP), clustering is not supported when the path policy is set to round robin.
  6. You must use hardware version 7 with ESX/ESXi 4.1.
     

Caution

  1. If you try to clone a VM with an RDM setup, the cloning process converts the RDMs to virtual disks. Unmap all RDMs before cloning, and remap them after cloning is complete.
  2. Add networking before disks. Refer to KB – http://kb.vmware.com/kb/1513
  3. If you are using Cluster Across Boxes and the OS is Windows 2008, VMware will not support virtual compatibility mode for RDMs.
  4. Use the STORport Miniport driver for the Fibre Channel (FC) HBA (QLogic or Emulex) in the physical Windows machine on Standby Host clustering
  5. Do not run multipathing software in the physical or virtual machines on standby host clustering
  6. Use only a single physical path from the host to the storage arrays in standby host configurations.
     

HA/DRS Configuration in MSCS

  1. All hosts that are running MSCS virtual machines can be part of a vCenter Server cluster with both VMware HA and DRS enabled.
  2. For MSCS virtual machines in a cluster, you must create VM-VM affinity or anti-affinity rules.
  3. VM-VM affinity rules specify which virtual machines should be kept together on the same host (for example, a cluster of MSCS virtual machines on one physical host).
  4. VM-VM anti-affinity rules specify which virtual machines should be kept apart on different physical hosts (for example, a cluster of MSCS virtual machines across physical hosts).
     

Steps to Enable strict Enforcement of Affinity Rules

  1. In the vSphere Client, right-click the cluster in the inventory and select Edit Settings.
  2. In the left pane of the Cluster Settings dialog box, select VMware DRS.
  3. Click Advanced Options.
  4. In the Option column, type ForceAffinePoweron.
  5. In the Value column, type 1.
  6. Click OK.

MSCS Checklist

  1. Requirement for Cluster Disks
  2. Other Clustering Requirements and Recommendations
  3. Required info for Technical support

Check these sites for additional information

  1. http://exchangeexchange.com/blogs/bkeane/archive/2007/07/30/mscs-clustering-in-vmware.aspx  Although this site is not meant exclusively for vSphere 4.1, it still gives good info about MSCS.
  2. http://communities.vmware.com/message/1491688?tstart=0
  3. http://www.sysprobs.com/setup-windows-2008-cluster-vmware-workstation Setup Win2k8 R2 Failover Cluster in VMware workstation
  4. http://www.virtuallifestyle.nl/2010/01/recommended-detailed-material-on-rdms/

Disclaimer: Most of the above information is taken from VMware’s MSCS guide.

Categories: VMware