Dell EqualLogic Best Practices Series

Sizing and Best Practices for Deploying Citrix XenDesktop on VMware vSphere with Dell EqualLogic Storage

A Dell Technical Whitepaper
Storage Infrastructure and Solutions Engineering
Dell Product Group
January 2012
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND. © 2012 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.
Acknowledgements

This whitepaper was produced by the PG Storage Infrastructure and Solutions team of Dell Inc.
1 Introduction

Virtual Desktop Infrastructure (VDI) products such as Citrix® XenDesktop® can provide organizations with significant cost savings, streamlined implementation, and ease of desktop management. In order to achieve these VDI benefits and to ensure optimal user experience, storage infrastructure design and sizing considerations need to be addressed carefully.
2 Virtual Desktop Infrastructures

VDI helps IT organizations simplify administration and reduce costs while enhancing security and regulatory compliance, increasing IT flexibility and business agility, and strengthening business continuity and disaster recovery. However, a VDI deployment must be carefully designed to ensure that it delivers the performance and scalability needed to support an enterprise-wide client community.
When data becomes hot, it is automatically moved from the SAS tier to the SSD tier. This automatic tiering function makes the hybrid EqualLogic SAN a very cost-efficient option for VDI environments, where the peak load from hundreds of desktops during a login storm is concentrated on the relatively small capacity base image volume (the hot data blocks).
3 Citrix XenDesktop solution infrastructure

Citrix XenDesktop is a comprehensive desktop virtualization solution that includes the capabilities required to deliver desktops, applications, and data securely to every user in an enterprise. The solution incorporates a variety of components to accomplish these various capabilities.

3.1 Test infrastructure: Component design details
The core Citrix XenDesktop infrastructure components used in our test configuration are shown in Figure 1.
Figure 1 Citrix XenDesktop core infrastructure components

BP1018

Overview of the core infrastructure components (see Figure 1):
• We used two Provisioning Servers (PVS) and two Desktop Delivery Controllers (DDC) for streaming the master disk image to virtual desktops. Two servers were used for high availability and better performance.
• The PVS and DDC servers were virtualized and hosted on two separate ESXi servers for high availability.
• The vDisk was configured in standard mode and made read-only so that the two Provisioning Servers could access it simultaneously.
• Any modifications made to the streamed image by a virtual desktop were stored in its write cache.
• Each virtual desktop consumed 3 GB of storage capacity – 2 GB of write cache (which includes a 1 GB page file) and 1 GB corresponding to the VM's RAM.
• The "Cache on Device Hard Drive" option provided by Citrix Provisioning Services was used to mount the write cache.
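The per-desktop capacity figures above multiply out in a straightforward way; the sketch below (not from the whitepaper's tooling; the constant names are illustrative) shows the raw write-cache plus swap capacity consumed by a given desktop count.

```python
# Per-desktop storage consumed in this configuration, per the figures above.
WRITE_CACHE_GB = 2   # includes the 1 GB page file
VM_SWAP_GB = 1       # matches the 1 GB RAM allocated per desktop

def capacity_gb(num_desktops: int) -> int:
    """Raw capacity consumed by desktop write caches and swap files."""
    return num_desktops * (WRITE_CACHE_GB + VM_SWAP_GB)

print(capacity_gb(630))   # 1890 -- fits within the four 500 GB VDI volumes
print(capacity_gb(1270))  # 3810
```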
Component: VMware ESXi Enterprise Plus
Description: The smaller-footprint version of ESXi that does not include the ESXi service console.

Component: VMware vCenter
Description: Centralized management interface for the VMware vSphere environment.

Login VSI Workload Generator

Component: Login VSI
Description: A third-party benchmarking tool from Login Consultants that is used to simulate a real-world XenDesktop VDI workload. A Login VSI launcher is a Windows system that launches desktop sessions on target virtual desktop machines.
3. The Desktop Delivery Controller authenticates the user using Active Directory (D) and starts the VM (C) on the hypervisor.
4. The VM (C) contacts the DHCP server (D) to obtain an IP address and the location of the boot image.
5. The VM (C) boots using the boot image received from the Provisioning Server (G) over the network.
6. The Desktop Delivery Controller (E) assigns the user a VM after verifying the license through the license server (B) and connects the user to the VM through ICA.
7.
Figure 3 Virtual desktop disk composition in Citrix XenDesktop VDI environment As shown in Figure 3, the shared read-only vDisk and temporary write cache are served from EqualLogic PS6010XVS array(s). The user data is kept separate by using a separate CIFS share. 3.4 EqualLogic storage array configuration We used a single EqualLogic PS6010XVS array to store the vDisk of the PVS and also the write cache for all virtual desktops. Only 2.
Table 2 EqualLogic volume layout on a single PS6010XVS array

Volume name    Reported size
vDisk-vol      150 GB
VDIVOL1        500 GB
VDIVOL2        500 GB
VDIVOL3        500 GB
VDIVOL4        500 GB

Capacity:
• Total member capacity: 2.48 TB
• Capacity used by volumes: 2.1 TB (84.8%)
• Free member space: 384.87 GB (15.2%)

• The volume named "vDisk-vol" was used to host the master vDisk image. This volume was accessed by the two Provisioning Servers using Windows guest iSCSI initiators.
Figure 4 Server LAN and iSCSI SAN connectivity

• Each PowerEdge M610 server has one on-board Broadcom NetXtreme 5709 dual-port NIC, which maps to Fabric A. The M610 servers had two additional dual-port Broadcom NetXtreme 57711 10 GbE NIC mezzanine cards, which were assigned to Fabric B and Fabric C.
• Port B1 (Fabric B) and Port C1 (Fabric C) on each server were used for iSCSI SAN connectivity.
• Similarly, ports B2 and C2 were dedicated to the VDI network.
• Two PowerConnect 8024F switches were used as external SAN switches. We used four 10 GbE SFP+ uplink modules from the M8024 switches on Fabrics B1 and C1 to the two PC8024F switches, which are interconnected via a LAG of 4 x 10 GbE links.
• The other two ports on Fabric B and Fabric C (ports B2 and C2) were dedicated to VDI traffic. The two M6220 switches on Fabrics B2 and C2 were uplinked to two external PowerConnect 6248 switches using two 10 GbE SFP+ uplink modules.

3.5.2
Figure 5 vSwitches and VLANs

The infrastructure network was further divided into three VLANs to segregate network traffic into different classes. The three VLANs used in the test configuration were:
- Management VLAN
- Infrastructure VLAN
- vMotion® VLAN

For more detailed information on how each of these different networks was set up and how the VLANs were configured, please refer to Appendix C.
4 Citrix XenDesktop test methodology

4.1 Test objectives
The primary objectives of our testing were:
• Develop best practices and sizing guidelines for a Citrix XenDesktop Provisioning Services based VDI solution deployed on Dell EqualLogic PS6010XVS series storage, Dell PowerEdge blade servers, and Dell PowerConnect switches, with VMware vSphere 4.1 as the server virtualization platform.
4.3.1 Load generation Login VSI is a VDI benchmarking tool which can be used to determine the maximum number of desktops that can be run on a physical or virtual server. Login VSI simulates a realistic VDI workload using the AutoIt script within each desktop session to automate the execution of generic applications. The tool’s “Light” workload was used to simulate the task worker workload.
The typical industry-standard latency limit for storage disk I/O is around 20 ms. Maintaining this limit will ensure good user application response times when there are no other bottlenecks at the infrastructure layer.

4.4.2 System resource utilization on the hypervisor infrastructure
The primary focus of our testing was storage, and we ensured that no other component in the VDI stack became a bottleneck while conducting these storage characterization tests.
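The 20 ms guideline can be applied mechanically when reviewing latency samples exported from a monitoring tool such as SAN HQ; a minimal sketch (sample values are illustrative, not from the test data):

```python
# Flag latency samples that breach the ~20 ms disk-latency guideline.
LATENCY_LIMIT_MS = 20.0

def violations(samples_ms):
    """Return the samples that exceed the latency guideline."""
    return [s for s in samples_ms if s > LATENCY_LIMIT_MS]

print(violations([1.8, 4.9, 6.1, 22.5, 3.0]))  # [22.5]
```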
5 Citrix XenDesktop test results and analysis

This section describes the different XenDesktop VDI characterization tests conducted and also the key findings from each test. The Task worker user type represents the majority of VDI users in the industry today and we focused our testing on this workload profile. For all single array tests, we used eight M610 blade servers to host 630 virtual desktops.
As explained in section 3.3, the key components of a VDI hosted using PVS are the vDisk, the write cache, and the CIFS share. The vDisk and write cache are the most critical components and have the largest impact on the underlying storage. A CIFS share is the recommended way to separate user data from system data because it simplifies user data management. We analyzed the I/O activity on the vDisk, write cache, and CIFS share for all of the above test scenarios.
Figure 6 Total IOPS on vDisk volume

As shown in Figure 6, we observed very minimal read I/O activity (122 IOPS) on the vDisk volume during the login storm. Each PVS server had 16 GB of RAM, and as a result most of the data that needed to be streamed was cached in server RAM. Once the disk image blocks were cached, further read requests from virtual desktops were served directly from PVS RAM, which significantly reduced storage IOPS later on.
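The caching behavior described above can be illustrated with a toy read-path sketch. This is not the PVS implementation; block IDs and desktop counts are illustrative. It shows why only the first request for each block reaches the array once the server cache is warm.

```python
# Toy model: the first request for a block hits storage; repeats are
# served from RAM, so vDisk read IOPS stay low after cache warm-up.
storage_reads = 0
cache = {}

def read_block(block_id: int) -> str:
    global storage_reads
    if block_id not in cache:
        storage_reads += 1          # only a cache miss touches the array
        cache[block_id] = f"data-{block_id}"
    return cache[block_id]

# 630 desktops repeatedly streaming the same 3 hot blocks:
for _ in range(630):
    for blk in (0, 1, 2):
        read_block(blk)
print(storage_reads)  # 3
```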
Figure 7 Total IOPS on PS6510E array hosting CIFS Share – 1270 virtual desktops As shown in Figure 7, the total IOPS requirement on the CIFS share was not significant. Redirecting user profiles and shares helped to reduce the IOPS requirement on the main PS6010XVS array used for storing and streaming desktop images along with the desktop write caches. It also allows better manageability of the virtual desktops in a VDI environment.
Login VSI workload was used to simulate the task worker workload profile and detailed storage and infrastructure performance metrics were captured. All virtual desktops were pre-booted before the Login VSI workload started. Some of the key performance metrics are analyzed below.

Volume-level IOPS
We observed around 1100 to 1200 IOPS on each volume, and both average read and write latencies were well below 2 ms.
The write cache activity on each desktop included all temporary OS writes, such as paging. OS disk image reads were satisfied via network streaming from the Provisioning Servers, and the vDisk image was stored in a separate volume.

Figure 9 Total IOPS at member level

As shown in Table 4, most of the IOPS were handled by the high-performance SSD drives, and there was very little I/O on the SAS drives.
Table 4 Detailed view of total IOPS at disk level

(Per-disk breakdown for member XenVDI-XVS, pool VDI-Pool, disks 0 through 15. The 100 GB SATA-II SSD drives handled most of the IOPS, with very little I/O on the SAS drives.)
Figure 10 IOPS monitored using ‘Live View’ feature of SAN HQ during login storm – 630 VMs The storage array disk latency never went beyond 5 to 6 ms even at the peak I/O during the login storm. We observed an increase in CPU and network resource utilization at peak login, but it was well within the acceptable limits. 5.4.3 Hypervisor layer: ESXi host performance During the test, we measured CPU, memory, network, and disk performance on all ESXi servers hosting the virtual desktops.
- During login, there is usually extra processing at the DDC to authenticate and allocate a specific user to one of the available pre-booted desktops. There is also increased processing on the PVS layer because there is an increased amount of activity related to reading the master disk image and streaming it to all virtual desktops.
Figure 11 Total IOPS at member level – 20% Pre-boot scenario In this scenario, the time required to login to 630 virtual desktops was almost 100 minutes. In the 100% Pre-Boot scenario it was just 20 to 25 minutes. This is because 80% of the desktops had to be booted before logon and this resulted in increased logon duration. In this scenario, a user will experience considerably longer login duration. The read IOPS on storage was low due to streaming of the disk image from provisioning servers.
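The login-duration difference between the two pre-boot scenarios can be expressed as an effective login rate; a quick sketch using the durations reported above (the function name is illustrative):

```python
# Effective login throughput in the two pre-boot scenarios.
def logins_per_minute(num_desktops: int, duration_min: float) -> float:
    return num_desktops / duration_min

print(round(logins_per_minute(630, 25), 1))   # 25.2 (100% pre-boot, 25 min)
print(round(logins_per_minute(630, 100), 1))  # 6.3  (20% pre-boot, 100 min)
```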
Figure 12 Total IOPS on the EqualLogic pool

We verified that system resource utilization at the ESXi layer and the Citrix infrastructure layer was within the acceptable limits defined in section 4.4. More than 94% of the desktops were reported as "GOOD" performing desktops by Stratusphere UX. The Stratusphere UX scatter plot is shown in Appendix D. The test results for both the one-array and two-array configurations are summarized in Table 5.
Figure 13 IOPS on one of the EqualLogic arrays during login storm – 1270 VMs

Note: The SAN HQ chart above shows the peak of 6200 IOPS during the login storm on one of the array members. Almost the same number of IOPS was observed on the other EqualLogic member. The storage array disk latency never went beyond 5 to 6 ms even at the peak I/O during the login storm. We observed an increase in CPU and network resource utilization at peak login, but it was well within the acceptable limits.

5.
Figure 14 Total IOPS and average latency on VDI Pool All 1270 desktops were booted within 10 to 12 minutes. The storage arrays were able to handle this spike in IOPS and the average latency remained below 10 ms through the test. As shown in Figure 14, the boot storm consisted of both read and write I/O. Read I/O is a smaller proportion compared to the write I/O due to the disk image streaming from provisioning servers.
Figure 15 Read and write IOPS on an individual volume We did not observe any bottlenecks on the ESXi servers with respect to CPU or memory resources during the boot operation. The peak CPU utilization observed was around 60% during the boot process. The system resources on the Citrix Infrastructure ESXi servers hosting provisioning servers and DDCs were hardly used during the boot storm. The CPU resource utilization on each of the Citrix Infrastructure ESXi servers while booting VMs was less than 10%.
• None of the system resources on the ESXi servers hosting virtual desktops reached maximum utilization levels at any point in time.
• In the 100% Pre-boot scenario, the login completed within 20 to 25 minutes. In the 20% Pre-boot scenario, the login duration increased to 100 minutes.
• During boot storm simulation, we observed nearly 15000 IOPS on the two-array configuration.
• The Stratusphere UX tool placed 100% of the virtual desktops in the "GOOD" category on the one-array configuration.
6 Sizing guidelines for EqualLogic SANs

Virtual desktop usage in enterprise environments follows predictable I/O patterns. For example, at the beginning of the workday most employees log in to their desktops within a relatively short timeframe. After the login storm, periods of high and low steady-state application activity will occur; for example, high user activity on workstations during morning and afternoon hours and low activity during break hours would be expected.
For more information regarding the different Provisioning Services vDisk modes, please refer to the following link: http://support.citrix.com/proddocs/topic/provisioning-56/pvs-technology-overviewvDisk-modes.html 6.1.2 Write cache capacity requirements The write cache is a temporary disk on each virtual desktop which contains the modified blocks from each user session. The write cache file is deleted during each reboot cycle.
                               One array   Two arrays
Peak IOPS during boot storm        10000        15000
Peak IOPS during steady state       4500         9000
Capacity buffer                      15%          18%

Based on our test results, the I/O characteristics observed for the task worker type of workload profile during the lifecycle of a virtual desktop can be summarized as shown in Table 7.

Table 7 I/O characteristics during the lifecycle of a VM

Type of I/O     IOPS per desktop   Read/Write ratio
Steady state    7-8                0.2% / 99.8%
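The per-desktop steady-state figure in Table 7 is consistent with the aggregate numbers observed in testing; a quick check (the function name is illustrative):

```python
# Aggregate steady-state IOPS from the per-desktop figure in Table 7.
IOPS_PER_DESKTOP = (7, 8)  # task worker range

def steady_state_iops(num_desktops: int):
    lo, hi = IOPS_PER_DESKTOP
    return num_desktops * lo, num_desktops * hi

print(steady_state_iops(630))   # (4410, 5040) -- near the ~4500 observed
print(steady_state_iops(1270))  # (8890, 10160) -- near the ~9000 observed
```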
In this formula, we use 15% headroom for the array capacity and 25% headroom for the VM RAM and the write cache allocated on each virtual desktop. The following table describes the formula components and provides a sample calculation.

Table 8 Formula components

Variable / Description:
• NumVM – Number of virtual desktops to be deployed.
• Amount of RAM allocated on each virtual desktop.
• Size of the master OS disk image.
• Number of vDisks.
• Size of the write cache.
• Total capacity of the PS6010XVS array in GB.
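The capacity calculation can be sketched in code. This is a reconstruction under the stated headroom assumptions (25% on per-VM RAM and write cache, 15% on array capacity), not the paper's exact formula; the function and variable names are illustrative.

```python
import math

def arrays_needed(num_vm, ram_gb, write_cache_gb, vdisk_size_gb,
                  num_vdisks, array_capacity_gb):
    """Reconstructed capacity sizing: per-VM write cache and RAM (swap)
    carry 25% headroom, and only 85% of raw array capacity is treated
    as usable (15% headroom)."""
    per_vm_gb = (write_cache_gb + ram_gb) * 1.25
    total_gb = num_vm * per_vm_gb + num_vdisks * vdisk_size_gb
    usable_per_array = array_capacity_gb * 0.85
    return math.ceil(total_gb / usable_per_array)

# Single-array test configuration: 630 desktops, 1 GB RAM, 2 GB write
# cache, one 150 GB vDisk volume, 2480 GB raw array capacity.
print(arrays_needed(630, 1, 2, 150, 1, 2480))  # 2
```

Note that with the full headroom applied, the formula is deliberately more conservative than the raw capacity actually consumed in the single-array test (which used 84.8% of the array).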
7 Best practices

7.1 Desktop profiles and I/O storms

7.1.1 Implement roaming profiles and folder redirection
Using a separate storage array and redirecting user profiles and folders to a file server on that array is highly recommended for better management and increased performance of VDI environments. Implementing roaming profiles and folder redirection reduced the performance impact during user logon and also allowed user data to persist across reboots.
7.3 Network configuration
The key recommendations related to network design are listed below.

• We recommend using at least two dedicated physical NICs per server for each of the networks listed below:
  o VDI network
  o Management, Infrastructure, and vMotion network
  o iSCSI SAN
• Use VLANs to segregate different types of network traffic such as the Management network, Infrastructure network, and the vMotion network. This helps in improving manageability, performance, and security.
7.4.2 Write cache recommendations Storing write cache on the target devices (virtual desktops) is recommended for better performance because the file creation is local to the target and not streamed to any servers, which avoids additional network traffic. Restarting virtual desktops clears the cache file and it is highly recommended to restart the desktops at least once per day or more frequently if possible.
Based on our test results, we recommend adding additional EqualLogic array members to the same pool to scale the number of virtual desktops. We also recommend keeping at least 15% capacity headroom on the storage array to accommodate any future requirements and for optimal array performance.
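The 15% free-capacity recommendation is easy to check against figures reported by the array; a minimal sketch (the function name is illustrative, not a Dell API):

```python
# Check the 15% free-capacity headroom recommendation for a pool.
def meets_headroom(total_gb: float, free_gb: float,
                   min_free_pct: float = 15.0) -> bool:
    """True when the pool keeps at least min_free_pct capacity free."""
    return 100.0 * free_gb / total_gb >= min_free_pct

# Figures from the single-array configuration in Table 2:
print(meets_headroom(2480.0, 384.87))  # True (about 15.5% free)
```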
8 Conclusions

The lifecycle of a virtual desktop in a VDI environment includes many stages, such as boot, login, steady state, and logoff. Each of these stages generates a different amount of IOPS and can have a significant impact on the underlying storage. The test configuration used in our tests supported up to 630 desktops with one array and 1270 desktops with two arrays, and no bottlenecks related to storage performance were observed.
Appendix A Citrix XenDesktop solution configurations

Solution configuration - Hardware components

Virtual Desktops
Provisioning Services Servers and Desktop Delivery Controllers (DDC)
File server

• 16 x Dell PowerEdge M610 Servers:
  o ESXi 4.1
  o BIOS Version: 3.0.0
  o 2 x Quad Core Intel® Xeon® X5687 3.

Citrix XenDesktop Infrastructure Servers
XenDesktop VDI clients (Login VSI Launchers)
VDI Workload Generator
Network

• 1 x Dell PowerEdge R710 server:
  o ESXi 4.1
  o BIOS Version: 3.0.0
  o 2 x Quad Core Intel® Xeon® X5690 3.46 Ghz Processors
  o 96 GB RAM
  o 2 x 146 GB 10K SAS internal disk drives
  o 1 x Quad-port Broadcom 5709 1 GbE NIC (LAN on motherboard)
  o 2 x Broadcom NetXtreme II 57711 10 GbE NIC, Dual-Port
• 3 x Dell PowerEdge R710 servers:
  o ESXi 4.
  o Firmware: 5.1.1 (R189834) (H2)
• 1 x Dell EqualLogic PS6510E:
  o 48 x 1 TB 7.2K SATA disks
  o Dual 2-port 10 GbE controllers
  o Firmware: 5.1.1 (R189834) (H2)

Performance Monitoring
• SAN HeadQuarters – 2.1.
Appendix B Folder redirection and roaming profiles

User settings and user files are typically stored in the local user profile, under the Users folder. These files usually reside on the local hard drive, which makes it difficult for users who use more than one computer to work with their data and synchronize settings between multiple computers.
Appendix C Network design and VLAN configuration

VDI Network
The servers listed below are part of the VDI network:
• The M610 blade servers hosting virtual desktops
• The ESXi servers hosting PVS and DDC
• The ESXi server used for hosting the file server and presenting the CIFS share
• The ESXi servers used for launching the Login VSI workload

The network architecture block diagram for this VLAN from the blade servers is shown in Figure 16.
Figure 17 Infrastructure, Management, and vMotion VLAN connectivity

The infrastructure network was divided into three VLANs:
• Infrastructure VLAN – The infrastructure server which hosts Citrix Web Interface, Active Directory, Microsoft SQL Server, License server, and VMware vCenter was configured with a dedicated VLAN.
• Management VLAN – A dedicated VLAN was configured for management traffic of all the ESXi servers and virtual machines shown in Figure 17.
Appendix D Liquidware Labs Stratusphere UX

The Stratusphere UX scatter plot for the single array test (630 virtual desktops) is shown in Figure 18.

Figure 18 Stratusphere UX scatter plot for the single array test

Note: Only 626 VMs were deployed in this test because Login VSI failed to launch the workload on four VMs. The Stratusphere UX scatter plot for the two array test (1270 virtual desktops) is shown in Figure 19.
Figure 19 Stratusphere UX Scatter plot for the two array test Note: The performance was monitored on 1200 virtual desktops. This shows more than 94% of the desktops performing as “GOOD” and the remaining desktops showed “FAIR” performance. None of the desktops showed “POOR” performance.
Related publications

The following Dell publications are referenced in this document or are recommended sources for additional information.
• PS Series Array Network Performance Guidelines: http://www.equallogic.com/resourcecenter/assetview.aspx?id=5229

The following Citrix publications are referenced in this document or are recommended sources for additional information.
• XenDesktop Planning Guide: Image Delivery: http://support.citrix.