Scalability Factors for VSS and Exchange 2003 32-bit OS
This article describes the scalability factors and limitations, identified by Microsoft, of Volume Shadow Copy Service (VSS) when used with Exchange 2003 on a 32-bit operating system (OS), and provides resources and recommendations for resolving them.
Microsoft has identified resource challenges for Exchange 2003 on a 32-bit Windows OS when Microsoft Volume Shadow Copy Service (VSS) is creating or holding open a volume snapshot. Because Replay uses VSS, the same constraints apply when a Replay backup runs. Microsoft notes the limited paged/non-paged pool memory available on a 32-bit system, and Microsoft TechNet covers the topic in a document titled “Scalability Factors for Shadow Copies.” There are several ways to reduce resource usage; they are covered in the articles referenced here and in the body of this document.
32-bit Resource Limitation
Search Microsoft TechNet for the following article, which describes the constraints in this environment and some possible ways to reduce resource consumption:
“Scalability Factors for Shadow Copies”
Microsoft Support noted some of the symptoms of resource exhaustion in the following article concerning the impact on IIS:
“Users receive a “The page cannot be displayed” error message, and “Connections_refused” entries are logged in the Httperr.log file on a server that is running Windows Server 2003, Exchange 2003, and IIS 6.0”
The above article identifies an error to look for in the IIS Httperr.log file that indicates there are no more connections available. There may be multiple occurrences of “Connections_refused” entries in the log.
If Exchange is refusing connections, this is an indicator that pool memory has been exhausted.
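The log check described above can be sketched in a few lines of Python. This is a minimal illustration, not a supported tool: the function names are hypothetical, the log location is left to the caller because it varies by system, and the only string it relies on is the “Connections_refused” marker named in the Microsoft article title.

```python
from pathlib import Path
from typing import Iterable


def count_refused(lines: Iterable[str], marker: str = "Connections_refused") -> int:
    """Count log lines that contain the marker string (case-insensitive)."""
    needle = marker.lower()
    return sum(1 for line in lines if needle in line.lower())


def count_refused_in_file(log_path: str) -> int:
    """Open an Httperr log file (path supplied by the caller) and count matches."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as handle:
        return count_refused(handle)
```

A non-zero count is only a hint that connections were refused; it does not by itself prove that pool memory was the cause.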
Unfortunately, this condition is difficult to predict or detect without pushing the system into the conditions that may cause an Exchange outage. To determine whether there is a scalability issue, perform any of the following procedures.
If you determine that there is a scalability issue, follow the recommendations provided.
Running Microsoft's vshadow tool, without using Replay, can help determine the load that VSS generates when a volume shadow copy is created. Vshadow is a sample application for developers that Microsoft provides in its VSS Software Developer’s Kit. Replay installs a copy of this tool that allows the shadow copy it creates to persist (meaning that once the volume shadow copy is created, it remains open for a configurable amount of time).
To run vshadow:
- Locate Vshadow.exe in the Utils folder of the Replay Agent.
- At the command prompt, use vshadow to snap all the volumes during production hours while monitoring memory usage.
- When running the tool, use the “p” option to make the volume shadow copy persistent.
- Let it exist for about 5-15 minutes to simulate the time it may be open to perform a large transfer of data to the Replay Core.
- Optionally, you can run it in a loop during peak hours for a short period of time to simulate frequent snapshots.
If the system is short on resources, this test risks compromising Exchange connectivity if the paged/non-paged pool memory is exhausted. However, it is a simple way to verify whether VSS will be problematic when implementing an ongoing backup solution that involves VSS.
- If there is a failure, then any application that leverages Microsoft VSS may hit the same failure. Follow the recommendations provided in this article to improve scalability, and then retry these tests. More information about Microsoft vshadow is located here: http://msdn.microsoft.com/en-us/library/bb530725(VS.85).aspx.
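The vshadow steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the “p” option is written as the `-p` switch, the path to Vshadow.exe is supplied by the caller, and the function names are hypothetical. Holding the shadow copy open for 5-15 minutes and cleaning it up afterwards are not shown and should follow the vshadow documentation.

```python
import subprocess
from typing import List, Sequence


def build_vshadow_command(vshadow_exe: str, volumes: Sequence[str]) -> List[str]:
    """Build the command line for a persistent shadow copy of the given volumes."""
    if not volumes:
        raise ValueError("at least one volume is required")
    # Assumption: "-p" is the persistent option described in this article.
    return [vshadow_exe, "-p", *volumes]


def create_persistent_shadow_copy(vshadow_exe: str, volumes: Sequence[str]) -> int:
    """Run vshadow and return its exit code (0 is assumed to mean success)."""
    command = build_vshadow_command(vshadow_exe, volumes)
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

For example, `build_vshadow_command("Vshadow.exe", ["D:", "E:"])` yields the argument list for snapping D: and E: together. Building the command separately from running it makes it easy to review exactly what will execute on a production Exchange server.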
Monitoring Pool Usage
You can monitor pool usage in various ways prior to implementation of a VSS backup strategy.
To monitor pool usage:
- During peak production hours of the Exchange Server, monitor pool usage to establish a baseline of the memory required for Exchange operations.
- During off hours, use Replay or vshadow to take a snapshot or create a volume shadow copy in order to determine how much memory this operation consumes.
- Compare the results and add the two figures together to estimate where usage may land if the two operations were performed at the same time.
- Use Memory Pool Monitor (Poolmon.exe) from the Microsoft Windows Resource Kit or Pooltag (which has a front end GUI) to monitor the memory usage.
- A description of Poolmon can be found on the Microsoft Support page by searching for “How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks.” The application can be obtained by downloading the Windows XP Service Pack 2 Support Tools from the Microsoft Download Center.
- Pooltag is an OSR alternative to Poolmon and can be downloaded from www.osronline.com.
- Monitor memory usage when vshadow creates a persistent volume shadow copy, and then again when the Replay Agent creates a persistent volume shadow copy by snapping the volume.
This shows the additional overhead of Replay compared with vshadow, since both call down to Microsoft VSS to create the volume shadow copy.
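The comparison in the steps above is simple arithmetic, sketched here in Python. All names are hypothetical, the figures would come from Poolmon or Pooltag readings taken by hand, and the pool limits are parameters the caller must supply for the specific server because this article does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSample:
    """One reading of pool usage, in bytes."""
    paged_bytes: int
    nonpaged_bytes: int


def snapshot_overhead(idle: PoolSample, during_snapshot: PoolSample) -> PoolSample:
    """Extra pool memory used while a shadow copy is held open (off hours)."""
    return PoolSample(
        max(0, during_snapshot.paged_bytes - idle.paged_bytes),
        max(0, during_snapshot.nonpaged_bytes - idle.nonpaged_bytes),
    )


def projected_usage(peak: PoolSample, overhead: PoolSample) -> PoolSample:
    """Sum the peak-hours baseline and the snapshot overhead."""
    return PoolSample(
        peak.paged_bytes + overhead.paged_bytes,
        peak.nonpaged_bytes + overhead.nonpaged_bytes,
    )


def exceeds_limits(projected: PoolSample, paged_limit: int, nonpaged_limit: int) -> bool:
    """True if either projected figure is above the caller-supplied limit."""
    return (projected.paged_bytes > paged_limit
            or projected.nonpaged_bytes > nonpaged_limit)
```

The sum is only an estimate: actual usage when a snapshot runs during peak hours may be higher or lower than the two measurements added together.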
The best recommendation is to move to a 64-bit operating system, which does not have the restrictive paged/non-paged pool memory limitation found in a 32-bit operating system. This may be a good opportunity to provision a new 64-bit Windows 2008/R2 OS and upgrade Exchange to 2007 or 2010. Even migrating the existing Exchange 2003 server to a Windows 2003 64-bit server can resolve the issue. It is also important to note that Microsoft states that VSS is the method used for protecting Exchange 2010; backing up at the mailbox level is not the desired method for Exchange 2010. Given this direction, it may also be a good time to evaluate upgrading the OS or Exchange version instead of investing in non-VSS alternatives as a backup solution.
If migrating to a 64-bit operating system is not an option, refer to the following suggested workarounds.
Before you begin, adhere to the following prerequisites:
- If you need to stay on the 32-bit OS, even for the short term, verify that all the required Microsoft hot-fixes and service packs are installed.
- Install the hot-fixes noted in the following articles on Microsoft Support:
- 940349 – “Availability of a Volume Shadow Copy Service (VSS) update rollup package for Windows Server 2003 to resolve some VSS snapshot issues”
- 916841 – “The Exchange Information Store Service stops responding when the user tries to back up Exchange Server 2003 by using a third-party program”
If Replay is in place, the configuration changes noted below may provide immediate relief:
- Only snap during non-peak hours. This may be managed through the Backup Window feature of Replay to pause snaps during a specific time range during the day.
- Consider taking snaps less frequently. Each time a snap is created, a Mountability Check occurs (default setting). This check mounts the Recovery Point, which is memory intensive.
- Reduce the scale of the issue by breaking up the protection groups so that fewer volumes are snapped simultaneously. This reduces the memory required to create the volume shadow copy.
- Configure each mailstore's database, log files, and system files so that they do not share volumes with another mailstore’s database, log, or system files.
- After the files are split, retest creating a volume shadow copy with vshadow, running it only against the remaining dependent volume(s). For example, if there is a mailstore database on each of D:, E:, F:, and G:, with the log files for all mailstores on L:, then all five drives are within the same protection group and must be snapped together. However, if the log files for D: and E: are moved to drive M:, then there are two different sets of dependent volumes: D:, E:, and M: as set one, and F:, G:, and L: as set two. The volume shadow copy then needs to be created for only 3 volumes at once instead of 5.
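The dependent-volume example above can be worked through with a small Python sketch. It assumes, as the example implies, that any volumes holding files for the same mailstore must be snapped together, and it groups them with a basic union-find. The function name and the (database volume, log volume) input format are hypothetical; this is not how Replay itself computes protection groups.

```python
from typing import Dict, Iterable, List, Tuple


def dependent_volume_sets(stores: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group volumes that must be snapped together.

    stores: one (database_volume, log_volume) pair per mailstore.
    Returns sorted lists of volumes, one list per dependent set.
    """
    parent: Dict[str, str] = {}

    def find(volume: str) -> str:
        # Register unseen volumes, then walk up to the set's root.
        parent.setdefault(volume, volume)
        while parent[volume] != volume:
            parent[volume] = parent[parent[volume]]  # path halving
            volume = parent[volume]
        return volume

    for db_volume, log_volume in stores:
        root_a, root_b = find(db_volume), find(log_volume)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two sets

    groups: Dict[str, List[str]] = {}
    for volume in list(parent):
        groups.setdefault(find(volume), []).append(volume)
    return sorted(sorted(group) for group in groups.values())


# All logs on L: gives one set; moving the D: and E: logs to M: gives two.
before = dependent_volume_sets([("D:", "L:"), ("E:", "L:"), ("F:", "L:"), ("G:", "L:")])
after = dependent_volume_sets([("D:", "M:"), ("E:", "M:"), ("F:", "L:"), ("G:", "L:")])
print(before)  # [['D:', 'E:', 'F:', 'G:', 'L:']]
print(after)   # [['D:', 'E:', 'M:'], ['F:', 'G:', 'L:']]
```

The size of the largest returned list is the number of volumes that would be snapped at once, which is the figure to keep small.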
CAUTION: When mountability checks are disabled, the snapshots are not validated for database integrity. This does not invalidate the backup, but if there is an issue, there may be no indication of a problem. If you use this workaround, run the nightly jobs every night instead of weekly, and perform detailed integrity checks before log truncation.
“Windows Small Business Server Best Practices Analyzer” (http://www.appassure.com/support/KB/4130334/)
NOTE: Some Exchange 2003 systems may recommend using this to boost Exchange performance. If this is the case, there may be a trade-off involved in considering this option.