New Pool HA Features in Lync2013 27 Nov 2012

Lync2013 introduces some exciting new disaster recovery abilities where two or more Front End pools across two geographically dispersed sites can be "Paired" together for redundant failover. Each site can contain a Front End pool which is paired with a corresponding Front End pool in the other site. Both sites are active, and the new Lync2013 Server Backup Service provides real-time data replication to keep the pools synchronized. The Backup Service is a new feature in Lync Server 2013, designed to support the disaster recovery solution.
It is installed on a Front End pool when you pair the pool with another Front End pool. There is no restriction on the distance between the two data centers that host the paired Front End pools, but high-speed links between them are recommended.
If the pool in one site fails, you can fail over the users from that pool to the pool in the other site, which then provides services to all the users in both pools. In addition to providing disaster recovery ability, two paired pools serve as the backup Registrars for each other. In Lync Server 2013, backup Registrar relationships between Front End pools are always 1:1 and reciprocal. This is a change from Lync Server 2010, in which Front End pool backup relationships could be many to one. Even though backup relationships between two Front End pools must be 1:1 and symmetrical, each Front End pool can still also be the backup registrar for any number of Survivable Branch Appliances, just as in Lync Server 2010.
This article explores in detail how to set up this pairing relationship and puts the failover capabilities to the test in a lab environment.
Recap: Lync2010 Pool Failover
To recap, Lync Server 2010 did provide primary and backup Registrar pools, but this was primarily to assure voice resiliency in the event of a central site failure. The primary Registrar pool must have a single designated backup Registrar pool located at another site (a backup pool in the same site is also allowed, but it serves no real DR purpose). The backup can be configured by using the Topology Builder resiliency settings. Assuming a resilient WAN link between the two sites, users whose primary Registrar pool is no longer available are automatically directed to the backup Registrar pool. The picture below shows a Lync 2010 client in resiliency mode, where its primary pool is down and it has registered with the backup pool:
In this mode, most voice-related functionality is still available, such as:
- PSTN inbound and outbound calls (carrier provided)
- Intra-site and inter-site calls
- Hold, Retrieve, Transfer
- 2-party intra-site IM and Audio/Video
- Call Detail Records (CDR)
- Call Forwarding, Simultaneous Ringing, Delegation, Team-call
- Join conferences scheduled by users homed on another pool
What's not available:
- Schedule IM, A/V & Web Conferences
- Presence and Do Not Disturb (DND) based routing
- Updating Call Forwarding settings
- Response Group Service & Call Park
- Voicemail Deposit (Redirect to Exchange UM in the DC)
- Voicemail Retrieve (through PSTN)
An excellent blog post has been written regarding this at this blog.
Lync2013 Pool Failover/Failback time objectives
For pool failover and pool failback, the engineering target for the recovery time objective (RTO) is 15 minutes: the time required for the failover to happen after administrators have determined that there was a disaster and initiated failover procedures. The engineering target for the recovery point objective (RPO) is also 15 minutes: the amount of data, measured in time, that could be lost in the disaster because of the replication latency of the Backup Service. All RTO and RPO numbers assume that the two data centers are located within the same world region with high-speed, low-latency transport between the two sites.
Setting up Lync2013 Pool Pairing
To set up Lync2013 Pool Pairing, we need to have two Lync2013 pools. If you haven't already installed Lync2013, check out my previous blog post here. In this article, I have already installed two Standard Edition FE pools as shown in the Topology Builder below:
Configuring the two FE pools above to be backups for each other is relatively straightforward. From the Topology Builder, right-click one of the FE servers (any one will do) and select "Edit Properties...". In the properties window, select the "Resiliency" tab on the left. Then select the "Associated backup pool" checkbox and, from the drop-down list, select the other FE server as shown below:
Note that the default intervals are 300s and 600s. For this test, we will set both to 30s. Go ahead and select the "Automatic failover and failback for Voice:" checkbox and enter 30 in both boxes as shown below:
Once the first FE server is configured, the other paired FE server is automatically configured with the first FE server as its backup registrar, thereby creating a 1:1 reciprocal relationship. We can verify this by looking at the properties of the other FE server. Proceed to publish the topology and verify that it was successful. Following this, we need to run local setup on each of the Front End Servers and then restart the servers just to be sure. Finally, start the CS Backup Service by running the PowerShell cmdlet Invoke-CsBackupServiceSync on one of the FE servers. If you encounter a WCF permissions error when running this cmdlet, add yourself to the "RTCUniversalServerAdmins" group, log in to Windows again and rerun the cmdlet. A successful result looks like the screen capture below:
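If the permissions error mentioned above appears, the group membership can also be fixed from PowerShell. The following is only a sketch: it assumes the ActiveDirectory module (part of the Remote Server Administration Tools) is installed, and the account name "lyncadmin" is a hypothetical placeholder that does not come from this lab.

```powershell
# Assumption: the ActiveDirectory module (RSAT) is available on this machine.
Import-Module ActiveDirectory

# "lyncadmin" is a hypothetical account name; replace it with your own admin account.
Add-ADGroupMember -Identity "RTCUniversalServerAdmins" -Members "lyncadmin"

# Log off and back on so the new group membership is picked up,
# then rerun the sync from the Lync Server Management Shell.
Invoke-CsBackupServiceSync -PoolFqdn "Lync2013.apbeta.local"
```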
Configuring and Monitoring the Backup Service
One of the key components of the Pool HA capability is the Backup Service, which runs automatically on Front End pools that are in a paired relationship with another pool. The Backup Service ensures that all the necessary information in one pool is replicated to its paired pool. In the event of failover, the surviving pool will contain the database information required to host users from the failed pool.
The following new PowerShell commands are for managing the Backup Service:
Get-CsBackupServiceConfiguration (Default sync interval is 2 minutes)
Set-CsBackupServiceConfiguration -SyncInterval 00:03:00
Get-CsBackupServiceStatus -PoolFqdn <pool FQDN> (Get service stats)
Get-CsPoolBackupRelationship -PoolFqdn <pool FQDN> (Who am I paired with?)
Invoke-CsBackupServiceSync -PoolFqdn <pool FQDN> [-BackupModule {All|PresenceFocus|DataConf|CMSMaster}] (Force Sync)
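As a rough illustration, these cmdlets might be used against the two lab pools in this article as follows. This is a sketch rather than a prescribed procedure: it assumes the Lync Server Management Shell is open with sufficient rights, the pool names are the lab FQDNs used later in this article, and the three-minute interval is only an example value.

```powershell
# Lab pool FQDNs from this article; substitute your own.
$primaryPool = "Lync2013.apbeta.local"
$backupPool  = "Lync15.apbeta.local"

# Confirm which pool each Front End pool is paired with.
Get-CsPoolBackupRelationship -PoolFqdn $primaryPool
Get-CsPoolBackupRelationship -PoolFqdn $backupPool

# Show the current Backup Service settings, including SyncInterval.
Get-CsBackupServiceConfiguration

# Example only: change the sync interval from the 2-minute default to 3 minutes.
Set-CsBackupServiceConfiguration -SyncInterval 00:03:00

# Force a sync of all modules from each pool, then check the status.
Invoke-CsBackupServiceSync -PoolFqdn $primaryPool -BackupModule All
Invoke-CsBackupServiceSync -PoolFqdn $backupPool -BackupModule All
Get-CsBackupServiceStatus -PoolFqdn $primaryPool
```

A shorter sync interval narrows the window of data that could be lost in a failover, at the cost of more frequent replication traffic between the two sites.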
Central Management store failover
The Central Management store contains configuration data about servers and services in a Lync 2013 deployment. It provides robust, schematized storage of the data needed to define, set up, maintain, administer, describe, and operate a Lync 2013 deployment. It also validates the data to ensure configuration consistency. Each Lync deployment includes one Central Management store, which is hosted by the Back End Server of one Front End pool. If the pool hosting the Central Management store fails over, the Central Management store is failed over as well.
When you establish a pool pairing that includes the pool hosting the Central Management store, a backup Central Management store database is set up in the backup pool, and Central Management store services are installed in both pools. At any point in time, one of the two Central Management store databases is the active master, and the other is a standby. The content is replicated by the Backup Service from the active master to the standby.
During a pool failover that involves the pools hosting the Central Management store, the administrator must fail over the Central Management store before failing over the Front End pool. After the disaster is repaired, it is not necessary to fail back the Central Management store. After repair, the Central Management store in the original backup pool can remain as the active master.
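Before and after a failover, it can help to check which pool currently hosts the active master. The following is a minimal sketch, assuming it is run from the Lync Server Management Shell; the exact fields in the output may differ between builds.

```powershell
# Report the Central Management store status, including which server
# currently holds the active master.
Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus

# Show the pool that the published topology records as hosting the CMS.
Get-CsService -CentralManagement | Select-Object PoolFqdn
```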
Simulating a Pool Failure
Before initiating a pool failover test, let's have two Lync users make a P2P video call under normal conditions. In this instance, both users are on the Lync2013 FE pool, which is configured with the Lync15 FE pool as its backup pair. The screen capture below shows the call in progress.
At this stage, let's shut down the FE server Lync2013.apbeta.local and see what happens to the Lync client. After a few minutes, the client will show that there is an outage and that limited functionality is available. Presence is no longer available, but the current call is still connected and ongoing without any issues.
Initiating a Pool Failover
It's now time to test the pool failover scenario. Because the failed pool, Lync2013.apbeta.local, hosts the Central Management store, we must fail over the Central Management store before failing over the Front End pool. To do this, we run the Invoke-CsManagementServerFailover cmdlet on the surviving FE pool, Lync15.apbeta.local:
Now that the CMS database has failed over to the other FE pool, we can initiate the actual pool failover using the Invoke-CsPoolFailover -PoolFqdn Lync2013.apbeta.local -DisasterMode cmdlet:
Once the pool failover has succeeded, the Lync clients are restored to normal functionality with presence enabled. The video call stayed connected throughout and did not drop at any time during the test:
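The failover sequence in this section can be sketched as a short script, run from the Lync Server Management Shell on the surviving pool. This is a sketch under this lab's assumptions, not a general runbook: the FQDN is the lab's failed pool, and the comment about extra parameters describes an assumption about other deployments, not something exercised in this test.

```powershell
# Failed pool in this lab; it also hosted the Central Management store.
$failedPool = "Lync2013.apbeta.local"

# Step 1: move the active Central Management store to the surviving pool.
# Assumption: in some deployments this cmdlet needs -BackupSqlServerFqdn and
# -BackupSqlInstanceName to locate the backup CMS database; check the cmdlet
# help for your environment before running it.
Invoke-CsManagementServerFailover

# Step 2: confirm that the active master has moved before continuing.
Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus

# Step 3: fail the users of the failed pool over to its paired pool.
# -DisasterMode is used because the failed pool is unreachable.
Invoke-CsPoolFailover -PoolFqdn $failedPool -DisasterMode
```

The order matters: the pool failover relies on a working Central Management store, so the CMS step comes first whenever the failed pool was the one hosting it.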
Pool Failback
Assuming the first FE pool, Lync2013, has come back online and we wish to fail users back to it, we can run the following cmdlet: Invoke-CsPoolFailback -PoolFqdn Lync2013.apbeta.local. Users do not experience any outage during the failback process. We do not need to fail back the CMS database; it is fine to leave it running on the second pool, Lync15.apbeta.local:
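A minimal sketch of the failback, again assuming the Lync Server Management Shell and this lab's pool name; the status check afterwards is an optional extra step that was not part of the original test.

```powershell
# Pool that has been repaired and brought back online in this lab.
$restoredPool = "Lync2013.apbeta.local"

# Move the users back to their original pool.
Invoke-CsPoolFailback -PoolFqdn $restoredPool

# Optional: check that the Backup Service is replicating again for this pool.
Get-CsBackupServiceStatus -PoolFqdn $restoredPool
```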
Conclusion
In conclusion, the new Pool Resiliency features in Lync2013 offer a much better user experience through Pool Pairing and its failover/failback capabilities. Add SQL Mirroring (to be discussed in a future article) and we truly have an enterprise-ready HA and DR solution in Lync2013, which will greatly benefit both users and administrators.