Tuesday, 11 November 2014

Hidden Behind the Load Balancer

Of course you will have been working hard to place your Exchange CAS arrays and webmail services behind a decent load balancer with decent health checks. (BTW, you are only as good as your health checks!)

Nevertheless, even in the best designed solutions there are always cracks: things that are missed or, worse, things that kinda fail but kinda still work. And of course users speak first to colleagues, neighbours, Twitter and Facebook, and only then to the IT service desk.

It’s always helpful to confirm which back-end Client Access server the user is currently connecting to. Adding this information to an existing ticket helps confirm whether you do indeed have an issue with a single server in your farm.

From the Outlook client you can view a number of connection settings via the Connection Status window. (http://technet.microsoft.com/en-us/library/bb123650(v=exchg.65).aspx)

This is helpful for some things, such as confirming which type of connection is being used or how many connections have failed, but it does not tell you which Client Access server the user is actually hitting behind the load balancer.

OWA is a little better. We have to love the “about” button which cuts straight to the chase and provides details for which Exchange Client Access server is in use and even what roles are used by this server.

Better still is being able to see all of a user's connections in one go. To do this you can use the following:

Get-LogonStatistics -id UserID | select clientname, servername, username, applicationid | ft

I was disappointed to hear last year at TechEd that this cmdlet is no longer going to be supported or used in Exchange 2013. Microsoft have been asking for business cases for some time, but clearly it lost. So what can we do from here?

Well, moving forward you can always trace the IIS and RPC logs for user connections. Here's a script I created a while back. It's a simple way to track users, and it can be used to find anything in the logs, such as throttled users.

# UNC paths to the IIS log folder on each Client Access server
$PathCAS001 = '\\cas001\C$\inetpub\logs\LogFiles\W3SVC1'
$PathCAS002 = '\\cas002\C$\inetpub\logs\LogFiles\W3SVC1'
$Getdays    = 1   # how many days back to search
$outputpath = 'C:\Scripts\IISfind\Output'

$Date      = Get-Date
$DateShort = $Date.ToString('yyyyMMdd')
$Cutoff    = $Date.AddDays(-$Getdays)
$OutFile   = "$outputpath\$($DateShort)IISFind_Output.txt"

$SearchValue = Read-Host "Please enter a value to search the IIS logs (e.g. UserID)"

# For each server: write a header, then append every matching line from recently written logs
Add-Content -Path $OutFile -Value "$Date --------------- CAS001"
Get-Item -Path "$PathCAS001\*" | Where-Object { $_.LastWriteTime -gt $Cutoff } | Get-Content | Select-String $SearchValue | Add-Content -Path $OutFile
Add-Content -Path $OutFile -Value "$Date --------------- CAS002"
Get-Item -Path "$PathCAS002\*" | Where-Object { $_.LastWriteTime -gt $Cutoff } | Get-Content | Select-String $SearchValue | Add-Content -Path $OutFile

Another helpful tool is Log Parser (http://technet.microsoft.com/en-au/scriptcenter/dd919274.aspx), which helps with tracing these logs.

For RPC searches just change the source directory to the RPC Log directory and rerun the script across all your Client Access Servers.
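
If you find yourself swapping the source directory often, the same search can be wrapped in a function that takes the log folders as a parameter. This is only a rough sketch: the function name Search-CasLog and its parameters are made up for illustration, it assumes the log files end in .log, and the RPC path in the usage comment assumes a default Exchange 2010 install, so check it against your own servers.

```powershell
# Hypothetical helper (not part of the original script): search recently written
# log files in one or more folders and return the lines that contain a value.
function Search-CasLog {
    param(
        [Parameter(Mandatory = $true)] [string[]] $LogPath,      # one log folder per CAS
        [Parameter(Mandatory = $true)] [string]   $SearchValue,  # e.g. a user ID
        [int] $Days = 1                                          # how many days back to look
    )

    $cutoff = (Get-Date).AddDays(-$Days)

    foreach ($path in $LogPath) {
        Get-ChildItem -Path $path -Filter '*.log' |
            Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime -gt $cutoff } |
            Select-String -Pattern $SearchValue -SimpleMatch |   # literal match, not a regex
            Select-Object Path, LineNumber, Line
    }
}

# Example usage (paths are assumptions; adjust to your environment):
# $iis = '\\cas001\C$\inetpub\logs\LogFiles\W3SVC1', '\\cas002\C$\inetpub\logs\LogFiles\W3SVC1'
# Search-CasLog -LogPath $iis -SearchValue 'UserID' | Format-Table -AutoSize
#
# $rpc = '\\cas001\C$\Program Files\Microsoft\Exchange Server\V14\Logging\RPC Client Access'
# Search-CasLog -LogPath $rpc -SearchValue 'UserID' -Days 2
```

Because each result carries the Path of the log file it came from, you can see at a glance which server a match belongs to without writing separate header lines.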

I spoke to Scott Schnoll about this a while back. Here is the case I made for Get-LogonStatistics.

Subject: Case Study: Get-LogonStatistics

Hi Scott,

We had a quick chat on Friday and I said I would forward you a case we encountered which required the use of the Get-LogonStatistics cmdlet.

User Issue: Users were experiencing bad Outlook performance to a single datacentre, with numerous connects and disconnects throughout a single day.

Core Issue: After a recent Junos firewall upgrade in the core, we found that the firewall had enabled the ALG for RPC traffic, which was dropping active RPC connections to this datacentre. This resulted in Outlook not responding and then making a new connection to the CAS server. These new connections in turn meant that users were starting to breach their throttling limits. MS Support recommended increasing the throttling limits, which in reality made no difference to Outlook performance.

[KB18141] - Microsoft Services are unavailable after upgrade to Junos 10.1 and later versions
MS-RPC ALG is available and enabled by default on SRX-Branch and J-Series platforms running Junos 10.0 and later. However beginning with Junos 10.1, the MS-RPC ALG was added for SRX-HE platforms and enabled by default. This may cause issues with Microsoft traffic such as Exchange and Active Directory (refer to PSN-2010-08-912).

Overview: Get-LogonStatistics helped with this issue because I could directly compare the client and server views, and confirm that the CAS server thought it was still holding an active connection when in fact the connection had been closed by the firewall.