Add robust monitoring of Azure Virtual Desktop using Azure Monitor alerts

Azure Virtual Desktop has become a hot topic. COVID has forced the adoption of remote working like nothing else in recent memory. Here's how to monitor the environment so you can stay on top of issues with access and desktop experience before your users tell you about them.


Enable Azure Monitor for your session hosts and install the agent. Update the event logs it collects; the FSLogix alert below needs the Microsoft-FSLogix-Apps/Operational log.


Run log queries in Kusto Query Language (KQL) to extract useful information.

Set alerts based on query results.


Monitor FSLogix using the Event signal. FSLogix is the recommended profile solution for Azure Virtual Desktop: it keeps each user's profile in a container that is attached to the session host at sign-in. There are lots of considerations when using FSLogix, and I'm only going to touch on one Event ID here. Many different types of errors are logged, but they all share the same Event ID: 26.

Condition Signal: Event, filtered to the Microsoft-FSLogix-Apps/Operational log and EventID 26



Monitor for "No available resources" error.

WVDErrors
| where CodeSymbolic == "ConnectionFailedNoHealthyRdshAvailable" and Message contains "Could not find any SessionHost available in specified pool"



Monitor for Failed Connections.

WVDConnections
| where State =~ "Started" and Type =~ "WVDConnections"
| extend Multi = split(_ResourceId, "/")
| extend CState = iff(SessionHostOSVersion == "<>", "Failure", "Success")
| where CState =~ "Failure"
| project ResourceAlias, ResourceGroup = Multi[4], HostPool = Multi[8], SessionHostName, UserName, CState, CorrelationId, TimeGenerated
| join kind=leftouter (WVDErrors) on CorrelationId
| extend DurationFromLogon = datetime_diff("Second", TimeGenerated1, TimeGenerated)
| project TimeStamp = TimeGenerated, DurationFromLogon, UserName, ResourceAlias, SessionHost = SessionHostName, Source, CodeSymbolic, ErrorMessage = Message, ErrorCode = Code, ErrorSource = Source, ServiceError, CorrelationId
| order by TimeStamp desc



Monitor session hosts for available memory dropping to 1 GB (1024 MB) or less. Note that == is case-sensitive in KQL, so the counter name must match exactly.

Perf
| where ObjectName == "Memory"
| where CounterName == "Available MBytes"
| where CounterValue <= 1024



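Monitor session host memory as % Committed Bytes In Use when it reaches 80% or more. This counter is the ratio of committed memory to the commit limit (physical RAM plus page files), so a high value means the host is running short of memory and is increasingly dependent on the page file.

Signal: % Committed Bytes In Use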




Monitor for when session hosts are simply out of memory.

WVDErrors
| where CodeSymbolic == "OutOfMemory" and Message contains "The user was disconnected because the session host memory was exhausted."

Here's the JSON to deploy the whole thing.
Action Group:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "actionGroupName": {
      "type": "string",
      "metadata": {
        "description": "Unique name (within the Resource Group) for the Action group."
      }
    },
    "actionGroupShortName": {
      "type": "string",
      "metadata": {
        "description": "Short name (maximum 12 characters) for the Action group."
      }
    },
	"alertEmailAddress": {
	"type": "string",
      "metadata": {
        "description": "Should be Azure_Alerts@domain.com"
		}
  }
  },
  "resources": [
    {
      "type": "Microsoft.Insights/actionGroups",
      "apiVersion": "2019-03-01",
      "name": "[parameters('actionGroupName')]",
      "location": "Global",
      "properties": {
        "groupShortName": "[parameters('actionGroupShortName')]",
        "enabled": true,
        "emailReceivers": [
          {
            "name": "Email RR Desk",
            "emailAddress": "[parameters('alertEmailAddress')]",
            "useCommonAlertSchema": true
          }
        ]
      }
    }
  ],
  "outputs":{
      "actionGroupId":{
          "type":"string",
          "value":"[resourceId('Microsoft.Insights/actionGroups',parameters('actionGroupName'))]"
      }
  }
}
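
The template above takes three parameters, and the short name is capped at 12 characters. As a rough sketch (not part of the original deployment), a few lines of Python can build a matching parameters file and catch an over-long short name before the deployment does. The function name and the sample values are placeholders of mine; the parameter names come from the template.

```python
import json

PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"
)
MAX_SHORT_NAME = 12  # limit stated in the template's parameter description


def build_action_group_parameters(name: str, short_name: str, email: str) -> dict:
    """Build the contents of a parameters file for the action group template."""
    if len(short_name) > MAX_SHORT_NAME:
        raise ValueError(
            f"actionGroupShortName is {len(short_name)} characters; "
            f"the maximum is {MAX_SHORT_NAME}"
        )
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            "actionGroupName": {"value": name},
            "actionGroupShortName": {"value": short_name},
            "alertEmailAddress": {"value": email},
        },
    }


# Placeholder values for illustration only.
params = build_action_group_parameters(
    "AVD-Alerts", "AVDAlerts", "Azure_Alerts@domain.com"
)
print(json.dumps(params, indent=2))
```

Save the printed JSON next to the template and pass it as the parameters file when you deploy.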

Alerts in JSON:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.j
son#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "client_Name": {
       "defaultValue": "(Client Name)",
       "type": "String"
    },
    "subscription_Id": {
       "defaultValue": "/subscriptions/(subId)",
       "type": "String"
    },
    "activityLogAlerts_name1": {
       "defaultValue": "WVD Service Health Alert",
       "type": "String"
    },
    "scheduledRule_name1": {
        "defaultValue": "WVD 'No available resources'",
        "type": "String"
    },
    "scheduledRule_name2": {
        "defaultValue": "WVD Available Host Memory",
        "type": "String"
    },
    "scheduledRule_name3": {
        "defaultValue": "WVD Failed Connections",
        "type": "String"
    },
    "scheduledRule_name4": {
        "defaultValue": "WVD Error - Out of Memory",
        "type": "String"
    },
    "metricAlerts_name1": {
        "defaultValue": "WVD Pct Processor committed bytes utilization",
        "type": "String"
    },
    "metricAlerts_name2": {
        "defaultValue": "(storAcctName) Capacity Alert",
        "type": "String"
    },
    "metricAlerts_name3": {
        "defaultValue": "WVD FSLogix Errors",
        "type": "String"
    },
    "storageAcct_Region": {
        "defaultValue": "uswest2",
        "type": "String"
    },
    "storageAcct_ThresholdTB": {
        "defaultValue": "24TB",
        "type": "String"
    },
    "storageAcct_ThresholdBytes": {
        "defaultValue": "26388279066624",
        "type": "String"
    },
    "workspaces_externalId": {
        "defaultValue": "/subscriptions/(subId)/resourceGroups/(rgName)/providers/Microso
ft.OperationalInsights/workspaces/(LAWName)",
        "type": "String"
    },
    "storageAccounts_externalId": {
        "defaultValue": "/subscriptions/(subId)/resourceGroups/(rgName)/providers/Microso
ft.Storage/storageAccounts/(storAcctName)",
        "type": "String"
    },
    "actiongroups_EmailDesk_externalId": {
      "defaultValue": "/subscriptions/(subId)/resourceGroups/(rgName)/providers/microsoft
.insights/actionGroups/(actionGroupName)",
      "type": "String"
    }
  },
  "variables": {},
  "resources": [
    {
      "type": "microsoft.insights/activityLogAlerts",
      "apiVersion": "2020-10-01",
      "name": "[concat(parameters('client_Name'), '- ',parameters('activityLogAlerts_Name
1'))]",
      "location": "Global",
      "properties": {
        "scopes": [
          "[parameters('subscription_Id')]"
        ],
        "condition": {
          "allOf": [
            {
              "field": "category",
              "equals": "ServiceHealth"
            },
            {
              "field": "properties.impactedServices[*].ServiceName",
              "containsAny": [
                "Windows Virtual Desktop"
              ]
            },
            {
              "field": "properties.impactedServices[*].ImpactedRegions[*].RegionName",
              "containsAny": [
                "East US",
                "East US 2",
                "Global",
                "South Central US",
                "West US",
                "West US 2"
              ]
            }
          ]
        },
        "actions": {
          "actionGroups": [
            {
              "actionGroupId": "[parameters('actiongroups_EmailDesk_externalId')]",
              "webhookProperties": {}
            }
          ]
        },
        "enabled": true,
        "description": "[concat(parameters('client_Name'), '- ',parameters('activityLogAl
erts_Name1'))]"
      }
    },
    {
        "type": "microsoft.insights/scheduledqueryrules",
        "apiVersion": "2021-02-01-preview",
        "name": "[concat(parameters('client_Name'), '- ',parameters('scheduledRule_name1'
))]",
        "location": "eastus2",
        "properties": {
          "displayName": "[concat(parameters('client_Name'), '- ',parameters('scheduledRu
le_name1'))]",
          "description": "[concat(parameters('client_Name'), '- ',parameters('scheduledRu
le_name1'))]",
          "severity": 1,
          "enabled": true,
          "evaluationFrequency": "PT5M",
          "scopes": [
            "[parameters('workspaces_externalId')]"
          ],
          "windowSize": "PT15M",
          "criteria": {
            "allOf": [
              {
                "query": "WVDErrors\n| where CodeSymbolic == \"ConnectionFailedNoHealthyR
dshAvailable\" and Message contains \"Could not find any SessionHost available in specifi
ed pool\"\n",
                "timeAggregation": "Count",
                "operator": "GreaterThan",
                "threshold": 20,
                "failingPeriods": {
                  "numberOfEvaluationPeriods": 1,
                  "minFailingPeriodsToAlert": 1
                }
              }
            ]
          },
          "autoMitigate": false,
          "actions": {
            "actionGroups": [
              "[parameters('actiongroups_EmailDesk_externalId')]"
            ]
          }
        }
    },
    {
        "type": "microsoft.insights/metricalerts",
        "apiVersion": "2018-03-01",
        "name": "[concat(parameters('client_Name'), '- ',parameters('metricAlerts_name1')
)]",
        "location": "global",
        "properties": {
            "description": "[concat(parameters('client_Name'), '- ',parameters('metricAle
rts_name1'))]",
            "severity": 2,
            "enabled": false,
            "scopes": [
                "[parameters('workspaces_externalId')]"
            ],
            "evaluationFrequency": "PT5M",
            "windowSize": "PT5M",
            "criteria": {
                "allOf": [
                    {
                        "threshold": 80,
                        "name": "Metric1",
                        "metricNamespace": "Microsoft.OperationalInsights/workspaces",
                        "metricName": "Average_% Committed Bytes In Use",
                        "operator": "GreaterThanOrEqual",
                        "timeAggregation": "Maximum",
                        "criterionType": "StaticThresholdCriterion"
                    }
                ],
                "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriter
ia"
            },
            "autoMitigate": false,
            "targetResourceType": "Microsoft.OperationalInsights/workspaces",
            "actions": [
                {
                    "actionGroupId": "[parameters('actiongroups_EmailDesk_externalId')]",
                    "webHookProperties": {}
                }
            ]
        }
    },
    {
        "type": "microsoft.insights/metricalerts",
        "apiVersion": "2018-03-01",
        "name": "[concat(parameters('client_Name'), '- ',parameters('metricAlerts_name2')
)]",
        "location": "global",
        "properties": {
            "description": "[concat(parameters('metricAlerts_name2'), '- ',parameters('st
orageAcct_ThresholdTB'), ' ',parameters('metricAlerts_name2'))]",
            "severity": 1,
            "enabled": true,
            "scopes": [
                "[concat(parameters('storageAccounts_externalId'), '/fileServices/default
')]"
            ],
            "evaluationFrequency": "PT5M",
            "windowSize": "PT1H",
            "criteria": {
                "allOf": [
                    {
                        "threshold": "[parameters('storageAcct_ThresholdBytes')]",
                        "name": "Metric1",
                        "metricNamespace": "microsoft.storage/storageaccounts/fileservice
s",
                        "metricName": "FileCapacity",
                        "dimensions": [
                            {
                                "name": "FileShare",
                                "operator": "Include",
                                "values": [
                                    "fshare"
                                ]
                            }
                        ],
                        "operator": "GreaterThanOrEqual",
                        "timeAggregation": "Average",
                        "criterionType": "StaticThresholdCriterion"
                    }
                ],
                "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriter
ia"
            },
            "autoMitigate": false,
            "targetResourceType": "Microsoft.Storage/storageAccounts/fileServices",
            "targetResourceRegion": "[parameters('storageAcct_Region')]",
            "actions": [
                  {
                    "actionGroupId": "[parameters('actiongroups_EmailRRDesk_externalId')]
",
                    "webhookProperties": {}
                  }
            ]
        }
    },
    {
        "type": "microsoft.insights/scheduledqueryrules",
        "apiVersion": "2021-02-01-preview",
        "name": "[concat(parameters('client_Name'), '- ',parameters('scheduledRule_name2'
))]",
        "location": "eastus2",
        "properties": {
          "displayName": "[concat(parameters('client_Name'), '- ',parameters('scheduledRu
le_name2'))]",
          "description": "[concat(parameters('scheduledRule_name2'), ' below 1024(1GB)')]
",
          "severity": 2,
          "enabled": true,
          "evaluationFrequency": "PT5M",
          "scopes": [
            "[parameters('workspaces_externalId')]"
          ],
          "windowSize": "PT5M",
          "criteria": {
            "allOf": [
              {
                "query": "Perf\n| where ObjectName == \"Memory\"\n| where CounterName ==
\"Available Mbytes\"\n| where CounterValue <= 1024\n",
                "timeAggregation": "Count",
                "operator": "GreaterThanOrEqual",
                "threshold": 1,
                "failingPeriods": {
                  "numberOfEvaluationPeriods": 1,
                  "minFailingPeriodsToAlert": 1
                }
              }
            ]
          },
          "autoMitigate": false,
          "actions": {
            "actionGroups": [
              "[parameters('actiongroups_EmailDesk_externalId')]"
            ]
          }
        }
    },
    {
        "type": "microsoft.insights/scheduledqueryrules",
        "apiVersion": "2021-02-01-preview",
        "name": "[concat(parameters('client_Name'), '- ',parameters('scheduledRule_name3'
))]",
        "location": "eastus2",
        "properties": {
          "displayName": "[concat(parameters('client_Name'), '- ',parameters('scheduledRu
le_name3'))]",
          "description": "[concat(parameters('scheduledRule_name3'), ' - More than 10 fai
led connections in 15 minutes.')]",
          "severity": 2,
          "enabled": true,
          "evaluationFrequency": "PT5M",
          "scopes": [
            "[parameters('workspaces_externalId')]"
          ],
          "windowSize": "PT15M",
          "criteria": {
            "allOf": [
              {
                "query": "WVDConnections\n| where State =~ \"Started\" and Type =~\"WVDCo
nnections\"\n| extend Multi=split(_ResourceId, \"/\") | extend CState=iff(SessionHostOSVe
rsion==\"<>\",\"Failure\",\"Success\")\n| where CState =~\"Failure\"\n| order by TimeGene
rated desc\n| where State =~ \"Started\" | extend Multi=split(_ResourceId, \"/\")\n| proj
ect ResourceAlias, ResourceGroup=Multi[4], HostPool=Multi[8], SessionHostName, UserName,
CState=iff(SessionHostOSVersion==\"<>\",\"Failure\",\"Success\"), CorrelationId, TimeGene
rated\n| join kind= leftouter (WVDErrors) on CorrelationId\n| extend DurationFromLogon=da
tetime_diff(\"Second\",TimeGenerated1,TimeGenerated)\n| project  TimeStamp=TimeGenerated,
 DurationFromLogon, UserName, ResourceAlias, SessionHost=SessionHostName, Source, CodeSym
bolic, ErrorMessage=Message, ErrorCode=Code, ErrorSource=Source ,ServiceError, Correlatio
nId\n| order by TimeStamp desc\n",
                "timeAggregation": "Count",
                "operator": "GreaterThanOrEqual",
                "threshold": 10,
                "failingPeriods": {
                  "numberOfEvaluationPeriods": 1,
                  "minFailingPeriodsToAlert": 1
                }
              }
            ]
          },
          "autoMitigate": false,
          "actions": {
            "actionGroups": [
              "[parameters('actiongroups_EmailDesk_externalId')]"
            ]
          }
        }
    },
    {
        "type": "microsoft.insights/metricAlerts",
        "apiVersion": "2018-03-01",
        "name": "[concat(parameters('client_Name'), '- ',parameters('metricAlerts_name3')
)]",
        "location": "global",
        "properties": {
          "description": "[parameters('metricAlerts_name3')]",
          "severity": 1,
          "enabled": true,
          "scopes": [
            "[parameters('workspaces_externalId')]"
          ],
          "evaluationFrequency": "PT5M",
          "windowSize": "PT15M",
          "criteria": {
            "allOf": [
              {
                "threshold": 0,
                "name": "Metric1",
                "metricNamespace": "Microsoft.OperationalInsights/workspaces",
                "metricName": "Event",
                "dimensions": [
                  {
                    "name": "EventLog",
                    "operator": "Include",
                    "values": [
                      "Microsoft-FSLogix-Apps/Operational"
                    ]
                  },
                  {
                    "name": "EventID",
                    "operator": "Include",
                    "values": [
                      "26"
                    ]
                  }
                ],
                "operator": "GreaterThan",
                "timeAggregation": "Total",
                "criterionType": "StaticThresholdCriterion"
              }
            ],
            "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria"
          },
          "autoMitigate": false,
          "targetResourceType": "Microsoft.OperationalInsights/workspaces",
          "actions": [
              {
                "actionGroupId": "[parameters('actiongroups_EmailDesk_externalId')]",
                "webhookProperties": {}
              }
            ]
        }
    },
    {
        "type": "microsoft.insights/scheduledqueryrules",
        "apiVersion": "2021-02-01-preview",
        "name": "[concat(parameters('client_Name'), '- ',parameters('scheduledRule_name4'
))]",
        "location": "eastus2",
        "properties": {
          "displayName": "[concat(parameters('client_Name'), '- ',parameters('scheduledRu
le_name4'))]",
          "description": "[parameters('scheduledRule_name4')]",
          "severity": 1,
          "enabled": true,
          "evaluationFrequency": "PT5M",
          "scopes": [
            "[parameters('workspaces_externalId')]"
          ],
          "windowSize": "PT30M",
          "criteria": {
            "allOf": [
              {
                "query": "WVDErrors\n| where CodeSymbolic == \"OutOfMemory\" and Message
contains \"The user was disconnected because the session host memory was exhausted.\"\n",
                "timeAggregation": "Count",
                "operator": "GreaterThan",
                "threshold": 20,
                "failingPeriods": {
                  "numberOfEvaluationPeriods": 1,
                  "minFailingPeriodsToAlert": 1
                }
              }
            ]
          },
          "autoMitigate": false,
          "muteActionsDuration": "PT1H",
          "actions": {
            "actionGroups": [
              "[parameters('actiongroups_EmailDesk_externalId')]"
            ]
          }
        }
    }
]
}
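
With this many parameters, a mistyped name inside a parameters('...') call is easy to miss until the deployment fails. Below is a small Python sketch, under my own assumptions, that lists every parameter referenced in the resources section that the template doesn't declare. It uses a plain regular expression rather than a real ARM expression parser, compares names case-insensitively (which, as far as I know, is how Resource Manager resolves them), and runs against a tiny made-up template here; read your saved template file into a string and pass that in instead.

```python
import json
import re

# Matches parameters('name') inside ARM template expressions.
PARAM_REF = re.compile(r"parameters\('([^']+)'\)", re.IGNORECASE)


def undeclared_parameters(template_text: str) -> list[str]:
    """Return referenced parameter names that the template does not declare."""
    template = json.loads(template_text)
    declared = {name.lower() for name in template.get("parameters", {})}
    # Serialise the resources back to text so the regex can scan every string.
    referenced = PARAM_REF.findall(json.dumps(template.get("resources", [])))
    missing = {name for name in referenced if name.lower() not in declared}
    return sorted(missing)


# Tiny illustrative template: the second action group reference is misspelled.
sample = """
{
  "parameters": {
    "client_Name": {"type": "String"},
    "actiongroups_EmailDesk_externalId": {"type": "String"}
  },
  "resources": [
    {"name": "[concat(parameters('client_name'), '- alert')]",
     "actions": [
       {"actionGroupId": "[parameters('actiongroups_EmailDesk_externalId')]"},
       {"actionGroupId": "[parameters('actiongroups_EmailRRDesk_externalId')]"}
     ]}
  ]
}
"""

print(undeclared_parameters(sample))
```

An empty list means every reference in the resources section resolves to a declared parameter; the sketch does not look at the outputs or variables sections.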

