Skip to content

Feature Request: Add AZ AIOps Dashboard - Azure Monitor Workbook for AI estate observability #100

@krishna-sunkavalli

Description

@krishna-sunkavalli

Summary

Add the AZ AIOps Dashboard as an optional observability component of the AI Landing Zone. It is an Azure Monitor Workbook that provides a single-pane-of-glass view of the entire Azure AI/Cognitive Services estate — no agents, no code, no external dependencies.

Source repo: https://github.com/krishna-sunkavalli/az-aiops-dashboard


Problem / Motivation

The AI Landing Zone provides a solid foundation for deploying AI workloads, but there is no built-in operational visibility layer. Once deployed, platform and app teams have limited means to:

  • Monitor request volumes, error rates (5xx), and throttling (429) across Azure OpenAI and Models API endpoints
  • Inventory all Cognitive Services resources and verify Diagnostic Settings are configured
  • Identify which models are being used and at what volume
  • Understand the geographic distribution of AI resources

Without this, teams manually stitch together Azure Monitor metrics and write ad hoc KQL queries — adding friction during incidents and day-to-day operations.


Proposed Solution

Add the AZ AIOps Dashboard workbook as an optional observability add-on. It is already packaged as:

  • A standalone .workbook file
  • An ARM template (azuredeploy.json) for one-click deployment
  • Deployable via Azure CLI, Azure PowerShell, or manual import

What the workbook shows (9 tabs)

Area Key Visuals
Overview Request volume split (AOAI vs Models API), resource map by region, status code trend lines, model-level request bar chart, resource inventory with Diagnostic Settings status
Request Charts HTTP status code breakdowns — 5xx errors and 429 throttling per API type
Model Usage Per-model request volumes, top models by usage
Availability Resource health and availability trends
+ more Token usage, latency, content safety (on roadmap)

Filter Parameters

Parameter Default
Subscription All
Resource Group All
Cognitive Services Resource All
Log Analytics Workspace All
Time Range Last 24 hours

Prerequisites

  • Azure Reader role on Cognitive Services resources
  • Log Analytics Workspace — already provisioned by the AI Landing Zone

Why it fits the AI Landing Zone

  1. WAF Operational Excellence — Observability is a core WAF pillar; the AI LZ already references WAF AI workload guidance.
  2. Zero additional infrastructure — Consumes the Log Analytics Workspace already provisioned by the landing zone.
  3. CAF alignment — Directly supports the monitoring and management design areas of the AI LZ Design Checklist.
  4. No new dependencies — Pure Azure Monitor Workbook with no code changes to existing IaC required.
  5. One-click deployment — ARM template is ready for an optional Deploy to Azure button.

Implementation Proposal

  • Add workbook source and ARM template under observability/aiops-dashboard/
  • Add an Observability section to the README with a Deploy to Azure button
  • Optionally surface as an optional module in the Bicep/Terraform implementations

Happy to contribute the implementation — PR incoming.


References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions