Recommended Azure Monitors

news/2024/5/19 14:49:25

General

This document describes the recommended Azure monitors which can be implemented in Azure cloud application subscriptions.

SMT incident priority mapping

The priority “Blocker” is used mostly by developers to prioritize their tasks; it is not applicable to the operations team.

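| Azure Monitor severity | SMT incident priority | Time limit |
|---|---|---|
| 0-CRITICAL | Critical | <= 4 hrs |
| 1-ERROR | High | <= 12 hrs |
| 2-WARNING | Medium | <= 48 hrs (2 days) |
| 3 - Informational | Low | <= 96 hrs (4 days) |
| 4 - Verbose | No ticket | Action based on the notification and analysis |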
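
The mapping above can be expressed as a small lookup table, for example when an alert handler has to decide whether to open an SMT ticket. This is a minimal sketch: the names `SEVERITY_TO_SMT`, `smt_priority` and `needs_ticket` are hypothetical, and the time values are simply the limits listed above (the source does not say whether they are response or resolution times).

```python
from datetime import timedelta
from typing import Optional

# Azure Monitor severity (0-4) -> (SMT incident priority, time limit).
# Values are copied from the mapping above; None means "no ticket".
SEVERITY_TO_SMT = {
    0: ("Critical", timedelta(hours=4)),
    1: ("High", timedelta(hours=12)),
    2: ("Medium", timedelta(hours=48)),
    3: ("Low", timedelta(hours=96)),
    4: (None, None),  # Verbose: notification and analysis only
}


def smt_priority(severity: int) -> Optional[str]:
    """Return the SMT priority for an Azure Monitor severity, or None."""
    if severity not in SEVERITY_TO_SMT:
        raise ValueError(f"unknown Azure Monitor severity: {severity}")
    return SEVERITY_TO_SMT[severity][0]


def needs_ticket(severity: int) -> bool:
    """True when the severity maps to an SMT incident priority."""
    return smt_priority(severity) is not None
```

With this table, `needs_ticket(4)` is false, so Verbose alerts only produce a notification.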

Recommended Azure Monitors

| Resource | Monitor | Signal type | Condition | Evaluation frequency | Lookback period | Severity | Notification | Remarks |
|---|---|---|---|---|---|---|---|---|
| All Resources | Resource Health | Resource Health | Previous resource status=All, Current resource status=All | Always | Current status | 4 - Verbose | MS Teams | Includes all future resource groups and future resources. Excludes “Virtual machine instance from VMSS”. |
| All Resources | Service Health | Service Health | Event types: Service issue, Planned maintenance, Health advisories, Security advisories | Always | Current status | 4 - Verbose | MS Teams | Regions: North Europe, West Europe. Services: Alerts & Metrics, Activity Logs & Alerts and 21 more |
| Azure SQL Database | CPU | Metric | app_cpu_percent > 80 | 5 mins | 1 hour | 2-WARNING | Email | |
| Azure SQL Database | CPU | Metric | app_cpu_percent > 95 | 5 mins | 1 hour | 1-ERROR | MS Teams & Email | |
| Azure SQL Database | Memory | Metric | app_memory_percent > 80 | 5 mins | 1 hour | 2-WARNING | Email | |
| Azure SQL Database | Memory | Metric | app_memory_percent > 95 | 5 mins | 1 hour | 1-ERROR | MS Teams & Email | |
| Azure SQL Database | Space | Metric | allocated_data_storage greater or less than dynamic threshold | 15 mins | 1 hour | 2-WARNING | Email | |
| AKS - Node | Node CPU | Metric | node_cpu_usage_percentage > 80 | 15 mins | 1 hour | 2-WARNING | Email | Name of the node: Include True |
| AKS - Node | Node Memory | Metric | node_memory_working_set_percentage > 80 | 15 mins | 1 hour | 2-WARNING | Email | Name of the node: Include True |
| AKS - Node | Node Disk | Metric | node_disk_usage_percentage > 80 | 15 mins | 1 hour | 2-WARNING | Email | Name of the node: Include True |
| AKS - Node | Node Status (NotReady, Unknown) | Metric | kube_node_status_condition > 0 | 5 mins | 15 mins | 2-WARNING | Email | |
| AKS - Pods | Pod phases (Failed, Unknown, Pending) | Metric | kube_pod_status_phase >= 1 | 5 mins | 30 mins | 2-WARNING | Email | Phase of the pod: Include Failed, Unknown, Pending |
| AKS - Pods | Unschedulable Pods | Metric | unschedulable > 1 | 15 mins | 1 hour | 2-WARNING | Email | |
| AKS - Pods | Pods ready state percentage | Metric | podReadyPercentage (preview) | | | 2-WARNING | Email | |
| AKS - Containers | Restarting Containers | Metric | restarting container count (preview) | | | 2-WARNING | Email | |
| AKS - Containers | OOM killed containers | Metric | oomKilledContainerCount (preview) | | | 2-WARNING | Email | |
| AKS - Containers | CPU Exceeded Percentage | Metric | cpuExceededPercentage (preview) | | | 2-WARNING | Email | |
| AKS - Containers | Memory working set exceeded percentage | Metric | memoryWorkingSetExceededPercentage (preview) | | | 2-WARNING | Email | |
| Application Gateway | Unhealthy backend Host | Metric | UnhealthyHostCount > 0 | 1 min | 5 mins | 0-CRITICAL | MS Teams & Email | |
| Application Gateway | Failed Requests | Metric | FailedRequests > 100 | 5 mins | 15 mins | 2-WARNING | Email | |
| Load balancer | SNAT Connection Status Count | Metric | SnatConnectionCount >= 1 | 5 mins | 15 mins | 2-WARNING | Email | Connection State = Failed, Pending |
| Public IP Addresses | Under DDoS attack or not | Metric | IfUnderDDoSAttack > 0 | 1 min | 5 mins | 0-CRITICAL | MS Teams & Email | |
| Virtual machine scale set | CPU Usage | Metric | Percentage CPU > 90 | 15 mins | 1 hour | 2-WARNING | Email | |
| Container Registry | Storage Used | Metric | StorageUsed > 90% of storage size included in the SKU | 15 mins | 1 hour | 3 - Informational | Email | To review: which ACR SKUs provide this metric |
| LogicApp | RunsFailed | Metric | RunsFailed > 0 | 1 hour | 12 hours | 3 - Informational | Email | |
| Log Analytics workspace | Container SIGKILL Error | Logs | Table rows count > 0 | 15 mins | 15 mins | 2-WARNING | Email | Query: Signal KILL error |
| Log Analytics workspace | WAF_Possible_DDoS_Detected | Logs Query | count_ > 1000 | 15 mins | 15 mins | 1-ERROR | MS Teams & Email | Query: WAF_Possible_DDoS_Detected |
| Log Analytics workspace | Node-restart-delayed triggered by Kured | Logs Query | | | | 2-WARNING | Email | Query: Node-restart-delayed |
| Log Analytics workspace | Node-restart-successful-Kured Action | Logs Query | | | | OBSOLETE | | Query: Node-restart-successful |
| Azure SQL Database / server | Vulnerability Scan Report | Vulnerability Scan Report | | | | | | |
| Failure | Failure Anomalies - ETAS-BCP-PT-Forensic-Logic-App | Application Insights Smart detector | Failure Anomalies detected | | | 3 - Informational | | Resource: etas-bcp-pt-forensic-logic-app |
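
One way to keep metric monitors like the ones above reproducible is to hold each rule as data and generate the matching `az monitor metrics alert create` command from it. The sketch below only builds the argument list and does not call Azure. The class `MetricAlertRule`, the helper `build_cli_args`, the rule name, the resource group and the scope are all hypothetical, and the `avg` aggregation is an assumption, since the list above does not state which aggregation each monitor uses.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class MetricAlertRule:
    name: str
    metric: str          # metric name as listed in the monitor condition
    operator: str        # e.g. ">" or ">="
    threshold: float
    frequency: str       # evaluation frequency, e.g. "15m"
    window: str          # lookback period, e.g. "1h"
    severity: int        # Azure Monitor severity 0-4
    aggregation: str = "avg"  # assumption: not specified in the source


def build_cli_args(rule: MetricAlertRule, resource_group: str, scope: str) -> List[str]:
    """Build the argument list for `az monitor metrics alert create`."""
    condition = f"{rule.aggregation} {rule.metric} {rule.operator} {rule.threshold:g}"
    return [
        "az", "monitor", "metrics", "alert", "create",
        "--name", rule.name,
        "--resource-group", resource_group,
        "--scopes", scope,
        "--condition", condition,
        "--evaluation-frequency", rule.frequency,
        "--window-size", rule.window,
        "--severity", str(rule.severity),
    ]


# The "Virtual machine scale set / CPU Usage" monitor: Percentage CPU > 90,
# checked every 15 minutes over a 1 hour window, severity 2-WARNING.
VMSS_CPU = MetricAlertRule(
    name="vmss-cpu-usage",  # hypothetical rule name
    metric="Percentage CPU",
    operator=">",
    threshold=90,
    frequency="15m",
    window="1h",
    severity=2,
)

# Placeholder resource group and scope; replace with real values before use.
print(shlex.join(build_cli_args(VMSS_CPU, "my-resource-group", "<vmss-resource-id>")))
```

Notification targets (Email, MS Teams) would be attached through an action group, which is left out here because the source does not name one.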

Requirements

ACR: Trigger an alert when images are created or updated in the ACR?
SQL DB: Slow / long-running queries?
Service Principal: Secret / certificate expiry?
AKS: Check whether we can send an alert if k8s is not able to scale in a new worker node.
Visualization of Kured/AKS alerts: Currently we don't have a dashboard / visualization for Kured alerts. An overview over time would be helpful.
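
For the Service Principal secret / certificate expiry requirement, the core check is a date comparison once the credential end dates are available. The following is a minimal sketch under stated assumptions: the function `expiring_credentials`, the input shape (a list of dicts with `name` and `end` keys) and the 30-day warning window are all hypothetical, and fetching the credentials from the directory is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expiring_credentials(
    credentials: List[Dict],
    now: datetime,
    warn_within: timedelta = timedelta(days=30),  # assumed warning window
) -> List[str]:
    """Return names of credentials that are expired or expire within the window."""
    return [c["name"] for c in credentials if c["end"] - now <= warn_within]


# Hypothetical sample data; real end dates would come from the directory.
sample = [
    {"name": "sp-app-secret", "end": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"name": "sp-app-cert", "end": datetime(2025, 1, 1, tzinfo=timezone.utc)},
]
reference_time = datetime(2024, 5, 19, tzinfo=timezone.utc)
due = expiring_credentials(sample, reference_time)
```

A scheduled job could run such a check daily and raise a 2-WARNING notification for every name returned.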

Refer to: https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-overview
[Figure: Overview diagram of Container insights]

Refer to: https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-overview
[Figure: Diagram that explains Azure Monitor alerts]