Automated Troubleshooting and ITSM System Using EventBridge and Lambda

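Introduction:

Folks, in IT operations, monitoring server metrics such as CPU, memory, and disk or filesystem utilization is a very common task. When any of these metrics goes critical, however, a dedicated person has to perform some basic troubleshooting by logging in to the server and finding the initial cause of the utilization. If that person receives several identical alerts, they have to repeat the same steps many times, which is boring and not productive at all. As a workaround, we can develop a system that reacts as soon as an alarm is triggered and acts on the affected instances by running a few basic troubleshooting commands. To summarize the problem statement and the expectations: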

Problem Statement:

Develop a system that meets the following expectations:

  • Every EC2 instance should be monitored by CloudWatch.
  • Once an alarm is triggered, something must log in to the affected EC2 instance and run some basic troubleshooting commands.
  • Then, create a Jira issue to document the incident and add the command output in the comment section.
  • Then, send an automated email with all the alarm details and the Jira issue details.

Architecture Diagram:

[Image: architecture diagram]

Prerequisites:

  1. EC2 instance
  2. CloudWatch alarms
  3. EventBridge rule
  4. Lambda function
  5. Jira account
  6. Simple Notification Service (SNS)

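Implementation Steps:

  • a. CloudWatch agent installation and configuration setup:
    Open the Systems Manager console and click "Documents".
    Search for the "AWS-ConfigureAWSPackage" document and execute it by providing the required details.
    Package name = AmazonCloudWatchAgent
    After installation, the CloudWatch agent needs to be configured according to a configuration file. To do so, execute the AmazonCloudWatch-ManageAgent document. Also, make sure the JSON CloudWatch configuration file is stored in an SSM parameter.
    Once you see the metrics being reported in the CloudWatch console, create alarms for CPU utilization, memory utilization, and so on.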
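
As a rough illustration of that last step, the sketch below builds the arguments for CloudWatch's put_metric_alarm call for one instance. The helper names, the alarm naming scheme, the 5-minute period and the 80% default threshold are my assumptions, not values taken from the setup above. The dimensions also have to match what the agent actually publishes, so treat this as a starting point only.

```python
def build_alarm_params(instance_id, metric_name, namespace, threshold=80.0):
    """Return keyword arguments for cloudwatch.put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{metric_name}-{instance_id}",  # assumed naming scheme
        "Namespace": namespace,                       # e.g. "AWS/EC2" or "CWAgent"
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                                # seconds, assumed
        "EvaluationPeriods": 1,                       # assumed
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarms(instance_id):
    """Create CPU and memory alarms; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **build_alarm_params(instance_id, "CPUUtilization", "AWS/EC2"))
    cloudwatch.put_metric_alarm(
        **build_alarm_params(instance_id, "mem_used_percent", "CWAgent"))
```

Keeping the parameter builder separate from the API call makes it easy to review the alarm definition before anything is created in the account.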

  • b. Setting up the EventBridge rule:
    To track alarm state changes, we customized the event pattern slightly so that it matches only a change from OK to ALARM, not the reverse. Then, add this rule to the Lambda function as a trigger.

{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "detail": {
    "state": {
      "value": ["ALARM"]
    },
    "previousState": {
      "value": ["OK"]
    }
  }
}
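
To make the matching behaviour concrete, here is a small sketch that keeps the alarm pattern as a Python dict and checks sample events against it locally. It assumes the casing CloudWatch uses in its events ("ALARM", "OK", "previousState"). The matches helper is a simplified stand-in I wrote for illustration: it only handles the exact-value lists used here and is not EventBridge's real matching engine.

```python
import json

# Alarm state change pattern; names and values are compared case-sensitively.
ALARM_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
        "state": {"value": ["ALARM"]},
        "previousState": {"value": ["OK"]},
    },
}


def matches(pattern, event):
    """Simplified local check: every pattern leaf lists the allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def sample_event(previous, current):
    """Build a trimmed-down alarm state change event (illustrative only)."""
    return {
        "source": "aws.cloudwatch",
        "detail-type": "CloudWatch Alarm State Change",
        "detail": {"state": {"value": current}, "previousState": {"value": previous}},
    }


if __name__ == "__main__":
    print(json.dumps(ALARM_PATTERN))  # string form of the pattern
    print(matches(ALARM_PATTERN, sample_event("OK", "ALARM")))
    print(matches(ALARM_PATTERN, sample_event("ALARM", "OK")))
```

A check like this is handy for confirming that the reverse transition (ALARM back to OK) would not invoke the function.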
  • c. Creating a Lambda function to send the email and log the incident in Jira: This Lambda function performs several activities. It is triggered by the EventBridge rule and, using the AWS SDK (boto3), publishes to an SNS topic as its destination. Once the EventBridge rule fires, it sends the event content as JSON to Lambda, and the function extracts several details from it and processes them in different ways. So far we have worked with two types of alarms: i. CPU utilization and ii. memory utilization. As soon as either of these alarms is triggered and its state changes from OK to ALARM, EventBridge fires and in turn invokes the Lambda function, which performs the tasks described in the code.

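Lambda Prerequisites:
We need to import the following modules for the code to work:

  • >> os
  • >> sys
  • >> json
  • >> boto3
  • >> time
  • >> requests

Note: Of the modules above, all except "requests" are available by default in the underlying Lambda infrastructure. The "requests" module cannot be imported in Lambda directly. So, first install the requests module into a folder on your local machine (laptop) by running the following command: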

pip3 install requests -t <directory path> --no-user

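After that, the module is downloaded into the folder from which you ran the command, or into the folder where you want to store the module source code. I assume here that the Lambda code is being prepared on your local machine. If so, create a zip file of the entire Lambda source code together with the module, and then upload the zip file to the Lambda function.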
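
If you prefer to script the packaging step, something along these lines could build the archive with Python's standard zipfile module. The helper name and the folder layout in the usage note are assumptions for illustration; adjust them to wherever your code and the installed module actually live.

```python
import os
import zipfile


def build_lambda_zip(source_dir, zip_path):
    """Zip every file under source_dir, keeping paths relative to source_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Skip the archive itself if it is written inside source_dir.
                if os.path.abspath(full_path) == os.path.abspath(zip_path):
                    continue
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

For example, with a hypothetical folder lambda_src that contains lambda_function.py and a python/ subfolder holding requests, build_lambda_zip("lambda_src", "lambda_package.zip") produces a zip whose entries sit at the archive root, which is where Lambda looks for the handler file.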

So, we handle the following two scenarios here:

1. CPU utilization: If the CPU utilization alarm is triggered, the Lambda function needs to fetch the instance, log in to it, and list the top 5 most resource-consuming processes. It then creates a Jira issue and adds the process details in the comment section. At the same time, it sends an email with the alarm details, the Jira issue details, and the process output.

2. Memory utilization: Same approach as above.

Now, let me restate the tasks that the Lambda function should perform:

  1. Log in to the instance
  2. Perform basic troubleshooting steps
  3. Create a Jira issue
  4. Send an email with all the details to the recipients

Scenario 1: When the alarm state changes from OK to ALARM
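
The four tasks can be read as a small pipeline. The sketch below wires them together with injected callables so the flow can be exercised without AWS or Jira. The function and parameter names are hypothetical, and the actual Lambda code in this article is organised differently.

```python
def handle_alarm(alarm, run_troubleshooting, create_issue, send_email):
    """Run the four tasks in order and return what each step produced.

    alarm is a dict with at least 'instance_id' and 'metric_name';
    the three callables are supplied by the caller (hypothetical interface).
    """
    # Tasks 1 and 2: log in to the instance and run the basic commands.
    output = run_troubleshooting(alarm["instance_id"], alarm["metric_name"])
    # Task 3: record the incident, attaching the command output.
    issue_key = create_issue(alarm, output)
    # Task 4: notify the recipients with alarm, issue and output details.
    send_email(alarm, issue_key, output)
    return {"output": output, "issue_key": issue_key}


if __name__ == "__main__":
    sent = []
    result = handle_alarm(
        {"instance_id": "i-0123456789abcdef0", "metric_name": "CPUUtilization"},
        run_troubleshooting=lambda instance_id, metric: f"processes on {instance_id}",
        create_issue=lambda alarm, output: "SCRUM-1",
        send_email=lambda alarm, key, output: sent.append(key),
    )
    print(result, sent)
```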

Set 1 (defining the CPU and memory functions):

################# Importing Required Modules ################
############################################################
import json
import boto3
import time
import os
import sys
sys.path.append('./python')   ## makes the bundled requests module and its dependencies importable
import requests
from requests.auth import HTTPBasicAuth

################## Calling AWS Services ###################
###########################################################
ssm = boto3.client('ssm')
sns_client = boto3.client('sns')
ec2 = boto3.client('ec2')

################## Defining Blank Variable ################
###########################################################
cpu_process_op = ''
mem_process_op = ''
issueid = ''
issuekey = ''
issuelink = ''

################# Function for CPU Utilization ################
###############################################################
def cpu_utilization(instanceid, metric_name, previous_state, current_state):
    global cpu_process_op
    if previous_state == 'OK' and current_state == 'ALARM':
        # head -5 keeps the header row plus the four highest CPU consumers
        command = 'ps -eo user,pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head -5'
        print(f'Impacted Instance ID is : {instanceid}, Metric Name: {metric_name}')
        # Run the command on the instance through SSM Run Command
        print(f'Starting session to {instanceid}')
        response = ssm.send_command(InstanceIds = [instanceid], DocumentName="AWS-RunShellScript", Parameters={'commands': [command]})
        command_id = response['Command']['CommandId']
        print(f'Command ID: {command_id}')
        # Give the command a few seconds to finish, then retrieve its output
        time.sleep(4)
        output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
        print('Please find below output -\n', output['StandardOutputContent'])
        cpu_process_op = output['StandardOutputContent']
    else:
        print('None')

################# Function for Memory Utilization ################
###############################################################
def mem_utilization(instanceid, metric_name, previous_state, current_state):
    global mem_process_op
    if previous_state == 'OK' and current_state == 'ALARM':
        # head -5 keeps the header row plus the four highest memory consumers
        command = 'ps -eo user,pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -5'
        print(f'Impacted Instance ID is : {instanceid}, Metric Name: {metric_name}')
        # Run the command on the instance through SSM Run Command
        print(f'Starting session to {instanceid}')
        response = ssm.send_command(InstanceIds = [instanceid], DocumentName="AWS-RunShellScript", Parameters={'commands': [command]})
        command_id = response['Command']['CommandId']
        print(f'Command ID: {command_id}')
        # Give the command a few seconds to finish, then retrieve its output
        time.sleep(4)
        output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
        print('Please find below output -\n', output['StandardOutputContent'])
        mem_process_op = output['StandardOutputContent']
    else:
        print('None')
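
A fixed sleep before reading the result can be too short for a slow command, or the invocation may not be registered yet when it is queried. One possible alternative, sketched here under my own assumptions, is to poll get_command_invocation until the command reaches a terminal status. The helper name, timeout and interval are hypothetical; the client is passed in as a parameter so the function can be tried with a stub.

```python
import time

# Statuses after which the invocation will not change any more.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_command(ssm, command_id, instance_id,
                     timeout=30.0, interval=2.0, sleep=time.sleep):
    """Poll ssm.get_command_invocation until a terminal status or the timeout."""
    waited = 0.0
    while True:
        try:
            result = ssm.get_command_invocation(
                CommandId=command_id, InstanceId=instance_id)
            if result["Status"] in TERMINAL_STATUSES:
                return result
        except ssm.exceptions.InvocationDoesNotExist:
            # Right after send_command the invocation may not be visible yet.
            pass
        if waited >= timeout:
            raise TimeoutError(
                f"command {command_id} did not finish within {timeout} seconds")
        sleep(interval)
        waited += interval
```

The returned dictionary is the same response the functions above read, so its 'StandardOutputContent' key can be used in the same way. Keep the Lambda function's own timeout above the polling timeout, otherwise the function is cut off before the helper gives up.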

Set 2 (creating the Jira issue):

################## Create JIRA Issue ################
#####################################################
def create_issues(instanceid, metric_name, account, timestamp, region, current_state, previous_state, cpu_process_op, mem_process_op, metric_val):
    ## Create Issue ##
    url ='https://<your-user-name>.atlassian.net/rest/api/2/issue'
    username = os.environ['username']
    api_token = os.environ['token']
    project = 'AnirbanSpace'
    issue_type = 'Incident'
    assignee = os.environ['username']
    summ_metric  = '%CPU Utilization' if 'CPU' in metric_name else '%Memory Utilization' if 'mem' in metric_name else '%Filesystem Utilization' if metric_name == 'disk_used_percent' else None
    metric_val = metric_val
    summary = f'Client | {account} | {instanceid} | {summ_metric} | Metric Value: {metric_val}'
    description = f'Client: Company\nAccount: {account}\nRegion: {region}\nInstanceID = {instanceid}\nTimestamp = {timestamp}\nCurrent State: {current_state}\nPrevious State = {previous_state}\nMetric Value = {metric_val}'

    issue_data = {
        "fields": {
            "project": {
                "key": "SCRUM"
            },
            "summary": summary,
            "description": description,
            "issuetype": {
                "name": issue_type
            },
            "assignee": {
                "name": assignee
            }
        }
    }
    data = json.dumps(issue_data)
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }
    auth = HTTPBasicAuth(username, api_token)
    response = requests.post(url, headers=headers, auth=auth, data=data)
    # Keep the issue identifiers in globals so that send_email can use them
    global issueid
    global issuekey
    global issuelink
    issueid = response.json().get('id')
    issuekey = response.json().get('key')
    issuelink = response.json().get('self')

    ################ Add Comment To Above Created JIRA Issue ###################
    output = cpu_process_op if metric_name == 'CPUUtilization' else mem_process_op if metric_name == 'mem_used_percent' else None
    comment_api_url = f"{url}/{issuekey}/comment"
    add_comment = requests.post(comment_api_url, headers=headers, auth=auth, data=json.dumps({"body": output}))

    ## Check the response of the create-issue call
    if response.status_code == 201:
        print("Issue created successfully. Issue key:", response.json().get('key'))
    else:
        print(f"Failed to create issue. Status code: {response.status_code}, Response: {response.text}")
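
Note that the code above only checks the status of the create call, and the comment request is sent even if issue creation failed. The sketch below separates building the payload from sending it, so the payload can be inspected first. build_issue_payload and comment_url are hypothetical helpers, and the accountId form of the assignee is my assumption about Jira Cloud, so check which identifier your Jira instance expects.

```python
import json


def build_issue_payload(project_key, summary, description,
                        issue_type="Incident", assignee_account_id=None):
    """Return the JSON body for POST /rest/api/2/issue as a string."""
    fields = {
        "project": {"key": project_key},
        "summary": summary,
        "description": description,
        "issuetype": {"name": issue_type},
    }
    if assignee_account_id:
        # Assumption: Jira Cloud identifies users by accountId.
        fields["assignee"] = {"accountId": assignee_account_id}
    return json.dumps({"fields": fields})


def comment_url(base_url, issue_key):
    """Build the comment endpoint for an issue, avoiding doubled slashes."""
    return f"{base_url.rstrip('/')}/rest/api/2/issue/{issue_key}/comment"
```

With helpers like these, the comment request can be skipped when the create call does not return 201, since there is no issue key to attach the comment to.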

Set 3 (sending the email):

################## Send An Email ################
#################################################
def send_email(instanceid, metric_name, account, region, timestamp, current_state, current_reason, previous_state, previous_reason, cpu_process_op, mem_process_op, metric_val, issueid, issuekey, issuelink):
    ### Define a dictionary of custom input ###
    metric_list = {'mem_used_percent': 'Memory', 'disk_used_percent': 'Disk', 'CPUUtilization': 'CPU'}

    ### Conditions ###
    if previous_state == 'OK' and current_state == 'ALARM' and metric_name in list(metric_list.keys()):
        metric_msg = metric_list[metric_name]
        output = cpu_process_op if metric_name == 'CPUUtilization' else mem_process_op if metric_name == 'mem_used_percent' else None
        print('This is output', output)
        email_body = f"Hi Team, \n\nPlease be informed that {metric_msg} utilization is high for the instanceid {instanceid}. Please find below more information \n\nAlarm Details:\nMetricName = {metric_name}, \nAccount = {account}, \nTimestamp = {timestamp}, \nRegion = {region}, \nInstanceID = {instanceid}, \nCurrentState = {current_state}, \nReason = {current_reason}, \nMetricValue = {metric_val}, \nThreshold = 80.00 \n\nProcessOutput: \n{output}\nIncident Details:\nIssueID = {issueid}, \nIssueKey = {issuekey}, \nLink = {issuelink}\n\nRegards,\nAnirban Das,\nGlobal Cloud Operations Team"
        # Publish to the SNS topic; its email subscribers receive the message
        res = sns_client.publish(
            TopicArn = os.environ['snsarn'],
            Subject = f'High {metric_msg} Utilization Alert : {instanceid}',
            Message = str(email_body)
            )
        print('Mail has been sent') if res else print('Email not sent')
    else:
        email_body = str(0)
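
SNS is fairly strict about the subject of a published message: it has to be ASCII text without line breaks and shorter than 100 characters. The subject used above is short, but if you start adding more details to it, a small guard like this hypothetical build_subject helper may save a failed publish call.

```python
def build_subject(metric_msg, instance_id, max_length=99):
    """Return a single-line ASCII subject no longer than max_length characters."""
    subject = f"High {metric_msg} Utilization Alert : {instance_id}"
    # Collapse line breaks and repeated whitespace into single spaces.
    subject = " ".join(subject.split())
    # Drop any characters outside ASCII.
    subject = subject.encode("ascii", errors="ignore").decode("ascii")
    return subject[:max_length]
```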

Set 4 (the Lambda handler function):

################## Lambda Handler Function ################
###########################################################
def lambda_handler(event, context):
    # Pull the alarm details out of the EventBridge event
    instanceid = event['detail']['configuration']['metrics'][0]['metricStat']['metric']['dimensions']['InstanceId']
    metric_name = event['detail']['configuration']['metrics'][0]['metricStat']['metric']['name']
    account = event['account']
    timestamp = event['time']
    region = event['region']
    current_state = event['detail']['state']['value']
    current_reason = event['detail']['state']['reason']
    previous_state = event['detail']['previousState']['value']
    previous_reason = event['detail']['previousState']['reason']
    # reasonData is a JSON string, so it has to be parsed before reading the datapoint
    metric_val = json.loads(event['detail']['state']['reasonData'])['evaluatedDatapoints'][0]['value']
    ##### function calling #####
    if metric_name == 'CPUUtilization':
        cpu_utilization(instanceid, metric_name, previous_state, current_state)
        create_issues(instanceid, metric_name, account, timestamp, region, current_state, previous_state, cpu_process_op, mem_process_op, metric_val)
        send_email(instanceid, metric_name, account, region, timestamp, current_state, current_reason, previous_state, previous_reason, cpu_process_op, mem_process_op, metric_val, issueid, issuekey, issuelink)
    elif metric_name == 'mem_used_percent':
        mem_utilization(instanceid, metric_name, previous_state, current_state)
        create_issues(instanceid, metric_name, account, timestamp, region, current_state, previous_state, cpu_process_op, mem_process_op, metric_val)
        send_email(instanceid, metric_name, account, region, timestamp, current_state, current_reason, previous_state, previous_reason, cpu_process_op, mem_process_op, metric_val, issueid, issuekey, issuelink)
    else:
        pass  # other metrics are not handled
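
To see which parts of the event the handler depends on, here is a sketch that extracts the same fields from a trimmed-down sample event. parse_alarm_event is a hypothetical helper, and the sample values (instance ID, datapoint) are made up for illustration. It falls back to 'NA' when no datapoint is present instead of raising.

```python
import json


def parse_alarm_event(event):
    """Extract the fields the handler needs from an alarm state change event."""
    detail = event["detail"]
    metric = detail["configuration"]["metrics"][0]["metricStat"]["metric"]
    # reasonData is a JSON string; it may be missing or lack datapoints.
    reason_data = json.loads(detail["state"].get("reasonData") or "{}")
    datapoints = reason_data.get("evaluatedDatapoints") or [{}]
    return {
        "instance_id": metric["dimensions"]["InstanceId"],
        "metric_name": metric["name"],
        "current_state": detail["state"]["value"],
        "previous_state": detail["previousState"]["value"],
        "metric_value": datapoints[0].get("value", "NA"),
    }


# Made-up sample event containing only the fields used above.
SAMPLE_EVENT = {
    "detail": {
        "configuration": {"metrics": [{"metricStat": {"metric": {
            "dimensions": {"InstanceId": "i-0123456789abcdef0"},
            "name": "CPUUtilization",
        }}}]},
        "state": {
            "value": "ALARM",
            "reasonData": json.dumps({"evaluatedDatapoints": [{"value": 91.5}]}),
        },
        "previousState": {"value": "OK"},
    }
}

if __name__ == "__main__":
    print(parse_alarm_event(SAMPLE_EVENT))
```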

Alarm email screenshot:

[Image: alarm email screenshot]

Note: Ideally the threshold is 80%, but for testing I changed it to 10%. Please see the Reason field.

Alarm Jira issue:

[Image: alarm Jira issue screenshot]

Scenario 2: When the alarm state changes from OK to INSUFFICIENT_DATA

In this scenario, if no CPU or memory utilization metric data is captured for a server, the alarm state changes from OK to INSUFFICIENT_DATA. This state can be reached in two ways: a) the server is in the stopped state, or b) the CloudWatch agent is not running or has gone into a dead state.
So, as the script below shows, when a CPU or memory utilization alarm goes to INSUFFICIENT_DATA, Lambda first checks whether the instance is in the running state. If it is, Lambda logs in and checks the CloudWatch agent status. After that, it creates a Jira issue and posts the agent status in the comment section of the Jira issue. Finally, it sends an email with the alarm details and the agent status.

Full Code:
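
One detail matters for the "is the instance running?" check: describe_instance_status only reports running instances unless IncludeAllInstances is set, so a stopped instance would otherwise come back as an empty list. A minimal sketch of such a lookup, with a hypothetical helper name and an assumed 'unknown' fallback, could look like this. The client is passed in so the function can be tried with a stub.

```python
def get_instance_state(ec2, instance_id):
    """Return the instance state name, for example 'running' or 'stopped'."""
    response = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # without this, only running instances are returned
    )
    statuses = response.get("InstanceStatuses", [])
    if not statuses:
        return "unknown"  # assumed fallback label for an empty response
    return statuses[0]["InstanceState"]["Name"]
```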

################# Importing Required Modules ################
############################################################
import json
import boto3
import time
import os
import sys
sys.path.append('./python')   ## This will add requests module along with all dependencies into this script
import requests
from requests.auth import HTTPBasicAuth

################## Calling AWS Services ###################
###########################################################
ssm = boto3.client('ssm')
sns_client = boto3.client('sns')
ec2 = boto3.client('ec2')

################## Defining Blank Variable ################
###########################################################
cpu_process_op = ''
mem_process_op = ''
issueid = ''
issuekey = ''
issuelink = ''

################# Function for CPU Utilization ################
###############################################################
def cpu_utilization(instanceid, metric_name, previous_state, current_state):
    global cpu_process_op
    if previous_state == 'OK' and current_state == 'INSUFFICIENT_DATA':
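        # IncludeAllInstances=True is needed so that stopped instances are returned as well
        ec2_status = ec2.describe_instance_status(InstanceIds=[instanceid,], IncludeAllInstances=True)['InstanceStatuses'][0]['InstanceState']['Name']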
        if ec2_status == 'running':
            command = 'systemctl status amazon-cloudwatch-agent;sleep 3;systemctl restart amazon-cloudwatch-agent'
            print(f'Impacted Instance ID is : {instanceid}, Metric Name: {metric_name}')
            # Start a session
            print(f'Starting session to {instanceid}')
            response = ssm.send_command(InstanceIds = [instanceid], DocumentName="AWS-RunShellScript", Parameters={'commands': [command]})
            command_id = response['Command']['CommandId']
            print(f'Command ID: {command_id}')
            # Retrieve the command output
            time.sleep(4)
            output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
            print('Please find below output -\n', output['StandardOutputContent'])
            cpu_process_op = output['StandardOutputContent']
        else:
            cpu_process_op = f'Instance current status is {ec2_status}. Not able to reach out!!'
            print(f'Instance current status is {ec2_status}. Not able to reach out!!')
    else:
        print('None')

################# Function for Memory Utilization ################
############################################################### 
def mem_utilization(instanceid, metric_name, previous_state, current_state):
    global mem_process_op
    if previous_state == 'OK' and current_state == 'INSUFFICIENT_DATA':
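        # IncludeAllInstances=True is needed so that stopped instances are returned as well
        ec2_status = ec2.describe_instance_status(InstanceIds=[instanceid,], IncludeAllInstances=True)['InstanceStatuses'][0]['InstanceState']['Name']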
        if ec2_status == 'running':
            command = 'systemctl status amazon-cloudwatch-agent'
            print(f'Impacted Instance ID is : {instanceid}, Metric Name: {metric_name}')
            # Start a session
            print(f'Starting session to {instanceid}')
            response = ssm.send_command(InstanceIds = [instanceid], DocumentName="AWS-RunShellScript", Parameters={'commands': [command]})
            command_id = response['Command']['CommandId']
            print(f'Command ID: {command_id}')
            # Retrieve the command output
            time.sleep(4)
            output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
            print('Please find below output -\n', output['StandardOutputContent'])
            mem_process_op = output['StandardOutputContent']
            print(mem_process_op)
        else:
            mem_process_op = f'Instance current status is {ec2_status}. Not able to reach out!!'
            print(f'Instance current status is {ec2_status}. Not able to reach out!!')     
    else:
        print('None')

################## Create JIRA Issue ################
#####################################################
def create_issues(instanceid, metric_name, account, timestamp, region, current_state, previous_state, cpu_process_op, mem_process_op, metric_val):
    ## Create Issue ##
    url ='https://<your-user-name>.atlassian.net/rest/api/2/issue'
    username = os.environ['username']
    api_token = os.environ['token']
    project = 'AnirbanSpace'
    issue_type = 'Incident'
    assignee = os.environ['username']
    summ_metric  = '%CPU Utilization' if 'CPU' in metric_name else '%Memory Utilization' if 'mem' in metric_name else '%Filesystem Utilization' if metric_name == 'disk_used_percent' else None
    metric_val = metric_val
    summary = f'Client | {account} | {instanceid} | {summ_metric} | Metric Value: {metric_val}'
    description = f'Client: Company\nAccount: {account}\nRegion: {region}\nInstanceID = {instanceid}\nTimestamp = {timestamp}\nCurrent State: {current_state}\nPrevious State = {previous_state}\nMetric Value = {metric_val}'

    issue_data = {
        "fields": {
            "project": {
                "key": "SCRUM"
            },
            "summary": summary,
            "description": description,
            "issuetype": {
                "name": issue_type
            },
            "assignee": {
                "name": assignee
            }
        }
    }
    data = json.dumps(issue_data)
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
    }
    auth = HTTPBasicAuth(username, api_token)
    response = requests.post(url, headers=headers, auth=auth, data=data)
    global issueid
    global issuekey
    global issuelink
    issueid = response.json().get('id')
    issuekey = response.json().get('key')
    issuelink = response.json().get('self')

    ################ Add Comment To Above Created JIRA Issue ###################
    output = cpu_process_op if metric_name == 'CPUUtilization' else mem_process_op if metric_name == 'mem_used_percent' else None
    comment_api_url = f"{url}/{issuekey}/comment"
    add_comment = requests.post(comment_api_url, headers=headers, auth=auth, data=json.dumps({"body": output}))

    ## Check the response
    if response.status_code == 201:
        print("Issue created successfully. Issue key:", response.json().get('key'))
    else:
        print(f"Failed to create issue. Status code: {response.status_code}, Response: {response.text}")

################## Send An Email ################
#################################################
def send_email(instanceid, metric_name, account, region, timestamp, current_state, current_reason, previous_state, previous_reason, cpu_process_op, mem_process_op, metric_val, issueid, issuekey, issuelink):
    ### Define a dictionary of custom input ###
    metric_list = {'mem_used_percent': 'Memory', 'disk_used_percent': 'Disk', 'CPUUtilization': 'CPU'}

    ### Conditions ###
    if previous_state == 'OK' and current_state == 'INSUFFICIENT_DATA' and metric_name in list(metric_list.keys()):
        metric_msg = metric_list[metric_name]
        output = cpu_process_op if metric_name == 'CPUUtilization' else mem_process_op if metric_name == 'mem_used_percent' else None
        email_body = f"Hi Team, \n\nPlease be informed that {metric_msg} utilization alarm state has been changed to {current_state} for the instanceid {instanceid}. Please find below more information \n\nAlarm Details:\nMetricName = {metric_name}, \nAccount = {account}, \nTimestamp = {timestamp}, \nRegion = {region}, \nInstanceID = {instanceid}, \nCurrentState = {current_state}, \nReason = {current_reason}, \nMetricValue = {metric_val}, \nThreshold = 80.00 \n\nProcessOutput = \n{output}\nIncident Details:\nIssueID = {issueid}, \nIssueKey = {issuekey}, \nLink = {issuelink}\n\nRegards,\nAnirban Das,\nGlobal Cloud Operations Team"
        res = sns_client.publish(
            TopicArn = os.environ['snsarn'],
            Subject = f'Insufficient {metric_msg} Utilization Alarm : {instanceid}',
            Message = str(email_body)
        )
        print('Mail has been sent') if res else print('Email not sent')
    else:
        email_body = str(0)

################## Lambda Handler Function ################
###########################################################
def lambda_handler(event, context):
    instanceid = event['detail']['configuration']['metrics'][0]['metricStat']['metric']['dimensions']['InstanceId']
    metric_name = event['detail']['configuration']['metrics'][0]['metricStat']['metric']['name']
    account = event['account']
    timestamp = event['time']
    region = event['region']
    current_state = event['detail']['state']['value']
    current_reason = event['detail']['state']['reason']
    previous_state = event['detail']['previousState']['value']
    previous_reason = event['detail']['previousState']['reason']
    metric_val = 'NA'
    ##### function calling #####
    if metric_name == 'CPUUtilization':
        cpu_utilization(instanceid, metric_name, previous_state, current_state)
        create_issues(instanceid, metric_name, account, timestamp, region, current_state, previous_state, cpu_process_op, mem_process_op, metric_val)
        send_email(instanceid, metric_name, account, region, timestamp, current_state, current_reason, previous_state, previous_reason, cpu_process_op, mem_process_op, metric_val, issueid, issuekey, issuelink)
    elif metric_name == 'mem_used_percent':
        mem_utilization(instanceid, metric_name, previous_state, current_state)
        create_issues(instanceid, metric_name, account, timestamp, region, current_state, previous_state, cpu_process_op, mem_process_op, metric_val)
        send_email(instanceid, metric_name, account, region, timestamp, current_state, current_reason, previous_state, previous_reason, cpu_process_op, mem_process_op, metric_val, issueid, issuekey, issuelink)
    else:
        None
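
Since the two scenarios differ mainly in the state being handled and the command being run, one possible way to fold them into a single function is a lookup table keyed by state and metric. This is only a sketch of that idea, not how the code above is organised. COMMANDS and choose_command are hypothetical names, and for INSUFFICIENT_DATA I use just the agent status check for both metrics, which is a simplification.

```python
PS_FORMAT = "ps -eo user,pid,ppid,cmd,%mem,%cpu"
AGENT_STATUS = "systemctl status amazon-cloudwatch-agent"

# (current state, metric name) -> shell command to run on the instance
COMMANDS = {
    ("ALARM", "CPUUtilization"): f"{PS_FORMAT} --sort=-%cpu | head -5",
    ("ALARM", "mem_used_percent"): f"{PS_FORMAT} --sort=-%mem | head -5",
    ("INSUFFICIENT_DATA", "CPUUtilization"): AGENT_STATUS,
    ("INSUFFICIENT_DATA", "mem_used_percent"): AGENT_STATUS,
}


def choose_command(previous_state, current_state, metric_name):
    """Return the shell command for this transition, or None if it is not handled."""
    if previous_state != "OK":
        return None
    return COMMANDS.get((current_state, metric_name))
```

A handler built this way would also need an EventBridge pattern that lets both target states through, and it would still have to check that the instance is running before sending a command in the INSUFFICIENT_DATA case.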

Insufficient data email screenshot:

[Image: insufficient data email screenshot]

Insufficient data Jira issue:

[Image: insufficient data Jira issue screenshot]

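Conclusion:

In this article, we tested scenarios for CPU and memory utilization, but the same automated incident and automated email functionality can be configured for many other metrics, which reduces a lot of the effort spent on monitoring and creating incidents. This solution gives us an initial approach to build on, and there are certainly other ways to achieve the same goal. I believe you all now understand how we tried to tie all of this together. If you liked this article or have any other suggestions, please like and comment so that we can cover them in upcoming articles.

Thanks!!
Anirban Das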
