Configuring notification frequency


#1

Hi,

I have an alert configured with a notification frequency of 1 minute, but the logs show that it is not evaluated at 1-minute intervals.

The alert gets evaluated much less often than that, even though the notification frequency is every 1 minute. Evaluation seems to be controlled by these variables in the Advanced configuration:

performance:
  capture_threshold:
    :ems_cluster: 50.minutes
    :host: 50.minutes
    :storage: 60.minutes
    :vm: 50.minutes
  capture_threshold_with_alerts:
    :host: 20.minutes
    :vm: 20.minutes

But I can’t get a real-time alert to evaluate every 1 minute, even after changing capture_threshold_with_alerts.

[----] I, [2016-06-09T10:12:05.216496 #50155:e83988]  INFO -- : MIQ(MiqAlert#evaluate) Evaluating Alert [VM CPU Utilization > 10%] for target: [test]...
[----] I, [2016-06-09T10:14:53.684598 #50155:e83988]  INFO -- : MIQ(MiqAlert#evaluate) Evaluating Alert [VM CPU Utilization > 10%] for target: [test]...
[----] I, [2016-06-09T10:18:07.856951 #50158:865998]  INFO -- : MIQ(MiqAlert#evaluate) Evaluating Alert [VM CPU Utilization > 10%] for target: [test]...
[----] I, [2016-06-09T10:20:57.013168 #50160:e8f990]  INFO -- : MIQ(MiqAlert#evaluate) Evaluating Alert [VM CPU Utilization > 10%] for target: [test]...
[----] I, [2016-06-09T10:24:10.251696 #50155:e83988]  INFO -- : MIQ(MiqAlert#evaluate) Evaluating Alert [VM CPU Utilization > 10%] for target: [test]...
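
To put a number on the gap between evaluations, here is a small sketch that only does arithmetic on the five timestamps copied from the log lines above. It is plain Python and does not call any ManageIQ API.

```python
from datetime import datetime

# Timestamps copied from the MiqAlert#evaluate log lines above.
stamps = [
    "2016-06-09T10:12:05.216496",
    "2016-06-09T10:14:53.684598",
    "2016-06-09T10:18:07.856951",
    "2016-06-09T10:20:57.013168",
    "2016-06-09T10:24:10.251696",
]

times = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f") for s in stamps]

# Seconds between each evaluation and the next one.
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
mean_gap = sum(gaps) / len(gaps)

for g in gaps:
    print(f"{g:.1f} s")
print(f"mean: {mean_gap:.1f} s")
```

The gaps fall between roughly 168 and 194 seconds, with a mean of about 181 seconds, so the alert is being evaluated about every 3 minutes rather than every minute.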

Please see screenshot:


#2

Is there an SME here who can help look into this?


#3

Hi @ltsai,

The values configured under :capture_threshold_with_alerts will control the minimum amount of time between real-time performance metrics capture, as you’ve already discovered. However, performance metrics capture (which alerting depends on) is run on a built-in system schedule that has a default frequency of every 3 minutes.

That schedule is controlled by this advanced setting:

   :schedule_worker:
     ...
     :performance_collection_interval: 3.minutes

I think if you reduce that value down to 1.minute you’ll get the result you want. However, please be aware that reducing these values will add a lot more overhead on the metrics collection and alerting processes and could lead to performance issues depending on the size of your managed environment.
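
As a rough illustration of how the two settings interact, here is a simplified model, not ManageIQ's actual scheduling code. It assumes the schedule fires at a fixed interval (standing in for :performance_collection_interval) and that a capture only runs when at least the threshold (standing in for :capture_threshold_with_alerts) has passed since the previous capture. The function name and parameters are hypothetical.

```python
def capture_times(schedule_interval_min, capture_threshold_min, duration_min):
    """Return the minutes at which a capture runs in this simplified model.

    schedule_interval_min: how often the schedule fires (hypothetical stand-in
        for :performance_collection_interval).
    capture_threshold_min: minimum time between captures (hypothetical stand-in
        for :capture_threshold_with_alerts).
    """
    captures = []
    last = None
    for tick in range(0, duration_min + 1, schedule_interval_min):
        # A capture runs only on a schedule tick, and only if the
        # threshold has elapsed since the previous capture.
        if last is None or tick - last >= capture_threshold_min:
            captures.append(tick)
            last = tick
    return captures


defaults = capture_times(3, 20, 60)   # default values quoted in this thread
threshold_only = capture_times(3, 1, 9)  # threshold lowered, schedule left at 3
both_lowered = capture_times(1, 1, 5)    # both lowered to 1 minute

print(defaults)
print(threshold_only)
print(both_lowered)
```

In this model, lowering only the threshold still yields one capture every 3 minutes, because nothing can run between schedule ticks. That would be consistent with the roughly 3-minute spacing in the log excerpt from the first post.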


#4

Hi, @gtanzillo,

I don’t really understand the difference between :capture_threshold_with_alerts and :performance_collection_interval.

According to documentation:

capture_threshold_with_alerts
Amount of time in minutes to wait after capture before capturing again. This value is used instead of capture_threshold for VMs that have alerts assigned based on real time Capacity & Utilization data. Default: 20.minutes

What is a capture? I see both perf_capture and "Queuing evaluation of Alert" entries in evm.log.

performance_collection_interval
Controls how often the schedule worker will put performance collection request on the queue to be picked up by the collection worker. Default: 3.minutes

Is this how often C&U runs?

I changed both values to 1 minute, but the metrics table shows my C&U data at 20-second intervals:

vmdb_production=# select capture_interval, created_on, timestamp, capture_interval_name, cpu_usage_rate_average from metrics where resource_name='f15' order by timestamp desc limit 10;
 capture_interval |         created_on         |      timestamp      | capture_interval_name | cpu_usage_rate_average 
------------------+----------------------------+---------------------+-----------------------+------------------------
               20 | 2016-09-25 03:14:44.175978 | 2016-09-25 03:13:20 | realtime              |       97.8734774741517
               20 | 2016-09-25 03:14:44.165218 | 2016-09-25 03:13:00 | realtime              |       97.8734774741517
               20 | 2016-09-25 03:14:44.154467 | 2016-09-25 03:12:40 | realtime              |       97.8734774741517
               20 | 2016-09-25 03:13:59.17687  | 2016-09-25 03:12:20 | realtime              |       98.6069938519003
               20 | 2016-09-25 03:13:59.163404 | 2016-09-25 03:12:00 | realtime              |       98.6069938519003
               20 | 2016-09-25 03:13:59.150119 | 2016-09-25 03:11:40 | realtime              |       98.6069938519003
               20 | 2016-09-25 03:13:02.771908 | 2016-09-25 03:11:20 | realtime              |        98.680234404075
               20 | 2016-09-25 03:13:02.744891 | 2016-09-25 03:11:00 | realtime              |        98.680234404075
               20 | 2016-09-25 03:13:02.707745 | 2016-09-25 03:10:40 | realtime              |        98.680234404075
               20 | 2016-09-25 03:11:27.50329  | 2016-09-25 03:10:20 | realtime              |       98.7830422033523
(10 rows)
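
One possible way to read these rows, which nobody in this thread has confirmed: the timestamp column is the time of each sample, while created_on is when the row was written. The sketch below groups the ten rows by created_on (truncated to the second) to separate sample spacing from write spacing. It works only on the values copied from the query output above.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (created_on truncated to the second, timestamp), copied from the output above.
rows = [
    ("2016-09-25 03:14:44", "2016-09-25 03:13:20"),
    ("2016-09-25 03:14:44", "2016-09-25 03:13:00"),
    ("2016-09-25 03:14:44", "2016-09-25 03:12:40"),
    ("2016-09-25 03:13:59", "2016-09-25 03:12:20"),
    ("2016-09-25 03:13:59", "2016-09-25 03:12:00"),
    ("2016-09-25 03:13:59", "2016-09-25 03:11:40"),
    ("2016-09-25 03:13:02", "2016-09-25 03:11:20"),
    ("2016-09-25 03:13:02", "2016-09-25 03:11:00"),
    ("2016-09-25 03:13:02", "2016-09-25 03:10:40"),
    ("2016-09-25 03:11:27", "2016-09-25 03:10:20"),
]

# Group sample timestamps by the second in which their rows were written.
batches = defaultdict(list)
for created_on, ts in rows:
    batches[datetime.strptime(created_on, FMT)].append(datetime.strptime(ts, FMT))

# Spacing between consecutive samples, regardless of batch.
samples = sorted(t for group in batches.values() for t in group)
sample_gaps = {(b - a).total_seconds() for a, b in zip(samples, samples[1:])}

# Spacing between consecutive write batches.
created = sorted(batches)
batch_gaps = [(b - a).total_seconds() for a, b in zip(created, created[1:])]

print("sample spacing (s):", sorted(sample_gaps))
print("batch spacing (s):", batch_gaps)
print("rows per batch:", [len(batches[c]) for c in created])
```

The samples are all 20 seconds apart, which matches the capture_interval column, but the rows are written in batches of up to three, spaced between 45 and 95 seconds apart. If that reading is right, 20 seconds is the granularity of the real-time samples and the 1-minute settings govern how often a batch of samples is fetched. The oldest batch shows only one row because the query is cut off by limit 10.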

#5

Bump. Can any SME help?