-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Cloudwatch: add basic composite alarm support #11828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloudwatch: add basic composite alarm support #11828
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to LocalStack! Thanks for raising your first Pull Request and landing in your contributions. Our team will reach out with any reviews or feedbacks that we have shortly. We recommend joining our Slack Community and share your PR on the #community channel to share your contributions with us. Please make sure you are following our contributing guidelines and our Code of Conduct.
All contributors have signed the CLA ✍️ ✅ |
I have read the CLA Document and I hereby sign the CLA |
Haven't added PR labels as I seem to not have permissions. |
new_state_value = StateValue.OK | ||
alarm_rule = composite_alarm.alarm["AlarmRule"] | ||
# assuming that a rule consists only of ALARM evaluations of metric alarms, with OR logic applied | ||
metric_alarm_arns = re.findall( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alarm rule parsing and further evaluation definitely deserves more checks, like check if parsed metric alarm ARM exists in the store and actually check for OR
presence. However, I wanted to first get overall feedback on PR and approach and then work on these details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are some things that can break in real usage if unchecked, but I'd like to fix them after a general feedback on approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added several input validations in put_composite_alarm
, namely:
- if each metric alarm condition starts with
ALARM
- if there is a metric alarm in the store that corresponds to a referenced one
Likely some edge cases are still left out. For a more robust and extensible parsing likely a different approach needs to be taken, probably with recursive descent parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of questions about the correct behaviour, but this is looking great!
current_time = datetime.datetime.utcnow() | ||
current_time = datetime.datetime.now(timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for correcting this!
aws_client.cloudwatch.put_composite_alarm( | ||
AlarmName=composite_alarm_name, | ||
AlarmDescription=composite_alarm_description, | ||
AlarmRule=composite_alarm_rule, | ||
OKActions=[topic_arn_ok], | ||
AlarmActions=[topic_arn_alarm], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are making a change to this method, it might be nice to try snapshotting the response for extra parity coverage
aws_client.cloudwatch.put_composite_alarm( | |
AlarmName=composite_alarm_name, | |
AlarmDescription=composite_alarm_description, | |
AlarmRule=composite_alarm_rule, | |
OKActions=[topic_arn_ok], | |
AlarmActions=[topic_arn_alarm], | |
) | |
put_composite_alarm_response = aws_client.cloudwatch.put_composite_alarm( | |
AlarmName=composite_alarm_name, | |
AlarmDescription=composite_alarm_description, | |
AlarmRule=composite_alarm_rule, | |
OKActions=[topic_arn_ok], | |
AlarmActions=[topic_arn_alarm], | |
) | |
snapshot.match("put-composite-alarm", put_composite_alarm_response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Makes sense! Adding it in a separate commit, together with the updated snapshot file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# It is not possible to reuse the snapshot within the same test, | ||
# therefore checking that alarm reason hasn't changed instead of recording new snapshot | ||
state_reason_data = json.loads( | ||
composite_alarm_not_changed_by_second_trigger["StateReasonData"] | ||
) | ||
triggering_alarms = state_reason_data["triggeringAlarms"] | ||
assert len(triggering_alarms) == 1 | ||
assert triggering_alarms[0]["arn"] == alarm_2_arn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: why do you think it's a problem to record new state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that because composite alarm is unaffected it should be compared against previous snapshot, but now I can't remember why I thought it is a problem to record a new one. 🙂 🤷 I think I have a better understanding of snapshot testing by now and I changed this check to a new recorded snapshot comparison.
@@ -889,7 +970,7 @@ def create_message_response_update_state_lambda( | |||
return json.dumps(response, cls=JSONEncoder) | |||
|
|||
|
|||
def create_message_response_update_state_sns(alarm, old_state): | |||
def create_message_response_update_state_sns(alarm: LocalStackMetricAlarm, old_state): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, typing is great!
It looks like these states are StateValue
instances
def create_message_response_update_state_sns(alarm: LocalStackMetricAlarm, old_state): | |
def create_message_response_update_state_sns(alarm: LocalStackMetricAlarm, old_state: StateValue): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will add it here as per suggestion and to a new create_message_response_update_composite_alarm_state_sns
one as well. 👍
raise ValidationError( | ||
f"Error in alarm rule condition '{alarm}': Only ALARM expression is supported" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should perhaps LOG.warning
and ignore the rule rather than raising an error on unsupported actions. This becomes a known lack of parity rather than a user error.
... except if this is actually a parity improvement with AWS, in which case we should have a snapshot to cover it. I think this would be better as an additional test to capture any validation errors thrown by CloudWatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did take inspiration from AWS response for put-composite-alarm
when the alarm rule grammar has unsupported tokens.
For example, if I run the following command against AWS
aws cloudwatch put-composite-alarm --alarm-name "composite-alarm" --alarm-rule "ALARRRRM(metric-alarm)"
then the response is:
An error occurred (ValidationError) when calling the PutCompositeAlarm operation: Error in AlarmRule [Unsupported token 'ALARRRRM' at char 0]
I'm not sure if to qualify the validation I added as a parity improvement though. My motivation to add this validation was to make it very explicit to a Localstack user that as of now no expressions except ALARM
one are supported. This way when they try to add a composite alarm that has expressions that are not supported yet by Localstack they will get a message and error response quite similar to what AWS provides. If there is no such validation a user could add a composite alarm with what is supported by AWS, let's say, OK(metric-alarm)
in the rule. This rule would be saved and later be evaluated not as user expects, because in the current implementation every metric alarm is checked for ALARM state only.
So what I added is a similar behaviour, not full parity.
Because of that, I didn't see how I can do a shapshot, as AWS would happily accept the ALARM
token.
I think when a full alarm rule grammar is implemented many things will become simpler and cleaner, validation included. I just didn't venture into implementing full grammar parser in the first iteration, which lead to having to make several workarounds.
After my explanation, do you think it is worth to keep this validation? I'm absolutely good to change to LOG.warning
as you suggest if that's the go-to approach in Localstack for lack of parity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree a line needs to be drawn for this first implementation. My take is: I would prefer to not fail on valid input if it is not supported by LocalStack (yet). I think it is better to not break existing infrastructure (e.g. if a user tries to deploy their app to LocalStack, we should not block them just because we don't support a feature).
The only other option we have is to raise NotImplementedError
which correctly conveys the message, however we treat that exception specially to return a response similar to "The API action '...' is not supported" which is not quite what we want. The action is supported, just that this specific input is not.
Can we maybe switch to the LOG
statement in this instance, and as you say: a proper rule parser will likely be a future step and we can properly validate the input at that stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to not break existing infrastructure (e.g. if a user tries to deploy their app to LocalStack, we should not block them just because we don't support a feature).
Great point, now what you suggest makes a lot of sense to me! Also it made me think whether a partial rule implementation does more good than harm. 😄 I mean situations like when someone has a rule with AND in their infra and Localstack will treat it as if it was OR. Might be a rare case, yet it is probably better to not delay full alarm expression grammar support by a long time after (if) this gets merged.
Anyway, will now change raising error to logging as you suggest, thanks for explaining the details about parity concept!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good PR and now we're closer to provide full functionality to CW logs. Please resolve the comments from @simonrw they're very insightful.
Also another test about composite alarms is failing:
FAILED tests/aws/services/cloudwatch/test_cloudwatch.py::TestCloudwatch::test_put_composite_alarm_describe_alarms - botocore.exceptions.ClientError: An error occurred (ValidationError) when calling the PutCompositeAlarm operation: Could not save the composite alarm as alarms [a-ae401a82] in the alarm rule do not exist
topic_name_alarm = f"topic-alarm-{short_uid()}" | ||
topic_name_ok = f"topic-ok-{short_uid()}" | ||
|
||
sns_topic_alarm = sns_create_topic(Name=topic_name_alarm) | ||
topic_arn_alarm = sns_topic_alarm["TopicArn"] | ||
sns_topic_ok = sns_create_topic(Name=topic_name_ok) | ||
topic_arn_ok = sns_topic_ok["TopicArn"] | ||
|
||
# create queues for 'ALARM' and 'OK' of the composite alarm (will receive sns messages) | ||
queue_url_alarm = sqs_create_queue(QueueName=f"AlarmQueue-{short_uid()}") | ||
queue_url_ok = sqs_create_queue(QueueName=f"OKQueue-{short_uid()}") | ||
|
||
arn_queue_alarm = aws_client.sqs.get_queue_attributes( | ||
QueueUrl=queue_url_alarm, AttributeNames=["QueueArn"] | ||
)["Attributes"]["QueueArn"] | ||
arn_queue_ok = aws_client.sqs.get_queue_attributes( | ||
QueueUrl=queue_url_ok, AttributeNames=["QueueArn"] | ||
)["Attributes"]["QueueArn"] | ||
aws_client.sqs.set_queue_attributes( | ||
QueueUrl=queue_url_alarm, | ||
Attributes={"Policy": get_sqs_policy(arn_queue_alarm, topic_arn_alarm)}, | ||
) | ||
aws_client.sqs.set_queue_attributes( | ||
QueueUrl=queue_url_ok, Attributes={"Policy": get_sqs_policy(arn_queue_ok, topic_arn_ok)} | ||
) | ||
|
||
# subscribe to SQS | ||
subscription_alarm = aws_client.sns.subscribe( | ||
TopicArn=topic_arn_alarm, Protocol="sqs", Endpoint=arn_queue_alarm | ||
) | ||
cleanups.append( | ||
lambda: aws_client.sns.unsubscribe( | ||
SubscriptionArn=subscription_alarm["SubscriptionArn"] | ||
) | ||
) | ||
subscription_ok = aws_client.sns.subscribe( | ||
TopicArn=topic_arn_ok, Protocol="sqs", Endpoint=arn_queue_ok | ||
) | ||
cleanups.append( | ||
lambda: aws_client.sns.unsubscribe(SubscriptionArn=subscription_ok["SubscriptionArn"]) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SNS to SQS is the default pattern we use to test alarms, maybe we could convert it into a fixture for the next iteration 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion, added a TODO comment for now but definitely will address either in this or follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed a test by changing alarm name to alarm ARN in the rule, according to scope, but probably it is better to just add support for alarm-names, especially after what @simonrw explained about parity here. Using alarm names in rules is more human-readable and is likely used a lot. Will look into adding support for names tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually reverted to the original test and moved validation from put alarm method to alarm evaluation, skipping and logging unsupported rules. This is the least intrusive approach which won't harm users who might have valid composite alarms in their infra - they should be able to deploy on Localstack regardless on whether Localstack can evaluate rules in their alarms or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for keeping this PR up to date. I will ask @pinzon to do the final approval since he has a "request changes" review blocking the merge but otherwise LGTM!
@@ -2713,6 +2928,34 @@ def _sqs_messages_snapshot(expected_state, sqs_client, sqs_queue, snapshot, iden | |||
snapshot.match(f"{identifier}-sqs-msg", found_msg) | |||
|
|||
|
|||
def check_composite_alarm_message( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a reasonable compromise, thanks!
Currently, only patch changes are allowed on master. Your PR labels (aws:cloudwatch, semver: minor) indicate that it cannot be merged into the master at this time. |
Hi @tiurin. Thank you for addressing the previous comments. I'm pleased to approve your pull request ✔️. I noticed that the PR was labeled as a PATCH, but it should be classified as a MINOR since it introduces new functionality to the CloudWatch service. Given our current code freeze, we will merge this PR next week. |
WIP, didn't run yet against AWS so no snapshot yet.
Focus on making test pass in AWS mode.
Improves readability and fixes a bug with returning from all alarms evaluation if state of one of them hasn't changed - function complexity needs to be reduced as there are too many break and return statements.
Improves readability and reduces return statements from the original method.
Co-authored-by: Simon Walker <s.r.walker101@googlemail.com>
It relied on the alarm name reference in composite alarm rule which is not supported yet. Changing it to ARN reference. Alarm names support should be added as the next step though, it is not too difficult and is very natural to use them in alarm rule description.
Co-authored-by: Simon Walker <s.r.walker101@googlemail.com>
Avoid repetitive code.
As discussed in the (PR)[#11828 (comment)], partial behaviour should not hurt parity. In this case, because evaluation by name is not yet implemented, it is better to let user save an alarm that is not fully supported by Localstack but skip this alarm evaluation.
This reverts commit f77b418.
FYI I re-pushed to trigger a reassessment of the PR label enforcer job |
Motivation
Add basic support for composite alarms evaluation in Cloudwatch. Currently composite alarms can be saved but there is no evaluation for them, therefore there is no way to get them triggered.
Changes
Composite alarm rules will be evaluated now whenever a metric alarm state is changed.
This PR adds limited support for alarm rules:
Example of supported rule:
ALARM("arn:<partition>:cloudwatch:<region>:111111111111:alarm:<simple-alarm-1-name>") OR ALARM("arn:<partition>:cloudwatch:<region>:111111111111:alarm:<simple-alarm-2-name>")
Testing
Snapshot tests have been recorded for several test cases to verify that alarm rule logic applies correctly.