Cloud Posse recently overhauled its Terraform module for managing security groups and rules. We rely on this module to provide a consistent interface for managing AWS security groups and associated security group rules across our Open Source Terraform modules.
This new module can be used very simply, but under the hood it is quite complex, because it handles numerous interrelationships, restrictions, and a few bugs. It does so in ways that let you choose between zero service interruption for updates to a security group that is not referenced by other security groups (achieved by replacing the security group with a new one) and brief service interruptions for security groups that must be preserved. Another enhancement is that you can now provide the ID of an existing security group to modify; by default, the module creates a new security group and applies the given rules to it.
Avoiding Service Interruptions
It is desirable to avoid service interruptions when updating a security group. This is not always possible because of the way Terraform orders its operations and because AWS rejects any attempt to create a duplicate of an existing security group rule. There is also the issue that, while most AWS resources can be associated with and disassociated from security groups at any time, some cannot have their security group association changed, and an attempt to change their security group will cause Terraform to delete and recreate the resource.
The 2 Ways Security Group Changes Cause Service Interruptions
Changes to a security group can cause service interruptions in 2 ways:
Changing rules may be implemented as deleting existing rules and creating new ones. During the period between deleting the old rules and creating the new rules, the security group will block traffic intended to be allowed by the new rules.
Changing rules may be implemented as creating a new security group with the new rules and replacing the existing security group with the new one (then deleting the old one). This usually works with no service interruption when all resources referencing the security group are part of the same Terraform plan. However, if, for example, the security group ID is referenced in a security group rule in a security group that is not part of the same Terraform plan, then AWS will not allow the existing (referenced) security group to be deleted, and even if it did, Terraform would not know to update the rule to reference the new security group.
The key question you need to answer to decide which configuration to use is "will anything break if the security group ID changes?" If not, then use the defaults create_before_destroy = true and preserve_security_group_id = false, and do not worry about providing "keys" for security group rules. This is the default because it is the easiest and safest solution when the way the security group is being used allows it.
If things will break when the security group ID changes, then set preserve_security_group_id to true. Also read and follow the guidance below about keys and about limiting each Terraform security group rule to a single AWS security group rule if you want to mitigate service interruptions caused by rule changes.
Note that even in this case, you probably want to keep create_before_destroy = true because otherwise, if some change requires the security group to be replaced, Terraform will likely succeed in deleting all the security group rules but fail to delete the security group itself, leaving the associated resources completely inaccessible. At least with create_before_destroy = true, the new security group will be created and used where Terraform can make the changes, even though the old security group will still fail to be deleted.
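The decision rule above fits in a few lines. The following Python sketch is only a mnemonic: the function recommended_settings and the advice strings it returns are hypothetical, and only the names create_before_destroy and preserve_security_group_id correspond to actual module inputs.

```python
def recommended_settings(id_change_breaks_something: bool) -> dict:
    """Answer "will anything break if the security group ID changes?" and
    return the configuration recommended in the text (hypothetical helper)."""
    settings = {
        # Keep "create before destroy" in both cases, so an unavoidable
        # replacement does not leave resources attached to a group whose
        # rules were already deleted.
        "create_before_destroy": True,
        # Preserve the ID only when something depends on it.
        "preserve_security_group_id": id_change_breaks_something,
    }
    # Advice, not module inputs: keys and single-source rules only matter
    # when rules are destroyed before they are recreated.
    advice = []
    if id_change_breaks_something:
        advice = [
            "give every rule a stable key",
            "limit each rule to a single source or destination",
        ]
    return {"settings": settings, "advice": advice}
```

With the default answer (nothing breaks), the sketch returns the module defaults and no extra advice.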
The 3 Ways to Mitigate Against Service Interruptions
Security Group create_before_destroy = true
The most important option is create_before_destroy, which, when set to true (the default), ensures that a new replacement security group is created before an existing one is destroyed. This is particularly important because a security group cannot be destroyed while it is associated with a resource (e.g. a load balancer), but "destroy before create" behavior causes Terraform to try to destroy the security group before disassociating it from associated resources, so plans fail to apply with the error
Error deleting security group: DependencyViolation: resource sg-XXX has a dependent object
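With "create before destroy" set, and with all resources that depend on the security group in the same Terraform plan, replacement succeeds: Terraform creates the new security group, associates it with the dependent resources (disassociating the old one), and then deletes the old security group.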
(If a resource is dependent on the security group and is also outside the scope of the Terraform plan, the old security group will fail to be deleted and you will have to address the dependency manually.)
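The difference between the two orderings can be replayed in a toy Python model. Everything here is hypothetical (the resource names, the sg-old/sg-new IDs, and the DependencyViolation class standing in for the AWS error); it only mirrors the sequence of operations described above, not how Terraform or AWS are implemented.

```python
class DependencyViolation(Exception):
    """Stand-in for AWS's "resource sg-XXX has a dependent object" error."""


def replace_security_group(create_before_destroy: bool, external_dependents: int = 0) -> list:
    """Replace sg-old with sg-new and return the log of completed steps.

    One load balancer is managed in the same plan, so Terraform can move it
    to the new group; `external_dependents` resources live outside the plan
    and keep pointing at sg-old no matter what.
    """
    in_plan = {"load_balancer": "sg-old"}
    outside_plan = {f"external-{i}": "sg-old" for i in range(external_dependents)}
    log = []

    def delete_old_group():
        users = sorted(n for n, sg in {**in_plan, **outside_plan}.items() if sg == "sg-old")
        if users:
            raise DependencyViolation(f"sg-old has dependent objects: {users}")
        log.append("delete sg-old")

    def create_and_associate_new_group():
        log.append("create sg-new")
        for name in in_plan:
            in_plan[name] = "sg-new"  # only in-plan resources can be moved
        log.append("associate sg-new, disassociate sg-old")

    if create_before_destroy:
        create_and_associate_new_group()
        delete_old_group()
    else:
        # "Destroy before create": the delete is attempted while the load
        # balancer is still associated with sg-old, so it fails immediately.
        delete_old_group()
        create_and_associate_new_group()
    return log
```

With create_before_destroy=True and no external dependents, the log ends with the old group deleted. With even one external dependent, the final delete still fails, which is the manual cleanup case described in the parenthetical above.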
Note that the module's default configuration of create_before_destroy = true and preserve_security_group_id = false will force the "create before destroy" behavior on the target security group, even if the module did not create it and instead you provided a target_security_group_id.
Unfortunately, creating a new security group is not enough to prevent a service interruption. Keep reading for more on that.
Setting Rule Changes to Force Replacement of the Security Group
A security group by itself is just a container for rules. It only functions as desired when all the rules are in place. If using the Terraform default "destroy before create" behavior for rules, even when using create_before_destroy for the security group itself, an outage occurs when updating the rules or security group, because the order of operations is: delete the existing security group rules (starting the outage), create the new security group, associate the new security group with resources and disassociate the old one (which can take a substantial amount of time for a resource like a NAT Gateway), create the new security group rules (ending the outage), and finally delete the old security group.
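To resolve this issue, the module's default configuration of create_before_destroy = true and preserve_security_group_id = false causes any change in the security group rules to trigger the creation of a new security group. With that, a rule change causes operations to occur in this order: create the new security group, create the new security group rules, associate the new security group with resources and disassociate the old one, delete the old security group rules, and delete the old security group. At no point is a resource associated with a security group that lacks its rules.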
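The two orderings can be compared with a small Python replay: one where the existing rules are deleted before their replacements exist, and one where the new group, rules included, is built and associated before anything old is removed. The step lists are a simplified, hypothetical model; the group names and the single allow-https rule are illustrative, and this is not output from Terraform.

```python
# Each step is (action, security group). The single rule "allow-https" stands
# in for whatever rules the resource needs; old and new rules are identical
# here, so any blocked step is purely an artifact of ordering.
RULES_DESTROYED_FIRST = [
    ("delete_rules", "sg-old"),
    ("create_group", "sg-new"),
    ("associate", "sg-new"),
    ("create_rules", "sg-new"),
    ("delete_group", "sg-old"),
]
NEW_GROUP_BUILT_FIRST = [
    ("create_group", "sg-new"),
    ("create_rules", "sg-new"),
    ("associate", "sg-new"),
    ("delete_rules", "sg-old"),
    ("delete_group", "sg-old"),
]


def blocked_steps(steps):
    """Replay the steps and return those after which the resource's traffic is blocked."""
    groups = {"sg-old": {"allow-https"}}
    attached = "sg-old"
    blocked = []
    for action, group in steps:
        if action == "create_group":
            groups[group] = set()
        elif action == "create_rules":
            groups[group] = {"allow-https"}
        elif action == "delete_rules":
            groups[group] = set()
        elif action == "associate":
            attached = group
        elif action == "delete_group":
            del groups[group]
        # Traffic flows only if the currently attached group holds the rule.
        if "allow-https" not in groups.get(attached, set()):
            blocked.append(f"{action} {group}")
    return blocked
```

In this model, blocked_steps(RULES_DESTROYED_FIRST) reports three blocked steps (from the rule deletion until the new rules exist), while blocked_steps(NEW_GROUP_BUILT_FIRST) reports none.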
There can be a downside to creating a new security group with every rule change. If you want to prevent the security group ID from changing unless absolutely necessary, perhaps because the associated resource does not allow the security group to be changed or because the ID is referenced somewhere (like in another security group's rules) outside of this Terraform plan, then you need to set preserve_security_group_id to true.
The main drawback of this configuration is that there will normally be a service outage during an update, because existing rules will be deleted before replacement rules are created. Using keys to identify rules can help limit the impact, but even with keys, simply adding a CIDR to the list of allowed CIDRs will cause that entire rule to be deleted and recreated, causing a temporary access denial for all of the CIDRs in the rule. (For more on this and how to mitigate it, see The Importance of Keys below.)
Also, note that setting preserve_security_group_id to true does not prevent Terraform from replacing the security group when modifying it is not an option, such as when its name or description changes. However, if you can control the configuration adequately, you can maintain the security group ID and eliminate the impact on other security groups by setting preserve_security_group_id to true. We still recommend leaving create_before_destroy set to true for the times when the security group must be replaced, to avoid the DependencyViolation described above.
We provide several different ways to define rules for the security group for a few reasons:
Terraform type constraints make it difficult to create collections of objects with optional members
Terraform resource addressing can cause resources that did not actually change to nevertheless be replaced (deleted and recreated), which, in the case of security group rules, then causes a brief service interruption
Terraform resource addresses must be known at plan time, making it challenging to create rules that depend on resources being created during apply and at the same time are not replaced needlessly when something else changes
When Terraform rules can be successfully created before being destroyed, there is no service interruption for the resources associated with that security group (unless the security group ID is used in other security group rules outside of the scope of the Terraform plan)
The Importance of Keys
If you are relying on the "create before destroy" behavior for the security group and security group rules, you can skip this section and much of the discussion about keys in the later sections, because keys do not matter in this configuration. However, if you are using the "destroy before create" behavior, a full understanding of keys applied to security group rules will help you minimize service interruptions due to changing rules.
When creating a collection of resources, Terraform requires each resource to be identified by a key so that each resource has a unique "address", and Terraform uses these keys to track changes to resources. Every security group rule input to this module accepts optional identifying keys (arbitrary strings) for each rule. If you do not supply keys, then the rules are treated as a list, and the index of the rule in the list will be used as its key. Note that not supplying keys, therefore, has the unwelcome behavior that removing a rule from the list will cause all the rules later in the list to be destroyed and recreated. For example, changing [A, B, C, D] to [A, C, D] causes rules 1(B), 2(C), and 3(D) to be deleted and new rules 1(C) and 2(D) to be created.
We allow you to specify keys (arbitrary strings) for each rule to mitigate this problem. (Exactly how you specify the key is explained in the next sections.) Going back to our example, if the initial set of rules were specified with keys, e.g. [{A: A}, {B: B}, {C: C}, {D: D}], then removing B from the list would only cause B to be deleted, leaving C and D intact.
Note, however, two cautions. First, the keys must be known at terraform plan time and therefore cannot depend on resources that will be created during apply. Second, in order to be helpful, the keys must remain consistently attached to the same rules. For example, if you generate each rule's key from its position in the list, you will have merely recreated the initial problem of using a plain list. If you cannot attach meaningful keys to the rules, there is no advantage to specifying keys at all.
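The ripple effect of list positions, and why explicit keys avoid it, can be reproduced with a small Python model. It treats every changed address as a delete plus a create (changes to security group rules are generally handled as replacements); the helper names are hypothetical, and this only approximates Terraform's planning.

```python
def plan_changes(old: dict, new: dict):
    """Compare two {address: rule} maps: the same address with the same rule
    is a no-op; anything else is a delete of the old and/or a create of the new."""
    deleted = sorted(f"{k}({v})" for k, v in old.items() if new.get(k) != v)
    created = sorted(f"{k}({v})" for k, v in new.items() if old.get(k) != v)
    return deleted, created


def by_index(rules):
    # No keys supplied: each rule's address is its position in the list.
    return {str(i): r for i, r in enumerate(rules)}


def by_key(rules):
    # Keys supplied: here each rule simply uses its own name as its key.
    return {r: r for r in rules}


# Removing B from [A, B, C, D] under each addressing scheme:
index_plan = plan_changes(by_index(["A", "B", "C", "D"]), by_index(["A", "C", "D"]))
key_plan = plan_changes(by_key(["A", "B", "C", "D"]), by_key(["A", "C", "D"]))
```

index_plan deletes 1(B), 2(C), and 3(D) and creates 1(C) and 2(D), matching the example in the text, while key_plan deletes only B. Generating keys from list positions (for example with enumerate) yields exactly the by_index addresses, which is why such keys give no benefit.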
A single security group rule input can actually specify multiple AWS security group rules. For example, ipv6_cidr_blocks takes a list of CIDRs. However, AWS security group rules do not allow for a list of CIDRs, so the AWS Terraform provider converts that list of CIDRs into a list of AWS security group rules, one for each CIDR. (This is the underlying cause of several AWS Terraform provider bugs, such as #25173.) As of this writing, any change to any element of such a rule will cause all the AWS rules specified by the Terraform rule to be deleted and recreated. That causes the same kind of service interruption we sought to avoid by providing keys for the rules or, when create_before_destroy = true, a complete failure as Terraform tries to create duplicate rules, which AWS rejects. To guard against this issue when not using the default behavior, avoid the convenience of specifying multiple AWS rules in a single Terraform rule, and instead create a separate Terraform rule for each source or destination specification.
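The difference between one Terraform rule holding many CIDRs and one Terraform rule per CIDR can be sketched as a Python function. This is a simplified, hypothetical model of the behavior described above (interrupted_cidrs is not part of any library, and the CIDRs are illustrative); "kept" CIDRs are the ones that should have access both before and after the change.

```python
def interrupted_cidrs(old_cidrs, new_cidrs, one_rule_per_cidr, create_before_destroy=False):
    """Return the CIDRs that should keep access but briefly lose it when the rule changes.

    Raises ValueError when "create before destroy" would make AWS reject duplicates.
    """
    old, new = set(old_cidrs), set(new_cidrs)
    kept = old & new  # CIDRs allowed both before and after the change
    if one_rule_per_cidr or old == new:
        # Each CIDR has its own keyed Terraform rule, so the rules for kept
        # CIDRs are not touched (and nothing changes if the lists are equal).
        return set()
    # One Terraform rule expands to one AWS rule per CIDR, and any change
    # replaces all of them.
    if create_before_destroy:
        # New AWS rules are created while the old ones still exist, so every
        # kept CIDR becomes an exact duplicate, which AWS rejects.
        if kept:
            raise ValueError(f"duplicate rules for {sorted(kept)}")
        return set()
    # Destroy before create: every old AWS rule is deleted first, including
    # the ones for CIDRs that were never meant to change.
    return kept
```

Adding 10.1.0.0/16 next to an existing 10.0.0.0/16 in a single rule briefly cuts off 10.0.0.0/16 in this model, while splitting the CIDRs into separate rules leaves it untouched.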
This module provides 3 ways to set security group rules. You can use any or all of them at the same time.
The easy way to specify rules is via the rules input. It takes a list of rules. (We will define a rule a bit later.)
The problem is that a Terraform list must be composed of elements of the exact same type, and rules can be any of several different Terraform types. So to get around this restriction, the second way to specify rules is via the rules_map input, which is more complex.
The rules_map input takes an object. The attribute names (keys) of the object can be anything you want, but need to be known during terraform plan, which means they cannot depend on any resources created or changed by Terraform.
The values of the attributes are lists of rule objects, each representing one Security Group Rule. As explained below in "Why the input is so complex", each object in a list must be exactly the same type. To use multiple types, you must put them in separate lists and put the lists in a map with distinct keys.
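The grouping step can be illustrated in Python with rule objects as dictionaries. This hypothetical helper only compares attribute names (Terraform also requires each attribute's value type to match), and deriving the map key from the names is an illustrative choice; in a real configuration you would pick readable names yourself.

```python
from collections import defaultdict


def to_rules_map(rules):
    """Group rule objects so that each list only holds objects with identical
    attribute names; the map key is derived from those names."""
    grouped = defaultdict(list)
    for rule in rules:
        # Attribute names are written literally in the configuration, so a key
        # derived from them would be known at plan time.
        shape = "+".join(sorted(rule))
        grouped[shape].append(rule)
    return dict(grouped)


rules = [
    {"type": "ingress", "from_port": 443, "to_port": 443, "protocol": "tcp", "cidr_blocks": ["10.0.0.0/16"]},
    {"type": "ingress", "from_port": 22, "to_port": 22, "protocol": "tcp", "source_security_group_id": "sg-123"},
    {"type": "egress", "from_port": 0, "to_port": 0, "protocol": "-1", "cidr_blocks": ["0.0.0.0/0"]},
]
rules_map = to_rules_map(rules)
```

The two CIDR-based rules share one list and the security-group-based rule gets its own, so every list is homogeneous.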
For our module, a rule is defined as an object. The attributes and values of the rule objects are fully compatible with those of the Terraform aws_security_group_rule resource (they have the same keys and accept the same values), except that:
security_group_id will be ignored, if present
You can include an optional key attribute. Its value must be unique among all security group rules in the security group, and it must be known in the Terraform "plan" phase, meaning it cannot depend on anything being generated or created by Terraform.
If provided, the key attribute value will be used to identify the Security Group Rule to Terraform to prevent Terraform from modifying it unnecessarily. If the key is not provided, Terraform will assign an identifier based on the rule's position in its list, which can cause a ripple effect of rules being deleted and recreated if a rule gets deleted from the start of a list, causing all the other rules to shift position. See "Unexpected changes..." below for more details.
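The uniqueness requirement on key can be checked before planning. The Python function below is a hypothetical pre-flight check covering the rules and rules_map inputs only; it is not the module's own validation, which may behave differently.

```python
def check_rule_keys(rules, rules_map):
    """Return the sorted list of `key` values used by more than one rule
    across the rules list and all lists in rules_map."""
    all_rules = list(rules) + [r for group in rules_map.values() for r in group]
    seen, duplicates = set(), set()
    for rule in all_rules:
        key = rule.get("key")
        if key is None:
            continue  # unkeyed rules fall back to their list position
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)
```

For example, a rule keyed "ssh" in rules and another keyed "ssh" in a rules_map list would be reported as ["ssh"].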
Unexpected changes during plan and apply
When configuring this module for "create before destroy" behavior, any change to a security group rule will cause an entirely new security group to be created with all new rules. This can make a small change look like a big one, but is intentional and should not cause concern.
As explained above under The Importance of Keys, when using "destroy before create" behavior, security group rules without keys are identified by their indices in the input lists. If a rule is deleted and the other rules move closer to the start of the list, those rules will be deleted and recreated. This can make a small change look like a big one when viewing the output of Terraform plan, and will likely cause a brief (seconds) service interruption.
You can avoid this for the most part by providing the optional keys and limiting each rule to a single source or destination. Rules with keys will not be changed if their keys do not change and the rules themselves do not change, except in the case of rule_matrix, where the rules are still dependent on the order of the security groups in source_security_group_ids. You can avoid this by using rules instead of rule_matrix when you have more than one security group in the list. You cannot avoid it by sorting source_security_group_ids, because that leads to the "Invalid for_each argument" error due to terraform#31035.
You can supply many rules as inputs to this module, and they (usually) get transformed into aws_security_group_rule resources. However, Terraform works in 2 steps: a plan step where it calculates the changes to be made, and an apply step where it makes the changes. This is so you can review and approve the plan before changing anything. One big limitation of this approach is that it requires that Terraform be able to count the number of resources to create without the benefit of any data generated during the apply phase. So if you try to generate a rule based on something you are creating at the same time, you can get an error like:
Error: Invalid for_each argument
The "for_each" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created.
This module uses lists to minimize the chance of that happening, as all it needs to know is the length of the list, not the values in it, but this error can still happen for subtle reasons. Most commonly, using a function like compact on a list will cause the length to become unknown (since the values have to be checked and nulls removed). In the case of source_security_group_ids, just sorting the list using sort will cause this error. (See terraform#31035.) If you run into this error, check for functions like compact somewhere in the chain that produces the list and remove them if you find them.
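Why a plain list is safe while compact is not can be shown with a toy Python model of plan-time knowledge. The UNKNOWN sentinel is a hypothetical stand-in for a value Terraform only learns during apply (Terraform tracks unknown values quite differently internally), and the sort case from terraform#31035 is not modeled.

```python
class Unknown:
    """Placeholder for a value that is only known after apply."""

    def __repr__(self):
        return "(known after apply)"


UNKNOWN = Unknown()


def known_length(values):
    """A plain list's length is known at plan time even if some elements are not."""
    return len(values)


def compact_length(values):
    """compact() must inspect every element to drop nulls and empty strings,
    so a single unknown element makes the resulting length unknown."""
    if any(v is UNKNOWN for v in values):
        return UNKNOWN
    return len([v for v in values if v not in (None, "")])
```

With one security group ID still to be created, known_length reports 2 for ["sg-1", UNKNOWN], but compact_length can only report UNKNOWN, which is the situation that produces the "Invalid for_each argument" error.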
inline_rules_enabled is not recommended and NOT SUPPORTED: Any issues arising from setting inline_rules_enabled = true (including issues about setting it to false after setting it to true) will not be addressed, because they flow from fundamental problems with the underlying aws_security_group resource. The setting is provided for people who know and accept the limitations and trade-offs and want to use it anyway. The main advantage is that when using inline rules, Terraform will perform "drift detection" and attempt to remove any rules it finds in place but not specified inline. See this post for a discussion of the difference between inline and resource rules and some of the reasons inline rules are not satisfactory.
Known issue (#20046): If you set inline_rules_enabled = true, you cannot later set it to false. If you try, Terraform will complain and fail. You will either have to delete and recreate the security group or manually delete all the security group rules via the AWS console or CLI before applying inline_rules_enabled = false.
Why the input is so complex
Any time you provide a list of objects, Terraform requires that all objects in the list be the exact same type. This means that all objects in the list have exactly the same set of attributes and that each attribute has the same type of value in every object. So while some attributes are optional for this module, if you include an attribute in any of the objects in a list, you have to include that same attribute in all of them. In rules where an attribute would otherwise be omitted, include the attribute with a value of null, unless the value is a list type, in which case set the value to [] (an empty list), due to #28137.
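The padding rule can be sketched in Python, with dictionaries as rule objects and None standing in for Terraform's null. The helper name normalize is hypothetical, and LIST_ATTRIBUTES is an assumption covering the list-typed arguments of aws_security_group_rule (cidr_blocks, ipv6_cidr_blocks, prefix_list_ids).

```python
# Assumed list-typed rule attributes; these get [] instead of null.
LIST_ATTRIBUTES = {"cidr_blocks", "ipv6_cidr_blocks", "prefix_list_ids"}


def normalize(rules):
    """Give every rule object the same attribute names: missing list-typed
    attributes become [] and everything else becomes None (Terraform null)."""
    names = set().union(*rules)  # every attribute used by any rule
    padded = []
    for rule in rules:
        filled = dict(rule)
        for name in names - set(rule):
            filled[name] = [] if name in LIST_ATTRIBUTES else None
        padded.append(filled)
    return padded


padded = normalize([
    {"key": "office", "cidr_blocks": ["10.0.0.0/16"]},
    {"key": "bastion", "source_security_group_id": "sg-123"},
])
```

After padding, both rule objects carry key, cidr_blocks, and source_security_group_id, so they can share a single list.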