Skip to content

libnet/d/bridge: drop connections to lo mappings, and direct remote connections #49325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 28, 2025

Conversation

akerouanton
Copy link
Member

@akerouanton akerouanton commented Jan 22, 2025

- What I did

This PR fixes two separate security issues that have a common cause -- lack of proper packet filtering before NATing.

- How I did it

1st and 2nd commit are just a small refactoring, and a new integration for the use-case broken by:

3rd commit: libnet/d/bridge: port mappings: drop direct-access when gw_mode=nat

When a NAT-based port mapping is created, the daemon adds a DNAT rule in nat-DOCKER to replace the dest addr with the container IP. However, the daemon never sets up rules to filter packets destined directly to the container port. This allows a rogue neighbor (ie. a host that shares a L2 segment with the host) to send packets directly to the container on its container-side exposed port.

For instance, if container port 5000 is mapped to host port 6000, a neighbor could send packets directly to the container on its port 5000.

Since nat-DOCKER mangles the dest addr, and the nat table forbids DROP rules, this change adds a new rule in the raw-PREROUTING chain to filter ingress connections targeting the container's IP address.

This filtering is only done when gw_mode=nat. For the unprotected variant, no filtering is done.

4th commit: libnet/d/bridge: drop remote connections to port mapped on lo

Traditionnally when Linux receives remote packets with daddr set to a loopback address, it reject them as 'martians'. However, when a NAT rule is applied through iptables this doesn't happen. Our current DNAT rule used to map host ports to containers is applied unconditionnally, even for such 'martian' packets.

This means a neighbor host (ie. a host connected to the same L2 segment) can send packets to a port mapped on a loopback address. The purpose of publishing on a loopback address is to make ports inaccessible to remote hosts -- lack of proper filtering defeats that.

This commit adds an iptables rule to the raw-PREROUTING chain to drop packets with a loopback dest address and coming from any interface other than lo.

To accomodate WSL2 mirrored mode, another rule is inserted beforehand to specifically accept packets coming from the loopback0 interface.

- How to verify it

New integration tests.

- Description for the changelog

- Fix a security issue that was allowing remote hosts to connect directly to a container, on one of its published port.
- Fix a security issue that was allowing neighbor hosts to connect to ports mapped on a loopback address.

@akerouanton akerouanton added this to the 28.0.0 milestone Jan 22, 2025
@akerouanton akerouanton self-assigned this Jan 22, 2025
@akerouanton akerouanton changed the title Fix 45610 v2 libnet/d/bridge: drop connections to lo mappings, and direct remote connections Jan 22, 2025
@akerouanton akerouanton force-pushed the fix-45610-v2 branch 3 times, most recently from 03b8006 to ecde2b3 Compare January 23, 2025 22:51
@akerouanton akerouanton marked this pull request as ready for review January 23, 2025 22:56
Copy link
Contributor

@robmry robmry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good ...

Disallowing direct routing to published ports is quite a bit more drastic than the original change - we'll need to make it clear in release notes. (In this release we were already blocking direct routing to unpublished ports, which were previously open if the filter-FORWARD policy was ACCEPT. So it's an update for that description.)

We'll also need a docs update - this note can go, but we should probably replace it with a description of these new rules. I've not checked for other mentions / places we should mention this.

Nit - typo releasePortBindigs in the first commit comment, might make it less searchable-for. (Also, in the last commit comment, Traditionnally and unconditionnally only have one n.)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks mostly good to me; left some small suggestions.

not good enough on the networking part of things to say if it's all good though 🙈

Comment on lines +493 to +501
err := netlink.LinkAdd(iface)
assert.NilError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that needs cleaning up afterwards, or not a problem if we don't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both times this function is called a disposable network namespace is created first. I'll add a comment stating that it should be used that way.

@akerouanton akerouanton force-pushed the fix-45610-v2 branch 2 times, most recently from 7a891ad to ecf3873 Compare January 27, 2025 17:41
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Commit fc7caf9 reverted 433b1f9 as it was introducing a regression,
ie. containers couldn't reach ports published on the host using their
gateway's IP address or the host IP address.

These scenarios are now tested.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
When a NAT-based port mapping is created, the daemon adds a DNAT rule in
nat-DOCKER to replace the dest addr with the container IP. However, the
daemon never sets up rules to filter packets destined directly to the
container port. This allows a rogue neighbor (ie. a host that shares a
L2 segment with the host) to send packets directly to the container on
its container-side exposed port.

For instance, if container port 5000 is mapped to host port 6000, a
neighbor could send packets directly to the container on its port 5000.

Since nat-DOCKER mangles the dest addr, and the nat table forbids DROP
rules, this change adds a new rule in the raw-PREROUTING chain to filter
ingress connections targeting the container's IP address.

This filtering is only done when gw_mode=nat. For the unprotected
variant, no filtering is done.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Traditionally when Linux receives remote packets with daddr set to a
loopback address, it reject them as 'martians'. However, when a NAT rule
is applied through iptables this doesn't happen. Our current DNAT rule
used to map host ports to containers is applied unconditionally, even
for such 'martian' packets.

This means a neighbor host (ie. a host connected to the same L2
segment) can send packets to a port mapped on a loopback address. The
purpose of publishing on a loopback address is to make ports
inaccessible to remote hosts -- lack of proper filtering defeats that.

This commit adds an iptables rule to the raw-PREROUTING chain to drop
packets with a loopback dest address and coming from any interface other
than lo.

To accomodate WSL2 mirrored mode, another rule is inserted beforehand to
specifically accept packets coming from the loopback0 interface.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM

@thaJeztah
Copy link
Member

Discussing with @akerouanton and @robmry - more eyes on this could still be useful, but we can make changes in a follow-up where needed; we can probably bring this one in

@thaJeztah thaJeztah merged commit 47dc8d5 into moby:master Jan 28, 2025
147 checks passed
@raesene
Copy link
Contributor

raesene commented Feb 20, 2025

Hi all, can I check as this was identified as a security issue that's been fixed (and I know from reading some linked issues, it's been around a while), are there any plans to assign a CVE for it?

Just thinking that without that people might not realise that it's important to upgrade to this version for security purposes.

@akerouanton akerouanton deleted the fix-45610-v2 branch February 20, 2025 09:01
@akerouanton
Copy link
Member Author

Thanks for asking @raesene.

We're planning to release a blog post soon that should help clarify what we did, what this issue is about, and why we did consider fixing it in v28. At the moment, we're considering this as an hardening improvement, and we're not planning to get a CVE assigned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Complete
Development

Successfully merging this pull request may close these issues.

Publishing ports explicitly to private networks should not be accessible from LAN hosts
4 participants