In an integration test, run a daemon, capture its iptables rules, and feed
them to a markdown text/template that describes them.
This is prep for repeating that with different network configurations.
Fail the test if the generated markdown differs from a "golden" version.
(So, at least the golden markdown will need to be updated if the
iptables rules are deliberately changed. Hopefully the corresponding
description in the template will also be updated.)
Signed-off-by: Rob Murray <rob.murray@docker.com>
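
The render-then-compare flow described in the commit message can be sketched with the standard library alone. This is a minimal sketch, not the code in this change: the real test uses `gotest.tools/v3/golden` and its `-update` flag, and the names `renderDoc` and `checkGolden` below are hypothetical.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// renderDoc (hypothetical helper) executes a markdown template against
// captured command output, keyed by a short name such as "LFilter4".
func renderDoc(tmplText string, data map[string]string) (string, error) {
	tmpl, err := template.New("doc").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// checkGolden (hypothetical helper) reports whether generated matches the
// file at goldenPath. When update is true, it rewrites the golden file
// instead of comparing, which mirrors what an "-update" flag would do.
func checkGolden(generated, goldenPath string, update bool) (bool, error) {
	if update {
		return true, os.WriteFile(goldenPath, []byte(generated), 0o644)
	}
	want, err := os.ReadFile(goldenPath)
	if err != nil {
		return false, err
	}
	return generated == string(want), nil
}
```

A caller would pass the template text and a map of captured outputs to `renderDoc`, then hand the result to `checkGolden`. Only changes that reach the rendered output are caught this way; a template description that should have changed but did not still needs a human reviewer.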
| 1 | 1 |
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,159 @@ |
| 0 |
+## iptables for a new Daemon |
|
| 1 |
+ |
|
| 2 |
+When the daemon starts, it creates custom chains, and rules for the |
|
| 3 |
+default bridge network. |
|
| 4 |
+ |
|
| 5 |
+Table `filter`: |
|
| 6 |
+ |
|
| 7 |
+ Chain INPUT (policy ACCEPT 0 packets, 0 bytes) |
|
| 8 |
+ num pkts bytes target prot opt in out source destination |
|
| 9 |
+ |
|
| 10 |
+ Chain FORWARD (policy ACCEPT 0 packets, 0 bytes) |
|
| 11 |
+ num pkts bytes target prot opt in out source destination |
|
| 12 |
+ 1 0 0 DOCKER-USER 0 -- * * 0.0.0.0/0 0.0.0.0/0 |
|
| 13 |
+ 2 0 0 DOCKER-ISOLATION-STAGE-1 0 -- * * 0.0.0.0/0 0.0.0.0/0 |
|
| 14 |
+ 3 0 0 ACCEPT 0 -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED |
|
| 15 |
+ 4 0 0 DOCKER 0 -- * docker0 0.0.0.0/0 0.0.0.0/0 |
|
| 16 |
+ 5 0 0 ACCEPT 0 -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0 |
|
| 17 |
+ 6 0 0 ACCEPT 0 -- docker0 docker0 0.0.0.0/0 0.0.0.0/0 |
|
| 18 |
+ |
|
| 19 |
+ Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes) |
|
| 20 |
+ num pkts bytes target prot opt in out source destination |
|
| 21 |
+ |
|
| 22 |
+ Chain DOCKER (1 references) |
|
| 23 |
+ num pkts bytes target prot opt in out source destination |
|
| 24 |
+ |
|
| 25 |
+ Chain DOCKER-ISOLATION-STAGE-1 (1 references) |
|
| 26 |
+ num pkts bytes target prot opt in out source destination |
|
| 27 |
+ 1 0 0 DOCKER-ISOLATION-STAGE-2 0 -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0 |
|
| 28 |
+ 2 0 0 RETURN 0 -- * * 0.0.0.0/0 0.0.0.0/0 |
|
| 29 |
+ |
|
| 30 |
+ Chain DOCKER-ISOLATION-STAGE-2 (1 references) |
|
| 31 |
+ num pkts bytes target prot opt in out source destination |
|
| 32 |
+ 1 0 0 DROP 0 -- * docker0 0.0.0.0/0 0.0.0.0/0 |
|
| 33 |
+ 2 0 0 RETURN 0 -- * * 0.0.0.0/0 0.0.0.0/0 |
|
| 34 |
+ |
|
| 35 |
+ Chain DOCKER-USER (1 references) |
|
| 36 |
+ num pkts bytes target prot opt in out source destination |
|
| 37 |
+ 1 0 0 RETURN 0 -- * * 0.0.0.0/0 0.0.0.0/0 |
|
| 38 |
+ |
|
| 39 |
+ |
|
| 40 |
+<details> |
|
| 41 |
+<summary>iptables commands</summary> |
|
| 42 |
+ |
|
| 43 |
+ -P INPUT ACCEPT |
|
| 44 |
+ -P FORWARD ACCEPT |
|
| 45 |
+ -P OUTPUT ACCEPT |
|
| 46 |
+ -N DOCKER |
|
| 47 |
+ -N DOCKER-ISOLATION-STAGE-1 |
|
| 48 |
+ -N DOCKER-ISOLATION-STAGE-2 |
|
| 49 |
+ -N DOCKER-USER |
|
| 50 |
+ -A FORWARD -j DOCKER-USER |
|
| 51 |
+ -A FORWARD -j DOCKER-ISOLATION-STAGE-1 |
|
| 52 |
+ -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT |
|
| 53 |
+ -A FORWARD -o docker0 -j DOCKER |
|
| 54 |
+ -A FORWARD -i docker0 ! -o docker0 -j ACCEPT |
|
| 55 |
+ -A FORWARD -i docker0 -o docker0 -j ACCEPT |
|
| 56 |
+ -A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2 |
|
| 57 |
+ -A DOCKER-ISOLATION-STAGE-1 -j RETURN |
|
| 58 |
+ -A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP |
|
| 59 |
+ -A DOCKER-ISOLATION-STAGE-2 -j RETURN |
|
| 60 |
+ -A DOCKER-USER -j RETURN |
|
| 61 |
+ |
|
| 62 |
+ |
|
| 63 |
+</details> |
|
| 64 |
+ |
|
| 65 |
+The FORWARD chain's policy shown above is ACCEPT. However: |
|
| 66 |
+ |
|
| 67 |
+ - For IPv4, [setupIPForwarding][1] sets the POLICY to DROP if the sysctl |
|
| 68 |
+ net.ipv4.ip_forward was not set to '1', and the daemon set it itself. |
|
| 69 |
+ - For IPv6, the policy is always DROP. |
|
| 70 |
+ |
|
| 71 |
+[1]: https://github.com/moby/moby/blob/cff4f20c44a3a7c882ed73934dec6a77246c6323/libnetwork/drivers/bridge/setup_ip_forwarding.go#L44 |
|
| 72 |
+ |
|
| 73 |
+The FORWARD chain rules are numbered in the output above, they are: |
|
| 74 |
+ |
|
| 75 |
+ 1. Unconditional jump to DOCKER-USER. |
|
| 76 |
+ This is set up by libnetwork, in [setupUserChain][10]. |
|
| 77 |
+ Docker won't add rules to the DOCKER-USER chain, it's only for user-defined rules. |
|
| 78 |
+ It's (mostly) kept at the top of the chain by deleting it and re-creating after each |
|
| 79 |
+ new network is created, while traffic may be running for other networks. |
|
| 80 |
+ 2. Unconditional jump to DOCKER-ISOLATION-STAGE-1. |
|
| 81 |
+ Set up during network creation by [setupIPTables][11], which ensures it appears |
|
| 82 |
+ after the jump to DOCKER-USER (by deleting it and re-creating, while traffic |
|
| 83 |
+ may be running for other networks). |
|
| 84 |
+ 3. ACCEPT RELATED,ESTABLISHED packets into a specific bridge network. |
|
| 85 |
+ Allows responses to outgoing requests, and continuation of incoming requests, |
|
| 86 |
+ without needing to process any further rules. |
|
| 87 |
+ This rule is also added during network creation, but the code to do it |
|
| 88 |
+ is in libnetwork, [ProgramChain][12]. |
|
| 89 |
+ 4. Jump to DOCKER, for any packet destined for a bridge network. Added when |
|
| 90 |
+ the network is created, in [ProgramChain][13] ("filterChain" is the DOCKER chain).
|
|
| 91 |
+ The DOCKER chain implements per-port/protocol filtering for each container. |
|
| 92 |
+ 5. ACCEPT any packet leaving a network, also set up when the network is created, in |
|
| 93 |
+ [setupIPTablesInternal][14]. |
|
| 94 |
+ 6. ACCEPT packets flowing between containers within a network, because by default |
|
| 95 |
+ container isolation is disabled. Also set up when the network is created, in |
|
| 96 |
+ [setIcc][15]. |
|
| 97 |
+ |
|
| 98 |
+[10]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/firewall_linux.go#L50 |
|
| 99 |
+[11]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L201 |
|
| 100 |
+[12]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L270 |
|
| 101 |
+[13]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L251-L255 |
|
| 102 |
+[14]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L264 |
|
| 103 |
+[15]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L343 |
|
| 104 |
+ |
|
| 105 |
+_With ICC enabled 5 and 6 could be combined, to ACCEPT anything from the bridge. |
|
| 106 |
+But, when ICC is disabled, rule 6 is DROP, so it would need to be placed before |
|
| 107 |
+rule 5. Because the rules are generated in different places, that's a slightly |
|
| 108 |
+bigger change than it should be._ |
|
| 109 |
+ |
|
| 110 |
+The DOCKER chain is empty, because there are no containers with port mappings yet. |
|
| 111 |
+ |
|
| 112 |
+The DOCKER-ISOLATION chains implement inter-network isolation, all (unrelated) |
|
| 113 |
+packets are processed by these chains. The rules are inserted at the head of the |
|
| 114 |
+chain when a network is created, in [setINC][20]. |
|
| 115 |
+ - DOCKER-ISOLATION-STAGE-1 jumps to DOCKER-ISOLATION-STAGE-2 for any packet |
|
| 116 |
+ routed to a docker network that has not come from that docker network. |
|
| 117 |
+ - DOCKER-ISOLATION-STAGE-2 processes all packets leaving a bridge network, |
|
| 118 |
+ packets that are destined for any other network are dropped. |
|
| 119 |
+ |
|
| 120 |
+[20]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L369 |
|
| 121 |
+ |
|
| 122 |
+Table nat: |
|
| 123 |
+ |
|
| 124 |
+ Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes) |
|
| 125 |
+ num pkts bytes target prot opt in out source destination |
|
| 126 |
+ 1 0 0 DOCKER 0 -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL |
|
| 127 |
+ |
|
| 128 |
+ Chain INPUT (policy ACCEPT 0 packets, 0 bytes) |
|
| 129 |
+ num pkts bytes target prot opt in out source destination |
|
| 130 |
+ |
|
| 131 |
+ Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes) |
|
| 132 |
+ num pkts bytes target prot opt in out source destination |
|
| 133 |
+ 1 0 0 DOCKER 0 -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL |
|
| 134 |
+ |
|
| 135 |
+ Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes) |
|
| 136 |
+ num pkts bytes target prot opt in out source destination |
|
| 137 |
+ 1 0 0 MASQUERADE 0 -- * !docker0 172.17.0.0/16 0.0.0.0/0 |
|
| 138 |
+ |
|
| 139 |
+ Chain DOCKER (2 references) |
|
| 140 |
+ num pkts bytes target prot opt in out source destination |
|
| 141 |
+ 1 0 0 RETURN 0 -- docker0 * 0.0.0.0/0 0.0.0.0/0 |
|
| 142 |
+ |
|
| 143 |
+ |
|
| 144 |
+<details> |
|
| 145 |
+<summary>iptables commands</summary> |
|
| 146 |
+ |
|
| 147 |
+ -P PREROUTING ACCEPT |
|
| 148 |
+ -P INPUT ACCEPT |
|
| 149 |
+ -P OUTPUT ACCEPT |
|
| 150 |
+ -P POSTROUTING ACCEPT |
|
| 151 |
+ -N DOCKER |
|
| 152 |
+ -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER |
|
| 153 |
+ -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER |
|
| 154 |
+ -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE |
|
| 155 |
+ -A DOCKER -i docker0 -j RETURN |
|
| 156 |
+ |
|
| 157 |
+ |
|
| 158 |
+</details> |
| 0 | 159 |
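
The FORWARD rule ordering described in the generated document above could be checked mechanically from `iptables -S` style output. The following is a rough sketch under that assumption; `forwardTargets` and `jumpsInExpectedOrder` are hypothetical helpers and are not part of this change.

```go
package main

import "strings"

// forwardTargets (hypothetical helper) returns the "-j" target of each
// "-A FORWARD" rule, in order, from `iptables -S` style output.
func forwardTargets(dump string) []string {
	var targets []string
	for _, line := range strings.Split(dump, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "-A" || fields[1] != "FORWARD" {
			continue
		}
		// Take the first "-j <target>" pair on the rule line.
		for i := 2; i+1 < len(fields); i++ {
			if fields[i] == "-j" {
				targets = append(targets, fields[i+1])
				break
			}
		}
	}
	return targets
}

// jumpsInExpectedOrder (hypothetical helper) reports whether the first two
// FORWARD rules jump to DOCKER-USER and then DOCKER-ISOLATION-STAGE-1.
func jumpsInExpectedOrder(targets []string) bool {
	return len(targets) >= 2 &&
		targets[0] == "DOCKER-USER" &&
		targets[1] == "DOCKER-ISOLATION-STAGE-1"
}
```

Because the document notes that the jump to DOCKER-USER is only "(mostly)" kept at the top, a check like this is a hint for developers rather than a guarantee about every daemon state.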
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,41 @@ |
| 0 |
+# Docker Engine's use of iptables |
|
| 1 |
+ |
|
| 2 |
+> [!WARNING] |
|
| 3 |
+> This is intended for development use - the structure of docker's iptables |
|
| 4 |
+> (and ip6tables) rules will change between releases, it is not a stable |
|
| 5 |
+> interface. |
|
| 6 |
+ |
|
| 7 |
+> [!NOTE] |
|
| 8 |
+> This document is generated by `TestBridgeIptablesDoc` by running a |
|
| 9 |
+> daemon, creating networks and containers, and capturing iptables. |
|
| 10 |
+> The iptables are then merged with a text/template for each section. |
|
| 11 |
+> The resulting document is diffed against one in the repo, so the |
|
| 12 |
+> test will fail if there are differences in the generated rules (but |
|
| 13 |
+> changes in the templates may go unnoticed). |
|
| 14 |
+> |
|
| 15 |
+> Links to code are permalinks - they will be out of date, and may not |
|
| 16 |
+> point to the master branch. But, it's difficult to work out where |
|
| 17 |
+> some of the rules come from, the links are intended as hints. |
|
| 18 |
+ |
|
| 19 |
+ip6tables rules follow the same pattern as iptables rules. So, only the |
|
| 20 |
+IPv4 rules are shown here. |
|
| 21 |
+ |
|
| 22 |
+The bridge driver deletes its custom chains during its initialisation, in |
|
| 23 |
+[configure][100]. Rules are then re-created as networks are restored. However, |
|
| 24 |
+the filter-FORWARD chain is not cleared. The order in which networks are |
|
| 25 |
+re-created is not the order in which they were originally created. So, |
|
| 26 |
+rules may be arranged differently following a daemon restart. |
|
| 27 |
+ |
|
| 28 |
+When firewalld is running, if it's reloaded, iptables rules are cleared. |
|
| 29 |
+The daemon registers handlers for its reload event (received via dbus) |
|
| 30 |
+to reconstruct the rules. |
|
| 31 |
+ |
|
| 32 |
+The filter-INPUT chain is not used by Docker. Packets arriving from the host's |
|
| 33 |
+physical network or the host itself hit the filter-FORWARD chain, as they are |
|
| 34 |
+routed into the bridge network. Similarly, filter-OUTPUT is not used. |
|
| 35 |
+ |
|
| 36 |
+[100]: https://github.com/moby/moby/blob/fe09cab7fe04c3911417061f7c7ef60a8acc6bf3/libnetwork/drivers/bridge/bridge_linux.go#L508 |
|
| 37 |
+ |
|
| 38 |
+Scenarios: |
|
| 39 |
+ |
|
| 40 |
+ - [New daemon](generated/new-daemon.md) |
| 0 | 41 |
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,233 @@ |
| 0 |
+// Package iptablesdoc runs docker, creates networks, runs containers and |
|
| 1 |
+// captures iptables output for various configurations. |
|
| 2 |
+// |
|
| 3 |
+// The iptables output is then used with a markdown text/template from the |
|
| 4 |
+// "templates" directory for each configuration (for each "section" in "index"), |
|
| 5 |
+// to generate a markdown document for each section. |
|
| 6 |
+// |
|
| 7 |
+// The newly generated documents are placed in: |
|
| 8 |
+// |
|
| 9 |
+// bundles/test-integration/TestBridgeIptablesDoc/iptables.md |
|
| 10 |
+// |
|
| 11 |
+// If the generated doc differs from the "golden" reference in "generated/", |
|
| 12 |
+// the test fails. When that happens: |
|
| 13 |
+// |
|
| 14 |
+// - check the iptables rules changes in the diff |
|
| 15 |
+// - update the description in the corresponding "_templ.md" file |
|
| 16 |
+// - re-run with TESTFLAGS='-update' to update the reference docs |
|
| 17 |
+package iptablesdoc |
|
| 18 |
+ |
|
| 19 |
+import ( |
|
| 20 |
+ "context" |
|
| 21 |
+ "fmt" |
|
| 22 |
+ "net/netip" |
|
| 23 |
+ "os" |
|
| 24 |
+ "path/filepath" |
|
| 25 |
+ "regexp" |
|
| 26 |
+ "strings" |
|
| 27 |
+ "testing" |
|
| 28 |
+ "text/template" |
|
| 29 |
+ |
|
| 30 |
+ containertypes "github.com/docker/docker/api/types/container" |
|
| 31 |
+ networktypes "github.com/docker/docker/api/types/network" |
|
| 32 |
+ "github.com/docker/docker/integration/internal/container" |
|
| 33 |
+ "github.com/docker/docker/integration/internal/network" |
|
| 34 |
+ "github.com/docker/docker/internal/testutils/networking" |
|
| 35 |
+ "github.com/docker/docker/libnetwork/drivers/bridge" |
|
| 36 |
+ "github.com/docker/docker/testutil" |
|
| 37 |
+ "github.com/docker/docker/testutil/daemon" |
|
| 38 |
+ "github.com/docker/go-connections/nat" |
|
| 39 |
+ "gotest.tools/v3/assert" |
|
| 40 |
+ "gotest.tools/v3/golden" |
|
| 41 |
+ "gotest.tools/v3/skip" |
|
| 42 |
+) |
|
| 43 |
+ |
|
| 44 |
+var ( |
|
| 45 |
+ docNetworks = []string{"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"}
|
|
| 46 |
+ docGateways = []string{"192.0.2.1", "198.51.100.1", "203.0.113.1"}
|
|
| 47 |
+) |
|
| 48 |
+ |
|
| 49 |
+type ctr struct {
|
|
| 50 |
+ name string |
|
| 51 |
+ portMappings nat.PortMap |
|
| 52 |
+} |
|
| 53 |
+ |
|
| 54 |
+type bridgeNetwork struct {
|
|
| 55 |
+ bridge string |
|
| 56 |
+ gwMode string |
|
| 57 |
+ noICC bool |
|
| 58 |
+ internal bool |
|
| 59 |
+ containers []ctr |
|
| 60 |
+} |
|
| 61 |
+ |
|
| 62 |
+type section struct {
|
|
| 63 |
+ name string |
|
| 64 |
+ noUserlandProxy bool |
|
| 65 |
+ networks []bridgeNetwork |
|
| 66 |
+} |
|
| 67 |
+ |
|
| 68 |
+var index = []section{
|
|
| 69 |
+ {
|
|
| 70 |
+ name: "new-daemon.md", |
|
| 71 |
+ }, |
|
| 72 |
+} |
|
| 73 |
+ |
|
| 74 |
+// iptCmdType is used to look up iptCmds in the markdown (can't use an int |
|
| 75 |
+// type, or a new string type, so it's just an alias). |
|
| 76 |
+type iptCmdType = string |
|
| 77 |
+ |
|
| 78 |
+const ( |
|
| 79 |
+ iptCmdLFilter4 iptCmdType = "LFilter4" |
|
| 80 |
+ iptCmdSFilter4 iptCmdType = "SFilter4" |
|
| 81 |
+ iptCmdSFilterForward4 iptCmdType = "SFilterForward4" |
|
| 82 |
+ iptCmdSFilterDocker4 iptCmdType = "SFilterDocker4" |
|
| 83 |
+ iptCmdLNat4 iptCmdType = "LNat4" |
|
| 84 |
+ iptCmdSNat4 iptCmdType = "SNat4" |
|
| 85 |
+) |
|
| 86 |
+ |
|
| 87 |
+var iptCmds = map[iptCmdType][]string{
|
|
| 88 |
+ iptCmdLFilter4: {"iptables", "-nvL", "--line-numbers", "-t", "filter"},
|
|
| 89 |
+ iptCmdSFilter4: {"iptables", "-S", "-t", "filter"},
|
|
| 90 |
+ iptCmdSFilterForward4: {"iptables", "-S", "FORWARD"},
|
|
| 91 |
+ iptCmdSFilterDocker4: {"iptables", "-S", "DOCKER"},
|
|
| 92 |
+ iptCmdLNat4: {"iptables", "-nvL", "--line-numbers", "-t", "nat"},
|
|
| 93 |
+ iptCmdSNat4: {"iptables", "-S", "-t", "nat"},
|
|
| 94 |
+} |
|
| 95 |
+ |
|
| 96 |
+func TestBridgeIptablesDoc(t *testing.T) {
|
|
| 97 |
+ skip.If(t, testEnv.IsRootless) |
|
| 98 |
+ ctx := setupTest(t) |
|
| 99 |
+ |
|
| 100 |
+ // Get the full path for "bundles/TestBridgeIptablesDoc". |
|
| 101 |
+ dest := os.Getenv("DOCKER_INTEGRATION_DAEMON_DEST")
|
|
| 102 |
+ if dest == "" {
|
|
| 103 |
+ dest = os.Getenv("DEST")
|
|
| 104 |
+ } |
|
| 105 |
+ dest = filepath.Join(dest, t.Name()) |
|
| 106 |
+ |
|
| 107 |
+ // Set up an L3Segment, which will have a netns for each "section". |
|
| 108 |
+ addr4 := netip.MustParseAddr("192.168.124.1")
|
|
| 109 |
+ addr6 := netip.MustParseAddr("fdc0:36dc:a4dd::1")
|
|
| 110 |
+ l3 := networking.NewL3Segment(t, "gen-iptables-doc", |
|
| 111 |
+ netip.PrefixFrom(addr4, 24), |
|
| 112 |
+ netip.PrefixFrom(addr6, 64), |
|
| 113 |
+ ) |
|
| 114 |
+ t.Cleanup(func() { l3.Destroy(t) })
|
|
| 115 |
+ |
|
| 116 |
+ for i, sec := range index {
|
|
| 117 |
+ // Create a netns for this section. |
|
| 118 |
+ addr4 = addr4.Next() |
|
| 119 |
+ addr6 = addr6.Next() |
|
| 120 |
+ hostname := fmt.Sprintf("docker%d", i)
|
|
| 121 |
+ l3.AddHost(t, hostname, hostname+"-host", "eth0", |
|
| 122 |
+ netip.PrefixFrom(addr4, 24), |
|
| 123 |
+ netip.PrefixFrom(addr6, 64), |
|
| 124 |
+ ) |
|
| 125 |
+ host := l3.Hosts[hostname] |
|
| 126 |
+ // Stop the interface, to reduce the chances of stray packets getting counted by iptables. |
|
| 127 |
+ host.Run(t, "ip", "link", "set", "eth0", "down") |
|
| 128 |
+ |
|
| 129 |
+ t.Run("gen_"+sec.name, func(t *testing.T) {
|
|
| 130 |
+ // t.Parallel() - doesn't speed things up, startup times just extend |
|
| 131 |
+ runTestNet(t, testutil.StartSpan(ctx, t), dest, sec, host) |
|
| 132 |
+ }) |
|
| 133 |
+ } |
|
| 134 |
+} |
|
| 135 |
+ |
|
| 136 |
+func runTestNet(t *testing.T, ctx context.Context, bundlesDir string, section section, host networking.Host) {
|
|
| 137 |
+ var dArgs []string |
|
| 138 |
+ if section.noUserlandProxy {
|
|
| 139 |
+ dArgs = append(dArgs, "--userland-proxy=false") |
|
| 140 |
+ } |
|
| 141 |
+ |
|
| 142 |
+ // Start the daemon in its own network namespace. |
|
| 143 |
+ var d *daemon.Daemon |
|
| 144 |
+ host.Do(t, func() {
|
|
| 145 |
+ // Run without OTEL because there's no routing from this netns for it - which |
|
| 146 |
+ // means the daemon doesn't shut down cleanly, causing the test to fail. |
|
| 147 |
+ d = daemon.New(t, daemon.WithEnvVars("OTEL_EXPORTER_OTLP_ENDPOINT="))
|
|
| 148 |
+ d.StartWithBusybox(ctx, t, dArgs...) |
|
| 149 |
+ t.Cleanup(func() { d.Stop(t) })
|
|
| 150 |
+ }) |
|
| 151 |
+ |
|
| 152 |
+ c := d.NewClientT(t) |
|
| 153 |
+ t.Cleanup(func() { c.Close() })
|
|
| 154 |
+ |
|
| 155 |
+ assert.Assert(t, len(section.networks) <= len(docNetworks), "Don't have enough container network addresses") |
|
| 156 |
+ for i, nw := range section.networks {
|
|
| 157 |
+ gwMode := nw.gwMode |
|
| 158 |
+ if gwMode == "" {
|
|
| 159 |
+ gwMode = "nat" |
|
| 160 |
+ } |
|
| 161 |
+ netOpts := []func(*networktypes.CreateOptions){
|
|
| 162 |
+ network.WithIPAM(docNetworks[i], docGateways[i]), |
|
| 163 |
+ network.WithOption(bridge.BridgeName, nw.bridge), |
|
| 164 |
+ network.WithOption(bridge.IPv4GatewayMode, gwMode), |
|
| 165 |
+ } |
|
| 166 |
+ if nw.noICC {
|
|
| 167 |
+ netOpts = append(netOpts, network.WithOption(bridge.EnableICC, "false")) |
|
| 168 |
+ } |
|
| 169 |
+ if nw.internal {
|
|
| 170 |
+ netOpts = append(netOpts, network.WithInternal()) |
|
| 171 |
+ } |
|
| 172 |
+ network.CreateNoError(ctx, t, c, nw.bridge, netOpts...) |
|
| 173 |
+ t.Cleanup(func() { network.RemoveNoError(ctx, t, c, nw.bridge) })
|
|
| 174 |
+ |
|
| 175 |
+ for _, ctr := range nw.containers {
|
|
| 176 |
+ var exposedPorts []string |
|
| 177 |
+ for ep := range ctr.portMappings {
|
|
| 178 |
+ exposedPorts = append(exposedPorts, ep.Port()+"/"+ep.Proto()) |
|
| 179 |
+ } |
|
| 180 |
+ id := container.Run(ctx, t, c, |
|
| 181 |
+ container.WithNetworkMode(nw.bridge), |
|
| 182 |
+ container.WithExposedPorts(exposedPorts...), |
|
| 183 |
+ container.WithPortMap(ctr.portMappings), |
|
| 184 |
+ ) |
|
| 185 |
+ t.Cleanup(func() {
|
|
| 186 |
+ c.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
|
|
| 187 |
+ }) |
|
| 188 |
+ } |
|
| 189 |
+ } |
|
| 190 |
+ |
|
| 191 |
+ iptablesOutput := runIptables(t, host) |
|
| 192 |
+ generated := generate(t, section.name, iptablesOutput) |
|
| 193 |
+ |
|
| 194 |
+ // Write the output to the 'bundles' directory for easy reference. |
|
| 195 |
+ outFile := filepath.Join(bundlesDir, section.name) |
|
| 196 |
+ err := os.WriteFile(outFile, []byte(generated), 0o644) |
|
| 197 |
+ assert.NilError(t, err) |
|
| 198 |
+ t.Log("Wrote", outFile)
|
|
| 199 |
+ |
|
| 200 |
+ // Compare against "golden" results. |
|
| 201 |
+ // Use full path so that the directory containing generated docs doesn't |
|
| 202 |
+ // have to be called 'testdata'. |
|
| 203 |
+ wd, err := os.Getwd() |
|
| 204 |
+ assert.NilError(t, err) |
|
| 205 |
+ golden.Assert(t, generated, filepath.Join(wd, "generated", section.name)) |
|
| 206 |
+} |
|
| 207 |
+ |
|
| 208 |
+var rePacketByteCounts = regexp.MustCompile(`\d+ packets, \d+ bytes`) |
|
| 209 |
+ |
|
| 210 |
+func runIptables(t *testing.T, host networking.Host) map[iptCmdType]string {
|
|
| 211 |
+ host.Run(t, "iptables", "-Z") |
|
| 212 |
+ host.Run(t, "iptables", "-Z", "-t", "nat") |
|
| 213 |
+ res := map[iptCmdType]string{}
|
|
| 214 |
+ for k, cmd := range iptCmds {
|
|
| 215 |
+ d := host.Run(t, cmd[0], cmd[1:]...) |
|
| 216 |
+ // In CI, the OUTPUT chain sometimes sees a packet. Remove the counts. |
|
| 217 |
+ d = rePacketByteCounts.ReplaceAllString(d, "0 packets, 0 bytes") |
|
| 218 |
+ // Indent the result, so that it's treated as preformatted markdown. |
|
| 219 |
+ res[k] = strings.ReplaceAll(d, "\n", "\n ") |
|
| 220 |
+ } |
|
| 221 |
+ return res |
|
| 222 |
+} |
|
| 223 |
+ |
|
| 224 |
+func generate(t *testing.T, name string, data map[iptCmdType]string) string {
|
|
| 225 |
+ t.Helper() |
|
| 226 |
+ templ, err := template.New(name).ParseFiles(filepath.Join("templates", name))
|
|
| 227 |
+ assert.NilError(t, err) |
|
| 228 |
+ wr := strings.Builder{}
|
|
| 229 |
+ err = templ.ExecuteTemplate(&wr, name, data) |
|
| 230 |
+ assert.NilError(t, err) |
|
| 231 |
+ return wr.String() |
|
| 232 |
+} |
| 0 | 233 |
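
The normalisation step in `runIptables` above can be tried in isolation. This is a minimal sketch; the function name `normalize` is hypothetical, and the four-space indent is an assumption based on markdown's rule for indented code blocks.

```go
package main

import (
	"regexp"
	"strings"
)

// rePacketByteCounts matches the counters in a chain's policy header,
// for example "3 packets, 180 bytes".
var rePacketByteCounts = regexp.MustCompile(`\d+ packets, \d+ bytes`)

// normalize (hypothetical helper) zeroes the policy counters and indents
// every line after the first, so the text renders as preformatted markdown
// when the template itself indents the first line.
func normalize(out string) string {
	out = rePacketByteCounts.ReplaceAllString(out, "0 packets, 0 bytes")
	// Assumption: four spaces, the indent markdown needs for a code block.
	return strings.ReplaceAll(out, "\n", "\n    ")
}
```

The regular expression only touches chain policy headers. The per-rule `pkts` and `bytes` columns are left as captured, so they rely on the preceding `iptables -Z` calls and the downed interface to stay at zero.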
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,56 @@ |
| 0 |
+package iptablesdoc // import "github.com/docker/docker/integration/network/bridge/iptablesdoc" |
|
| 1 |
+ |
|
| 2 |
+import ( |
|
| 3 |
+ "context" |
|
| 4 |
+ "os" |
|
| 5 |
+ "testing" |
|
| 6 |
+ |
|
| 7 |
+ "github.com/docker/docker/testutil" |
|
| 8 |
+ "github.com/docker/docker/testutil/environment" |
|
| 9 |
+ "go.opentelemetry.io/otel" |
|
| 10 |
+ "go.opentelemetry.io/otel/codes" |
|
| 11 |
+) |
|
| 12 |
+ |
|
| 13 |
+var ( |
|
| 14 |
+ testEnv *environment.Execution |
|
| 15 |
+ baseContext context.Context |
|
| 16 |
+) |
|
| 17 |
+ |
|
| 18 |
+func TestMain(m *testing.M) {
|
|
| 19 |
+ shutdown := testutil.ConfigureTracing() |
|
| 20 |
+ ctx, span := otel.Tracer("").Start(context.Background(), "integration/network/bridge/iptablesdoc.TestMain")
|
|
| 21 |
+ baseContext = ctx |
|
| 22 |
+ |
|
| 23 |
+ var err error |
|
| 24 |
+ testEnv, err = environment.New(ctx) |
|
| 25 |
+ if err != nil {
|
|
| 26 |
+ span.SetStatus(codes.Error, err.Error()) |
|
| 27 |
+ span.End() |
|
| 28 |
+ shutdown(ctx) |
|
| 29 |
+ panic(err) |
|
| 30 |
+ } |
|
| 31 |
+ |
|
| 32 |
+ err = environment.EnsureFrozenImagesLinux(ctx, testEnv) |
|
| 33 |
+ if err != nil {
|
|
| 34 |
+ span.SetStatus(codes.Error, err.Error()) |
|
| 35 |
+ span.End() |
|
| 36 |
+ shutdown(ctx) |
|
| 37 |
+ panic(err) |
|
| 38 |
+ } |
|
| 39 |
+ |
|
| 40 |
+ testEnv.Print() |
|
| 41 |
+ code := m.Run() |
|
| 42 |
+ if code != 0 {
|
|
| 43 |
+ span.SetStatus(codes.Error, "m.Run() returned non-zero exit code") |
|
| 44 |
+ } |
|
| 45 |
+ span.End() |
|
| 46 |
+ shutdown(ctx) |
|
| 47 |
+ os.Exit(code) |
|
| 48 |
+} |
|
| 49 |
+ |
|
| 50 |
+func setupTest(t *testing.T) context.Context {
|
|
| 51 |
+ ctx := testutil.StartSpan(baseContext, t) |
|
| 52 |
+ environment.ProtectAll(ctx, t, testEnv) |
|
| 53 |
+ t.Cleanup(func() { testEnv.Clean(ctx, t) })
|
|
| 54 |
+ return ctx |
|
| 55 |
+} |
| 0 | 56 |
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,83 @@ |
| 0 |
+## iptables for a new Daemon |
|
| 1 |
+ |
|
| 2 |
+When the daemon starts, it creates custom chains, and rules for the |
|
| 3 |
+default bridge network. |
|
| 4 |
+ |
|
| 5 |
+Table `filter`: |
|
| 6 |
+ |
|
| 7 |
+ {{index . "LFilter4"}}
|
|
| 8 |
+ |
|
| 9 |
+<details> |
|
| 10 |
+<summary>iptables commands</summary> |
|
| 11 |
+ |
|
| 12 |
+ {{index . "SFilter4"}}
|
|
| 13 |
+ |
|
| 14 |
+</details> |
|
| 15 |
+ |
|
| 16 |
+The FORWARD chain's policy shown above is ACCEPT. However: |
|
| 17 |
+ |
|
| 18 |
+ - For IPv4, [setupIPForwarding][1] sets the POLICY to DROP if the sysctl |
|
| 19 |
+ net.ipv4.ip_forward was not set to '1', and the daemon set it itself. |
|
| 20 |
+ - For IPv6, the policy is always DROP. |
|
| 21 |
+ |
|
| 22 |
+[1]: https://github.com/moby/moby/blob/cff4f20c44a3a7c882ed73934dec6a77246c6323/libnetwork/drivers/bridge/setup_ip_forwarding.go#L44 |
|
| 23 |
+ |
|
| 24 |
+The FORWARD chain rules are numbered in the output above, they are: |
|
| 25 |
+ |
|
| 26 |
+ 1. Unconditional jump to DOCKER-USER. |
|
| 27 |
+ This is set up by libnetwork, in [setupUserChain][10]. |
|
| 28 |
+ Docker won't add rules to the DOCKER-USER chain, it's only for user-defined rules. |
|
| 29 |
+ It's (mostly) kept at the top of the chain by deleting it and re-creating after each |
|
| 30 |
+ new network is created, while traffic may be running for other networks. |
|
| 31 |
+ 2. Unconditional jump to DOCKER-ISOLATION-STAGE-1. |
|
| 32 |
+ Set up during network creation by [setupIPTables][11], which ensures it appears |
|
| 33 |
+ after the jump to DOCKER-USER (by deleting it and re-creating, while traffic |
|
| 34 |
+ may be running for other networks). |
|
| 35 |
+ 3. ACCEPT RELATED,ESTABLISHED packets into a specific bridge network. |
|
| 36 |
+ Allows responses to outgoing requests, and continuation of incoming requests, |
|
| 37 |
+ without needing to process any further rules. |
|
| 38 |
+ This rule is also added during network creation, but the code to do it |
|
| 39 |
+ is in libnetwork, [ProgramChain][12]. |
|
| 40 |
+ 4. Jump to DOCKER, for any packet destined for a bridge network. Added when |
|
| 41 |
+ the network is created, in [ProgramChain][13] ("filterChain" is the DOCKER chain).
|
|
| 42 |
+ The DOCKER chain implements per-port/protocol filtering for each container. |
|
| 43 |
+ 5. ACCEPT any packet leaving a network, also set up when the network is created, in |
|
| 44 |
+ [setupIPTablesInternal][14]. |
|
| 45 |
+ 6. ACCEPT packets flowing between containers within a network, because by default |
|
| 46 |
+ container isolation is disabled. Also set up when the network is created, in |
|
| 47 |
+ [setIcc][15]. |
|
| 48 |
+ |
|
| 49 |
+[10]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/firewall_linux.go#L50 |
|
| 50 |
+[11]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L201 |
|
| 51 |
+[12]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L270 |
|
| 52 |
+[13]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L251-L255 |
|
| 53 |
+[14]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L264 |
|
| 54 |
+[15]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L343 |
|
| 55 |
+ |
|
| 56 |
+_With ICC enabled 5 and 6 could be combined, to ACCEPT anything from the bridge. |
|
| 57 |
+But, when ICC is disabled, rule 6 is DROP, so it would need to be placed before |
|
| 58 |
+rule 5. Because the rules are generated in different places, that's a slightly |
|
| 59 |
+bigger change than it should be._ |
|
| 60 |
+ |
|
| 61 |
+The DOCKER chain is empty, because there are no containers with port mappings yet. |
|
| 62 |
+ |
|
| 63 |
+The DOCKER-ISOLATION chains implement inter-network isolation, all (unrelated) |
|
| 64 |
+packets are processed by these chains. The rules are inserted at the head of the |
|
| 65 |
+chain when a network is created, in [setINC][20]. |
|
| 66 |
+ - DOCKER-ISOLATION-STAGE-1 jumps to DOCKER-ISOLATION-STAGE-2 for any packet |
|
| 67 |
+ routed to a docker network that has not come from that docker network. |
|
| 68 |
+ - DOCKER-ISOLATION-STAGE-2 processes all packets leaving a bridge network, |
|
| 69 |
+ packets that are destined for any other network are dropped. |
|
| 70 |
+ |
|
| 71 |
+[20]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L369 |
|
| 72 |
+ |
|
| 73 |
+Table nat: |
|
| 74 |
+ |
|
| 75 |
+ {{index . "LNat4"}}
|
|
| 76 |
+ |
|
| 77 |
+<details> |
|
| 78 |
+<summary>iptables commands</summary> |
|
| 79 |
+ |
|
| 80 |
+ {{index . "SNat4"}}
|
|
| 81 |
+ |
|
| 82 |
+</details> |
| ... | ... |
@@ -16,7 +16,8 @@ import ( |
| 16 | 16 |
// host lives in the current network namespace (eg. where dockerd runs). |
| 17 | 17 |
const CurrentNetns = "" |
| 18 | 18 |
|
| 19 |
-func runCommand(t *testing.T, cmd string, args ...string) {
|
|
| 19 |
+func runCommand(t *testing.T, cmd string, args ...string) string {
|
|
| 20 |
+ t.Helper() |
|
| 20 | 21 |
t.Log(strings.Join(append([]string{cmd}, args...), " "))
|
| 21 | 22 |
|
| 22 | 23 |
var b bytes.Buffer |
| ... | ... |
@@ -28,6 +29,7 @@ func runCommand(t *testing.T, cmd string, args ...string) {
|
| 28 | 28 |
t.Log(b.String()) |
| 29 | 29 |
t.Fatalf("Error: %v", err)
|
| 30 | 30 |
} |
| 31 |
+ return b.String() |
|
| 31 | 32 |
} |
| 32 | 33 |
|
| 33 | 34 |
// L3Segment simulates a switched, dual-stack capable network that |
| ... | ... |
@@ -113,15 +115,16 @@ func newHost(t *testing.T, nsName, ifname string) Host {
|
| 113 | 113 |
} |
| 114 | 114 |
} |
| 115 | 115 |
|
| 116 |
-// Run executes the provided command in the host's network namespace. |
|
| 117 |
-func (h Host) Run(t *testing.T, cmd string, args ...string) {
|
|
| 116 |
+// Run executes the provided command in the host's network namespace |
|
| 117 |
+// and returns its combined stdout/stderr. |
|
| 118 |
+func (h Host) Run(t *testing.T, cmd string, args ...string) string {
|
|
| 118 | 119 |
t.Helper() |
| 119 | 120 |
|
| 120 | 121 |
if h.ns != CurrentNetns {
|
| 121 | 122 |
args = append([]string{"netns", "exec", h.ns, cmd}, args...)
|
| 122 | 123 |
cmd = "ip" |
| 123 | 124 |
} |
| 124 |
- runCommand(t, cmd, args...) |
|
| 125 |
+ return runCommand(t, cmd, args...) |
|
| 125 | 126 |
} |
| 126 | 127 |
|
| 127 | 128 |
// Do run the provided function in the host's network namespace. |