Generate iptables documentation

In an integration test - run a daemon, capture iptables, and feed them
to a markdown text/template describing them.

Prep for repeating that, for different network configurations.

Fail the test if the generated markdown differs from a "golden" version.

(So, at least the golden markdown will need to be updated if the
iptables rules are deliberately changed - hopefully the corresponding
description in the template will also be updated.)

Signed-off-by: Rob Murray <rob.murray@docker.com>

Rob Murray authored on 2024/10/11 02:39:40
Showing 6 changed files
1 1
new file mode 100644
... ...
@@ -0,0 +1,159 @@
0
+## iptables for a new Daemon
1
+
2
+When the daemon starts, it creates custom chains, and rules for the
3
+default bridge network.
4
+
5
+Table `filter`:
6
+
7
+    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
8
+    num   pkts bytes target     prot opt in     out     source               destination         
9
+    
10
+    Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
11
+    num   pkts bytes target     prot opt in     out     source               destination         
12
+    1        0     0 DOCKER-USER  0    --  *      *       0.0.0.0/0            0.0.0.0/0           
13
+    2        0     0 DOCKER-ISOLATION-STAGE-1  0    --  *      *       0.0.0.0/0            0.0.0.0/0           
14
+    3        0     0 ACCEPT     0    --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
15
+    4        0     0 DOCKER     0    --  *      docker0  0.0.0.0/0            0.0.0.0/0           
16
+    5        0     0 ACCEPT     0    --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
17
+    6        0     0 ACCEPT     0    --  docker0 docker0  0.0.0.0/0            0.0.0.0/0           
18
+    
19
+    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
20
+    num   pkts bytes target     prot opt in     out     source               destination         
21
+    
22
+    Chain DOCKER (1 references)
23
+    num   pkts bytes target     prot opt in     out     source               destination         
24
+    
25
+    Chain DOCKER-ISOLATION-STAGE-1 (1 references)
26
+    num   pkts bytes target     prot opt in     out     source               destination         
27
+    1        0     0 DOCKER-ISOLATION-STAGE-2  0    --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0           
28
+    2        0     0 RETURN     0    --  *      *       0.0.0.0/0            0.0.0.0/0           
29
+    
30
+    Chain DOCKER-ISOLATION-STAGE-2 (1 references)
31
+    num   pkts bytes target     prot opt in     out     source               destination         
32
+    1        0     0 DROP       0    --  *      docker0  0.0.0.0/0            0.0.0.0/0           
33
+    2        0     0 RETURN     0    --  *      *       0.0.0.0/0            0.0.0.0/0           
34
+    
35
+    Chain DOCKER-USER (1 references)
36
+    num   pkts bytes target     prot opt in     out     source               destination         
37
+    1        0     0 RETURN     0    --  *      *       0.0.0.0/0            0.0.0.0/0           
38
+    
39
+
40
+<details>
41
+<summary>iptables commands</summary>
42
+
43
+    -P INPUT ACCEPT
44
+    -P FORWARD ACCEPT
45
+    -P OUTPUT ACCEPT
46
+    -N DOCKER
47
+    -N DOCKER-ISOLATION-STAGE-1
48
+    -N DOCKER-ISOLATION-STAGE-2
49
+    -N DOCKER-USER
50
+    -A FORWARD -j DOCKER-USER
51
+    -A FORWARD -j DOCKER-ISOLATION-STAGE-1
52
+    -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
53
+    -A FORWARD -o docker0 -j DOCKER
54
+    -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
55
+    -A FORWARD -i docker0 -o docker0 -j ACCEPT
56
+    -A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
57
+    -A DOCKER-ISOLATION-STAGE-1 -j RETURN
58
+    -A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
59
+    -A DOCKER-ISOLATION-STAGE-2 -j RETURN
60
+    -A DOCKER-USER -j RETURN
61
+    
62
+
63
+</details>
64
+
65
+The FORWARD chain's policy shown above is ACCEPT. However:
66
+
67
+   - For IPv4, [setupIPForwarding][1] sets the POLICY to DROP if the sysctl
68
+     net.ipv4.ip_forward was not set to '1', and the daemon set it itself.
69
+   - For IPv6, the policy is always DROP.
70
+
71
+[1]: https://github.com/moby/moby/blob/cff4f20c44a3a7c882ed73934dec6a77246c6323/libnetwork/drivers/bridge/setup_ip_forwarding.go#L44
72
+
73
+The FORWARD chain rules are numbered in the output above, they are:
74
+
75
+  1. Unconditional jump to DOCKER-USER.
76
+     This is set up by libnetwork, in [setupUserChain][10].
77
+     Docker won't add rules to the DOCKER-USER chain, it's only for user-defined rules.
78
+     It's (mostly) kept at the top of the chain by deleting it and re-creating after each
79
+     new network is created, while traffic may be running for other networks.
80
+  2. Unconditional jump to DOCKER-ISOLATION-STAGE-1.
81
+     Set up during network creation by [setupIPTables][11], which ensures it appears
82
+     after the jump to DOCKER-USER (by deleting it and re-creating, while traffic
83
+     may be running for other networks).
84
+  3. ACCEPT RELATED,ESTABLISHED packets into a specific bridge network.
85
+     Allows responses to outgoing requests, and continuation of incoming requests,
86
+     without needing to process any further rules.
87
+     This rule is also added during network creation, but the code to do it
88
+     is in libnetwork, [ProgramChain][12].
89
+  4. Jump to DOCKER, for any packet destined for a bridge network. Added when
90
+     the network is created, in [ProgramChain][13] ("filterChain" is the DOCKER chain).
91
+     The DOCKER chain implements per-port/protocol filtering for each container.
92
+  5. ACCEPT any packet leaving a network, also set up when the network is created, in
93
+     [setupIPTablesInternal][14].
94
+  6. ACCEPT packets flowing between containers within a network, because by default
95
+     container isolation is disabled. Also set up when the network is created, in
96
+     [setIcc][15].
97
+
98
+[10]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/firewall_linux.go#L50
99
+[11]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L201
100
+[12]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L270
101
+[13]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L251-L255
102
+[14]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L264
103
+[15]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L343
104
+
105
+_With ICC enabled 5 and 6 could be combined, to ACCEPT anything from the bridge.
106
+But, when ICC is disabled, rule 6 is DROP, so it would need to be placed before
107
+rule 5. Because the rules are generated in different places, that's a slightly
108
+bigger change than it should be._
109
+
110
+The DOCKER chain is empty, because there are no containers with port mappings yet.
111
+
112
+The DOCKER-ISOLATION chains implement inter-network isolation, all (unrelated)
113
+packets are processed by these chains. The rules are inserted at the head of the
114
+chain when a network is created, in [setINC][20].
115
+  - DOCKER-ISOLATION-STAGE-1 jumps to DOCKER-ISOLATION-STAGE-2 for any packet
116
+    routed to a docker network that has not come from that docker network.
117
+  - DOCKER-ISOLATION-STAGE-2 processes all packets leaving a bridge network,
118
+    packets that are destined for any other network are dropped.
119
+
120
+[20]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L369
121
+
122
+Table `nat`:
123
+
124
+    Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
125
+    num   pkts bytes target     prot opt in     out     source               destination         
126
+    1        0     0 DOCKER     0    --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
127
+    
128
+    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
129
+    num   pkts bytes target     prot opt in     out     source               destination         
130
+    
131
+    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
132
+    num   pkts bytes target     prot opt in     out     source               destination         
133
+    1        0     0 DOCKER     0    --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL
134
+    
135
+    Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
136
+    num   pkts bytes target     prot opt in     out     source               destination         
137
+    1        0     0 MASQUERADE  0    --  *      !docker0  172.17.0.0/16        0.0.0.0/0           
138
+    
139
+    Chain DOCKER (2 references)
140
+    num   pkts bytes target     prot opt in     out     source               destination         
141
+    1        0     0 RETURN     0    --  docker0 *       0.0.0.0/0            0.0.0.0/0           
142
+    
143
+
144
+<details>
145
+<summary>iptables commands</summary>
146
+
147
+    -P PREROUTING ACCEPT
148
+    -P INPUT ACCEPT
149
+    -P OUTPUT ACCEPT
150
+    -P POSTROUTING ACCEPT
151
+    -N DOCKER
152
+    -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
153
+    -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
154
+    -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
155
+    -A DOCKER -i docker0 -j RETURN
156
+    
157
+
158
+</details>
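
The note on FORWARD rules 5 and 6 in the document above rests on iptables evaluating rules in order and stopping at the first terminating match. A minimal sketch of that ordering argument follows, using a deliberately simplified model: the `packet`, `rule`, and `verdict` names are hypothetical, only the input and output interfaces are modelled, and the DROP rule's match is assumed to be the same as rule 6's with a different target.

```go
package main

import "fmt"

// packet records only the interfaces a forwarded packet enters and leaves by.
type packet struct{ in, out string }

// rule is a simplified filter rule with a terminating target.
type rule struct {
	matches func(p packet) bool
	target  string // "ACCEPT" or "DROP"
}

var (
	// Combined form of rules 5 and 6: accept anything from the bridge.
	acceptFromBridge = rule{
		matches: func(p packet) bool { return p.in == "docker0" },
		target:  "ACCEPT",
	}
	// Assumed shape of rule 6 when ICC is disabled: drop bridge-to-bridge traffic.
	dropICC = rule{
		matches: func(p packet) bool { return p.in == "docker0" && p.out == "docker0" },
		target:  "DROP",
	}
)

// verdict returns the target of the first matching rule, or the chain policy.
func verdict(rules []rule, p packet, policy string) string {
	for _, r := range rules {
		if r.matches(p) {
			return r.target
		}
	}
	return policy
}

func main() {
	p := packet{in: "docker0", out: "docker0"} // container-to-container traffic

	// ACCEPT first: the DROP rule is never reached, so ICC is not enforced.
	fmt.Println(verdict([]rule{acceptFromBridge, dropICC}, p, "ACCEPT"))
	// DROP first: inter-container traffic is blocked as intended.
	fmt.Println(verdict([]rule{dropICC, acceptFromBridge}, p, "ACCEPT"))
}
```

In this model the first ordering accepts the packet and the second drops it, which is why a combined ACCEPT rule would have to sit after the ICC DROP rule.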
0 159
new file mode 100644
... ...
@@ -0,0 +1,41 @@
0
+# Docker Engine's use of iptables
1
+
2
+> [!WARNING]
3
+> This is intended for development use - the structure of docker's iptables
4
+> (and ip6tables) rules will change between releases, it is not a stable
5
+> interface.
6
+
7
+> [!NOTE]
8
+> This document is generated by `TestBridgeIptablesDoc` by running a
9
+> daemon, creating networks and containers, and capturing iptables.
10
+> The iptables are then merged with a text/template for each section.
11
+> The resulting document is diffed against one in the repo, so the
12
+> test will fail if there are differences in the generated rules (but
13
+> changes in the templates may go unnoticed).
14
+>
15
+> Links to code are permalinks - they will be out of date, and may not
16
+> point to the master branch. But, it's difficult to work out where
17
+> some of the rules come from, the links are intended as hints.
18
+
19
+ip6tables rules follow the same pattern as iptables rules. So, only the
20
+IPv4 rules are shown here.
21
+
22
+The bridge driver deletes its custom chains during its initialisation, in
23
+[configure][100]. Rules are then re-created as networks are restored. However,
24
+the filter-FORWARD chain is not cleared. The order in which networks are
25
+re-created is not the order in which they were originally created. So,
26
+rules may be arranged differently following a daemon restart.
27
+
28
+When firewalld is running, if it's reloaded, iptables rules are cleared.
29
+The daemon registers handlers for its reload event (received via dbus)
30
+to reconstruct the rules.
31
+
32
+The filter-INPUT chain is not used by Docker. Packets arriving from the host's
33
+physical network or the host itself hit the filter-FORWARD chain, as they are
34
+routed into the bridge network. Similarly, filter-OUTPUT is not used.
35
+
36
+[100]: https://github.com/moby/moby/blob/fe09cab7fe04c3911417061f7c7ef60a8acc6bf3/libnetwork/drivers/bridge/bridge_linux.go#L508
37
+
38
+Scenarios:
39
+
40
+  - [New daemon](generated/new-daemon.md)
0 41
new file mode 100644
... ...
@@ -0,0 +1,233 @@
0
+// Package iptablesdoc runs docker, creates networks, runs containers and
1
+// captures iptables output for various configurations.
2
+//
3
+// The iptables output is then used with a markdown text/template from the
4
+// "templates" directory for each configuration (for each "section" in "index"),
5
+// to generate a markdown document for each section.
6
+//
7
+// The newly generated documents are placed in:
8
+//
9
+//	bundles/test-integration/TestBridgeIptablesDoc/<section name>
10
+//
11
+// If the generated doc differs from the "golden" reference in "generated/",
12
+// the test fails. When that happens:
13
+//
14
+//   - check the iptables rules changes in the diff
15
+//   - update the description in the corresponding file in "templates/"
16
+//   - re-run with TESTFLAGS='-update' to update the reference docs
17
+package iptablesdoc
18
+
19
+import (
20
+	"context"
21
+	"fmt"
22
+	"net/netip"
23
+	"os"
24
+	"path/filepath"
25
+	"regexp"
26
+	"strings"
27
+	"testing"
28
+	"text/template"
29
+
30
+	containertypes "github.com/docker/docker/api/types/container"
31
+	networktypes "github.com/docker/docker/api/types/network"
32
+	"github.com/docker/docker/integration/internal/container"
33
+	"github.com/docker/docker/integration/internal/network"
34
+	"github.com/docker/docker/internal/testutils/networking"
35
+	"github.com/docker/docker/libnetwork/drivers/bridge"
36
+	"github.com/docker/docker/testutil"
37
+	"github.com/docker/docker/testutil/daemon"
38
+	"github.com/docker/go-connections/nat"
39
+	"gotest.tools/v3/assert"
40
+	"gotest.tools/v3/golden"
41
+	"gotest.tools/v3/skip"
42
+)
43
+
44
+var (
45
+	docNetworks = []string{"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"}
46
+	docGateways = []string{"192.0.2.1", "198.51.100.1", "203.0.113.1"}
47
+)
48
+
49
+type ctr struct {
50
+	name         string
51
+	portMappings nat.PortMap
52
+}
53
+
54
+type bridgeNetwork struct {
55
+	bridge     string
56
+	gwMode     string
57
+	noICC      bool
58
+	internal   bool
59
+	containers []ctr
60
+}
61
+
62
+type section struct {
63
+	name            string
64
+	noUserlandProxy bool
65
+	networks        []bridgeNetwork
66
+}
67
+
68
+var index = []section{
69
+	{
70
+		name: "new-daemon.md",
71
+	},
72
+}
73
+
74
+// iptCmdType is used to look up iptCmds in the markdown (can't use an int
75
+// type, or a new string type, so it's just an alias).
76
+type iptCmdType = string
77
+
78
+const (
79
+	iptCmdLFilter4        iptCmdType = "LFilter4"
80
+	iptCmdSFilter4        iptCmdType = "SFilter4"
81
+	iptCmdSFilterForward4 iptCmdType = "SFilterForward4"
82
+	iptCmdSFilterDocker4  iptCmdType = "SFilterDocker4"
83
+	iptCmdLNat4           iptCmdType = "LNat4"
84
+	iptCmdSNat4           iptCmdType = "SNat4"
85
+)
86
+
87
+var iptCmds = map[iptCmdType][]string{
88
+	iptCmdLFilter4:        {"iptables", "-nvL", "--line-numbers", "-t", "filter"},
89
+	iptCmdSFilter4:        {"iptables", "-S", "-t", "filter"},
90
+	iptCmdSFilterForward4: {"iptables", "-S", "FORWARD"},
91
+	iptCmdSFilterDocker4:  {"iptables", "-S", "DOCKER"},
92
+	iptCmdLNat4:           {"iptables", "-nvL", "--line-numbers", "-t", "nat"},
93
+	iptCmdSNat4:           {"iptables", "-S", "-t", "nat"},
94
+}
95
+
96
+func TestBridgeIptablesDoc(t *testing.T) {
97
+	skip.If(t, testEnv.IsRootless)
98
+	ctx := setupTest(t)
99
+
100
+	// Get the full path for "bundles/TestBridgeIptablesDoc".
101
+	dest := os.Getenv("DOCKER_INTEGRATION_DAEMON_DEST")
102
+	if dest == "" {
103
+		dest = os.Getenv("DEST")
104
+	}
105
+	dest = filepath.Join(dest, t.Name())
106
+
107
+	// Set up an L3Segment, which will have a netns for each "section".
108
+	addr4 := netip.MustParseAddr("192.168.124.1")
109
+	addr6 := netip.MustParseAddr("fdc0:36dc:a4dd::1")
110
+	l3 := networking.NewL3Segment(t, "gen-iptables-doc",
111
+		netip.PrefixFrom(addr4, 24),
112
+		netip.PrefixFrom(addr6, 64),
113
+	)
114
+	t.Cleanup(func() { l3.Destroy(t) })
115
+
116
+	for i, sec := range index {
117
+		// Create a netns for this section.
118
+		addr4 = addr4.Next()
119
+		addr6 = addr6.Next()
120
+		hostname := fmt.Sprintf("docker%d", i)
121
+		l3.AddHost(t, hostname, hostname+"-host", "eth0",
122
+			netip.PrefixFrom(addr4, 24),
123
+			netip.PrefixFrom(addr6, 64),
124
+		)
125
+		host := l3.Hosts[hostname]
126
+		// Stop the interface, to reduce the chances of stray packets getting counted by iptables.
127
+		host.Run(t, "ip", "link", "set", "eth0", "down")
128
+
129
+		t.Run("gen_"+sec.name, func(t *testing.T) {
130
+			// t.Parallel() - doesn't speed things up, startup times just extend
131
+			runTestNet(t, testutil.StartSpan(ctx, t), dest, sec, host)
132
+		})
133
+	}
134
+}
135
+
136
+func runTestNet(t *testing.T, ctx context.Context, bundlesDir string, section section, host networking.Host) {
137
+	var dArgs []string
138
+	if section.noUserlandProxy {
139
+		dArgs = append(dArgs, "--userland-proxy=false")
140
+	}
141
+
142
+	// Start the daemon in its own network namespace.
143
+	var d *daemon.Daemon
144
+	host.Do(t, func() {
145
+		// Run without OTEL because there's no routing from this netns for it - which
146
+		// means the daemon doesn't shut down cleanly, causing the test to fail.
147
+		d = daemon.New(t, daemon.WithEnvVars("OTEL_EXPORTER_OTLP_ENDPOINT="))
148
+		d.StartWithBusybox(ctx, t, dArgs...)
149
+		t.Cleanup(func() { d.Stop(t) })
150
+	})
151
+
152
+	c := d.NewClientT(t)
153
+	t.Cleanup(func() { c.Close() })
154
+
155
+	assert.Assert(t, len(section.networks) <= len(docNetworks), "Don't have enough container network addresses")
156
+	for i, nw := range section.networks {
157
+		gwMode := nw.gwMode
158
+		if gwMode == "" {
159
+			gwMode = "nat"
160
+		}
161
+		netOpts := []func(*networktypes.CreateOptions){
162
+			network.WithIPAM(docNetworks[i], docGateways[i]),
163
+			network.WithOption(bridge.BridgeName, nw.bridge),
164
+			network.WithOption(bridge.IPv4GatewayMode, gwMode),
165
+		}
166
+		if nw.noICC {
167
+			netOpts = append(netOpts, network.WithOption(bridge.EnableICC, "false"))
168
+		}
169
+		if nw.internal {
170
+			netOpts = append(netOpts, network.WithInternal())
171
+		}
172
+		network.CreateNoError(ctx, t, c, nw.bridge, netOpts...)
173
+		t.Cleanup(func() { network.RemoveNoError(ctx, t, c, nw.bridge) })
174
+
175
+		for _, ctr := range nw.containers {
176
+			var exposedPorts []string
177
+			for ep := range ctr.portMappings {
178
+				exposedPorts = append(exposedPorts, ep.Port()+"/"+ep.Proto())
179
+			}
180
+			id := container.Run(ctx, t, c,
181
+				container.WithNetworkMode(nw.bridge),
182
+				container.WithExposedPorts(exposedPorts...),
183
+				container.WithPortMap(ctr.portMappings),
184
+			)
185
+			t.Cleanup(func() {
186
+				c.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
187
+			})
188
+		}
189
+	}
190
+
191
+	iptablesOutput := runIptables(t, host)
192
+	generated := generate(t, section.name, iptablesOutput)
193
+
194
+	// Write the output to the 'bundles' directory for easy reference.
195
+	outFile := filepath.Join(bundlesDir, section.name)
196
+	err := os.WriteFile(outFile, []byte(generated), 0o644)
197
+	assert.NilError(t, err)
198
+	t.Log("Wrote ", outFile)
199
+
200
+	// Compare against "golden" results.
201
+	// Use full path so that the directory containing generated docs doesn't
202
+	// have to be called 'testdata'.
203
+	wd, err := os.Getwd()
204
+	assert.NilError(t, err)
205
+	golden.Assert(t, generated, filepath.Join(wd, "generated", section.name))
206
+}
207
+
208
+var rePacketByteCounts = regexp.MustCompile(`\d+ packets, \d+ bytes`)
209
+
210
+func runIptables(t *testing.T, host networking.Host) map[iptCmdType]string {
211
+	host.Run(t, "iptables", "-Z")
212
+	host.Run(t, "iptables", "-Z", "-t", "nat")
213
+	res := map[iptCmdType]string{}
214
+	for k, cmd := range iptCmds {
215
+		d := host.Run(t, cmd[0], cmd[1:]...)
216
+		// In CI, the OUTPUT chain sometimes sees a packet. Remove the counts.
217
+		d = rePacketByteCounts.ReplaceAllString(d, "0 packets, 0 bytes")
218
+		// Indent the result, so that it's treated as preformatted markdown.
219
+		res[k] = strings.ReplaceAll(d, "\n", "\n    ")
220
+	}
221
+	return res
222
+}
223
+
224
+func generate(t *testing.T, name string, data map[iptCmdType]string) string {
225
+	t.Helper()
226
+	templ, err := template.New(name).ParseFiles(filepath.Join("templates", name))
227
+	assert.NilError(t, err)
228
+	wr := strings.Builder{}
229
+	err = templ.ExecuteTemplate(&wr, name, data)
230
+	assert.NilError(t, err)
231
+	return wr.String()
232
+}
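
The post-processing in `runIptables` can be tried in isolation. The sketch below copies the regular expression and the two string operations from the test above; the `normalise` function name and the sample input are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var rePacketByteCounts = regexp.MustCompile(`\d+ packets, \d+ bytes`)

// normalise zeroes the chain policy counters, then indents every line after
// the first by four spaces so markdown treats the text as preformatted.
func normalise(out string) string {
	out = rePacketByteCounts.ReplaceAllString(out, "0 packets, 0 bytes")
	return strings.ReplaceAll(out, "\n", "\n    ")
}

func main() {
	// Hypothetical capture in which the OUTPUT chain has seen stray packets.
	raw := "Chain OUTPUT (policy ACCEPT 3 packets, 180 bytes)\nnum   pkts bytes target\n"
	fmt.Printf("%q\n", normalise(raw))
}
```

Two details are worth noticing. The first line is not indented here, because the template supplies its indentation. The trailing newline also becomes a line of four spaces, which is where the whitespace-only lines in the generated markdown come from.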
0 233
new file mode 100644
... ...
@@ -0,0 +1,56 @@
0
+package iptablesdoc // import "github.com/docker/docker/integration/network/bridge/iptablesdoc"
1
+
2
+import (
3
+	"context"
4
+	"os"
5
+	"testing"
6
+
7
+	"github.com/docker/docker/testutil"
8
+	"github.com/docker/docker/testutil/environment"
9
+	"go.opentelemetry.io/otel"
10
+	"go.opentelemetry.io/otel/codes"
11
+)
12
+
13
+var (
14
+	testEnv     *environment.Execution
15
+	baseContext context.Context
16
+)
17
+
18
+func TestMain(m *testing.M) {
19
+	shutdown := testutil.ConfigureTracing()
20
+	ctx, span := otel.Tracer("").Start(context.Background(), "integration/network/bridge/iptablesdoc.TestMain")
21
+	baseContext = ctx
22
+
23
+	var err error
24
+	testEnv, err = environment.New(ctx)
25
+	if err != nil {
26
+		span.SetStatus(codes.Error, err.Error())
27
+		span.End()
28
+		shutdown(ctx)
29
+		panic(err)
30
+	}
31
+
32
+	err = environment.EnsureFrozenImagesLinux(ctx, testEnv)
33
+	if err != nil {
34
+		span.SetStatus(codes.Error, err.Error())
35
+		span.End()
36
+		shutdown(ctx)
37
+		panic(err)
38
+	}
39
+
40
+	testEnv.Print()
41
+	code := m.Run()
42
+	if code != 0 {
43
+		span.SetStatus(codes.Error, "m.Run() returned non-zero exit code")
44
+	}
45
+	span.End()
46
+	shutdown(ctx)
47
+	os.Exit(code)
48
+}
49
+
50
+func setupTest(t *testing.T) context.Context {
51
+	ctx := testutil.StartSpan(baseContext, t)
52
+	environment.ProtectAll(ctx, t, testEnv)
53
+	t.Cleanup(func() { testEnv.Clean(ctx, t) })
54
+	return ctx
55
+}
0 56
new file mode 100644
... ...
@@ -0,0 +1,83 @@
0
+## iptables for a new Daemon
1
+
2
+When the daemon starts, it creates custom chains, and rules for the
3
+default bridge network.
4
+
5
+Table `filter`:
6
+
7
+    {{index . "LFilter4"}}
8
+
9
+<details>
10
+<summary>iptables commands</summary>
11
+
12
+    {{index . "SFilter4"}}
13
+
14
+</details>
15
+
16
+The FORWARD chain's policy shown above is ACCEPT. However:
17
+
18
+   - For IPv4, [setupIPForwarding][1] sets the POLICY to DROP if the sysctl
19
+     net.ipv4.ip_forward was not set to '1', and the daemon set it itself.
20
+   - For IPv6, the policy is always DROP.
21
+
22
+[1]: https://github.com/moby/moby/blob/cff4f20c44a3a7c882ed73934dec6a77246c6323/libnetwork/drivers/bridge/setup_ip_forwarding.go#L44
23
+
24
+The FORWARD chain rules are numbered in the output above, they are:
25
+
26
+  1. Unconditional jump to DOCKER-USER.
27
+     This is set up by libnetwork, in [setupUserChain][10].
28
+     Docker won't add rules to the DOCKER-USER chain, it's only for user-defined rules.
29
+     It's (mostly) kept at the top of the chain by deleting it and re-creating after each
30
+     new network is created, while traffic may be running for other networks.
31
+  2. Unconditional jump to DOCKER-ISOLATION-STAGE-1.
32
+     Set up during network creation by [setupIPTables][11], which ensures it appears
33
+     after the jump to DOCKER-USER (by deleting it and re-creating, while traffic
34
+     may be running for other networks).
35
+  3. ACCEPT RELATED,ESTABLISHED packets into a specific bridge network.
36
+     Allows responses to outgoing requests, and continuation of incoming requests,
37
+     without needing to process any further rules.
38
+     This rule is also added during network creation, but the code to do it
39
+     is in libnetwork, [ProgramChain][12].
40
+  4. Jump to DOCKER, for any packet destined for a bridge network. Added when
41
+     the network is created, in [ProgramChain][13] ("filterChain" is the DOCKER chain).
42
+     The DOCKER chain implements per-port/protocol filtering for each container.
43
+  5. ACCEPT any packet leaving a network, also set up when the network is created, in
44
+     [setupIPTablesInternal][14].
45
+  6. ACCEPT packets flowing between containers within a network, because by default
46
+     container isolation is disabled. Also set up when the network is created, in
47
+     [setIcc][15].
48
+
49
+[10]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/firewall_linux.go#L50
50
+[11]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L201
51
+[12]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L270
52
+[13]: https://github.com/moby/moby/blob/e05848c0025b67a16aaafa8cdff95d5e2c064105/libnetwork/iptables/iptables.go#L251-L255
53
+[14]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L264
54
+[15]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L343
55
+
56
+_With ICC enabled 5 and 6 could be combined, to ACCEPT anything from the bridge.
57
+But, when ICC is disabled, rule 6 is DROP, so it would need to be placed before
58
+rule 5. Because the rules are generated in different places, that's a slightly
59
+bigger change than it should be._
60
+
61
+The DOCKER chain is empty, because there are no containers with port mappings yet.
62
+
63
+The DOCKER-ISOLATION chains implement inter-network isolation, all (unrelated)
64
+packets are processed by these chains. The rules are inserted at the head of the
65
+chain when a network is created, in [setINC][20].
66
+  - DOCKER-ISOLATION-STAGE-1 jumps to DOCKER-ISOLATION-STAGE-2 for any packet
67
+    routed to a docker network that has not come from that docker network.
68
+  - DOCKER-ISOLATION-STAGE-2 processes all packets leaving a bridge network,
69
+    packets that are destined for any other network are dropped.
70
+
71
+[20]: https://github.com/moby/moby/blob/333cfa640239153477bf635a8131734d0e9d099d/libnetwork/drivers/bridge/setup_ip_tables_linux.go#L369
72
+
73
+Table `nat`:
74
+
75
+    {{index . "LNat4"}}
76
+
77
+<details>
78
+<summary>iptables commands</summary>
79
+
80
+    {{index . "SNat4"}}
81
+
82
+</details>
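
The `{{index . "..."}}` actions in this template look up captured command output in a map keyed by command name. A minimal sketch of that rendering step follows, assuming an inline template string instead of a file under `templates/`; the `render` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes a markdown template against captured command output,
// keyed by the names the template looks up with "index".
func render(tmplText string, data map[string]string) (string, error) {
	tmpl, err := template.New("section").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	const tmplText = "Table `filter`:\n\n    {{index . \"SFilter4\"}}\n"
	out, err := render(tmplText, map[string]string{
		// Continuation lines are already indented by the capture step.
		"SFilter4": "-P INPUT ACCEPT\n    -P FORWARD ACCEPT",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The template indents only the first line of each capture, so the data must arrive with its remaining lines already indented.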
... ...
@@ -16,7 +16,8 @@ import (
16 16
 // host lives in the current network namespace (eg. where dockerd runs).
17 17
 const CurrentNetns = ""
18 18
 
19
-func runCommand(t *testing.T, cmd string, args ...string) {
19
+func runCommand(t *testing.T, cmd string, args ...string) string {
20
+	t.Helper()
20 21
 	t.Log(strings.Join(append([]string{cmd}, args...), " "))
21 22
 
22 23
 	var b bytes.Buffer
... ...
@@ -28,6 +29,7 @@ func runCommand(t *testing.T, cmd string, args ...string) {
28 28
 		t.Log(b.String())
29 29
 		t.Fatalf("Error: %v", err)
30 30
 	}
31
+	return b.String()
31 32
 }
32 33
 
33 34
 // L3Segment simulates a switched, dual-stack capable network that
... ...
@@ -113,15 +115,16 @@ func newHost(t *testing.T, nsName, ifname string) Host {
113 113
 	}
114 114
 }
115 115
 
116
-// Run executes the provided command in the host's network namespace.
117
-func (h Host) Run(t *testing.T, cmd string, args ...string) {
116
+// Run executes the provided command in the host's network namespace
117
+// and returns its combined stdout/stderr.
118
+func (h Host) Run(t *testing.T, cmd string, args ...string) string {
118 119
 	t.Helper()
119 120
 
120 121
 	if h.ns != CurrentNetns {
121 122
 		args = append([]string{"netns", "exec", h.ns, cmd}, args...)
122 123
 		cmd = "ip"
123 124
 	}
124
-	runCommand(t, cmd, args...)
125
+	return runCommand(t, cmd, args...)
125 126
 }
126 127
 
127 128
 // Do run the provided function in the host's network namespace.