create a troubleshooting guide

Ben Parees authored on 2014/11/25 09:05:57
new file mode 100644
@@ -0,0 +1,104 @@
+Troubleshooting
+=================
+
+This document contains some tips and suggestions for troubleshooting an OpenShift v3 deployment.
+
+System Environment
+------------------
+
+1. Run as root
+
+    Currently OpenShift v3 must be started as root in order to manipulate your iptables configuration.  The openshift commands (e.g. `openshift kube apply` and `openshift kubectl apply`) do not need to be run as root.
+
+1. Properly configure or disable firewalld
+
+    On Fedora or other distributions using firewalld, add docker0 to the trusted zone:
+
+        $ firewall-cmd --zone=trusted --change-interface=docker0
+        $ systemctl restart firewalld
+
+    Alternatively, you can stop firewalld via:
+
+        $ systemctl stop firewalld
+
+1. Disable SELinux
+
+    Eventually this will not be necessary, but we are currently focused on features and will be revisiting SELinux policies in the future.  The following switches SELinux to permissive mode until the next reboot:
+
+        $ setenforce 0
+
+
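The three environment items above can be checked in one pass before starting OpenShift. The following is a minimal sketch, not part of OpenShift itself: the function name `preflight` is hypothetical, and it only relies on the standard `id`, `firewall-cmd --get-zone-of-interface` and `getenforce` tools. It reports what it finds and changes nothing.

```shell
# Hypothetical helper: report on root, the firewalld zone of docker0, and SELinux mode.
preflight() {
    # 1. OpenShift v3 must be started as root to manipulate iptables
    if [ "$(id -u)" -eq 0 ]; then
        echo "root: ok"
    else
        echo "root: not running as root"
    fi

    # 2. Which firewalld zone is docker0 in (only if firewalld tooling is installed)?
    if command -v firewall-cmd >/dev/null 2>&1; then
        echo "docker0 zone: $(firewall-cmd --get-zone-of-interface=docker0 2>&1)"
    else
        echo "docker0 zone: firewall-cmd not found"
    fi

    # 3. SELinux mode: Enforcing, Permissive or Disabled
    if command -v getenforce >/dev/null 2>&1; then
        echo "selinux: $(getenforce)"
    else
        echo "selinux: getenforce not found"
    fi
}
```

Run `preflight` from the shell you intend to start OpenShift from; anything other than `root: ok`, a `trusted` zone (or no firewalld) and a non-enforcing SELinux mode points back at the list above.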
+Build Failures
+--------------
+
+To investigate a build failure, first check the build logs.  You can view the build logs via:
+
+    $ openshift kube buildLogs --id=[buildid]
+
+and you can get the build id via:
+
+    $ openshift kube list builds
+
+The build id is in the first column.
+
+If you're unable to retrieve the logs in this way, you can also get them directly from docker.  First you need to find the docker container that ran your build:
+
+    $ docker ps -a | grep builder
+
+The most recent container in that list (docker lists the newest first) should be the one that ran your build.  The container id is the first column.  You can then run:
+
+    $ docker logs [container id]
+
+Hopefully the logs will provide some indication of what failed (e.g. failure to find the source repository, an actual build issue, failure to push the resulting image to the docker registry, etc.).
+
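The docker fallback above can be collapsed into a single step. This is a sketch under the same assumption the text makes, namely that the first line of `docker ps -a` matching `builder` is the container that ran your build; the function name `latest_builder_logs` is hypothetical.

```shell
# Hypothetical helper: print the logs of the most recent "builder" container.
latest_builder_logs() {
    # docker ps -a lists newest containers first; column 1 is the container id
    builder_id=$(docker ps -a | grep builder | head -n 1 | awk '{print $1}')
    if [ -z "$builder_id" ]; then
        echo "no builder container found" >&2
        return 1
    fi
    docker logs "$builder_id"
}
```

If several builds ran recently, check the `docker ps -a | grep builder` output by hand instead, since the newest builder container may not be the one you care about.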
+Docker Registry
+---------------
+
+Most of the v3 flows today assume you are running a docker registry pod.  You should ensure that this local registry is running:
+
+    $ openshift kube list services | grep registry
+
+If it's not running, you can launch it via:
+
+    $ openshift kube apply -c examples/sample-app/docker-registry-config.json
+
+In addition, confirm that the IP and port reported in the services list match the registry references in your configuration JSON (e.g. any image tags that contain a registry hostname).
+
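The check and the launch step can be combined so the registry is only created when it is missing. A minimal sketch using only the two commands shown above; the function name `ensure_registry` is hypothetical, and the relative path assumes you run it from the directory that contains `examples/`.

```shell
# Hypothetical helper: launch the sample registry only if no registry service is listed.
ensure_registry() {
    if openshift kube list services | grep -q registry; then
        echo "registry service found"
    else
        echo "registry service not found, launching it"
        # Path is relative to the current directory (assumed to contain examples/)
        openshift kube apply -c examples/sample-app/docker-registry-config.json
    fi
}
```

This only confirms that a service whose line contains `registry` exists; comparing its IP and port against your configuration JSON is still a manual step.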
+Probing Containers
+------------------
+
+In general you may want to investigate a particular container.  You can either gather the logs from a container via `docker logs [container id]` or use `docker exec -it [container id] /bin/sh` to enter the container's namespace and poke around.
+
+
+Benign Errors/Messages
+----------------------
+
+There are a number of suspicious-looking messages that appear in the openshift log output which can normally be ignored:
+
+1. Failed to find an IP for pod (benign as long as it does not continuously repeat)
+
+        E1125 14:51:49.665095 04523 endpoints_controller.go:74] Failed to find an IP for pod: {{ } {7e5769d2-74dc-11e4-bc62-3c970e3bf0b7 default /api/v1beta1/pods/7e5769d2-74dc-11e4-bc62-3c970e3bf0b7  41 2014-11-25 14:51:48 -0500 EST map[template:ruby-helloworld-sample deployment:database-1 deploymentconfig:database name:database] map[]} {{v1beta1 7e5769d2-74dc-11e4-bc62-3c970e3bf0b7 7e5769d2-74dc-11e4-bc62-3c970e3bf0b7 [] [{ruby-helloworld-database mysql []  [{ 0 3306 TCP }] [{MYSQL_ROOT_PASSWORD rrKAcyW6} {MYSQL_DATABASE root}] 0 0 [] <nil> <nil>  false }] {0x1654910 <nil> <nil>}} Running localhost.localdomain   map[]} {{   [] [] {<nil> <nil> <nil>}} Pending localhost.localdomain   map[]} map[]}
+
+1. Proxy connection reset
+
+        E1125 14:52:36.605423 04523 proxier.go:131] I/O error: read tcp 10.192.208.170:57472: connection reset by peer
+
+1. No network settings
+
+        W1125 14:53:10.035539 04523 rest.go:231] No network settings: api.ContainerStatus{State:api.ContainerState{Waiting:(*api.ContainerStateWaiting)(0xc208b29b40), Running:(*api.ContainerStateRunning)(nil), Termination:(*api.ContainerStateTerminated)(nil)}, RestartCount:0, PodIP:"", Image:"kubernetes/pause:latest"}
+
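Because the first message is only benign when it does not keep repeating, it can help to count how often each of the messages above appears in a saved log. A rough sketch, assuming the openshift output was redirected to a file; the function name `benign_counts` is hypothetical and the patterns are taken from the sample lines above.

```shell
# Hypothetical helper: count occurrences of the known-benign messages in a log file.
benign_counts() {
    log_file=$1
    for pattern in 'Failed to find an IP for pod' \
                   'connection reset by peer' \
                   'No network settings'; do
        # grep -c prints the number of matching lines; it exits non-zero on 0 matches
        count=$(grep -c -- "$pattern" "$log_file" || true)
        echo "$pattern: $count"
    done
}
```

For example, `benign_counts /tmp/openshift.log` run twice a minute apart shows whether the "Failed to find an IP for pod" count is still climbing.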
+Must Gather
+-----------
+
+If you find yourself still stuck, before seeking help in #openshift on freenode.net, please recreate your issue with verbose logging and gather the following:
+
+1. OpenShift logs at level 4 (verbose logging):
+
+        $ openshift start --loglevel=4 &> /tmp/openshift.log
+
+1. Container logs
+
+    The following bit of scripting will pull logs for **all** containers that have been run on your system.  This might be excessive if you don't keep a clean history, so consider manually grabbing logs for the relevant containers instead.  Set `LOG_DIR` to any writable directory first:
+
+        LOG_DIR=/tmp/container-logs   # example location only; any writable directory works
+        mkdir -p "$LOG_DIR"
+        for container in $(docker ps -aq); do
+            docker logs "$container" > "$LOG_DIR/container-$container.log" 2>&1
+        done
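
If you would rather grab logs for only the relevant containers, as suggested above, the same loop can take an explicit list of container ids. A minimal sketch; the function name `gather_container_logs` and the output directory are hypothetical choices, not anything OpenShift prescribes.

```shell
# Hypothetical helper: save logs for the given container ids into a directory.
# Usage: gather_container_logs <output dir> <container id>...
gather_container_logs() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    for container in "$@"; do
        # Capture both stdout and stderr of the container's log stream
        docker logs "$container" > "$out_dir/container-$container.log" 2>&1
    done
}
```

For example, `gather_container_logs /tmp/container-logs [container id] [container id]` writes one `container-[container id].log` file per id, which keeps what you share focused on the failing build or pod.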