initial work to enable systemd service running

At the PTG it was discussed that the screen-based developer workflow
is no longer as useful as it once was. There are now too many
services to see them all on one screen, and one of the most common
restart scenarios is no longer restarting a single service but
restarting several at once for a code change to take effect.

This implements a third way of running services, in addition to
forking directly from bash or running under screen: running them as
systemd units.

The oslo.log format is adjusted to drop the timestamp, because
journald already records one for every entry.

Swift needed to have services launched by absolute path to work.

This is disabled by default, but with instructions on using it. The
long term intent is to make this the way to run devstack, which would
be the same between both the gate and local use.

run_process also needed changes so that the user to run as can be
passed in. At the same time, the keystone uwsgi launcher was hacked
around to remove a run_process feature (the subservice argument) that
only keystone uwsgi used.

Change-Id: I836bf27c4cfdc449628aa7641fb96a5489d5d4e7

Sean Dague authored on 2017/03/22 09:50:24
Showing 7 changed files
new file mode 100644
@@ -0,0 +1,177 @@
+===========================
+ Using Systemd in DevStack
+===========================
+
+.. note::
+
+   This is an in-progress document as we work out the way forward here
+   with DevStack and systemd.
+
+DevStack can run all of its services as systemd units. Systemd is
+now the default init system for nearly every Linux distro, and it
+encodes and solves many of the problems related to poorly running
+processes.
+
+Why this instead of screen?
+===========================
+
+The screen model for DevStack was invented when a DevStack user
+typically ran fewer than 10 services, which made jumping around with
+screen hot keys very easy. However, the landscape has changed: not all
+services are stoppable in screen because some run under Apache, and
+there are now typically at least 20 items.
+
+There is also a common developer workflow of changing code in more
+than one service, and needing to restart a bunch of services for that
+to take effect.
+
+To enable this, add the following to your local.conf::
+
+  USE_SYSTEMD=True
+
+
+
33
+Unit Structure
34
+==============
35
+
36
+.. note::
37
+
38
+   Originally we actually wanted to do this as user units, however
39
+   there are issues with running this under non interactive
40
+   shells. For now, we'll be running as system units. Some user unit
41
+   code is left in place in case we can switch back later.
42
+
43
+All DevStack user units are created as a part of the DevStack slice
44
+given the name ``devstack@$servicename.service``. This lets us do
45
+certain operations at the slice level.
46
+
47
+Manipulating Units
48
+==================
49
+
50
+Assuming the unit ``n-cpu`` to make the examples more clear.
51
+
52
+Enable a unit (allows it to be started)::
53
+
54
+  sudo systemctl enable devstack@n-cpu.service
55
+
56
+Disable a unit::
57
+
58
+  sudo systemctl disable devstack@n-cpu.service
59
+
60
+Start a unit::
61
+
62
+  sudo systemctl start devstack@n-cpu.service
63
+
64
+Stop a unit::
65
+
66
+  sudo systemctl stop devstack@n-cpu.service
67
+
68
+Restart a unit::
69
+
70
+  sudo systemctl restart devstack@n-cpu.service
71
+
72
+See status of a unit::
73
+
74
+  sudo systemctl status devstack@n-cpu.service
75
+
76
+
77
+Querying Logs
78
+=============
79
+
80
+One of the other major things that comes with systemd is journald, a
81
+consolidated way to access logs (including querying through structured
82
+metadata). This is accessed by the user via ``journalctl`` command.
83
+
84
+
85
+Logs can be accessed through ``journalctl``. journalctl has powerful
86
+query facilities. We'll start with some common options.
87
+
88
+Follow logs for a specific service::
89
+
90
+  journalctl -f --unit devstack@n-cpu.service
91
+
92
+Following logs for multiple services simultaneously::
93
+
94
+  journalctl -f --unit devstack@n-cpu.service --user-unit
95
+  devstack@n-cond.service
96
+
97
+Use higher precision time stamps::
98
+
99
+  journalctl -f -o short-precise --unit devstack@n-cpu.service
100
+
101
+
102
+Known Issues
103
+============
104
+
105
+Be careful about systemd python libraries. There are 3 of them on
106
+pypi, and they are all very different. They unfortunately all install
107
+into the ``systemd`` namespace, which can cause some issues.
108
+
109
+- ``systemd-python`` - this is the upstream maintained library, it has
110
+  a version number like systemd itself (currently ``233``). This is
111
+  the one you want.
112
+- ``systemd`` - a python 3 only library, not what you want.
113
+- ``python-systemd`` - another library you don't want. Installing it
114
+  on a system will break ansible's ability to run.
115
+
116
+
117
+If we were using user units, the ``[Service]`` - ``Group=`` parameter
118
+doesn't seem to work with user units, even though the documentation
119
+says that it should. This means that we will need to do an explicit
120
+``/usr/bin/sg``. This has the downside of making the SYSLOG_IDENTIFIER
121
+be ``sg``. We can explicitly set that with ``SyslogIdentifier=``, but
122
+it's really unfortunate that we're going to need this work
123
+around. This is currently not a problem because we're only using
124
+system units.
125
+
+Future Work
+===========
+
+oslo.log journald
+-----------------
+
+Journald has an extremely rich mechanism for direct logging including
+structured metadata. We should enhance oslo.log to take advantage of
+that. It would let us do things like::
+
+  journalctl REQUEST_ID=......
+
+  journalctl INSTANCE_ID=......
+
+And get all lines related to the request id or instance id.
+
+sub targets/slices
+------------------
+
+We might want to create per-project slices so that it's easy to
+follow or restart all services of a single project (like swift)
+without impacting other services.
+
+log colorizing
+--------------
+
+We lose log colorization through this process. We might want to build
+a custom colorizer that people could optionally run journalctl output
+through.
+
+user units
+----------
+
+It would be great if we could run services as user units, so that
+there is a clear separation of code being run as non-root, to ensure
+running as root never accidentally gets baked in as an assumption to
+services. However, user units interact poorly with devstack-gate and
+the way that commands are run as users with ansible and su.
+
+Maybe someday we can figure that out.
+
+References
+==========
+
+- Arch Linux Wiki - https://wiki.archlinux.org/index.php/Systemd/User
+- Python interface to journald -
+  https://www.freedesktop.org/software/systemd/python-systemd/journal.html
+- Systemd documentation on service files -
+  https://www.freedesktop.org/software/systemd/man/systemd.service.html
+- Systemd documentation on exec (can be used to impact service runs) -
+  https://www.freedesktop.org/software/systemd/man/systemd.exec.html
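The new document motivates systemd by the need to restart or follow several services at once. A minimal shell sketch of that workflow is given here. The helper names `devstack_unit`, `journal_follow_cmd` and `restart_cmd` are hypothetical and not part of DevStack; only the `devstack@<service>.service` naming and the `journalctl` and `systemctl` options come from the document. The helpers print command lines rather than running them, so they can be inspected first.

```shell
# Hypothetical helpers (not part of DevStack) built on the unit naming
# scheme from the document: devstack@<service>.service

# Print the unit name for one short service name, e.g. n-cpu.
devstack_unit() {
    printf 'devstack@%s.service\n' "$1"
}

# Print a journalctl command line that follows all given services.
# The function body is a subshell so that cmd and svc stay local.
journal_follow_cmd() (
    cmd="journalctl -f -o short-precise"
    for svc in "$@"; do
        cmd="$cmd --unit $(devstack_unit "$svc")"
    done
    printf '%s\n' "$cmd"
)

# Print a systemctl command line that restarts all given services.
restart_cmd() (
    cmd="sudo systemctl restart"
    for svc in "$@"; do
        cmd="$cmd $(devstack_unit "$svc")"
    done
    printf '%s\n' "$cmd"
)
```

For example, `restart_cmd n-cpu n-cond` prints a single `systemctl restart` line naming both units, which can then be pasted into a shell or passed to `sh -c`.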
@@ -575,7 +575,9 @@ function vercmp {
 function setup_logging {
     local conf_file=$1
     local other_cond=${2:-"False"}
-    if [ "$LOG_COLOR" == "True" ] && [ "$SYSLOG" == "False" ] && [ "$other_cond" == "False" ]; then
+    if [[ "$USE_SYSTEMD" == "True" ]]; then
+        setup_systemd_logging $conf_file
+    elif [ "$LOG_COLOR" == "True" ] && [ "$SYSLOG" == "False" ] && [ "$other_cond" == "False" ]; then
         setup_colorized_logging $conf_file
     else
         setup_standard_logging_identity $conf_file
@@ -601,6 +603,17 @@ function setup_colorized_logging {
     iniset $conf_file $conf_section logging_exception_prefix "%(color)s%(asctime)s.%(msecs)03d TRACE %(name)s %(instance)s"
 }
 
+function setup_systemd_logging {
+    local conf_file=$1
+    local conf_section="DEFAULT"
+    local project_var="project_name"
+    local user_var="user_name"
+    iniset $conf_file $conf_section logging_context_format_string "%(levelname)s %(name)s [%(request_id)s %("$project_var")s %("$user_var")s] %(instance)s%(message)s"
+    iniset $conf_file $conf_section logging_default_format_string "%(levelname)s %(name)s [-] %(instance)s%(color)s%(message)s"
+    iniset $conf_file $conf_section logging_debug_format_suffix "from (pid=%(process)d) %(funcName)s %(pathname)s:%(lineno)d"
+    iniset $conf_file $conf_section logging_exception_prefix "ERROR %(name)s %(instance)s"
+}
+
 function setup_standard_logging_identity {
     local conf_file=$1
     iniset $conf_file DEFAULT logging_user_identity_format "%(project_name)s %(user_name)s"
@@ -1443,6 +1443,59 @@ function _run_process {
     exit 0
 }
 
+function write_user_unit_file {
+    local service=$1
+    local command="$2"
+    local group=$3
+    local user=$4
+    local extra=""
+    if [[ -n "$group" ]]; then
+        extra="Group=$group"
+    fi
+    local unitfile="$SYSTEMD_DIR/$service"
+    mkdir -p $SYSTEMD_DIR
+
+    iniset -sudo $unitfile "Unit" "Description" "Devstack $service"
+    iniset -sudo $unitfile "Service" "User" "$user"
+    iniset -sudo $unitfile "Service" "ExecStart" "$command"
+    if [[ -n "$group" ]]; then
+        iniset -sudo $unitfile "Service" "Group" "$group"
+    fi
+    iniset -sudo $unitfile "Install" "WantedBy" "multi-user.target"
+
+    # changes to existing units sometimes need a refresh
+    $SYSTEMCTL daemon-reload
+}
+
+function _run_under_systemd {
+    local service=$1
+    local command="$2"
+    local cmd=$command
+    local systemd_service="devstack@$service.service"
+    local group=$3
+    local user=${4:-$STACK_USER}
+    write_user_unit_file $systemd_service "$cmd" "$group" "$user"
+
+    $SYSTEMCTL enable $systemd_service
+    $SYSTEMCTL start $systemd_service
+    _journal_log $service $systemd_service
+}
+
+function _journal_log {
+    local service=$1
+    local unit=$2
+    local logfile="${service}.log.${CURRENT_LOG_TIME}"
+    local real_logfile="${LOGDIR}/${logfile}"
+    if [[ -n ${LOGDIR} ]]; then
+        $JOURNALCTL_F $2 > "$real_logfile" &
+        bash -c "cd '$LOGDIR' && ln -sf '$logfile' ${service}.log"
+        if [[ -n ${SCREEN_LOGDIR} ]]; then
+            # Drop the backward-compat symlink
+            ln -sf "$real_logfile" ${SCREEN_LOGDIR}/screen-${service}.log
+        fi
+    fi
+}
+
 # Helper to remove the ``*.failure`` files under ``$SERVICE_DIR/$SCREEN_NAME``.
 # This is used for ``service_check`` when all the ``screen_it`` are called finished
 # Uses globals ``SCREEN_NAME``, ``SERVICE_DIR``
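To make the result of `write_user_unit_file` easier to picture, the sketch below prints roughly the unit file content that the function assembles. `render_unit` is a hypothetical helper, not DevStack code, and the command path and user in the usage comment are example values. DevStack writes the real file through `iniset`, so key order and spacing in the generated file may differ from this rendering.

```shell
# Hypothetical helper (not part of DevStack): print the unit file
# content that write_user_unit_file assembles via iniset.
# Arguments mirror that function: service, command, group, user.
# The function body is a subshell so the variables stay local.
render_unit() (
    service=$1
    command=$2
    group=$3
    user=$4
    printf '[Unit]\nDescription=Devstack %s\n\n' "$service"
    printf '[Service]\nUser=%s\nExecStart=%s\n' "$user" "$command"
    # Group= is only written when a group was passed in
    if [ -n "$group" ]; then
        printf 'Group=%s\n' "$group"
    fi
    printf '\n[Install]\nWantedBy=multi-user.target\n'
)

# Example call (path and user are illustrative values only):
#   render_unit devstack@n-cpu.service /usr/local/bin/nova-compute "" stack
```

Because the unit carries `User=` itself, no `sudo` prefix is needed in `ExecStart` when a service has to run as root.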
@@ -1478,16 +1531,24 @@ function run_process {
     local service=$1
     local command="$2"
     local group=$3
-    local subservice=$4
+    local user=$4
 
-    local name=${subservice:-$service}
+    local name=$service
 
     time_start "run_process"
     if is_service_enabled $service; then
-        if [[ "$USE_SCREEN" = "True" ]]; then
+        if [[ "$USE_SYSTEMD" = "True" ]]; then
+            _run_under_systemd "$name" "$command" "$group" "$user"
+        elif [[ "$USE_SCREEN" = "True" ]]; then
+            if [[ "$user" == "root" ]]; then
+                command="sudo $command"
+            fi
             screen_process "$name" "$command" "$group"
         else
             # Spawn directly without screen
+            if [[ "$user" == "root" ]]; then
+                command="sudo $command"
+            fi
             _run_process "$name" "$command" "$group" &
         fi
     fi
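The hunk above gives `run_process` three launch paths and moves the `sudo` handling out of the callers. The sketch below isolates that decision so it can be read on its own. `choose_runner` is a hypothetical function that only reports which path would be taken and with which command line; it does not start anything.

```shell
# Hypothetical sketch of the dispatch run_process performs after this
# change. Arguments: use_systemd, use_screen, user, command.
# The function body is a subshell so the variables stay local.
choose_runner() (
    use_systemd=$1
    use_screen=$2
    user=$3
    command=$4
    if [ "$use_systemd" = "True" ]; then
        # systemd switches user through User= in the unit file,
        # so the command is passed through unchanged
        printf 'systemd: %s\n' "$command"
        exit 0
    fi
    # screen and direct forking run as the calling user, so a root
    # service gets a sudo prefix instead
    if [ "$user" = "root" ]; then
        command="sudo $command"
    fi
    if [ "$use_screen" = "True" ]; then
        printf 'screen: %s\n' "$command"
    else
        printf 'fork: %s\n' "$command"
    fi
)
```

This is why callers such as the memory tracker can drop their own `sudo` and pass `"root"` as the fourth argument instead.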
@@ -1618,6 +1679,14 @@ function stop_process {
 
     if is_service_enabled $service; then
         # Kill via pid if we have one available
+        if [[ "$USE_SYSTEMD" == "True" ]]; then
+            # Only do this for units which appear enabled, this also
+            # catches units that don't really exist for cases like
+            # keystone without a failure.
+            $SYSTEMCTL stop devstack@$service.service
+            $SYSTEMCTL disable devstack@$service.service
+        fi
+
         if [[ -r $SERVICE_DIR/$SCREEN_NAME/$service.pid ]]; then
             pkill -g $(cat $SERVICE_DIR/$SCREEN_NAME/$service.pid)
             # oslo.service tends to stop actually shutting down
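The comment in this hunk says the stop is limited to units that appear enabled, while the added lines call `stop` and `disable` unconditionally. One way such a guard could look is sketched below, assuming `systemctl is-enabled --quiet` is an acceptable test for "appears enabled". `stop_unit_if_enabled` is a hypothetical name, not a DevStack function; `SYSTEMCTL` mirrors the variable this commit adds.

```shell
# SYSTEMCTL mirrors the variable introduced by this commit; the default
# here is the system-unit value.
SYSTEMCTL=${SYSTEMCTL:-"sudo systemctl"}

# Hypothetical guard: stop and disable a DevStack unit only when
# systemd reports it as enabled.
stop_unit_if_enabled() {
    unit="devstack@$1.service"
    # is-enabled exits non-zero for disabled or nonexistent units,
    # so units that were never created are skipped quietly
    if $SYSTEMCTL is-enabled --quiet "$unit" 2>/dev/null; then
        $SYSTEMCTL stop "$unit"
        $SYSTEMCTL disable "$unit"
    fi
}
```

Calling `stop_unit_if_enabled key` on a host where no `devstack@key.service` exists would then do nothing rather than report a failure.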
@@ -24,12 +24,12 @@ function start_dstat {
     # To enable memory_tracker add:
     #    enable_service memory_tracker
     # to your localrc
-    run_process memory_tracker "sudo $TOP_DIR/tools/memory_tracker.sh"
+    run_process memory_tracker "$TOP_DIR/tools/memory_tracker.sh" "" "root"
 
     # remove support for the old name when it's no longer used (sometime in Queens)
     if is_service_enabled peakmem_tracker; then
         deprecated "Use of peakmem_tracker in devstack is deprecated, use memory_tracker instead"
-        run_process peakmem_tracker "sudo $TOP_DIR/tools/memory_tracker.sh"
+        run_process peakmem_tracker "$TOP_DIR/tools/memory_tracker.sh" "" "root"
     fi
 }
 
@@ -602,8 +602,11 @@ function start_keystone {
         tail_log key /var/log/$APACHE_NAME/keystone.log
         tail_log key-access /var/log/$APACHE_NAME/keystone_access.log
     else # uwsgi
-        run_process key "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_PUBLIC_UWSGI_FILE" "" "key-p"
-        run_process key "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_ADMIN_UWSGI_FILE" "" "key-a"
+        # TODO(sdague): we should really get down to a single keystone here
+        enable_service key-p
+        enable_service key-a
+        run_process key-p "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_PUBLIC_UWSGI_FILE" ""
+        run_process key-a "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_ADMIN_UWSGI_FILE" ""
     fi
 
     echo "Waiting for keystone to start..."
@@ -38,6 +38,15 @@ fi
 # Set up default directories
 GITDIR["python-swiftclient"]=$DEST/python-swiftclient
 
+# Swift virtual environment
+if [[ ${USE_VENV} = True ]]; then
+    PROJECT_VENV["swift"]=${SWIFT_DIR}.venv
+    SWIFT_BIN_DIR=${PROJECT_VENV["swift"]}/bin
+else
+    SWIFT_BIN_DIR=$(get_python_exec_prefix)
+fi
+
+
 SWIFT_DIR=$DEST/swift
 SWIFT_AUTH_CACHE_DIR=${SWIFT_AUTH_CACHE_DIR:-/var/cache/swift}
 SWIFT_APACHE_WSGI_DIR=${SWIFT_APACHE_WSGI_DIR:-/var/www/swift}
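In this hunk the new block reads `${SWIFT_DIR}` a few lines before the context line `SWIFT_DIR=$DEST/swift` assigns it. Assuming `SWIFT_DIR` is not already set elsewhere when this file is sourced, the venv path would be built from an empty value when `USE_VENV=True`. The standalone sketch below reproduces only that ordering effect; it is not DevStack code, and `DEST=/opt/stack` is an example value.

```shell
# Standalone illustration of the ordering effect; not DevStack code.
unset SWIFT_DIR
DEST=/opt/stack    # example value

# Order as in the hunk: the variable is read before it is assigned.
# (The "-" default only keeps this sketch safe under "set -u".)
venv_before="${SWIFT_DIR-}.venv"

SWIFT_DIR=$DEST/swift

# The same expression evaluated after the assignment.
venv_after="${SWIFT_DIR}.venv"

echo "before assignment: $venv_before"
echo "after assignment:  $venv_after"
```

If that assumption holds, moving the venv block below the `SWIFT_DIR` assignment would be one way to get the intended path.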
@@ -807,10 +816,10 @@ function start_swift {
         local proxy_port=${SWIFT_DEFAULT_BIND_PORT}
         start_tls_proxy swift '*' $proxy_port $SERVICE_HOST $SWIFT_DEFAULT_BIND_PORT_INT
     fi
-    run_process s-proxy "swift-proxy-server ${SWIFT_CONF_DIR}/proxy-server.conf -v"
+    run_process s-proxy "$SWIFT_BIN_DIR/swift-proxy-server ${SWIFT_CONF_DIR}/proxy-server.conf -v"
     if [[ ${SWIFT_REPLICAS} == 1 ]]; then
         for type in object container account; do
-            run_process s-${type} "swift-${type}-server ${SWIFT_CONF_DIR}/${type}-server/1.conf -v"
+            run_process s-${type} "$SWIFT_BIN_DIR/swift-${type}-server ${SWIFT_CONF_DIR}/${type}-server/1.conf -v"
         done
     fi
 
@@ -87,6 +87,23 @@ HORIZON_APACHE_ROOT="/dashboard"
 # be disabled for automated testing by setting this value to False.
 USE_SCREEN=$(trueorfalse True USE_SCREEN)
 
+# Whether to use SYSTEMD to manage services
+USE_SYSTEMD=$(trueorfalse False USE_SYSTEMD)
+USER_UNITS=$(trueorfalse False USER_UNITS)
+if [[ "$USER_UNITS" == "True" ]]; then
+    SYSTEMD_DIR="$HOME/.local/share/systemd/user"
+    SYSTEMCTL="systemctl --user"
+    JOURNALCTL_F="journalctl -f -o short-precise --user-unit"
+else
+    SYSTEMD_DIR="/etc/systemd/system"
+    SYSTEMCTL="sudo systemctl"
+    JOURNALCTL_F="journalctl -f -o short-precise --unit"
+fi
+
+if [[ "$USE_SYSTEMD" == "True" ]]; then
+    USE_SCREEN=False
+fi
+
 # When using screen, should we keep a log file on disk?  You might
 # want this False if you have a long-running setup where verbose logs
 # can fill-up the host.