Merge "initial work to enable systemd service running"

Jenkins authored on 2017/03/29 03:41:38
Showing 7 changed files
1 1
new file mode 100644
... ...
@@ -0,0 +1,177 @@
0
+===========================
1
+ Using Systemd in DevStack
2
+===========================
3
+
4
+.. note::
5
+
6
+   This is an in-progress document as we work out the way forward here
7
+   with DevStack and systemd.
8
+
9
+DevStack can run all of its services as systemd units. Systemd is now
10
+the default init system for nearly every Linux distro, and it already
11
+handles many of the problems that come with supervising misbehaving
12
+processes.
13
+
14
+Why this instead of screen?
15
+===========================
16
+
17
+The screen model for DevStack was invented when the number of services
18
+that a DevStack user was going to run was typically < 10. This made
19
+screen hot keys to jump around very easy. However, the landscape has
20
+changed: not all services are stoppable in screen, as some run under
21
+Apache, and there are typically at least 20 items.
22
+
23
+There is also a common developer workflow of changing code in more
24
+than one service, and needing to restart a bunch of services for that
25
+to take effect.
26
+
27
+To enable this, add the following to your local.conf::
28
+
29
+  USE_SYSTEMD=True
30
+
31
+
32
+
33
+Unit Structure
34
+==============
35
+
36
+.. note::
37
+
38
+   Originally we wanted to do this as user units; however,
39
+   there are issues with running them under non-interactive
40
+   shells. For now, we'll be running as system units. Some user unit
41
+   code is left in place in case we can switch back later.
42
+
43
+All DevStack units are created as a part of the DevStack slice and
44
+are given the name ``devstack@$servicename.service``. This lets us do
45
+certain operations at the slice level.
46
+
47
+Manipulating Units
48
+==================
49
+
50
+The examples below assume the unit ``n-cpu``.
51
+
52
+Enable a unit (so that it starts at boot)::
53
+
54
+  sudo systemctl enable devstack@n-cpu.service
55
+
56
+Disable a unit::
57
+
58
+  sudo systemctl disable devstack@n-cpu.service
59
+
60
+Start a unit::
61
+
62
+  sudo systemctl start devstack@n-cpu.service
63
+
64
+Stop a unit::
65
+
66
+  sudo systemctl stop devstack@n-cpu.service
67
+
68
+Restart a unit::
69
+
70
+  sudo systemctl restart devstack@n-cpu.service
71
+
72
+See status of a unit::
73
+
74
+  sudo systemctl status devstack@n-cpu.service
75
+
76
+
77
+Querying Logs
78
+=============
79
+
80
+One of the other major things that comes with systemd is journald, a
81
+consolidated way to access logs (including querying through structured
82
+metadata). It is accessed via the ``journalctl`` command.
83
+
84
+
85
+``journalctl`` has powerful query facilities. We'll start with some
86
+common options.
87
+
88
+Follow logs for a specific service::
89
+
90
+  journalctl -f --unit devstack@n-cpu.service
91
+
92
+Follow logs for multiple services simultaneously::
93
+
94
+  journalctl -f --unit devstack@n-cpu.service \
95
+    --unit devstack@n-cond.service
96
+
97
+Use higher precision time stamps::
98
+
99
+  journalctl -f -o short-precise --unit devstack@n-cpu.service
100
+
101
+
102
+Known Issues
103
+============
104
+
105
+Be careful about systemd Python libraries. There are 3 of them on
106
+PyPI, and they are all very different. They unfortunately all install
107
+into the ``systemd`` namespace, which can cause some issues.
108
+
109
+- ``systemd-python`` - this is the upstream maintained library, it has
110
+  a version number like systemd itself (currently ``233``). This is
111
+  the one you want.
112
+- ``systemd`` - a Python 3 only library, not what you want.
113
+- ``python-systemd`` - another library you don't want. Installing it
114
+  on a system will break Ansible's ability to run.
115
+
116
+
117
+The ``Group=`` parameter of the ``[Service]`` section doesn't seem to
118
+work with user units, even though the documentation says that it
119
+should. If we switch to user units, we will need an explicit
120
+``/usr/bin/sg``, which has the downside of making the
121
+SYSLOG_IDENTIFIER be ``sg``. We can explicitly set that with
122
+``SyslogIdentifier=``, but it's unfortunate that we would need this
123
+workaround. This is currently not a problem because we're only using
124
+system units.
125
+
126
+Future Work
127
+===========
128
+
129
+oslo.log journald
130
+-----------------
131
+
132
+Journald has an extremely rich mechanism for direct logging including
133
+structured metadata. We should enhance oslo.log to take advantage of
134
+that. It would let us do things like::
135
+
136
+  journalctl REQUEST_ID=......
137
+
138
+  journalctl INSTANCE_ID=......
139
+
140
+And get all lines related to the request id or instance id.
141
+
142
+sub targets/slices
143
+------------------
144
+
145
+We might want to create per-project slices so that it's easy to
146
+follow or restart all services of a single project (like swift) without
147
+impacting other services.
148
+
149
+log colorizing
150
+--------------
151
+
152
+We lose log colorization through this process. We might want to build
153
+a custom colorizer that we could run journalctl output through
154
+optionally for people.
155
+
156
+user units
157
+----------
158
+
159
+It would be great if we could run services as user units, so that
160
+code is clearly run as a non-root user and running as root never
161
+accidentally gets baked in as an assumption by services. However,
162
+user units interact poorly with devstack-gate and the way that
163
+commands are run as users with Ansible and su.
164
+
165
+Maybe someday we can figure that out.
166
+
167
+References
168
+==========
169
+
170
+- Arch Linux Wiki - https://wiki.archlinux.org/index.php/Systemd/User
171
+- Python interface to journald -
172
+  https://www.freedesktop.org/software/systemd/python-systemd/journal.html
173
+- Systemd documentation on service files -
174
+  https://www.freedesktop.org/software/systemd/man/systemd.service.html
175
+- Systemd documentation on exec (can be used to impact service runs) -
176
+  https://www.freedesktop.org/software/systemd/man/systemd.exec.html
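The unit naming scheme and the multi-service `journalctl` query described in the document above can be sketched as two small helpers. This is a minimal illustration, not part of the change: `devstack_unit` and `follow_cmd` are hypothetical names, and the functions only print a command line instead of running it.

```shell
# Hypothetical helpers (not part of DevStack) illustrating the naming scheme.

# Map a DevStack service name (e.g. n-cpu) to its systemd unit name.
devstack_unit() {
    echo "devstack@$1.service"
}

# Print a journalctl command that follows one or more services at once.
# journalctl accepts --unit several times, so one flag is added per service.
follow_cmd() {
    local cmd="journalctl -f -o short-precise"
    local service
    for service in "$@"; do
        cmd="$cmd --unit $(devstack_unit "$service")"
    done
    echo "$cmd"
}

follow_cmd n-cpu n-cond
```

Printing the command rather than executing it keeps the sketch usable on machines without systemd; the printed line can be pasted into a shell on a DevStack host.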
... ...
@@ -575,7 +575,9 @@ function vercmp {
575 575
 function setup_logging {
576 576
     local conf_file=$1
577 577
     local other_cond=${2:-"False"}
578
-    if [ "$LOG_COLOR" == "True" ] && [ "$SYSLOG" == "False" ] && [ "$other_cond" == "False" ]; then
578
+    if [[ "$USE_SYSTEMD" == "True" ]]; then
579
+        setup_systemd_logging $conf_file
580
+    elif [ "$LOG_COLOR" == "True" ] && [ "$SYSLOG" == "False" ] && [ "$other_cond" == "False" ]; then
579 581
         setup_colorized_logging $conf_file
580 582
     else
581 583
         setup_standard_logging_identity $conf_file
... ...
@@ -601,6 +603,17 @@ function setup_colorized_logging {
601 601
     iniset $conf_file $conf_section logging_exception_prefix "%(color)s%(asctime)s.%(msecs)03d TRACE %(name)s %(instance)s"
602 602
 }
603 603
 
604
+function setup_systemd_logging {
605
+    local conf_file=$1
606
+    local conf_section="DEFAULT"
607
+    local project_var="project_name"
608
+    local user_var="user_name"
609
+    iniset $conf_file $conf_section logging_context_format_string "%(levelname)s %(name)s [%(request_id)s %("$project_var")s %("$user_var")s] %(instance)s%(message)s"
610
+    iniset $conf_file $conf_section logging_default_format_string "%(levelname)s %(name)s [-] %(instance)s%(color)s%(message)s"
611
+    iniset $conf_file $conf_section logging_debug_format_suffix "from (pid=%(process)d) %(funcName)s %(pathname)s:%(lineno)d"
612
+    iniset $conf_file $conf_section logging_exception_prefix "ERROR %(name)s %(instance)s"
613
+}
614
+
604 615
 function setup_standard_logging_identity {
605 616
     local conf_file=$1
606 617
     iniset $conf_file DEFAULT logging_user_identity_format "%(project_name)s %(user_name)s"
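The precedence that this hunk gives `setup_logging` can be checked in isolation. The sketch below assumes a hypothetical stand-in, `choose_logging_setup`, that only reports which helper would be called; it takes the optional `other_cond` value as its first argument and does not touch any config file.

```shell
# Hypothetical stand-in: report which helper setup_logging would call.
choose_logging_setup() {
    local other_cond="${1:-False}"
    if [ "$USE_SYSTEMD" = "True" ]; then
        # systemd takes priority over every other logging option
        echo setup_systemd_logging
    elif [ "$LOG_COLOR" = "True" ] && [ "$SYSLOG" = "False" ] && [ "$other_cond" = "False" ]; then
        echo setup_colorized_logging
    else
        echo setup_standard_logging_identity
    fi
}

USE_SYSTEMD=True
LOG_COLOR=True
SYSLOG=False
choose_logging_setup
```

With `USE_SYSTEMD=True` the colorized branch is never reached, even when `LOG_COLOR=True`, which matches the loss of log colorization noted under Future Work.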
... ...
@@ -1443,6 +1443,59 @@ function _run_process {
1443 1443
     exit 0
1444 1444
 }
1445 1445
 
1446
+function write_user_unit_file {
1447
+    local service=$1
1448
+    local command="$2"
1449
+    local group=$3
1450
+    local user=$4
1451
+    local extra=""
1452
+    if [[ -n "$group" ]]; then
1453
+        extra="Group=$group"
1454
+    fi
1455
+    local unitfile="$SYSTEMD_DIR/$service"
1456
+    mkdir -p $SYSTEMD_DIR
1457
+
1458
+    iniset -sudo $unitfile "Unit" "Description" "Devstack $service"
1459
+    iniset -sudo $unitfile "Service" "User" "$user"
1460
+    iniset -sudo $unitfile "Service" "ExecStart" "$command"
1461
+    if [[ -n "$group" ]]; then
1462
+        iniset -sudo $unitfile "Service" "Group" "$group"
1463
+    fi
1464
+    iniset -sudo $unitfile "Install" "WantedBy" "multi-user.target"
1465
+
1466
+    # changes to existing units sometimes need a refresh
1467
+    $SYSTEMCTL daemon-reload
1468
+}
1469
+
1470
+function _run_under_systemd {
1471
+    local service=$1
1472
+    local command="$2"
1473
+    local cmd=$command
1474
+    local systemd_service="devstack@$service.service"
1475
+    local group=$3
1476
+    local user=${4:-$STACK_USER}
1477
+    write_user_unit_file $systemd_service "$cmd" "$group" "$user"
1478
+
1479
+    $SYSTEMCTL enable $systemd_service
1480
+    $SYSTEMCTL start $systemd_service
1481
+    _journal_log $service $systemd_service
1482
+}
1483
+
1484
+function _journal_log {
1485
+    local service=$1
1486
+    local unit=$2
1487
+    local logfile="${service}.log.${CURRENT_LOG_TIME}"
1488
+    local real_logfile="${LOGDIR}/${logfile}"
1489
+    if [[ -n ${LOGDIR} ]]; then
1490
+        $JOURNALCTL_F $unit > "$real_logfile" &
1491
+        bash -c "cd '$LOGDIR' && ln -sf '$logfile' ${service}.log"
1492
+        if [[ -n ${SCREEN_LOGDIR} ]]; then
1493
+            # Drop the backward-compat symlink
1494
+            ln -sf "$real_logfile" ${SCREEN_LOGDIR}/screen-${service}.log
1495
+        fi
1496
+    fi
1497
+}
1498
+
1446 1499
 # Helper to remove the ``*.failure`` files under ``$SERVICE_DIR/$SCREEN_NAME``.
1447 1500
 # This is used for ``service_check`` when all the ``screen_it`` are called finished
1448 1501
 # Uses globals ``SCREEN_NAME``, ``SERVICE_DIR``
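To make the result of `write_user_unit_file` easier to picture, the sketch below renders roughly the unit text it produces. `render_unit` is a hypothetical function that prints to stdout instead of calling `iniset -sudo`; the exact whitespace and key order that `iniset` writes may differ, and the command path, group, and user in the example call are placeholders.

```shell
# Hypothetical renderer: print the unit text instead of writing it with iniset.
render_unit() {
    local service="$1"
    local cmd="$2"
    local group="$3"
    local user="$4"
    printf '[Unit]\nDescription=Devstack %s\n\n' "$service"
    printf '[Service]\nUser=%s\nExecStart=%s\n' "$user" "$cmd"
    # Group= is only emitted when a group was requested
    if [ -n "$group" ]; then
        printf 'Group=%s\n' "$group"
    fi
    printf '\n[Install]\nWantedBy=multi-user.target\n'
}

# Placeholder values for illustration only.
render_unit devstack@n-cpu.service "/usr/local/bin/nova-compute" "libvirt" "stack"
```

The `[Install]` section with `WantedBy=multi-user.target` is what gives `systemctl enable` something to act on for these units.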
... ...
@@ -1478,16 +1531,24 @@ function run_process {
1478 1478
     local service=$1
1479 1479
     local command="$2"
1480 1480
     local group=$3
1481
-    local subservice=$4
1481
+    local user=$4
1482 1482
 
1483
-    local name=${subservice:-$service}
1483
+    local name=$service
1484 1484
 
1485 1485
     time_start "run_process"
1486 1486
     if is_service_enabled $service; then
1487
-        if [[ "$USE_SCREEN" = "True" ]]; then
1487
+        if [[ "$USE_SYSTEMD" = "True" ]]; then
1488
+            _run_under_systemd "$name" "$command" "$group" "$user"
1489
+        elif [[ "$USE_SCREEN" = "True" ]]; then
1490
+            if [[ "$user" == "root" ]]; then
1491
+                command="sudo $command"
1492
+            fi
1488 1493
             screen_process "$name" "$command" "$group"
1489 1494
         else
1490 1495
             # Spawn directly without screen
1496
+            if [[ "$user" == "root" ]]; then
1497
+                command="sudo $command"
1498
+            fi
1491 1499
             _run_process "$name" "$command" "$group" &
1492 1500
         fi
1493 1501
     fi
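The new fourth argument to `run_process` changes how root-owned services are launched. The sketch below assumes a hypothetical helper, `launch_plan`, that prints the launch decision instead of starting anything: under systemd the user is carried by the unit's `User=` setting, while the screen and direct paths prefix the command with `sudo`.

```shell
# Hypothetical helper: print how run_process would launch a command.
launch_plan() {
    local cmd="$1"
    local user="$2"
    if [ "$USE_SYSTEMD" = "True" ]; then
        # systemd path: identity comes from User=, defaulting to STACK_USER
        echo "systemd(User=${user:-$STACK_USER}): $cmd"
    else
        # screen or direct spawn: root commands get a sudo prefix
        if [ "$user" = "root" ]; then
            cmd="sudo $cmd"
        fi
        echo "shell: $cmd"
    fi
}

STACK_USER=stack
USE_SYSTEMD=True
launch_plan "tools/memory_tracker.sh" root
USE_SYSTEMD=False
launch_plan "tools/memory_tracker.sh" root
```

This is why the `memory_tracker` callers in the dstat hunk drop the literal `sudo` from the command string and pass `"root"` instead.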
... ...
@@ -1618,6 +1679,14 @@ function stop_process {
1618 1618
 
1619 1619
     if is_service_enabled $service; then
1620 1620
         # Kill via pid if we have one available
1621
+        if [[ "$USE_SYSTEMD" == "True" ]]; then
1622
+            # Only do this for units which appear enabled, this also
1623
+            # catches units that don't really exist for cases like
1624
+            # keystone without a failure.
1625
+            $SYSTEMCTL stop devstack@$service.service
1626
+            $SYSTEMCTL disable devstack@$service.service
1627
+        fi
1628
+
1621 1629
         if [[ -r $SERVICE_DIR/$SCREEN_NAME/$service.pid ]]; then
1622 1630
             pkill -g $(cat $SERVICE_DIR/$SCREEN_NAME/$service.pid)
1623 1631
             # oslo.service tends to stop actually shutting down
... ...
@@ -24,12 +24,12 @@ function start_dstat {
24 24
     # To enable memory_tracker add:
25 25
     #    enable_service memory_tracker
26 26
     # to your localrc
27
-    run_process memory_tracker "sudo $TOP_DIR/tools/memory_tracker.sh"
27
+    run_process memory_tracker "$TOP_DIR/tools/memory_tracker.sh" "" "root"
28 28
 
29 29
     # remove support for the old name when it's no longer used (sometime in Queens)
30 30
     if is_service_enabled peakmem_tracker; then
31 31
         deprecated "Use of peakmem_tracker in devstack is deprecated, use memory_tracker instead"
32
-        run_process peakmem_tracker "sudo $TOP_DIR/tools/memory_tracker.sh"
32
+        run_process peakmem_tracker "$TOP_DIR/tools/memory_tracker.sh" "" "root"
33 33
     fi
34 34
 }
35 35
 
... ...
@@ -602,8 +602,11 @@ function start_keystone {
602 602
         tail_log key /var/log/$APACHE_NAME/keystone.log
603 603
         tail_log key-access /var/log/$APACHE_NAME/keystone_access.log
604 604
     else # uwsgi
605
-        run_process key "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_PUBLIC_UWSGI_FILE" "" "key-p"
606
-        run_process key "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_ADMIN_UWSGI_FILE" "" "key-a"
605
+        # TODO(sdague): we should really get down to a single keystone here
606
+        enable_service key-p
607
+        enable_service key-a
608
+        run_process key-p "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_PUBLIC_UWSGI_FILE" ""
609
+        run_process key-a "$KEYSTONE_BIN_DIR/uwsgi $KEYSTONE_ADMIN_UWSGI_FILE" ""
607 610
     fi
608 611
 
609 612
     echo "Waiting for keystone to start..."
... ...
@@ -38,6 +38,15 @@ fi
38 38
 # Set up default directories
39 39
 GITDIR["python-swiftclient"]=$DEST/python-swiftclient
40 40
 
41
+# Swift virtual environment
42
+if [[ ${USE_VENV} = True ]]; then
43
+    PROJECT_VENV["swift"]=${SWIFT_DIR}.venv
44
+    SWIFT_BIN_DIR=${PROJECT_VENV["swift"]}/bin
45
+else
46
+    SWIFT_BIN_DIR=$(get_python_exec_prefix)
47
+fi
48
+
49
+
41 50
 SWIFT_DIR=$DEST/swift
42 51
 SWIFT_AUTH_CACHE_DIR=${SWIFT_AUTH_CACHE_DIR:-/var/cache/swift}
43 52
 SWIFT_APACHE_WSGI_DIR=${SWIFT_APACHE_WSGI_DIR:-/var/www/swift}
... ...
@@ -807,10 +816,10 @@ function start_swift {
807 807
         local proxy_port=${SWIFT_DEFAULT_BIND_PORT}
808 808
         start_tls_proxy swift '*' $proxy_port $SERVICE_HOST $SWIFT_DEFAULT_BIND_PORT_INT
809 809
     fi
810
-    run_process s-proxy "swift-proxy-server ${SWIFT_CONF_DIR}/proxy-server.conf -v"
810
+    run_process s-proxy "$SWIFT_BIN_DIR/swift-proxy-server ${SWIFT_CONF_DIR}/proxy-server.conf -v"
811 811
     if [[ ${SWIFT_REPLICAS} == 1 ]]; then
812 812
         for type in object container account; do
813
-            run_process s-${type} "swift-${type}-server ${SWIFT_CONF_DIR}/${type}-server/1.conf -v"
813
+            run_process s-${type} "$SWIFT_BIN_DIR/swift-${type}-server ${SWIFT_CONF_DIR}/${type}-server/1.conf -v"
814 814
         done
815 815
     fi
816 816
 
... ...
@@ -87,6 +87,23 @@ HORIZON_APACHE_ROOT="/dashboard"
87 87
 # be disabled for automated testing by setting this value to False.
88 88
 USE_SCREEN=$(trueorfalse True USE_SCREEN)
89 89
 
90
+# Whether to use SYSTEMD to manage services
91
+USE_SYSTEMD=$(trueorfalse False USE_SYSTEMD)
92
+USER_UNITS=$(trueorfalse False USER_UNITS)
93
+if [[ "$USER_UNITS" == "True" ]]; then
94
+    SYSTEMD_DIR="$HOME/.local/share/systemd/user"
95
+    SYSTEMCTL="systemctl --user"
96
+    JOURNALCTL_F="journalctl -f -o short-precise --user-unit"
97
+else
98
+    SYSTEMD_DIR="/etc/systemd/system"
99
+    SYSTEMCTL="sudo systemctl"
100
+    JOURNALCTL_F="journalctl -f -o short-precise --unit"
101
+fi
102
+
103
+if [[ "$USE_SYSTEMD" == "True" ]]; then
104
+    USE_SCREEN=False
105
+fi
106
+
90 107
 # When using screen, should we keep a log file on disk?  You might
91 108
 # want this False if you have a long-running setup where verbose logs
92 109
 # can fill-up the host.
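The interaction between `USE_SYSTEMD` and `USE_SCREEN` set up in this hunk, combined with the dispatch in `run_process`, can be summarized with a small sketch. `effective_runner` is a hypothetical function that takes both settings as plain `True`/`False` arguments (it does not use `trueorfalse`) and reports which process runner would end up in effect.

```shell
# Hypothetical summary of the override: systemd wins and turns screen off.
effective_runner() {
    local use_systemd="$1"
    local use_screen="$2"
    if [ "$use_systemd" = "True" ]; then
        use_screen=False
    fi
    if [ "$use_systemd" = "True" ]; then
        echo systemd
    elif [ "$use_screen" = "True" ]; then
        echo screen
    else
        echo direct
    fi
}

effective_runner True True
effective_runner False True
effective_runner False False
```

Because `USE_SCREEN` defaults to `True`, the explicit override is what keeps a local.conf that only sets `USE_SYSTEMD=True` from also enabling screen-specific behavior elsewhere.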