
linux-secure: support xattr file markings

Support xattr 'em' markings to control EMUTRAMP and MPROTECT PaX features
per application. Disabling MPROTECT allows the Java virtual machine to run.

Change-Id: I5441326d4c36a61ed9c3593ed368e4f076379b97
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/c/photon/+/20820
Reviewed-by: Keerthana K <keerthanak@vmware.com>
Tested-by: Michelle Wang <michellew@vmware.com>
Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/c/photon/+/22432
Reviewed-by: Ajay Kaher <akaher@vmware.com>
Tested-by: Ajay Kaher <akaher@vmware.com>
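
As a rough illustration of the marking described in the commit message, the sketch below sets the `user.pax.flags` attribute from C via the Linux `setxattr(2)` call instead of the `setfattr` tool. The helper name `pax_mark_file` and its validation rules are assumptions made for this example and are not part of the patch; the two-character limit simply matches the two-byte buffer the patch reads the attribute into.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/xattr.h>

/* Attribute name taken from the commit description. */
#define PAX_XATTR_NAME "user.pax.flags"

/*
 * Hypothetical helper (not part of the patch): validate a marking string
 * and store it on a file. Accepts one or two characters drawn from "eEmM".
 * Returns 0 on success or a negative errno value.
 */
int pax_mark_file(const char *path, const char *flags)
{
	size_t len;

	assert(path != NULL && flags != NULL);

	len = strlen(flags);
	if (len == 0 || len > 2)
		return -EINVAL;
	if (strspn(flags, "eEmM") != len)
		return -EINVAL;

	/* Store the value without a trailing NUL so it stays within two bytes. */
	if (setxattr(path, PAX_XATTR_NAME, flags, len, 0) != 0)
		return -errno;
	return 0;
}
```

A call such as `pax_mark_file("/usr/lib/jvm/OpenJDK-17/bin/java", "em")` would then correspond to the `setfattr` example in the patch description, provided the filesystem supports user extended attributes.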

Alexey Makhalov authored on 2023/05/25 06:00:20
Showing 4 changed files
... ...
@@ -4798,7 +4798,7 @@ CONFIG_SQUASHFS_FILE_CACHE=y
4798 4798
 CONFIG_SQUASHFS_DECOMP_SINGLE=y
4799 4799
 # CONFIG_SQUASHFS_DECOMP_MULTI is not set
4800 4800
 # CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
4801
-# CONFIG_SQUASHFS_XATTR is not set
4801
+CONFIG_SQUASHFS_XATTR=y
4802 4802
 CONFIG_SQUASHFS_ZLIB=y
4803 4803
 # CONFIG_SQUASHFS_LZ4 is not set
4804 4804
 CONFIG_SQUASHFS_LZO=y
... ...
@@ -4960,9 +4960,9 @@ CONFIG_IO_WQ=y
4960 4960
 #
4961 4961
 CONFIG_PAX=y
4962 4962
 CONFIG_PAX_NOWRITEEXEC=y
4963
-# CONFIG_PAX_EMUTRAMP is not set
4964
-CONFIG_EXECSTACK_DISABLED=y
4963
+CONFIG_PAX_EMUTRAMP=y
4965 4964
 CONFIG_PAX_MPROTECT=y
4965
+CONFIG_PAX_XATTR_PAX_FLAGS=y
4966 4966
 CONFIG_PAX_RAP=y
4967 4967
 CONFIG_KEYS=y
4968 4968
 # CONFIG_KEYS_REQUEST_CACHE is not set
... ...
@@ -16,7 +16,7 @@
16 16
 Summary:        Kernel
17 17
 Name:           linux-secure
18 18
 Version:        6.1.28
19
-Release:        4%{?kat_build:.kat}%{?dist}
19
+Release:        5%{?kat_build:.kat}%{?dist}
20 20
 License:        GPLv2
21 21
 URL:            http://www.kernel.org
22 22
 Group:          System Environment/Kernel
... ...
@@ -170,6 +170,9 @@ Requires(pre):    (coreutils or coreutils-selinux)
170 170
 Requires(preun):  (coreutils or coreutils-selinux)
171 171
 Requires(post):   (coreutils or coreutils-selinux)
172 172
 Requires(postun): (coreutils or coreutils-selinux)
173
+# Linux-secure handles user.pax.flags extended attribute
174
+# User must have setfattr/getfattr tools available
175
+Requires: attr
173 176
 
174 177
 %description
175 178
 Security hardened Linux kernel.
... ...
@@ -411,6 +414,8 @@ ln -sf linux-%{uname_r}.cfg /boot/photon.cfg
411 411
 %endif
412 412
 
413 413
 %changelog
414
+* Wed Nov 22 2023 Alexey Makhalov <amakhalov@vmware.com> 6.1.28-5
415
+- PaX: Support xattr 'em' file markings
414 416
 * Sun Nov 19 2023 Shreenidhi Shedi <sshedi@vmware.com> 6.1.28-4
415 417
 - Bump version as a part of openssl upgrade
416 418
 * Tue Oct 03 2023 Kuntal Nayak <nkunal@vmware.com> 6.1.28-3
... ...
@@ -1,26 +1,97 @@
1
-From 2f81e15c64fe8ad3732a71fbf1e4053842e6e1b7 Mon Sep 17 00:00:00 2001
1
+From a4ee0450bab7b133270e99ab236c8996df457178 Mon Sep 17 00:00:00 2001
2 2
 From: Alexey Makhalov <amakhalov@vmware.com>
3 3
 Date: Fri, 3 Feb 2017 07:10:18 -0800
4
-Subject: [PATCH 2/6] NOWRITEEXEC and PAX features: MPROTECT, EMUTRAMP
4
+Subject: [PATCH] NOWRITEEXEC and PAX features: MPROTECT, EMUTRAMP
5 5
 
6
+NOWRITEEXEC: Is an implementation of userspace W^X memory protection policy.
7
+W^X is a security feature in operating systems and virtual machines. It is
8
+a memory protection policy whereby every page in a process's or kernel's
9
+address space may be either writable or executable, but not both. Without
10
+such protection, a program can write (as data "W") CPU instructions in an
11
+area of memory intended for data and then run (as executable "X"; or
12
+read-execute "RX") those instructions. This can be dangerous if the writer
13
+of the memory is malicious. W^X is the Unix-like terminology for a strict
14
+use of the general concept of executable space protection, controlled via
15
+the mprotect system call. Kernel space was already W^X protected by NX
16
+feature. NOWRITEEXEC implements similar protection for userspace processes.
17
+
18
+NOWRITEEXEC disallows ELF program headers with WE (write and execute)
19
+properties set at the same time. In addition NOWRITEEXEC forbids any mappings
20
+(anonymous or file backed) to be both writable and executable.
21
+
22
+All modern toolchains and compilers generate ELF binaries with R, RW, RE flags
23
+only. There is an exception where GNU_STACK may have RWE. Some programs and
24
+libraries, for one reason or another, attempt to execute special small code
25
+snippets from the stack, which NOWRITEEXEC does not allow to be executable. Most
26
+notable examples are the signal handler return code generated by the kernel
27
+itself and the GCC trampolines. To make those binaries happy, PaX introduced
28
+trampoline emulation (EMUTRAMP), where the kernel traps on stack execution and
29
+emulates a known sequence of instructions. For an unknown pattern it faults with
30
+W^X violation error.
31
+
32
+MPROTECT: Enabling this option will prevent programs from
33
+ - changing the executable status of memory pages that were not originally
34
+   created as executable,
35
+ - making read-only executable pages writable again,
36
+ - creating executable pages from anonymous memory,
37
+ - making read-only-after-relocations (RELRO) data pages writable again.
38
+
39
+Enabling this option will prevent the injection and execution of 'foreign'
40
+code in a program. This will also break programs that rely on the old
41
+behaviour and expect that dynamically allocated memory via the malloc()
42
+family of functions is executable (which it is not). Notable examples are
43
+the XFree86 4.x server, the java runtime and wine.
44
+
45
+PAX_XATTR_PAX_FLAGS: filesystem extended attributes marking.
46
+Enabling this option will allow you to control PaX features on a per
47
+executable basis via the 'setfattr' utility.  The control flags will
48
+be read from the user.pax.flags extended attribute of the file. The main
49
+drawback is that extended attributes are not supported by some filesystems
50
+(e.g., isofs, udf, vfat) so copying files through such filesystems will lose
51
+the extended attributes and these PaX markings. If you enable none of the
52
+marking options then all applications will run with PaX enabled on them by
53
+default.
54
+
55
+Supported markings:
56
+ e - EMUTRAMP disabled. Executable stack disallowed.
57
+ E - EMUTRAMP enabled. Executable stack disallowed but emulated.
58
+ m - MPROTECT disabled. WE mappings are allowed. Security risk!
59
+ M - MPROTECT enabled. W^E policy is in place for all userspace mappings.
60
+Default PaX control settings:
61
+ "eM" - for most applications
62
+ "EM" - for applications with RWE stack.
63
+
64
+Per file markings can be set using setfattr tool. Example of disabling
65
+MPROTECT for java binary:
66
+setfattr -n user.pax.flags -v "em" /usr/lib/jvm/OpenJDK-17/bin/java
67
+
68
+Current process settings can be fetched from /proc/<pid>/status.
69
+
70
+URL: https://en.wikipedia.org/wiki/W^X
71
+URL: https://lwn.net/Articles/422487/
72
+Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
6 73
 Signed-off-by: Keerthana K <keerthanak@vmware.com>
74
+Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
7 75
 ---
8
- arch/x86/mm/fault.c      | 218 +++++++++++++++++++++++++++++++++++++++
9
- fs/binfmt_elf.c          |  70 +++++++++++++
10
- fs/exec.c                |   5 +
11
- include/linux/binfmts.h  |   3 +
12
- include/linux/elf.h      |   2 +
13
- include/linux/mm_types.h |   3 +
14
- include/linux/sched.h    |   2 +
15
- include/uapi/linux/elf.h |   2 +
16
- ipc/shm.c                |   3 +
17
- mm/mmap.c                |  25 +++++
18
- mm/mprotect.c            |  13 +++
19
- security/Kconfig         |  78 ++++++++++++++
20
- 12 files changed, 424 insertions(+)
76
+ arch/x86/mm/fault.c        | 218 +++++++++++++++++++++++++++++++++++++
77
+ fs/binfmt_elf.c            | 149 +++++++++++++++++++++++++
78
+ fs/exec.c                  |   6 +
79
+ fs/proc/array.c            |  18 +++
80
+ include/linux/binfmts.h    |   3 +
81
+ include/linux/elf.h        |   2 +
82
+ include/linux/mm_types.h   |   3 +
83
+ include/linux/sched.h      |   3 +
84
+ include/uapi/linux/elf.h   |   2 +
85
+ include/uapi/linux/xattr.h |   5 +
86
+ ipc/shm.c                  |   4 +
87
+ mm/mmap.c                  |  26 +++++
88
+ mm/mprotect.c              |  13 +++
89
+ mm/shmem.c                 |  37 +++++++
90
+ security/Kconfig           |  91 ++++++++++++++++
91
+ 15 files changed, 580 insertions(+)
21 92
 
22 93
 diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
23
-index fa71a5d12..ce8af675a 100644
94
+index 7b0d4ab89..9a58c2507 100644
24 95
 --- a/arch/x86/mm/fault.c
25 96
 +++ b/arch/x86/mm/fault.c
26 97
 @@ -166,6 +166,11 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
... ...
@@ -35,7 +106,7 @@ index fa71a5d12..ce8af675a 100644
35 35
  DEFINE_SPINLOCK(pgd_lock);
36 36
  LIST_HEAD(pgd_list);
37 37
  
38
-@@ -724,6 +729,13 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
38
+@@ -745,6 +750,13 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
39 39
  		 */
40 40
  		if (in_interrupt())
41 41
  			return;
... ...
@@ -49,7 +120,7 @@ index fa71a5d12..ce8af675a 100644
49 49
  
50 50
  		/*
51 51
  		 * Per the above we're !in_interrupt(), aka. task context.
52
-@@ -1546,3 +1558,209 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
52
+@@ -1577,3 +1589,209 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
53 53
  
54 54
  	irqentry_exit(regs, state);
55 55
  }
... ...
@@ -260,7 +331,7 @@ index fa71a5d12..ce8af675a 100644
260 260
 +}
261 261
 +#endif
262 262
 diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
263
-index 63c7ebb0d..de5cd38cb 100644
263
+index 444302afc..d9a8f88db 100644
264 264
 --- a/fs/binfmt_elf.c
265 265
 +++ b/fs/binfmt_elf.c
266 266
 @@ -45,6 +45,7 @@
... ...
@@ -292,26 +363,109 @@ index 63c7ebb0d..de5cd38cb 100644
292 292
  	.min_coredump	= ELF_EXEC_PAGESIZE,
293 293
  #endif
294 294
  };
295
-@@ -1006,6 +1014,18 @@ static int load_elf_binary(struct linux_binprm *bprm)
295
+@@ -821,6 +829,77 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
296
+ 	return ret == -ENOENT ? 0 : ret;
297
+ }
298
+ 
299
++#if defined(CONFIG_PAX_XATTR_PAX_FLAGS)
300
++static ssize_t pax_getxattr(struct file * const file, void *value, size_t size)
301
++{
302
++	struct dentry *dentry = file->f_path.dentry;
303
++	struct inode *inode = dentry->d_inode;
304
++	ssize_t error;
305
++
306
++	error = inode_permission(file_mnt_user_ns(file), inode, MAY_EXEC);
307
++	if (error)
308
++		return error;
309
++
310
++	return __vfs_getxattr(dentry, inode, XATTR_NAME_USER_PAX_FLAGS, value, size);
311
++}
312
++
313
++static void pax_parse_xattr_pax(struct file * const file, int *m, int *e)
314
++{
315
++
316
++	ssize_t xattr_size, i;
317
++	unsigned char xattr_value[sizeof("em") - 1];
318
++
319
++	xattr_size = pax_getxattr(file, xattr_value, sizeof xattr_value);
320
++	if (xattr_size < 0 || xattr_size > sizeof xattr_value)
321
++		return;
322
++
323
++	for (i = 0; i < xattr_size; i++)
324
++		switch (xattr_value[i]) {
325
++		case 'm':
326
++			*m = 0;
327
++			break;
328
++		case 'M':
329
++			*m = MF_PAX_MPROTECT;
330
++			break;
331
++		case 'e':
332
++			*e = 0;
333
++			break;
334
++		case 'E':
335
++			*e = MF_PAX_EMUTRAMP;
336
++			break;
337
++		}
338
++}
339
++
340
++static long pax_parse_pax_flags(struct file * const file)
341
++{
342
++	int fm = -1, fe = -1;
343
++	unsigned long pax_flags = current->mm->pax_flags;
344
++
345
++	pax_parse_xattr_pax(file, &fm, &fe);
346
++
347
++	/* MPROTECT: overwrite from xattr */
348
++	if (fm != -1) {
349
++		pax_flags &= ~MF_PAX_MPROTECT;
350
++		pax_flags |= fm;
351
++	}
352
++
353
++	/* EMUTRAMP: sanity check */
354
++	if (fe == MF_PAX_EMUTRAMP) {
355
++		pax_flags |= MF_PAX_EMUTRAMP;
356
++	} else if (!fe){
357
++		if (pax_flags & MF_PAX_EMUTRAMP) {
358
++			pr_err("PAX: %s[%d] needs an executable stack. Can not disable EMUTRAMP. Please fix 'e' bit in "
359
++					XATTR_NAME_USER_PAX_FLAGS " xattr.",
360
++					current->comm, task_pid_nr(current));
361
++			return -EINVAL;
362
++		}
363
++	}
364
++
365
++	current->mm->pax_flags = pax_flags;
366
++	return 0;
367
++}
368
++#endif
369
++
370
+ static int load_elf_binary(struct linux_binprm *bprm)
371
+ {
372
+ 	struct file *interpreter = NULL; /* to shut gcc up */
373
+@@ -1006,6 +1085,23 @@ static int load_elf_binary(struct linux_binprm *bprm)
296 374
  	/* Do this immediately, since STACK_TOP as used in setup_arg_pages
297 375
  	   may depend on the personality.  */
298 376
  	SET_PERSONALITY2(*elf_ex, &arch_state);
299
-+#if defined(CONFIG_PAX)
300
-+	current->mm->pax_flags = 0UL;
301 377
 +#if defined(CONFIG_PAX_NOWRITEEXEC)
378
++	/* Enable MPROTECT by default */
379
++	current->mm->pax_flags = MF_PAX_MPROTECT;
380
++#if defined(CONFIG_PAX_EMUTRAMP)
381
++	/* Enable EMUTRAMP if ELF requires executable stack */
302 382
 +	if (executable_stack == EXSTACK_ENABLE_X)
303 383
 +	{
304
-+#if defined(CONFIG_PAX_EMUTRAMP)
305 384
 +		executable_stack = EXSTACK_DISABLE_X;
306 385
 +		current->mm->pax_flags |= MF_PAX_EMUTRAMP;
307
-+#endif
308 386
 +	}
309 387
 +#endif
388
++#if defined(CONFIG_PAX_XATTR_PAX_FLAGS)
389
++	retval = pax_parse_pax_flags(bprm->file);
390
++	if (retval < 0)
391
++		goto out_free_dentry;
392
++#endif
310 393
 +#endif
311 394
  	if (elf_read_implies_exec(*elf_ex, executable_stack))
312 395
  		current->personality |= READ_IMPLIES_EXEC;
313 396
  
314
-@@ -2329,6 +2349,56 @@ static int elf_core_dump(struct coredump_params *cprm)
397
+@@ -2330,6 +2426,59 @@ static int elf_core_dump(struct coredump_params *cprm)
315 398
  
316 399
  #endif		/* CONFIG_ELF_CORE */
317 400
  
... ...
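
The way the xattr value overrides the ELF-derived defaults in the `fs/binfmt_elf.c` hunks above can be modelled in userspace. The sketch below is not kernel code: `pax_merge_flags` is a hypothetical name, and the function folds the logic of `pax_parse_xattr_pax()` and `pax_parse_pax_flags()` into one routine, under the assumption that an absent or oversized attribute leaves the defaults untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Bit values copied from the include/linux/sched.h hunk of the patch. */
#define MF_PAX_EMUTRAMP 0x02000000UL
#define MF_PAX_MPROTECT 0x04000000UL

/*
 * Userspace model of merging the xattr value into the defaults chosen by
 * load_elf_binary().
 *   defaults:  flags derived from the ELF headers
 *   value/len: raw xattr bytes; len == 0 when the attribute is absent
 *   out:       receives the effective flags on success
 * Returns 0 on success, or -EINVAL when 'e' is requested for a binary
 * whose defaults already require EMUTRAMP.
 */
int pax_merge_flags(unsigned long defaults, const char *value, size_t len,
		    unsigned long *out)
{
	long m = -1, e = -1;	/* -1 means "not mentioned in the xattr" */
	size_t i;

	assert(out != NULL);

	if (len > 2)		/* oversized values are ignored */
		len = 0;

	for (i = 0; i < len; i++) {
		switch (value[i]) {
		case 'm': m = 0; break;
		case 'M': m = (long)MF_PAX_MPROTECT; break;
		case 'e': e = 0; break;
		case 'E': e = (long)MF_PAX_EMUTRAMP; break;
		default: break;	/* unknown characters are skipped */
		}
	}

	/* MPROTECT: the xattr value simply replaces the default. */
	if (m != -1)
		defaults = (defaults & ~MF_PAX_MPROTECT) | (unsigned long)m;

	/* EMUTRAMP: may be switched on, but not off when the ELF needs it. */
	if (e == (long)MF_PAX_EMUTRAMP)
		defaults |= MF_PAX_EMUTRAMP;
	else if (e == 0 && (defaults & MF_PAX_EMUTRAMP))
		return -EINVAL;

	*out = defaults;
	return 0;
}
```

In this model the marking "em" applied to a binary without an executable stack clears both bits, while "e" applied to a binary that requests an executable stack is rejected, which mirrors the `pr_err()` path in the patch.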
@@ -330,8 +484,9 @@ index 63c7ebb0d..de5cd38cb 100644
330 330
 +	unsigned long i;
331 331
 +	unsigned long oldflags;
332 332
 +	bool is_relro;
333
++	loff_t pos;
333 334
 +
334
-+	if (!vma->vm_file)
335
++	if (!(vma->vm_mm->pax_flags & MF_PAX_MPROTECT) || !vma->vm_file)
335 336
 +		return;
336 337
 +
337 338
 +	oldflags = vma->vm_flags & (VM_MAYEXEC | VM_MAYWRITE | VM_MAYREAD | VM_EXEC | VM_WRITE | VM_READ);
... ...
@@ -343,7 +498,8 @@ index 63c7ebb0d..de5cd38cb 100644
343 343
 +	if (!is_relro)
344 344
 +		return;
345 345
 +
346
-+	if (sizeof(elf_h) != kernel_read(vma->vm_file, 0UL, (char *)&elf_h, sizeof(elf_h)) ||
346
++	pos = 0UL;
347
++	if (sizeof(elf_h) != kernel_read(vma->vm_file, (char *)&elf_h, sizeof(elf_h), &pos) ||
347 348
 +	    memcmp(elf_h.e_ident, ELFMAG, SELFMAG) ||
348 349
 +	    (elf_h.e_type != ET_DYN && elf_h.e_type != ET_EXEC) ||
349 350
 +	    !elf_check_arch(&elf_h) ||
... ...
@@ -352,7 +508,8 @@ index 63c7ebb0d..de5cd38cb 100644
352 352
 +		return;
353 353
 +
354 354
 +	for (i = 0UL; i < elf_h.e_phnum; i++) {
355
-+		if (sizeof(elf_p) != kernel_read(vma->vm_file, elf_h.e_phoff + i*sizeof(elf_p), (char *)&elf_p, sizeof(elf_p)))
355
++		pos = elf_h.e_phoff + i*sizeof(elf_p);
356
++		if (sizeof(elf_p) != kernel_read(vma->vm_file, (char *)&elf_p, sizeof(elf_p), &pos))
356 357
 +			return;
357 358
 +		if (elf_p.p_type == PT_GNU_RELRO) {
358 359
 +			if (!is_relro)
... ...
@@ -369,27 +526,64 @@ index 63c7ebb0d..de5cd38cb 100644
369 369
  {
370 370
  	register_binfmt(&elf_format);
371 371
 diff --git a/fs/exec.c b/fs/exec.c
372
-index d046dbb9c..61aac7f67 100644
372
+index a0b1f0337..91925ca79 100644
373 373
 --- a/fs/exec.c
374 374
 +++ b/fs/exec.c
375
-@@ -805,7 +805,12 @@ int setup_arg_pages(struct linux_binprm *bprm,
375
+@@ -807,7 +807,13 @@ int setup_arg_pages(struct linux_binprm *bprm,
376 376
  	if (unlikely(executable_stack == EXSTACK_ENABLE_X))
377 377
  		vm_flags |= VM_EXEC;
378 378
  	else if (executable_stack == EXSTACK_DISABLE_X)
379 379
 +	{
380 380
  		vm_flags &= ~VM_EXEC;
381 381
 +#ifdef CONFIG_PAX_MPROTECT
382
-+		vm_flags &= ~VM_MAYEXEC;
382
++		if (mm->pax_flags & MF_PAX_MPROTECT)
383
++			vm_flags &= ~VM_MAYEXEC;
383 384
 +#endif
384 385
 +	}
385 386
  	vm_flags |= mm->def_flags;
386 387
  	vm_flags |= VM_STACK_INCOMPLETE_SETUP;
387 388
  
389
+diff --git a/fs/proc/array.c b/fs/proc/array.c
390
+index 49283b810..1ee4742e4 100644
391
+--- a/fs/proc/array.c
392
+@@ -428,6 +428,19 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
393
+ 	seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
394
+ }
395
+ 
396
++#if defined(CONFIG_PAX_NOWRITEEXEC)
397
++static inline void task_pax(struct seq_file *m, struct task_struct *p)
398
++{
399
++	if (p->mm)
400
++		seq_printf(m, "PaX:\t%c%c\n",
401
++			p->mm->pax_flags & MF_PAX_EMUTRAMP ? 'E' : 'e',
402
++			p->mm->pax_flags & MF_PAX_MPROTECT ? 'M' : 'm');
403
++	else
404
++		seq_printf(m, "PaX:\t--\n");
405
++}
406
++#endif
407
++
408
++
409
+ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
410
+ 			struct pid *pid, struct task_struct *task)
411
+ {
412
+@@ -451,6 +464,11 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
413
+ 	task_cpus_allowed(m, task);
414
+ 	cpuset_task_status_allowed(m, task);
415
+ 	task_context_switch_counts(m, task);
416
++
417
++#if defined(CONFIG_PAX_NOWRITEEXEC)
418
++	task_pax(m, task);
419
++#endif
420
++
421
+ 	return 0;
422
+ }
423
+ 
388 424
 diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
389
-index 3dc20c4f3..ad06adba2 100644
425
+index 8d51f69f9..1e2c70230 100644
390 426
 --- a/include/linux/binfmts.h
391 427
 +++ b/include/linux/binfmts.h
392
-@@ -89,6 +89,9 @@ struct linux_binfmt {
428
+@@ -86,6 +86,9 @@ struct linux_binfmt {
393 429
  	int (*load_shlib)(struct file *);
394 430
  #ifdef CONFIG_COREDUMP
395 431
  	int (*core_dump)(struct coredump_params *cprm);
... ...
@@ -420,28 +614,29 @@ index c9a46c4e1..8646a6b22 100644
420 420
  #endif
421 421
  
422 422
 diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
423
-index cf97f3884..f5af71d51 100644
423
+index 247aedb18..8399506d0 100644
424 424
 --- a/include/linux/mm_types.h
425 425
 +++ b/include/linux/mm_types.h
426
-@@ -661,6 +661,9 @@ struct mm_struct {
426
+@@ -684,6 +684,9 @@ struct mm_struct {
427 427
  		atomic_long_t hugetlb_usage;
428 428
  #endif
429 429
  		struct work_struct async_put_work;
430
-+#if defined(CONFIG_PAX)
431
-+	unsigned long pax_flags;
430
++#if defined(CONFIG_PAX_NOWRITEEXEC)
431
++		unsigned long pax_flags;
432 432
 +#endif
433 433
  
434 434
  #ifdef CONFIG_IOMMU_SVA
435 435
  		u32 pasid;
436 436
 diff --git a/include/linux/sched.h b/include/linux/sched.h
437
-index e7b2f8a5c..0714baf8c 100644
437
+index ffb6eb55c..7e80d3e6a 100644
438 438
 --- a/include/linux/sched.h
439 439
 +++ b/include/linux/sched.h
440
-@@ -1291,6 +1291,8 @@ struct task_struct {
440
+@@ -1306,6 +1306,9 @@ struct task_struct {
441 441
  	unsigned long			numa_pages_migrated;
442 442
  #endif /* CONFIG_NUMA_BALANCING */
443 443
  
444 444
 +#define MF_PAX_EMUTRAMP		0x02000000	/* Emulate trampolines */
445
++#define MF_PAX_MPROTECT		0x04000000	/* Restrict mprotect() */
445 446
 +
446 447
  #ifdef CONFIG_RSEQ
447 448
  	struct rseq __user *rseq;
... ...
@@ -459,72 +654,89 @@ index c7b056af9..c10d0910e 100644
459 459
  #define DT_ENCODING	32
460 460
  #define OLD_DT_LOOS	0x60000000
461 461
  #define DT_LOOS		0x6000000d
462
+diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
463
+index 9463db2df..d4264c8df 100644
464
+--- a/include/uapi/linux/xattr.h
465
+@@ -81,5 +81,10 @@
466
+ #define XATTR_POSIX_ACL_DEFAULT  "posix_acl_default"
467
+ #define XATTR_NAME_POSIX_ACL_DEFAULT XATTR_SYSTEM_PREFIX XATTR_POSIX_ACL_DEFAULT
468
+ 
469
++/* User namespace */
470
++#define XATTR_PAX_PREFIX "pax."
471
++#define XATTR_PAX_FLAGS_SUFFIX "flags"
472
++#define XATTR_NAME_USER_PAX_FLAGS XATTR_USER_PREFIX XATTR_PAX_PREFIX XATTR_PAX_FLAGS_SUFFIX
473
++#define XATTR_NAME_PAX_FLAGS XATTR_PAX_PREFIX XATTR_PAX_FLAGS_SUFFIX
474
+ 
475
+ #endif /* _UAPI_LINUX_XATTR_H */
462 476
 diff --git a/ipc/shm.c b/ipc/shm.c
463
-index b3048ebd5..41ec32df8 100644
477
+index bd2fcc4d4..ac5e177c2 100644
464 478
 --- a/ipc/shm.c
465 479
 +++ b/ipc/shm.c
466
-@@ -1556,6 +1556,9 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
480
+@@ -1572,6 +1572,10 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
467 481
  		f_flags = O_RDWR;
468 482
  	}
469 483
  	if (shmflg & SHM_EXEC) {
470
-+#ifdef CONFIG_PAX_NOWRITEEXEC
471
-+		goto out;
484
++#ifdef CONFIG_PAX_MPROTECT
485
++		if (current->mm->pax_flags & MF_PAX_MPROTECT)
486
++			goto out;
472 487
 +#endif
473 488
  		prot |= PROT_EXEC;
474 489
  		acc_mode |= S_IXUGO;
475 490
  	}
476 491
 diff --git a/mm/mmap.c b/mm/mmap.c
477
-index 9d780f415..07439b77e 100644
492
+index 14ca25918..59e5c74f1 100644
478 493
 --- a/mm/mmap.c
479 494
 +++ b/mm/mmap.c
480
-@@ -1435,6 +1435,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
495
+@@ -1306,6 +1306,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
481 496
  	vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
482 497
  			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
483 498
  
484
-+#ifdef CONFIG_PAX_NOWRITEEXEC
485
-+	if ((vm_flags & (VM_WRITE | VM_EXEC)) == (VM_WRITE | VM_EXEC))
486
-+		return -EPERM;
487 499
 +#ifdef CONFIG_PAX_MPROTECT
488
-+	if (!(vm_flags & VM_EXEC))
489
-+		vm_flags &= ~VM_MAYEXEC;
490
-+	else
491
-+		vm_flags &= ~VM_MAYWRITE;
492
-+#endif
500
++	if (mm->pax_flags & MF_PAX_MPROTECT) {
501
++		if ((vm_flags & (VM_WRITE | VM_EXEC)) == (VM_WRITE | VM_EXEC))
502
++			return -EPERM;
503
++		if (!(vm_flags & VM_EXEC))
504
++			vm_flags &= ~VM_MAYEXEC;
505
++		else
506
++			vm_flags &= ~VM_MAYWRITE;
507
++	}
493 508
 +#endif
494 509
 +
495 510
  	if (flags & MAP_LOCKED)
496 511
  		if (!can_do_mlock())
497 512
  			return -EPERM;
498
-@@ -2947,6 +2947,9 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
513
+@@ -2974,6 +2985,10 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
499 514
  	 * Note: This happens *after* clearing old mappings in some code paths.
500 515
  	 */
501 516
  	flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
502 517
 +#ifdef CONFIG_PAX_MPROTECT
503
-+	flags &= ~VM_MAYEXEC;
518
++	if (mm->pax_flags & MF_PAX_MPROTECT)
519
++		flags &= ~VM_MAYEXEC;
504 520
 +#endif
505 521
  	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
506 522
  		return -ENOMEM;
507 523
  
508
-@@ -3385,6 +3399,17 @@ static struct vm_area_struct *__install_special_mapping(
524
+@@ -3433,6 +3448,17 @@ static struct vm_area_struct *__install_special_mapping(
509 525
  	vma->vm_start = addr;
510 526
  	vma->vm_end = addr + len;
511 527
  
512
-+#ifdef CONFIG_PAX_NOWRITEEXEC
513
-+	if ((vm_flags & (VM_WRITE | VM_EXEC)) == (VM_WRITE | VM_EXEC))
514
-+		return ERR_PTR(-EPERM);
515 528
 +#ifdef CONFIG_PAX_MPROTECT
516
-+	if (!(vm_flags & VM_EXEC))
517
-+		vm_flags &= ~VM_MAYEXEC;
518
-+	else
519
-+		vm_flags &= ~VM_MAYWRITE;
520
-+#endif
529
++	if (mm->pax_flags & MF_PAX_MPROTECT) {
530
++		if ((vm_flags & (VM_WRITE | VM_EXEC)) == (VM_WRITE | VM_EXEC))
531
++			return ERR_PTR(-EPERM);
532
++		if (!(vm_flags & VM_EXEC))
533
++			vm_flags &= ~VM_MAYEXEC;
534
++		else
535
++			vm_flags &= ~VM_MAYWRITE;
536
++	}
521 537
 +#endif
522 538
 +
523 539
  	vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
524 540
  	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
525 541
  	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
526 542
 diff --git a/mm/mprotect.c b/mm/mprotect.c
527
-index bc6bddd15..2bd5837d0 100644
543
+index 668bfaa6e..4fe86436b 100644
528 544
 --- a/mm/mprotect.c
529 545
 +++ b/mm/mprotect.c
530 546
 @@ -26,6 +26,10 @@
... ...
@@ -538,7 +750,7 @@ index bc6bddd15..2bd5837d0 100644
538 538
  #include <linux/uaccess.h>
539 539
  #include <linux/mm_inline.h>
540 540
  #include <linux/pgtable.h>
541
-@@ -622,6 +626,10 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
541
+@@ -631,6 +635,10 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
542 542
  	 * held in write mode.
543 543
  	 */
544 544
  	vma->vm_flags = newflags;
... ...
@@ -549,7 +761,7 @@ index bc6bddd15..2bd5837d0 100644
549 549
  	/*
550 550
  	 * We want to check manually if we can change individual PTEs writable
551 551
  	 * if we can't do that automatically for all PTEs in a mapping. For
552
-@@ -747,6 +747,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
552
+@@ -739,6 +747,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
553 553
  	else
554 554
  		prev = mas_prev(&mas, 0);
555 555
  
... ...
@@ -561,11 +773,73 @@ index bc6bddd15..2bd5837d0 100644
561 561
  	tlb_gather_mmu(&tlb, current->mm);
562 562
  	for (nstart = start ; ; ) {
563 563
  		unsigned long mask_off_old_flags;
564
+diff --git a/mm/shmem.c b/mm/shmem.c
565
+index a8d9fd039..e682ad202 100644
566
+--- a/mm/shmem.c
567
+@@ -3315,6 +3315,31 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
568
+ 	return err;
569
+ }
570
+ 
571
++#ifdef CONFIG_PAX_XATTR_PAX_FLAGS
572
++static int shmem_user_xattr_handler_set(const struct xattr_handler *handler,
573
++				   struct user_namespace *mnt_userns,
574
++				   struct dentry *unused, struct inode *inode,
575
++				   const char *name, const void *value,
576
++				   size_t size, int flags)
577
++{
578
++	struct shmem_inode_info *info = SHMEM_I(inode);
579
++	int err;
580
++
581
++	if (strcmp(name, XATTR_NAME_PAX_FLAGS))
582
++		return -EOPNOTSUPP;
583
++	if (size > 2)
584
++		return -EINVAL;
585
++
586
++	name = xattr_full_name(handler, name);
587
++	err = simple_xattr_set(&info->xattrs, name, value, size, flags, NULL);
588
++	if (!err) {
589
++		inode->i_ctime = current_time(inode);
590
++		inode_inc_iversion(inode);
591
++	}
592
++	return err;
593
++}
594
++#endif
595
++
596
+ static const struct xattr_handler shmem_security_xattr_handler = {
597
+ 	.prefix = XATTR_SECURITY_PREFIX,
598
+ 	.get = shmem_xattr_handler_get,
599
+@@ -3327,6 +3352,14 @@ static const struct xattr_handler shmem_trusted_xattr_handler = {
600
+ 	.set = shmem_xattr_handler_set,
601
+ };
602
+ 
603
++#ifdef CONFIG_PAX_XATTR_PAX_FLAGS
604
++static const struct xattr_handler shmem_user_xattr_handler = {
605
++	.prefix = XATTR_USER_PREFIX,
606
++	.get = shmem_xattr_handler_get,
607
++	.set = shmem_user_xattr_handler_set,
608
++};
609
++#endif
610
++
611
+ static const struct xattr_handler *shmem_xattr_handlers[] = {
612
+ #ifdef CONFIG_TMPFS_POSIX_ACL
613
+ 	&posix_acl_access_xattr_handler,
614
+@@ -3334,6 +3367,10 @@ static const struct xattr_handler *shmem_xattr_handlers[] = {
615
+ #endif
616
+ 	&shmem_security_xattr_handler,
617
+ 	&shmem_trusted_xattr_handler,
618
++#ifdef CONFIG_PAX_XATTR_PAX_FLAGS
619
++	/* Allow pax xattr for tmpfs */
620
++	&shmem_user_xattr_handler,
621
++#endif
622
+ 	NULL
623
+ };
624
+ 
564 625
 diff --git a/security/Kconfig b/security/Kconfig
565
-index e6db09a77..1bd57b5d5 100644
626
+index e6db09a77..0a1c517da 100644
566 627
 --- a/security/Kconfig
567 628
 +++ b/security/Kconfig
568
-@@ -5,6 +5,84 @@
629
+@@ -5,6 +5,97 @@
569 630
  
570 631
  menu "Security options"
571 632
  
... ...
@@ -594,42 +868,31 @@ index e6db09a77..1bd57b5d5 100644
594 594
 +	  are the XFree86 4.x server, the java runtime and wine.
595 595
 +
596 596
 +if PAX_NOWRITEEXEC
597
-+choice
598
-+	prompt "Executable stack"
599
-+
597
++config PAX_EMUTRAMP
598
++	bool "Executable stack emulation"
600 599
 +	help
601
-+	  Select the security model for the binaries with executable stack.
602
-+
603
-+	config PAX_EMUTRAMP
604
-+		bool "emulate"
605
-+		help
606
-+		  There are some programs and libraries that for one reason or
607
-+		  another attempt to execute special small code snippets from
608
-+		  non-executable memory pages.  Most notable examples are the
609
-+		  signal handler return code generated by the kernel itself and
610
-+		  the GCC trampolines.
611
-+
612
-+		  If you enabled CONFIG_NOWRITEEXEC then such programs will no
613
-+		  longer work under your kernel.
614
-+
615
-+		  As a remedy you can say Y here enable trampoline emulation for
616
-+		  the affected programs yet still have the protection provided by
617
-+		  the non-executable pages.
618
-+
619
-+		  NOTE: enabling this feature *may* open up a loophole in the
620
-+		  protection provided by non-executable pages that an attacker
621
-+		  could abuse.  Therefore the best solution is to not have any
622
-+		  files on your system that would require this option.  This can
623
-+		  be achieved by not using libc5 (which relies on the kernel
624
-+		  signal handler return code) and not using or rewriting programs
625
-+		  that make use of the nested function implementation of GCC.
626
-+		  Skilled users can just fix GCC itself so that it implements
627
-+		  nested function calls in a way that does not interfere with PaX.
628
-+
629
-+	config EXECSTACK_DISABLED
630
-+		bool "disabled"
631
-+
632
-+endchoice
600
++	  There are some programs and libraries that for one reason or
601
++	  another attempt to execute special small code snippets from
602
++	  non executable stack.  Most notable examples are the
603
++	  signal handler return code generated by the kernel itself and
604
++	  the GCC trampolines.
605
++
606
++	  If you enabled CONFIG_NOWRITEEXEC then such programs will no
607
++	  longer work under your kernel.
608
++
609
++	  As a remedy you can say Y here enable trampoline emulation for
610
++	  the affected programs yet still have the protection provided by
611
++	  the non-executable pages.
612
++
613
++	  NOTE: enabling this feature *may* open up a loophole in the
614
++	  protection provided by non-executable pages that an attacker
615
++	  could abuse.  Therefore the best solution is to not have any
616
++	  files on your system that would require this option.  This can
617
++	  be achieved by not using libc5 (which relies on the kernel
618
++	  signal handler return code) and not using or rewriting programs
619
++	  that make use of the nested function implementation of GCC.
620
++	  Skilled users can just fix GCC itself so that it implements
621
++	  nested function calls in a way that does not interfere with PaX.
633 622
 +
634 623
 +config PAX_MPROTECT
635 624
 +	bool "Restrict mprotect()"
... ...
@@ -644,6 +907,30 @@ index e6db09a77..1bd57b5d5 100644
644 644
 +	  You should say Y here to complete the protection provided by
645 645
 +	  the enforcement of non-executable pages.
646 646
 +
647
++config PAX_XATTR_PAX_FLAGS
648
++	bool 'Use filesystem extended attributes marking'
649
++	select CIFS_XATTR if CIFS
650
++	select EXT2_FS_XATTR if EXT2_FS
651
++	select EXT3_FS_XATTR if EXT3_FS
652
++	select F2FS_FS_XATTR if F2FS_FS
653
++	select JFFS2_FS_XATTR if JFFS2_FS
654
++	select REISERFS_FS_XATTR if REISERFS_FS
655
++	select SQUASHFS_XATTR if SQUASHFS
656
++	select TMPFS_XATTR if TMPFS
657
++	help
658
++	  Enabling this option will allow you to control PaX features on
659
++	  a per executable basis via the 'setfattr' utility.  The control
660
++	  flags will be read from the user.pax.flags extended attribute of
661
++	  the file.  This marking has the benefit of supporting binary-only
662
++	  applications that self-check themselves (e.g., skype) and would
663
++	  not tolerate chpax/paxctl changes.  The main drawback is that
664
++	  extended attributes are not supported by some filesystems (e.g.,
665
++	  isofs, udf, vfat) so copying files through such filesystems will
666
++	  lose the extended attributes and these PaX markings.
667
++
668
++	  If you enable none of the marking options then all applications
669
++	  will run with PaX enabled on them by default.
670
++
647 671
 +endif
648 672
 +endif
649 673
 +
... ...
@@ -651,5 +938,5 @@ index e6db09a77..1bd57b5d5 100644
651 651
  
652 652
  config SECURITY_DMESG_RESTRICT
653 653
 -- 
654
-2.37.3
654
+2.39.0
655 655
 
... ...
@@ -5108,8 +5108,8 @@ diff --git a/security/Kconfig b/security/Kconfig
5108 5108
 index 1bd57b5d5..dc8fbe4fa 100644
5109 5109
 --- a/security/Kconfig
5110 5110
 +++ b/security/Kconfig
5111
-@@ -81,6 +81,26 @@ config PAX_MPROTECT
5112
- 	  the enforcement of non-executable pages.
5111
+@@ -94,6 +94,26 @@ config PAX_XATTR_PAX_FLAGS
5112
+ 	  will run with PaX enabled on them by default.
5113 5113
  
5114 5114
  endif
5115 5115
 +