The Linux Kernel's Skeleton: struct, Function Pointers, and container_of

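The Linux kernel now exceeds 40 million lines of code and is one of the largest open-source collaborative projects in history. Across those 40 million lines the same skeleton keeps recurring: huge structs, nested function pointers, and the mysterious container_of macro. Understand them and you have found the spine of this behemoth.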

1. struct Is the Kernel's "Object"

C has no classes, so the kernel models everything with structs.

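Take the networking layer. struct sk_buff is a very large structure that describes a network packet; its definition holds dozens of fields, many of them behind #ifdef. It carries a packet through its whole lifetime, from the NIC driver through the TCP stack to user space.

// include/linux/skbuff.h (simplified)
struct sk_buff {
    struct sk_buff      *next;
    struct sk_buff      *prev;

    ktime_t             tstamp;
    struct net_device   *dev;

    unsigned int        len, data_len;
    __u16               protocol;

    unsigned char       *head, *data, *tail, *end;
    // ... many more fields
};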

Now consider a character device driver:

// include/linux/cdev.h
struct cdev {
    struct kobject          kobj;
    struct module           *owner;
    const struct file_operations *ops;  // ← pointer to the function-pointer table
    struct list_head        list;
    dev_t                   dev;
    unsigned int            count;
};

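cdev embeds a kobject (the device model's base object) and a list_head (an intrusive list node), and it points to a file_operations (a table of operation functions). These three patterns run through almost the entire kernel and deserve a closer look.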

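2. Function Pointers: the Kernel's "Virtual Function Table"

The kernel makes heavy use of structs of function pointers to implement polymorphism. The most typical example is file_operations:

// include/linux/fs.h (excerpt)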
struct file_operations {
    struct module *owner;
    loff_t  (*llseek)   (struct file *, loff_t, int);
    ssize_t (*read)     (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)    (struct file *, const char __user *, size_t, loff_t *);
    long    (*unlocked_ioctl)(struct file *, unsigned int, unsigned long);
    int     (*open)     (struct inode *, struct file *);
    int     (*release)  (struct inode *, struct file *);
    // ...
};

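A driver author fills in only the fields they care about; members left unset are NULL, which the VFS checks before calling: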

static const struct file_operations mydev_fops = {
    .owner   = THIS_MODULE,
    .open    = mydev_open,
    .read    = mydev_read,
    .write   = mydev_write,
    .release = mydev_release,
};

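When user space calls read(fd, buf, len), the kernel path is roughly:

sys_read()
  → vfs_read()
    → file->f_op->read()   ← call through the function pointer, dispatched to the driver
      → mydev_read()

This is the kernel's "dynamic dispatch". Different kinds of files (character devices, block devices, sockets, pipes, ...) share one VFS interface and rely on function pointers for their differing behavior.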
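
The dispatch described above can be imitated in user space. The following is a minimal sketch, not kernel code: struct fake_file, struct fake_file_ops, fake_vfs_read() and the two "drivers" are hypothetical names invented for illustration, and the -1 return value stands in for a real error code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fake_file;

/* Hypothetical stand-in for struct file_operations. */
struct fake_file_ops {
    long (*read)(struct fake_file *f, char *buf, size_t len);
};

/* Hypothetical stand-in for struct file. */
struct fake_file {
    const struct fake_file_ops *f_op;
};

/* "Driver" A: fills the buffer with zero bytes. */
static long zero_read(struct fake_file *f, char *buf, size_t len)
{
    (void)f;
    memset(buf, 0, len);
    return (long)len;
}

/* "Driver" B: always reports end of file. */
static long null_read(struct fake_file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return 0;
}

static const struct fake_file_ops zero_fops = { .read = zero_read };
static const struct fake_file_ops null_fops = { .read = null_read };
/* A table that leaves .read unset; the member is a null pointer. */
static const struct fake_file_ops empty_fops = { 0 };

/* Generic layer: knows nothing about the driver behind the pointer. */
static long fake_vfs_read(struct fake_file *f, char *buf, size_t len)
{
    assert(f != NULL);
    if (!f->f_op || !f->f_op->read)
        return -1;                      /* operation not supported */
    return f->f_op->read(f, buf, len);  /* indirect call, picked per file */
}
```

The generic function never names zero_read or null_read; which one runs depends only on the table the file object points to, which is the same arrangement the VFS has with file->f_op.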

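Similar structures include:

Structure	Purpose
`struct net_device_ops`	Operations of a network device driver
`struct inode_operations`	Filesystem inode operations
`struct bus_type`	Bus type in the driver model
`struct platform_driver`	Driver on the platform bus
`struct irq_chip`	Interrupt controller abstraction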

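3. Intrusive Lists and container_of

The kernel's linked list is "intrusive": the list does not hold the data; instead, the data structure embeds the list node.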

// include/linux/list.h
struct list_head {
    struct list_head *next, *prev;
};

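A structure embeds one list_head per list it belongs to. The kernel's own task_struct, for example:

struct task_struct {
    // ...
    struct list_head    tasks;      // node in the list of all processes
    struct list_head    children;   // head of this task's list of children
    struct list_head    sibling;    // node in the parent's children list
    // ...
};

All processes are linked together through the tasks list. While traversing it you get a list_head *, but what you need is the whole task_struct. That is where container_of comes in.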

The magic of container_of

// include/linux/container_of.h since v5.16, include/linux/kernel.h before that (simplified, type check omitted)
#define container_of(ptr, type, member) ({          \
    void *__mptr = (void *)(ptr);                   \
    ((type *)(__mptr - offsetof(type, member))); })

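It takes three arguments:

ptr: a pointer to the embedded member
type: the type of the enclosing struct
member: the name of that member within the struct

How it works: offsetof yields the member's byte offset within the struct, and subtracting that offset from the pointer gives the address where the struct starts. The subtraction is done on a void *, which relies on the GNU C extension that treats void * arithmetic as byte arithmetic.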

Memory layout:
┌─────────────────────────┐ ← start of task_struct (what we want)
│  ...other fields...     │
│  pid                    │
│  ...                    │
├─────────────────────────┤ ← address of tasks (ptr points here)
│  tasks.next             │
│  tasks.prev             │
├─────────────────────────┤
│  ...later fields...     │
└─────────────────────────┘

container_of(ptr, struct task_struct, tasks)
= ptr - offsetof(struct task_struct, tasks)
= start address of the task_struct ✓
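
The same arithmetic can be tried in user space. This is a sketch under stated assumptions: struct list_node and struct fake_task are hypothetical stand-ins for list_head and task_struct, and the macro uses char * arithmetic so that it stays within standard C rather than depending on the GNU void * extension the kernel uses.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Portable variant of the macro: byte arithmetic through char *. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical, tiny stand-in for task_struct. */
struct fake_task {
    int pid;
    char comm[16];
    struct list_node tasks;   /* embedded member, not at offset 0 */
};

/* Recover the enclosing fake_task from a pointer to its tasks member. */
static struct fake_task *task_of(struct list_node *node)
{
    assert(node != NULL);
    return container_of(node, struct fake_task, tasks);
}
```

Given a struct fake_task t, calling task_of(&t.tasks) returns &t: the function only ever sees the address of the embedded node, yet it gets back to the object that contains it.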

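In practice the kernel wraps this in list_entry (essentially an alias for container_of) and in convenient iteration macros:

// Walk all processes; init_task (PID 0) is the list head and is not visited itself.
// Illustration only: real code should use for_each_process() under rcu_read_lock().
struct task_struct *task;
list_for_each_entry(task, &init_task.tasks, tasks) {
    printk(KERN_INFO "pid=%d comm=%s\n", task->pid, task->comm);
}

list_for_each_entry expands to a loop plus container_of, turning each list-node pointer back into the complete struct.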
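
To make that expansion concrete, here is a simplified user-space re-implementation. It is a sketch, not the kernel's definition: the real macro goes through helper macros and adds type checks, struct fake_task and sum_pids() are hypothetical, and __typeof__ is a GNU extension accepted by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified: a for loop whose cursor is rebuilt with container_of. */
#define list_for_each_entry(pos, head, member)                            \
    for (pos = container_of((head)->next, __typeof__(*pos), member);      \
         &(pos)->member != (head);                                        \
         pos = container_of((pos)->member.next, __typeof__(*pos), member))

/* An empty circular list points back at its own head. */
static void init_list_head(struct list_head *h)
{
    h->next = h->prev = h;
}

/* Insert new_node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new_node, struct list_head *head)
{
    assert(new_node != NULL && head != NULL);
    new_node->prev = head->prev;
    new_node->next = head;
    head->prev->next = new_node;
    head->prev = new_node;
}

/* Hypothetical entry type with the list node embedded after a field. */
struct fake_task {
    int pid;
    struct list_head tasks;
};

/* Add up the pid of every entry; the head itself is never visited. */
static int sum_pids(struct list_head *head)
{
    struct fake_task *t;
    int sum = 0;

    list_for_each_entry(t, head, tasks)
        sum += t->pid;
    return sum;
}
```

The loop stops when the cursor's embedded node is the head again, which is why the head needs no payload of its own and why init_task is skipped in the kernel snippet above.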

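4. How the Three Work Together: a Character Driver

// Skeleton of a character device driver
struct mydev_data {
    struct cdev     cdev;       // ← embedded cdev, recovered later via container_of
    struct list_head list;      // ← embedded list node
    int             minor;
    // private data...
};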

static int mydev_open(struct inode *inode, struct file *file)
{
    struct mydev_data *data;

    // inode->i_cdev points at the cdev member;
    // container_of recovers the enclosing mydev_data
    data = container_of(inode->i_cdev, struct mydev_data, cdev);

    file->private_data = data;  // stash it in file so read/write can use it directly
    return 0;
}

static const struct file_operations mydev_fops = {
    .owner   = THIS_MODULE,
    .open    = mydev_open,
    // ...
};

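The whole flow:

At registration, mydev_data.cdev is registered with the kernel (via cdev_add()), and the kernel keeps the cdev *
When user space calls open(), the kernel finds the matching cdev * and calls ops->open
The driver uses container_of to get from the cdev * back to the whole mydev_data and its private data

This is one of the most common patterns in the kernel; code built around i2c_client, platform_device, and net_device relies on the same idea.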
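
The three steps can be simulated end to end in user space. Everything in this sketch is hypothetical scaffolding (struct fake_cdev, struct fake_ops, the registry array, fake_cdev_add(), fake_open()); the real kernel looks character devices up by dev_t through its own data structures, not through a small array indexed by minor number.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_cdev;
struct fake_file { void *private_data; };

/* Hypothetical ops table with a single operation. */
struct fake_ops {
    int (*open)(struct fake_cdev *cdev, struct fake_file *file);
};

/* Hypothetical stand-in for struct cdev: just an ops pointer. */
struct fake_cdev {
    const struct fake_ops *ops;
};

/* Driver-private object with the generic cdev embedded in it. */
struct mydev_data {
    int minor;
    struct fake_cdev cdev;
};

/* Step 3: the driver recovers its own object from the cdev pointer. */
static int mydev_open(struct fake_cdev *cdev, struct fake_file *file)
{
    file->private_data = container_of(cdev, struct mydev_data, cdev);
    return 0;
}

static const struct fake_ops mydev_ops = { .open = mydev_open };

/* Step 1: the "kernel" remembers only cdev pointers, indexed by minor. */
#define MAX_DEVS 4
static struct fake_cdev *registry[MAX_DEVS];

static int fake_cdev_add(struct fake_cdev *cdev, int minor)
{
    if (minor < 0 || minor >= MAX_DEVS || registry[minor])
        return -1;              /* out of range or already taken */
    registry[minor] = cdev;
    return 0;
}

/* Step 2: open looks up the cdev and dispatches through ops->open. */
static int fake_open(int minor, struct fake_file *file)
{
    struct fake_cdev *cdev;

    if (minor < 0 || minor >= MAX_DEVS || !registry[minor])
        return -1;              /* no such device */
    cdev = registry[minor];
    assert(cdev->ops && cdev->ops->open);
    return cdev->ops->open(cdev, file);
}
```

The registry never learns about struct mydev_data. Only the driver knows that each registered fake_cdev lives inside a larger object, and container_of is how it acts on that knowledge.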

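5. Why This Design?

No hidden language-runtime cost: a struct of function pointers is a vtable written in plain C. Each call is still an indirect call, but there is no C++ RTTI overhead, which suits a kernel that is extremely sensitive to performance and memory.

Intrusive lists need no extra allocation: the node lives inside the object, so no separate list node has to be allocated. That reduces fragmentation and also cache misses, because the data and the list pointers share the same block of memory.

container_of has essentially zero runtime cost: offsetof is evaluated at compile time, so at run time there is a single subtraction (and none when the member sits at offset 0). Performance is on par with using a direct pointer.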
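
The compile-time claim can be checked directly, since offsetof is an integer constant expression and is therefore allowed inside static_assert. The struct names below are hypothetical, and the macro is the portable char * variant rather than the kernel's definition.

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <stddef.h>   /* offsetof */

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct inner { int x; };

struct outer_first {            /* embedded member at offset 0 */
    struct inner in;
    int extra;
};

struct outer_later {            /* embedded member after another field */
    int extra;
    struct inner in;
};

/* Both offsets are known to the compiler; nothing is looked up at run time. */
static_assert(offsetof(struct outer_first, in) == 0,
              "the first member sits at offset 0");
static_assert(offsetof(struct outer_later, in) >= sizeof(int),
              "a later member sits after the fields before it");

/* With offset 0 the subtraction disappears: same address, new type. */
static struct outer_first *first_of(struct inner *p)
{
    assert(p != NULL);
    return container_of(p, struct outer_first, in);
}

/* With a non-zero offset, one constant is subtracted from the pointer. */
static struct outer_later *later_of(struct inner *p)
{
    assert(p != NULL);
    return container_of(p, struct outer_later, in);
}
```

Placing the embedded member first, as mydev_data does with its cdev, makes the conversion free, though correctness never depends on that placement.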

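One interface, unlimited polymorphism: behind the same file_operations interface there can be ext4, tmpfs, /proc, or a socket. The VFS does not need to know the details; the function pointers take care of the rest.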

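Summary

The Linux kernel builds a complete object-oriented system out of three C techniques:

struct embedding: composition instead of inheritance, with kobject as the "base class" of device-model objects
Function-pointer tables: file_operations, inode_operations, and the like are vtables whose contents are fixed at compile time and which are selected per object at run time
container_of: recovers the complete object from a pointer to an embedded member, and is the core tool of intrusive data structures

Once you can read these three, much of the kernel's "magic" code becomes straightforward. Next time you see container_of, don't panic: it just subtracts an offset.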