数据结构基础：从数组到红黑树

选数据结构不是为了背面试题——是为了做出正确的工程决策。

用错了数据结构，代码不会报错，但会慢。慢到你追不上来。

一个真实案例：某游戏引擎在每帧遍历 std::list<Entity> 做碰撞检测，改成 std::vector 之后帧率翻倍，原因是链表节点在内存里不连续，CPU cache miss 代价在高频循环里被放大了数十倍。

数据结构的价值在于：同样的逻辑，不同的结构，性能可以差 100 倍。

数组 Array

核心思路：连续内存块，下标直接寻址。静态数组大小固定；动态数组（std::vector）在容量不足时以 2x 因子扩容，均摊 O(1) push_back。

操作	时间复杂度
随机访问（按下标）	O(1)
尾部插入（均摊）	O(1)
中间插入 / 删除	O(n)
搜索（无序）	O(n)

Cache Locality 优势：数组元素在内存中紧密排列，CPU 预取一次缓存行（64 bytes）可以装下多个元素。链表节点散落各处，每次指针跳转都可能触发 cache miss，在密集遍历场景下代价极高。

#include <vector>
#include <algorithm>

// 动态数组：预分配容量避免频繁扩容
std::vector<int> v;
v.reserve(1000);           // 预分配，不改变 size
v.push_back(42);           // O(1) 均摊
v.insert(v.begin(), 0);    // O(n)，头插代价高

// 随机访问
int x = v[3];              // O(1)，不检查边界
int y = v.at(3);           // O(1)，抛 out_of_range

// 排序后二分查找
std::sort(v.begin(), v.end());
bool found = std::binary_search(v.begin(), v.end(), 42);  // O(log n)

适用场景：

频繁随机访问、下标遍历
数据量可预估，需要 cache 友好的密集计算（矩阵、图像像素）
作为其他结构（堆、哈希表槽）的底层存储

链表 Linked List

核心思路：节点通过指针串联，无需连续内存。单链表每节点存 next，双链表额外存 prev，支持 O(1) 双向删除。

操作	时间复杂度
随机访问	O(n)
头 / 尾插入（已知位置）	O(1)
中间插入（已知迭代器）	O(1)
搜索	O(n)

list vs vector 的权衡：

#include <list>

std::list<int> lst = {1, 2, 3, 4, 5};

// 在已知迭代器位置插入是 O(1)
auto it = std::find(lst.begin(), lst.end(), 3);
lst.insert(it, 99);   // 在 3 前面插入 99，不移动其他元素

// 删除也是 O(1)（已知迭代器）
lst.erase(it);        // 删除 3

// splice：O(1) 把另一个 list 的元素移过来，不拷贝
std::list<int> other = {10, 20};
lst.splice(lst.end(), other);  // 把 other 全部接到 lst 末尾

何时选 std::list 而不是 std::vector：

需要在容器中间频繁插入/删除，且持有迭代器不失效（如 LRU Cache 实现）
使用 splice 做 O(1) 节点转移（如任务调度队列）

何时坚持用 std::vector：

几乎所有顺序遍历场景——cache locality 优势通常超过 list 的插入优势
数据量 < 几百个元素时，vector 搜索甚至比 map 快（SIMD 顺序扫描）

栈 Stack

核心思路：LIFO（Last In First Out）。只操作栈顶，push/pop 均 O(1)。

操作	时间复杂度
push / pop	O(1)
peek（查看栈顶）	O(1)

Call Stack 类比：函数调用天然是栈结构——调用函数时压栈（保存局部变量 + 返回地址），返回时弹栈。递归爆栈（stack overflow）就是压栈次数超出系统限制。

#include <stack>
#include <string>
#include <stdexcept>

// 括号匹配：经典栈应用
bool isValid(const std::string& s) {
    std::stack<char> st;
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            st.push(c);
        } else {
            if (st.empty()) return false;
            char top = st.top(); st.pop();
            if (c == ')' && top != '(') return false;
            if (c == ']' && top != '[') return false;
            if (c == '}' && top != '{') return false;
        }
    }
    return st.empty();
}

// 单调栈：O(n) 解决"下一个更大元素"
std::vector<int> nextGreater(const std::vector<int>& nums) {
    int n = nums.size();
    std::vector<int> res(n, -1);
    std::stack<int> st;  // 存下标
    for (int i = 0; i < n; i++) {
        while (!st.empty() && nums[i] > nums[st.top()]) {
            res[st.top()] = nums[i];
            st.pop();
        }
        st.push(i);
    }
    return res;
}

适用场景：括号/表达式解析、DFS 迭代实现、撤销/重做（Undo/Redo）、函数调用模拟。

队列 Queue

核心思路：FIFO（First In First Out）。队尾入队，队头出队，均 O(1)。双端队列（deque）两端都可操作。

操作	时间复杂度
enqueue（push_back）	O(1)
dequeue（pop_front）	O(1)
peek	O(1)

#include <queue>
#include <vector>

// BFS 模板：图的层序遍历
void bfs(int start, const std::vector<std::vector<int>>& adj) {
    int n = adj.size();
    std::vector<bool> visited(n, false);
    std::queue<int> q;

    visited[start] = true;
    q.push(start);

    while (!q.empty()) {
        int node = q.front(); q.pop();
        // 处理 node
        for (int neighbor : adj[node]) {
            if (!visited[neighbor]) {
                visited[neighbor] = true;
                q.push(neighbor);
            }
        }
    }
}

适用场景：BFS、任务调度（先来先服务）、消息队列、滑动窗口（配合 std::deque）。

哈希表 Hash Table

核心思路：通过哈希函数将 key 映射到数组槽，实现近似 O(1) 的增删查。冲突解决主要有两种策略：

链地址法（Chaining）：每个槽是链表，std::unordered_map 采用此方案
开放地址法（Open Addressing）：冲突时探测其他槽，cache 友好但删除复杂

操作	平均	最坏（哈希碰撞退化）
插入	O(1)	O(n)
删除	O(1)	O(n)
查找	O(1)	O(n)

Load Factor（负载因子）：元素数 / 槽数。unordered_map 默认最大 load factor 为 1.0，超过时触发 rehash（扩容 + 重新哈希），代价 O(n)。

#include <unordered_map>
#include <string>

// 词频统计
std::unordered_map<std::string, int> freq;
std::string words[] = {"hello", "world", "hello", "cpp"};
for (const auto& w : words) {
    freq[w]++;   // 不存在则默认构造为 0
}

// 预分配桶，减少 rehash
freq.reserve(1000);          // 至少 1000 个桶
freq.max_load_factor(0.7f);  // 提前触发扩容，减少碰撞

// 遍历
for (const auto& [key, val] : freq) {
    // key: "hello"=2, "world"=1, "cpp"=1
}

// 自定义哈希（用于自定义 key 类型）
struct PairHash {
    size_t operator()(const std::pair<int,int>& p) const {
        return std::hash<int>{}(p.first) ^ (std::hash<int>{}(p.second) << 32);
    }
};
std::unordered_map<std::pair<int,int>, int, PairHash> grid;

适用场景：O(1) 查找（缓存、去重）、词频/计数、图的邻接表、两数之和类问题。

注意：哈希表无序，需要有序遍历时用 std::map（红黑树）。

时间复杂度汇总

二叉搜索树 BST

核心思路：左子树所有节点 < 根 < 右子树所有节点。中序遍历输出有序序列。

操作	平均	最坏（退化为链表）
插入	O(log n)	O(n)
删除	O(log n)	O(n)
查找	O(log n)	O(n)

退化问题：按有序序列插入时，BST 退化为链表，所有操作变 O(n)。这是 AVL 树和红黑树的动机。

struct TreeNode {
    int val;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    TreeNode(int v) : val(v) {}
};

// BST 插入
TreeNode* insert(TreeNode* root, int val) {
    if (!root) return new TreeNode(val);
    if (val < root->val)
        root->left = insert(root->left, val);
    else if (val > root->val)
        root->right = insert(root->right, val);
    return root;
}

// 中序遍历（输出有序序列）
void inorder(TreeNode* root, std::vector<int>& result) {
    if (!root) return;
    inorder(root->left, result);
    result.push_back(root->val);
    inorder(root->right, result);
}

AVL 树 vs 红黑树

两者都是自平衡 BST，保证树高 O(log n)，但平衡策略不同：

特性	AVL 树	红黑树
平衡条件	左右子树高度差 ≤ 1（严格）	5 条颜色规则（宽松）
查找性能	略优（树更矮）	略差
插入 / 删除旋转次数	多	少（最多 3 次旋转）
适合场景	读多写少	读写均衡

为什么 std::map / std::set 用红黑树：标准库容器需要同时支持高效插入、删除、查找。红黑树的插入/删除旋转次数有严格上界（插入最多 2 次，删除最多 3 次），而 AVL 树在删除时可能需要 O(log n) 次旋转。对于通用容器，红黑树在修改操作上更稳定。

#include <map>
#include <set>

// std::map 底层红黑树，有序，O(log n) 操作
std::map<std::string, int> scores;
scores["Alice"] = 95;
scores["Bob"] = 87;

// 范围查询（利用有序性）
auto it = scores.lower_bound("Alice");  // >= "Alice" 的第一个

// std::set：唯一有序集合
std::set<int> nums = {3, 1, 4, 1, 5, 9};  // 自动去重并排序
// 结果：{1, 3, 4, 5, 9}

堆 Heap

核心思路：完全二叉树，用数组存储（节点 i 的左孩子为 2i+1，右孩子为 2i+2）。最大堆保证父节点 ≥ 子节点，根节点是最大值。

操作	时间复杂度
插入（push）	O(log n)
删除最大值（pop）	O(log n)
查看最大值（peek）	O(1)
建堆（heapify）	O(n)

Heapify O(n) 建堆原理：从最后一个非叶节点向上 sift-down，大部分节点在树的底部高度小，总代价收敛到 O(n)（而非 n 次 O(log n) 插入的 O(n log n)）。

#include <queue>
#include <vector>
#include <functional>  // std::greater

// 最大堆（默认）
std::priority_queue<int> maxHeap;
maxHeap.push(3);
maxHeap.push(1);
maxHeap.push(4);
int top = maxHeap.top();  // 4
maxHeap.pop();

// 最小堆
std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap;

// 自定义比较（按第二元素排序）
using P = std::pair<int, int>;
auto cmp = [](const P& a, const P& b) { return a.second > b.second; };
std::priority_queue<P, std::vector<P>, decltype(cmp)> pq(cmp);

// 手动 heapify（从无序数组 O(n) 建堆）
std::vector<int> arr = {3, 1, 4, 1, 5, 9, 2, 6};
std::make_heap(arr.begin(), arr.end());  // 最大堆，O(n)
std::sort_heap(arr.begin(), arr.end()); // 堆排序，O(n log n)

适用场景：Top-K 问题、Dijkstra 最短路、任务调度（优先级队列）、中位数维护（双堆）。

图 Graph

核心思路：节点（Vertex）+ 边（Edge）。存储方式影响空间和遍历效率：

存储方式	空间	查询边是否存在	遍历邻居
邻接矩阵	O(V²)	O(1)	O(V)
邻接表	O(V+E)	O(degree)	O(degree)

选择原则：稠密图（E 接近 V²）用矩阵；稀疏图（E 远小于 V²，如路网、社交网络）用邻接表。

#include <vector>
#include <queue>
#include <stack>

// 邻接表建图
int V = 6;
std::vector<std::vector<int>> adj(V);
auto addEdge = [&](int u, int v) {
    adj[u].push_back(v);
    adj[v].push_back(u);  // 无向图
};

// BFS：最短路径（无权图）
std::vector<int> bfsShortestPath(int src, int dst) {
    std::vector<int> dist(V, -1);
    std::vector<int> parent(V, -1);
    std::queue<int> q;

    dist[src] = 0;
    q.push(src);

    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                parent[v] = u;
                q.push(v);
            }
        }
    }

    // 回溯路径
    std::vector<int> path;
    for (int v = dst; v != -1; v = parent[v])
        path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}

// DFS：迭代实现（避免递归栈溢出）
void dfsIterative(int start) {
    std::vector<bool> visited(V, false);
    std::stack<int> st;

    st.push(start);
    while (!st.empty()) {
        int u = st.top(); st.pop();
        if (visited[u]) continue;
        visited[u] = true;
        // 处理 u
        for (int v : adj[u]) {
            if (!visited[v]) st.push(v);
        }
    }
}

适用场景：

BFS：最短路径（无权）、层序遍历、连通分量
DFS：拓扑排序、强连通分量、迷宫/路径搜索
带权图：Dijkstra（非负权）、Bellman-Ford（含负权）

选型速查表

场景	推荐数据结构	原因
频繁随机访问，顺序遍历	`std::vector`	cache 友好，O(1) 随机访问
频繁中间插入/删除，持有迭代器	`std::list`	O(1) 插删（已知位置）
LIFO、表达式解析、DFS	`std::stack`	语义清晰，底层 deque
BFS、任务队列	`std::queue`	FIFO 语义
O(1) 查找，无序	`std::unordered_map`	哈希 O(1) 均摊
有序遍历，范围查询	`std::map` / `std::set`	红黑树，有序 O(log n)
Top-K，优先调度	`std::priority_queue`	堆，O(log n) push/pop
最短路径，连通性	邻接表 + BFS/Dijkstra	稀疏图 O(V+E)
数学矩阵，稠密图	二维 `std::vector`	邻接矩阵 O(1) 查边
区间查询，前缀和	Segment Tree / BIT	O(log n) 区间操作

理解了这些结构的本质，再看算法题或工程代码，会有一种”看穿骨架”的感觉——不是在背答案，是在做选择。