优化你的C++代码：何时使用`set`？

2025-3-4

优化你的C++代码：何时使用`set`？

在C++标准模板库（STL）中，std::set 是一种关联容器，它以一种特定的排序顺序存储唯一的元素。理解 set 的特性以及何时使用它是优化C++代码性能和可读性的关键。本文将深入探讨 set 的内部工作原理、优势、劣势，并通过详细的示例代码来展示其在各种场景下的应用，以及与其他容器（如 vector、unordered_set 和 map）的对比。

1. `std::set` 的基础

std::set 通常实现为红黑树，这是一种自平衡二叉搜索树。这种数据结构确保了以下关键特性：

有序性： set 中的元素始终按照键（key）进行排序。默认情况下，它使用 < 运算符进行比较（升序），但你可以提供自定义的比较函数或函数对象。
唯一性： set 中不允许有重复的元素。当你尝试插入一个已存在的元素时，插入操作将被忽略。
对数时间复杂度： 插入、删除和查找操作的平均时间复杂度为 O(log n)，其中 n 是 set 中元素的数量。这使得 set 在处理大量数据时仍然能保持高效。

2. `std::set` 的基本操作

以下是 std::set 的一些基本操作：

插入： 使用 insert() 方法插入元素。
删除： 使用 erase() 方法删除元素。可以按值或迭代器删除。
查找： 使用 find() 方法查找元素。如果找到，返回指向该元素的迭代器；否则，返回 set::end()。
遍历： 使用迭代器（begin() 和 end()）遍历 set 中的元素。由于 set 是有序的，遍历将按照排序顺序进行。
大小： 使用 size() 方法获取 set 中元素的数量。
清空： 使用 clear() 方法清空 set。

```c++

include

int main() {
std::set mySet;

// 插入
mySet.insert(3);
mySet.insert(1);
mySet.insert(4);
mySet.insert(1); // 重复元素，将被忽略

// 遍历
std::cout << "Elements in set: ";
for (const auto& element : mySet) {
    std::cout << element << " ";
}
std::cout << std::endl; // 输出: Elements in set: 1 3 4

// 查找
auto it = mySet.find(3);
if (it != mySet.end()) {
    std::cout << "Element 3 found." << std::endl;
}

// 删除
mySet.erase(1);

// 大小
std::cout << "Size of set: " << mySet.size() << std::endl; // 输出: Size of set: 2

return 0;

}
```

3. 何时使用 `std::set`？

std::set 在以下场景中特别有用：

需要存储唯一元素： 当你需要一个不允许重复元素的集合时，set 是一个自然的选择。
需要有序集合： 如果你需要一个始终按特定顺序排列的元素集合，set 提供了自动排序的功能。
频繁的插入、删除和查找操作： set 的对数时间复杂度使得这些操作在大型数据集上仍然高效。
需要进行范围查询： 可以使用 lower_bound() 和 upper_bound() 方法高效地查找 set 中特定范围内的元素。
需要集合运算： 可以使用 set_union、set_intersection、set_difference 等算法对 set 进行集合运算（并集、交集、差集等）。

4. `std::set` 的使用场景示例

4.1. 去重

set 最简单的用途之一就是从数据源中去除重复项。

```c++

include

int main() {
std::vector data = {1, 2, 2, 3, 4, 4, 4, 5};
std::set uniqueData(data.begin(), data.end());

std::cout << "Unique elements: ";
for (const auto& element : uniqueData) {
    std::cout << element << " ";
}
std::cout << std::endl; // 输出: Unique elements: 1 2 3 4 5

return 0;

}
```

4.2. 查找存在的元素

set 可以用来快速确定一个元素是否存在于一个集合中。

```c++

include

int main() {
std::set names = {"Alice", "Bob", "Charlie"};

if (names.find("Bob") != names.end()) {
    std::cout << "Bob is in the set." << std::endl;
} else {
    std::cout << "Bob is not in the set." << std::endl;
}

return 0;

}
```

4.3. 维护有序列表

set 自动维护元素的排序，这在需要有序输出或进行范围查询时非常有用。

```c++

include

int main() {
std::set numbers;
numbers.insert(5);
numbers.insert(2);
numbers.insert(8);
numbers.insert(1);

std::cout << "Sorted numbers: ";
for (const auto& number : numbers) {
    std::cout << number << " ";
}
std::cout << std::endl; // 输出: Sorted numbers: 1 2 5 8

// 查找大于等于 3 的第一个元素
auto it = numbers.lower_bound(3);
if (it != numbers.end()) {
    std::cout << "First number greater than or equal to 3: " << *it << std::endl;
     // 输出: First number greater than or equal to 3: 5
}

return 0;

}
```

4.4. 集合运算

set 可以方便地进行集合运算，如并集、交集和差集。

```c++

include

int main() {
std::set set1 = {1, 2, 3, 4, 5};
std::set set2 = {3, 4, 5, 6, 7};

std::set<int> unionSet;
std::set_union(set1.begin(), set1.end(), set2.begin(), set2.end(),
               std::inserter(unionSet, unionSet.begin()));

std::cout << "Union: ";
for (const auto& element : unionSet) {
    std::cout << element << " ";
}
std::cout << std::endl; // 输出: Union: 1 2 3 4 5 6 7

std::set<int> intersectionSet;
std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(),
                    std::inserter(intersectionSet, intersectionSet.begin()));

std::cout << "Intersection: ";
for (const auto& element : intersectionSet) {
    std::cout << element << " ";
}
std::cout << std::endl; // 输出: Intersection: 3 4 5
return 0;

}
```

5. `std::set` 的优势与劣势

优势：

自动排序： 元素始终保持有序，无需手动排序。
唯一性保证： 自动去重，无需额外逻辑。
高效的查找、插入和删除： 对数时间复杂度。
支持集合运算： 方便进行并集、交集、差集等操作。

劣势：

内存开销： 由于红黑树的实现，set 通常比 vector 或 unordered_set 占用更多的内存。
插入和删除可能涉及树的重新平衡： 虽然平均时间复杂度是对数级的，但在最坏情况下（例如，按顺序插入元素），可能需要进行多次树的旋转操作来保持平衡。
不支持随机访问： 不能像 vector 那样通过索引直接访问元素。

6. `std::set` 与其他容器的比较

6.1. `std::set` vs. `std::vector`

| 特性 | std::set | std::vector |
| -------------- | ------------------------ | ------------------------- |
| 排序 | 有序 | 无序（可手动排序） |
| 唯一性 | 保证 | 不保证 |
| 查找 | O(log n) | O(n)（线性查找） |
| 插入/删除 | O(log n) | O(n)（平均，可能需要移动元素） |
| 内存 | 较高 | 较低 |
| 随机访问 | 不支持 | 支持 |

何时使用 std::vector：
- 需要存储重复元素。
- 需要频繁地通过索引访问元素。
- 对内存使用有严格限制。
- 插入和删除操作主要发生在序列的末尾。

6.2. `std::set` vs. `std::unordered_set`

| 特性 | std::set | std::unordered_set |
| -------------- | ----------------- | -------------------- |
| 排序 | 有序 | 无序 |
| 唯一性 | 保证 | 保证 |
| 查找 | O(log n) | O(1)（平均） |
| 插入/删除 | O(log n) | O(1)（平均） |
| 内存 | 较高 | 较高 |
| 迭代器失效 | 插入不会使迭代器失效 | 插入可能使迭代器失效 |

何时使用 std::unordered_set：
- 不需要元素有序。
- 对查找、插入和删除操作的性能要求极高（平均 O(1)）。
- 不关心元素的存储顺序。

6.3. `std::set` vs. `std::map`

| 特性 | std::set | std::map |
| -------- | ------------------------------- | -------------------------------- |
| 存储 | 仅存储键（key） | 存储键值对（key-value pair） |
| 排序 | 按键排序 | 按键排序 |
| 唯一性 | 键唯一 | 键唯一 |
| 查找 | O(log n) | O(log n) |
| 插入/删除 | O(log n) | O(log n) |

何时使用 std::map：
- 需要存储键值对。
- 需要根据键快速查找对应的值。
- 需要按键排序。

7. 更进一步：自定义比较函数

std::set 默认使用 < 运算符进行比较。你可以通过提供自定义的比较函数或函数对象来改变排序规则。

```c++

include

// 自定义比较函数对象（按字符串长度排序）
struct CompareByLength {
bool operator()(const std::string& a, const std::string& b) const {
return a.length() < b.length();
}
};

int main() {
std::set mySet;

mySet.insert("apple");
mySet.insert("banana");
mySet.insert("kiwi");

std::cout << "Elements sorted by length: ";
for (const auto& element : mySet) {
    std::cout << element << " ";
}
std::cout << std::endl; // 输出: Elements sorted by length: kiwi apple banana

return 0;

}
`` 在这个例子中,CompareByLength是一个函数对象, 定义了()运算符, 它告诉set按照string` 的长度进行排序。

8. 对`std::set`性能的进一步思考

尽管std::set在大多数情况下都能提供良好的性能，但在某些特定场景下，你可能需要进一步考虑性能优化：

预估set的大小： 如果你能预估set最终会包含多少个元素，可以使用reserve()方法来预分配足够的内存。这可以减少插入过程中的内存重新分配次数。尽管set没有reserve()方法，但你可以通过计算预估的红黑树节点数量来间接进行内存优化（这比较复杂，通常不建议这样做，除非你有非常明确的性能需求和对红黑树内部结构的深刻理解）。
批量插入： 如果你有大量数据需要一次性插入到set中，使用范围插入构造函数（接受两个迭代器作为参数）或insert()方法的范围插入版本通常比逐个插入更高效。

```c++
std::vector data = { / ... 大量数据 ... / };

// 范围插入
std::set mySet(data.begin(), data.end());
```

避免不必要的拷贝： 如果set中存储的是大型对象，考虑使用移动语义（std::move）来避免不必要的拷贝。

```c++

include

class MyObject {
public:
std::string data;
//其他成员

MyObject(const std::string& d) : data(d) {}

// 移动构造函数
MyObject(MyObject&& other) : data(std::move(other.data)) {}

// 移动赋值运算符
MyObject& operator=(MyObject&& other) {
    if (this != &other) {
        data = std::move(other.data);
    }
    return *this;
}

// 其他方法...

bool operator<(const MyObject& other) const{
return data < other.data;
}
};

int main() {
std::set mySet;

MyObject obj1("LargeString1");
MyObject obj2("LargeString2");

// 使用移动语义插入
mySet.insert(std::move(obj1));
mySet.insert(std::move(obj2));

return 0;

}
```

使用emplace： C++11 引入了 emplace 方法，它允许你直接在 set 中构造元素，而无需创建临时对象。这可以进一步提高效率。

```c++

include

int main(){
std::set my_set;
my_set.emplace("foo");
my_set.emplace("bar");
for (const auto& x : my_set) {
std::cout << x << std::endl;
}
}
`` 在这个例子中,emplace直接在set中构造了string对象，避免了先创建临时string对象再拷贝/移动到set` 中的开销。

9. 替代方案：何时不使用 `std::set`

尽管std::set功能强大，但在某些情况下，其他容器可能是更好的选择：

只需要唯一性，不需要排序： 如果你只需要存储唯一的元素，但不需要它们按特定顺序排列，std::unordered_set通常是更好的选择，因为它的查找、插入和删除操作的平均时间复杂度是O(1)。
需要频繁访问元素，且元素可以重复： 如果你需要存储可以重复的元素，并且需要频繁地通过索引访问它们，std::vector是更好的选择。
需要键值对，但不需要按键排序： 如果你需要存储键值对，但不需要它们按键排序，std::unordered_map是更好的选择。
数据量极小： 对于非常小的数据集（例如，只有几个元素），各种容器之间的性能差异可能微不足道。在这种情况下，选择最能表达你意图的容器即可。

走向卓越：精通 `std::set` 的关键

掌握 std::set 的关键在于理解其底层数据结构（红黑树）、时间复杂度以及与其他容器的比较。通过本文提供的详细示例和分析，你应该能够自信地在各种场景下选择和使用 std::set，从而编写出更高效、更易读的C++代码。记住，没有一种数据结构是万能的，最佳选择取决于你的具体需求。通过不断实践和探索，你将能够更深入地理解 std::set 的细微之处，并在你的项目中充分发挥它的潜力。

作者：admin

链接：https://hostlocvps.com/2025/03/04/%e4%bc%98%e5%8c%96%e4%bd%a0%e7%9a%84c%e4%bb%a3%e7%a0%81%ef%bc%9a%e4%bd%95%e6%97%b6%e4%bd%bf%e7%94%a8set%ef%bc%9f/

文章版权归作者所有，未经允许请勿转载。

THE END

掌握Kotlin协程：简化Android异步编程

<<上一篇

Z-Library国内镜像入口：畅享免费电子书

下一篇>>

优化你的C++代码：何时使用`set`？

优化你的C++代码：何时使用set？

1. std::set 的基础

2. std::set 的基本操作

include

include

3. 何时使用 std::set？

4. std::set 的使用场景示例

4.1. 去重

include

include

include

4.2. 查找存在的元素

include

include

include

4.3. 维护有序列表

include

include

4.4. 集合运算

include

include

include

include

5. std::set 的优势与劣势

6. std::set 与其他容器的比较

6.1. std::set vs. std::vector

6.2. std::set vs. std::unordered_set

6.3. std::set vs. std::map

7. 更进一步：自定义比较函数

include

include

include

8. 对std::set性能的进一步思考

include

include

include

include

include

include

9. 替代方案：何时不使用 std::set

走向卓越：精通 std::set 的关键

优化你的C++代码：何时使用`set`？

1. `std::set` 的基础

2. `std::set` 的基本操作

3. 何时使用 `std::set`？

4. `std::set` 的使用场景示例

5. `std::set` 的优势与劣势

6. `std::set` 与其他容器的比较

6.1. `std::set` vs. `std::vector`

6.2. `std::set` vs. `std::unordered_set`

6.3. `std::set` vs. `std::map`

8. 对`std::set`性能的进一步思考

9. 替代方案：何时不使用 `std::set`

走向卓越：精通 `std::set` 的关键