目录

理想的并行排序算法

一个知乎问题,如果某CPU有无限个核心,那么时间复杂度最小的排序算法是怎么样的? (opens new window)

# 1. 最优结论

不需要无限个,只要 O(n)O(n) 个核心,时间复杂度最小是 O(logn)O(\log n)

这篇论文:链接 (opens new window)

摘要:

The purpose of this paper is to describe a sorting network of size 0(n log n) and depth 0(log n).

A natural way of sorting is through consecutive halvings: determine the upper and lower halves of the set, proceed similarly within the halves, and so on. Unfortunately, while one can halve a set using only 0(n) comparisons, this cannot be done in less than log n (parallel) time, and it is known that a halving network needs (½)n log n comparisons.

It is possible, however, to construct a network of 0(n) comparisons which halves in constant time with high accuracy. This procedure (ε-halving) and a derived procedure (ε-nearsort) are described below, and our sorting network will be centered around these elementary steps.

摘要提到,排序网络 "size 0(n log n) and depth 0(log n)","0(n) comparisons",核心数应该就是比较器数 O(n)O(n) 个,深度也就是时间复杂度 O(logn)O(\log n)

但是具体怎么做,俺就不细看了,就当指个路。

另外时间复杂度一定不小于 O(logn)O(\log n),因为小于这个复杂度我们无法判断这个数组是否有序,而排序肯定比这个任务要强。

(感觉是这样的,不知道怎么严谨证明)

# 2. 时间复杂度不是最优

如果时间复杂度放宽一些,O(log2n)O(\log^2n) 复杂度,这个其他回答提到过,就是双调排序(Bitonic sort),很好理解。

# 3. 核心数不是最优

如果核心数放宽一些,O(n2)O(n^2) 核心数,那么可以这么做:

假设核心排列成一个 n×nn\times n 矩阵,第 i 行第 j 列判断 ai<aja_i<a_j,然后第 i 行的信息 all-reduce(一种通讯算法,复杂度 O(logn)O(\log n) 就行)一下,就能得到 aia_i 的排名了,将 aia_i 写到排名位置即可。

这样总复杂度还是 O(logn)O(\log n)