Return Value Optimization from a Low-Level Perspective

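# 1. What RVO is

Return Value Optimization (RVO) eliminates the copy or move of a return value by constructing it directly in the storage where the result is expected.

At a low level the idea is simple. A return value of a simple type is passed back in registers. For a complex type, the caller allocates a block of memory before the call and passes its address in; the function only has to write the object there. This mechanism for complex types maps directly onto RVO's "construct in the return slot".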
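The hidden-pointer convention described above can be imitated by hand. The following is only a minimal sketch, not what any compiler literally emits: Big, make_into, and demo_hidden_pointer are hypothetical names, and placement new stands in for "the callee writes the object at the address the caller passed".

```cpp
#include <cassert>
#include <new>

// Hypothetical 24-byte aggregate, too large for the register path.
struct Big {
    int data[6];
};

// Hand-written stand-in for `Big make();`: the caller supplies the storage,
// the callee constructs the object there and hands the same address back.
Big* make_into(void* ret_slot) {
    return ::new (ret_slot) Big{{1, 2, 3, 4, 5, 6}};
}

// Plays the caller: reserve storage first, then pass its address in.
int demo_hidden_pointer() {
    alignas(Big) unsigned char slot[sizeof(Big)];
    Big* result = make_into(slot);
    // The object lives exactly where the caller asked for it.
    assert(static_cast<void*>(result) == static_cast<void*>(slot));
    return result->data[5];  // Big is trivially destructible, no cleanup needed
}
```

Calling demo_hidden_pointer() from main exercises both sides of the convention.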

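Now let's start padding this post out properly.

# 2. Simple types

According to the System V ABI for x86-64, a type that is at most 16 bytes, has trivial copy and move constructors, and has a trivial destructor can generally be passed in registers (the details differ; the ABI's actual classification rules are considerably more complex). This post calls such types simple types.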
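The rule of thumb above can be written down as a compile-time check. This is a rough sketch under the post's simplified definition only: looks_simple and the three example structs are hypothetical names, the check ignores the ABI's real classification algorithm, and the sizes assume a 4-byte int.

```cpp
#include <type_traits>

// Approximation of the "simple type" rule of thumb used in this post.
// Assumption: size and triviality are the only criteria considered.
template <class T>
constexpr bool looks_simple =
    sizeof(T) <= 16 &&
    std::is_trivially_copy_constructible<T>::value &&
    std::is_trivially_move_constructible<T>::value &&
    std::is_trivially_destructible<T>::value;

struct Small {  // 8 bytes, everything trivial
    int a;
    int b;
};

struct Large {  // 24 bytes, trivial but too big
    int data[6];
};

struct Tracked {  // small, but the copy constructor is user-provided
    int a;
    explicit Tracked(int v) : a(v) {}
    Tracked(const Tracked& o) : a(o.a) {}
};

static_assert(looks_simple<Small>, "Small should count as simple");
static_assert(!looks_simple<Large>, "Large exceeds 16 bytes");
static_assert(!looks_simple<Tracked>, "Tracked has a non-trivial copy");
```

If the file compiles, all three static_asserts held.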

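For simple types RVO hardly matters: copy, move, or copy elision all end up with the same result.

To tell whether a return value was RVO'd we would define custom copy and move constructors, but doing so means the type is no longer simple (it is no longer trivially copyable). Observing it collapses it into a complex type; I'd like to call this Schrödinger's simple type.

(Update)

Actually that is not right. This question gives a way to detect RVO: have the constructor leak this.

Because the object is passed back in registers, the recorded this pointer necessarily goes stale, so a copy must have happened. The standard permits this behavior: a prvalue of a type with trivial copy and move constructors and a trivial destructor need not be RVO'd.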
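The this-leaking trick can be sketched as follows. All names here (last_this, Simple, Complex, and the helper functions) are hypothetical. No expected result is given for the simple type because it depends on the ABI and the compiler; for the complex type the addresses match under C++17 guaranteed elision.

```cpp
// Address most recently seen by a constructor (hypothetical probe).
static const void* last_this = nullptr;

// Small and trivially copyable: only a non-copy constructor is user-provided.
struct Simple {
    int x;
    explicit Simple(int v) : x(v) { last_this = this; }
};

// The user-provided copy constructor makes this type non-trivial.
struct Complex {
    int x;
    explicit Complex(int v) : x(v) { last_this = this; }
    Complex(const Complex& o) : x(o.x) { last_this = this; }
};

Simple make_simple() { return Simple{1}; }
Complex make_complex() { return Complex{2}; }

// May return false: the object can travel back in a register, in which
// case the address recorded by the constructor is not the caller's object.
bool simple_built_in_place() {
    Simple s = make_simple();
    return last_this == static_cast<const void*>(&s);
}

// Returns true when the constructor ran directly on the caller's object,
// which C++17 guarantees for this prvalue return.
bool complex_built_in_place() {
    Complex c = make_complex();
    return last_this == static_cast<const void*>(&c);
}
```

Printing both results on a given compiler and optimization level shows whether the simple type was copied through registers there.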

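# 3. Complex types

The address of a complex-type return value is passed in, which means the two functions below may compile to identical assembly (in both, the address arrives in rdi, 114514 is written through it, and the same address is returned in rax; GCC 15.2 -O3):

struct A {  // 24 bytes, so a complex type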
    int data[6];
};

A f1() {
    return A{114514, 114514, 114514, 114514, 114514, 114514};
}
//    mov     rdx, QWORD PTR .LC1[rip]
//    mov     ecx, 114514
//    mov     rax, rdi
//    movd    xmm0, ecx
//    pshufd  xmm0, xmm0, 0
//    mov     QWORD PTR [rdi+16], rdx
//    movups  XMMWORD PTR [rdi], xmm0
//    ret

A* f2(A* x) {
    *x = A{114514, 114514, 114514, 114514, 114514, 114514};
    return x;
}
//    mov     rdx, QWORD PTR .LC1[rip]
//    mov     ecx, 114514
//    mov     rax, rdi
//    movd    xmm0, ecx
//    pshufd  xmm0, xmm0, 0
//    mov     QWORD PTR [rdi+16], rdx
//    movups  XMMWORD PTR [rdi], xmm0
//    ret

//    .LC1:
//    .long   114514
//    .long   114514

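So asking whether a return value can be RVO'd is equivalent to asking whether it can be constructed at a given address, which makes things much clearer.

# 4. Examples

# 4.1. Returning a prvalue

If the return statement returns a prvalue of the function's return type, the standard guarantees RVO (since C++17).

The function returns as soon as the return value comes into existence, so nothing stops it from being constructed directly at the address in rdi.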

#include <print>
#include <string_view>

struct A {  // user-defined copy, move, and destructor, so a complex type
    std::string_view name;
    A(std::string_view name) : name(name) {
        std::println("construct {}", name);
    }
    A(const A &b) : name(b.name) { std::println("copy {}", name); }
    A(A &&b) : name(b.name) { std::println("move {}", name); }
    ~A() { std::println("destruct {}", name); }
};

A foo() { return A{"a"}; }  // RVO

int main() { foo(); }

// construct a
// destruct a
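A stricter way to see that the guarantee is real is a type that cannot be copied or moved at all. The sketch below assumes C++17 or later; Pinned, make_pinned, and demo_pinned are hypothetical names. Under earlier standards the same code is rejected, because elision was only an optional optimization and the constructors still had to be accessible.

```cpp
// A type that can be neither copied nor moved.
struct Pinned {
    int value;
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// Compiles only because the prvalue is constructed directly in the
// caller's storage; no copy or move constructor is ever needed.
Pinned make_pinned() { return Pinned{42}; }

int demo_pinned() {
    Pinned p = make_pinned();  // initialized in place
    return p.value;
}
```

Replacing the body of make_pinned with a named local followed by a return of that local fails to compile, since returning a named variable still requires a usable move or copy constructor.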

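# 4.2. Returning a local variable

The standard places no requirements on the addresses of local variables (for example, they need not be laid out in descending order), so a particular local variable can be constructed at the address in rdi. This is named RVO (NRVO); unlike the prvalue case it is permitted but not guaranteed.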

#include <print>
#include <string_view>

struct A {  // user-defined copy, move, and destructor, so a complex type
    std::string_view name;
    A(std::string_view name) : name(name) {
        std::println("construct {}", name);
    }
    A(const A &b) : name(b.name) { std::println("copy {}", name); }
    A(A &&b) : name(b.name) { std::println("move {}", name); }
    ~A() { std::println("destruct {}", name); }
};

A foo() {
    A a{"a"};
    A b{"b"};
    A c{"c"};
    return b;  // RVO (NRVO); note the destruction order is not the reverse of construction
}

int main() { foo(); }

// construct a
// construct b
// construct c
// destruct c
// destruct a
// destruct b

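# 4.3. Conditionally returning a local variable

This is an exception: when the locals are constructed it is not yet known which of them will be the return value, so RVO fails. The return is treated as a move instead.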

#include <print>
#include <string_view>

struct A {  // user-defined copy, move, and destructor, so a complex type
    std::string_view name;
    A(std::string_view name) : name(name) {
        std::println("construct {}", name);
    }
    A(const A &b) : name(b.name) { std::println("copy {}", name); }
    A(A &&b) : name(b.name) { std::println("move {}", name); }
    ~A() { std::println("destruct {}", name); }
};

A foo(bool flag) {
    A a{"a"};
    A b{"b"};
    if (flag) {
        return a;  // move
    } else {
        return b;  // move
    }
}

int main() {
    foo(0);
    std::println("***");
    foo(1);
}

// construct a
// construct b
// move b
// destruct b
// destruct a
// destruct b
// ***
// construct a
// construct b
// move a
// destruct b
// destruct a
// destruct a
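The same observation can be made without reading printed output by counting constructor calls. This is a minimal sketch: Counted, pick, and demo_pick are hypothetical names. The copy count stays at zero because a returned local is treated as an rvalue; the move count is normally one, though a compiler is allowed to elide it where it can.

```cpp
struct Counted {
    static int copies;
    static int moves;
    int id;
    explicit Counted(int i) : id(i) {}
    Counted(const Counted& o) : id(o.id) { ++copies; }
    Counted(Counted&& o) noexcept : id(o.id) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

Counted pick(bool flag) {
    Counted a{1};
    Counted b{2};
    if (flag) {
        return a;  // implicit move: a is treated as an rvalue here
    }
    return b;      // likewise for b
}

// Resets the counters, calls pick, and returns the id that came back.
int demo_pick(bool flag) {
    Counted::copies = 0;
    Counted::moves = 0;
    Counted r = pick(flag);
    return r.id;
}
```

After demo_pick(true) or demo_pick(false), Counted::copies and Counted::moves hold the counts for that single call.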

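# 4.4. Returning a parameter

A parameter, whether passed by value or by reference, is already constructed before the function is entered and is a different object from the return value, so it cannot be RVO'd.

That is fine, though: the compiler picks copy or move in a way that matches intuition. (The move for the A&& parameter in f3 relies on the C++20 rules; under earlier standards that return copies.)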

#include <print>
#include <string_view>

struct A {  // user-defined copy, move, and destructor, so a complex type
    std::string_view name;
    A(std::string_view name) : name(name) {
        std::println("construct {}", name);
    }
    A(const A &b) : name(b.name) { std::println("copy {}", name); }
    A(A &&b) : name(b.name) { std::println("move {}", name); }
    ~A() { std::println("destruct {}", name); }
};

A f1(A a) { return a; }  // move

A f2(const A &a) { return a; }  // copy

A f3(A &&a) { return a; }  // move

A f4(A &a) { return a; }  // copy

int main() {
    f1({"a1"});
    std::println("***");
    f2({"a2"});
    std::println("***");
    f3({"a3"});
    std::println("***");
    A a4{"a4"};
    f4(a4);
}

// construct a1
// move a1
// destruct a1
// destruct a1
// ***
// construct a2
// copy a2
// destruct a2
// destruct a2
// ***
// construct a3
// move a3
// destruct a3
// destruct a3
// ***
// construct a4
// copy a4
// destruct a4
// destruct a4

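# 4.5. Returning a subobject

This is the easiest trap to fall into: a subobject of any object cannot be RVO'd when returned.

This is because rdi only provides storage for the subobject, so the complete object cannot be constructed there.

Note in particular that a structured binding is a subobject as well; it only looks like an independent variable.

The returns in f1, f2, and f3 copy; in real code they should be written as return std::move(...); to turn the copy into a move.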

#include <optional>
#include <print>
#include <string_view>

struct A {  // user-defined copy, move, and destructor, so a complex type
    std::string_view name;
    A(std::string_view name) : name(name) {
        std::println("construct {}", name);
    }
    A(const A& b) : name(b.name) { std::println("copy {}", name); }
    A(A&& b) : name(b.name) { std::println("move {}", name); }
    ~A() { std::println("destruct {}", name); }
};

struct B {
    A a;
    int b;
};

B g() { return {{"a"}, 114514}; }  // this one is RVO'd

A f1() {
    B b{g()};
    return b.a;  // copy
}

A f2() {
    auto [a, b] = g();
    return a;  // copy
}

A f3() {
    std::optional<A> a{{"a"}};  // this construction costs one move
    return *a;  // copy
}

A f4() {
    return g().a;  // not a prvalue: g() is materialized and g().a is an xvalue, so this moves
}

int main() {
    f1();
    std::println("***");
    f2();
    std::println("***");
    f3();
    std::println("***");
    f4();
}

// construct a
// copy a
// destruct a
// destruct a
// ***
// construct a
// copy a
// destruct a
// destruct a
// ***
// construct a
// move a
// destruct a
// copy a
// destruct a
// destruct a
// ***
// construct a
// move a
// destruct a
// destruct a
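The std::move advice for subobjects can be checked with counters instead of printed output. This is a sketch assuming C++17 or later, so that constructing the aggregate itself adds no copies or moves; Part, Whole, make_whole, take_copy, and take_move are hypothetical names.

```cpp
#include <utility>

struct Part {
    static int copies;
    static int moves;
    int id;
    explicit Part(int i) : id(i) {}
    Part(const Part& o) : id(o.id) { ++copies; }
    Part(Part&& o) noexcept : id(o.id) { ++moves; }
};
int Part::copies = 0;
int Part::moves = 0;

struct Whole {
    Part part;
    int extra;
};

// The aggregate is built directly in the caller's storage.
Whole make_whole() { return Whole{Part{7}, 114514}; }

Part take_copy() {
    Whole w = make_whole();
    return w.part;             // subobject: not implicitly movable, so it copies
}

Part take_move() {
    Whole w = make_whole();
    return std::move(w.part);  // explicit xvalue: the move constructor is used
}
```

Resetting the counters before each call and reading them afterwards shows one copy for take_copy and one move for take_move.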
