
Why does changing 0.1f to 0 slow down performance by 10x?

C++

慕仙森 2019-06-25 11:12:53

Why does this bit of code,

const float x[16] = { 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8,
                      1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6 };
const float z[16] = { 1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                      1.923, 2.034, 2.145, 2.256, 2.367, 2.478, 2.589, 2.690 };
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}
for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0.1f; // <--
        y[i] = y[i] - 0.1f; // <--
    }
}

run more than 10 times faster than the following bit (identical except where noted)?

const float x[16] = { 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8,
                      1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6 };
const float z[16] = { 1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                      1.923, 2.034, 2.145, 2.256, 2.367, 2.478, 2.589, 2.690 };
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}
for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0; // <--
        y[i] = y[i] - 0; // <--
    }
}

This is when compiling with Visual Studio 2010 SP1. (I haven't tested with other compilers.)


3 Answers

慕斯709654

Contributed 1840 experience points, received more than 5 likes

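Welcome to the world of denormalized floating-point! They can wreak havoc on performance!

Denormal (or subnormal) numbers are a kind of hack to get some extra values very close to zero out of the floating-point representation. Operations on denormalized floating-point values can be tens to hundreds of times slower than on normalized ones. This is because many processors cannot handle them directly and must trap and resolve them using microcode.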
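To make the term concrete, here is a minimal sketch (not part of the original answer) of how a value can be tested for being denormal in standard C++. The helper name `is_subnormal` is made up for illustration; `std::fpclassify` and `FP_SUBNORMAL` come from `<cmath>`. It assumes IEEE 754 single precision, where the smallest normal float is about 1.17549e-38 and denormals extend down to about 1.4e-45.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true when v is a subnormal (denormal) float, i.e.
// nonzero but smaller in magnitude than the smallest normal float.
bool is_subnormal(float v) {
    return std::fpclassify(v) == FP_SUBNORMAL;
}

// Reference points on the float number line (IEEE 754 single precision assumed).
const float smallest_normal    = std::numeric_limits<float>::min();        // about 1.17549e-38
const float smallest_subnormal = std::numeric_limits<float>::denorm_min(); // about 1.4013e-45
```

Under these assumptions, and without fast-math compiler flags, `is_subnormal(6.30584e-44f)` should be true, while `is_subnormal(1.78814e-7f)` and `is_subnormal(0.0f)` should be false.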

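If you print out the numbers after 10,000 iterations, you will see that they have converged to different values depending on whether 0 or 0.1 is used.

Here's the test code compiled on x64: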

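#include <cstdlib>   // system
#include <iostream>  // cout, endl
#include <omp.h>     // omp_get_wtime; build with OpenMP enabled (/openmp or -fopenmp)

using namespace std;

int main() {

    double start = omp_get_wtime();

    const float x[16]={1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6};
    const float z[16]={1.123,1.234,1.345,156.467,1.578,1.689,1.790,1.812,1.923,2.034,2.145,2.256,2.367,2.478,2.589,2.690};
    float y[16];
    for(int i=0;i<16;i++)
    {
        y[i]=x[i];
    }
    for(int j=0;j<9000000;j++)
    {
        for(int i=0;i<16;i++)
        {
            y[i]*=x[i];
            y[i]/=z[i];
#ifdef FLOATING
            // Variant with 0.1f: define FLOATING to select it.
            y[i]=y[i]+0.1f;
            y[i]=y[i]-0.1f;
#else
            // Variant with 0: the slow one.
            y[i]=y[i]+0;
            y[i]=y[i]-0;
#endif

            if (j > 10000)
                cout << y[i] << "  ";
        }
        if (j > 10000)
            cout << endl;
    }

    double end = omp_get_wtime();
    cout << end - start << endl;  // elapsed wall-clock time in seconds

    system("pause");  // Windows-only: keeps the console window open
    return 0;
}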

Output:

#define FLOATING
1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007
1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007

//#define FLOATING
6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.46842e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.45208e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044

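Note how in the second run the numbers are very close to zero.

Denormalized numbers are generally rare, so most processors don't try to handle them efficiently.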
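One way to see why the 0.1f variant stays out of the denormal range is the following sketch, which is an illustration rather than part of the original answer. Adding 0.1f to a tiny value rounds the sum to a representable float near 0.1, where neighbouring floats are 2^-27 (about 7.45e-9) apart, so subtracting 0.1f again leaves either exactly 0 or a small multiple of that spacing. Adding and subtracting 0 changes nothing, so a denormal stays a denormal. The function name `add_then_subtract` is hypothetical, and IEEE 754 single precision with default rounding is assumed.

```cpp
// Hypothetical helper mirroring the two marked lines of the inner loop:
//   y = y + c;  y = y - c;
// The volatile temporaries force each step to be rounded to float.
float add_then_subtract(float y, float c) {
    volatile float sum = y + c;     // rounded to the nearest float near c
    volatile float diff = sum - c;  // what is left of y after that rounding
    return diff;
}
```

Under these assumptions, `add_then_subtract(1e-40f, 0.1f)` gives exactly 0, while `add_then_subtract(1e-40f, 0.0f)` hands the denormal 1e-40f back unchanged, so the next loop iteration has to operate on a denormal again.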

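To demonstrate that this has everything to do with denormalized numbers, we can flush denormals to zero by adding this to the start of the code: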

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

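Then the version with 0 is no longer 10x slower and actually becomes faster. (This requires that the code be compiled with SSE enabled.)

This means that rather than using these weird, lower-precision, almost-zero values, we just round to zero instead.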
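The effect of the flush-to-zero flag on a single operation can be sketched as below. This is an illustrative example under assumptions, not code from the original answer: it targets x86 with SSE scalar math (the default on x64), and the function name `half_of_smallest_normal` is made up. `_MM_GET_FLUSH_ZERO_MODE`, `_MM_SET_FLUSH_ZERO_MODE`, `_MM_FLUSH_ZERO_ON` and `_MM_FLUSH_ZERO_OFF` are declared in `<xmmintrin.h>`.

```cpp
#include <cfloat>       // FLT_MIN
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE and friends (x86 SSE only)

// Hypothetical demo: halve the smallest normal float with flush-to-zero
// switched on or off. The exact result, 2^-127, is a denormal.
float half_of_smallest_normal(bool flush_to_zero) {
    const unsigned int saved_mode = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);

    // volatile keeps the compiler from folding the multiplication at compile time.
    volatile float smallest_normal = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result = smallest_normal * half;

    _MM_SET_FLUSH_ZERO_MODE(saved_mode);  // restore the caller's setting
    return result;
}
```

With the flag off, the call should return a small positive denormal; with the flag on, the same multiplication should return 0.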

Timings in seconds, on a Core i7 920 @ 3.5 GHz:

//  Don't flush denormals to zero.
0.1f: 0.564067
0   : 26.7669

//  Flush denormals to zero.
0.1f: 0.587117
0   : 0.341406

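In the end, this really has nothing to do with whether the constant is an integer or a floating-point value. The 0 or 0.1f is converted and stored in a register outside of both loops, so that has no effect on performance.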

2019-06-25

隔江千里

Contributed 1906 experience points, received more than 10 likes

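This is due to denormalized floating-point use. How do you get rid of both it and the performance penalty? Having scoured the Internet for ways of killing denormal numbers, it seems there is no "best" way to do this yet. I have found these three methods, which may work best in different environments:

Might not work in some GCC environments: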

// Requires #include <fenv.h>
fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);

Might not work in some Visual Studio environments:

// Requires #include <xmmintrin.h>
_mm_setcsr( _mm_getcsr() | (1<<15) | (1<<6) );
// Does both FTZ and DAZ bits. You can also use just hex value 0x8040 to do both.
// You might also want to use the underflow mask (1<<11)

Appears to work in both GCC and Visual Studio:

// Requires #include <xmmintrin.h>
// Requires #include <pmmintrin.h>
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
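Because these settings change the floating-point control state for all code that runs afterwards on the thread, it can be convenient to limit them to a scope. The following is a rough sketch of that idea, not something the answer itself proposes. It assumes x86 with SSE scalar math and a CPU that supports the DAZ bit. The class name `ScopedFlushDenormals` and the helper `multiply_at_runtime` are made up for illustration, and the two bit positions are the ones named in the `_mm_setcsr` method (bit 15 for FTZ, bit 6 for DAZ).

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr (x86 SSE only)

// Hypothetical RAII guard: turns on FTZ (results flushed to zero) and DAZ
// (denormal inputs treated as zero), then restores the old MXCSR on scope exit.
class ScopedFlushDenormals {
public:
    ScopedFlushDenormals() : saved_csr_(_mm_getcsr()) {
        const unsigned int ftz_bit = 1u << 15;  // flush-to-zero
        const unsigned int daz_bit = 1u << 6;   // denormals-are-zero
        _mm_setcsr(saved_csr_ | ftz_bit | daz_bit);
    }
    ~ScopedFlushDenormals() { _mm_setcsr(saved_csr_); }

    ScopedFlushDenormals(const ScopedFlushDenormals&) = delete;
    ScopedFlushDenormals& operator=(const ScopedFlushDenormals&) = delete;

private:
    unsigned int saved_csr_;
};

// Hypothetical helper: multiplies at run time; volatile blocks constant folding.
float multiply_at_runtime(float a, float b) {
    volatile float va = a;
    volatile float vb = b;
    volatile float product = va * vb;
    return product;
}
```

As a usage example, `multiply_at_runtime(1e-40f, 1e10f)` takes a denormal input and produces a normal result of roughly 1e-30. While a `ScopedFlushDenormals` object is alive, the same call should return 0, because DAZ treats the denormal input as zero. FTZ alone would not change this particular result, since the product itself is not denormal.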

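The Intel compiler has options to disable denormals by default on modern Intel CPUs.

Compiler switches: -ffast-math, -msse or -mfpmath=sse will disable denormals and make a few other things faster, but unfortunately they also make lots of other approximations that might break your code. Test carefully! The equivalent of fast-math for the Visual Studio compiler is /fp:fast, but I haven't been able to confirm whether this also disables denormals.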
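Whether a particular set of compiler switches ends up flushing denormals can be checked empirically rather than looked up. The probe below is a minimal sketch under assumptions: the name `subnormals_flushed` is hypothetical, it assumes IEEE 754 single precision, and it only reports what happens on the calling thread at the moment it runs. It detects flushing of denormal results; it says nothing about how denormal inputs are treated.

```cpp
#include <cfloat>  // FLT_MIN

// Hypothetical probe: halving the smallest normal float gives a denormal
// (2^-127) under default IEEE behaviour, and exactly 0 when results are
// being flushed to zero.
bool subnormals_flushed() {
    // volatile forces the multiplication to happen at run time.
    volatile float smallest_normal = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result = smallest_normal * half;
    return result == 0.0f;
}
```

Calling `subnormals_flushed()` at the top of `main` in a build with the switch in question, and again in a build without it, should show whether that switch changes the behaviour on a given toolchain.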

2019-06-25
