High-Performance Array Operations with Cython | Set 2

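Prerequisite: High-Performance Array Operations with Cython | Set 1
The code written in the first part runs fast. In this article, we compare its performance with that of the clip() function in the NumPy library.
Surprisingly, our program runs faster than NumPy's implementation, which is itself written in C.
Code #1: Comparing performance.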

Python3

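from timeit import timeit
import numpy
import sample  # compiled Cython module from Set 1

# Assumed setup (not shown in the original snippet): input and output arrays
arr2 = numpy.random.uniform(-10, 10, size=1000000)
arr3 = numpy.zeros_like(arr2)

# The setup string must import every name used in the timed statement
a = timeit('numpy.clip(arr2, -5, 5, arr3)',
           'from __main__ import arr2, arr3, numpy', number=1000)
print("\nTime for NumPy clip program : ", a)

b = timeit('sample.clip(arr2, -5, 5, arr3)',
           'from __main__ import arr2, arr3, sample', number=1000)
print("\nTime for our program : ", b)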

Output:

Time for NumPy clip program : 8.093049556000551

Time for our program : 3.760528204000366
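
A single timeit() figure can be noisy, so it may help to repeat the measurement and keep the best run. The following is a minimal sketch of that idea, not the benchmark used above: py_clip is a hypothetical pure-Python stand-in for the function under test, and the array size and repeat counts are arbitrary assumptions chosen so the example finishes quickly.

```python
import timeit
from array import array


def py_clip(a, lo, hi, out):
    """Hypothetical pure-Python stand-in for the function being timed."""
    for i in range(len(a)):
        x = a[i]
        out[i] = lo if x < lo else hi if x > hi else x


# Assumed test data: 10,000 doubles cycling through the range [-10, 10]
src = array("d", [float(i % 21 - 10) for i in range(10_000)])
dst = array("d", [0.0]) * len(src)  # preallocated output buffer

# Five independent trials of 10 calls each; the minimum is the least noisy figure
trials = timeit.repeat(lambda: py_clip(src, -5.0, 5.0, dst), repeat=5, number=10)
best = min(trials)
print("best of 5 trials:", best)
```

Passing a callable instead of a statement string avoids the need for a setup string that imports names from __main__.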

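The code in this article relies on Cython typed memoryviews, which simplify code that operates on arrays. Declaring the function as cpdef clip() makes clip() both a C-level and a Python-level function. As a result, other Cython functions can call it more efficiently (for example, if you want to invoke clip() from a different Cython function).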
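
Typed memoryviews are built on the buffer protocol, which plain Python also exposes through the built-in memoryview type. The sketch below uses that built-in type, not a Cython typed memoryview, purely to illustrate the underlying idea: a view shares memory with the object it wraps, so no data is copied. The variable names are illustrative only.

```python
from array import array

data = array("d", [-7.5, 0.0, 9.25])  # contiguous buffer of C doubles
view = memoryview(data)               # no copy: the view shares data's memory

print(view.format, view.shape)        # element type code and shape of the buffer

view[0] = -5.0                        # writing through the view...
print(data[0])                        # ...changes the underlying array
```

This is why a function declared with double[:] parameters can accept a NumPy array or an array.array of doubles and write results directly into the caller's output buffer.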
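The code uses two decorators: @cython.boundscheck(False) and @cython.wraparound(False). These are a few optional performance optimizations.
@cython.boundscheck(False): Eliminates all array bounds checking; use it only when you know the indices will never go out of range.
@cython.wraparound(False): Eliminates the handling of negative array indices as wrapping around to the end of the array (as with Python lists). Including these decorators can make the code run substantially faster (almost 2.5 times faster when tested on this example).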
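
To make the trade-off concrete, here is a small plain-Python sketch of the two behaviors these decorators switch off. It only demonstrates the default semantics; with the checks disabled in Cython, an out-of-range or negative index is no longer caught and may read or write unrelated memory.

```python
values = [1.0, 2.0, 3.0]

# Wraparound: a negative index counts from the end of the sequence
last = values[-1]

# Bounds checking: an out-of-range index raises instead of touching stray memory
try:
    values[3]
    raised = False
except IndexError:
    raised = True

print(last, raised)
```

The clip() loop only ever uses indices from range(a.shape[0]) and checks that both arrays have the same length beforehand, which is what makes disabling both checks safe there.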
Code #2: A variant of the clip() function that uses conditional expressions

Python3

cimport cython

# Disable bounds checking and negative-index wraparound for speed
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef clip(double[:] a, double min, double max, double[:] out):
    cdef Py_ssize_t i

    if min > max:
        raise ValueError("min must be <= max")

    if a.shape[0] != out.shape[0]:
        raise ValueError("input and output arrays must be the same size")

    # Nested conditional expression: clamp to max first, then to min
    for i in range(a.shape[0]):
        out[i] = (a[i] if a[i] < max else max) if a[i] > min else min

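When tested, this version of the code runs over 50% faster. But how does it stack up against a handwritten C version? In experiments, a handcrafted C extension ran more than 10% slower than the version created by Cython.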
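
The nested conditional expression in Code #2 can be hard to read at first glance. The following plain-Python sketch checks it against the more familiar min/max formulation on a handful of sample values. The helper names clip_conditional and clip_minmax are hypothetical and exist only for this comparison.

```python
def clip_conditional(x, lo, hi):
    # Same nested conditional expression as in Code #2, applied to one value
    return (x if x < hi else hi) if x > lo else lo


def clip_minmax(x, lo, hi):
    # Conventional formulation of clipping, used here as the reference
    return min(max(x, lo), hi)


samples = [-10.0, -5.0, -4.9, 0.0, 4.9, 5.0, 10.0]
results = [clip_conditional(x, -5.0, 5.0) for x in samples]
expected = [clip_minmax(x, -5.0, 5.0) for x in samples]

print(results == expected)
```

In compiled Cython code, a conditional expression on C doubles can be translated to simple C comparisons with no Python function calls inside the loop.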