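Chapter 9 Performance Optimization Techniques*

9.1 Profiling Python Programs

What profiling is for
Identify why a program runs slowly
Locate the bottlenecks in the code
Choose the optimization technique that best fits the problem

9.1.1 `time` and `timeit`

`time`

The time module offers several common ways to measure a program's running time:

time.time: returns the current system time as a timestamp;
time.perf_counter: returns the value of a high-resolution performance counter, which includes time spent sleeping;
time.process_time: returns the CPU time (system plus user) consumed by the current process, which excludes time spent sleeping.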

import time

def fun():
    sum_value = 0
    for i in range(1000000):
        sum_value += i

start = time.perf_counter()
fun()
time.sleep(1)
end = time.perf_counter()
print('perf_counter:', end - start)

perf_counter: 1.0736476539996147

start = time.process_time()
fun()
time.sleep(1)
end = time.process_time()
print('process_time:', end - start)  # process_time excludes the time the program spends asleep in time.sleep(1)

process_time: 0.07273000000000018
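The difference between the two clocks can be made visible with a small helper. The following is a minimal sketch, not part of the original notes: the name `stopwatch` and the dictionary keys `wall` and `cpu` are illustrative choices. It records both clocks around a block of code so that sleep time shows up in one measurement and not in the other.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label):
    """Measure wall-clock and CPU time spent inside the with-block."""
    result = {}
    wall_start = time.perf_counter()   # includes time spent sleeping
    cpu_start = time.process_time()    # counts only CPU time of this process
    try:
        yield result
    finally:
        result['wall'] = time.perf_counter() - wall_start
        result['cpu'] = time.process_time() - cpu_start
        print(f"{label}: wall={result['wall']:.3f}s cpu={result['cpu']:.3f}s")


with stopwatch('sleep demo') as timing:
    sum(range(100_000))   # a little real work
    time.sleep(0.2)       # counted by perf_counter, not by process_time
```

The wall-clock figure should be at least the sleep duration, while the CPU figure should stay close to the cost of the summation alone.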

`timeit`

timeit is typically used to time small code snippets
It can be run in two ways
From the command line
Through the Python interface
Command line

$ python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 5: 23.5 usec per loop

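The most common command-line options are listed in the table below.

Option	Short form	Purpose
`--number`	`-n`	Number of times the snippet is executed per timing run
`--repeat`	`-r`	Number of timing runs
`--setup`	`-s`	Statement run once beforehand to prepare the environment, such as an `import`
`--process`	`-p`	Measure process time (`time.process_time`) instead of wall-clock time
`--unit`	`-u`	Time unit of the output: `nsec`, `usec`, `msec`, or `sec`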

$ python -m timeit -n 2000 -r 10 '"-".join(str(n) for n in range(100))'
2000 loops, best of 10: 23.5 usec per loop

Python interface
The timeit.timeit function
The timeit.repeat function

import timeit
timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)

0.2476290219992734

timeit.repeat('"-".join(str(n) for n in range(100))', number=10000, repeat=5)

[0.2479564710001796,
 0.2286210049996953,
 0.2277683940001225,
 0.24715779699999985,
 0.23120317899974907]

Timing a function from within a script

import timeit

def bubble_sort(lst):
    """Bubble sort."""
    for i in range(len(lst)-1, 0, -1):
        for j in range(0, i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j+1] = lst[j+1], lst[j]
    return lst

t1 = timeit.timeit('bubble_sort(lst)',
                  setup='''from __main__ import bubble_sort; import random; lst = list(range(1000)); random.shuffle(lst)''',
                  number=10)

t2 = timeit.repeat('bubble_sort(lst)',
                  setup='''from __main__ import bubble_sort; import random; lst = list(range(1000)); random.shuffle(lst)''',
                  number=10,
                  repeat=3)
print(t1)
print(t2)

0.5808773019998625
[0.5691154360001747, 0.5654969030001666, 0.5676186800001233]
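One detail of the script above is worth noting: bubble_sort sorts the list in place, so every call after the first one receives data that is already sorted. The sketch below is one possible way around this, under the assumption that timing the copy along with the sort is acceptable. It passes a callable to `timeit.repeat` and hands each call a fresh copy of the shuffled list; the list size of 300 is an arbitrary choice to keep the run short.

```python
import random
import timeit


def bubble_sort(lst):
    """Bubble sort, sorting the list in place and returning it."""
    for i in range(len(lst) - 1, 0, -1):
        for j in range(0, i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
    return lst


data = list(range(300))
random.shuffle(data)

# data.copy() gives every call an unsorted list; the copy itself is cheap
# compared with the quadratic sort, but it is included in the measurement.
times = timeit.repeat(lambda: bubble_sort(data.copy()), number=5, repeat=3)
print('best of 3:', min(times))
```

Taking the minimum of the repeats follows the usual timeit convention, since larger values mostly reflect interference from other processes.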

9.1.2 `profile`

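profile is a set of built-in Python tools for collecting and analyzing data about a program's execution
It provides detailed statistics on the time spent in every function call while the program runs
There are two implementations
profile
- A pure-Python module in the standard library
- Adds significant overhead to the profiled program; suited to cases where the profiler itself needs to be extended
cProfile
- A C extension with the same interface as profile
- Low overhead, which makes it suitable for profiling long-running programs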

`cProfile`

test_profile.py

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def fib_list(n):
    seq = []
    if n > 0:
        seq.extend(fib_list(n - 1))
    seq.append(fib(n))
    return seq

if __name__ == '__main__':
    fib_list(30)

$ python -m cProfile test_profile.py

         7049218 function calls (96 primitive calls) in 2.015 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.015    2.015 test_profile.py:1(<module>)
7049123/31    2.014    0.000    2.014    0.065 test_profile.py:1(fib)
     31/1    0.000    0.000    2.015    2.015 test_profile.py:10(fib_list)
        1    0.000    0.000    2.015    2.015 {built-in method builtins.exec}
       31    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       30    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}

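A total of 7049218 function calls were recorded, of which 96 were primitive (non-recursive) calls. The function fib was called 7049123 times in total, 31 of them primitive calls, and took 2.014 seconds overall.
The meaning of each column in the report is given in the table below.

Column	Meaning
ncalls	Number of calls (for recursive functions, shown as total calls/primitive calls)
tottime	Total time spent in the function itself (excluding time spent in the functions it calls)
percall	Average time per call (tottime divided by ncalls)
cumtime	Cumulative time spent in the function and everything it calls (accurate even for recursive functions)
percall	Average time per primitive call (cumtime divided by the number of primitive calls)
filename:lineno(function)	File, line number, and name of the function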

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def fib_list(n):
    seq = []
    if n > 0:
        seq.extend(fib_list(n - 1))
    seq.append(fib(n))
    return seq

import cProfile
cProfile.run('fib_list(30)')

         7049218 function calls (96 primitive calls) in 1.998 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
7049123/31    1.997    0.000    1.997    0.064 <ipython-input-6-3823c2297ba6>:1(fib)
     31/1    0.000    0.000    1.998    1.998 <ipython-input-6-3823c2297ba6>:9(fib_list)
        1    0.000    0.000    1.998    1.998 <string>:1(<module>)
        1    0.000    0.000    1.998    1.998 {built-in method builtins.exec}
       31    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       30    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}

import cProfile
p = cProfile.Profile()
p.enable()
fib_list(30)
p.disable()
p.print_stats(sort='tottime')

         7049260 function calls (138 primitive calls) in 2.040 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
7049123/31    2.040    0.000    2.040    0.066 <ipython-input-6-3823c2297ba6>:1(fib)
     31/1    0.000    0.000    2.040    2.040 <ipython-input-6-3823c2297ba6>:9(fib_list)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
        2    0.000    0.000    2.040    1.020 interactiveshell.py:3293(run_code)
       30    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
       31    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 codeop.py:135(__call__)
        2    0.000    0.000    0.000    0.000 contextlib.py:82(__init__)
        4    0.000    0.000    0.000    0.000 compilerop.py:138(extra_flags)
        1    0.000    0.000    0.000    0.000 <ipython-input-7-9649c2a4d5ce>:5(<module>)
        2    0.000    0.000    0.000    0.000 hooks.py:103(__call__)
        2    0.000    0.000    0.000    0.000 contextlib.py:117(__exit__)
        2    0.000    0.000    0.000    0.000 contextlib.py:108(__enter__)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        2    0.000    0.000    2.040    1.020 {built-in method builtins.exec}
        2    0.000    0.000    0.000    0.000 traitlets.py:545(__get__)
        2    0.000    0.000    0.000    0.000 contextlib.py:238(helper)
        2    0.000    0.000    0.000    0.000 traitlets.py:526(get)
        1    0.000    0.000    2.040    2.040 <ipython-input-7-9649c2a4d5ce>:4(<module>)
        2    0.000    0.000    0.000    0.000 interactiveshell.py:3231(compare)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.next}
        2    0.000    0.000    0.000    0.000 ipstruct.py:125(__getattr__)
        2    0.000    0.000    0.000    0.000 interactiveshell.py:1276(user_global_ns)
        2    0.000    0.000    0.000    0.000 hooks.py:168(pre_run_code_hook)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
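Once a profile points at a hot spot such as fib, the call counts can also confirm whether a fix helped. The sketch below is an illustration rather than part of the original example: it applies memoization with `functools.lru_cache` (a technique chosen here only as an example) and compares the total number of recorded calls before and after. The helper `count_calls` is a hypothetical name, and `n = 20` is used to keep the run short.

```python
import cProfile
import pstats
from functools import lru_cache


def fib(n):
    """Naive recursive Fibonacci: exponential number of calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Memoized Fibonacci: each distinct n is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def count_calls(func, arg):
    """Run func(arg) under cProfile; return its result and the call count."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(arg)
    profiler.disable()
    # pstats.Stats accepts a Profile object directly
    return result, pstats.Stats(profiler).total_calls


plain_result, plain_calls = count_calls(fib, 20)
cached_result, cached_calls = count_calls(fib_cached, 20)
print('naive :', plain_result, 'calls =', plain_calls)
print('cached:', cached_result, 'calls =', cached_calls)
```

Both versions return the same value, while the memoized one records far fewer calls.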

`pstats`

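The results from cProfile or profile can be saved to a binary file; pstats is used to run further statistical analysis on the saved results
There are two ways to save the results to a file:
From the command line: python -m cProfile -o result.out test_profile.py
In code: cProfile.run('fib_list(30)', filename='result.out')
Commonly used methods of the pstats.Stats class:
strip_dirs: removes irrelevant path information
sort_stats: sorts the results
print_stats: prints the results; the number of lines printed can be limited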

import pstats
p = pstats.Stats('result.out')
p.strip_dirs().sort_stats('name').print_stats()
print('-'*100)
p.sort_stats('cumulative').print_stats()
print('-'*100)
p.sort_stats('time').print_stats()

In the recorded session result.out had not been created beforehand, so the call to pstats.Stats failed. Save the profiling results with one of the two methods above before running this code.

FileNotFoundError: [Errno 2] No such file or directory: 'result.out'
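To avoid the missing-file problem, the profile can be written and read back in a single script. The following is a minimal sketch under a few assumptions: it writes to a temporary directory instead of the working directory, uses `Profile.dump_stats` as an alternative to `cProfile.run(..., filename=...)`, and profiles a shortened `fib_list(15)` so that it finishes quickly.

```python
import cProfile
import os
import pstats
import tempfile


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_list(n):
    return [fib(i) for i in range(n + 1)]


# Step 1: profile the call and save the raw statistics to a binary file.
out_path = os.path.join(tempfile.mkdtemp(), 'result.out')
profiler = cProfile.Profile()
profiler.runcall(fib_list, 15)
profiler.dump_stats(out_path)

# Step 2: load the file with pstats and print the five most expensive
# entries, ordered by cumulative time.
stats = pstats.Stats(out_path)
stats.strip_dirs().sort_stats('cumulative').print_stats(5)
```

Passing a number to `print_stats` limits the report to that many lines, which is convenient when the profile contains many library functions.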

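9.2 Just-in-Time Compilation

9.2.1 The Concept of JIT Compilation

Common ways of running a computer program
Compiled execution
- Pros: compile once and run many times; fast execution
- Cons: many dynamic language features are hard to support
Interpreted execution
- Pros: each statement is interpreted as it is reached, which makes dynamic language features easy to implement
- Cons: a statement has to be interpreted again every time it is executed; slow execution
Just-in-time (JIT) compilation
Combines characteristics of compiled and interpreted execution
Code that is executed repeatedly is compiled, optimized, and cached for later use
(Figure: the JIT compilation process)

JIT-based optimization options for Python
PyPy
Numba
Jython
Pyston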

9.2.2 PyPy

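PyPy is a Python interpreter implemented in RPython
RPython is a restricted subset of the Python language
Installation
https://www.pypy.org/download.html
Windows: https://bitbucket.org/pypy/pypy/downloads/pypy3.6-v7.3.1-win32.zip
macOS: brew install pypy3
Characteristics
PyPy is highly compatible with specific versions of the Python language as implemented by CPython
Support for third-party packages is comparatively weak
- Packages are installed separately from CPython through PyPy's own pip (pip_pypy here; commonly run as pypy3 -m pip)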

# pypy_test.py
import timeit
def bubble_sort(lst):
    """Bubble sort."""
    for i in range(len(lst)-1, 0, -1):
        for j in range(0, i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j+1] = lst[j+1], lst[j]
    return lst

t = timeit.timeit('bubble_sort(lst)',
                  setup='''from __main__ import bubble_sort; \
                  import random; lst = list(range(1000)); random.shuffle(lst)''',
                  number=10)
print(t)

0.5926239299997178

$ python pypy_test.py 
0.571357171
$
$ pypy3 pypy_test.py 
0.022235398006159812

9.2.3 Numba

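Numba
A JIT compiler implemented as a third-party Python package
Suited to code built around NumPy arrays, functions, and loops
pip install numba
Usage
Decorate a function or class
- While the program runs, the decorated code is executed by Numba's JIT compiler instead of the Python interpreter
When to use it
- Numerical tasks and code with many loops

Basic usage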

import timeit
from numba import jit

def factorial(n):
    '''Factorial.'''
    fac = 1
    for i in range(1, n+1):
        fac = fac*i
    return fac

@jit
def factorial_jit(n):
    fac = 1
    for i in range(1, n+1):
        fac = fac*i
    return fac

timeit.timeit('factorial(10000)', setup='from __main__ import factorial', number=100)

2.0420013419998213

timeit.timeit('factorial_jit(10000)', setup='from __main__ import factorial_jit', number=100)

7.113244510999721

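Eager compilation

Specify the signature of the decorated function in the jit decorator
The JIT compiler no longer has to infer the argument types, so the function can be compiled as soon as the script is imported or run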

from numba import jit, int32

@jit(int32(int32, int32))
def f(x, y):
    return x + y

When the actual argument types do not match the signature, the result may be unexpected

f(2**31, 2**31 + 1)
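The reason such a call can misbehave is that the arguments and the result are forced into 32 bits. The sketch below emulates that truncation in plain Python with `ctypes.c_int32`; it does not run Numba, and the helper name `add_int32` is only illustrative. It shows the kind of wraparound a 32-bit signature implies.

```python
import ctypes


def add_int32(x, y):
    """Emulate C-style int32 addition: inputs and result wrap to 32 bits."""
    a = ctypes.c_int32(x).value   # ctypes does no overflow checking
    b = ctypes.c_int32(y).value
    return ctypes.c_int32(a + b).value


print(add_int32(1, 2))                 # small values behave as expected
print(add_int32(2**31, 2**31 + 1))     # high bits are discarded
```

With ordinary Python integers the second sum would be 4294967297; after truncation to 32 bits only the lowest bits survive.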

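Common Numba data types

Type	Meaning
`void`	Return type of functions that return nothing (or return `None`)
`intc`, `uintc`	Equivalent to `int` and `unsigned int` in C
`int8`, `uint8`, `int16`, `uint16`, `int32`, `uint32`, `int64`, `uint64`	Signed or unsigned integers of the given width
`float32`, `float64`	Single- and double-precision floating-point numbers
`complex64`, `complex128`	Single- and double-precision complex numbers
`int32[:]`, `float32[:]`	Arrays; the element type can be any of the other types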

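nopython mode

The Numba JIT compiler has two compilation modes
nopython mode
- The compiler compiles the whole function, which then runs without involving the Python interpreter
Object mode
- Numba compiles the parts it can handle and leaves the rest to the Python interpreter
By default
If compilation in nopython mode fails, Numba can fall back to object mode (this is the default behavior of Numba releases before 0.59)
Specifying nopython=True in the jit decorator
Forces nopython mode: if compilation fails, Numba raises an error instead of falling back to object mode
@jit(nopython=True) is equivalent to the @njit decorator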

@jit(nopython=True)
def f(x, y):
    return x + y

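Caching compilation results

Specify cache=True in the jit decorator
Numba saves the compiled function to an on-disk cache, so it does not need to be recompiled on later runs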

@jit(cache=True)
def f(x, y):
    return x + y

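9.3 Mixed-Language Programming: Concepts and Environment Setup

Mixed-language programming

Mixed-language programming
Implement compute-intensive tasks in another language and call them from Python to improve performance
Python can be combined in this way with common languages such as Java and C#, and even with R and MATLAB
Approaches
Call C/C++ shared libraries from Python
- Implemented with ctypes from the Python standard library
Write Python extensions in C/C++, so that a C/C++ library can be used like an ordinary Python module
- Can be done with Python's built-in C API
- Or with third-party tools such as Cython, Boost.Python, SWIG, or pybind11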

Environment setup

Windows (https://wiki.python.org/moin/WindowsCompilers)
MinGW-w64 toolchain
- Create and activate a new Anaconda virtual environment
- Install the MinGW-w64 toolchain: conda install libpython m2w64-toolchain -c msys2
- Configure the default compiler:
- In the virtual environment's directory, find the Lib\distutils folder, create a file named distutils.cfg in it, and write the following content
  [build]
  compiler=mingw32
  [build_ext]
  compiler=mingw32
Linux
Debian/Ubuntu: apt-get install build-essential
Red Hat/CentOS: yum groupinstall "development tools"
macOS
Installing the Xcode command-line tools is enough: xcode-select --install
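After installation, a quick check from Python can confirm that a compiler is reachable. This is a rough sketch only: the candidate names in `find_c_compiler` (a hypothetical helper) are common compiler executables and may not match every setup, and finding an executable on PATH does not guarantee that it is configured for building extensions.

```python
import shutil
import sysconfig


def find_c_compiler(candidates=('gcc', 'cc', 'clang', 'cl')):
    """Return (name, path) of the first compiler found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


compiler = find_c_compiler()
print('C compiler:', compiler if compiler else 'none found on PATH')

# Directory that holds Python.h, needed later when building C extensions
include_dir = sysconfig.get_paths()['include']
print('Python include directory:', include_dir)
```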

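9.4 Mixed-Language Programming with ctypes

ctypes
The Python standard-library module for calling functions in C shared libraries
A basic way to do mixed-language programming, suited to scenarios that are not too complex

9.4.1 Calling C Libraries

Calling ordinary C functions

Save the following code as add.c

// File: add.c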
double add(double x, double y) {
    return x + y;
}

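Compile add.c into a shared library
gcc -o libadd.so -shared -fPIC add.c
- The -o option names the output shared library file
- The -shared option tells the compiler to build a shared library
- The -fPIC option generates position-independent code for the shared library
Call it from Python code located in the same directory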

import ctypes
lib = ctypes.cdll.LoadLibrary('./libadd.so')
add = lib.add
add.argtypes = (ctypes.c_double, ctypes.c_double)  # argument types
add.restype = ctypes.c_double                      # return type
print(add(1.0, 2))
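The same argtypes/restype pattern can be tried without compiling anything by calling a function from the C math library that is already on the system. The sketch below assumes a POSIX system (Linux or macOS) where `ctypes.util.find_library('m')` locates the math library; if it returns None, `CDLL(None)` falls back to the symbols already loaded into the Python process. It is meant as an illustration and will not work unchanged on Windows.

```python
import ctypes
import ctypes.util

# Locate and load the C math library (POSIX assumption)
libm = ctypes.CDLL(ctypes.util.find_library('m'))

# double pow(double x, double y);
c_pow = libm.pow
c_pow.argtypes = (ctypes.c_double, ctypes.c_double)  # argument types
c_pow.restype = ctypes.c_double                      # return type

# The integer 10 is converted to a C double because argtypes is declared
print(c_pow(2.0, 10))
```

Without the restype declaration ctypes would treat the return value as a C int, which is why both attributes are set before the first call.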

ctypes data types

ctypes type	C type	Python type
`c_bool`	`_Bool`	`bool`
`c_char`	`char`	`bytes` of length 1
`c_wchar`	`wchar_t`	`str` of length 1
`c_byte`	`char`	`int`
`c_short`	`short`	`int`
`c_int`	`int`	`int`
`c_long`	`long`	`int`
`c_longlong`	`long long`	`int`
`c_float`	`float`	`float`
`c_double`	`double`	`float`
`c_longdouble`	`long double`	`float`
`c_char_p`	`char *`	`bytes` or `None`
`c_wchar_p`	`wchar_t *`	`str` or `None`
`c_void_p`	`void *`	`int` or `None`
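The mapping in the table can be explored interactively, since every ctypes type is an ordinary Python class whose instances expose the wrapped value through `.value`. The short sketch below is illustrative only; the variable names are arbitrary.

```python
import ctypes

n = ctypes.c_int(42)             # wraps a C int
x = ctypes.c_double(3.5)         # wraps a C double
s = ctypes.c_char_p(b'hello')    # wraps a char * pointing at a bytes buffer
flag = ctypes.c_bool(True)       # wraps a C _Bool

# .value converts back to the Python type listed in the table
print(n.value, x.value, s.value, flag.value)

# Instances are mutable: assigning to .value changes the stored C value
n.value = 7
print(n.value)

# sizeof reports the size of the underlying C type in bytes
print(ctypes.sizeof(ctypes.c_double), ctypes.sizeof(ctypes.c_int32))
```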

Pointers as arguments

C source file

// divide.c
int divide(int a, int b, int *remainder) {
    int quot = a / b;
    *remainder = a % b;
    return quot;
}

Compile

gcc -o libdivide.so -shared -fPIC divide.c

Call

import ctypes
_lib = ctypes.cdll.LoadLibrary('./libdivide.so')
_divide = _lib.divide
_divide.argtypes = (ctypes.c_int, ctypes.c_int, ctypes.POINTER(ctypes.c_int))
_divide.restype = ctypes.c_int

def divide(x, y):
    rem = ctypes.c_int()
    quot = _divide(x, y, rem)
    return quot, rem.value

>>> divide(42, 8)
(5, 2)
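The out-parameter pattern can also be tried against a function that ships with the system. The C math library's `frexp` returns a mantissa and writes the exponent through an `int *`, which mirrors the shape of divide above. The sketch assumes a POSIX system where the math library can be located, and it passes the pointer explicitly with `ctypes.byref` instead of relying on automatic conversion.

```python
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library('m'))  # POSIX assumption

# double frexp(double x, int *exp);
_frexp = libm.frexp
_frexp.argtypes = (ctypes.c_double, ctypes.POINTER(ctypes.c_int))
_frexp.restype = ctypes.c_double


def frexp(x):
    """Split x into (mantissa, exponent) with x == mantissa * 2**exponent."""
    exponent = ctypes.c_int()                      # storage the C code writes to
    mantissa = _frexp(x, ctypes.byref(exponent))   # pass its address
    return mantissa, exponent.value


print(frexp(8.0))
```

The result can be compared with `math.frexp` from the standard library, which wraps the same C function.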

Arrays as arguments

C source file

// avg.c
double avg(double *a, int n) {
    int i;
    double total = 0.0;
    for (i = 0; i < n; i++) {
        total += a[i];
    }
    return total / n;
}

Compile

gcc -o libavg.so -shared -fPIC avg.c

Call

import ctypes
_lib = ctypes.cdll.LoadLibrary('./libavg.so')

class DoubleArray:
    def from_param(self, param):
        return ((ctypes.c_double)*len(param))(*param)

array = DoubleArray()
_avg = _lib.avg
_avg.argtypes = (array, ctypes.c_int)
_avg.restype = ctypes.c_double

def avg(values):
    return _avg(values, len(values))

>>> avg([1, 2, 3, 4, 5, 6])
3.5
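The expression inside from_param is where the Python list becomes a C array. The sketch below isolates that step so it can be inspected without the shared library; the name `DoubleArray6` is only illustrative.

```python
import ctypes

values = [1, 2, 3, 4, 5, 6]

# Multiplying a ctypes type by an integer creates a fixed-length array type
DoubleArray6 = ctypes.c_double * len(values)

# Instantiating it copies the Python numbers into contiguous C doubles
arr = DoubleArray6(*values)

print(list(arr))                     # elements come back as Python floats
print(len(arr), ctypes.sizeof(arr))  # 6 elements of 8 bytes each

# A C function declared as taking double * sees the array as a pointer
# to its first element; cast makes that view explicit on the Python side.
ptr = ctypes.cast(arr, ctypes.POINTER(ctypes.c_double))
print(ptr[2])
```

Because the array is a copy, changes made to it by C code are not reflected in the original Python list.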

Structures as arguments

C source file

// dist.c

#include <math.h>

typedef struct Point {
    double x,y;
} Point;

double distance(Point *p1, Point *p2) {
    return hypot(p1->x - p2->x, p1->y - p2->y);
}

Compile

gcc -o libdist.so -shared -fPIC dist.c

Call

import ctypes
_lib = ctypes.cdll.LoadLibrary('./libdist.so')

class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_double),
                ('y', ctypes.c_double)]

distance = _lib.distance
distance.argtypes = (ctypes.POINTER(Point), ctypes.POINTER(Point))
distance.restype = ctypes.c_double

>>> p1 = Point(1, 2)
>>> p2 = Point(4, 5)
>>> distance(p1, p2)
4.242640687119285
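The Structure subclass itself can be examined without the shared library. The sketch below repeats the Point definition and looks at its memory layout and at how a pointer to it behaves; the distance is checked with `math.hypot` on the Python side purely for comparison.

```python
import ctypes
import math


class Point(ctypes.Structure):
    # Field order and types must match the C struct exactly
    _fields_ = [('x', ctypes.c_double),
                ('y', ctypes.c_double)]


p1 = Point(1, 2)          # positional arguments follow the _fields_ order
p2 = Point(y=5, x=4)      # keyword arguments are accepted as well

print(ctypes.sizeof(Point))                  # two 8-byte doubles
print(math.hypot(p1.x - p2.x, p1.y - p2.y))  # same formula as distance()

# pointer() creates a Point *; .contents refers to the same memory as p1
ptr = ctypes.pointer(p1)
ptr.contents.x = 10.0
print(p1.x)
```

Writing through the pointer changes p1 itself, which is the same behavior a C function receiving `Point *` would have.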

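9.4.2 Wrapping C++ Classes

ctypes cannot call C++ code directly, so a C++ class has to be exposed through C functions before it can be called
To use it as a class, the C functions exported by the shared library must be mapped by hand onto a Python class
C++ source file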

// rectangle.cpp

class Rectangle
{
public:
    Rectangle(float, float);
    double area();
    double perimeter();

private:
    float width_;
    float height_;
};

Rectangle::Rectangle(float width, float height)
{
    width_ = width;
    height_ = height;
}

double Rectangle::area()
{
    return width_ * height_;
}

double Rectangle::perimeter()
{
    return 2 * width_ + 2 * height_;
}

// Compile the following functions with C linkage
extern "C"
{
    Rectangle *Rectangle_new(double width, double height)
    {
        return new Rectangle(width, height);
    }
    double area(Rectangle *rect)
    {
        return rect->area();
    }
    double perimeter(Rectangle *rect)
    {
        return rect->perimeter();
    }
}

Compile

g++ -o librectangle.so -shared -fPIC rectangle.cpp

Mapping onto a Python class

import ctypes

_lib = ctypes.cdll.LoadLibrary('./librectangle.so')

class Rectangle:
    def __init__(self, width, height):
        self._methods = dict()
        self._methods['area'] = _lib.area
        self._methods['perimeter'] = _lib.perimeter

        _lib.Rectangle_new.argtypes = (ctypes.c_double, ctypes.c_double)
        _lib.Rectangle_new.restype = ctypes.c_void_p

        _lib.area.argtypes = (ctypes.c_void_p,)
        _lib.area.restype = ctypes.c_double

        _lib.perimeter.argtypes = (ctypes.c_void_p,)
        _lib.perimeter.restype = ctypes.c_double

        self.obj = _lib.Rectangle_new(width, height)
        self._m_name = None

    def __getattr__(self, attr):
        self._m_name = attr
        return self.__call_method

    def __call_method(self, *args):
        return self._methods[self._m_name](self.obj, *args)

Call

>>> rect = Rectangle(3, 5)
>>> rect.area()
15.0
>>> rect.perimeter()
16.0

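9.5 Building Python Extensions with the C API

9.5.1 Steps for Building a Python Extension

Core idea
Pass Python data into the C environment, call the corresponding C functions there, then convert the result back to a Python type and return it to the Python environment
Expose the C functions in a form that Python can recognize
Steps
Write and compile the C library
- gcc
Write the extension functions
- static PyObject *extend_fun(PyObject *self, PyObject *args);
Write the module definition
- static struct PyModuleDef module_name = {...}
- Module method table
  - static PyMethodDef methods[] = {{},...}
Write the module initialization function
- PyMODINIT_FUNC PyInit_testlib(void){return PyModule_Create(&module_name);}
Build (install) the extension
- setup.py
- python setup.py build_ext --inplace -f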

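9.5.2 Extension Functions

Defining an extension function

Parameters and return value
An extension function is a C function that takes a tuple of Python objects as its arguments and returns a new Python object
- static PyObject *extend_fun(PyObject *self, PyObject *args);
PyObject is the C data type that represents any Python object

Argument parsing

PyArg_ParseTuple converts Python values into their C representations; its arguments are a format string and the addresses of the C variables that receive the incoming data
The format string describes the Python types being passed in; for example, i stands for an integer and d for a double-precision floating-point number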

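Format unit	Python type	C type	Meaning
`s`	`str`	`const char *`	Converts a Unicode object to a C pointer to a character string
`s*`	`str` or `bytes`	`Py_buffer`	Accepts both Unicode objects and bytes-like objects
`s#`	`str` or read-only `bytes`	`const char *, int or Py_ssize_t`	The result is stored in two C variables: the first is a pointer to the C string, the second is its length
`z`	`str` or `None`	`const char *`	Like `s`, but the C pointer is set to `NULL` when the Python object is `None`
`U`	`str`	`PyObject *`	Requires the Python object to be a Unicode object; no conversion is performed
`S`	`bytes`	`PyBytesObject *`	Requires the Python object to be a bytes object; no conversion is performed
`b`	`int`	`unsigned char`	Converts a non-negative Python integer to an `unsigned char`
`h`	`int`	`short int`	Converts a Python integer to a C `short int`
`i`	`int`	`int`	Converts a Python integer to a C `int`
`l`	`int`	`long int`	Converts a Python integer to a C `long int`
`c`	`bytes` or `bytearray`	`char`	Converts a Python bytes or bytearray object of length 1 to a C `char`
`C`	`str`	`int`	Converts a Python `str` of length 1 to a C `int`
`f`	`float`	`float`	Converts a Python floating-point number to a C `float`
`d`	`float`	`double`	Converts a Python floating-point number to a C `double`
`D`	`complex`	`Py_complex`	Converts a Python complex number to a C `Py_complex` structure
`O`	`object`	`PyObject *`	Stores the Python object (without conversion) in a C object pointer
`O!`	`object`	`typeobject, PyObject *`	Like `O`, but takes two C arguments: the first is the address of a Python type object, the second is the address of the C variable that receives the object pointer
`O&`	`object`	`converter, anything`	Converts a Python object to a C variable through a `converter` function. Takes two arguments: the first is the function, the second is the address of a C variable (of any type), cast to `void *`
`p`	`bool`	`int`	Tests the value for truth and converts the result to the C integer 1 or 0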

// (1) The Python argument list contains two integers
int x, y;
PyArg_ParseTuple(args, "ii", &x, &y);

// (2) The Python argument list contains three arguments: a string, an integer, and a float
const char* x;
int y;
double z;
PyArg_ParseTuple(args, "sid", &x, &y, &z);

// (3) The Python argument is a list
PyObject *seq;
PyArg_ParseTuple(args, "O", &seq);
seq = PySequence_List(seq);
int seqlen = PySequence_Length(seq);

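Building return values

Py_BuildValue creates a Python object from C values. It also takes a format string describing the expected types, and works in the opposite direction to PyArg_ParseTuple

Call	Resulting Python value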
`Py_BuildValue("")`	`None`
`Py_BuildValue("i", 123)`	`123`
`Py_BuildValue("iii", 123, 456, 789)`	`(123, 456, 789)`
`Py_BuildValue("s", "hello")`	`'hello'`
`Py_BuildValue("y", "hello")`	`b'hello'`
`Py_BuildValue("ss", "hello", "world")`	`('hello', 'world')`
`Py_BuildValue("s#", "hello", 4)`	`'hell'`
`Py_BuildValue("y#", "hello", 4)`	`b'hell'`
`Py_BuildValue("()")`	`()`
`Py_BuildValue("(i)", 123)`	`(123,)`
`Py_BuildValue("(ii)", 123, 456)`	`(123, 456)`
`Py_BuildValue("(i,i)", 123, 456)`	`(123, 456)`
`Py_BuildValue("[i,i]", 123, 456)`	`[123, 456]`
`Py_BuildValue("{s:i,s:i}", "abc", 123, "def", 456)`	`{'abc': 123, 'def': 456}`
`Py_BuildValue("((ii)(ii)) (ii)", 1, 2, 3, 4, 5, 6)`	`(((1, 2), (3, 4)), (5, 6))`

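9.5.3 Module Configuration and Initialization

The module definition is a special structure of type PyModuleDef. Its important members are
m_base: initializes the common module information; usually set to PyModuleDef_HEAD_INIT
m_name: the module name;
m_doc: the module's docstring;
m_size: the size of the per-interpreter module state; -1 means state is kept in global variables;
m_methods: the module's method table, an array of PyMethodDef structures
- ml_name: the function name visible from Python;
- ml_meth: the extension function;
- ml_flags: how arguments are passed to the extension function; usually METH_VARARGS, meaning the extension function receives the two parameters self and args (see part 1 of Section 9.5.2);
- ml_doc: the function's docstring.
Defining the module initialization function
It is called when the module is imported in Python with an import statement
It must be named PyInit_XXX, where XXX is the module name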

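9.5.4 Building and Installing the Extension

Write a setup.py file
Run python setup.py install on the command line to compile the extension module and install it into the Python environment
Alternatively, run python setup.py build_ext --inplace to produce only the shared library file that Python can import directly

9.5.5 Example

C library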

// testlib.c
#include <math.h>

// Greatest common divisor
int gcd(int x, int y) {
    int g = y;
    while (x > 0) {
        g = x;
        x = y % x;
        y = g;
    }
    return g;
}

// Division
int divide(int a, int b, int *remainder) {
    int quot = a / b;
    *remainder = a % b;
    return quot;
}

// Mean of an array
double avg(double *a, int n) {
    int i;
    double total = 0.0;
    for (i = 0; i < n; i++) {
        total += a[i];
    }
    return total / n;
}

// Structure
typedef struct Point {
    double x,y;
} Point;

// Structure as an argument
double distance(Point *p1, Point *p2) {
    return hypot(p1->x - p2->x, p1->y - p2->y);
}

Compile

gcc -fPIC -o testlib.o -c testlib.c

The header file of the C library lets the extension functions call into the library

// testlib.h

#include <math.h>

extern int gcd(int, int);
extern int divide(int a, int b, int *remainder);
extern double avg(double *a, int n);

typedef struct Point {
    double x,y;
} Point;

extern double distance(Point *p1, Point *p2);

Writing the extension module

// testlibpy.c

#include "Python.h"
#include "testlib.h"

// Plain function
static PyObject *py_gcd(PyObject *self, PyObject *args)
{
    int x, y, result;

    if (!PyArg_ParseTuple(args, "ii", &x, &y))
    {
        return NULL;
    }
    result = gcd(x, y);
    return Py_BuildValue("i", result);
}

// Pointer as an argument
static PyObject *py_divide(PyObject *self, PyObject *args)
{
    int a, b, quotient, remainder;
    if (!PyArg_ParseTuple(args, "ii", &a, &b))
    {
        return NULL;
    }
    quotient = divide(a, b, &remainder);
    return Py_BuildValue("(ii)", quotient, remainder);
}

// List as an argument
static PyObject *py_avg(PyObject *self, PyObject *args)
{
    PyObject *seq;
    double *dbar, result;
    int seqlen;

    // Get the sequence argument
    if (!PyArg_ParseTuple(args, "O", &seq))
        return NULL;
    seq = PySequence_List(seq);
    if (seq == NULL)
        return NULL;

    // Copy the sequence into a double array
    seqlen = PySequence_Length(seq);
    dbar = malloc(seqlen * sizeof(double));
    for (int i = 0; i < seqlen; i++)
    {
        // PyList_GetItem returns a borrowed reference; PyFloat_AsDouble
        // also accepts ints, so no temporary float object is created
        PyObject *item = PyList_GetItem(seq, i);
        dbar[i] = PyFloat_AsDouble(item);
    }

    // Release the list, then compute and return the result
    Py_DECREF(seq);
    result = avg(dbar, seqlen);
    free(dbar);
    return Py_BuildValue("d", result);
}

// Destructor for Point objects
static void del_Point(PyObject *obj)
{
    free(PyCapsule_GetPointer(obj, "Point"));
}

// Structure: create a Point object
static PyObject *py_Point(PyObject *self, PyObject *args)
{
    Point *p;
    double x, y;
    if (!PyArg_ParseTuple(args, "dd", &x, &y))
    {
        return NULL;
    }
    p = (Point *)malloc(sizeof(Point));
    p->x = x;
    p->y = y;

    // Create a capsule, a Python object that wraps a C pointer
    return PyCapsule_New(p, "Point", del_Point);
}

// Structure as an argument
static PyObject *py_distance(PyObject *self, PyObject *args)
{
    Point *p1, *p2;
    PyObject *py_p1, *py_p2;
    double result;

    if (!PyArg_ParseTuple(args, "OO", &py_p1, &py_p2))
    {
        return NULL;
    }

    // Extract the pointers from the capsules
    p1 = (Point *)PyCapsule_GetPointer(py_p1, "Point");
    p2 = (Point *)PyCapsule_GetPointer(py_p2, "Point");

    result = distance(p1, p2);
    return Py_BuildValue("d", result);
}

// Module method table
static PyMethodDef methods[] = {
    {"gcd", py_gcd, METH_VARARGS, "greatest common divisor"},
    {"divide", py_divide, METH_VARARGS, "integer division"},
    {"avg", py_avg, METH_VARARGS, "list average"},
    {"distance", py_distance, METH_VARARGS, "point distance"},
    {"Point", py_Point, METH_VARARGS, "point"},
    {NULL, NULL, 0, NULL}};

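// Module definition
static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "testlib",        // module name
    "A lib for test", // module docstring
    -1,               // size of the module state; -1 means state is kept in global variables
    methods};

// Module initialization function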
PyMODINIT_FUNC PyInit_testlib(void)
{
    return PyModule_Create(&module);
}

The setup file

# setup.py
from setuptools import setup, Extension

setup(name='testlib',
      ext_modules=[
          Extension('testlib',
                    ['testlibpy.c'],
                    include_dirs=['/path/to/python/env/include/python3.8'],
                    # Link the precompiled object file directly; `libraries`
                    # expects library names (such as 'm'), not object files
                    extra_objects=['/path/to/file/testlib.o']
                    )
      ]
      )

Building the extension module

python setup.py build_ext --inplace

Call

>>> from testlib import *
>>> gcd(35,42)
7
>>> divide(42, 8)
(5, 2)
>>> avg([1, 2, 3, 4, 5, 6])
3.5
>>> p1 = Point(1, 2)
>>> p2 = Point(4, 5)
>>> distance(p1, p2)
4.242640687119285
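A quick way to sanity-check the extension is to compare its results with pure-Python equivalents. The sketch below computes the reference values with the standard library only, so it runs even before the extension has been built; the dictionary name `expected` is illustrative. One caveat when extending the comparison: C integer division truncates toward zero while Python's divmod floors, so the two agree only for non-negative operands.

```python
import math
from statistics import mean

# Reference values computed with the standard library only
expected = {
    'gcd': math.gcd(35, 42),
    'divide': divmod(42, 8),               # (quotient, remainder)
    'avg': mean([1, 2, 3, 4, 5, 6]),
    'distance': math.hypot(1 - 4, 2 - 5),  # distance between (1, 2) and (4, 5)
}

for name, value in expected.items():
    print(f'{name}: {value}')
```

Once testlib has been built, each value can be compared with the corresponding call shown in the session above, using `math.isclose` for the floating-point results.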

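9.6 Packaging and Publishing Projects

PyPI
The official repository for third-party Python packages
Third-party packages installed with pip come from PyPI

9.6.1 The Packaging and Publishing Workflow

The packaging and publishing workflow
Register a PyPI account: sign up on the official PyPI site
Configure the project: write and configure setup.py and the other related files
Package: run setup.py to package the project in the required format
Publish: upload the package with twine. The first upload creates the project automatically and requires that its name not clash with any package already on PyPI; every later upload requires a new version number
Registering a PyPI account

Project configuration
Common setup parameters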

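Parameter	Purpose
`description`	Short description of the package
`long_description`	Detailed description of the package, usually read from a Markdown file
`long_description_content_type`	Format of the detailed description, usually text/markdown
`classifiers`	Classifiers for the project, which PyPI uses to categorize it
`keywords`	List of project keywords
`packages`	List of packages in the project (folders containing `__init__.py`)
`py_modules`	Names of standalone module files outside any package
`package_data`	Data files required by the package
`data_files`	Data files to be packaged, such as images and configuration files
`ext_modules`	Configuration of extension modules
`install_requires`	List of other packages this package depends on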

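Packaging
Run setup.py
Common subcommands
- install: packages the project and installs the files from the build directory into the current Python environment
- build: builds all files needed to install the package
- clean: removes temporary files produced during packaging
- check: checks the configuration for errors
- build_py: builds pure Python modules
- build_ext: builds C/C++ extension modules
- sdist: creates a source distribution
- bdist: creates a binary distribution
- bdist_wheel: creates a distribution in wheel format
- bdist_egg: creates a distribution in egg format
Publishing
python -m twine upload dist/*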

9.6.2 Packaging and Publishing Example

# File: testlibpy/functions.py
import numpy as np

def add(x, y):
    return x + y

def average(lst):
    return np.average(lst)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def distance(p1, p2):
    return np.sqrt((p1.x - p2.x)**2 + (p1.y - p2.y)**2)

# File: testlibpy/__init__.py
from .functions import *

# File: setup.py
from setuptools import setup, find_packages

desc = 'A brief description of the package'
long_description = open('./README.md').read()

setup(
    name="testlibpy",
    version="0.0.1",
    author="pystudy",
    author_email="xxxxxx@xxxxxx.com",
    license='MIT',
    description=desc,
    long_description=long_description,
    long_description_content_type='text/markdown',
    url="https://xxxxxx.com/testlibpy",
    classifiers=[ 'Development Status :: 3 - Alpha',
                  'Programming Language :: Python :: 3',
                  'Operating System :: OS Independent'],
    packages=find_packages(include=['testlibpy']), 
    install_requires=['numpy > 1.15'],
    python_requires='>=3.6',
)

Package: $ python setup.py sdist bdist_wheel
Publish: $ python -m twine upload dist/*
