使用Python进行分子动力学模拟

不完全翻译自https://klyshko.github.io/teaching/2019-03-01-teaching

1. 介绍

本文的目的是介绍大分子的分子动力学模拟，将会学习怎样使用python语言进行模拟，同时文章会使用numpy等几个新的模块例如openmm进行MD模拟。

2.牛顿运动定理

牛顿第二定律将物体的动力学与力学联系起来，并且定义了物体位置的动态变化：

$$ m\frac{d^2r(t)}{dt^2} = F = - \nabla{U(r)}, $$

其中$m$ 表示质量，$r$ 表示位置，$F$表示力，${U(r)}$为势能，这只依赖于物体的位置。如果知道作用在物体上的力，那么可以随时发现物体的位置$r(t)$ ,例如预测其动力学。这可以通过求解牛顿公式来实现。他是一个二阶的常微分方程(ODE)，可以通过几个简单的例子来分解。常力，谐振子，周期性力，拖拽力(constant force, harmonic oscillator, periodic force, drag force)等等,当然一种变通方法是用计算机来求解ODE。

3. 粒子的动力学模拟

这里有许多方法进行ODE的解法。将两个二阶方程转化为一个一阶方程可以使用如下方法: $$ \frac{dr(t)}{dt} = v(t) $$

$$ m\frac{dv(t)}{dt} = F(t) $$

我们可以使用有限差来近似的得到一个简单的正向欧拉算法(Euler Algorithm)： $$ v_{n+1} = v_n + \frac{F_n}{m} dt $$

$$ r_{n+1} = r_n + v_{n+1} dt $$

在这里我们将时间t离散化为时间步长$dt$ , 所以$t_{n+1} = t_n + dt$并且$r_{n} = r(t_n)$ ,$v_{n} = v(t_n)$ 。其中n是时间步数。

例子 3.1 模拟在地球上的抛物线

我们想知道一个0.3kg的绿色苹果从553米高的多伦多CN铁塔水平以10cm/s的速度抛下，求前10s的动力学。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50


import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

fig, ax = plt.subplots(figsize=(8, 8))

ax.set(xlim=(-2, 2), ylim=(0, 600), xlabel="Position, meters", ylabel='Height, meters',
       title='Apple falling from CN tower')

# 问题的参数
T = 10.  # s
m = 0.3  # kg
g = 9.8  # m/s^2
v0x = -0.1  # m/s
H = 553.  # m

# 设置步长为50ms
dt = 0.05  # s
N = int(T / dt)

# 分配数组
v = np.zeros((N + 1, 2))
r = np.zeros((N + 1, 2))
f = np.zeros((N + 1, 2))

# 初始条件
r[0] = np.array([0., H])
v[0] = np.array([-v0x, 0.])

# 重力
f[:] = np.array([0., -m * g])

## 运行动力学:正向欧拉算法
for n in range(N):
    v[n + 1] = v[n] + f[n] / m * dt
    r[n + 1] = r[n] + v[n + 1] * dt

## 绘制第一个数据点
scat = ax.scatter(r[0, 0], r[0, 1], marker='o', c='g', s=200)


## animating
def animate(i):
    scat.set_offsets(r[i])


ani = animation.FuncAnimation(fig, func=animate, frames=N)
ani.save('CNtower.html', writer=animation.HTMLWriter(fps=1 // dt))

plt.close()

当一个封闭的系统中的粒子之间相互联系是通过两两的电势，每个粒子i上的力依赖于其相对于其他粒子j的相对位置： $$ m_i\frac{d^2r_i(t)}{dt^2} = \sum_jF_{ij}(t) = -\sum_j\nabla_i{U(|r_{ij}(t)|)} $$ 其中$r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$为粒子i和j之间的距离，$i,j \in (1,N)$

例子 3.2 用胡克定律模拟3体问题

我们想知道3个$m=1kg$的粒子用看不见的$K_s = 5N/m$弹簧进行连接的前10秒运动轨迹，$r_0=1$，3个粒子的初始位置分别为(0, 2), (2, 0) 和 (-1, 0) 。

提示: 成对电势用胡克定律公式为: $U(r_{ij}) = \frac{K_s}{2}(r_{ij} - r_0)^2$

势的负梯度(negative gradient)是从第j到第i之间的力: $$ \mathbf{F_{ij}} = - \nabla_i{U(r_{ij})} = - K_s (r_{ij} - r_0) \nabla_i r_{ij} = - K_s (r_{ij} - r_0) \mathbf{r_{ij}} / r_{ij} $$

4. 蛋白，结构和功能

4.1 蛋白质结构等级

4.2 功能

抗体 - 结合外来颗粒，ex: IgG
酶 - 加速化学催化, ex: Lysozyme
信使- 传递信号, ex: Growth hormone
结构组件- 支撑细胞, ex: Tubulin
运输/储存 - 结合和搬运小分子, ex: Hemoglobin

5. 分子机制

既然我们现在知道了蛋白质是什么，以及为什么这些分子机制是重要的，我们考虑他们的建模方法。其基本思想是创建与我们在3体模拟中使用的相同的方法。现在，我们的系统由数千个粒子(蛋白质的原子加上周围水的原子)组成，它们都通过一个复杂的势能函数连接在一起。

全原子势能函数$V$ 通常由所有的键能($V_b$)和非键能($V_{nb}$)之和。例如: $$ V = V_{b} + V_{nb}, $$ 键能包括谐波键（harmonic (covalent) bond），角和两种类型的扭转角（二面角，dihedral）: 恰当和不恰当的(?proper and improper)。 $$ V_{b} = \sum_{bonds}\frac{1}{2}K_b(b-b_0)^2 + \sum_{angles}K_{\theta}(\theta-\theta_0)^2 + \sum_{dihedrals}K_{\phi}(1-cos(n\phi - \phi_0)) + \sum_{impropers}K_{\psi}(\psi-\psi_0)^2 $$ $b$和$\theta$表示两个原子之间的距离和两个相邻键之间的夹角。$\phi$和$\psi$为二面角。这些可以从所有原子的当前位置进行计算。$K_b$,$K_{\theta}$,$K_{\phi}$和$K_{\psi}$为弹簧常数，与键的振动，键的弯曲，以及在一些平衡值附近的二面角和不恰当角的构象波动有关构象的波动相关，分别用$b_0$,$\theta_0$，$\phi_0$和$\psi_0$相关

势能函数的非成键部分用静电势和范德华势表示，即:

$$ V_{nb} = \sum_{i,j}\left(\frac{q_{i}q_{j}}{4\pi\varepsilon_{0}\varepsilon r_{ij}} + \varepsilon_{ij}\left[\left(\frac{\sigma^{min}{ij}}{r{ij}}\right)^{12}-2\left(\frac{\sigma^{min}{ij}}{r{ij}}\right)^{6}\right]\right) $$

$r_{ij}$表示两个联系的原子之间的距离，$q_i$和$q_j$表示他们的电荷，$\varepsilon$和$\varepsilon_0$为电常数和介电常数，$\varepsilon_{ij} = \sqrt{\varepsilon_i\varepsilon_j}$和$\sigma_{ij} = \frac{\sigma_i + \sigma_j}{2}$表示原子$i$和$j$之间的范德华力参数.

重要的是，每个力场都有自己的一组参数，这些参数对于不同类型的原子是不同的。

6. 蛋白的分子机制

以下内容不是我关心的，所以代码内容没有进行详实的翻译。

分子模拟是一种研究原子和分子物理运动的计算机模拟方法。

在最常见的版本中，原子和分子的轨迹是通过数值求解牛顿的相互作用粒子系统的运动方程来确定的，通常使用分子力学力场计算粒子及其势能之间的力。

那么现在我们可以进行分子动力学模拟，我们仅需要初始的蛋白结构和软件来计算其动态过程。

步骤如下:

加载初始坐标的蛋白原子
选择力场参数
选择实验参数，例如温度，压力，盒子大小，溶剂，边界条件
选择积分器，即求解运动的算法
运行模拟，保存每个时间的坐标
可视化轨迹
进行分析

本文将会使用如下软件:

nglview 用于可视化分子
MDAnalysis 用于分析模拟轨迹
openMM 用于进行模拟

1
2
3
4
5
6


from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *
import MDAnalysis as md
import nglview as ng
from sys import stdout

赋予不同文件

链接：https://pan.baidu.com/s/1fvXKY6kXdloR-Gi_uK0A7A 提取码：794y –来自百度网盘超级会员V7的分享

1
2
3
4


pdb0_file = 'data/villin_water.pdb'
pdb1_file = 'data/polyALA.pdb'
pdb2_file = 'data/polyGLY.pdb'
pdb3_file = 'data/polyGV.pdb'

pdb文件类似如下:

1
2
3
4
5
6


file0 = open(pdb0_file, 'r')
counter = 0
for line in file0:
    if counter < 10:
        print(line)
    counter += 1

输出:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


REMARK    GENERATED BY TRJCONV

HEADER    Villin N68H in explicit water

REMARK    THIS IS A SIMULATION BOX

CRYST1   49.163   45.981   38.869  90.00  90.00  90.00 P 1           1

MODEL        0

ATOM      1  N   LEU     1      25.160  14.160  19.440  1.00  0.00

ATOM      2  H1  LEU     1      24.350  13.730  19.870  1.00  0.00

ATOM      3  H2  LEU     1      25.980  13.680  19.760  1.00  0.00

ATOM      4  H3  LEU     1      25.180  15.100  19.810  1.00  0.00

ATOM      5  CA  LEU     1      25.090  13.920  17.980  1.00  0.00

我们可以通过nglview进行查看:

1
2


u = md.Universe(pdb0_file)
ng.show_mdanalysis(u, gui=True)

6.1 蛋白折叠成α螺旋的模拟

运行完全伸展的ployA polyALA.pdb 在真空环境中运行400ps，温度为T=300，查看是否能够折叠成二级结构:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32


### 1.loading initial coordinates
pdb = PDBFile(pdb3_file) 

### 2.choosing a forcefield parameters
ff = ForceField('amber10.xml')  
system = ff.createSystem(pdb.topology, nonbondedMethod=CutoffNonPeriodic)

### 3. Choose parameters of the experiment: temperature, pressure, box size, solvation, boundary conditions, etc
temperature = 300*kelvin
frictionCoeff = 1/picosecond
time_step = 0.002*picoseconds
total_steps = 400*picoseconds / time_step

### 4. Choose an algorithm (integrator)
integrator = LangevinIntegrator(temperature, frictionCoeff, time_step)

### 5. Run simulation, saving coordinates time to time:

### 5a. Create a simulation object
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

### 5b. Minimize energy
simulation.minimizeEnergy()

### 5c. Save coordinates to dcd file and energies to a standard output console:
simulation.reporters.append(DCDReporter('data/polyALA_traj.dcd', 1000))
simulation.reporters.append(StateDataReporter(stdout, 5000, step=True, potentialEnergy=True,\
                                              temperature=True, progress=True, totalSteps = total_steps))

### 5d. Run!
simulation.step(total_steps)

可视化

查看轨迹:

1
2
3


### 6. Visualization
sys = md.Universe(pdb3_file, 'data/polyALA_traj.dcd')
ng.show_mdanalysis(sys, gui=True)

6.2 分析MD轨迹

端对端距离:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


### analysis of end-to-end distance

## choose terminal atoms 
N_terminus = sys.select_atoms('resid 1 and name N')
C_terminus = sys.select_atoms('resid 25 and name C')

## go through the whole trajectory and compute distance between them for every frame
dist = []
for frame in sys.trajectory:
    dist.append(np.linalg.norm(N_terminus.positions - C_terminus.positions))

## the result is in the dist array    
dist = np.array(dist) 

氢键数量:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


from MDAnalysis.analysis import hbonds ## module for analysis of hydrogen bonds

## compute information about hbonds and write it to the 'hb.timeseries'
hb = hbonds.hbond_analysis.HydrogenBondAnalysis(sys)
hb.run()

## print information for the first 10 frames
for frame in hb.timeseries[:10]:
    print(frame)
    
    
## go through the 'hb.timeseries' file and calculate number of bonds for each time frame 
## (it's the length of array frame)
hb_number = []
for frame in hb.timeseries:
    hb_number.append(len(frame))
    
## the result is in the number array     
hb_number = np.array(hb_number)

我们可以绘制端到端距离和氢键的数量与时间：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


plt.figure(figsize=(15,5))

plt.subplot(121)
plt.plot( dist, '-k' )
plt.xlabel('timesteps')
plt.ylabel('end-to-end distance, A')

plt.subplot(122)
plt.plot(hb_number, 'g-')
plt.ylabel('# of hydrogen bonds')
plt.xlabel('timesteps')

plt.show()

绘制拉式图:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


from MDAnalysis.analysis import dihedrals  ## module for dihedrals analysis

ram1 = dihedrals.Ramachandran(sys).run(0,30) ## analyse for first 30 steps (black color)
ram2 = dihedrals.Ramachandran(sys).run(170,200) ## analyse for last 30 steps (blue color)

## ramachandran plot
fig, ax = plt.subplots(figsize=(8,8))
ram1.plot(ax=ax, color='k', marker='.')
ram2.plot(ax=ax, color='b', marker='.')
ax.arrow(20, 20, -40, -40, width=2, head_width=8, head_length=12, fc='b', ec='b')
ax.text(30, 20, 'alpha region', color='blue', fontsize=20)
ax.arrow(20, 150, -40, 0, width=2, head_width=8, head_length=12, fc='k', ec='k')
ax.text(30, 150, 'beta region', fontsize=20)