Back

Modeller文档学习笔记

c此笔记暂未完成,完整更新请关注阅读原文或者查看Modeller文档

0. 安装

建议使用conda安装

1
2
conda config --add channels salilab
conda install modeller

之后在按照cmd上的提示编辑相应文件写入授权

1. 介绍

Modeller是一个蛋白三维建模的软件,其在学术上是免费的,但是商业上是收费的。其一般用于同源建模或者三维结构比较,许多计算软件如Discovery Studio,薛定谔,Model都内置了该软件。

当然其也可以进行限制配体建模,结合NMR实验进行建模等等,截止笔记时其版本为10.0,目前已经可以使用Python3.X脚本进行执行,原来的mod9.x的内置Python运行脚本方式已经被弃用了。

1.1 方法简介

  1. 模板3D结构迭代进入靶标
  2. 约束某些参数,如Calpha-Calpha 距离,氢键等等
  3. 3D建模所有满足的可能性

2. 使用AutoModel快速建模

2.1 基本使用

文档里演示包含如下几个文件1fdx序列,5fd1结构文件和alignment.ali比对文件,脚本如下:

model.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#导入类
from modeller import *
from modeller.automodel import *

log.verbose()  #记录日志
env = Environ()  #创建MODELLER环境

# 搜寻文件文件夹
env.io.atom_files_directory = ['.','../atom_files']

# 初始化automodel类
a = AutoModel(env,
			alnfile = 'alignment.ali',
			knowns = '5fd1',
			sequence = '1fdx')

#这里只构建了一个模型
a.starting_model = 1
a.ending_model  =  1

a.make()

alignment.ali

1
2
3
4
5
6
7
>P1;5fd1
structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELAEVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence:1fdx:1    :A:54   :A:ferredoxin:Peptococcus aerogenes: 2.00:-1.00
AYVINDSC--IACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED------------------------------------------------*

2.2 高级用法

2.2.1 保留水分子,HETATM残基和氢键

只需要在ali文件中用点(.)作为标识,然后在脚本中设置hetatm=True即可

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#导入类
from modeller import *
from modeller.automodel import *

log.verbose()  #记录日志
env = Environ()  #创建MODELLER环境

# 搜寻文件文件夹
env.io.atom_files_directory = ['.','../atom_files']

#读取HETATM文件
env.io.hetatm = True

# 初始化automodel类
a = AutoModel(env,
			alnfile = 'alignment.ali',
			knowns = '5fd1',
			sequence = '1fdx')

#这里只构建了一个模型
a.starting_model = 1
a.ending_model  =  1

a.make()

addligand.ali

1
2
3
4
5
6
7
>P1;5fd1
structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELAEVWPNITEKKDPLPDAEDWDGVKGKLQHLER..*

>P1;1fdx
sequence:1fdx:1    :A:54   :A:ferredoxin:Peptococcus aerogenes: 2.00:-1.00
AYVINDSC--IACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED------------------------------------------------..*

2.2.2 改变默认optimization和refinement工具

先看例子:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#导入类
from modeller import *
from modeller.automodel import *

log.verbose()  #记录日志
env = Environ()  #创建MODELLER环境

# 降低soft-sphere的约束权重
env.schedule_scale = physical.Values(default=1.0, soft_sphere=0.7)
# 搜寻文件文件夹
env.io.atom_files_directory = ['.','../atom_files']

#读取HETATM文件
env.io.hetatm = True

# 初始化automodel类
a = AutoModel(env,
			alnfile = 'alignment.ali',
			knowns = '5fd1',
			sequence = '1fdx')

#这里只构建了一个模型
a.starting_model = 1
a.ending_model  =  1

# 通过VTFM优化:
a.library_schedule = autosched.slow
a.max_var_iterations = 300

# MD优化
a.md_level = refine.slow

# 重复循环2次,并且除非obj.func>1E6否则不停止
a.repeat_optimization = 2
a.max_molpdf = 1e6

a.make()

library_schedule:

VTFM优化程度,(autosched.slow, autosched.normal[默认], autosched.fast, autosched.very fast, autosched.fastest)

max_var_iterations:

迭代数,数值越大精度越高

max_molpdf:

模型阈值,超过此值则VTFM则即刻终止

repeat_optimization:

重复优化次数,默认只重复一次

md_level:

分子动力学模拟, (None[默认], refine.very fast, refine.fast, refine.slow, refine.very slow or refine.slow large)

2.2.3 快速模型优化

仅需在make前运行very_fast()方法

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#导入类
from modeller import *
from modeller.automodel import *

log.verbose()  #记录日志
env = Environ()  #创建MODELLER环境

# 降低soft-sphere的约束权重
env.schedule_scale = physical.Values(default=1.0, soft_sphere=0.7)
# 搜寻文件文件夹
env.io.atom_files_directory = ['.','../atom_files']

#读取HETATM文件
env.io.hetatm = True

# 初始化automodel类
a = AutoModel(env,
			alnfile = 'alignment.ali',
			knowns = '5fd1',
			sequence = '1fdx',
			assess_methods = assess.GA341)  #模型评价
			assess_methods = (assess.GA341,assess.DOPE))  #不同模型评价
			assess_methods = =soap_protein_od.Scorer())  #不同模型评价

a.very_fast()

#这里只能一个模型
a.starting_model = 2
a.ending_model  =  2

a.final_malign3d = True

a.make()

2.2.4 多模板建模

仅需knowns里面改为list即可

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
from modeller import *              
from modeller.automodel import *    

log.verbose()    
env = Environ() 


env.io.atom_files_directory = ['.', '../atom_files']

a = AutoModel(env,
              alnfile  = 'align-multiple.ali', # alignment filename
              knowns   = ('5fd1', '1bqx'),     # 模板风格代码
              sequence = '1fdx')               # code of the target

a.starting_model= 1                 
a.ending_model  = 1                 
                                    
a.make()                           

比对文件

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
>P1;5fd1
structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1bqx
structureN:1bqx:   1 :A: 77  :A:ferredoxin:Bacillus schlegelii:-1.00:-1.00
AYVITEPCIGTKCASCVEVCPVDCIHEGEDQYYIDPDVCIDCGACEAVCPVSAIYHEDFVPEEWKSYIQKNRDFF
KK-----------------------------*

>P1;1fdx
sequence:1fdx:1    : :54   : :ferredoxin:Peptococcus aerogenes: 2.00:-1.00
AYVINDSC--IACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED-----------------
-------------------------------*

2.2.5 生成氢键模型

如果要生成氢键模型,仅需要把AutoModel替换为AllHModel即可

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
from modeller import *
from modeller.automodel import *

log.verbose()
env = Environ()

env.io.atom_files_directory = ['.', '../atom_files']

a = AllHModel(env, alnfile='alignment.ali', knowns='5fd1', sequence='1fdx')
a.starting_model = a.ending_model = 4

a.make()

2.2.6 仅优化部分模型

部分优化建模需要对AutoModel的select_atoms改造,所以我们继承AutoModel创建一个子类,之后利用子类进行建模

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
from modeller import *
from modeller.automodel import *

log.verbose()

class MyModel(AutoModel):
	#覆盖父类的选择原子
	def select_atoms(self):
	    # 选择链A的1-2原子
		return Selection(self.residue_range('1:A','2:A'))
		
		# 选择链A的4,6,10原子
		#return Selection(self.residues['4:A'],
		#self.residues['6:A'],
         #self.residues['10:A'])
         
         #如果要除去一些原子那么就这样
         # return Selection(self) - Selection(self.residue_range('1:A', '5:A'))
         
env = Environ()

env.io.atom_files_directory = ['.', '../atom_files']
# 是否关闭选择的原子的动态联系
env.edat.nonbonded_sel_atoms = 2

a = MyModel(env, alnfile='alignment.ali', knowns='5fd1', sequence='1fdx')

a.starting_model = a.ending_model = 4

a.make()

nonbonded_sel_atoms:

默认为1,设置为2就被关闭,我的理解是类似于虚拟原子,占了位置,但是补参与相互作用,个人觉得自己用的机会比较少

2.2.7 包括二硫键

我自己用不到,大家可以查看示例或者查看文档 :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Comparative modeling by the AutoModel class
from modeller import *              # Load standard Modeller classes
from modeller.automodel import *    # Load the AutoModel class

# Redefine the special_patches routine to include the additional disulfides
# (this routine is empty by default):
class MyModel(AutoModel):
    def special_patches(self, aln):
        # A disulfide between residues 8 and 45 in chain A:
        self.patch(residue_type='DISU', residues=(self.residues['8:A'],
                                                  self.residues['45:A']))

log.verbose()    # request verbose output
env = Environ()  # create a new MODELLER environment to build this model in

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

a = MyModel(env,
            alnfile  = 'alignment.ali',     # alignment filename
            knowns   = '5fd1',              # codes of the templates
            sequence = '1fdx')              # code of the target
a.starting_model= 1                 # index of the first model
a.ending_model  = 1                 # index of the last model
                                    # (determines how many models to calculate)
a.make()                            # do the actual comparative modeling

2.2.8 约束建模

主要是增加一个csrfile参数和文件

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# Modeling using a provided restraints file (csrfile)
from modeller import *
from modeller.automodel import *    # Load the AutoModel class

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

a = AutoModel(env,
              alnfile  = 'alignment.ali',     # alignment filename
              knowns   = '5fd1',              # codes of the templates
              sequence = '1fdx',              # code of the target
              csrfile  = 'my.rsr')            # use 'my' restraints file
a.starting_model= 1                 # index of the first model
a.ending_model  = 1                 # index of the last model
                                    # (determines how many models to calculate)
a.make()                            # do comparative modeling

csrfile一般格式如下:

第一行一般如下:

1
MODELLER5 VERSION: MODELLER FORMAT

第二行以R开头

剩下的还没研究,具体可以查看文档

2.2.9 默认基础上增加限制

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Addition of restraints to the default ones
from modeller import *
from modeller.automodel import *    # Load the AutoModel class

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

class MyModel(AutoModel):
    def special_restraints(self, aln):
        rsr = self.restraints
        at = self.atoms
#       Add some restraints from a file:
#       rsr.append(file='my_rsrs1.rsr')

#       Residues 20 through 30 should be an alpha helix:
        rsr.add(secondary_structure.Alpha(self.residue_range('20:A', '30:A')))
#       Two beta-strands:
        rsr.add(secondary_structure.Strand(self.residue_range('1:A', '6:A')))
        rsr.add(secondary_structure.Strand(self.residue_range('9:A', '14:A')))
#       An anti-parallel sheet composed of the two strands:
        rsr.add(secondary_structure.Sheet(at['N:1:A'], at['O:14:A'],
                                          sheet_h_bonds=-5))
#       Use the following instead for a *parallel* sheet:
#       rsr.add(secondary_structure.Sheet(at['N:1:A'], at['O:9:A'],
#                                         sheet_h_bonds=5))

#       Restrain the specified CA-CA distance to 10 angstroms (st. dev.=0.1)
#       Use a harmonic potential and X-Y distance group.
        rsr.add(forms.Gaussian(group=physical.xy_distance,
                               feature=features.Distance(at['CA:35:A'],
                                                         at['CA:40:A']),
                               mean=10.0, stdev=0.1))

a = MyModel(env,
            alnfile  = 'alignment.ali',     # alignment filename
            knowns   = '5fd1',              # codes of the templates
            sequence = '1fdx')              # code of the target
a.starting_model= 1                 # index of the first model
a.ending_model  = 1                 # index of the last model
                                    # (determines how many models to calculate)
a.make()       

比较好理解,这里就不详细学习和介绍了,也是用的类的继承。

2.2.10 构建多条链的建模

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
from modeller import *
from modeller.automodel import *    

log.verbose()

# 覆盖 'special_restraints' 和 'user_after_single_model' 方法:
class MyModel(AutoModel):
    def special_restraints(self, aln):
        # 约束AB链的Ca原子
        s1 = Selection(self.chains['A']).only_atom_types('CA')
        s2 = Selection(self.chains['B']).only_atom_types('CA')
        self.restraints.symmetry.append(Symmetry(s1, s2, 1.0))
    def user_after_single_model(self):
        # 建好以后报告对称性大于1A的情况:
        self.restraints.symmetry.report(1.0)

env = Environ()

env.io.atom_files_directory = ['.', '../atom_files']

a = MyModel(env,
            alnfile  = 'twochain.ali' ,    
            knowns   = '2abx',              
            sequence = '1hc9')             

a.starting_model= 1                
a.ending_model  = 1                
                                  
a.make()                          

比对需要注意的是两条链之间用斜杠/ 分开:

1
2
3
4
5
6
7
8
9
>P1;2abx
structureX:2abx:   1 :A:74 :B:bungarotoxin:bungarus multicinctus:2.5:-1.00
IVCHTTATIPSSAVTCPPGENLCYRKMWCDAFCSSRGKVVELGCAATCPSKKPYEEVTCCSTDKCNHPPKRQPG/
IVCHTTATIPSSAVTCPPGENLCYRKMWCDAFCSSRGKVVELGCAATCPSKKPYEEVTCCSTDKCNHPPKRQPG*

>P1;1hc9
sequence:1hc9:   1 :A:148:B:undefined:undefined:-1.00:-1.00
IVCHTTATSPISAVTCPPGENLCYRKMWCDVFCSSRGKVVELGCAATCPSKKPYEEVTCCSTDKCNPHPKQRPG/
IVCHTTATSPISAVTCPPGENLCYRKMWCDAFCSSRGKVVELGCAATCPSKKPYEEVTCCSTDKCNPHPKQRPG*

2.2.11 全自动比对和建模

如果没有初始的模型和靶标序列,Modeller可以为你全自动建模。

官方也提出了自动比对是非常危险的,若同源序列相似度低于50%,请不要贸然尝试

自动比对的方法也很简单,仅需要在make()前面加上auto_align()

?感觉这个就和swiss-model傻瓜建模类似了?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
from modeller import *
from modeller.automodel import *    # Load the AutoModel class

log.verbose()
env = Environ()

env.io.atom_files_directory = ['.', '../atom_files']

a = AutoModel(env,
              # 文件包含PDB codes和靶标序列
              alnfile  = 'alignment.seg',
              # PDB codes 模板
              knowns   = ('5fd1', '1fdn', '1fxd', '1iqz'),
              # 靶标模板
              sequence = '1fdx')
a.auto_align()                      # 进行自动序列比对
a.make()

alignment.seg文件:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
>P1;1fdx
sequence::::::ferredoxin:Peptococcus aerogenes:-1.00:-1.00
AYVINDSCIACGACKPECPVNIIQGSIYAIDADSCIDCGSCASVCPVGAPNPED*
>P1;1fdn
structureX:1fdn:FIRST:@:55:@:ferredoxin:Clostrodium acidiurici: 1.84:-1.0
*
>P1;5fd1
structureX:5fd1:FIRST:@:60:@:ferredoxin:Azotobacter vinelandii: 1.90:0.192
*
>P1;1fxd
structureX:1fxd:FIRST:@:58:@:ferredoxin:Desolfovibrio gigas: 1.70:-1.0
*
>P1;1iqz
structureX:1iqz:FIRST:@:60:@:ferredoxin:Bacillus thermoproteolyticus: 2.30:-1.0
*

3. Loop优化

3.1 建模后自动细化Loop

在AutoModel后自动细化Loop,使用LoopModel类即可(实际他也会进行AutoModel,之后再进行Loop)

现在Modeller推出了一个DOPE-based loop细化,可以使用dope_loopmodel或者dopehr_loopmodel;两者耗时和精度依次提升。

我自己做反而越来越差,待考证

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
from modeller import *
from modeller.automodel import *    

log.verbose()
env = Environ()

# directories for input atom files
env.io.atom_files_directory = ['.', '../atom_files']

a = LoopModel(env,
              alnfile  = 'alignment.ali',
              knowns   = '5fd1', 
              sequence = '1fdx') 
a.starting_model= 1 
a.ending_model  = 1
                                    
a.md_level = None

a.loop.starting_model = 1  # 第一个Loop模型
a.loop.ending_model  = 4   # 最后一个Loop模型
a.loop.md_level   =  refine.fast  # Loop精炼水平

a.make()

生成的Loop模型包含.BL的扩展后缀

loop.md_levelmd_level的原理类似,具体请查看上面的注释

3.2 定义loop区域精炼

方法和AutoModel的选自定义是类似的,仅仅把select_atoms()改为select_loop_atoms()

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
from modeller import *
from modeller.automodel import *

log.verbose()
env = Environ()

env.io.atom_files_directory = ['.', '../atom_files']

# Create a new class based on 'LoopModel' so that we can redefine
# select_loop_atoms
class MyLoop(LoopModel):
    # This routine picks the residues to be refined by loop modeling
    def select_loop_atoms(self):
        # Two residue ranges (both will be refined simultaneously)
        return Selection(self.residue_range('19:A', '28:A'),
                         self.residue_range('45:A', '50:A'))

a = MyLoop(env,
           alnfile  = 'alignment.ali',      # alignment filename
           knowns   = '5fd1',               # codes of the templates
           sequence = '1fdx',               # code of the target
           loop_assess_methods=assess.DOPE) # assess each loop with DOPE
a.starting_model= 1                 # index of the first model
a.ending_model  = 1                 # index of the last model

a.loop.starting_model = 1           # First loop model
a.loop.ending_model   = 2           # Last loop model

a.make()                            # do modeling and loop refinement

3.3 已有循环的细化

已有循环细化的话就不需要比对,请注意选择需要喜欢的loop

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
from modeller import *
from modeller.automodel import *

log.verbose()
env = Environ()

env.io.atom_files_directory = ['.', '../atom_files']


class MyLoop(LoopModel):
    def select_loop_atoms(self):
        # 19-28aa 的loop进行优化
        return Selection(self.residue_range('19:A', '28:A'))
        # 两个loop的优化
        #return Selection(self.residue_range('19:A', '28:A'),
        #                 self.residue_range('38:A', '42:A'))

m = MyLoop(env,
           inimodel='1fdx.B99990001.pdb',   # 初始模型
           sequence='1fdx',                 # 靶标序列
           loop_assess_methods=assess.DOPE) # 使用DOPE进行打分
#          loop_assess_methods=soap_loop.Scorer()) # SOAP-Loop 打分,若使用此打分 请 from modeller import soap_loop

m.loop.starting_model= 20           
m.loop.ending_model  = 23           
m.loop.md_level = refine.very_fast  

m.make()

4. 建模后的数据分析

在完成make后会有一个output参数若是LoopModel则为loop.output

例子如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
from modeller import *
from modeller.automodel import *
import sys

log.verbose()
env = Environ()

env.io.atom_files_directory = ['.', '../atom_files']

# 构建三个模型,包含DOPE和GA341打分
a = AutoModel(env, alnfile = 'alignment.ali', knowns = '5fd1',
              sequence = '1fdx', assess_methods=(assess.DOPE, assess.GA341))
a.starting_model= 1
a.ending_model  = 3
a.make()

# 获取所有成功模型
ok_models = [x for x in a.outputs if x['failure'] is None]

# 使用DOPE score进行打分排序
key = 'DOPE score'
if sys.version_info[:2] == (2,3):
	# 看python版本,因为一般现在是3.X,所以用else后面的代码块即可其实
    ok_models.sort(lambda a,b: cmp(a[key], b[key]))
else:
    ok_models.sort(key=lambda a: a[key])

# 获取最高打分模型
m = ok_models[0]
print("Top model: %s (DOPE score %.3f)" % (m['name'], m[key]))

5. 文件说明

5.1. 比对文件(PIR)说明

例子

1
2
3
4
5
6
7
>P1;5fd1
structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELAEVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence:1fdx:1    :A:54   :A:ferredoxin:Peptococcus aerogenes: 2.00:-1.00
AYVINDSC--IACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED------------------------------------------------*

其中第一行为说明,>P1为固定格式,分号后面为文件标识,需要和脚本中一致

第二行需要标注是已知的structure还是要建模的sequence,第二个冒号之后是文件表示,需要和脚本中一致

后面的8个冒号不是特别重要。

序列中-表示该位置空缺,.表示此处为HETATM,w表示water,*表示文件结尾, / 表示第二个chain了

6. 实战分析

我前面写过一个Chimera进行loop补全的内容,实际上其也是用的Modeller来进行的,Modeller在同源建模领域还是非常强的。

这次我们就来分析Chimera制作的脚本,以下是Chimera Python2.X的脚本,看了上面的笔记,大家应该都比较熟悉了

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
# --- UCSF Chimera Copyright ---
# Copyright (c) 2000 Regents of the University of California.
# All rights reserved.  This software provided pursuant to a
# license agreement containing restrictions on its disclosure,
# duplication and use.  This notice must be embedded in or
# attached to all copies, including partial copies, of the
# software or any revisions or derivations thereof.
# --- UCSF Chimera Copyright ---

# This script is generated by the Modeller Dialog in Chimera, 
# incorporating the settings from the user interface. User can revise 
# this script and submit the modified one into the Advanced Option panel. 


# Import the Modeller module
from modeller import *
from modeller.automodel import *    


# ---------------------- namelist.dat --------------------------------
# A "namelist.dat" file contains the file names, which was output from 
# the Modeller dialog in Chimera based on user's selection.
# The first line it the name of the target sequence, the remaining 
# lines are name of the template structures
namelist = open( 'namelist.dat', 'r' ).read().split('\n')
tarSeq = namelist[0]
template = tuple( [ x.strip() for x in namelist[1:] if x != '' ] )
# ---------------------- namelist.dat --------------------------------

# This instructs Modeller to display all log output. 
log.verbose()

# create a new Modeller environment
env = environ()

# Directory of atom/PDB/structure files. It is a relative path, inside 
# a temp folder generated by Chimera. User can modify it and add their 
# absolute path containing the structure files.
env.io.atom_files_directory = ['.', './template_struc']



# Read in HETATM records from template PDBs 
env.io.hetatm = True

# Read in water molecules from template PDBs 
env.io.water = True

# create a subclass of automodel or loopmodel, MyModel.
# user can further modify the MyModel class, to override certain routine.
class MyModel(loopmodel):
	def select_loop_atoms(self):
		from modeller import selection
		return selection(
			self.residue_range('1', '34'),
			self.residue_range('262', '276'))
	def select_atoms(self):
		from modeller import selection
		return selection(
			self.residue_range('1', '34'),
			self.residue_range('262', '276'))

		def customised_function(self): pass
		#code overrides the special_restraints method

		#def special_restraints(self, aln):

		#code overrides the special_patches method.
		# e.g. to include the addtional disulfides.
		#def special_patches(self, aln):
		
a = MyModel(env, sequence = tarSeq,
		# alignment file with template codes and target sequence
		alnfile = 'alignment.ali',
		# name of initial PDB template
		knowns = template[0])

# one fixed model to base loops on
a.starting_model = 1
a.ending_model = 1

# 5 loop models
loopRefinement = True
a.loop.starting_model = 1
a.loop.ending_model = 5
a.loop.assess_methods=(assess.DOPE, assess.GA341, assess.normalized_dope)

# Assesses the quality of models using
# the DOPE (Discrete Optimized Protein Energy) method (Shen & Sali 2006)
# and the GA341 method (Melo et al 2002, John & Sali 2003)
a.assess_methods = (assess.GA341, assess.normalized_dope)

# ------------------------- build all models --------------------------
a.make()

# ---------- Accesing output data after modeling is complete ----------

# Get a list of all successfully built models from a.outputs
if loopRefinement:
	ok_models = filter(lambda x: x['failure'] is None, a.loop.outputs)
else:
	ok_models = filter(lambda x: x['failure'] is None, a.outputs)

# Rank the models by index number
#key = 'num'
#ok_models.sort(lambda a,b: cmp(a[key], b[key]))
def numSort(a, b, key="num"):
	return cmp(a[key], b[key])
ok_models.sort(numSort)


# Output the list of ok_models to file ok_models.dat 
fMoutput = open('ok_models.dat', 'w')
fMoutput.write('File name of aligned model\t GA341\t zDOPE \n')

for m in ok_models:
	results  = '%s\t' % m['name']
	results += '%.5f\t' % m['GA341 score'][0]
	results += '%.5f\n' % m['Normalized DOPE score']
	fMoutput.write( results )

fMoutput.close()

首先我们转换成Python3.X并进行分析

首先导入相应模块

1
2
3
4
from modeller import *
from modeller.automodel import *
#用于取代cmp()函数
import operator

设置靶标序列和模板序列,Chimera是将其写入的namelist.dat文件中的

1
2
3
namelist = open( 'namelist.dat', 'r' ).read().split('\n')
tarSeq = namelist[0]
template = tuple( [ x.strip() for x in namelist[1:] if x != '' ] )

等同于

1
2
tarSeq = 'aaa'
template = ('bbb','ccc')

常规内容,进行记录日志,初始化,设置目录

1
2
3
4
5
6
7
8
9
log.verbose()

# create a new Modeller environment
env = environ()

# Directory of atom/PDB/structure files. It is a relative path, inside 
# a temp folder generated by Chimera. User can modify it and add their 
# absolute path containing the structure files.
env.io.atom_files_directory = ['.', './template_struc']

打开配体和水的保留,如果你是loop建模的话我建议water就不用了,反而碍事

1
2
3
4
5
6
# Read in HETATM records from template PDBs 
env.io.hetatm = True

# Read in water molecules from template PDBs 
env.io.water = True

对model过程中和loop细化过程中进行残基选择, 仅选择的残基才会进行建模,请注意新版本需要指定chain

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
class MyModel(loopmodel):
	def select_loop_atoms(self):
		from modeller import selection
		return selection(
			self.residue_range('1:A', '34:A'),
			self.residue_range('262:A', '276:A'))
	def select_atoms(self):
		from modeller import selection
		return selection(
			self.residue_range('1:A', '34:A'),
			self.residue_range('262:A', '276:A'))

		def customised_function(self): pass
		#code overrides the special_restraints method

		#def special_restraints(self, aln):

		#code overrides the special_patches method.
		# e.g. to include the addtional disulfides.
		#def special_patches(self, aln):

申明类

1
2
3
4
5
a = MyModel(env, sequence = tarSeq,
		# 比对文件水原子用w表示
		alnfile = 'alignment.ali',
		# name of initial PDB template
		knowns = template[0])

建模数量loop细化数目

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
a.starting_model = 1
a.ending_model = 1

# 5 loop models
loopRefinement = True
a.loop.starting_model = 1
a.loop.ending_model = 5
a.loop.assess_methods=(assess.DOPE, assess.GA341, assess.normalized_dope)

#打分
a.assess_methods = (assess.GA341, assess.normalized_dope)

构建模型

1
a.make()

模型打分,其实也可以查看我笔记里面最后的打分那一栏,也可以用chimera的改改,只是感觉相对麻烦一点

filter函数python2.x返回的为list,python3.x返回的为迭代相,若需要改为list需要转换

1
2
3
4
if loopRefinement:
	ok_models = list(filter(lambda x: x['failure'] is None, a.loop.outputs))
else:
	ok_models = list(filter(lambda x: x['failure'] is None, a.outputs))

字典排序比较

1
ok_models = sorted(ok_models,key=operator.itemgetter('num'))

输出到文件

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
fMoutput = open('ok_models.dat', 'w')
fMoutput.write('File name of aligned model\t GA341\t zDOPE \n')

for m in ok_models:
	results  = '%s\t' % m['name']
	results += '%.5f\t' % m['GA341 score'][0]
	results += '%.5f\n' % m['Normalized DOPE score']
	fMoutput.write( results )

fMoutput.close()

完整版本如下:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
# --- UCSF Chimera Copyright ---
# Copyright (c) 2000 Regents of the University of California.
# All rights reserved.  This software provided pursuant to a
# license agreement containing restrictions on its disclosure,
# duplication and use.  This notice must be embedded in or
# attached to all copies, including partial copies, of the
# software or any revisions or derivations thereof.
# --- UCSF Chimera Copyright ---

# This script is generated by the Modeller Dialog in Chimera, 
# incorporating the settings from the user interface. User can revise 
# this script and submit the modified one into the Advanced Option panel. 


# Import the Modeller module
from modeller import *
from modeller.automodel import *    
import operator


# ---------------------- namelist.dat --------------------------------
# A "namelist.dat" file contains the file names, which was output from 
# the Modeller dialog in Chimera based on user's selection.
# The first line it the name of the target sequence, the remaining 
# lines are name of the template structures
namelist = open( 'namelist.dat', 'r' ).read().split('\n')
tarSeq = namelist[0]
template = tuple( [ x.strip() for x in namelist[1:] if x != '' ] )
# ---------------------- namelist.dat --------------------------------

# This instructs Modeller to display all log output. 
log.verbose()

# create a new Modeller environment
env = environ()

# Directory of atom/PDB/structure files. It is a relative path, inside 
# a temp folder generated by Chimera. User can modify it and add their 
# absolute path containing the structure files.
env.io.atom_files_directory = ['.', './template_struc']



# Read in HETATM records from template PDBs 
env.io.hetatm = True

# Read in water molecules from template PDBs 
env.io.water = True

# create a subclass of automodel or loopmodel, MyModel.
# user can further modify the MyModel class, to override certain routine.
class MyModel(loopmodel):
	def select_loop_atoms(self):
		from modeller import selection
		return selection(
			self.residue_range('1:A', '34:A'),
			self.residue_range('262:A', '276:A'))
	def select_atoms(self):
		from modeller import selection
		return selection(
			self.residue_range('1:A', '34:A'),
			self.residue_range('262:A', '276:A'))

		def customised_function(self): pass
		#code overrides the special_restraints method

		#def special_restraints(self, aln):

		#code overrides the special_patches method.
		# e.g. to include the addtional disulfides.
		#def special_patches(self, aln):
		
a = MyModel(env, sequence = tarSeq,
		# alignment file with template codes and target sequence
		alnfile = 'alignment.ali',
		# name of initial PDB template
		knowns = template[0])

# one fixed model to base loops on
a.starting_model = 1
a.ending_model = 1

# 5 loop models
loopRefinement = True
a.loop.starting_model = 1
a.loop.ending_model = 5
a.loop.assess_methods=(assess.DOPE, assess.GA341, assess.normalized_dope)

# Assesses the quality of models using
# the DOPE (Discrete Optimized Protein Energy) method (Shen & Sali 2006)
# and the GA341 method (Melo et al 2002, John & Sali 2003)
a.assess_methods = (assess.GA341, assess.normalized_dope)

# ------------------------- build all models --------------------------
a.make()

# ---------- Accesing output data after modeling is complete ----------

# Get a list of all successfully built models from a.outputs
if loopRefinement:
	ok_models = list(filter(lambda x: x['failure'] is None, a.loop.outputs))
else:
	ok_models = list(filter(lambda x: x['failure'] is None, a.outputs))

# Rank the models by index number
#key = 'num'
#ok_models.sort(lambda a,b: cmp(a[key], b[key]))
ok_models = sorted(ok_models,key=operator.itemgetter('num'))

# Output the list of ok_models to file ok_models.dat 
fMoutput = open('ok_models.dat', 'w')
fMoutput.write('File name of aligned model\t GA341\t zDOPE \n')

for m in ok_models:
	results  = '%s\t' % m['name']
	results += '%.5f\t' % m['GA341 score'][0]
	results += '%.5f\n' % m['Normalized DOPE score']
	fMoutput.write( results )

fMoutput.close()
Licensed under CC BY-NC-SA 4.0