RAMplot拉氏图建模工具

l拉氏图不用过多介绍，主要是用来评价模型的，这次在网上找到一个作图挺好看的拉氏图软件，推举给大家,

github地址： https://github.com/mungpeter/RAMAplot

或者点此下载

1. 安装

首先安装依赖包

1

conda install -c conda-forge biopython

下载:

1

https://github.com/mungpeter/RAMAplot/archive/master.zip

2. pdb转换为fasta工具

不太适用于非标准氨基酸残基，需要自己添加

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


> 0_pdb2fasta.py
      [PDB filename: .pdb]
      [Name of PDB]
      [FASTA Output Name]

  Note: Capping Groups (ACE/NME) and nonstandard amino acids are not recognized
        Need to add 'X' to the result FASTA manually

e.g.> ./0_pdb2fasta.py \
        test.pdb       \
        test           \
        test.fasta

其主要是运用了biopython的操作pdb工具，将其转换后重新建立peptides虽然读取序列

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


...
  m =PDBParser(PERMISSIVE=1).get_structure(pdb_id, pdb_file)

  w = open(output_file, 'w')
  for chain in m.get_chains():
    residues = list(chain.get_residues())
    chain_id = chain.get_id()

    s = ''
    peptides = PPBuilder().build_peptides(chain, aa_only=False)
    for peptide in peptides:
      s = s + peptide.get_sequence() + '\n\n'

    w.write('>{0}_{1}|{2}-{3}\n'.format(
                  pdb_id, chain_id, 
                  residues[0].get_id()[1], residues[-1].get_id()[1] ) )
    w.write(str(s))
...

感觉不太智能，直接撸可能更好，当然也可以用get_resname()函数再通过3字母转化

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


from Bio.PDB.PDBParser import PDBParser

m = PDBParser(PERMISSIVE=1).get_structure('2b3p', '2b3p.pdb')

datalist = {
  'ALA':'A', 'ARG':'R', 'ASN':'N', 'ASP':'D', 'CYS':'C', 'GLN':'Q',
  'GLU':'E', 'GLY':'G', 'HIS':'H', 'ILE':'I', 'LYS':'K', 'MET':'M',
  'PRO':'P', 'PHE':'F', 'SER':'S', 'THR':'T', 'TRP':'W', 'TYR':'Y',
  'LEU':'L', 'VAL':'V', 'A':'ALA', 'R':'ARG', 'N':'ASN', 'D':'ASP',
  'C':'CYS', 'Q':'GLN', 'E':'GLU', 'G':'GLY', 'H':'HIS', 'I':'ILE',
  'K':'LYS', 'M':'MET', 'P':'PRO', 'F':'PHE', 'S':'SER', 'T':'THR',
  'W':'TRP', 'Y':'TYR', 'L':'LEU', 'V':'VAL',
}


for chain in m.get_chains():
    for residue in chain:
        # print(residue.get_resname())
        if residue.get_resname() in datalist.keys():
            print(datalist[residue.get_resname()],end='')

3. 单结构拉氏图

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


> 1_rama_single_structure.py
      -in  [ PDB file for Ramachandran density plot ]
      -img [ Output image name, extension as format (e.g. .png,svg,eps,ps,pdf) ]

     optional:
      -ref [ Directory to Density data for reference Ramachandran distribution ]
      -dpi [ Figure DPI resolution (def: 300) ]


e.g.> ./1_rama_single_structure_gen.py \
        -in 3anr.pdb.bz2               \
        -img 3anr.rama_plot.png        \
        -ref ../pyrama_data

其会制作“Proline”, “Pre-Proline”, “Glycine”, “General"四种类型的图

查看源码可以发现其主要包含三个模块：

第一个模块是读取参考图像数据 RefRamaData

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


def RefRamaData( ref_dir, ref_inf ):

  rama_data = pd.read_csv(ref_dir+'/'+ref_inf.file, delimiter=' ', comment='#')
  rama_ref  = rama_data.pivot(index='phi',columns='psi',values='density')

  # data is transpose to get correct orientation
  ref_obj = ImageData( histo2d=rama_ref.transpose() )

  # extent is different from res_obj.extent
  # color is (white, light color, dark color) at specific contour level
  ref_obj.extent = (-181,181,-181,181)
  ref_obj.colors = ref_inf.cmap
  ref_obj.norm   = mpl.colors.BoundaryNorm(ref_inf.bounds, ref_obj.colors.N)

  return ref_obj

第二个是读取输入残基的数据 InputRamaData

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52


def InputRamaData( pdb_file ):

  Res_Dih = []
  ## extract sequence and dihedral angle data from PDB structure
  pdb_id = pdb_file.split('.pdb')[0].split('/')[-1]

  # use file_handle to deal with zipped PDB
  # going into each chain in model(s), then each of broken pieces in each
  # chain, get peptide and info
  # take non-standard AA and other ligands as well. Non-standard AA will be
  # coverted to standard AA. Water and ligands will not get phi/psi values
  # so will be dropped later at the dataframe level  
  for model in p.get_structure(pdb_id, file_handle(pdb_file) ):
    for chain in model:
      chain_id = chain.get_id()
      polypeptides = PPBuilder().build_peptides(chain, aa_only=False)

      for poly in polypeptides:
        phi_psi = poly.get_phi_psi_list()

        for res_idx, residue in enumerate(poly):
          resname = residue.get_resname()
          resid   = residue.get_id()[1]
          phi, psi = phi_psi[res_idx]

          # convert 3-char AA to 1-char, check if AA is non-standard AA        
          if CheckUnnaturalAA(resname):
            resname = UnnaturalAA(resname)
          one_resname = AA(resname)
        
          Res_Dih.append([one_resname, resid, chain_id, phi, psi])


  # build dataframe of dihedral angle data, drop those without phi/psi
  pdb_df = pd.DataFrame(Res_Dih, 
            columns=['resname','resid','chain','PHI','PSI']).dropna()

  # convert angle from radian to degree
  pdb_df['phi'] = radian2deg(pdb_df.PHI.to_numpy())
  pdb_df['psi'] = radian2deg(pdb_df.PSI.to_numpy())

  # select subsets of residues: Pro, Gly, PrePro, General
  pro_df = pdb_df[pdb_df.resname == 'P']
  gly_df = pdb_df[pdb_df.resname == 'G']
  pp_df  = pdb_df.loc[(pro_df.index-1)]  # pre-Proline residues
  pp_df  = pp_df[pp_df.resname != 'P']
  not_x  = list(pro_df.index) + list(gly_df.index) + list(pp_df.index)
  gen_df = pdb_df.drop(not_x)             # general residues

  res_dict = {'Pro': pro_df, 'Gly': gly_df, 'PreP': pp_df, 'Gen': gen_df}
  
  return res_dict

最后是作图GeneraterImage:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31


def GenerateImage( res_dict, ref_dict, img_name, dpi ):

  for idx, (key, ref_obj) in enumerate(sorted(ref_dict.items(), key=lambda x:x[0].lower())):
    
    plt.figure(2, figsize=(8,8))
    plt.subplot(2,2, idx+1)
    plt.title(key, fontsize=16)

    # reference dihedral angle density as background
    plt.imshow( ref_obj.histo2d[::-1], cmap=ref_obj.colors, 
                norm=ref_obj.norm,     extent=ref_obj.extent )

#    plt.grid(linestyle='--')

    # PDB AA backbone dihedral angle distribution
    plt.scatter( res_dict[key].phi, res_dict[key].psi,
                 alpha=0.67, s=8 )

    ## add additional items
    plt.xlim([-180,180])
    plt.ylim([-180,180])
    plt.plot([-180, 180], [0, 0], color="black", linewidth=1)
    plt.plot([0, 0], [-180, 180], color="black", linewidth=1)
    plt.xticks(np.arange(-180,210, step=60), fontsize=14)
    plt.yticks(np.arange(-180,210, step=60), fontsize=14)

    plt.xlabel(r'Phi $\phi$', fontsize=16)
    plt.ylabel(r'Psi $\psi$', fontsize=16)

  plt.tight_layout()
  plt.savefig(img_name, bbox_inches=0, dpi=dpi)

还挺简单的，值得我这种刚入门的学习

4. 单原子拉氏图轨迹

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


> 2x_rama_md_heatmap_gen.py
      -in  [ phi-psi file for Ramachandran density plot ]
      -img [ output image name, extension as format (e.g. .png,svg,eps,ps,pdf) ]
      
    optional:
      -res [ 1-char AminoAcid code for reference selection (def: Pro, PreP, Gly, Gen) ]
      -int [ Resolution (def: 2-deg interval) ]
      -ref [ Density data for reference Ramachandran distribution ]
      -fraction [ Cutoff fraction of the maximum Histogram value (def: 33) ]
      -smooth   [ Histogram data smoothening (def: 1.15) ]
      -t_step   [ Colorbar tick spacing per histogram digits unit (def: 4) ]
      -c_step   [ Histo Contour spacing per histogram digits unit (def: 4) ]
      -dpi      [ Figure DPI resolution (def: 300) ]\n

图是真好看：