Perl in Practice: SNP Extraction (1): lastz

Published: 2020-12-09, editor: 脚本学堂

Requirements:
We have DNA sequences from 18 samples, plus one reference sequence (ref). The task is to compare each of the 18 samples against ref and extract the SNP sites.

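Detailed steps:
1. First, use lastz to align each sample against the reference and find the large matching segments. This step relies on knowing how to use lastz and how to run system commands from Perl.
For lastz usage, search Google for "lastz", or go straight to http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html.
It is worth reading the documentation carefully before using the tool.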

We use a single Perl program to process all 18 samples.

The code is as follows:

#!/usr/bin/perl
use strict;
use warnings;

system 'lastz TAIR10_chr5.fas bur_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_bur.maf';
system 'lastz TAIR10_chr5.fas can_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_can.maf';
system 'lastz TAIR10_chr5.fas ct_1.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_ct.maf';
system 'lastz TAIR10_chr5.fas edi_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_edi.maf';
system 'lastz TAIR10_chr5.fas hi_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_hi.maf';
system 'lastz TAIR10_chr5.fas kn_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_kn.maf';
system 'lastz TAIR10_chr5.fas ler_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_ler.maf';
system 'lastz TAIR10_chr5.fas mt_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_mt.maf';
system 'lastz TAIR10_chr5.fas no_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_no.maf';
system 'lastz TAIR10_chr5.fas oy_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_oy.maf';
system 'lastz TAIR10_chr5.fas po_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_po.maf';
system 'lastz TAIR10_chr5.fas rsch_4.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_rsch.maf';
system 'lastz TAIR10_chr5.fas sf_2.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_sf.maf';
system 'lastz TAIR10_chr5.fas tsu_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_tsu.maf';
system 'lastz TAIR10_chr5.fas wil_2.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_wil.maf';
system 'lastz TAIR10_chr5.fas ws_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_ws.maf';
system 'lastz TAIR10_chr5.fas wu_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_wu.maf';
system 'lastz TAIR10_chr5.fas zu_0.v7.PR_in_lowercaseDNA5.txt --ambiguous=iupac --notransition --step=20 --nogapped --format=maf >TAIR_vs_zu.maf';

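We only need to look at the first command here; the other 17 repeat it with a different sample. system runs an external command, in this case lastz. The two arguments right after lastz are the names of the two sequence files to align, with the reference sequence first. The remaining arguments are options, each starting with --.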
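Since the 18 commands differ only in the sample name, they can also be generated in a loop. The following is a minimal sketch, not the original author's script: the helper name lastz_command and the RUN_LASTZ environment variable are assumptions introduced here for illustration. By default it only prints the commands (a dry run); it runs lastz only when RUN_LASTZ is set, and then stops at the first failing command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reference file, sample names, and options are taken from the commands above.
my $ref     = 'TAIR10_chr5.fas';
my @samples = qw(bur_0 can_0 ct_1 edi_0 hi_0 kn_0 ler_0 mt_0 no_0
                 oy_0 po_0 rsch_4 sf_2 tsu_0 wil_2 ws_0 wu_0 zu_0);
my @options = qw(--ambiguous=iupac --notransition --step=20 --nogapped --format=maf);

# Hypothetical helper: build the shell command for one sample.
# "bur_0" becomes the query file bur_0.v7.PR_in_lowercaseDNA5.txt
# and the output file TAIR_vs_bur.maf.
sub lastz_command {
    my ($sample) = @_;
    (my $short = $sample) =~ s/_\d+$//;    # drop the trailing "_0", "_1", ...
    my $query = "$sample.v7.PR_in_lowercaseDNA5.txt";
    return "lastz $ref $query @options >TAIR_vs_$short.maf";
}

for my $sample (@samples) {
    my $cmd = lastz_command($sample);
    if ($ENV{RUN_LASTZ}) {
        # system returns 0 on success; $? holds the raw exit status otherwise.
        system($cmd) == 0
            or die "lastz failed for $sample (status $?)\n";
    }
    else {
        print "$cmd\n";                    # dry run: show the command only
    }
}
```

Running the script with RUN_LASTZ=1 set in the environment executes the same 18 commands as the listing above; without it, the commands are printed so they can be checked first.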

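--ambiguous=iupac roughly means that lastz treats IUPAC ambiguity codes as ambiguous nucleotides rather than as ordinary mismatching bases. DNA has only four base types, but sequencing often leaves a position uncertain. For example, R stands for either A or G when sequencing could not determine which one is present.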

Appendix:
Standard IUB/IUPAC nucleic acid codes (DNA, RNA)

Code  Nucleic acid(s)
A     Adenine
C     Cytosine
G     Guanine
T     Thymine
U     Uracil
M     A or C (amino)
R     A or G (purine)
W     A or T (weak)
S     C or G (strong)
Y     C or T (pyrimidine)
K     G or T (keto)
V     A or C or G
H     A or C or T
D     A or G or T
B     C or G or T
N     A or G or C or T (any)
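The code table can be written as a Perl lookup hash, which is handy later when deciding whether a base in a sample really differs from the reference. This is only a sketch: the names %IUPAC, bases_for, is_ambiguous, and compatible are made up for this example and are not part of lastz or of the original script.

```perl
use strict;
use warnings;

# IUPAC code => the bases it can stand for (same content as the table).
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',   U => 'U',
    M => 'AC',  R => 'AG',  W => 'AT',  S => 'CG',
    Y => 'CT',  K => 'GT',
    V => 'ACG', H => 'ACT', D => 'AGT', B => 'CGT',
    N => 'ACGT',
);

# Return an array reference of the bases a code can represent
# (empty for characters that are not in the table).
sub bases_for {
    my ($code) = @_;
    my $set = $IUPAC{ uc $code };
    return [] unless defined $set;
    return [ split //, $set ];
}

# True (1) if the code stands for more than one possible base.
sub is_ambiguous {
    my ($code) = @_;
    return @{ bases_for($code) } > 1 ? 1 : 0;
}

# True (1) if $base is one of the bases that $code can represent.
sub compatible {
    my ($code, $base) = @_;
    my $wanted = uc $base;
    return ( grep { $_ eq $wanted } @{ bases_for($code) } ) ? 1 : 0;
}

print "R can be: ", join(',', @{ bases_for('R') }), "\n";
```

With such a table, a position where the sample shows R and the reference shows A does not have to be counted as a SNP, because R is compatible with A.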

For the remaining options, see the lastz documentation. --format=maf selects the output format (MAF), and >TAIR_vs_bur.maf redirects the output into the file TAIR_vs_bur.maf.
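To give an idea of what ends up in the .maf files: a MAF file consists of alignment blocks, each starting with an "a" line and followed by "s" lines that carry the source name, start, size, strand, source length, and aligned text. The sketch below reads such blocks. The function name read_maf_blocks is an assumption for this example, and the two-line alignment is invented sample data, not real lastz output.

```perl
use strict;
use warnings;

# Read MAF text from a filehandle and return a list of alignment blocks.
# Each block is a hash with the "a" header line and its "s" lines.
sub read_maf_blocks {
    my ($fh) = @_;
    my (@blocks, $current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^a(?:\s|$)/) {
            # An "a" line opens a new alignment block.
            $current = { header => $line, seqs => [] };
            push @blocks, $current;
        }
        elsif ($line =~ /^s\s/ && $current) {
            # s <src> <start> <size> <strand> <srcSize> <text>
            my (undef, $src, $start, $size, $strand, $src_size, $text)
                = split ' ', $line;
            push @{ $current->{seqs} }, {
                src    => $src,    start    => $start,
                size   => $size,   strand   => $strand,
                src_size => $src_size, text => $text,
            };
        }
        elsif ($line =~ /^\s*$/) {
            undef $current;    # a blank line closes the block
        }
        # Comment lines starting with "#" are ignored.
    }
    return @blocks;
}

# Invented example data, for illustration only.
my $example = <<'MAF';
##maf version=1
a score=100
s ref    0 8 + 20 ACGTACGT
s sample 3 8 + 30 ACGTTCGT

MAF

open my $fh, '<', \$example or die "cannot open in-memory MAF: $!";
my @blocks = read_maf_blocks($fh);
close $fh;

printf "%d block(s); first block has %d sequence lines\n",
    scalar @blocks, scalar @{ $blocks[0]{seqs} };
```

For a real file, the in-memory filehandle would be replaced by one opened on, for example, TAIR_vs_bur.maf.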

Page link: http://www.jb200.com/article/4387.html
