在(2)中我们只是取出了一个sample的snp位点,我们的想法是把所有的SNP位点统计到一个文本里面,这样,我们就可以看出哪些SNP位点的出现的频率更大。也就是哪些是真正的SNP位点,只出现在一个样品中的SNP位点,有可能是因为测序不准,导致的假阳性。
思路如下:
1.先打开第一个样品的snp位点,然后将snp在ref的位置和碱基作为hash的key,然后把样品的碱基作为value。
2.先打开第二个样品的snp位点,然后与我们得到的hash进行查找,如果有这个位点,那么就加上第二个样品的碱基。如果不存在,就把这个新的位点加入到hash中
3.打开第三个样品的snp位点,然后我们得到。。。
4打开第四个样品的。。。
5打开第五个。。。
6打开。。。
一直把18个样品都处理完,然后把得到的结果进行输出就ok了。
程序如下:
复制代码 代码如下:
#!usr/bin/perl
use strict;
use warnings;
my @information1;
my @information2;
my %position;
my $key1;
my $key2;
my $key3;
my $value;
my @samples;
my $sample;
@samples=qw/sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 sample15 sample16 sample17 sample18/;
open (JOIN,">>Chr5_join.txt")||die("can not open!");#这里是要替换的,输出到不同的染色体中
print JOIN "pos TAIR bur can ct edi hi kn ler mt no oy po rsch sf tsu wil ws wu zu n";
open (SNP1,"snpTAIR_vs_bur.txt")||die("can not open!");#这里打开的文件要检查一下,因为后面有变成snp_TAIR_vs_bur.maf.txt
while (<SNP1>)
{
chomp;
@information1=split;
$position{$information1[0]}{$information1[1]}{sample1}=$information1[3];
}
close SNP1;
open (SNP2,"snpTAIR_vs_can.txt")||die("can not open!");
while(<SNP2>)#{{{
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample2}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample2}=$information2[3];
}
}#}}}
close SNP2;
open (SNP3,"snpTAIR_vs_ct.txt")||die("can not open!");
while(<SNP3>)#{{{
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample3}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample3}=$information2[3];
}
}#}}}
close SNP3;
open (SNP4,"snpTAIR_vs_edi.txt")||die("can not open!");
while(<SNP4>)#{{{
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample4}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample4}=$information2[3];
}
}#}}}
close SNP4;
open (SNP5,"snpTAIR_vs_hi.txt")||die("can not open!");
while(<SNP5>)#{{{
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample5}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample5}=$information2[3];
}
}#}}}
close SNP5;
open (SNP6,"snpTAIR_vs_kn.txt")||die("can not open!");
while(<SNP6>)#{{{
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample6}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample6}=$information2[3];
}
}#}}}
close SNP6;
open (SNP7,"snpTAIR_vs_ler.txt")||die("can not open!");
while(<SNP7>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample7}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample7}=$information2[3];
}
}
close SNP7;
open (SNP8,"snpTAIR_vs_mt.txt")||die("can not open!");
while(<SNP8>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample8}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample8}=$information2[3];
}
}
close SNP8;
open (SNP9,"snpTAIR_vs_no.txt")||die("can not open!");
while(<SNP9>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample9}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample9}=$information2[3];
}
}
close SNP9;
open (SNP10,"snpTAIR_vs_oy.txt")||die("can not open!");
while(<SNP10>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample10}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample10}=$information2[3];
}
}
close SNP10;
open (SNP11,"snpTAIR_vs_po.txt")||die("can not open!");
while(<SNP11>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample11}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample11}=$information2[3];
}
}
close SNP11;
open (SNP12,"snpTAIR_vs_rsch.txt")||die("can not open!");
while(<SNP12>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample12}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample12}=$information2[3];
}
}
close SNP12;
open (SNP13,"snpTAIR_vs_sf.txt")||die("can not open!");
while(<SNP13>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample13}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample13}=$information2[3];
}
}
close SNP13;
open (SNP14,"snpTAIR_vs_tsu.txt")||die("can not open!");
while(<SNP14>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample14}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample14}=$information2[3];
}
}
close SNP14;
open (SNP15,"snpTAIR_vs_wil.txt")||die("can not open!");
while(<SNP15>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample15}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample15}=$information2[3];
}
}
close SNP15;
open (SNP16,"snpTAIR_vs_ws.txt")||die("can not open!");
while(<SNP16>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample16}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample16}=$information2[3];
}
}
close SNP16;
open (SNP17,"snpTAIR_vs_wu.txt")||die("can not open!");
while(<SNP17>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample17}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample17}=$information2[3];
}
}
close SNP17;
open (SNP18,"snpTAIR_vs_zu.txt")||die("can not open!");
while(<SNP18>)
{
chomp;
@information2=split;
if ($position{$information2[0]}{$information2[1]})
{
$position{$information2[0]}{$information2[1]}{sample18}=$information2[3];
}
else
{
$position{$information2[0]}{$information2[1]}{sample18}=$information2[3];
}
}
close SNP18;
foreach $key1(sort keys %position) #遍历输出
{
foreach $key2(sort keys %{$position{$key1}})
{
print JOIN "$key1 $key2";
foreach $sample(@samples)
{
if (exists $position{$key1}{$key2}{$sample})
{
$value=$position{$key1}{$key2}{$sample};
print JOIN " $value";
}
else
{
print JOIN " -"
}
}
}
print JOIN "n";
}
得到如下的结果:
pos TAIR bur can ct edi hi kn ler mt no oy po rsch sf tsu wil ws wu zu
1000 G - A - - - - - - - - - - - - - - - -
10000003 C - - - - - G - - - - - - - - - - - -
10000013 A - - R - - - - - - - - - - - - - - -
10000021 G - - - - - - - - - - - C - - - - - -
10000034 C - - - - - - - - - - - - - - - - Y -
10000047 G R - - - - - - - - - - - - - - - R -
10000049 A R - - - - - - - - - - - - - - - R -
10000067 C Y - - - - - - - - - - - - - - - Y -
10000087 G - - R - - - - - - - - - - - - - - -
10000098 A - - S - - G - - - - - - - - G - G G
10000118 G R - R - - - - - - - - A - - R - R -
10000121 G - - - - - - - - - - - - - - R - R -
10000139 C Y - - - - - - - - - - T - - Y - Y -
10000157 A R - - - - - - - - - - - - - R - R -
10000170 G - - - - - - - - - - - - - - - - A -
10000176 C - - - - - - - - - - - - - - - - A -
10000177 T Y - - - - - - - - - - - - - - - C -
10000183 C - - - - - - - - - - - - A - - - - -
10000188 A - - - - - - - - - - - - C - - - - -
10000192 T - - - - - - - - - - - - A - - - - -
10000194 C Y - Y - - - - - - - - - T - - - - -
10000195 G - - - - - - - - - - - - C - - - - -
10000196 A - - - - - - - - - - - - G - - - - -
10000197 T - - - - - - - - - - - - A - - - - -
10000201 C - - - - - - - - - - - - T - - - - -
10000205 T - - - - - - - - - - - - C - - - - -
10000207 C - - Y - - - - - - - - - T - - - - -
我们可以对每一行进行读取,然后对-进行计数,如果-多余一定的量,那么就把这一行删除。
复制代码 代码如下:
#!/usr/bin/perl
use strict;
use warnings;
my @datas;
my $data;
my $numb=0;
my $output;
open (SNP,"Chr1_join.txt")||die("can not open !");
open (MORE,">3_more_snp.txt")||die("can not open!");
while(<SNP>)
{
$output=$_;
@datas=split;
foreach $data(@datas)
{
if ($data=~"-")
{
$numb++;
}
else
{
next;
}
}
if ($numb<15)
{
$numb=0;
print MORE "$outputn";
}
else
{
$numb=0;
}
}