在上一篇文章中,我们看了最后的输出结果如下:
pos TAIR bur can ct edi hi kn ler mt no oy po rsch sf tsu wil ws wu zu   
1000 G - A - - - - - - - - - - - - - - - -  
10000003 C - - - - - G - - - - - - - - - - - -  
10000013 A - - R - - - - - - - - - - - - - - -  
10000021 G - - - - - - - - - - - C - - - - - -  
10000034 C - - - - - - - - - - - - - - - - Y -  
10000047 G R - - - - - - - - - - - - - - - R -  
10000049 A R - - - - - - - - - - - - - - - R - 
我们发现里面有许多的--当然详细看过上面文章的会知道,有些地方在一个sample是有SNP的。但是在另一个sample这个位置就可能不是一个SNP,这样,在这个位置就是一个空白。我们设置的输出就是-;
但是这一部分信息,也是我们需要的,所以我们要到原来的.maf文件中去回对,也就是把每个位置的每个样品的碱基信息进行统计。其实关键就是一个回对的过程。
所以我们的解决思路就是:
1.用合并后的信息,建立一个hash,方便后面的查找
2.打开.maf以后,我们对每一个块,(这里说的块,包括score部分,ref序列,sample序列),我们先看在ref中是否有这个点的信息,因为,我设置的score的起始位点是90000.所以会有很多无法匹配的位点。如果有,那么取sample中对应位置的碱基,保存到hash中。
3.重复回对每一个文件,最终得到最后的结果。
 
#!/usr/bin/perl  
  
use strict;  
use warnings;  
  
my  %position;  
my  @information1;  
my  @mout;  
my  @score;  
my  @info1;  
my  @info2;  
my  @sequen1;  
my  @sequen2;  
my  $cout1;  
my  $cout;  
my  @samples;  
my  $samples;  
my  $key1;  
my  $value;  
my  $z;
  
open (DNA,"Chr5_join.txt")||die("can not open");#这里有需要替换成不同的染色体信息  
while(<DNA>)  
{  
@information1=split;  
$position{$information1[0]}{sample0}=$information1[1];  
}  
  
open (DNA1,"TAIR_vs_bur.maf")||die("can not open");  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$z=<DNA1>;  
$/="nn";  
while(<DNA1>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless ($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample1}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
}  
}  
  
open (DNA2,"TAIR_vs_can.maf")||die("can not open");  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$z=<DNA2>;  
$/="nn";  
while(<DNA2>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless ($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample2}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}
}  
}  
  
open (DNA3,"TAIR_vs_ct.maf")||die("can not open");  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$z=<DNA3>;  
$/="nn";  
while(<DNA3>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample3}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA4,"TAIR_vs_edi.maf")||die("can not open");  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$z=<DNA4>;  
$/="nn";  
while(<DNA4>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample4}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA5,"TAIR_vs_hi.maf")||die("can not open");  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$z=<DNA5>;  
$/="nn";  
while(<DNA5>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample5}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
}  
}  
  
open (DNA6,"TAIR_vs_kn.maf")||die("can not open");  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$z=<DNA6>;  
$/="nn";  
while(<DNA6>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample6}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA7,"TAIR_vs_ler.maf")||die("can not open");  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$z=<DNA7>;  
$/="nn";  
while(<DNA7>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample7}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA8,"TAIR_vs_mt.maf")||die("can not open");  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$z=<DNA8>;  
$/="nn";  
while(<DNA8>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample8}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA9,"TAIR_vs_no.maf")||die("can not open");  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$z=<DNA9>;  
$/="nn";  
while(<DNA9>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample9}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA10,"TAIR_vs_oy.maf")||die("can not open");  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$z=<DNA10>;  
$/="nn";  
while(<DNA10>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample10}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA11,"TAIR_vs_po.maf")||die("can not open");  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$z=<DNA11>;  
$/="nn";  
while(<DNA11>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample11}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA12,"TAIR_vs_rsch.maf")||die("can not open");  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$z=<DNA12>;  
$/="nn";  
while(<DNA12>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample12}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA13,"TAIR_vs_sf.maf")||die("can not open");  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$z=<DNA13>;  
$/="nn";  
while(<DNA13>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample13}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA14,"TAIR_vs_tsu.maf")||die("can not open");  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$z=<DNA14>;  
$/="nn";  
while(<DNA14>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample14}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA15,"TAIR_vs_wil.maf")||die("can not open");  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$z=<DNA15>;  
$/="nn";  
while(<DNA15>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample15}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA16,"TAIR_vs_ws.maf")||die("can not open");  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$z=<DNA16>;  
$/="nn";  
while(<DNA16>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample16}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA17,"TAIR_vs_wu.maf")||die("can not open");  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$z=<DNA17>;  
$/="nn";  
while(<DNA17>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample17}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
open (DNA18,"TAIR_vs_zu.maf")||die("can not open");  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$z=<DNA18>;  
$/="nn";  
while(<DNA18>)  
{  
@mout = split/n/,$_;  
@score= split/=/, $mout[0];  
unless($score[1]<90000)  
{  
@info1 = split/s+/,$mout[1];  
@info2 = split/s+/,$mout[2];  
  
@sequen1 = split// ,$info1[6];  
@sequen2 = split// ,$info2[6];  
  
for($cout1=$info1[2]+1;$cout1<$info1[3]+$info1[2]+1;$cout1++)  
{  
if(exists $position{$cout1})  
{  
$cout=$cout1-$info1[2]-1;  
$position{$cout1}{sample18}=$sequen2[$cout];  
}  
else   
{  
next;  
}  
}  
  
}  
}  
  
@samples=qw/sample0 sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10 sample11 sample12 sample13 sample14 sample15 sample16 sample17 sample18/;  
  
open(ALL,">all_information.txt")||die("can not open!");  
  
foreach $key1(sort keys %position)  
{  
print ALL "$key1 ";  
foreach $samples(@samples)  
{  
if(exists $position{$key1}{$samples})  
{  
$value = $position{$key1}{$samples};  
print ALL "$value ";  
}  
else  
{  
print ALL "- "  
}  
}  
print ALL "n";  
} 
结果如下:
10001656 G G - - G - G G G G - G A G G A G G G   
10001667 G G - - G - G G G G - G G G G S G G G   
10001670 G G - - G T G G G G - G G G G G G G G   
10001672 T T - - T G T T T T - T T T T T T T T   
10001673 G G - - G A G G G G - G G G G G G G G   
10001696 G G - - G G G G G G - G A G G A G G G   
10001713 C C - - C C C C C C - C C C C Y C C C   
10001768 C C - - C C C C C C - C C C C Y C C C   
10001791 A A - - G G G G A A - A G A G G A A G   
10001812 G G - - G G G G A A - G G G G G G G G   
10001816 A A - - A A A A A A - A G A A G A A A   
10001820 C C - - C C C C C C - C C C C S C C C  
我们可以看到好多位置的信息已经补充完整了。
然后一个新的问题就是,我们知道基因在测序的过程中会有很多的重复区域,我们也要把相应的重复区域删除。
删除重复区域其实也是一个hash应用的过程。没有多少亮点。
我们首先来看一下这个repeat区域的格式:
5   0   4758  
5   52485289  
5   56677   56887  
5   63698   63796  
5   64015   64047  
5   64771   71199  
5   78471   78531  
5   138655  138720  
5   138941  139927  
5   140759  140948  
5   162806  162844  
5   301999  302104 
我们要做的就是确定第一个,和最后一个位置,然后利用if循环,使数据递增。然后检查hash中时候有这样一个key值的存在,如果有的话,那么就delete这个hash,如果没有的就保存。就是这样。
我们来看一下代码:
#!/usr/bin/perl  
# move repeat arear out  
  
use strict;  
use warnings;  
  
my %position;  
my @pos;  
my $cout;  
my $key;  
my $value;  
  
open (ALL,"all_information.txt")||die("can not open!");  
while(<ALL>)  
{  
$position{$1}=$2 if $_=~/^(d+)s(.+)$/;  
}  
  
open (POS,"sepChr5.txt")||die("can not open!");  
while(<POS>)  
{  
@pos=split;  
for($cout=$pos[1]+1;$cout<$pos[2]+1;$cout++)  
{  
if(exists $position{$cout})  
{  
delete $position{$cout};  
}  
else   
{  
next;  
}  
}  
}  
  
open (RESULT,">without_repeat_information.txt")||die ("can not open!");  
foreach $key(sort keys %position)  
{  
print RESULT "$key ";  
$value=$position{$key};  
print RESULT "$value n";  
}