Filtering text blocks: Python and Perl code

Published: 2021-01-02   Editor: 脚本学堂
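Python and Perl scripts that filter blocks out of a text file. Each block starts with a "## Alignment" header line. The scripts drop every block that has fewer than 20 data lines after its header. Refer to them if you need to do something similar.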

Consider the following text:
## Alignment 0: score=397.0 e_value=8.2e-18 N=9 scaffold1&scaffold106 minus
  0-  0:        10026549        10007782          2e-75
  0-  1:        10026550        10007781         8e-150
  0-  2:        10026552        10007780         1e-116
  0-  3:        10026555        10007778              0
  0-  4:        10026570        10007768              0
  0-  5:        10026579        10007758          4e-15
  0-  6:        10026581        10007738          2e-44
  0-  7:        10026587        10007734         9e-145
  0-  8:        10026591        10007732         2e-147
## Alignment 1: score=2304.0 e_value=1e-164 N=47 scaffold1&scaffold107 minus
  1-  0:        10026836        10007942          2e-84
  1-  1:        10026839        10007940              0
  1-  2:        10026840        10007938              0
  1-  3:        10026842        10007937          9e-82
  1-  4:        10026843        10007935          7e-79
  1-  5:        10026847        10007933         3e-119
  1-  6:        10026850        10007932          2e-87
  1-  7:        10026854        10007928          5e-22
  1-  8:        10026855        10007927         3e-101
  1-  9:        10026856        10007925         1e-106
  1- 10:        10026857        10007924              0
  1- 11:        10026858        10007922         9e-123
  1- 12:        10026859        10007921          1e-80
  1- 13:        10026860        10007920         8e-104
  1- 14:        10026862        10007918          4e-25
  1- 15:        10026863        10007917              0
  1- 16:        10026864        10007912          4e-40
  1- 17:        10026865        10007911              0
  1- 18:        10026866        10007910         7e-122
  1- 19:        10026867        10007908          2e-25
  1- 20:        10026868        10007907              0
  1- 21:        10026869        10007905              0
  1- 22:        10026870        10007904         3e-150
  1- 23:        10026871        10007903          5e-77
  1- 24:        10026874        10007901              0
  1- 25:        10026875        10007897              0
  1- 26:        10026876        10007896              0
  1- 27:        10026877        10007894              0
  1- 28:        10026880        10007893          3e-52
  1- 29:        10026881        10007892              0
  1- 30:        10026882        10007891              0
  1- 31:        10026883        10007890              0
  1- 32:        10026886        10007889          1e-50
  1- 33:        10026887        10007888         6e-157
  1- 34:        10026888        10007887              0
  1- 35:        10026889        10007884              0
  1- 36:        10026890        10007883          2e-18
  1- 37:        10026891        10007882          9e-64
  1- 38:        10026892        10007881              0
  1- 39:        10026895        10007880              0
  1- 40:        10026898        10007875              0
  1- 41:        10026900        10007874              0
  1- 42:        10026901        10007873              0
  1- 43:        10026902        10007871         2e-123
  1- 44:        10026903        10007870              0
  1- 45:        10026905        10007869              0
  1- 46:        10026909        10007868          1e-81
## Alignment 2: score=811.0 e_value=3.3e-43 N=17 scaffold1&scaffold111 minus
  2-  0:        10026595        10007449          6e-40
  2-  1:        10026599        10007448          4e-90
  2-  2:        10026600        10007447              0
  2-  3:        10026601        10007444          9e-55
  2-  4:        10026603        10007438          4e-78
  2-  5:        10026604        10007434         9e-122
  2-  6:        10026606        10007432         2e-162
  2-  7:        10026607        10007427              0
  2-  8:        10026608        10007426              0
  2-  9:        10026612        10007417              0
  2- 10:        10026613        10007415         8e-128
  2- 11:        10026614        10007414          3e-64
  2- 12:        10026615        10007409              0
  2- 13:        10026616        10007406              0
  2- 14:        10026617        10007403         1e-171
  2- 15:        10026618        10007402              0
  2- 16:        10026619        10007397          7e-18
........
Task: if fewer than 20 lines follow an "## Alignment" header, remove that entire block (the header and its lines).
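Before deleting anything, it can help to list how many data lines each block has. The following Python sketch is an illustration and is not part of the original post. The names `block_sizes` and `MIN_LINES` are hypothetical. It assumes every block starts with a line beginning "## Alignment", and it ignores any lines before the first header. For the three blocks shown above it would report 9, 47 and 17 data lines, so only Alignment 1 passes the 20-line rule.

```python
from typing import Iterable, List, Tuple

MIN_LINES = 20  # threshold taken from the rule above


def block_sizes(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (label, data_line_count) for every "## Alignment" block, in order."""
    sizes: List[Tuple[str, int]] = []
    for line in lines:
        if line.startswith("## Alignment"):
            # Label is the text before the first colon, e.g. "Alignment 0".
            label = line.split(":", 1)[0].lstrip("# ")
            sizes.append((label, 0))
        elif sizes:
            # Data line: increment the count of the most recent block.
            label, count = sizes[-1]
            sizes[-1] = (label, count + 1)
        # Lines before the first header fall through and are ignored.
    return sizes
```

The function accepts any iterable of lines. You can pass an open file object directly, for example `fd` inside `with open("/root/data.txt") as fd:`.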

Python code:
 

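#!/usr/bin/env python3
# Keep only the "## Alignment" blocks that contain at least 20 data lines.
MIN_LINES = 20

count = 0      # data lines seen in the current block (header excluded)
block = []     # header line plus the data lines of the current block

with open("/root/data.txt", "r") as fd:
    for line in fd:
        if line.startswith("## Alignment"):
            # A new header closes the previous block: print it if it is long enough.
            if count >= MIN_LINES:
                print("".join(block), end="")
            count = 0
            block = [line]
        else:
            count += 1
            block.append(line)

# The last block is not followed by a header, so check it after the loop.
if count >= MIN_LINES:
    print("".join(block), end="")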
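In the sample, the N= field of each header (N=9, N=47, N=17) matches the number of data lines that follow. The decision could therefore be made from the header alone. This only works if that holds for the whole file, which is an assumption and is not stated in the original post. The sketch below takes that approach. The names `HEADER_N` and `filter_by_header_n` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Captures the N= value from a header such as "## Alignment 1: ... N=47 ...".
HEADER_N = re.compile(r"^## Alignment \d+:.*\bN=(\d+)\b")


def filter_by_header_n(lines: Iterable[str], min_lines: int = 20) -> Iterator[str]:
    """Yield the lines of every block whose header reports N >= min_lines.

    Assumes N= equals the block's data-line count, as it does in the sample.
    Lines before the first header are dropped.
    """
    keep = False
    for line in lines:
        match = HEADER_N.match(line)
        if match:
            # Decide once per block, from the header, instead of counting lines.
            keep = int(match.group(1)) >= min_lines
        elif line.startswith("## Alignment"):
            # Header without a parsable N=: conservatively drop that block.
            keep = False
        if keep:
            yield line
```

The keep/drop decision is made at the header, so no block has to be held in memory, unlike the Python script above. The trade-off is that the function trusts the header. If N= ever disagrees with the real line count, the output will differ from the line-counting approach.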

Perl code:
 


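#!/usr/bin/perl
use strict;
use warnings;

my $sum = 0;       # data lines in the current block (header excluded)
my @sumdata;       # header plus data lines of the current block

open(my $fd, '<', '/root/data.txt') or die "Cannot open /root/data.txt: $!";
while (<$fd>) {
        # Match only real header lines, not "Alignment" elsewhere in a line.
        if (/^## Alignment/) {
                # A new header closes the previous block; print it if it is long enough.
                print @sumdata if $sum >= 20;
                $sum = 0;
                @sumdata = ($_);
        }
        else {
                $sum++;
                push @sumdata, $_;
        }
}
# The last block is not followed by a header, so check it here.
print @sumdata if $sum >= 20;
close($fd);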
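Both scripts read the file one line at a time. If the file fits in memory, the same rule can also be expressed by reading the whole file and splitting it in front of every header. The sketch below uses that alternative technique; it is not the original author's method. It relies on `re.split` with a zero-width lookahead, which needs Python 3.7 or later. The names `BLOCK_START` and `filter_text` are hypothetical.

```python
import re

# Zero-width split point at the start of every "## Alignment" header line.
BLOCK_START = re.compile(r"^(?=## Alignment)", re.MULTILINE)


def filter_text(text: str, min_lines: int = 20) -> str:
    """Return text with every block of fewer than min_lines data lines removed."""
    kept = []
    for block in BLOCK_START.split(text):
        if not block.startswith("## Alignment"):
            # Empty leading piece or text before the first header: skip it.
            continue
        # splitlines() also counts the header line, so subtract one for it.
        data_lines = len(block.splitlines()) - 1
        if data_lines >= min_lines:
            kept.append(block)
    return "".join(kept)
```

Example usage: `with open("/root/data.txt") as fd: print(filter_text(fd.read()), end="")`. For files that begin with a header, this should produce the same output as the scripts above.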