Removing Duplicate Lines from a File with Perl


Published: 2019-12-30 | Editor: 脚本学堂


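Suppose a file named data is 10 GB in size and contains many repeated lines, and each group of repeated lines needs to be merged into a single line. How can this be done?
cat data | sort | uniq > new_data   # This works, but on a file this large you may have to wait several hours for the result.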

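Below is a small tool, written as a Perl script, that does this job.
The principle is simple: create a hash in which the content of each line is the key and the number of times that line occurs is the value.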
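To make the counting idea concrete before reading the full script, here is a minimal sketch on a few in-memory lines. The sample data and the helper name count_lines are made up for illustration and are not part of the original tool.

```perl
use strict;
use warnings;

# Hypothetical sample input, standing in for the lines of a file.
my @lines = ("apple", "pear", "apple", "fig", "pear", "apple");

# Count how many times each distinct line occurs.
# (count_lines is an illustrative helper, not part of the original script.)
sub count_lines {
    my @input = @_;
    my %count;
    $count{$_}++ for @input;
    return %count;
}

my %count = count_lines(@lines);

# Keys are sorted here only to make the printed order predictable;
# a Perl hash itself has no defined key order.
print "$_,$count{$_}\n" for sort keys %count;
```

Each distinct line appears exactly once as a key, so the number of keys is the number of unique lines, however often each one is repeated.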

The code is as follows:

#!/usr/bin/perl
# Author : CaoJiangfeng
# Date   : 2011-09-28
# Version: 1.0
use warnings;
use strict;

my %hash;          # line content => number of occurrences
my $script = $0;   # Get the script name

sub usage {
    print "Usage:\n";
    print "perl $script <source_file> <dest_file>\n";
}

# If fewer than 2 parameters are given, print the usage and exit
if (@ARGV < 2) {
    usage();
    exit 1;
}

my $source_file = $ARGV[0];   # File from which duplicate lines are removed
my $dest_file   = $ARGV[1];   # File written after duplicates are removed

# Three-argument open with lexical filehandles
open(my $in,  '<', $source_file) or die "Cannot open $source_file: $!\n";
open(my $out, '>', $dest_file)   or die "Cannot open $dest_file: $!\n";

while (defined(my $line = <$in>)) {
    chomp($line);
    $hash{$line} += 1;
    # print "$line,$hash{$line}\n";
}

# Write each distinct line and its occurrence count to the destination file
foreach my $k (keys %hash) {
    print $out "$k,$hash{$k}\n";
}
close($in);
close($out);
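The script above writes every distinct line together with its count, in whatever order the hash returns its keys. If only the unique lines are wanted, in the order they first appear, one possible variant is sketched below. The function name dedupe and the in-memory sample data are assumptions for illustration and do not come from the original script. As with the original, every distinct line is still held in memory, so this only suits inputs whose set of unique lines fits in RAM.

```perl
use strict;
use warnings;

# Copy lines from $in to $out, skipping any line that was already seen.
# Returns the number of unique lines written.
# (dedupe is an illustrative name, not part of the original script.)
sub dedupe {
    my ($in, $out) = @_;
    my %seen;
    my $written = 0;
    while (defined(my $line = <$in>)) {
        chomp $line;
        next if $seen{$line}++;   # true on the second and later occurrences
        print {$out} "$line\n";
        $written++;
    }
    return $written;
}

# Demonstration on in-memory filehandles; the sample data is made up.
my $input  = "b\na\nb\nc\na\n";
my $output = '';
open(my $in,  '<', \$input)  or die "Cannot open input: $!\n";
open(my $out, '>', \$output) or die "Cannot open output: $!\n";
my $n = dedupe($in, $out);
close($in);
close($out);

print $output;   # prints b, a, c on separate lines
```

For real files, the two open calls would take the source and destination file names in place of the scalar references.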

Page link: http://www.jb200.com/article/3208.html
