Perl code for downloading all links on a web page with multiple processes


Published: 2020-03-30. Editor: 脚本学堂

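This Perl script forks several processes to download every article linked from a site's index pages. Readers who are interested can use it as a reference.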

The code is as follows:

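#!c:\perl\bin\perl.exe -w
use strict;
use LWP::Simple;

my $last = 264;    # 264 is the last page; read off the website by hand, not detected automatically

# Build the list of all article index pages in @index_url
my $url_t = 'http://linux.chinaunix.net/index_';
my @index_url = ();
$index_url[0] = "http://linux.chinaunix.net/index.shtml";
for (my $i = 1; $i < $last; $i++)
{
    $index_url[$i] = $url_t . "$i.shtml";
}

my $maxchild = 10;    # at most 10 processes

for (my $i = 0; $i <= $maxchild - 1; $i++)
{
    my $child = fork();
    die "fork failed: $!\n" unless defined $child;
    if ($child)
    {   # child > 0, so we're the parent
        warn "launching child $child\n";
    }
    else
    {
        do_child($i);    # child handles
        exit 0;          # child is done
    }
}

# Parent waits here until every child has exited
1 while wait() != -1;

sub do_child
{
    my $url_head = "http://linux.chinaunix.net/***";    # *** is elided in the source
    my $location = "e:/mysoft/perl/linuxdoc/";          # output directory (assumed path layout; adjust as needed)
    my $t = 1;

    my $child_number = shift(@_);
    print("child ,$child_number \n");

    # Child n handles index pages n, n+$maxchild, n+2*$maxchild, ...
    for (my $i = $child_number; $i < $last; $i = $i + $maxchild)
    {
        my $index_url = $index_url[$i];
        my $webdoc = get($index_url);
        unless (defined $webdoc)
        {   # get returns undef when the fetch fails
            warn "failed to fetch $index_url\n";
            next;
        }
        my $j = 1;
        print "Processing $index_url", "\n";
        while ($webdoc =~ m#(/\d{4}/\d{2}/\d{2}/\d*\.shtml)(.*?)14px">(.*?)<#sgi)
        {
            my ($path, $title) = ($1, $3);
            print $j, ":$path---$title:";
            $j++;
            my $url  = $url_head . $path;
            my $file = $location . $title . ".html";

            my $code = getstore($url, $file);
            if (is_error($code))
            {
                # $t: for titles containing characters that are not valid in
                # a file name, fall back to an incrementing number; the child
                # number keeps different processes from overwriting each other
                $code = getstore($url, $location . $child_number . "_" . $t . ".html");
                $t += 1;
                print "--succeed 3--\n";
            }
            else
            {
                print "--succeed 1--\n";
            }
        }
    }
    exit;
}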
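
A parent that forks workers normally also waits for them, so that it does not exit while downloads are still running and so that it can see how each child ended. The following is a minimal sketch of that pattern on its own, with no network access. The child body is only a stand-in for do_child, the value 3 for $maxchild is illustrative, and %pid_to_child and %status are names made up for this example.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered output, so pending text is not duplicated in children

my $maxchild = 3;    # illustrative value, smaller than the script's 10
my %pid_to_child;    # process ID => child number

for my $n (0 .. $maxchild - 1) {
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: stand-in for do_child($n); exits with its own number
        exit $n;
    }
    $pid_to_child{$pid} = $n;    # parent remembers which child this is
}

# Parent: reap every child and record its exit status
my %status;
while ((my $pid = wait()) != -1) {
    $status{ $pid_to_child{$pid} } = $? >> 8;
}

print "child $_ exited with status $status{$_}\n" for sort keys %status;
```

wait returns -1 once there are no children left, and the exit code of the child that was just reaped is in the high byte of $?, hence the shift by 8.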

Notes:
Ten processes were started and the download began at 8:18. There are 264 .shtml index pages in total, and each page has 30 file links:
263*30=7890
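
How those index pages are shared out among the 10 processes can be checked in isolation. This is a small sketch, assuming the same stride scheme as the loop in do_child (child n takes pages n, n+10, n+20, and so on); pages_for_child is a hypothetical helper name that does not appear in the script.

```perl
use strict;
use warnings;

# Return the index-page numbers handled by one child: it starts at its
# own number and steps by the number of children.
sub pages_for_child {
    my ($child_number, $last, $maxchild) = @_;
    my @pages;
    for (my $i = $child_number; $i < $last; $i += $maxchild) {
        push @pages, $i;
    }
    return @pages;
}

my ($last, $maxchild) = (264, 10);
for my $child (0 .. $maxchild - 1) {
    my @pages = pages_for_child($child, $last, $maxchild);
    printf "child %d: %d pages, first %d, last %d\n",
        $child, scalar(@pages), $pages[0], $pages[-1];
}
```

With 264 pages and 10 children, children 0 to 3 each get 27 pages and the rest get 26, so every page is visited exactly once without any coordination between processes.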
Errors occurred partway through, at index_54.shtml and 171.shtml:
Use of uninitialized value $webdoc in pattern match (m//) at line 50
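
This warning is what appears when get from LWP::Simple fails: it returns undef, and the undefined value then reaches the pattern match. One way to handle it is to retry a few times and skip the page if it still cannot be fetched. The sketch below assumes that approach; fetch_with_retry is a hypothetical helper rather than part of LWP::Simple, and the fetcher used here is a stand-in that fails twice, so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->($url) up to $tries times and return
# the first defined result, or undef if every attempt fails.
sub fetch_with_retry {
    my ($fetch, $url, $tries) = @_;
    for my $attempt (1 .. $tries) {
        my $doc = $fetch->($url);
        return $doc if defined $doc;
        warn "attempt $attempt of $tries failed for $url\n";
    }
    return undef;
}

# Stand-in fetcher for demonstration: fails twice, then succeeds.
my $calls = 0;
my $flaky_get = sub {
    $calls++;
    return undef if $calls < 3;
    return "<html>page</html>";
};

my $webdoc = fetch_with_retry($flaky_get, 'http://linux.chinaunix.net/index_54.shtml', 3);
if (defined $webdoc) {
    print "fetched after $calls attempts\n";
}
else {
    print "giving up on this page\n";
}
```

In the download script, \&LWP::Simple::get could be passed as the fetcher, followed by next unless defined $webdoc; so the loop moves on to the following index page.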

Processing finished a little before 9:04, with 6512 files downloaded in total. (Some of the 7890 files have the same name.)
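
Files with the same name overwrite one another, which is one reason the final count is lower than the number of links. A possible remedy, sketched here under the assumption that titles are used as file names as in the script, is to replace characters that Windows does not allow in file names and to add a counter to repeated names. numbered_filename and %seen are hypothetical names introduced for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a title into a Windows-safe file name and
# append a counter when the same name has been seen before in %$seen.
sub numbered_filename {
    my ($title, $seen) = @_;
    (my $name = $title) =~ s{[\\/:*?"<>|\s]+}{_}g;    # illegal characters and whitespace
    $name = 'untitled' if $name eq '';
    my $n = $seen->{$name}++;    # how many times this name was used before
    return $n ? "${name}_$n.html" : "$name.html";
}

my %seen;
print numbered_filename($_, \%seen), "\n"
    for 'Linux FAQ', 'Linux FAQ', 'What is /proc?';
```

The %seen hash lives in a single process, so with several children the child number would still need to be part of the name to avoid collisions between processes.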

Link to this page: http://www.jb200.com/article/3090.html
