云代码 (Yuncode) - Perl code library

Extracting URLs with Perl

2012-10-15 Author: 神马

[perl] code library

#-----------------------------
use HTML::LinkExtor;

# $base_url and $filename are expected to be set by the caller
my $parser = HTML::LinkExtor->new(undef, $base_url);
$parser->parse_file($filename);
my @links = $parser->links;
foreach my $linkarray (@links) {
	my @element  = @$linkarray;
	my $elt_type = shift @element;    # element type, e.g. 'a' or 'img'

	# possibly test whether this is an element we're interested in
	while (@element) {
		# extract the next attribute and its value
		my ($attr_name, $attr_value) = splice(@element, 0, 2);
		# ... do something with them ...
	}
}
#-----------------------------
<A HREF="http://www.perl.com/">Home page</A>
        <IMG SRC="images/big.gif" LOWSRC="images/big-lowres.gif">
#-----------------------------
[
    [ 'a',   href   => "http://www.perl.com/" ],
    [ 'img', src    => "images/big.gif",
             lowsrc => "images/big-lowres.gif" ]
]
#-----------------------------
# when a base URL is given to the parser, links are URI objects, so ->scheme works
if ($elt_type eq 'a' && $attr_name eq 'href') {
	print "ANCHOR: $attr_value\n"
		if $attr_value->scheme =~ /http|ftp/;
}
if ($elt_type eq 'img' && $attr_name eq 'src') {
	print "IMAGE:  $attr_value\n";
}
#-----------------------------
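The filtering logic can be tried without the parser by walking the nested structure directly. The following is a minimal sketch under some assumptions: the helper name `classify_links` is hypothetical, the input is the two-element structure shown above written out by hand, and because no base URL is involved the attribute values are plain strings, so the scheme is checked with a regex rather than with the URI `scheme` method.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a links structure and return labelled lines.
sub classify_links {
    my @links = @_;
    my @out;
    foreach my $linkarray (@links) {
        my @element  = @$linkarray;
        my $elt_type = shift @element;    # first item is the element type
        while (@element) {
            # the rest are attribute name/value pairs
            my ($attr_name, $attr_value) = splice(@element, 0, 2);
            if ($elt_type eq 'a' && $attr_name eq 'href') {
                # plain strings here, so test the scheme with a regex
                push @out, "ANCHOR: $attr_value"
                    if $attr_value =~ m{^(?:http|ftp):}i;
            }
            if ($elt_type eq 'img' && $attr_name eq 'src') {
                push @out, "IMAGE:  $attr_value";
            }
        }
    }
    return @out;
}

# Hand-written copy of the structure shown above.
my @links = (
    [ 'a',   href => "http://www.perl.com/" ],
    [ 'img', src  => "images/big.gif", lowsrc => "images/big-lowres.gif" ],
);
print "$_\n" for classify_links(@links);
```

The `lowsrc` attribute is skipped because only `href` on anchors and `src` on images are tested.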
# download the following standalone program
#!/usr/bin/perl -w
# xurl - extract unique, sorted list of links from URL
use strict;
use HTML::LinkExtor;
use LWP::Simple;

my $base_url = shift(@ARGV) or die "usage: $0 URL\n";
my $html = get($base_url);
die "Couldn't fetch $base_url\n" unless defined $html;

my $parser = HTML::LinkExtor->new(undef, $base_url);
$parser->parse($html)->eof;
my %seen;
foreach my $linkarray ($parser->links) {
	my @element  = @$linkarray;
	my $elt_type = shift @element;    # element type, unused here
	while (@element) {
		my ($attr_name, $attr_value) = splice(@element, 0, 2);
		$seen{$attr_value}++;     # hash keys keep each URL only once
	}
}
print "$_\n" for sort keys %seen;

#-----------------------------
#% xurl http://www.perl.com/CPAN
#ftp://ftp@ftp.perl.com/CPAN/CPAN.html
#http://language.perl.com/misc/CPAN.cgi
#http://language.perl.com/misc/cpan_module
#http://language.perl.com/misc/getcpan
#http://www.perl.com/index.html
#http://www.perl.com/gifs/lcb.xbm
#-----------------------------
<URL:http://www.perl.com>
#-----------------------------
@URLs = ($message =~ /<URL:(.*?)>/g);
#-----------------------------
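For plain text that marks links in the `<URL:...>` form, the pattern match above is enough on its own. A small runnable sketch follows; the contents of `$message` are made up for illustration.

```perl
use strict;
use warnings;

# Sample text, invented for this example.
my $message = "See <URL:http://www.perl.com> and <URL:ftp://ftp.perl.com/CPAN/> for details.";

# In list context, a /g match returns the captured group from every match.
# The non-greedy .*? stops at the first closing '>'.
my @URLs = ($message =~ /<URL:(.*?)>/g);
print "$_\n" for @URLs;
```

A greedy `.*` would instead run to the last `>` in the string and merge both links into one capture.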

