云代码 - Perl code library

Perl: extracting or removing HTML tags

2012-10-15 Author: 神马

[perl] code library

#-----------------------------
( $plain_text = $html_text ) =~ s/<[^>]*>//gs;     #WRONG
#-----------------------------
use HTML::Parse;
use HTML::FormatText;
$plain_text = HTML::FormatText->new->format ( parse_html ( $html_text ) );
#-----------------------------
#% perl -pe 's/<[^>]*>//g' file
#-----------------------------
#<IMG SRC = "foo.gif"
#     ALT = "Flurp!">
#-----------------------------
#% perl -0777 -pe 's/<[^>]*>//gs' file
#-----------------------------
{
    local $/;               # temporary whole-file input mode
    $html = <FILE>;
    $html =~ s/<[^>]*>//gs;
}
#-----------------------------
#<IMG SRC = "foo.gif" ALT = "A > B">
#
#<!-- <A comment> -->
#
#<script>if (a<b && a>c)</script>
#
#<# Just data #>
#
#<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
#-----------------------------
#<!-- This section commented out.
#    <B>You can't see me!</B>
#-->
#-----------------------------
package MyParser;
use HTML::Parser;
use HTML::Entities qw(decode_entities);
 
@ISA = qw(HTML::Parser);
 
sub text {
    my($self, $text) = @_;
    print decode_entities($text);
}
 
package main;
MyParser->new->parse_file(*F);
#-----------------------------
($title) = ($html =~ m#<TITLE>\s*(.*?)\s*</TITLE>#is);
#-----------------------------
# download the following standalone program
#!/usr/bin/perl
# htitle - get html title from URL
 
die "usage: $0 url ...\n" unless @ARGV;
require LWP;
 
foreach $url (@ARGV) {
    $ua = LWP::UserAgent->new();
    $res = $ua->request(HTTP::Request->new(GET => $url));
    print "$url: " if @ARGV > 1;
    if ($res->is_success) {
        print $res->title, "\n";
    } else {
        print $res->status_line, "\n";
    }
}
 
#-----------------------------
#% htitle http://www.ora.com
#www.oreilly.com -- Welcome to O'Reilly & Associates!
#
#% htitle http://www.perl.com/ http://www.perl.com/nullvoid
#http://www.perl.com/: The www.perl.com Home Page
#http://www.perl.com/nullvoid: 404 File Not Found
#-----------------------------
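
The listing marks the plain `s/<[^>]*>//gs` substitution as WRONG and shows inputs that break it. As a minimal sketch of why, the script below runs that substitution on one of those inputs and compares it with a slightly more careful pattern. The helper names `strip_naive` and `strip_quoted_aware` and the sample string are assumptions made for illustration; they are not part of the original listing. The second pattern only tolerates `>` inside quoted attribute values and inside comments. It would still mishandle the `<script>` example above, so it is not a substitute for the HTML::Parser approach.

```perl
use strict;
use warnings;

# Naive approach from the listing: delete everything between < and >.
sub strip_naive {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//gs;
    return $text;
}

# Hypothetical helper (not in the original listing): removes whole
# comments and lets > appear inside quoted attribute values.
# This is still a regex heuristic, not a real HTML parser.
sub strip_quoted_aware {
    my ($html) = @_;
    (my $text = $html) =~ s{
        <!--.*?-->                        # a whole comment
      | <(?:[^>"']|"[^"]*"|'[^']*')*>     # a tag, with quoted values skipped as units
    }{}gsx;
    return $text;
}

# Sample built from two of the problem cases shown in the listing.
my $sample = qq{<IMG SRC = "foo.gif" ALT = "A > B">text<!-- <A comment> -->};

print "naive:        [", strip_naive($sample), "]\n";
print "quoted-aware: [", strip_quoted_aware($sample), "]\n";
```

The naive version stops at the first `>` inside the ALT value and inside the comment, so fragments of markup leak into the output, while the second version leaves only the word `text` for this particular input.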