Rev 311 | Blame | Compare with Previous | Last modification | View Log | RSS feed
############################################################################## Pod/ParseUtils.pm -- helpers for POD parsing and conversion## Copyright (C) 1999-2000 by Marek Rouchal. All rights reserved.# This file is part of "PodParser". PodParser is free software;# you can redistribute it and/or modify it under the same terms# as Perl itself.#############################################################################package Pod::ParseUtils;use strict;use vars qw($VERSION);$VERSION = '1.36'; ## Current version of this packagerequire 5.005; ## requires this Perl version or later=head1 NAMEPod::ParseUtils - helpers for POD parsing and conversion=head1 SYNOPSISuse Pod::ParseUtils;my $list = new Pod::List;my $link = Pod::Hyperlink->new('Pod::Parser');=head1 DESCRIPTIONB<Pod::ParseUtils> contains a few object-oriented helper packages forPOD parsing and processing (i.e. in POD formatters and translators).=cut#-----------------------------------------------------------------------------# Pod::List## class to hold POD list info (=over, =item, =back)#-----------------------------------------------------------------------------package Pod::List;use Carp;=head2 Pod::ListB<Pod::List> can be used to hold information about POD lists(written as =over ... =item ... =back) for further processing.The following methods are available:=over 4=item Pod::List-E<gt>new()Create a new list object. Properties may be specified through a hashreference like this:my $list = Pod::List->new({ -start => $., -indent => 4 });See the individual methods/properties for details.=cutsub new {my $this = shift;my $class = ref($this) || $this;my %params = @_;my $self = {%params};bless $self, $class;$self->initialize();return $self;}sub initialize {my $self = shift;$self->{-file} ||= 'unknown';$self->{-start} ||= 'unknown';$self->{-indent} ||= 4; # perlpod: "should be the default"$self->{_items} = [];$self->{-type} ||= '';}=item $list-E<gt>file()Without argument, retrieves the file name the list is in. This musthave been set before by either specifying B<-file> in the B<new()>method or by calling the B<file()> method with a scalar argument.=cut# The POD file name the list appears insub file {return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};}=item $list-E<gt>start()Without argument, retrieves the line number where the list started.This must have been set before by either specifying B<-start> in theB<new()> method or by calling the B<start()> method with a scalarargument.=cut# The line in the file the node appearssub start {return (@_ > 1) ? ($_[0]->{-start} = $_[1]) : $_[0]->{-start};}=item $list-E<gt>indent()Without argument, retrieves the indent level of the list as specifiedin C<=over n>. This must have been set before by either specifyingB<-indent> in the B<new()> method or by calling the B<indent()> methodwith a scalar argument.=cut# indent levelsub indent {return (@_ > 1) ? ($_[0]->{-indent} = $_[1]) : $_[0]->{-indent};}=item $list-E<gt>type()Without argument, retrieves the list type, which can be an arbitrary value,e.g. C<OL>, C<UL>, ... when thinking the HTML way.This must have been set before by either specifyingB<-type> in the B<new()> method or by calling the B<type()> methodwith a scalar argument.=cut# The type of the list (UL, OL, ...)sub type {return (@_ > 1) ? ($_[0]->{-type} = $_[1]) : $_[0]->{-type};}=item $list-E<gt>rx()Without argument, retrieves a regular expression for simplifying theindividual item strings once the list type has been determined. Usage:E.g. when converting to HTML, one might strip the leading number inan ordered list as C<E<lt>OLE<gt>> already prints numbers itself.This must have been set before by either specifyingB<-rx> in the B<new()> method or by calling the B<rx()> methodwith a scalar argument.=cut# The regular expression to simplify the itemssub rx {return (@_ > 1) ? ($_[0]->{-rx} = $_[1]) : $_[0]->{-rx};}=item $list-E<gt>item()Without argument, retrieves the array of the items in this list.The items may be represented by any scalar.If an argument has been given, it is pushed on the list of items.=cut# The individual =items of this listsub item {my ($self,$item) = @_;if(defined $item) {push(@{$self->{_items}}, $item);return $item;}else {return @{$self->{_items}};}}=item $list-E<gt>parent()Without argument, retrieves information about the parent holding thislist, which is represented as an arbitrary scalar.This must have been set before by either specifyingB<-parent> in the B<new()> method or by calling the B<parent()> methodwith a scalar argument.=cut# possibility for parsers/translators to store information about the# lists's parent objectsub parent {return (@_ > 1) ? ($_[0]->{-parent} = $_[1]) : $_[0]->{-parent};}=item $list-E<gt>tag()Without argument, retrieves information about the list tag, which can beany scalar.This must have been set before by either specifyingB<-tag> in the B<new()> method or by calling the B<tag()> methodwith a scalar argument.=back=cut# possibility for parsers/translators to store information about the# list's objectsub tag {return (@_ > 1) ? ($_[0]->{-tag} = $_[1]) : $_[0]->{-tag};}#-----------------------------------------------------------------------------# Pod::Hyperlink## class to manipulate POD hyperlinks (L<>)#-----------------------------------------------------------------------------package Pod::Hyperlink;=head2 Pod::HyperlinkB<Pod::Hyperlink> is a class for manipulation of POD hyperlinks. Usage:my $link = Pod::Hyperlink->new('alternative text|page/"section in page"');The B<Pod::Hyperlink> class is mainly designed to parse the contents of theC<LE<lt>...E<gt>> sequence, providing a simple interface for accessing thedifferent parts of a POD hyperlink for further processing. It can also beused to construct hyperlinks.=over 4=item Pod::Hyperlink-E<gt>new()The B<new()> method can either be passed a set of key/value pairs or a singlescalar value, namely the contents of a C<LE<lt>...E<gt>> sequence. An objectof the class C<Pod::Hyperlink> is returned. The value C<undef> indicates afailure, the error message is stored in C<$@>.=cutuse Carp;sub new {my $this = shift;my $class = ref($this) || $this;my $self = +{};bless $self, $class;$self->initialize();if(defined $_[0]) {if(ref($_[0])) {# called with a list of parameters%$self = %{$_[0]};$self->_construct_text();}else {# called with L<> contentsreturn unless($self->parse($_[0]));}}return $self;}sub initialize {my $self = shift;$self->{-line} ||= 'undef';$self->{-file} ||= 'undef';$self->{-page} ||= '';$self->{-node} ||= '';$self->{-alttext} ||= '';$self->{-type} ||= 'undef';$self->{_warnings} = [];}=item $link-E<gt>parse($string)This method can be used to (re)parse a (new) hyperlink, i.e. the contentsof a C<LE<lt>...E<gt>> sequence. The result is stored in the current object.Warnings are stored in the B<warnings> property.E.g. sections like C<LE<lt>open(2)E<gt>> are deprecated, as they do not pointto Perl documents. C<LE<lt>DBI::foo(3p)E<gt>> is wrong as well, the manpagesection can simply be dropped.=cutsub parse {my $self = shift;local($_) = $_[0];# syntax check the link and extract destinationmy ($alttext,$page,$node,$type,$quoted) = (undef,'','','',0);$self->{_warnings} = [];# collapse newlines with whitespaces/\s*\n+\s*/ /g;# strip leading/trailing whitespaceif(s/^[\s\n]+//) {$self->warning('ignoring leading whitespace in link');}if(s/[\s\n]+$//) {$self->warning('ignoring trailing whitespace in link');}unless(length($_)) {_invalid_link('empty link');return;}## Check for different possibilities. This is tedious and error-prone# we match all possibilities (alttext, page, section/item)#warn "DEBUG: link=$_\n";# only page# problem: a lot of people use (), or (1) or the like to indicate# man page sections. But this collides with L<func()> that is supposed# to point to an internal funtion...my $page_rx = '[\w.-]+(?:::[\w.-]+)*(?:[(](?:\d\w*|)[)]|)';# page name onlyif(/^($page_rx)$/o) {$page = $1;$type = 'page';}# alttext, page and "section"elsif(m{^(.*?)\s*[|]\s*($page_rx)\s*/\s*"(.+)"$}o) {($alttext, $page, $node) = ($1, $2, $3);$type = 'section';$quoted = 1; #... therefore | and / are allowed}# alttext and pageelsif(/^(.*?)\s*[|]\s*($page_rx)$/o) {($alttext, $page) = ($1, $2);$type = 'page';}# alttext and "section"elsif(m{^(.*?)\s*[|]\s*(?:/\s*|)"(.+)"$}) {($alttext, $node) = ($1,$2);$type = 'section';$quoted = 1;}# page and "section"elsif(m{^($page_rx)\s*/\s*"(.+)"$}o) {($page, $node) = ($1, $2);$type = 'section';$quoted = 1;}# page and itemelsif(m{^($page_rx)\s*/\s*(.+)$}o) {($page, $node) = ($1, $2);$type = 'item';}# only "section"elsif(m{^/?"(.+)"$}) {$node = $1;$type = 'section';$quoted = 1;}# only itemelsif(m{^\s*/(.+)$}) {$node = $1;$type = 'item';}# non-standard: Hyperlink with alt-text - doesn't remove protocol prefix, maybe it should?elsif(/^ \s* (.*?) \s* [|] \s* (\w+:[^:\s] [^\s|]*?) \s* $/ix) {($alttext,$node) = ($1,$2);$type = 'hyperlink';}# non-standard: Hyperlinkelsif(/^(\w+:[^:\s]\S*)$/i) {$node = $1;$type = 'hyperlink';}# alttext, page and itemelsif(m{^(.*?)\s*[|]\s*($page_rx)\s*/\s*(.+)$}o) {($alttext, $page, $node) = ($1, $2, $3);$type = 'item';}# alttext and itemelsif(m{^(.*?)\s*[|]\s*/(.+)$}) {($alttext, $node) = ($1,$2);}# must be an item or a "malformed" section (without "")else {$node = $_;$type = 'item';}# collapse whitespace in nodes$node =~ s/\s+/ /gs;# empty alternative text expands to node nameif(defined $alttext) {if(!length($alttext)) {$alttext = $node || $page;}}else {$alttext = '';}if($page =~ /[(]\w*[)]$/) {$self->warning("(section) in '$page' deprecated");}if(!$quoted && $node =~ m{[|/]} && $type ne 'hyperlink') {$self->warning("node '$node' contains non-escaped | or /");}if($alttext =~ m{[|/]}) {$self->warning("alternative text '$node' contains non-escaped | or /");}$self->{-page} = $page;$self->{-node} = $node;$self->{-alttext} = $alttext;#warn "DEBUG: page=$page section=$section item=$item alttext=$alttext\n";$self->{-type} = $type;$self->_construct_text();1;}sub _construct_text {my $self = shift;my $alttext = $self->alttext();my $type = $self->type();my $section = $self->node();my $page = $self->page();my $page_ext = '';$page =~ s/([(]\w*[)])$// && ($page_ext = $1);if($alttext) {$self->{_text} = $alttext;}elsif($type eq 'hyperlink') {$self->{_text} = $section;}else {$self->{_text} = ($section || '') .(($page && $section) ? ' in ' : '') ."$page$page_ext";}# for being marked up later# use the non-standard markers P<> and Q<>, so that the resulting# text can be parsed by the translators. It's their job to put# the correct hypertext around the linktextif($alttext) {$self->{_markup} = "Q<$alttext>";}elsif($type eq 'hyperlink') {$self->{_markup} = "Q<$section>";}else {$self->{_markup} = (!$section ? '' : "Q<$section>") .($page ? ($section ? ' in ':'') . "P<$page>$page_ext" : '');}}=item $link-E<gt>markup($string)Set/retrieve the textual value of the link. This string contains specialmarkers C<PE<lt>E<gt>> and C<QE<lt>E<gt>> that should be expanded by thetranslator's interior sequence expansion engine to theformatter-specific code to highlight/activate the hyperlink. The detailshave to be implemented in the translator.=cut#' retrieve/set markuped textsub markup {return (@_ > 1) ? ($_[0]->{_markup} = $_[1]) : $_[0]->{_markup};}=item $link-E<gt>text()This method returns the textual representation of the hyperlink as above,but without markers (read only). Depending on the link type this is one ofthe following alternatives (the + and * denote the portions of the textthat are marked up):+perl+ L<perl>*$|* in +perlvar+ L<perlvar/$|>*OPTIONS* in +perldoc+ L<perldoc/"OPTIONS">*DESCRIPTION* L<"DESCRIPTION">=cut# The complete link's textsub text {return $_[0]->{_text};}=item $link-E<gt>warning()After parsing, this method returns any warnings encountered during theparsing process.=cut# Set/retrieve warningssub warning {my $self = shift;if(@_) {push(@{$self->{_warnings}}, @_);return @_;}return @{$self->{_warnings}};}=item $link-E<gt>file()=item $link-E<gt>line()Just simple slots for storing information about the line and the filethe link was encountered in. Has to be filled in manually.=cut# The line in the file the link appearssub line {return (@_ > 1) ? ($_[0]->{-line} = $_[1]) : $_[0]->{-line};}# The POD file name the link appears insub file {return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};}=item $link-E<gt>page()This method sets or returns the POD page this link points to.=cut# The POD page the link appears onsub page {if (@_ > 1) {$_[0]->{-page} = $_[1];$_[0]->_construct_text();}return $_[0]->{-page};}=item $link-E<gt>node()As above, but the destination node text of the link.=cut# The link destinationsub node {if (@_ > 1) {$_[0]->{-node} = $_[1];$_[0]->_construct_text();}return $_[0]->{-node};}=item $link-E<gt>alttext()Sets or returns an alternative text specified in the link.=cut# Potential alternative textsub alttext {if (@_ > 1) {$_[0]->{-alttext} = $_[1];$_[0]->_construct_text();}return $_[0]->{-alttext};}=item $link-E<gt>type()The node type, either C<section> or C<item>. As an unofficial type,there is also C<hyperlink>, derived from e.g. C<LE<lt>http://perl.comE<gt>>=cut# The type: item or headnsub type {return (@_ > 1) ? ($_[0]->{-type} = $_[1]) : $_[0]->{-type};}=item $link-E<gt>link()Returns the link as contents of C<LE<lt>E<gt>>. Reciprocal to B<parse()>.=back=cut# The link itselfsub link {my $self = shift;my $link = $self->page() || '';if($self->node()) {my $node = $self->node();$node =~ s/\|/E<verbar>/g;$node =~ s{/}{E<sol>}g;if($self->type() eq 'section') {$link .= ($link ? '/' : '') . '"' . $node . '"';}elsif($self->type() eq 'hyperlink') {$link = $self->node();}else { # item$link .= '/' . $node;}}if($self->alttext()) {my $text = $self->alttext();$text =~ s/\|/E<verbar>/g;$text =~ s{/}{E<sol>}g;$link = "$text|$link";}return $link;}sub _invalid_link {my ($msg) = @_;# this sets @_#eval { die "$msg\n" };#chomp $@;$@ = $msg; # this seems to work, too!return;}#-----------------------------------------------------------------------------# Pod::Cache## class to hold POD page details#-----------------------------------------------------------------------------package Pod::Cache;=head2 Pod::CacheB<Pod::Cache> holds information about a set of POD documents,especially the nodes for hyperlinks.The following methods are available:=over 4=item Pod::Cache-E<gt>new()Create a new cache object. This object can hold an arbitrary number ofPOD documents of class Pod::Cache::Item.=cutsub new {my $this = shift;my $class = ref($this) || $this;my $self = [];bless $self, $class;return $self;}=item $cache-E<gt>item()Add a new item to the cache. Without arguments, this method returns alist of all cache elements.=cutsub item {my ($self,%param) = @_;if(%param) {my $item = Pod::Cache::Item->new(%param);push(@$self, $item);return $item;}else {return @{$self};}}=item $cache-E<gt>find_page($name)Look for a POD document named C<$name> in the cache. Returns thereference to the corresponding Pod::Cache::Item object or undef ifnot found.=back=cutsub find_page {my ($self,$page) = @_;foreach(@$self) {if($_->page() eq $page) {return $_;}}return;}package Pod::Cache::Item;=head2 Pod::Cache::ItemB<Pod::Cache::Item> holds information about individual POD documents,that can be grouped in a Pod::Cache object.It is intended to hold information about the hyperlink nodes of PODdocuments.The following methods are available:=over 4=item Pod::Cache::Item-E<gt>new()Create a new object.=cutsub new {my $this = shift;my $class = ref($this) || $this;my %params = @_;my $self = {%params};bless $self, $class;$self->initialize();return $self;}sub initialize {my $self = shift;$self->{-nodes} = [] unless(defined $self->{-nodes});}=item $cacheitem-E<gt>page()Set/retrieve the POD document name (e.g. "Pod::Parser").=cut# The POD pagesub page {return (@_ > 1) ? ($_[0]->{-page} = $_[1]) : $_[0]->{-page};}=item $cacheitem-E<gt>description()Set/retrieve the POD short description as found in the C<=head1 NAME>section.=cut# The POD description, taken out of NAME if presentsub description {return (@_ > 1) ? ($_[0]->{-description} = $_[1]) : $_[0]->{-description};}=item $cacheitem-E<gt>path()Set/retrieve the POD file storage path.=cut# The file pathsub path {return (@_ > 1) ? ($_[0]->{-path} = $_[1]) : $_[0]->{-path};}=item $cacheitem-E<gt>file()Set/retrieve the POD file name.=cut# The POD file namesub file {return (@_ > 1) ? ($_[0]->{-file} = $_[1]) : $_[0]->{-file};}=item $cacheitem-E<gt>nodes()Add a node (or a list of nodes) to the document's node list. Note thatthe order is kept, i.e. start with the first node and end with the last.If no argument is given, the current list of nodes is returned in thesame order the nodes have been added.A node can be any scalar, but usually is a pair of node string andunique id for the C<find_node> method to work correctly.=cut# The POD nodessub nodes {my ($self,@nodes) = @_;if(@nodes) {push(@{$self->{-nodes}}, @nodes);return @nodes;}else {return @{$self->{-nodes}};}}=item $cacheitem-E<gt>find_node($name)Look for a node or index entry named C<$name> in the object.Returns the unique id of the node (i.e. the second element of the arraystored in the node array) or undef if not found.=cutsub find_node {my ($self,$node) = @_;my @search;push(@search, @{$self->{-nodes}}) if($self->{-nodes});push(@search, @{$self->{-idx}}) if($self->{-idx});foreach(@search) {if($_->[0] eq $node) {return $_->[1]; # id}}return;}=item $cacheitem-E<gt>idx()Add an index entry (or a list of them) to the document's index list. Note thatthe order is kept, i.e. start with the first node and end with the last.If no argument is given, the current list of index entries is returned in thesame order the entries have been added.An index entry can be any scalar, but usually is a pair of string andunique id.=back=cut# The POD index entriessub idx {my ($self,@idx) = @_;if(@idx) {push(@{$self->{-idx}}, @idx);return @idx;}else {return @{$self->{-idx}};}}=head1 AUTHORPlease report bugs using L<http://rt.cpan.org>.Marek Rouchal E<lt>marekr@cpan.orgE<gt>, borrowinga lot of things from L<pod2man> and L<pod2roff> as well as other PODprocessing tools by Tom Christiansen, Brad Appleton and Russ Allbery.=head1 SEE ALSOL<pod2man>, L<pod2roff>, L<Pod::Parser>, L<Pod::Checker>,L<pod2html>=cut1;