NAME
    HTML::FromText - mark up text as HTML

SYNOPSIS
        use HTML::FromText;
        print text2html($text, urls => 1, paras => 1, headings => 1);

DESCRIPTION
    The "text2html" function marks up plain text as HTML. By default it
    expands tabs and converts HTML metacharacters into the corresponding
    entities. More complicated transformations, such as splitting the text
    into paragraphs or marking up bulleted lists, can be carried out by
    setting the appropriate options.

SUMMARY OF OPTIONS
    These options always apply:

        metachars    Convert HTML metacharacters to entity references
        urls         Convert URLs to links
        email        Convert email addresses to links
        bold         Mark up words with *asterisks* in bold
        underline    Mark up words with _underscores_ as underlined

    You can then choose to treat the text according to one of these options:

        pre          Treat text as preformatted
        lines        Treat text as line-oriented
        paras        Treat text as paragraph-oriented

    (If more than one of these is specified, "pre" takes precedence over
    "lines", which takes precedence over "paras".)

    The following option applies when the "lines" option is specified:

        spaces       Preserve spaces from the original text

    The following options apply when the "paras" option is specified:

        blockparas   Mark up indented paragraphs as block quotes
        blockquotes  Ditto, also preserve lines from original
        blockcode    Ditto, also preserve spaces from original
        bullets      Mark up bulleted paragraphs as an unordered list
        headings     Mark up headings
        numbers      Mark up numbered paragraphs as an ordered list
        tables       Mark up tables
        title        Mark up first paragraph as level 1 heading

    "text2html" will issue a warning if it is passed nonsensical options,
    for example "headings" but not "paras". These warnings can be
    suppressed by setting $HTML::FromText::QUIET to true.

OPTIONS
    blockparas
    blockquotes
    blockcode
        These options cause "text2html" to spot paragraphs where every line
        begins with whitespace, and mark them up as block quotes.

        If more than one of these options is specified, "blockparas" takes
        precedence over "blockcode", which takes precedence over
        "blockquotes". All three options are ignored unless the "paras"
        option is also set.

        The "blockparas" option marks up the paragraph as a block quote with
        no other changes. For example,

            Turing wrote,

                I propose to consider the question, "Can machines think?"

        becomes

            Turing wrote,

                I propose to consider the question, "Can machines think?"

        The "blockquotes" option preserves line breaks in the original text.
        For example,

            From "The Waste Land":

                Phlebas the Phoenician, a fortnight dead,
                Forgot the cry of gulls, and the deep sea swell

        becomes

            From "The Waste Land":

                Phlebas the Phoenician, a fortnight dead,
                Forgot the cry of gulls, and the deep sea swell

        The "blockcode" option preserves line breaks and spaces in the
        original text and renders the paragraph in a fixed-width font. For
        example:

            Here's how to output numbers with commas:

                sub commify {
                  local $_ = shift;
                  1 while s/^(-?\d+)(\d{3})/$1,$2/;
                  $_;
                }

        becomes

            Here's how to output numbers with commas:

                sub commify {
                  local $_ = shift;
                  1 while s/^(-?\d+)(\d{3})/$1,$2/;
                  $_;
                }
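
        The Perl sketch below illustrates how the three block options could
        differ. It is not HTML::FromText's own code. The helper names
        (escape_meta, indented_block) are hypothetical. The <BLOCKQUOTE>,
        <PRE> and <BR> tags are assumptions about the output. Tabs are
        assumed to have been expanded already, as "text2html" does by
        default.

```perl
use strict;
use warnings;

# Escape the HTML metacharacters; "&" must be handled first so the
# entities produced by the later substitutions are not escaped again.
sub escape_meta {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: mark up one paragraph as a block quote.
# Returns undef unless every line of the paragraph begins with whitespace.
# The tags emitted here are assumptions, not HTML::FromText's exact output.
sub indented_block {
    my ($para, $mode) = @_;
    my @lines = split /\n/, $para;
    return undef if !@lines || grep { !/^\s/ } @lines;

    if ($mode eq 'blockcode') {
        # Keep line breaks and spaces; strip only the indentation shared by all lines.
        my ($min) = sort { $a <=> $b } map { my ($ws) = /^(\s*)/; length $ws } @lines;
        my $body = join "\n", map { escape_meta(substr $_, $min) } @lines;
        return "<BLOCKQUOTE><PRE>$body</PRE></BLOCKQUOTE>";
    }

    # blockparas and blockquotes: trim each line, then either let the
    # browser reflow the text or keep the original breaks with <BR>.
    my @trimmed = map { my $l = $_; $l =~ s/^\s+|\s+$//g; escape_meta($l) } @lines;
    my $sep = $mode eq 'blockquotes' ? "<BR>\n" : "\n";
    return '<BLOCKQUOTE>' . join($sep, @trimmed) . '</BLOCKQUOTE>';
}

print indented_block("    From the top:\n      indented more", 'blockcode'), "\n";
```

        For "blockcode", only the indentation common to every line is
        removed, so relative indentation inside the paragraph survives. The
        other two modes trim each line.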

    bold
        Words surrounded with asterisks are marked up in bold, so "*abc*"
        becomes "<B>abc</B>".

    bullets
        Spots bulleted paragraphs (beginning with optional whitespace, an
        asterisk or hyphen, and whitespace) and marks them up as an
        unordered list. Bulleted paragraphs don't have to be separated by
        blank lines. For example,

            Shopping list:

                * apples
                * pears

        becomes

            Shopping list:

              * apples
              * pears
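
        Here is a minimal Perl sketch of bullet detection for a single
        paragraph, following the rule stated above. The names bullet_items
        and bullets_to_html are hypothetical, and the <UL>/<LI> markup is an
        assumption. Because a bullet must be followed by whitespace, a
        paragraph that begins with "*abc*" is not mistaken for a list.

```perl
use strict;
use warnings;

# Hypothetical helper: split one paragraph into bullet items. An item starts
# at a line with optional whitespace, an asterisk or hyphen, and whitespace;
# any other line continues the current item. Returns an empty list if the
# paragraph does not start with a bullet.
sub bullet_items {
    my ($para) = @_;
    my @items;
    for my $line (split /\n/, $para) {
        if ($line =~ /^\s*[*-]\s+(.*)$/) {
            push @items, $1;
        } elsif (@items) {
            (my $cont = $line) =~ s/^\s+//;
            $items[-1] .= " $cont";
        } else {
            return;
        }
    }
    return @items;
}

# The <UL>/<LI> markup is an assumption about the output format.
sub bullets_to_html {
    my @items = bullet_items(shift) or return undef;
    return '<UL>' . join('', map { "<LI>$_" } @items) . '</UL>';
}

print bullets_to_html("  * apples\n  * pears"), "\n";
```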

        This option is ignored unless the "paras" option is set.

    email
        Spots email addresses in the text and converts them to links. For
        example,

            Mail me at web@perl.com.

        becomes

            Mail me at web@perl.com.

        with the address converted into a "mailto:" link.

    headings
        Spots headings (paragraphs starting with numbers) and marks them up
        as headings of the appropriate level. For example,

            1. Introduction

            1.1 Background

            1.1.1 Previous work

            2. Conclusion

        becomes

            1. Introduction

            1.1 Background

            1.1.1 Previous work

            2. Conclusion
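
        One plausible reading of "the appropriate level" is the number of
        dot-separated components in the leading number, so "1.1.1" gives a
        level 3 heading. The Perl sketch below assumes that reading. The
        names heading_level and heading_to_html are hypothetical, and the
        number must start at the very beginning of the paragraph.

```perl
use strict;
use warnings;

# Hypothetical helper: if a paragraph starts with dotted numbers such as
# "2.", "1.1" or "1.1.1", return the number of components as the heading
# level; otherwise return 0. Counting components is an assumption.
sub heading_level {
    my ($para) = @_;
    return 0 unless $para =~ /^(\d+(?:\.\d+)*)\.?\s+\S/;
    my @parts = split /\./, $1;
    return scalar @parts;
}

sub heading_to_html {
    my ($para) = @_;
    my $level = heading_level($para) or return undef;
    $level = 6 if $level > 6;    # HTML only defines H1 to H6
    return "<H$level>$para</H$level>";
}

for my $para ('1. Introduction', '1.1 Background', '1.1.1 Previous work', '2. Conclusion') {
    print heading_to_html($para), "\n";
}
```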
        This option is ignored unless the "paras" option is set.

    lines
        Formats the text so as to preserve line breaks. For example,

            Line 1
            Line 2

        becomes

            Line 1
            Line 2

        If two or more of the options "pre", "lines" and "paras" are set,
        then "pre" takes precedence over "lines", which takes precedence over
        "paras".

    metachars
        Converts HTML metacharacters into their corresponding entity
        references. Ampersand ("&") becomes "&amp;", less than ("<") becomes
        "&lt;", greater than (">") becomes "&gt;", and quote (") becomes
        "&quot;". This option is set to 1 by default.

    numbers
        Spots numbered paragraphs (beginning with whitespace, digits, an
        optional period/parenthesis/bracket, and whitespace) and marks them
        up as an ordered list. Numbered paragraphs don't have to be
        separated by blank lines. For example,

            To do:

              1. Write thesis
              2. Submit it
              3. Celebrate

        becomes

            To do:

              1. Write thesis
              2. Submit it
              3. Celebrate
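
        Here is a Perl sketch of numbered-paragraph detection following the
        rule above. Requiring leading whitespace, as that rule does, keeps
        an unindented paragraph such as "1. Introduction" out of the list.
        The names numbered_items and numbers_to_html are hypothetical. The
        <OL>/<LI VALUE="..."> markup is an assumption, used here to keep the
        author's own numbering.

```perl
use strict;
use warnings;

# Hypothetical helper: split a paragraph into numbered items. An item line
# starts with whitespace, digits, an optional ".", ")" or "]", and
# whitespace; other lines continue the previous item. Returns an empty
# list if the paragraph is not numbered.
sub numbered_items {
    my ($para) = @_;
    my @items;
    for my $line (split /\n/, $para) {
        if ($line =~ /^\s+(\d+)[.)\]]?\s+(.*)$/) {
            push @items, [$1, $2];
        } elsif (@items) {
            (my $cont = $line) =~ s/^\s+//;
            $items[-1][1] .= " $cont";
        } else {
            return;
        }
    }
    return @items;
}

# VALUE on each <LI> preserves the original numbers; this markup is an
# assumption, not necessarily what HTML::FromText produces.
sub numbers_to_html {
    my @items = numbered_items(shift) or return undef;
    return '<OL>' . join('', map { '<LI VALUE="' . $_->[0] . '">' . $_->[1] } @items) . '</OL>';
}

print numbers_to_html("  1. Write thesis\n  2. Submit it\n  3. Celebrate"), "\n";
```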

        This option is ignored unless the "paras" option is set.

    paras
        Formats the text into paragraphs. Paragraphs are separated by one or
        more blank lines. For example,

            Paragraph 1

            Paragraph 2

        becomes

            Paragraph 1

            Paragraph 2
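
        Here is a minimal Perl sketch of paragraph splitting. It also
        expresses the "pre"/"lines"/"paras" precedence rule from SUMMARY OF
        OPTIONS as a function. The names split_paras, paras_to_html and
        layout_mode are hypothetical, and wrapping each paragraph in
        <P>...</P> is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into paragraphs on one or more blank (or
# whitespace-only) lines. The separator pattern always ends on a newline,
# so the indentation of the next paragraph's first line is kept.
sub split_paras {
    my ($text) = @_;
    return map { my $p = $_; $p =~ s/\s+\z//; $p }
           grep { /\S/ } split /\n\s*\n/, $text;
}

sub paras_to_html {
    my ($text) = @_;
    return join "\n", map { "<P>$_</P>" } split_paras($text);
}

# Precedence from SUMMARY OF OPTIONS: "pre" beats "lines", which beats "paras".
sub layout_mode {
    my (%opt) = @_;
    for my $mode (qw(pre lines paras)) {
        return $mode if $opt{$mode};
    }
    return 'none';
}

print paras_to_html("Paragraph 1\n\nParagraph 2\n"), "\n";
print layout_mode(paras => 1, lines => 1), "\n";
```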

        If two or more of the options "pre", "lines" and "paras" are set,
        then "pre" takes precedence over "lines", which takes precedence over
        "paras".

    pre
        Wraps the whole input in a "<PRE>" element. For example,

            preformatted
            text

        becomes

            <PRE>preformatted
            text</PRE>
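
        The sketch below combines "pre" with the default tab expansion and
        "metachars" conversion. It uses the core Text::Tabs module, which
        sets tab stops every 8 columns by default. Whether HTML::FromText
        uses the same tab width is an assumption. The names escape_meta and
        pre_to_html are hypothetical.

```perl
use strict;
use warnings;
use Text::Tabs qw(expand);    # core module; tab stops every 8 columns by default

# Escape HTML metacharacters; "&" first so new entities are not re-escaped.
sub escape_meta {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: expand tabs line by line, escape metacharacters and
# wrap the whole input in a single <PRE> element. The -1 limit keeps
# trailing empty lines so a final newline survives the join.
sub pre_to_html {
    my ($text) = @_;
    my @lines = expand(split /\n/, $text, -1);
    return '<PRE>' . escape_meta(join "\n", @lines) . '</PRE>';
}

print pre_to_html("preformatted\n    text"), "\n";
```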

        If two or more of the options "pre", "lines" and "paras" are set,
        then "pre" takes precedence over "lines", which takes precedence over
        "paras".

    spaces
        Preserves spaces throughout the text. For example,

            Line 1
             Line  2
              Line   3

        becomes

            Line 1
             Line  2
              Line   3

        This option is ignored unless the "lines" option is set.

    tables
        Spots tables and marks them up appropriately. Columns must be
        separated by two or more spaces (this prevents accidental incorrect
        recognition of a paragraph where interword spaces happen to line
        up). If there are two or more rows in a paragraph and all rows share
        the same set of (two or more) columns, the paragraph is assumed to
        be a table. For example,

            -e  File exists.
            -z  File has zero size.
            -s  File has nonzero size (returns size).

        becomes a table with three rows and two columns:

            -e  File exists.
            -z  File has zero size.
            -s  File has nonzero size (returns size).
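
        Here is a Perl sketch of the table test described above. It
        interprets "the same set of columns" as identical starting offsets
        for every cell in every row, which is an assumption. Column
        alignment guessing and metacharacter escaping are left out. The
        names cell_starts and table_to_html are hypothetical.

```perl
use strict;
use warnings;

# Column start offsets in one row: the first non-blank character, plus
# every character that follows a run of two or more whitespace characters.
sub cell_starts {
    my ($row) = @_;
    my @starts;
    while ($row =~ /(?:^\s*|\s{2,})(?=\S)/g) {
        push @starts, pos($row);
    }
    return @starts;
}

# Hypothetical helper: treat a paragraph as a table when it has two or more
# rows, at least two columns, and every row's cells start at the same offsets.
sub table_to_html {
    my ($para) = @_;
    my @rows = split /\n/, $para;
    return undef if @rows < 2;
    my @starts = cell_starts($rows[0]);
    return undef if @starts < 2;
    my $layout = join ',', @starts;
    for my $row (@rows) {
        return undef unless join(',', cell_starts($row)) eq $layout;
    }
    my $html = "<TABLE>\n";
    for my $row (@rows) {
        (my $trimmed = $row) =~ s/^\s+|\s+$//g;
        my @cells = split /\s{2,}/, $trimmed;
        $html .= '<TR>' . join('', map { "<TD>$_</TD>" } @cells) . "</TR>\n";
    }
    return $html . '</TABLE>';
}

print table_to_html("-e  File exists.\n-z  File has zero size.\n-s  File has nonzero size (returns size)."), "\n";
```

        Comparing offsets rather than cell counts means two prose lines
        that merely contain a double space in different places are not
        taken for a table.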

        "text2html" guesses for each column whether it is intended to be
        left, centre or right aligned. This option is ignored unless the
        "paras" option is set.

    title
        Formats the first paragraph of the text as a first-level heading.
        For example,

            Paragraph 1

            Paragraph 2

        becomes

            Paragraph 1

            Paragraph 2

        with "Paragraph 1" marked up as a level 1 heading.

        This option is ignored unless the "paras" option is set.

    underline
        Words surrounded with underscores are marked up with underline, so
        "_abc_" becomes "<U>abc</U>".

    urls
        Spots Uniform Resource Locators (URLs) in the text and converts them
        to links. For example,

            See https://perl.com/.

        becomes

            See https://perl.com/.

        with the URL converted into a link.

SEE ALSO
    The "HTML::Entities" module (part of the LWP package) provides functions
    for encoding and decoding HTML entities.

    Tom Christiansen has a complete implementation of RFC 822 structured
    field bodies. See
    "http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz".

    Seth Golub's "txt2html" utility does everything that "HTML::FromText"
    does, and a few things that it would like to do. See
    "http://www.thehouse.org/txt2html/".

    RFC 822: "Standard for the Format of ARPA Internet Text Messages"
    describes the syntax of email addresses (the more esoteric features of
    structured field bodies, in particular quoted-strings, domain literals
    and comments, are not recognized by "HTML::FromText"). See
    "ftp://src.doc.ic.ac.uk/rfc/rfc822.txt".

    RFC 1630: "Universal Resource Identifiers in WWW" lists the protocols
    that may appear in URLs. "HTML::FromText" also recognizes "https:", but
    ignores "file:" because experience suggests that it results in too many
    false positives. See "ftp://src.doc.ic.ac.uk/rfc/rfc1630.txt".

AUTHOR
    Gareth Rees

COPYRIGHT
    Copyright (c) 1999 Canon Research Centre Europe. All rights reserved.
    This module is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.