Author :
Ján Godó
Abstract
*1 Introduction to HTML 3.2
*2 The HEAD element
*3 The BODY element
*Further Reading
Abstract
The HyperText Markup Language (HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. HTML 3.2 became a W3C Recommendation in January 1997 and is intended as a replacement for HTML 2.0 (RFC 1866).
HTML 3.2 is W3C’s specification for HTML, developed in early 1996 together with vendors including IBM, Microsoft, Netscape Communications Corporation, Novell, SoftQuad, Spyglass, and Sun Microsystems. HTML 3.2 adds widely deployed features such as tables, applets and text flow around images, while providing full backwards compatibility with the existing standard HTML 2.0.
Hypertext Markup Language (HTML) is a system for marking up documents with tags that indicate how text in the documents should be presented and how the documents are linked together. Hypertext links are quite powerful. Within the HTML markup scheme lies the power to create interactive, cross-platform, multimedia, client-server applications. This string of adjectives is not just hype; such systems do exist. One, called the World Wide Web (also known as WWW or just simply, the Web), lives on the Internet, providing organization to a wide variety of computer resources located around the globe.
The Web is an interlinked collection of living documents containing formatted text, images, and sound. These documents are organized into webspaces. A webspace is typically structured around a home page with links to other pages or documents both in and outside of the webspace. A home page functions as a virtual meeting place in cyberspace for the exchange of information.
The continuing development of HTML is conducted on the Web in an open process that you can be part of. New tools and techniques appear frequently and are quickly spread throughout the community of Web authors.
The Structure of HTML documents
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>A study of population dynamics</TITLE>
... other head elements
</HEAD>
<BODY>
... document body
</BODY>
</HTML>
Every conforming HTML 3.2 document must start with the <!DOCTYPE> declaration that is needed to distinguish HTML 3.2 documents from other versions of HTML. Every HTML 3.2 document must also include the descriptive title element. A minimal HTML 3.2 document thus looks like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<TITLE>Presentation of new HTML</TITLE>
The HEAD element contains the document head. You can always omit both the start and end tags for HEAD. The head can contain elements including:
TITLE element ( defines the document title, and is always needed )
ISINDEX element ( for simple keyword searches, see PROMPT attribute )
BASE element ( defines base URL for resolving relative URLs )
SCRIPT element ( reserved for future use with scripting languages )
META element ( used to supply meta info as name/value pairs )
TITLE, SCRIPT and STYLE are containers and require both start and end tags. The other elements are not containers, so their end tags are forbidden.
Note that conforming browsers won’t render the contents of SCRIPT and STYLE elements.
The TITLE element
Every HTML 3.2 document must have exactly one TITLE element in the document’s HEAD. It provides an advisory title which can be displayed in a user agent’s window caption etc. Markup is not permitted in the content of a TITLE element. Example TITLE element:
<TITLE>Presentation of new HTML</TITLE>
The ISINDEX element
The ISINDEX element indicates that the user agent should provide a single line text input field for entering a query string. There are no restrictions on the number of characters that can be entered. The PROMPT attribute can be used to specify a prompt string for the input field, e.g.
<ISINDEX PROMPT="Search Phrase">
For example, if the query string entered is "computer graphics" and the base URL is:
http://www.fmph.uniba.sk/
then the query generated is:
http://www.fmph.uniba.sk/?computer+graphics
Note that space characters are mapped to "+" characters and that normal URL character escaping mechanisms apply. For further details see the HTTP specification.
The BASE element
The BASE element gives the base URL for dereferencing relative URLs, using the rules given by the URL specification, e.g.
<BASE href="http://www.acme.com/intro.html"> ... <IMG SRC="icons/logo.gif">
The image is dereferenced to
http://www.acme.com/icons/logo.gif
In the absence of a BASE element the document URL should be used.
Note that this is not necessarily the same as the URL used to request the document, as the base URL may be overridden by an HTTP header accompanying the document.
The META element
The META element can be used to include name/value pairs describing properties of the document, such as author, expiry date, a list of key words etc. The NAME attribute specifies the property name while the CONTENT attribute specifies the property value, e.g.
<META NAME="Author" CONTENT="Johny Godó">
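The head elements described above can be combined as in the following sketch. The BASE address and the keyword list are hypothetical values chosen for illustration; only the element and attribute names come from the text above.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
  <TITLE>Presentation of new HTML</TITLE>
  <!-- Hypothetical base URL: relative URLs in the body resolve against it -->
  <BASE HREF="http://www.example.com/docs/intro.html">
  <!-- Name/value pairs describing the document -->
  <META NAME="Author" CONTENT="Johny Godó">
  <META NAME="Keywords" CONTENT="HTML 3.2, markup, hypertext">
</HEAD>
<BODY>
  <P>Document body goes here.
</BODY>
</HTML>
```

With this head, a relative reference such as icons/logo.gif in the body would resolve to http://www.example.com/docs/icons/logo.gif.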
The BODY element contains the document body. Both start and end tags for BODY may be omitted. The body can contain a wide range of elements:
Headings (H1 - H6)
The ADDRESS element
Block level Elements
Text level elements
The key attributes are: BACKGROUND, BGCOLOR, TEXT, LINK, VLINK and ALINK. These can be used to set a repeating background image, plus background and foreground colors for normal text and hypertext links.
Example:
<body bgcolor=white text=black link=red vlink=maroon alink=fuchsia>
Most elements that can appear in the document body fall into one of two groups: block level elements which cause paragraph breaks, and text level elements which don’t. Common block level elements include H1 to H6 (headers), P (paragraphs), LI (list items), and HR (horizontal rules). Common text level elements include EM, I, B and FONT (character emphasis), A (hypertext links), IMG and APPLET (embedded objects) and BR (line breaks).
Note that block elements generally act as containers for text level and other block level elements (excluding headings and address elements), while text level elements can only contain other text level elements. The exact model depends on the element.
H1, H2, H3, H4, H5 and H6 are used for document headings. You always need the start and end tags. H1 elements are more important than H2 elements and so on, so that H6 elements define the least important level of headings. More important headings are generally rendered in a larger font than less important ones. Use the optional ALIGN attribute to set the text alignment within a heading, e.g.
<H1 ALIGN=CENTER> ... centered heading ... </H1>
The default is left alignment, but this can be overridden by an enclosing DIV or CENTER element.
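As a small illustration of the alignment rules above, the following sketch places a heading and a paragraph inside a DIV. The heading text is invented for the example, and the exact rendering depends on the user agent.

```html
<!-- The enclosing DIV changes the default alignment of the blocks it contains -->
<DIV ALIGN=CENTER>
  <H2>Population dynamics</H2>
  <P>This paragraph is centered because it sits inside the DIV.
</DIV>

<!-- Outside the DIV, the default left alignment applies again -->
<H2>Methods</H2>
```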
The ADDRESS element requires start and end tags, and specifies information such as authorship and contact details for the current document. User agents should render the content with paragraph-breaks before and after.
Example:
<ADDRESS> Author<BR> Johny Godo<BR> MFF UK Bratislava<BR> Tel: +421 7 724 000 /211 </ADDRESS>
P paragraphs - The paragraph element requires a start tag, but the end tag can always be omitted. Use the ALIGN attribute to set the text alignment within a paragraph, e.g. <P ALIGN=RIGHT>
UL unordered lists - These require start and end tags, and contain one or more LI elements representing individual list items.
OL ordered (i.e. numbered) lists - These require start and end tags, and contain one or more LI elements representing individual list items.
DL definition lists - These require start and end tags and contain DT elements that give the terms, and DD elements that give corresponding definitions.
PRE preformatted text - Requires start and end tags. These elements are rendered with a monospaced font and preserve layout defined by whitespace and line break characters.
DIV document divisions - Requires start and end tags. It is used with the ALIGN attribute to set the text alignment of the block elements it contains. ALIGN can be one of LEFT, CENTER or RIGHT.
CENTER text alignment - Requires start and end tags. It is used to center text lines enclosed by the CENTER element. See DIV for a more general solution.
BLOCKQUOTE quoted passage - Requires start and end tags. It is used to enclose extended quotations and is typically rendered with indented margins.
FORM fill-out forms - Requires start and end tags. This element is used to define a fill-out form for processing by HTTP servers. The attributes are ACTION, METHOD and ENCTYPE. Form elements can’t be nested.
ISINDEX primitive HTML forms - Not a container, so the end tag is forbidden. This predates FORM and is used for simple kinds of forms which have a single text input field, implied by this element. A single ISINDEX can appear in the document head or body.
HR horizontal rules - Not a container, so the end tag is forbidden. The attributes are ALIGN, NOSHADE, SIZE and WIDTH.
TABLE tables - Requires start and end tags. Tables can be nested. Each table starts with an optional CAPTION followed by one or more TR elements defining table rows. Each row has one or more cells defined by TH or TD elements. The attributes for TABLE elements are WIDTH, BORDER, CELLSPACING and CELLPADDING.
List items can contain block and text level items, including nested lists, although headings and address elements are excluded.
Unordered lists take the form:
<UL> <LI> ... first list item <LI> ... second list item ... </UL>
The UL element is used for unordered lists. Both start and end tags are always needed. The LI element is used for individual list items. The end tag for LI elements can always be omitted.
Note that LI elements can contain nested lists. The COMPACT attribute can be used as a hint to the user agent to render lists in a more compact style.
The TYPE attribute can be used to set the bullet style on UL and LI elements. The permitted values are "disc", "square" or "circle". The default generally depends on the level of nesting for lists.
with <li type=disc>
with <li type=square>
with <li type=circle>
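The following sketch shows the TYPE attribute on both UL and LI in a nested list. The list contents are made up for illustration.

```html
<UL TYPE=disc>
  <LI>Fruit
    <!-- The nested list sets its own bullet style -->
    <UL TYPE=square>
      <LI>Apples
      <!-- TYPE on an individual LI requests a different bullet for this item -->
      <LI TYPE=circle>Pears
    </UL>
  <LI>Vegetables
</UL>
```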
Ordered (i.e. numbered) lists take the form:
<OL> <LI> ... first list item <LI> ... second list item ... </OL>
The OL START attribute can be used to initialize the sequence number (by default it is initialized to 1). You can set it later on with the VALUE attribute on LI elements. Both of these attributes expect integer values. The COMPACT attribute can be used as a hint to the user agent to render lists in a more compact style. The OL TYPE attribute allows you to set the numbering style for list items:
Type | Numbering style | Example
1 | Arabic numbers | 1, 2, 3, ...
a | Lower alpha | a, b, c, ...
A | Upper alpha | A, B, C, ...
i | Lower roman | i, ii, iii, ...
I | Upper roman | I, II, III, ...
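A short sketch of how START, VALUE and TYPE work together on an ordered list. The item text simply states the label each item is expected to receive.

```html
<!-- Lower roman numbering, starting from 3 instead of the default 1 -->
<OL TYPE=i START=3>
  <LI>numbered iii
  <LI>numbered iv
  <!-- VALUE resets the sequence number for this item -->
  <LI VALUE=10>numbered x
  <LI>numbered xi
</OL>
```

Both START and VALUE are written as plain integers even when TYPE selects letters or roman numerals.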
Definition Lists
Definition lists take the form:
<DL> <DT> term name <DD> term definition ... </DL>
DT elements can only act as containers for text level elements, while DD elements can hold block level elements as well, excluding headings and address elements.
For example:
<DL>
<DT>Term 1<dd>This is the definition of the first term.
<DT>Term 2<dd>This is the definition of the second term.
</DL>
which could be rendered as:
Term 1
    This is the definition of the first term.
Term 2
    This is the definition of the second term.
The COMPACT attribute can be used with the DL element as a hint to the user agent to render lists in a more compact style.
The FORM element is used to define an HTML form, and you can have more than one form in the same document. Both the start and end tags are required. For very simple forms, you can also use the ISINDEX element. Forms can contain a wide range of HTML markup including several kinds of form fields such as single and multi-line text fields, radio button groups, checkboxes, and menus.
action - This specifies a URL which will be used to invoke a server-side forms handler. This is either an HTTP URL or a mailto URL. The latter allows you to post forms via email, e.g. action="mailto:johny@nw.fmph.uniba.sk".
method - When the action attribute specifies an HTTP server, the method attribute determines which HTTP method will be used to send the form’s contents to the server. It can be either GET or POST, and defaults to GET.
enctype - This determines the mechanism used to encode the form’s contents.
Further details on handling forms are given in RFC 1867.
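The three FORM attributes can be seen together in the sketch below, which assumes a hypothetical upload handler at www.example.com. By default form contents are encoded as application/x-www-form-urlencoded; RFC 1867 defines multipart/form-data for forms that include file upload fields.

```html
<!-- Hypothetical handler URL; POST is used because file contents do not fit in a query string -->
<FORM ACTION="http://www.example.com/cgi-bin/upload" METHOD=POST ENCTYPE="multipart/form-data">
  <P>Your name: <INPUT TYPE=text NAME=sender SIZE=30>
  <P>File to send: <INPUT TYPE=file NAME=attachment>
  <P><INPUT TYPE=submit VALUE="Send"> <INPUT TYPE=reset VALUE="Clear">
</FORM>
```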
HTML 3.2 includes a widely deployed subset of the table specification given in RFC 1942. Tables can be used to mark up tabular material or for layout purposes.
Note that the latter role typically causes problems when rendering to speech or to text only user agents.
Tables take the general form:
<TABLE BORDER=3 CELLSPACING=2 CELLPADDING=2 WIDTH="80%">
<CAPTION> ... table caption ... </CAPTION>
<TR><TD> first cell <TD> second cell
<TR> ...
...
</TABLE>
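A filled-in instance of the general form, using TH for header cells and TD for data cells. The cell contents are taken from the description of headings earlier in this document; the attribute values are arbitrary.

```html
<TABLE BORDER=1 CELLSPACING=2 CELLPADDING=4 WIDTH="60%">
  <CAPTION>Heading levels</CAPTION>
  <!-- Header row: TH cells label the columns -->
  <TR><TH>Element<TH>Importance
  <!-- Data rows: TD cells hold the values -->
  <TR><TD>H1<TD>Most important
  <TR><TD>H6<TD>Least important
</TABLE>
```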
Text level elements don’t cause paragraph breaks. Text level elements that define character styles can generally be nested. They can contain other text level elements but not block level elements.
Font style elements all require start and end tags, e.g.
This has some <B>bold text</B>.
Text level elements must be properly nested - the following is in error:
This has some <B>bold and <I></B>italic text</I>.
User agents should do their best to respect nested emphasis, e.g.
This has some <B>bold and <I>italic text</I></B>.
Where the available fonts are restricted or for speech output, alternative means should be used for rendering differences in emphasis.
TT teletype or monospaced text
I italic text style
B bold text style
U underlined text style
STRIKE strike-through text style
BIG places text in a large font
SMALL places text in a small font
SUB places text in subscript style
SUP places text in superscript style
Phrase elements all require start and end tags, e.g.
This has some <EM>emphasized text</EM>.
EM basic emphasis typically rendered in an italic font
STRONG strong emphasis typically rendered in a bold font
DFN defining instance of the enclosed term
CODE used for extracts from program code
SAMP used for sample output from programs, and scripts etc.
KBD used for text to be typed by the user
VAR used for variables or arguments to commands
CITE used for citations or references to other sources
INPUT, SELECT and TEXTAREA are only allowed within FORM elements. INPUT can be used for a variety of form fields including single line text fields, password fields, checkboxes, radio buttons, submit and reset buttons, hidden fields, file upload, and image buttons. SELECT elements are used for single or multiple choice menus. TEXTAREA elements are used to define multi-line text fields. The content of the element is used to initialize the field.
INPUT text fields, radio buttons, check boxes, ...
INPUT elements are not containers and so the end tag is forbidden. The attributes are:
type - Used to set the type of input field:
type=text | password | checkbox | radio | submit | image | reset | file | hidden
name - Used to define the property name that will be used to identify this field’s content when it is submitted to the server.
value - Used to initialize the field, or to provide a textual label for submit and reset buttons.
checked - The presence of this attribute is used to initialize checkboxes and radio buttons to their checked state.
size - Used to set the visible size of text fields to a given number of average character widths, e.g. size=20
maxlength - Sets the maximum number of characters permitted in a text field.
src - Specifies a URL for the image to use with a graphical submit button.
align - Used to specify image alignment for graphical submit buttons. It is defined just like the IMG align attribute and takes one of the values: top, middle, bottom, left or right, defaulting to bottom.
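The INPUT attributes above can be sketched in a single form. The handler URL, field names and the image file go.gif are hypothetical.

```html
<FORM ACTION="http://www.example.com/cgi-bin/register" METHOD=POST>
  <!-- SIZE sets the visible width, MAXLENGTH caps the number of characters -->
  <P>Login: <INPUT TYPE=text NAME=login SIZE=20 MAXLENGTH=32>
  <P>Password: <INPUT TYPE=password NAME=pw SIZE=20>
  <!-- CHECKED initializes a checkbox or radio button to its checked state -->
  <P><INPUT TYPE=checkbox NAME=news VALUE=yes CHECKED> Send me news
  <!-- Radio buttons sharing one NAME form a group -->
  <P>Format:
     <INPUT TYPE=radio NAME=format VALUE=html CHECKED> HTML
     <INPUT TYPE=radio NAME=format VALUE=text> Plain text
  <!-- Hidden fields are submitted but not shown to the user -->
  <P><INPUT TYPE=hidden NAME=source VALUE=overview>
     <INPUT TYPE=image NAME=go SRC="go.gif" ALIGN=middle>
</FORM>
```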
SELECT is used to define one-from-many or many-from-many menus. SELECT elements require start and end tags and contain one or more OPTION elements that define menu items. One-from-many menus are generally rendered as drop-down menus while many-from-many menus are generally shown as list boxes.
Example:
<SELECT NAME="flavor">
<OPTION VALUE=a>Vanilla
<OPTION VALUE=b>Strawberry
<OPTION VALUE=c>Rum and Raisin
<OPTION VALUE=d>Peach and Orange
</SELECT>
SELECT attributes:
name - This specifies a property name that is used to identify the menu choice when the form is submitted to the server. Each selected option results in a property name/value pair being included as part of the form’s contents.
size - This sets the number of visible choices for many from many menus.
multiple - The presence of this attribute signifies that the users can make multiple selections. By default only one selection is allowed.
OPTION attributes:
selected - When this attribute is present, the option is selected when the document is initially loaded. It is an error for more than one option to be so selected for one from many menus.
value - Specifies the property value to be used when submitting the form’s content. This is combined with the property name as given by the name attribute of the parent SELECT element.
TEXTAREA elements require start and end tags. The content of the element is restricted to text and character entities. It is used to initialize the text that is shown when the document is first loaded.
Example:
<TEXTAREA NAME=address ROWS=4 COLS=40> Your address here ... </TEXTAREA>
TEXTAREA attributes:
name - This specifies a property name that is used to identify the textarea field when the form is submitted to the server.
rows - Specifies the number of visible text lines. Users should be able to enter more lines than this, so user agents should provide some means to scroll through the contents of the textarea field when the contents extend beyond the visible area.
cols - Specifies the visible width in average character widths. Users should be able to enter longer lines than this, so user agents should provide some means to scroll through the contents of the textarea field when the contents extend beyond the visible area. User agents may wrap visible text lines to keep long lines visible without the need for scrolling.
The special text level elements are A (anchors), IMG, APPLET, FONT, BASEFONT, BR and MAP.
Anchors can’t be nested and always require start and end tags. They are used to define hypertext links, e.g.
The way to <a href="hands-on.html">happiness</a>.
Anchors are also used to define named locations for use as targets for hypertext links, e.g.
<h2><a name=mit>545 Tech Square - Hacker’s Paradise</a></h2>
name - This should be a string defining a unique name within the scope of the current HTML document. NAME is used to associate a name with this part of a document for use with URLs that target a named section of a document.
href - Specifies a URL acting as a network address for the linked resource. This could be another HTML document, a PDF file or an image etc.
title - An advisory title for the linked resource.
The IMG element is used to insert images. IMG is an empty element and so the end tag is forbidden. Images can be positioned vertically relative to the current textline or floated to the left or right. See BR with the CLEAR attribute for control over textflow.
e.g. <IMG SRC="canyon.gif" ALT="Grand Canyon">
IMG elements support the following attributes:
src - This attribute is required for every IMG element. It specifies a URL for the image resource, for instance a GIF, JPEG or PNG image file.
alt - This is used to provide a text description of the image and is vital for interoperability with speech-based and text only user agents.
align - This specifies how the image is positioned relative to the current textline in which it occurs: Align=top | middle | bottom | left | right
width - Specifies the intended width of the image in pixels.
height - Specifies the intended height of the image in pixels.
border - When the IMG element appears as part of a hypertext link, the user agent will generally indicate this by drawing a colored border (typically blue) around the image. This attribute can be used to set the width of this border in pixels.
hspace - This can be used to provide white space to the immediate left and right of the image. The HSPACE attribute sets the width of this white space in pixels. By default HSPACE is a small non-zero number.
vspace - This can be used to provide white space above and below the image. The VSPACE attribute sets the height of this white space in pixels. By default VSPACE is a small non-zero number.
usemap - This can be used to give a URL fragment identifier for a client-side image map defined with the MAP element.
ismap - When the IMG element is part of a hypertext link, and the user clicks on the image, the ISMAP attribute causes the location to be passed to the server.
Here is an example of how you use ISMAP:
<a href="/cgibin/navbar.map"><img src=navbar.gif ismap border=0></a>
The location clicked is passed to the server as follows. The user agent derives a new URL from the URL specified by the HREF attribute by appending "?", the x coordinate, "," and the y coordinate of the location in pixels. The link is then followed using the new URL. For instance, if the user clicked at the location x=10, y=27 then the derived URL will be: "/cgibin/navbar.map?10,27". It is generally a good idea to suppress the border and use graphical idioms to indicate that the image is clickable.
The APPLET element requires start and end tags. This element is supported by all Java enabled browsers. It allows you to embed a Java applet into HTML documents. APPLET uses associated PARAM elements to pass parameters to the applet. Following the PARAM elements, the content of APPLET elements should be used to provide an alternative to the applet for user agents that don’t support Java. Java-compatible browsers ignore this extra HTML code. You can use it to show a snapshot of the applet running, with text explaining what the applet does.
Here is a simple example of a Java applet:
<applet code="Bubbles.class" width=500 height=500> Java applet that draws animated bubbles. </applet>
Here is another one using a PARAM element:
<applet code="AudioItem" width=15 height=15>
<param name=snd value="Hello.au|Welcome.au">
Java applet that plays a welcoming sound.
</applet>
The FONT element requires start and end tags. It allows you to change the font size and/or color for the enclosed text. The attributes are SIZE and COLOR. Font sizes are given in terms of a scalar range defined by the user agent with no direct mapping to point sizes etc.
size - This sets the font size for the contents of the font element. You can set size to an integer ranging from 1 to 7 for an absolute font size, or specify a relative font size with a signed integer value, e.g. size="+1" or size="-2". This is mapped to an absolute font size by adding the current base font size as set by the BASEFONT element (see below).
color - Used to set the color to stroke the text. Colors are given as RGB in hexadecimal notation or as one of 16 widely understood color names defined as per the BGCOLOR attribute on the BODY element.
[Rendered samples: text at absolute font sizes size=1 through size=6, and text at relative font sizes using a base font size of 3.]
The BASEFONT element is used to set the base font size. BASEFONT is an empty element so the end tag is forbidden. The SIZE attribute is an integer value ranging from 1 to 7. The base font size applies to the normal and preformatted text but not to headings.
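The mapping from relative to absolute font sizes can be sketched as follows, assuming a base font size of 3. The actual rendered sizes depend on the user agent.

```html
<BASEFONT SIZE=3>
<P>Normal text at the base size,
<!-- Relative sizes are added to the base size: 3 + 1 = 4 -->
<FONT SIZE="+1">one step larger</FONT>,
<!-- 3 - 2 = 1, the smallest size -->
<FONT SIZE="-2">two steps smaller</FONT> and
<!-- An unsigned value is an absolute size; COLOR takes hexadecimal RGB or a color name -->
<FONT SIZE=7 COLOR="#FF0000">the largest size, in red</FONT>.
```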
The BR element is used to force a line break. This is an empty element so the end tag is forbidden. The CLEAR attribute can be used to move down past floating images on either margin. <BR CLEAR=LEFT> moves down past floating images on the left margin, <BR CLEAR=RIGHT> does the same for floating images on the right margin, while <BR CLEAR=ALL> does the same for such images on both left and right margins.
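A minimal sketch of CLEAR used together with a floated image. The WIDTH and HEIGHT values are assumed for illustration.

```html
<!-- ALIGN=LEFT floats the image to the left margin; text flows around it -->
<P><IMG SRC="canyon.gif" ALT="Grand Canyon" ALIGN=LEFT WIDTH=120 HEIGHT=90>
This text flows down the right-hand side of the floated image.
<!-- Move down past the floated image before continuing -->
<BR CLEAR=LEFT>
This text starts below the image, at the left margin.
```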
The MAP element provides a mechanism for client-side image maps. These can be placed in the same document or grouped in a separate document although this isn’t yet widely supported. The MAP element requires start and end tags. It contains one or more AREA elements that specify hotzones on the associated image and bind these hotzones to URLs.
Here is a simple example for a graphical navigational toolbar:
<img src="navbar.gif" border=0 usemap="#map1">
<map name="map1">
<area href=guide.html alt="Access Guide" shape=rect coords="0,0,118,28">
<area href=search.html alt="Search" shape=rect coords="184,0,276,28">
<area href=shortcut.html alt="Go" shape=rect coords="118,0,184,28">
<area href=top10.html alt="Top Ten" shape=rect coords="276,0,373,28">
</map>
The MAP element has one attribute NAME which is used to associate a name with a map. This is then used by the USEMAP attribute on the IMG element to reference the map via a URL fragment identifier.
Note that the value of the NAME attribute is case sensitive. The AREA element is an empty element and so the end tag is forbidden. It takes the following attributes: SHAPE, COORDS, HREF, NOHREF and ALT.
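The AREA attributes can be sketched with shapes other than rectangles. The shape values and coordinate formats below follow the HTML 3.2 specification rather than this overview (rect takes left, top, right, bottom; circle takes center x, center y, radius; poly takes a list of x,y pairs), and the image and target file names are hypothetical.

```html
<IMG SRC="shapes.gif" ALT="Shape selector" BORDER=0 USEMAP="#shapes">
<MAP NAME="shapes">
  <AREA SHAPE=rect COORDS="0,0,99,49" HREF="rect.html" ALT="Rectangle">
  <AREA SHAPE=circle COORDS="150,25,20" HREF="circle.html" ALT="Circle">
  <AREA SHAPE=poly COORDS="200,0,250,49,200,49" HREF="tri.html" ALT="Triangle">
  <!-- NOHREF marks a region that has no action when clicked -->
  <AREA SHAPE=rect COORDS="260,0,299,49" NOHREF ALT="Inactive region">
</MAP>
```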
Further Reading
Tim Berners-Lee and Dan Connolly : HTML 2.0 (RFC1866)
ftp://ds.internic.net/rfc/rfc1866.txt
http://www.w3.org/pub/WWW/MarkUp/
E. Nebel and L. Masinter : Form-based File Upload in HTML (RFC1867)
ftp://ds.internic.net/rfc/rfc1867.txt
Dave Raggett : HTML Tables (RFC1942)
ftp://ds.internic.net/rfc/rfc1942.txt
http://www.w3.org/pub/WWW/TR/WD-tables
Mathew Anderson, Ricardo Motta, Srinivasan Chandrasekar and Michael Stokes : Proposal for a Standard Color Space for the Internet (sRGB)