Beginning XML: Basic Syntax and Differences From HTML
XML stands for Extensible Markup Language, a markup language that focuses on
describing data and what that data actually is. HTML is also a markup language,
but it deals with how data looks and is displayed. XML tags are not predefined
like HTML tags; you must invent your own. XML doesn't do anything by itself;
rather, it helps to structure, store, and send information between different
information systems as plain text that any XML parser can read.
Understanding the Syntax Rules of XML
The first line of an
XML document, called
the XML declaration, is optional. It gives the version of XML currently
being
used (either 1.0 or 1.1, although 1.0 is the most common), as well as
the
character encoding.
<?xml
version="1.0"
encoding="ISO-8859-1"?>
The above example declares XML version 1.0 and the ISO-8859-1 character
encoding (one of many possible choices).
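As a rough illustration of what the declaration is for, the sketch below parses a small document whose bytes are encoded in ISO-8859-1. It assumes Python and its standard xml.etree.ElementTree module, which this article does not otherwise use; the note element and its text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the single byte 0xE9 is the letter e-acute in ISO-8859-1.
document = b'<?xml version="1.0" encoding="ISO-8859-1"?><note>caf\xe9</note>'

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(document)

# Compare against the expected text, written with a Unicode escape.
print(root.text == "caf\u00e9")
```

Without the encoding in the declaration, the parser would assume UTF-8, and the lone 0xE9 byte would be rejected as invalid.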
The rest of an XML document is made up of elements, which may be nested inside
one another. Each element consists of one pair of tags, called a start tag and
an end tag. The start tag is formed by putting a name in angle brackets. The
end tag is formed in the same way as the start tag, using the same name, except
that a slash comes directly after the first angle bracket and before the name.
Example element:
<rule>Everything in between the start
and end tags is called the content.</rule>
Everything between the <rule> start tag and the </rule> end tag in the example
above is considered the content. A full element has a start tag, content, and
an end tag, just like the example.
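The parts of an element can be seen by handing the example to a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (a choice made for illustration, not something the article prescribes).

```python
import xml.etree.ElementTree as ET

# The same element as in the example above, written on one line.
element = ET.fromstring(
    "<rule>Everything in between the start and end tags is called the content.</rule>"
)

print(element.tag)   # the name used in both the start tag and the end tag
print(element.text)  # the content between the two tags
```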
Besides text content, an XML
element may
also include attributes. An attribute is a name and a value paired
together,
placed in the start tag directly after the element name.
<term
number="1"
type="technical">Attribute</term>
In the above example, the element named term has two attributes: number="1" is
one attribute, and type="technical" is the other. Both are included in the
start tag right after the element name (term). In the number="1" attribute, the
name number has the value 1. In the type="technical" attribute, the name type
has the value technical.
The complete XML element describes the text it contains: Attribute is the term
being addressed, its number is 1, and its type is technical.
Note: although 1 looks like a quantity and technical reads like a measure of
quality, in XML both values are simply text that labels the term they describe.
They do not function as a number or a rating in their own right.
The values of attributes must be put in either single or double quotes. In the
above example, the "1" and "technical" attribute values have been correctly
placed in quotes. Each attribute name may be used only once in any given
element. In the previous example, the attribute names number and type have each
been used only once.
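The attribute rules above can be checked with a parser. The following is a sketch only, assuming Python's standard xml.etree.ElementTree module; the variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

term = ET.fromstring('<term number="1" type="technical">Attribute</term>')

# Attributes are exposed as a mapping from name to value.
# Values are always strings, even when they look like numbers.
print(term.attrib)
print(term.get("number"))

# Using the same attribute name twice in one start tag is an error.
try:
    ET.fromstring('<term number="1" number="2">Attribute</term>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(duplicate_accepted)
```

Single quotes work just as well for the values, for example number='1'.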
Elements can
include other elements inside
of them.
<termlist>
<term>Element</term>
<term>Attribute</term>
<term>Name</term>
<term>Value</term>
</termlist>
In this example, the element termlist contains four term elements. Because it
encloses everything else in the document, termlist is also known as the
top-level root element, or document element. An XML document must have exactly
one root element; XML without one is not well-formed.
Incorrect XML Example:
<term>Element</term>
<term>Attribute</term>
<term>Name</term>
<term>Value</term>
Without the top-level root element termlist, the four term elements do not
make up a well-formed XML document.
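A parser makes the difference between the two examples concrete. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

with_root = (
    "<termlist>"
    "<term>Element</term><term>Attribute</term>"
    "<term>Name</term><term>Value</term>"
    "</termlist>"
)
without_root = (
    "<term>Element</term><term>Attribute</term>"
    "<term>Name</term><term>Value</term>"
)

# With a single root element, the document parses and its children can be listed.
root = ET.fromstring(with_root)
terms = [term.text for term in root.findall("term")]
print(root.tag, terms)

# Without one, the parser stops at the second top-level element.
try:
    ET.fromstring(without_root)
    parsed_without_root = True
except ET.ParseError:
    parsed_without_root = False

print(parsed_without_root)
```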
More Differences Between XML and HTML
XML differs from HTML in many subtle but crucial ways, so it follows that some
tasks are better suited to XML than to HTML, and vice versa. For instance,
because XML's syntax rules are strict, it is much simpler for a program to pull
specific information out of an XML document than out of an HTML one, which
often means working around loosely written markup.
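Some HTML elements don't have to have a closing tag, but every XML element
does. (The XML declaration is not considered an element, so the usual rules
don't apply to it.) An XML element with no content may instead be written as a
single self-closing tag, such as <br/>.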
Example of
correct HTML:
<p>A correct
HTML paragraph doesn't have to have a closing tag
<p>New paragraphs can
start without old paragraphs having a
closing tag.
This is incorrect XML, however.
Example of correct XML:
<p>A correct XML paragraph
has a closing
tag</p>
<p>If XML doesn't have
a
closing tag, then it is wrongly constructed</p>
HTML tag names aren't case sensitive, but XML tag names are.
Example of
correct HTML:
<Rule>The capitalization in
'rule' here is
inconsistent, but fine for HTML</rule>
This type of mixed capitalization
works for
HTML, but is considered incorrect XML.
Example of correct XML:
<rule>The
capitalization in 'rule' here is the same in both the start and end
tag</rule>
Browsers tolerate HTML tags that are closed in the wrong order (improperly
nested), but XML tags must be properly nested: an element opened inside
another element has to be closed before the outer element closes.
Example of HTML that browsers accept:
<b><i>Go to the
store,
Jimmy!</b></i>
Browsers will display this HTML, but the <i> tags overlap with the <b> tags,
so as XML, the above markup fails.
Example of correct XML:
<b><i>Go to the
store,
Jimmy!</i></b>
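The three rules covered so far (closing tags, matching case, and proper nesting) can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper written for this example, not a library function.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


samples = {
    "missing end tag": "<p>A paragraph with no closing tag",
    "mixed case": "<Rule>Inconsistent capitalization</rule>",
    "overlapping tags": "<b><i>Go to the store, Jimmy!</b></i>",
    "properly nested": "<b><i>Go to the store, Jimmy!</i></b>",
}

results = {label: is_well_formed(markup) for label, markup in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the last sample passes; the other three are rejected by the XML parser even though a browser would display them as HTML.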
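When HTML is displayed, any run of white space (spaces, tabs, and line breaks)
purposely included in a document collapses into a single space, whereas an XML
parser preserves all white space inside an element.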
Original text:
Don't use HTML to do the
following thing:
preserve
space
HTML version:
Don't use HTML to do the
following thing: preserve space
XML version:
Don't use HTML to do the
following thing:
preserve space
In the above example, white space was intentionally included. The HTML version
does not preserve it; the XML version does.
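The XML half of this claim is easy to check with a parser. This is a sketch under assumptions: it uses Python's standard xml.etree.ElementTree module, and the note element and the exact spacing in it are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content with a line break and several consecutive spaces.
note = ET.fromstring("<note>following thing:\n    preserve      space</note>")

# repr() makes the line break and the runs of spaces visible.
print(repr(note.text))
```

The parser hands back the line break and every space unchanged; what to do with them is left to the application.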
With these basic tenets of XML syntax and its differences from HTML under your
belt, you should have a firm idea of how to create and use basic, well-formed
XML.