HTML Is Broken

From Woozle Writes Code
Revision as of 01:45, 1 September 2005 by Woozle

I'm sure I've seen this discussed elsewhere, but HTML is messed up.

It is (now) ostensibly a data-description language that has been corrupted for use as a visual markup language, and it serves the needs of neither very well. Current planning seems to be heading toward replacing HTML with XHTML, which may finally strip the last of the visual markup elements out of HTML and push them into CSS (style sheets), where they belong; but the resulting pair of languages is still an awkward tool for creating user interfaces. (I will discuss this at greater length.)

HTML is also internally inconsistent:

  • Using <img src=URL> syntax, a page may contain images that are defined elsewhere, but it may not include text defined elsewhere (except through action on the server's part, e.g. server-side includes, or SSI). Conversely, all of a page's text must be defined within a single file, but there is no way to define images within that same file; this considerably complicates web application programming in some ways.
  • Images and tables have some basic characteristics in common: both are essentially rectangular, and HTML handles them similarly in many ways; for example, either can be aligned flush left or right so that the rest of the page's contents wraps around it. Only images, however, can be handled like words in a paragraph, i.e. allowed to wrap to the next line when the end of a line is reached. There is no reason why tables should not also have this ability. Its absence prevents line-wrapped sets of images from having captions, since the only way to give an image a caption is to enclose it within a table.
  • "Box" elements in HTML (such as tables and images) may have their sizes defined as a percentage of the available room. However, this is often handled very poorly: if the available room is partly blocked by a left- or right-aligned table or image, the blockage is often ignored when calculating the box element's size, which can easily cause elements to become unintentionally (and uncontrollably) superimposed on each other, with incomprehensible or at least very ugly results.
  • The "page" is treated either as being the size of the browser window (e.g. in "height=x%" attributes) or as an infinitely extendable scroll, depending on context. There is no way for the HTML author to specify which meaning is intended. (I know I need to put in an example of this, but can't think of one just now.)
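
To make the contrast in the first point concrete: an image is fetched by the browser from wherever its URL points, but text from another file can only be spliced in by the server before the page is sent. The sketch below mimics, in simplified form, what SSI processing does. It assumes that a `files` dictionary stands in for the server's filesystem and that only the `virtual` and `file` forms of `#include` need handling; the name `expand_includes` is hypothetical and is not any web server's API.

```python
import re

# Matches Apache-style SSI include directives, e.g.
# <!--#include virtual="/footer.html" --> or <!--#include file="footer.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')

MAX_DEPTH = 10  # guard against a fragment that (directly or indirectly) includes itself


def expand_includes(page: str, files: dict[str, str], depth: int = 0) -> str:
    """Replace each include directive in `page` with the named fragment's text.

    `files` is a hypothetical stand-in for the server's filesystem.
    Included fragments are themselves expanded, recursively.
    """
    if depth > MAX_DEPTH:
        raise RecursionError("include nesting too deep (possible cycle)")

    def substitute(match: re.Match) -> str:
        fragment = files[match.group(1)]  # raises KeyError if the fragment is missing
        return expand_includes(fragment, files, depth + 1)

    return INCLUDE_RE.sub(substitute, page)


files = {"/footer.html": "<p>Shared footer</p>"}
page = '<body><p>Main text.</p><!--#include virtual="/footer.html" --></body>'
print(expand_includes(page, files))
# <body><p>Main text.</p><p>Shared footer</p></body>
```

The browser never sees the directive, only the finished page; that is exactly the asymmetry with <img src=URL>, which the browser resolves itself.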
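
The second point amounts to asking the layout engine to treat every rectangle the same way. Here is a minimal sketch of that idea. It assumes a simple greedy line-filling model, which is not how any particular browser lays out inline content. The `Box` type and `flow_boxes` function are hypothetical. A table holding an image and its caption wraps exactly like a bare image or a word.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A rectangular item on a line: a word, an image, or a table."""
    kind: str   # "word", "image" or "table"; the layout below ignores it
    width: int  # width in pixels


def flow_boxes(boxes: list[Box], line_width: int, gap: int = 4) -> list[list[Box]]:
    """Greedy line filling: put each box on the current line if it fits,
    otherwise start a new line. Every box is treated alike, whatever its kind.
    A box wider than the whole line is placed on a line by itself."""
    lines: list[list[Box]] = []
    current: list[Box] = []
    used = 0
    for box in boxes:
        needed = box.width if not current else used + gap + box.width
        if current and needed > line_width:
            lines.append(current)  # close the full line and start a new one
            current, used = [box], box.width
        else:
            current.append(box)
            used = needed
    if current:
        lines.append(current)
    return lines


# Four 150px captioned images (each a table) in a 500px-wide column
figures = [Box("table", 150) for _ in range(4)]
print([len(line) for line in flow_boxes(figures, line_width=500)])  # [3, 1]
```

Under this model, a set of captioned images wraps the same way as a set of bare images.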
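
The arithmetic behind the third point can be sketched as follows. This is a deliberately simplified model. It assumes a single left-aligned box of fixed width and a second box sized as a percentage of its container, placed to the right of the first. The function names are hypothetical, and real layout engines apply more detailed rules than this.

```python
def resolve_width(percent: float, container: int, float_width: int, float_aware: bool) -> int:
    """Resolve a width given as a percentage of the available room.

    If float_aware is False, the aligned box's blockage is ignored (the
    behavior complained about above); otherwise it is subtracted first.
    """
    available = container - float_width if float_aware else container
    return round(available * percent / 100)


def overflow_beside_float(width: int, container: int, float_width: int) -> int:
    """Pixels by which a box placed to the right of the aligned box would
    extend past the container's right edge."""
    return max(0, float_width + width - container)


container, float_w = 800, 300  # an 800px column with a 300px left-aligned image
for aware in (False, True):
    w = resolve_width(100, container, float_w, aware)
    print(aware, w, overflow_beside_float(w, container, float_w))
# False 800 300
# True 500 0
```

Ignoring the blockage leaves 300px of the "100%" box with nowhere to go, which is where the superimposed elements come from; subtracting it first makes the box fit.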

This article isn't necessarily finished; it's just written up to the point where I ran out of time. I'm planning to discuss what I'd like to see replacing HTML in a separate article: I/O Markup Language