HTML Is Broken

From Woozle Writes Code
Revision as of 01:25, 9 June 2011 by Woozle (talk | contribs) (Reverted edits by 58.9.106.233 (talk) to last revision by Aero)

I'm sure I've seen this discussed elsewhere, but HTML is messed up.

It is (now) ostensibly a data-description language that has been corrupted for use as a visual markup language, and it serves neither purpose very well. Current planning seems to be heading toward replacing HTML with XHTML, which may finally strip the last of the visual markup elements from HTML and push them out into CSS (style sheets), where they belong. Even so, the resulting pair of languages is still an awkward tool for creating user interfaces. (I will discuss this at greater length.)

HTML is also internally inconsistent:

  • With the <img src=URL> syntax, a page may contain images that are defined elsewhere, but it may not include text defined elsewhere (except via action on the server's part, e.g. SSI). Conversely, all text within a page is defined within a single file, but there is no way to define images within that same file. This asymmetry considerably complicates web application programming.
  • Images and tables have some basic characteristics in common: both are basically rectangular, and HTML handles them similarly in many ways, e.g. either can be aligned flush left or right so that the rest of the page's contents wrap around it. Only images, however, can be handled like words in a paragraph, i.e. allowed to wrap when the end of a line is reached. There is no reason why tables should not also have this ability. Its absence prevents line-wrapped image sets from having captions, as the only way to give an image a caption is to enclose it within a table. 2006-04-21 update: this can actually be done by assigning a style of "display: inline" to the tables being shown, but there are enough technical problems with it (especially cross-browser compatibility issues) that it is of very limited utility.
  • "Box" elements (tables, images):
    • Box elements may have their sizes defined as a percentage of the available room. However, this is often handled very poorly: if the available room is partly blocked by a left- or right-aligned table or image, the blockage is often ignored in calculating the box element's size. This can easily leave elements unintentionally (and uncontrollably) superimposed on each other, giving incomprehensible or at least very ugly results.
      • I think there are various ways to fix this with CSS ('clear: both'?) or you're experiencing errors in a specific implementation ~ Aero
    • Although you can specify that a box element should take up 100% of the width of the page or 100% of the height of the page, there is no way to specify that it should take up 100% of whichever dimension is the tighter fit (i.e. "fit to page"). This makes it impossible to automatically generate HTML pages in which every image in a series is correctly fitted for printing, without either being cut off at the sides or spilling over to the next page; the output must be checked by eye and each image tag corrected to height=100% or width=100% as appropriate.
      • Use reasonable fixed print sizes in a separate stylesheet for printing (see below) ~ Aero
    • A box element cannot be sized relative to another element, unless it is contained by that element. (I'm trying to remember why I needed to do this, but nothing is coming to mind at the moment.)
  • The "page" is treated, depending on context, as the size of the browser window (e.g. in "height=x%" attributes), as the size of the printable area of a page if the document is printed (or print-previewed), or as an infinitely extendible scroll. There is no way for the HTML author to specify which meaning is intended. (I know I need to put in an example of this, but can't think of one just now.)
    • Use separate CSS stylesheets for different media types (screen/print/etc.) ~ Aero
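
The check-by-eye step in the "fit to page" item above can be avoided when the program generating the pages already knows each image's pixel dimensions. The following is only a minimal sketch in Python, not a fix for HTML itself: it compares the image's aspect ratio with the page's and emits whichever attribute is the limiting one. The function names and the default printable area of 7.5 × 10 inches are assumptions made for illustration.

```python
import html


def fit_attribute(img_w, img_h, page_w, page_h):
    """Pick the <img> attribute that keeps the whole image on one page.

    All four arguments are positive numbers; only their ratios matter,
    so the image may be in pixels and the page in inches.
    """
    if min(img_w, img_h, page_w, page_h) <= 0:
        raise ValueError("dimensions must be positive")
    # Cross-multiply instead of dividing: img_w/img_h >= page_w/page_h
    # means the image is relatively wider than the page, so its width
    # reaches the page edge first and must be the constrained dimension.
    if img_w * page_h >= img_h * page_w:
        return 'width="100%"'
    return 'height="100%"'


def img_tag(src, img_w, img_h, page_w=7.5, page_h=10.0):
    """Build an <img> tag sized to fit the (assumed) printable area."""
    # Escape the URL so characters such as & or " cannot break the attribute.
    safe_src = html.escape(src, quote=True)
    return f'<img src="{safe_src}" {fit_attribute(img_w, img_h, page_w, page_h)}>'
```

For example, img_tag("photo.png", 1600, 900) produces a tag with width="100%", while a 600 × 1200 portrait image gets height="100%". This only automates the choice of attribute; it does nothing about how a particular browser interprets percentage heights when printing.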

This article isn't necessarily finished; it's just written to the point where I ran out of time. I'm planning to discuss what I'd like to see replace HTML in another article: I/O Markup Language

Notes

To be investigated: the possibility of using Edje as an alternative method of conveying weblike pages over the internet