A Turnaround – Don't use XHTML

  • Thread starter: Sage
  • 13 comments
  • 884 views

Sage

A major turnaround, in fact.

For quite a while now (though not anytime recently), I've been an advocate for XHTML & CSS usage for website deployment. However, after reading this and this article (the latter doesn't show up properly in IE, I believe), I've come to the conclusion that 99% of the time, you shouldn't be using XHTML in any flavor.

Why? Here are the short and long explanations:

SHORT

IE can't accept true XHTML, and simply renders it as HTML, so why use something that will be "broken down" anyway?

(REALLY) LONG

Technically, XHTML is supposed to be served with a MIME type of application/xhtml+xml. (For those of you unfamiliar with MIME types, this is basically data telling the browser how to handle the document – for example, if a document has a MIME type of application/pdf, the browser (or any other program that runs across it) will know to treat it as a PDF file, application/vnd.ms-powerpoint as a PowerPoint file, etc.) However, we run into a big problem – IE doesn't support application/xhtml+xml. The IE team hasn't included this MIME type, so if IE runs across it, it won't recognize it, and instead of rendering the XHTML document it will typically just offer to download it.
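To make the MIME type idea a bit more concrete, here's a minimal Python sketch of how a server might pick the Content-Type header for a file based on its extension. The file names and the `content_type_header` helper are made up for illustration; real web servers have their own configuration for this.

```python
import mimetypes


def content_type_header(filename: str) -> str:
    """Build the Content-Type header line a server might send for a file."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return "Content-Type: " + (mime_type or "application/octet-stream")


# Hypothetical file names, only used to show the lookup.
print(content_type_header("manual.pdf"))   # Content-Type: application/pdf
print(content_type_header("index.html"))   # Content-Type: text/html
```

The browser never looks at the file extension here; it acts on whatever the header says, which is why the MIME type matters so much for XHTML.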

So what does 99.9% of the XHTML-using population do? They use a MIME type of text/html, since that's the MIME type you're supposed to send HTML 4 documents with. Strictly speaking, an XHTML document sent as text/html shouldn't be acceptable, but it doesn't fail the validation test, because the W3C has bent that rule to accommodate IE (Appendix C of the XHTML 1.0 spec allows "HTML-compatible" XHTML 1.0 to be sent as text/html).

Why did the W3C do this? Why did they break one of their own rules? My only guess is that they wanted people to adopt XHTML so that they'd get used to the XML syntax, which requires all lowercase, closing tags on everything, a DOCTYPE, etc. If they kept promoting HTML 4 as the current standard, they'd have a harder time getting people to do that stuff.

And "that stuff" is what I'm calling creating a well-formed document. Strictly speaking, XML well-formedness only covers things like a closing tag for every opened element, proper nesting, and attribute values enclosed in quotation marks; I'm using the term more loosely here, so that it also includes all lowercase tags, a DOCTYPE, and markup made with semantics in mind (meaning that you don't use tables for layout, you don't use <b> when you should use <strong>, etc.).

While XHTML requires that a document be well-formed, HTML doesn't. However, there's no reason that you can't make a well-formed HTML document—for example, there's no rule saying that you can't have closing paragraph tags (</p>) in a document, even though you can get away without them—the closing tag is optional, but not forbidden.
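Here's a rough Python sketch of that difference. It uses Python's lenient `html.parser` as a stand-in for an HTML browser and the strict `xml.etree.ElementTree` parser as a stand-in for an XHTML browser; neither is a real validator, and the sample markup and helper names are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer that just records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def is_well_formed_xml(markup: str) -> bool:
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


sloppy = "<div><p>first<p>second</div>"        # closing </p> tags left out
tidy = "<div><p>first</p><p>second</p></div>"  # every element closed

collector = TagCollector()
collector.feed(sloppy)
collector.close()

print(collector.start_tags)        # the HTML parser copes: ['div', 'p', 'p']
print(is_well_formed_xml(sloppy))  # False: the XML parser rejects it
print(is_well_formed_xml(tidy))    # True: same content, closed properly
```

The tidy version is accepted by both parsers, which is the whole point: nothing stops you from writing HTML that carefully.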

So, what's my point? If you create an XHTML document, you have to break it, unless you don't mind neglecting 90% of the web browsing population. On the other hand, you can create a perfectly valid and well-formed HTML document. Even worse, an XHTML 1.1 document has to be sent as application/xhtml+xml (the bending of the rules that the W3C made only applies to XHTML 1.0 docs).

Some, myself included (on the Insider site), have tried to get around this problem by using PHP to check the HTTP Accept header that the browser sends, to see whether it accepts application/xhtml+xml as a MIME type – if it doesn't, then an XHTML 1.0 DOCTYPE is printed on the page, and a text/html MIME type is sent in the HTTP header; if it does, then an XHTML 1.1 DOCTYPE gets printed, and an application/xhtml+xml MIME type is sent. However, even though the XHTML 1.0 DOCTYPE with a text/html MIME type validates under the W3C's validator, again, it's still technically wrong, because you're sending conflicting information to the browser (telling it in one place, the DOCTYPE, that the document is supposed to be parsed as XHTML, an application of XML, while in another place, the HTTP header, telling it that it's supposed to be parsed as a text/html file).
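The Insider site did this in PHP; the sketch below redoes the same idea in Python purely for illustration, so it is not the original code. The function names, the sample Accept strings, and the choice of the Strict flavor of the XHTML 1.0 DOCTYPE are all assumptions.

```python
XHTML_MIME = "application/xhtml+xml"
HTML_MIME = "text/html"

XHTML_11_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
)
# Assumption: the Strict flavor; the original post doesn't say which one.
XHTML_10_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def accepts_xhtml(accept_header: str) -> bool:
    """True if the Accept header explicitly lists application/xhtml+xml.

    A bare */* wildcard is deliberately not counted as support.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_MIME:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        return quality > 0
    return False


def choose_response(accept_header: str) -> tuple:
    """Return (MIME type, DOCTYPE) following the scheme described above."""
    if accepts_xhtml(accept_header):
        return XHTML_MIME, XHTML_11_DOCTYPE
    return HTML_MIME, XHTML_10_DOCTYPE


# Illustrative Accept strings, not captured from real browsers.
gecko_like = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"
ie_like = "image/gif, image/jpeg, */*"

print(choose_response(gecko_like)[0])  # application/xhtml+xml
print(choose_response(ie_like)[0])     # text/html
```

Ignoring the `*/*` wildcard is a deliberate choice in this sketch: a browser can claim to accept anything and still be unable to render application/xhtml+xml.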

No good. So, my recommendation is to add an HTML 4 DOCTYPE to your webpages, while still keeping well-formedness in mind, so that it would validate as XHTML. I'll be following through on this during the Insider's next redesign (don't hold your breath – we're talking many months here ;)).

----- The following is optional reading for those who feel the need to burn some time -----

Let's pretend that IE does support application/xhtml+xml (which it doesn't!). So, even if you can send your beautiful application/xhtml+xml MIME type, here are a few reasons you still probably wouldn't want to write a true-blue XHTML document (all of these are copied directly from Hixie's XHTML Advocacy page, but I'm putting the important stuff in here, and a number of you might not be able to see that webpage anyway).

Code:
 * <script> and <style> elements in XHTML sent as text/html have to be
   escaped using ridiculously complicated strings.

   This is because in XHTML, <script> and <style> elements are #PCDATA
   blocks, not #CDATA blocks, and therefore <!-- and --> really _are_
   comments tags, and are not ignored by the XHTML parser. To escape
   script in an XHTML document which may be handled as either HTML4 or
   XHTML, you have to use:

      <script type="text/javascript"><!--//--><![CDATA[//><!--
        ...
      //--><!]]></script>

   To embed CSS in an XHTML document which may be handled as either
   HTML4 or XHTML, you have to use:

      <style type="text/css"><!--/*--><![CDATA[/*><!--*/
        ...
      /*]]>*/--></style>

   Yes, it's pretty ridiculous. If documents _aren't_ escaped like
   this, then the contents of <script> and <style> elements get
   dropped on the floor when parsed as true XHTML.

   (This is all assuming you want your pages to work with older
   browsers as well as XHTML browsers. If you only care about XHTML
   and HTML4 browsers, you can make it a bit simpler.)

 * A CSS stylesheet written for an HTML4 document is interpreted
   slightly differently in an XHTML context (e.g. the <body> element
   is not magical in XHTML, tag names must be written in lowercase in
   XHTML). Thus documents change rendering when parsed as XHTML.

 * A DOM-based script written for an HTML4 document has subtly
   different semantics in an XHTML context (e.g. element names are
   case insensitive and returned in uppercase in HTML4, case sensitive
   and always lowercase in XHTML; you have to use the namespace-aware
   methods in XHTML, but not in HTML4). BUT, if you send your
   documents as text/html, then they will use the HTML4 semantics
   DESPITE being XHTML! Thus, scripts are highly likely to break when
   the document is parsed as XHTML.

 * Scripts that use document.write() will not work in XHTML contexts.
   (You have to use DOM Core methods.)

 * Current UAs are, for text/html content, HTML4 user agents (at best)
   and certainly not XHTML user agents. Therefore if you send them
   XHTML you are sending them content in a language which is not
   native to them, and instead relying on their error handling. Since
   this is not defined in any specification, it may vary from one user
   agent to the other.

 * XHTML documents that use the "/>" notation, as in "<link />" have
   very different semantics when parsed as HTML4. So if there was to
   be a fully compliant HTML4 UA, it would be quite correct to show
   ">" characters all over the page.
 
Oh, and remember how I said "…I've come to the conclusion that 99% of the time, you shouldn't be using XHTML in any flavor"? That 1% refers to very specific cases where an XHTML document is paramount, such as when the site uses MathML (think of it as an XML "plugin" to allow you to show complicated math equations on the web without using images)… and very few sites actually need those kinds of things (this is the only site I know of that uses MathML).
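Going back to the first item in the quoted list (the script escaping), here's a small Python sketch of why the escaping is needed. It uses Python's generic `xml.etree.ElementTree` parser as a stand-in for a browser's XHTML parser, which is an assumption, and the one-line `alert(1);` script body is made up.

```python
import xml.etree.ElementTree as ET


def script_text(markup: str):
    """Return the character data an XML parser sees inside the root element."""
    return ET.fromstring(markup).text


# Old HTML habit: hide the script inside a comment.
hidden = "<script><!-- alert(1); --></script>"

# The escaping pattern from the quoted list, with a hypothetical script body.
escaped = (
    '<script type="text/javascript"><!--//--><![CDATA[//><!--\n'
    "alert(1);\n"
    "//--><!]]></script>"
)

print(repr(script_text(hidden)))   # None: the comment, script and all, is gone
print(repr(script_text(escaped)))  # the CDATA section survives as plain text
```

In the escaped version the leftover `//><!--` and `//--><!` pieces sit behind JavaScript line comments, so only the real script line would run.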
 
So we shouldn't use a web standard just because Microsoft are too lazy to implement support for it in their browser? That just gives them more encouragement for further procrastination on standards support. 👎

And IE dominance is down to under 70% by the way...click!
 
sUn
You're still advocating for cascading style sheets though, correct?
Of course!

Shannon
So we shouldn't use a web standard just because Microsoft are too lazy to implement support for it in their browser? That just gives them more encouragement for further procrastination on standards support. 👎
HTML 4 is a web standard though… and it's unfair to argue that XHTML is "more-standards-ish" than HTML, especially when you can have well-formedness on both (and it's like saying Atom is a better standard than MathML).

And IE dominance is down to under 70% by the way...click!
Ah, I've seen that before, but you have to remember – most of the people who visit that site are probably already aware of browser alternatives. IE still holds 90-something percent of the worldwide browser market (very high 90s too… if I remember correctly, Firefox is currently at 1.7%, and it's easily one of the most popular alternatives to IE, aside from aging versions of Netscape).
 
I still don't see why I should stop using XHTML because some corporate monopoly can't be bothered updating their browser.
 
Everything new that I write, I write in XHTML 1.1. It does pass the W3C validation script, even though it's sent by my server as text/html.

I think that I won't change this policy, especially since I think it has benefit. I have noticed a real increase in rendering speed with compliant documents, especially in Firefox, which seems much happier with a compliant document than not.

The only element of my site that I don't think I can get to work and be compliant is the Forums. The rest of it will get migrated at some point. Fantasy F1 2005 will certainly be compliant.
 
Shannon
So we shouldn't use a web standard just because Microsoft are too lazy to implement support for it in their browser? That just gives them more encouragement for further procrastination on standards support. 👎

And IE dominance is down to under 70% by the way...click!
👍

That's great news.
 
Shannon
I still don't see why I should stop using XHTML because some corporate monopoly can't be bothered updating their browser.
Because XHTML has only one advantage over HTML, which is that you can use XML "plugins" with it, but like I said, 99.9% of the web-using and web-developing population has no use for those. Trust me, that is the single, one, uno advantage of XHTML over well-formed HTML – in fact, in some ways, XHTML has disadvantages, such as the aforementioned #PCDATA escaping nonsense.

Let me put it this way – by serving an XHTML document with a text/html MIME type, you are forcing web browser developers to break their browsers… if Firefox were truly standards-compliant, it wouldn't even show such a page, but because so many pages do this, the developers have had to include that "bug". So you're forcing browsers to show invalid code for the sake of using a technology standard that you don't need.

Again, I can't stress this enough – there is no benefit whatsoever in using XHTML over HTML, except for XML plugins, which you don't need, unless you plan on showing complicated calculus problems on your page or you plan on using SVG (Scalable Vector Graphics, which aren't supported by any web browser AFAIK). And I can't stress this enough either – a well-formed HTML document is still standards compliant… in true technicality, your XHTML documents are not standards compliant, because they have the wrong MIME type and more than likely the wrong commenting markup. The W3C forgives this, because they are bending the rules for Internet Explorer – thus, you're giving them (the IE team) less reason to change, since you're using the bent rules that had to be made around IE's deficiency.

BTW, as a sidenote: The Mozilla.org page is a proper, well-formed, totally semantic HTML 4 document. They knew that there would be no advantages to using XHTML, because it wasn't necessary (and note that this isn't old HTML they're using… they just launched this new version some weeks or months ago). If they changed that one line at the top of the source page to an XHTML DOCTYPE, the one and only thing they'd be achieving is breaking the page (regardless of the Valid! spitout that the W3C validator would give).
 
Fair enough. I still can't believe I'm hearing this from Sage of all people though. :lol:

BTW - More browser statistics here. IE never touches over 79% in any of the 5 sources.
 
Shannon
Fair enough. I still can't believe I'm hearing this from Sage of all people though. :lol:
Change happens. :p

BTW - More browser statistics here. IE never touches over 79% in any of the 5 sources.
IE6, that is. Source 4 seems the most diversified source, since it's from many, many websites, and it shows 92% for that, which is slightly less than I thought it'd be, but still very, um, huge. And, I could be wrong on this, but I imagine there are more Netscape users in the "Gecko based" category than Firefox users, seeing as how Netscape has almost ten years under its belt (October 1994), and lordy knows how many people out there simply use the browser that came with their computer and never upgrade. Firefox still has a ways to go, unfortunately, but I think it can do it, if Mozilla plays its cards right.
 
Sage
IE6, that is. Source 4 seems the most diversified source, since it's from many, many websites, and it shows 92% for that, which is slightly less than I thought it'd be, but still very, um, huge. And, I could be wrong on this, but I imagine there are more Netscape users in the "Gecko based" category than Firefox users, seeing as how Netscape has almost ten years under its belt (October 1994), and lordy knows how many people out there simply use the browser that came with their computer and never upgrade. Firefox still has a ways to go, unfortunately, but I think it can do it, if Mozilla plays its cards right.
This is why IE has such a huge market share. To win the browser wars, Microsoft had to resort to actually integrating IE into Windows. In fact, you can access a website with IE via Windows Explorer (a folder explorer thing to access files on your hard drive). Considering the majority of computers these days are Windows PCs and a lot of first-time computer buyers wouldn't know the first thing about the internet, that adds up to a lot of IE users...

Oh, and Firefox 1.0 PR generated over a million downloads in its first week of release. It's doing well, but with the upcoming release of IE7, it has a lot of work on its hands. :indiff:
 
Just for interest's sake, here are the statistics for the whole of 2004 for visitors to my web site:
1 Internet Explorer 63.33%
2 Mozilla 23.07%
3 Netscape 5.65%
4 Google Robot 3.75%
5 Safari 1.86%
6 Konqueror 1.22%
7 Alexa Robot 0.72%
8 Opera 0.16%
9 Scooter Robot 0.12%
10 IBM Almaden Robot 0.08%
11 Wget 0.03%
12 InternetSeer.com Robot 0.02%
13 Lynx 0.01%
 