Sage
Staff Emeritus
A major turnaround, in fact.
For quite a while now (though not anytime recently), I've been an advocate for XHTML & CSS usage for website deployment. However, after reading this and this article (the latter doesn't show up properly in IE, I believe), I've come to the conclusion that 99% of the time, you shouldn't be using XHTML in any flavor.
Why? Here are the short and long explanations:
SHORT
IE can't accept true XHTML, and simply renders it as HTML, so why use something that will be "broken down" anyway?
(REALLY) LONG
Technically, XHTML is supposed to be served with a MIME type of application/xhtml+xml. (For those of you unfamiliar with MIME types, this is basically data telling the browser how to handle the document; for example, if a document has a MIME type of application/pdf, the browser (or any other program that runs across it) will know to treat it as a PDF file, application/vnd.ms-powerpoint as a PowerPoint file, etc.) However, we run into a big problem: IE doesn't support application/xhtml+xml. The IE team hasn't included this MIME type, so if IE runs across it, it'll think it's something that you made up, and instead of rendering the page it will typically just offer to download the file.
So what does 99.9% of the XHTML-using population do? They use a MIME type of text/html, which is the MIME type you're supposed to send HTML 4 documents as. An XHTML document sent as text/html shouldn't be valid, but it doesn't fail the validation test, because the W3C has bent that rule to accommodate IE.
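To make the MIME type idea a bit more concrete, here's a minimal sketch in Python (mine, not from either article) of how a receiving program might pick a handler based on the Content-Type header. The HANDLERS table and the pick_handler name are made up for illustration; a real browser does a lot more than this.

```python
from email.message import Message

# Hypothetical dispatch table: MIME type -> what the program does with it.
HANDLERS = {
    "text/html": "HTML parser",
    "application/xhtml+xml": "XML parser",
    "application/pdf": "PDF viewer",
}


def pick_handler(content_type_header):
    """Return the handler name for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() drops parameters such as charset and lowercases the type.
    mime_type = msg.get_content_type()
    return HANDLERS.get(mime_type, "unknown type: offer to download")


if __name__ == "__main__":
    for header in ("application/pdf",
                   "application/xhtml+xml; charset=utf-8",
                   "application/x-made-up"):
        print(header, "->", pick_handler(header))
```

A program that has no entry for application/xhtml+xml ends up in the "unknown" branch, which is roughly the situation IE is in.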
Why did the W3C do this? Why did they break one of their own rules? My only guess is that they wanted people to adopt XHTML so that they'd get used to the XML syntax, which requires all lowercase, closing tags on everything, a DOCTYPE, etc. If they kept promoting HTML 4 as the current standard, they'd have a harder time getting people to do that stuff.
And "that stuff" is called creating a well-formed document. A well-formed document has all lowercase tags, a closing tag for every opened element, attribute values enclosed in quotes, a DOCTYPE, and is made with semantics in mind (meaning that you don't use tables for layout, you don't use <b> when you should use <strong>, etc.).
While XHTML requires that a document be well-formed, HTML doesn't. However, there's no reason that you can't make a well-formed HTML document. For example, there's no rule saying that you can't have closing paragraph tags (</p>) in a document, even though you can get away without them: the closing tag is optional, but not forbidden.
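If you want to see the syntax half of this in action, here's a rough sketch in Python that runs a fragment through a strict XML parser. The is_well_formed helper is a name I made up, and it only checks XML syntax (closing tags, quoted attribute values, matching case); it knows nothing about DOCTYPEs or semantics.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Closing </p> tags are optional in HTML 4, but an XML parser insists on them.
closed = "<html><body><p>First</p><p>Second</p></body></html>"
unclosed = "<html><body><p>First<p>Second</body></html>"

if __name__ == "__main__":
    print("closed:", is_well_formed(closed))
    print("unclosed:", is_well_formed(unclosed))
```

The fragment with the closing tags is fine for both HTML 4 and an XML parser, which is the whole point: writing the stricter form costs you nothing in HTML.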
So, what's my point? If you create an XHTML document, you have to break it, unless you don't mind neglecting 90% of the web browsing population. On the other hand, you can create a perfectly valid and well-formed HTML document. Even worse, an XHTML 1.1 document has to be sent as application/xhtml+xml (the bending of the rules that the W3C made only applies to XHTML 1.0 docs).
Some, myself included (on the Insider site), have tried to circumvent this problem by using PHP to check the HTTP Accept header that the browser sends, to see whether it accepts application/xhtml+xml as a MIME type. If it doesn't, then an XHTML 1.0 DOCTYPE is printed on the page, and a text/html MIME type is sent in the HTTP header; if it does, then an XHTML 1.1 DOCTYPE gets printed, and an application/xhtml+xml MIME type is sent. However, even though the XHTML 1.0 DOCTYPE with a text/html MIME type validates under the W3C's validator, again, it's still technically wrong, because you're sending conflicting information to the browser (the DOCTYPE tells it that the document is supposed to be parsed as XHTML, an application of XML, while the HTTP header tells it that the document is supposed to be parsed as a text/html file).
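For the curious, the negotiation logic looks roughly like this. The Insider site does it in PHP; this is just a sketch of the same idea in Python, not the actual code. The function names are made up, and I'm assuming the Strict flavor of the XHTML 1.0 DOCTYPE here.

```python
XHTML_MIME = "application/xhtml+xml"
HTML_MIME = "text/html"

XHTML11_DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
                   '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">')
# Assumption: Strict is used for the fallback; Transitional would work the same way.
XHTML10_DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                   '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')


def accepts_xhtml(accept_header):
    """True if the Accept header explicitly lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != XHTML_MIME:
            continue
        q = 1.0  # a media range without a q parameter defaults to q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    # Wildcards like */* are deliberately ignored: IE sends */* but can't render XHTML.
    return False


def choose_markup(accept_header):
    """Return (MIME type, DOCTYPE) to send for a given Accept header."""
    if accepts_xhtml(accept_header or ""):
        return XHTML_MIME, XHTML11_DOCTYPE
    return HTML_MIME, XHTML10_DOCTYPE


if __name__ == "__main__":
    print(choose_markup("application/xhtml+xml,text/html;q=0.9,*/*;q=0.5")[0])
    print(choose_markup("image/gif, image/jpeg, */*")[0])
```

Note that the fallback branch is exactly the XHTML-1.0-as-text/html combination that's technically wrong, so this trick only hides the problem for browsers that didn't need the help in the first place.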
No good. So, my recommendation is to add an HTML 4 DOCTYPE to your webpages, while still keeping well-formedness in mind, so that it would validate as XHTML. I'll be following through on this during the Insider's next redesign (don't hold your breath; we're talking many months here).
----- The following is optional reading for those who feel the need to burn some time -----
Let's pretend that IE does support application/xhtml+xml (which it doesn't!). So, even if you can send your beautiful application/xhtml+xml MIME type, here are a few reasons you still probably wouldn't want to write a true-blue XHTML document (all of these are copied directly from Hixie's XHTML Advocacy page, but I'm putting the important stuff in here, and a number of you might not be able to see that webpage anyway).
Code:
* <script> and <style> elements in XHTML sent as text/html have to be
  escaped using ridiculously complicated strings.

  This is because in XHTML, <script> and <style> elements are #PCDATA
  blocks, not #CDATA blocks, and therefore <!-- and --> really _are_
  comments tags, and are not ignored by the XHTML parser. To escape
  script in an XHTML document which may be handled as either HTML4 or
  XHTML, you have to use:

    <script type="text/javascript"><!--//--><![CDATA[//><!--
      ...
    //--><!]]></script>

  To embed CSS in an XHTML document which may be handled as either
  HTML4 or XHTML, you have to use:

    <style type="text/css"><!--/*--><![CDATA[/*><!--*/
      ...
    /*]]>*/--></style>

  Yes, it's pretty ridiculous. If documents _aren't_ escaped like
  this, then the contents of <script> and <style> elements get
  dropped on the floor when parsed as true XHTML.

  (This is all assuming you want your pages to work with older
  browsers as well as XHTML browsers. If you only care about XHTML
  and HTML4 browsers, you can make it a bit simpler.)

* A CSS stylesheet written for an HTML4 document is interpreted
  slightly differently in an XHTML context (e.g. the <body> element
  is not magical in XHTML, tag names must be written in lowercase in
  XHTML). Thus documents change rendering when parsed as XHTML.

* A DOM-based script written for an HTML4 document has subtly
  different semantics in an XHTML context (e.g. element names are
  case insensitive and returned in uppercase in HTML4, case sensitive
  and always lowercase in XHTML; you have to use the namespace-aware
  methods in XHTML, but not in HTML4). BUT, if you send your
  documents as text/html, then they will use the HTML4 semantics
  DESPITE being XHTML! Thus, scripts are highly likely to break when
  the document is parsed as XHTML.

* Scripts that use document.write() will not work in XHTML contexts.
  (You have to use DOM Core methods.)

* Current UAs are, for text/html content, HTML4 user agents (at best)
  and certainly not XHTML user agents. Therefore if you send them
  XHTML you are sending them content in a language which is not
  native to them, and instead relying on their error handling. Since
  this is not defined in any specification, it may vary from one user
  agent to the other.

* XHTML documents that use the "/>" notation, as in "<link />" have
  very different semantics when parsed as HTML4. So if there was to
  be a fully compliant HTML4 UA, it would be quite correct to show
  ">" characters all over the page.
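The first point in that list (scripts hidden in comments getting "dropped on the floor") is easy to check for yourself with any XML parser. Here's a small sketch in Python, using its standard XML parser as a stand-in for a true XHTML parser; the xml_script_text helper and the alert("hi") script body are just made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# The old-school way of hiding a script from ancient browsers.
COMMENT_HIDDEN = """<script type="text/javascript"><!--
alert("hi");
//--></script>"""

# The escaping quoted above, which is meant to survive both HTML4 and XML parsing.
CDATA_ESCAPED = """<script type="text/javascript"><!--//--><![CDATA[//><!--
alert("hi");
//--><!]]></script>"""


def xml_script_text(markup):
    """Return the text content an XML parser sees inside the element."""
    element = ET.fromstring(markup)
    # Comments are not text, so anything inside <!-- --> does not show up here.
    return "".join(element.itertext())


if __name__ == "__main__":
    print(repr(xml_script_text(COMMENT_HIDDEN)))
    print(repr(xml_script_text(CDATA_ESCAPED)))
```

With the plain comment trick, the XML parser sees an empty script element; with the CDATA version, the script text is still there.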