• Web forums can be indexed by Google and hence show up in search results
  • Web forum sysops can easily decorate a forum as they please, creating a little more personality than an average mailing list
  • Forums often have fine control over how posts are seen by the outside world
  • Forums are easier to moderate - UI in place for banning IPs etc.
  • Forums generally have a better community feel
  • Forums generally have a DB backend, making it easier to run queries, e.g. to see how many posts a certain user has made
  • Instant gratification - you can confirm your post is live

Mailing list problems:

  • Hard to subscribe
  • Sometimes difficult to confirm your post actually was posted
  • Cliquey atmosphere
  • Often no public archives or poor Web archives

If, like me, you're a fan of email, please consider taking up a project to write a decent Web archive system for mailing lists.

You wrote: “HTML in messages is just plain stupid. Most MUAs can’t render Unicode or wrap text properly. To render HTML correctly you need a beast! Or am I wrong? Most Web forum posts don’t seem to use HTML that much …”

The only MUA I know of that can’t handle Unicode nicely is pine. Mutt can handle Unicode, but not by default. The real MUAs (the ones with a GUI) that people actually use, like Outlook, handle Unicode correctly.

HTML mail, on the other hand... Webmail clients display HTML mail only when you open the attachment in a new window (== sucky MUA design), and so does one Windows MUA which is included in one Finnish company’s business suite. Mutt can’t handle HTML mail unless you hook lynx -dump or similar into its config.

HTML-based emails are a good thing, and Unicode is the best thing the Internet has for text compatibility!

- Joose

Comment by Joose

Even on a Windows machine I get ???????????????? I guess that’s because I don’t have a Japanese font or something installed.

The quality of HTML email implementations varies widely. Some MUAs send both a text version and an HTML version. OMG! Which one does one read? Are they the same?

I find it disappointing that you believe “HTML-based emails are a good thing”! Most MUAs struggle to quote HTML emails. Grr!

Comment by hendry

Well, then there is something wrong somewhere, because this page of yours is Unicode and I think you can read it very easily. Let’s try it. Here are a few UTF-8 chars:

€ (euro sign)
Ä (a with dots)
‰ (per mille)
≠ (not equal to)

- Joose

Comment by Joose

OK OK YOU WIN JOOSE. :) Though seriously I do get ????? on Japanese sites, largely because I guess I don’t have the correct font installed. Though I still think decent client Unicode setups are rare.

Something that Teemu pointed me to:

NNTP gateway:

Which works with Web forum:

Comment by hendry

I must agree about HTML in email: it’s not actually necessary to convey a message. HTML is fluff. Pictures in-line, borders, colours, text styles, etc. I do actually occasionally find it useful (though only as multipart/alternative) but think it’s very silly to use where it’s not necessary. HTML support is complex for MUAs to implement properly, and generally unnecessary.

While to an extent the same is true of Unicode in email (be it UCS-2/UTF-16 or UTF-8), in that a user can use an older 8-bit encoding like the ISO-8859 family (“latin-n / iso-8859-n”), I don’t think it’s the same thing as HTML at all. Not that you said that, re-reading your post.

Unicode should not be difficult for most MUAs. MUAs must already handle incoming text in varying encodings. There is no such thing as “just text”; a byte stream isn’t text without an associated encoding. Most MIME email these days should have character encoding info, so one can’t really claim there’s no way to know what the input charset is. It’s not safe to assume that incoming messages use the same text encoding as you – for example, an MUA running on a machine with a default UTF-8 charset must convert incoming latin-1 text to UTF-8; an MUA on a latin-1 system must do the reverse for incoming UTF-8 text. To an English speaker who only uses the 7-bit ASCII subset that’s not even noticeable, but it’s a major pain for the rest of the world (and for those of us who like typographically civilized email :-P).
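To make the charset point concrete, here is a minimal Python sketch using the standard library `email` package. The sample message, its addresses, and the helper name `body_as_text` are made up for illustration; the idea is only that the declared MIME charset, not the local default, drives the decoding, after which the text can be re-encoded to whatever the local system uses.

```python
from email import message_from_bytes

# Hypothetical raw message: a latin-1 body, declared as such in the MIME header.
RAW_MESSAGE = (
    b"From: someone@example.org\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)


def body_as_text(raw: bytes) -> str:
    """Decode a single-part message body using its declared charset."""
    msg = message_from_bytes(raw)
    # Fall back to ASCII only when the sender declared nothing (an assumption).
    charset = msg.get_content_charset() or "ascii"
    payload = msg.get_payload(decode=True)  # undoes the transfer encoding, returns bytes
    return payload.decode(charset, errors="replace")


text = body_as_text(RAW_MESSAGE)
# What an MUA on a UTF-8 system would store or display.
utf8_bytes = text.encode("utf-8")
```

An MUA on a latin-1 system would do the mirror image: decode using the declared UTF-8 charset, then encode to latin-1 for display.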

Unless you’re prepared to flatly require that all incoming text only use 7-bit ASCII, you need to be aware of encodings. Even if you are, Shift-JIS and some other unusual encodings that don’t retain the 7-bit ASCII subset will still stab you in the back.

Given this, the most likely issue with Unicode support is in MUAs that assume a 1:1 correspondence between bytes and characters, which will find themselves getting very confused doing things like wrapping or taking substrings. This is annoying, to be sure, but fixable. If nothing else, these MUAs can use their existing encoding conversion facilities to convert incoming Unicode text to an encoding they CAN deal with, such as latin-1, and substitute characters that don’t have codepoints in that encoding with an error indicator (“?” is common, if confusing). MUAs that can’t do at least this have bigger problems than Unicode to worry about.
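Both halves of that paragraph can be sketched in a few lines of Python. This is only an illustration (the variable names are arbitrary): it shows why byte counts and character counts diverge for UTF-8, what happens when a substring is taken at a byte offset, and the "convert to latin-1 and substitute ?" fallback.

```python
name = "Vaněk"
utf8 = name.encode("utf-8")

# Five characters, but six bytes: "ě" (U+011B) takes two bytes in UTF-8.
char_count = len(name)
byte_count = len(utf8)

# An MUA that slices by bytes can cut a character in half.
# Decoding the damaged prefix yields a replacement character (U+FFFD).
cut_by_bytes = utf8[:4].decode("utf-8", errors="replace")

# Slicing the decoded string works on whole characters instead.
cut_by_chars = name[:4]

# Fallback for a latin-1-only client: characters with no latin-1
# codepoint are replaced by "?".
latin1_fallback = name.encode("latin-1", errors="replace")
```

Here `cut_by_bytes` ends in a replacement character while `cut_by_chars` is the clean prefix `Vaně`, and `latin1_fallback` is the byte string `Van?k`.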

Consequently, I think that for MIME email, it’s stupid to insist that your preferred character set (be it 7-bit ASCII or otherwise) should be the one everybody uses. It’s really an MUA’s job, for proper MIME support, to take care of this issue, and to my mind failure to do so is a major bug.

For non-MIME mail, the issue is rather messier, but the obvious answer is “use MIME”. Which, like HTML, isn’t actually technically necessary (plain RFC822 ASCII bodies with uuencoded attachments are good enough for everybody, right?) but rather handy. Hmm. Maybe HTML isn’t so bad after all ;-)

When it comes to man pages, it’s a bit more confusing. As far as I know there’s no way to reliably determine what text encoding a man page is in, since (unlike MIME) man pages don’t have anything so useful as a header or macro to declare the encoding of the text. The right fix is probably to add support in nroff for a macro like .CHARSET, solving the problem once and for all. Without that, the only way to go is to assume an encoding (either taking it from the system locale, or from some external specifier of the input’s file encoding). If you can’t always reliably get the encoding of a file, you have to best guess it – and ASCII-only text has the best chance of surviving that best guess.
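As a rough Python illustration of the "best guess" problem: the heuristic below (try strict UTF-8, otherwise fall back to a configured 8-bit encoding) is an assumption of this sketch, not something any man page tool is claimed to do, and `guess_decode` is a made-up name. It shows that ASCII-only input decodes identically whichever guess is made, while a section sign does not.

```python
from typing import Tuple


def guess_decode(data: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Best-guess decode: strict UTF-8 first, then a fallback 8-bit encoding."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback


ascii_page = b"See section 1 of the manual."
utf8_page = "See \u00a7 1 of the manual.".encode("utf-8")
latin1_page = "See \u00a7 1 of the manual.".encode("latin-1")

# ASCII bytes mean the same thing under either guess.
ascii_is_safe = ascii_page.decode("utf-8") == ascii_page.decode("latin-1")

# With the section sign, the guess matters: a UTF-8 page read as latin-1
# shows two wrong characters where one was intended.
wrong_guess = utf8_page.decode("latin-1")

utf8_text, utf8_used = guess_decode(utf8_page)
latin1_text, latin1_used = guess_decode(latin1_page)
```

The heuristic works here only because the latin-1 bytes happen not to be valid UTF-8; a declared encoding, as the comment argues, would remove the guessing altogether.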

So yeah, I’d say for now it’s necessary to work around the immediate problem by reducing man pages to 7-bit ASCII where possible. That’s very far from a “fix” though – it’s a workaround, and an ugly one. The real solution is to fix man pages so that they specify their encoding and the tools that process them are aware of text encodings. Translators would love whoever did this forever, and I expect it’d make people who generate man pages from other formats pretty darn happy too. There’s less and less reason to endure tools that don’t understand text encodings these days, and they’re becoming more and more of a problem.

I should note that I’m an Australian myself; the furthest out of US-ASCII I get on a day-to-day basis is symbols like ®©° and the Czech, German and French names of people I work with over the ‘net. However, it’s really embarrassing to have to write Vanek because some broken mail client can’t cope with Vaněk. Even in English we now mangle the language (or borrowed words, at least) to work around the stupidity of old programs and protocols – when did you last write café as café not cafe, correctly spell résumé, or write dæmon?

In the short run, sure, edit out those section symbols. In the longer run, that’s just not an OK “fix” anymore in my view – not for English speakers, and certainly not for anybody else.

Craig Ringer

Comment by Craig Ringer