A disadvantage of using a simple text editor to produce HTML is that it is relatively time-consuming to put in the proper typographical quotation marks and dashes: for example, “Welcome – come in” instead of "Welcome - come in". Furthermore, Windows applications which do insert these characters often use Windows-specific characters instead of the proper platform-independent HTML characters.
I offer here a pair of sed scripts which automatically generate the proper characters.
If you haven’t met sed before: it is a small (40kB – microscopic compared to your average Windows application!) batch editor. You can download it free from SourceForge, which also has tutorials etc. (Not that you need tutorials to use sed scripts – only to write them.) There is further information on sed at http://www.student.northpark.edu/pemente/sed/.
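To give a flavour of what a sed substitution looks like, here is a deliberately naive sketch. It is not taken from the scripts offered on this page: the function name naive_quotes and the choice of the named entities &ndash;, &ldquo; and &rdquo; are assumptions made purely for illustration. It also ignores HTML tags entirely, so it would rewrite the quotes around attribute values as well.

```shell
# Illustrative sketch only: this is NOT the contents of proper_quotes_win.sed.
# naive_quotes is a hypothetical helper that filters stdin to stdout.
naive_quotes() {
    # 1st expression: a spaced hyphen becomes an en dash entity.
    # 2nd expression: a pair of straight double quotes on one line becomes
    # opening and closing curly-quote entities.
    # In a sed replacement, \& stands for a literal ampersand.
    sed -e 's/ - / \&ndash; /g' \
        -e 's/"\([^"]*\)"/\&ldquo;\1\&rdquo;/g'
}

printf '%s\n' 'He said "Welcome - come in" and smiled.' | naive_quotes
```

Run as written, this prints: He said &ldquo;Welcome &ndash; come in&rdquo; and smiled.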
I have tested these scripts on sed 3.59 on Windows.
There are two scripts: one for Windows and one for other operating systems.
Windows version: proper_quotes_win.sed
Non-Windows version: proper_quotes_nonwin.sed
Feel free to download them and use them (for non-commercial purposes). If they’ve been useful I’d appreciate it if you’d tell me – and please tell me if you have problems with them.
What they do is:
They are simple to use. For example, to convert file magic.html, just type the following on the command line (assuming the script and file are in the same directory and sed itself is either in the same directory or on the path):
sed -f proper_quotes_win.sed magic.html >magic2.html
You can then rename magic2.html back to magic.html after you have done any checking you may wish to do.
And of course you can, if desired, reduce the amount of typing by creating a small batch file. For DOS/Windows, if you create a file pq.bat containing:
echo off
sed -f D:\website\tools\proper_quotes_win.sed %1 >temp.html
copy temp.html %1
(obviously your path must be adjusted to wherever you keep your tools) then all you need to type is
pq magic.html
(or whatever your file name is).
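On other operating systems a shell function can play the same role as the batch file. The following is only a sketch under stated assumptions: the variable PQ_SCRIPT and its default path are hypothetical, so point it at wherever you actually keep proper_quotes_nonwin.sed.

```shell
# pq: run a sed script over one HTML file and replace the file with the result.
# PQ_SCRIPT and its default location are hypothetical; adjust to your own setup.
PQ_SCRIPT="${PQ_SCRIPT:-$HOME/website/tools/proper_quotes_nonwin.sed}"

pq() {
    # $1 is the HTML file to convert; the output goes to a temporary file first.
    tmp="$1.tmp"
    if sed -f "$PQ_SCRIPT" "$1" >"$tmp"; then
        # sed succeeded, so it is safe to overwrite the original.
        mv "$tmp" "$1"
    else
        # sed failed: leave the original untouched and discard the partial output.
        rm -f "$tmp"
        return 1
    fi
}
```

With the function defined (for example in your shell start-up file), pq magic.html converts the file in place. Unlike the batch file, this version only overwrites the original if sed reports success.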
There are some minor restrictions:
However, between the <HEAD> and </HEAD> tags no characters are converted, so HTML attributes there can span a line break (in practice this means within Meta elements).
And some very minor restrictions which you are unlikely to hit:
Thanks to David Warren Steel, who tested these for me on the SunOS and Irix dialects of Unix, and helped me track down a particularly obscure problem. Also to Stan Brown, who found several extra cases which needed handling (I didn’t manage to address all of them, but have done the ones most likely to occur.)