#----------------------------------------------------------------------
# proper_quotes_nonwin.sed
# Copyright Stephen Poley. May be freely used for non-commercial purposes.
# http://www.xs4all.nl/~sbpoley/
#----------------------------------------------------------------------
#
# This sed script changes plain quotation marks in HTML to proper
# typographical quotation marks, and hyphens used as dashes to dash
# characters. It also replaces numeric character references
# with character entity references for consistency.
#
# Non-Windows version: line-feed terminators
#
# Any quotes which must not be changed can be represented by &#39; (single) or
# &#34; or &quot; (double).
#
# A few characters will be left unchanged in situations where it is hard to
# determine if a left or right quotation mark is appropriate.
#
# Running a file through this script multiple times should produce the same
# result as running it once; it doesn't matter if a file has already been fully
# or partially converted to proper quotation marks.
#
# Restrictions:
# 1) Doesn't recognise multi-line HTML comments, so any quotes in such
#    comments will also be converted.
# 2) Within the body, can't handle HTML attributes which span a line break
#    (these are probably rare and arguably bad practice anyway).
# 3) Within the head, HTML attributes spanning a line break (Meta elements)
#    are handled OK provided the <head> and </head> tags are present.
#
# Last updates:
# 29-9-2003, handle multi-line Meta's
# 1-4-2006, handle digit-quote-punctuation sequence
#----------------------------------------------------------------------

# skip DOCTYPE
/<!DOCTYPE/ b endscr

# on the opening head tag, set an "h" flag in the hold space
/<[Hh][Ee][Aa][Dd]>/ {
x
s/^/h/
x
}

# on the closing head tag, clear the flag again
/<\/[Hh][Ee][Aa][Dd]>/ {
x
s/h//
x
}

# while the flag is set (inside the head), leave the line untouched
x
/h/ {
x
b endscr
}
x

# replace any degree characters, so can use them temporarily
s/°/\&deg;/g

# temporarily replace quotes in attributes and comments
s/="\([^"]*\)"/=°\1°/g
: commdq
s/\(