September 30, 2010

Repair broken HTML

Filed under: Webdevelopment | paddy @ 1:38 pm

This worked pretty well for me:
NB: HTML entities such as the en dash (&ndash;, which renders as –) have to be substituted before running tidy.
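A minimal sketch of that substitution, assuming GNU sed (for in-place editing with -i) and that it runs after the recode step and before tidy. That way the literal characters are written into a file that is already UTF-8; running it before recode would let recode reinterpret the new UTF-8 bytes as Windows-1252 and mangle them. The function name replace_entities and the list of entities are illustrative assumptions, not part of the original recipe.

```shell
#!/bin/sh
# Hypothetical helper: replace a few named HTML entities with literal UTF-8
# characters. Assumes GNU sed (-i) and a file already converted to UTF-8
# (run after recode, before tidy). The entity list is only a sample.
# &amp; &lt; &gt; are deliberately left alone: they must stay escaped in XHTML.
replace_entities() {
    sed -i \
        -e 's/&ndash;/–/g' \
        -e 's/&hellip;/…/g' \
        -e 's/&lsquo;/‘/g' \
        -e 's/&rsquo;/’/g' \
        -e 's/&ldquo;/“/g' \
        -e 's/&rdquo;/”/g' \
        "$1"
}
```

It would be called as for i in *.html; do replace_entities "$i"; done, placed between the recode loop and the tidy loop.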

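# Convert every file from Windows-1252 to UTF-8
for i in *.html; do recode windows-1252..u8 "$i"; done
# Add the XHTML namespace and language attributes to the bare <html> tag
for i in *.html; do sed -i 's#<html>#<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">#' "$i"; done
# Clean up the markup (-c), output XHTML (-asxml) and overwrite each file in place (-m)
for i in *.html; do tidy -c -m -utf8 -asxml "$i"; done
# Insert the XHTML 1.0 Transitional DOCTYPE as the first line
for i in *.html; do sed -i '1i\<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' "$i"; done
# Add a UTF-8 Content-Type meta tag right after </title>
for i in *.html; do sed -i 's#</title>#</title>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />#' "$i"; done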
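Running the DOCTYPE and meta-tag loops a second time would insert a second DOCTYPE line and a second meta tag. The following is a sketch of a guarded variant of those two steps that skips each change when it is already present, assuming GNU sed and grep; the function name add_xhtml_header is hypothetical.

```shell
#!/bin/sh
# Hypothetical guarded version of the last two steps: each change is applied
# only if it is not already there, so re-running the loop adds no duplicates.
# Assumes GNU sed (-i, one-line "1i\" text and "\n" in the replacement).
add_xhtml_header() {
    file=$1
    # Insert the DOCTYPE only if the first line does not already declare one.
    if ! head -n 1 "$file" | grep -q '^<!DOCTYPE'; then
        sed -i '1i\<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' "$file"
    fi
    # Add the Content-Type meta tag only if the file has none yet.
    if ! grep -qi 'http-equiv="Content-Type"' "$file"; then
        sed -i 's#</title>#</title>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />#' "$file"
    fi
}
```

It replaces the last two loops with for i in *.html; do add_xhtml_header "$i"; done; the other steps stay as they are.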
