Easily escaping dangerous tags with BeautifulSoup
It does come with side effects, though, such as closing unclosed tags and lowercasing tag names.
    from BeautifulSoup import BeautifulSoup
    import cgi

    # Tags whose markup should be shown as text instead of being rendered.
    dangerous_tags = [
        "script", "applet", "object", "embed", "img",
        "form", "input", "select", "textarea", "button",
    ]

    def escape(xmlstr):
        """
        escape dangerous tags.

        >>> escape(u'<div>snip <script>alert("<b>BAD</b>")</script> snip</div>')
        u'<div>snip &lt;script&gt;alert("&lt;b&gt;BAD&lt;/b&gt;")&lt;/script&gt; snip</div>'
        """
        xml = BeautifulSoup(xmlstr)
        # Replace each dangerous element with its own escaped source text.
        for node in xml.findAll(dangerous_tags):
            node.replaceWith(cgi.escape(unicode(node)))
        return unicode(xml)
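The function above is Python 2 code (`unicode`, the old `BeautifulSoup` import). If you are on Python 3, note that `cgi.escape` was removed in Python 3.8; the standard-library replacement is `html.escape`. Below is a minimal sketch of just the escaping step, assuming the node has already been serialized to a string the way `unicode(node)` does above. The variable names `node_html` and `escaped` are mine, for illustration only.

```python
from html import escape

# Serialized form of the <script> node from the docstring example
# (an assumed stand-in for what unicode(node) would return).
node_html = '<script>alert("<b>BAD</b>")</script>'

# quote=False mirrors the old cgi.escape default: only &, < and > are
# escaped, and double quotes are left alone.
escaped = escape(node_html, quote=False)
print(escaped)
# prints: &lt;script&gt;alert("&lt;b&gt;BAD&lt;/b&gt;")&lt;/script&gt;
```

If the escaped text might end up inside an HTML attribute value, leaving `quote` at its default of `True` is probably the safer choice, since that also escapes `"` and `'`.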
For more details on the Python module BeautifulSoup, see the Beautiful Soup documentation.