We use BeautifulSoup to scrub HTML from our requests. Assume scrub is a configurable option with varying degrees of security, from removing all HTML to removing only dangerous elements. The code is something like:
for k, v in request.form.iteritems():
    soup = BeautifulSoup(v)  # parse the submitted value
    soup.scrub()             # stands in for our configurable scrubbing step
    request[k] = str(soup)
It usually works fine for both HTML and plain-text input. However, if the input is plain text that contains an &, it breaks.
str(BeautifulSoup('H&W Insurance'))  # gives 'H&W; Insurance'
Of course, I can fix it by HTML-escaping my input, but that won't work if the input really was HTML. And if I do nothing, the & is not going to work. Either way, something is going to break. Is there a way I can both scrub the HTML and still make my & work?
I think the only way this can be solved is to have some convention in the request that specifies the exact type of the input, but the paradox is that I am trying to handle unexpected input, so I can't really specify anything. Is this really a solvable problem?
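One possible middle ground, sketched here only as a heuristic and not as a known fix: before parsing, escape every & that does not already look like the start of an entity, so that 'H&W Insurance' reaches the parser as 'H&amp;W Insurance' while input that is already valid HTML passes through unchanged. The helper name escape_bare_ampersands is hypothetical, and the sketch assumes that a run of letters or digits followed by ; was meant as an entity, so plain text that literally contains something like 'H&W;' would still be misread.

```python
import re

# An "&" that is NOT followed by something shaped like an entity:
# a name (&amp;), a decimal reference (&#38;) or a hex reference (&#x26;).
_BARE_AMP = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)'
)


def escape_bare_ampersands(text):
    """Replace stray '&' with '&amp;' and leave existing entities alone.

    Hypothetical pre-processing step; run it on the raw value before
    handing the value to the HTML parser.
    """
    return _BARE_AMP.sub('&amp;', text)
```

In the loop above this would be applied as BeautifulSoup(escape_bare_ampersands(v)). Whether the serialized result then needs unescaping depends on whether the field is later rendered as HTML or stored as plain text.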