web: Non UTF-8 characters in XML document make DOMParser fail

Tuesday, 17 April 2018

I’m struggling to parse some input to my extension. The XHR response contains raw text. For well-formed documents the following code works, and the DOM (xmlDoc) will contain the parsed response:

const response = await getNodeList();
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(response.data, 'text/xml');
const nodes = xmlDoc.getElementsByTagName('node');
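One thing worth noting about this snippet: in browsers, parseFromString does not throw on malformed XML. It returns a document that contains a parsererror element instead, so a failure can go unnoticed. A minimal sketch of a check for that follows; hasParserError is a hypothetical helper name, not part of the DOM API.

```javascript
// Hypothetical helper (not a DOM API): returns true when the document
// returned by DOMParser.parseFromString contains a <parsererror> element,
// which is how browsers report XML that could not be parsed.
function hasParserError(xmlDoc) {
  return xmlDoc.getElementsByTagName('parsererror').length > 0;
}
```

With the code above this could be used as: if (hasParserError(xmlDoc)) { /* report or retry */ }.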

The problem is that there are XML documents out there that contain non-ASCII characters in an encoding other than UTF-8, e.g. a space encoded as ISO-8859-1, as in DEVICE%206-1.

This makes DOMParser bail out (I verified this by replacing the %20 with an o in a hex editor). The question now is what to do about it (apart from telling users that garbage in results in garbage out)?
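One possible direction, sketched under assumptions: if the response body can be obtained as raw bytes rather than as an already-decoded string (for example by requesting an ArrayBuffer), the bytes can be decoded strictly as UTF-8 first and, only if that fails, decoded again with a single-byte encoding. The helper name decodeXmlBytes and the choice of windows-1252 as the fallback are assumptions made for illustration; the real documents may use a different encoding.

```javascript
// Hypothetical helper: turn raw response bytes into a string.
// Tries strict UTF-8 first and falls back to a single-byte encoding.
// The default fallback, windows-1252, is an assumption about the input
// (the label 'iso-8859-1' maps to the same decoder in TextDecoder).
function decodeXmlBytes(bytes, fallbackEncoding = 'windows-1252') {
  try {
    // fatal: true makes decode() throw a TypeError on invalid UTF-8
    // instead of silently inserting U+FFFD replacement characters.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    // Not valid UTF-8: decode the same bytes with the fallback encoding.
    return new TextDecoder(fallbackEncoding).decode(bytes);
  }
}
```

The decoded string could then be passed to parser.parseFromString(text, 'text/xml') as before, assuming getNodeList can be made to return the body as an ArrayBuffer instead of text.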

<?xml version="1.0"?><zwave><node desc="6" deprecated="1"><ep desc="262" id="0" generic="16" specific="1" name="DEVICE 6".......

Regards, Umut
