#2233 Undeclared Entities in xml::XParser

SlimerDude Sun 26 Jan 2014

When working with Sizzle I parse the XHTML with XParser.parseDoc(), but I often get:

xml::XErr: Unsupported entity &nbsp; [line 26, col 284]
  xml::XParser.err (XParser.fan:940)
  xml::XParser.err (XParser.fan)
  xml::XParser.toCharData (XParser.fan:808)

I can work around this by doing a Str replace of &nbsp; with &#160; before I parse the XML string:

xml := "<html>&nbsp;</html>"

xml = xml.replace("&nbsp;", "&#160;")
doc := XParser(xml.in).parseDoc

Sizzle.selectFromStr(xml, "html")

But I don't really want to do a Str replace for each and every valid HTML entity, of which there are hundreds.
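For what it's worth, the replace workaround can be generalised with a lookup table. The following is only a rough sketch: the EntityFix class, its entities map and the toNumeric() method are hypothetical names that are not part of the xml pod, and the table holds just a handful of the HTML entities.

```fantom
using xml

class EntityFix
{
  ** Hypothetical sample table: HTML entity names mapped to Unicode code points.
  ** Only a few entries are listed; the full HTML set is much larger.
  static const Str:Int entities := ["nbsp":160, "copy":169, "reg":174, "hellip":8230]

  ** Replaces every named entity found in the table with its
  ** numeric character reference, e.g. &nbsp; becomes &#160;
  static Str toNumeric(Str src)
  {
    result := src
    entities.each |Int code, Str name|
    {
      result = result.replace("&${name};", "&#${code};")
    }
    return result
  }

  static Void main()
  {
    fixed := toNumeric("<html>&nbsp;&copy;</html>")
    doc   := XParser(fixed.in).parseDoc
    echo(doc.root.name)
  }
}
```

This still means maintaining the table by hand, and any entity missing from it would raise the same XErr.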

Instead, it would be nice if XParser allowed undeclared entities, requiring a simple change of the XParser.toCharData() method. Specifically, changing line 808 from:

throw err("Unsupported entity &${entity};")

to

return "&${entity};"

Though you may want to make this behaviour optional (for backwards compatibility), for example:

// via field
XParser(xml.in) {it.allowUndeclaredEntities = true}

// via an options map
XParser(xml.in).parseDoc(["allowUndeclaredEntities":true])

How about it?

SlimerDude Sun 26 Jan 2014

Hmm... after poking around the net, it seems that named entities aren't recommended in XHTML5. From HTML_vs._XHTML on WHATWG:

Do not use entity references in XHTML (except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;); use the equivalent Unicode or numeric character reference sequence instead.

So I guess I should man up, make sure the pages are proper XHTML5 and deal with them myself!

See:
