SlimerDude Wed 8 Jul 2020
Hi,
I know this is pedantic, I mention it only because it may be affecting SEO and the like...
The fantom.org website is declared with an XHTML DOCTYPE:
But the <head> uses an empty <meta> void tag (invalid XML).

    fansh> xml::XParser(web::WebClient(`https://fantom.org/`).getStr.replace("–", "-").in).parseDoc
    xml::XErr: Expecting end of element 'meta' (start line 7) [line 21, col 3]
      xml::XParser.err (XParser.fan:986)
      xml::XParser.parseElemEnd (XParser.fan:505)
      xml::XParser.next (XParser.fan:174)
      xml::XParser.parseElem (XParser.fan:69)
      xml::XParser.parseDoc (XParser.fan:46)
      xml::XParser.parseDoc (XParser.fan)

But beyond this, there seems to be a missing </div> end tag somewhere that makes even HTML invalid.

    fansh> xml::XParser(web::WebClient(`https://fantom.org/`).getStr.replace("–", "-").replace("1.0'", "1.0'/").in).parseDoc
    xml::XErr: Expecting end of element 'div' (start line 51) [line 98, col 3]
      xml::XParser.err (XParser.fan:986)
      xml::XParser.parseElemEnd (XParser.fan:505)
      xml::XParser.next (XParser.fan:174)
      xml::XParser.parseElem (XParser.fan:69)
      xml::XParser.parseDoc (XParser.fan:46)
      xml::XParser.parseDoc (XParser.fan)

It was noticed by a colleague who was using the Fantom site to test a bug fix to HTML Parser.
matthew Fri 10 Jul 2020
Thanks for reporting. I'll look at fixing both these things next week.
matthew Mon 13 Jul 2020
@SlimerDude - switched the doctype to HTML5 doctype so you won't be able to parse it as XML anymore. I also fixed the unmatched <div> tag.
SlimerDude Tue 14 Jul 2020
Hi Matthew, cool, the homepage is looking good and parsing nicely!
However, parsing this page, and any other forum page, gives:
    sys::ParseErr: End tag </span> does not match start tag <h2>

Seems there's an extraneous span end tag making an appearance.
matthew Tue 14 Jul 2020
I'm sure these are not the only cases of unbalanced tags. I pushed a fix to the website, but I might suggest that improperly formatted HTML is a very common thing in general, and that your HtmlParser be able to more gracefully continue parsing in the face of issues like this. If you find an unexpected </end> tag, log it as an error and keep parsing - or maybe have an optional strict mode or something.
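
For illustration only, here is a minimal sketch of the "log it and keep parsing, with an optional strict mode" idea. It is written in Python on top of the standard library's `html.parser` and is not how HtmlParser works internally. The names `TagBalanceChecker`, `check`, and the `strict` flag are hypothetical, and the void-tag list is a commonly used subset.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML (assumed subset).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: tracks open tags and records mismatches."""

    def __init__(self, strict=False):
        super().__init__()
        self.strict = strict
        self.open_tags = []  # stack of currently open element names
        self.errors = []     # problems collected in lenient mode

    def report(self, message):
        # Strict mode fails fast; lenient mode logs and carries on.
        if self.strict:
            raise ValueError(message)
        self.errors.append(message)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # nothing to balance for void elements
        if tag not in self.open_tags:
            # Unexpected end tag: record it and keep parsing.
            self.report(f"Unexpected end tag </{tag}>")
            return
        # Implicitly close elements left open inside the one being closed.
        while self.open_tags[-1] != tag:
            self.report(f"Missing end tag </{self.open_tags.pop()}>")
        self.open_tags.pop()


def check(html, strict=False):
    """Return the list of balance problems found in the given HTML."""
    checker = TagBalanceChecker(strict)
    checker.feed(html)
    checker.close()
    # Anything still open at end of input was never closed.
    while checker.open_tags:
        checker.report(f"Missing end tag </{checker.open_tags.pop()}>")
    return checker.errors
```

With this sketch, `check("<h2>Title</span></h2>")` returns `["Unexpected end tag </span>"]` and parsing carries on to the closing `</h2>`, whereas `check("<h2>Title</span></h2>", strict=True)` raises a `ValueError` at the stray `</span>`.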
SlimerDude Thu 16 Jul 2020
Don't worry @matthew, I'll not bug you again on this matter! :)