SlimerDude Fri 10 Oct 2014
First off, after reading about the horror that other programming languages bring to the table with regard to URI support (notably Ruby and Python), I'm very pleased that Fantom took the time to get this right!
I do, however, have the odd question regarding URI encoding / decoding...
Q1) Encoding
Let's say I have some nasty bit of text that I want to use as a path segment. (Let's even have it include the / character!) How do I create a URI with this? Example:
url := `http://foo.com/`
nasty := "-/-"
// ... code to add 'nasty' to 'url'
// the standard (backslashed) form of url should now look like:
echo(url) // --> http://foo.com/-\/-
To create a Uri from a Str there is sys::Uri.fromStr and sys::Uri.decode, but to use these my Str must already be in standard form or percent encoded.
Is there a method somewhere that does this encoding for me? Or am I to write it myself?
Q2) Decoding
Assuming I now have my URL http://foo.com/-\/-, how do I convert the path segment back into a plain Str that's not in standard form? Example:
url := Uri("http://foo.com/-\\/-")
path := url.path.first
nasty := ... // code to decode path *from* standard form
echo(nasty) // --> -/-
Again, am I not seeing a method somewhere or do I need to write it myself?
If these standard form <-> Str and percent encoding <-> Str methods don't currently exist, I feel they're an omission from the current (and otherwise excellent) API, and it'd be nice if they were supported.
SlimerDude Fri 10 Oct 2014
For those wanting to convert to / from URI standard form, here are my methods:
static const Int[] delims := ":/?#[]@\\".chars

** Encode the Str *to* URI standard form
** see http://fantom.org/sidewalk/topic/2357
static Str encodeUri(Str str) {
    buf := StrBuf(str.size + 8) // allow for 8 escapes
    str.chars.each |char| {
        // backslash-escape every URI delimiter (and backslash itself)
        if (delims.contains(char))
            buf.addChar('\\')
        buf.addChar(char)
    }
    return buf.toStr
}

** Decode the Str *from* URI standard form
** see http://fantom.org/sidewalk/topic/2357
static Str decodeUri(Str str) {
    if (!str.containsChar('\\'))
        return str
    buf := StrBuf(str.size)
    escaped := false
    str.chars.each |char| {
        // a backslash starts an escape unless it is itself escaped
        escaped = (char == '\\' && !escaped)
        if (!escaped)
            buf.addChar(char)
    }
    return buf.toStr
}
brian Mon 20 Oct 2014
Maybe I'm not fully grokking it, but how does the API know whether the "/" in "-/-" is supposed to be treated as a path separator, or that it is supposed to be backslash escaped? All the normal encode/decode methods assume that special chars are being used for their normal purposes (scheme, port, path separators). Are you trying to escape all those special chars because you know that the string is a single file name within the path?
It sounds like you are doing some wacked-out things if you are trying to escape slashes and stuff, so maybe some background info might help too.
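To make brian's point concrete, here is a small sketch (the class name SeparatorDemo is hypothetical) of how the same characters parse differently depending on whether the slash is backslash-escaped in standard form. It assumes, per the sys::Uri docs, that Uri.path splits only on unescaped slashes.

```fantom
class SeparatorDemo
{
  static Void main()
  {
    // Unescaped "/" is a path separator, so this path has two names
    plain := Uri("http://foo.com/-/-")
    echo(plain.path)

    // Backslash-escaped "/" stays inside a single path name
    escaped := Uri("http://foo.com/-\\/-")
    echo(escaped.path)

    echo(plain.path.size)
    echo(escaped.path.size)
  }
}
```

The Uri parser cannot guess intent from the raw characters; the caller has to say which slashes are data by escaping them first.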
SlimerDude Sun 26 Oct 2014
maybe some background info might help too.
Sure.
Str <-> Standard Form
BedSheet encodes / decodes objects as strings so they may be embedded in URLs. A common use case is that a user object may encode itself as its primary key, so a User with an ID of 42 may be combined with a URL of /user to make /user/42.
A string message could also be encoded; for example, Hello Mum! would become /showMsg/Hello Mum!. The point is, any string should be able to be encoded into a URL:
msg := "What the @#:\\/!?"
url := `/showMsg/` + encodeUri(msg).toUri
At the other end, when you're handling the request for /showMsg/..., you want to decode the URI segment back into its original form.
origMsg := decodeUri(url.path[1]) // --> "What the @#:\\/!?"
So the methods do as they say, encode and decode strings into URI paths.
Str <-> Percent Encoding
I see in the Java source that Fantom has some pretty optimised routines for encoding / decoding URIs into a percent-encoded format. It would be neat if they were exposed a little so others could make use of them.
OAuth in particular makes heavy use of percent encoding.
Specifically I'm thinking of methods like:
where exclude is a list of characters that will not be encoded, usually the unreserved set -._~
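A minimal Fantom sketch of such a method pair, assuming UTF-8 and the RFC 3986 unreserved set as the default exclude list. The class and method names (PctCodec, pctEncode, pctDecode) are hypothetical and are not part of sys::Uri. Hex digits are uppercase, which is what OAuth 1.0 signing expects.

```fantom
** Hypothetical helper (not part of sys::Uri): percent-encodes and
** decodes a Str via its UTF-8 bytes, e.g. for OAuth 1.0 signing.
class PctCodec
{
  ** Percent-encode every UTF-8 byte except ASCII letters, digits and
  ** any char listed in 'exclude' (defaults to the unreserved "-._~").
  static Str pctEncode(Str str, Str exclude := "-._~")
  {
    buf   := StrBuf()
    bytes := str.toBuf(Charset.utf8)
    bytes.size.times |i|
    {
      b := bytes[i]
      // bytes below 0x80 are plain ASCII, so they can be tested as chars
      if (b < 0x80 && (b.isAlphaNum || exclude.containsChar(b)))
        buf.addChar(b)
      else
        buf.addChar('%').add(b.toHex(2).upper)  // uppercase hex, e.g. %2F
    }
    return buf.toStr
  }

  ** Decode %XX escapes back into UTF-8 bytes, then into a Str.
  ** Throws ParseErr if an escape is not valid hex.
  static Str pctDecode(Str str)
  {
    out := Buf()
    i := 0
    while (i < str.size)
    {
      c := str[i]
      if (c == '%' && i + 2 < str.size)
      {
        out.write(Int.fromStr(str[i+1..i+2], 16))
        i += 3
      }
      else
      {
        // any other char (including a trailing lone '%') is copied as UTF-8
        out.writeChar(c)
        i++
      }
    }
    return out.flip.readAllStr(false)
  }

  static Void main()
  {
    echo(pctEncode("Hello Mum!"))
    echo(pctDecode(pctEncode("caf\u00e9 & co")))
  }
}
```

Working on the encoded bytes rather than the chars is what makes a multi-byte character such as é come out as two escapes (%C3%A9).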
brian Thu 8 Jan 2015
Ticket promoted to #2357 and assigned to brian
Add methods to Uri to encode/decode just the name portion of path
brian Fri 10 Jul 2015
More summary from 2432:
SlimerDude Tue 10 Nov 2015
As percent encoding UTF-8 strings is non-trivial, here's some sample code:
Some test examples:
brian Fri 15 Sep 2017
Ticket resolved in 1.0.70
I cleaned up the escape handling in Uri normalization, and added five new methods: isPathRel, escapeToken, unescapeToken, encodeToken, and decodeToken. These methods are not designed as a general-purpose percent encoding library, but rather to work with URIs and our predefined and optimized charMap/delimiter tables. They are the fundamental building blocks that were not easily exposed previously. I decided against higher-level convenience methods, such as a new "section constructor", for now, although it's now fairly easy to build up parts into a normalized or encoded form yourself with a StrBuf.
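A short sketch of the new token methods applied to the original "-/-" example (the class name TokenDemo is hypothetical). It assumes Fantom 1.0.70 or later and that the signatures are escapeToken(Str, UriSection), unescapeToken(Str) and encodeToken(Str, UriSection); check the sys::Uri docs before relying on them.

```fantom
class TokenDemo
{
  static Void main()
  {
    nasty := "-/-"

    // Backslash-escape any path delimiters so "/" stays inside one name
    // (assumed signature: escapeToken(Str, UriSection))
    name := Uri.escapeToken(nasty, UriSection.path)
    url  := `http://foo.com/` + name.toUri
    echo(url)

    // Remove the backslash escapes to recover the original Str
    echo(Uri.unescapeToken(url.path.first))

    // Percent-encode the same token for the path section instead
    echo(Uri.encodeToken(nasty, UriSection.path))
  }
}
```

This replaces the hand-rolled encodeUri / decodeUri pair from earlier in the thread with the section-aware delimiter tables brian mentions.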
SlimerDude Sat 16 Sep 2017
Those methods look like a good addition Brian, thanks! I look forward to trying them out.
As for the convenience ctor, I may try putting a util class together, which hopefully, now that we have the new methods, shouldn't be too difficult.