#2598 URL Encoding

go4 Wed 8 Mar 2017

How do I encode a URL component in Fantom?

Example: encode http://fantom.org/index?id=1&p=abc

to http%3A%2F%2Ffantom.org%2Findex%3Fid%3D1%26p%3Dabc

fansh> `http://fantom.org/index?id=1&p=abc`.encode

fansh> Uri.encodeQuery(["t":"http://fantom.org/index?id=1&p=abc"])

Uri.encodeQuery does not seem to encode /, : and ?


SlimerDude Thu 9 Mar 2017

I believe those characters do not need to be encoded when part of a query string, as they hold no special meaning to query strings.

RFC 3986 sec-3.4 Query says:

The characters slash ("/") and question mark ("?") may represent data within the query component.

Beware that some older, erroneous implementations may not handle such data correctly because they fail to distinguish query data from path data when looking for hierarchical separators.

As query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters.

Is this causing you problems, or did you just expect everything to be percent encoded?
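
As a quick check of the point above, here is a minimal sketch that round-trips the value from the original question through Uri.encodeQuery and Uri.decodeQuery. The class and method names are made up for illustration; only the two Uri calls are taken from the Fantom API.

```fantom
// Hypothetical demo class; only Uri.encodeQuery / Uri.decodeQuery are Fantom API.
class QueryRoundTrip
{
  ** Encode a query map to a string, decode it again, and report
  ** whether the decoded map equals the original.
  static Bool roundTrips([Str:Str] query)
  {
    encoded := Uri.encodeQuery(query)
    decoded := Uri.decodeQuery(encoded)
    return decoded == query
  }

  static Void main()
  {
    query := ["t": "http://fantom.org/index?id=1&p=abc"]
    echo(Uri.encodeQuery(query))   // the encoded query string
    echo(roundTrips(query))        // whether the value survived intact
  }
}
```

If the round trip is lossless, leaving /, : and ? unencoded inside the query does no harm to the data.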

On a related matter, it would be useful to have general purpose percent encoding / decoding methods in Fantom, because with the UTF-8 encoding, it's not easy.

The URI class has some very optimised routines for percent encoding / decoding but they're hidden from the public API. In my 3rd comment on the URI Encoding / Decoding topic I suggest they are exposed somehow.

Re-thinking, I think the following to encode / decode a single character would be useful:

static Str percentEncode(Int char)
static Int? percentDecode(Str encodedChar, Bool checked := true)

Originally I was thinking percentEncode() would encode an entire string - but it then gets complicated as to what other non a-zA-Z0-9 characters you don't want encoded. So it may be easier to just encode single characters, leaving it up to the user to encode strings:

// Uri.percentEncode is the method proposed above, not an existing API
encoded := `http://fantom.org/index?id=1&p=abc`.toStr.chars.map {
    it.isAlphaNum ? it.toChar : Uri.percentEncode(it)
}.join

But given it's the UTF-8 encoding that's the tricky part, maybe Charset could sport similar methods for encoding / decoding individual code points? (Note that given the current public Charset API is no better than an enum, it could be a welcome addition!)
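
For what it's worth, a whole-string version can be sketched by going through the UTF-8 bytes rather than the characters. This is only an illustration: the class and method names are hypothetical, and it assumes that everything outside the RFC 3986 unreserved set (letters, digits, and - . _ ~) should be percent encoded, which is the strict behaviour asked for in the original question.

```fantom
// Hypothetical helper, not part of the Fantom API.
class PercentSketch
{
  ** Percent-encode every UTF-8 byte of 's' that is not an
  ** RFC 3986 unreserved character (a-z A-Z 0-9 - . _ ~).
  static Str encodeComponent(Str s)
  {
    buf := s.toBuf            // Str.toBuf defaults to UTF-8
    out := StrBuf()
    for (i := 0; i < buf.size; ++i)
    {
      b := buf[i]             // one byte, 0..255
      if (b < 0x80 && (b.isAlphaNum || "-._~".containsChar(b)))
        out.addChar(b)
      else
        out.addChar('%').add(b.toHex(2).upper)
    }
    return out.toStr
  }

  static Void main()
  {
    echo(encodeComponent("http://fantom.org/index?id=1&p=abc"))
    // → http%3A%2F%2Ffantom.org%2Findex%3Fid%3D1%26p%3Dabc
  }
}
```

Working on bytes means a multi-byte character simply becomes several %XX groups, which sidesteps the UTF-8 bookkeeping mentioned above.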

go4 Fri 10 Mar 2017

Thank you. You're right.

So most online URL encoding tools are not correct.

brian Wed 4 Oct 2017

Just a link: this is similar to 2357, which is now handled by encodeToken and decodeToken (which take an explicit section of the URI so we know which chars are required to be percent encoded).
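
A minimal sketch of how those two methods might be used on the string from the original question, assuming the Uri.encodeToken(Str, Int) and Uri.decodeToken(Str, Int) signatures together with the Uri.sectionQuery constant. The class and method names are hypothetical, and which characters actually get escaped depends on the section passed in, so no output is claimed here.

```fantom
// Hypothetical demo class; check the Uri API docs for the exact escaping rules.
class TokenSketch
{
  ** Encode 'raw' as a query-section token, decode it again, and
  ** report whether the original string comes back unchanged.
  static Bool roundTrips(Str raw)
  {
    token := Uri.encodeToken(raw, Uri.sectionQuery)
    return Uri.decodeToken(token, Uri.sectionQuery) == raw
  }

  static Void main()
  {
    raw := "http://fantom.org/index?id=1&p=abc"
    echo(Uri.encodeToken(raw, Uri.sectionQuery))  // token safe to embed in a query
    echo(roundTrips(raw))
  }
}
```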
