urlencode
- urlencode — URL-encodes string
Description
- This function is useful for encoding a string that will be used in the query part of a URL, as a convenient way to pass variables to the next page.
Example #1 urlencode() example
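A minimal sketch of this kind of use, building a link whose query value comes from user input. The variable $userInput and the target search.php are hypothetical names chosen for illustration:

```php
<?php
// Hypothetical user-supplied value that must travel in a query string.
$userInput = 'rock & roll?';

// urlencode() makes the value safe for the query part of the URL;
// htmlspecialchars() then makes the whole URL safe inside an HTML attribute.
$url = 'search.php?q=' . urlencode($userInput);
echo '<a href="' . htmlspecialchars($url) . '">Search</a>', "\n";
// $url is now: search.php?q=rock+%26+roll%3F
```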
rawurlencode
- rawurlencode — URL-encode according to RFC 3986
Description
- Encodes the given string according to RFC 3986.
Example #1 including a password in an FTP URL
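A short sketch of what such a URL could look like. The user name, password, and host below are placeholder values, not taken from a real system:

```php
<?php
// Hypothetical credentials; the password contains characters that would
// otherwise be read as URL delimiters (space, @, +, %, /).
$user = 'jdoe';
$password = 'foo @+%/';

$url = 'ftp://' . rawurlencode($user) . ':' . rawurlencode($password)
     . '@ftp.example.com/x.txt';
echo $url, "\n"; // ftp://jdoe:foo%20%40%2B%25%2F@ftp.example.com/x.txt
```

Encoding each component separately keeps the literal ':' and '@' delimiters intact while protecting the same characters inside the password.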
urlencode vs rawurlencode
The main difference between the two is how they encode a space:
- urlencode encodes a space as a plus sign (‘+’)
- rawurlencode encodes a space as ‘%20’
rawurlencode:
- Follows RFC 1738 prior to PHP 5.3.0 and RFC 3986 afterwards.
- Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits.
- This is the encoding described in RFC 3986 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).
- Note on RFC 3986 vs. RFC 1738: prior to PHP 5.3, rawurlencode encoded the tilde character (~), following RFC 1738.
- As of PHP 5.3, however, rawurlencode follows RFC 3986 which does not require encoding tilde characters.
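The points above can be checked with a couple of calls. This is a small sketch assuming PHP 5.3 or later; the sample strings are arbitrary:

```php
<?php
// Alphanumerics and -_.~ pass through unchanged (PHP 5.3 and later).
$kept = rawurlencode('A-Z_a-z.0-9~');
echo $kept, "\n";    // A-Z_a-z.0-9~

// Everything else becomes %XX; a space becomes %20.
$encoded = rawurlencode('a b&c=d/e');
echo $encoded, "\n"; // a%20b%26c%3Dd%2Fe
```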
urlencode:
- urlencode encodes spaces as plus signs (not as %20 as done in rawurlencode)
- Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs.
- The result is encoded the same way as data posted from a WWW form, that is, as the application/x-www-form-urlencoded media type.
- This differs from the RFC 3986 encoding (see rawurlencode()) in that, for historical reasons, spaces are encoded as plus (+) signs.
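To see both styles side by side, the sketch below encodes one arbitrary sample value with each function. It also uses http_build_query(), which is not mentioned above: by default it produces the form-style encoding, and with the PHP_QUERY_RFC3986 flag (PHP 5.4 and later) the RFC 3986 style.

```php
<?php
$value = 'a b~c';

$plus = urlencode($value);     // a+b%7Ec  (space -> +, tilde encoded)
$raw  = rawurlencode($value);  // a%20b~c  (space -> %20, tilde kept)

// Form-style encoding is the default; PHP_QUERY_RFC3986 switches styles.
$form    = http_build_query(['q' => $value]);
$rfc3986 = http_build_query(['q' => $value], '', '&', PHP_QUERY_RFC3986);

echo $plus, "\n", $raw, "\n", $form, "\n", $rfc3986, "\n";
```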
Differences in EBCDIC:
URLENCODE:
- Same iteration setup as with ASCII
- The space character is still translated to a + sign. Note: this code may need to be compiled on an EBCDIC system to behave correctly; this point is unverified.
- It checks whether the current character is below 0 (other than . or -), or below A but above 9, or above Z and below a (other than _), or above z. If any of these conditions holds, it performs a lookup similar to the one in the ASCII version.
RAWURLENCODE:
- Same iteration setup as with ASCII
- Same check as in the EBCDIC version of urlencode, except that for characters above z, ~ is excluded from encoding.
- Same assignment as the ASCII rawurlencode
- Still appending the \0 byte to the string before return.
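The range check described in the lists above can be transcribed into PHP to see which characters it selects. This is only a sketch of the comparison as described, not the C source: the function name needsPercentEncoding and its $keepTilde flag are made up for illustration, and the check is evaluated on ASCII byte values, so it says nothing about behaviour on an actual EBCDIC system.

```php
<?php
// Transcription of the described comparison, evaluated on byte values.
// $keepTilde = true mirrors the rawurlencode variant, which leaves ~ alone.
function needsPercentEncoding(string $char, bool $keepTilde = false): bool
{
    $c = ord($char);
    return ($c < ord('0') && $char !== '-' && $char !== '.')
        || ($c < ord('A') && $c > ord('9'))
        || ($c > ord('Z') && $c < ord('a') && $char !== '_')
        || ($c > ord('z') && !($keepTilde && $char === '~'));
}

// Compare against the built-in functions for every single-byte value.
// Space is skipped for urlencode because it is handled separately (as '+').
$mismatches = 0;
for ($i = 0; $i < 256; $i++) {
    $char = chr($i);
    if ($char !== ' ' && needsPercentEncoding($char) !== (urlencode($char) !== $char)) {
        $mismatches++;
    }
    if (needsPercentEncoding($char, true) !== (rawurlencode($char) !== $char)) {
        $mismatches++;
    }
}
echo "mismatches: $mismatches\n"; // mismatches: 0 on an ASCII platform
```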
One practical reason to choose one over the other is if you’re going to use the result in another environment, for example JavaScript.
- In PHP, urlencode('test 1') returns 'test+1', while rawurlencode('test 1') returns 'test%201'.
- If you then decode this in JavaScript with the decodeURI() function, decodeURI("test+1") gives "test+1", while decodeURI("test%201") gives "test 1".
- In other words, a space (" ") that urlencode encoded as a plus ("+") in PHP will not be properly decoded by decodeURI in JavaScript.
- In such cases, the rawurlencode PHP function should be used.
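The same mismatch can be reproduced without leaving PHP. In this sketch, rawurldecode() stands in for a decoder that only understands %XX sequences; that is an analogy to the decodeURI() behaviour described above, not a JavaScript emulation.

```php
<?php
$plus = urlencode('test 1');     // test+1
$raw  = rawurlencode('test 1');  // test%201

// rawurldecode() only undoes %XX sequences, so '+' survives untouched.
$a = rawurldecode($plus);  // test+1
$b = rawurldecode($raw);   // test 1

// urldecode() additionally turns '+' back into a space.
$c = urldecode($plus);     // test 1

echo $a, "\n", $b, "\n", $c, "\n";
```

The practical rule is to pair each encoder with a decoder that follows the same convention; when the consumer is not under your control, the %20 form is the safer choice.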