URL Encoding: A Security Primer
A few previous posts have mentioned URL-encoding in passing, and while that might be sufficient for those walkthroughs, this post covers URL encoding in a bit more depth.
This includes what URL encoding is, why it’s needed, and of course, how this relates to security (including CTFs, CVEs, and using URL encoding “in the wild”).
What is URL encoding?
First of all, let’s define encoding.
Encoding is a way of converting data into a particular format. This is not encryption (there’s no key), and it’s not hashing (since encoding is reversible via decoding). Instead, encoding is just a way of translating characters into the expected format of whatever is receiving them.
Likewise, decoding is the reverse of encoding (to get back to the original data format).
URL encoding
URL encoding is a mechanism for translating unprintable or special characters into a format that web servers and browsers universally accept. (source)
Put another way: URLs can only be transmitted over the internet if they conform to a certain set of characters (ASCII character set). Additionally, some characters are “reserved” (they have special meaning), such as =.
RFC 3986 specifies these characters in great detail, but for a summary, here’s a table from Google Maps’ documentation:
If you have a URL that contains a character outside of this set, or a reserved character used as data, that character needs to be translated into a different format. This translation is known as URL encoding.
This is done with a % followed by the character’s hex value. For example, spaces cannot be sent as part of URLs, and are translated (URL encoded) into %20.
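As a rough illustration of that mapping, the sketch below uses Python's standard `urllib.parse.quote` to percent-encode a handful of characters. The sample characters are arbitrary picks for illustration, not taken from any particular table.

```python
from urllib.parse import quote

# Arbitrary sample characters chosen for illustration.
samples = [" ", '"', "<", ">", "#", "%", "|"]

# safe="" tells quote() to also encode "/", which it leaves alone by default.
encoded = {ch: quote(ch, safe="") for ch in samples}

for ch, enc in encoded.items():
    print(repr(ch), "->", enc)

# Each escape is "%" followed by the byte's two-digit hex value.
manual_space = "%{:02X}".format(ord(" "))

# Non-ASCII characters are converted to UTF-8 bytes first, then each byte is escaped.
e_acute = quote("é")
```

Here `manual_space` comes out as `%20`, matching what `quote` produces for a space, and the two-byte UTF-8 form of `é` becomes two escapes.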
Again, Google Maps’ documentation shows some examples of other common encodings:
Why is URL encoding needed?
URLs, or more specifically, URIs (Uniform Resource Identifiers), must conform to certain standards in order to be consistently and correctly interpreted across all applications.
These standards (RFC 3986) include:
- Which characters are valid (“digits, letters, and a few graphic symbols”)
- How characters outside the allowed set must be transmitted (percent-encoded using % and the hex value), and
- Which characters are reserved and how to deal with conflicts (“If data for a URI component would conflict with a reserved character’s purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.”)
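The reserved-character rule in the last bullet can be sketched with Python's standard `urllib.parse` module. The parameter name `q` and the sample value are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

value = "a&b=c"  # hypothetical user-supplied search term

# Unencoded: "&" and "=" act as delimiters and split the value apart.
broken_query = "q=" + value
broken_parsed = parse_qs(broken_query)

# Encoded: urlencode() percent-encodes the reserved characters inside the value.
safe_query = urlencode({"q": value})
safe_parsed = parse_qs(safe_query)

print(broken_parsed)
print(safe_query)
print(safe_parsed)
```

In the unencoded case the parser sees two parameters (`q` and `b`) instead of one, which is exactly the delimiter conflict the RFC describes.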
How to Identify URL Encoding
If you’ve got a string that’s URL-encoded, most of it should still be readable (typically, only a small part of the string requires URL-encoding using % characters). Double-encoded strings may have a larger number of %-encoded characters.
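A quick way to eyeball this programmatically is a regular expression that looks for `%` followed by two hex digits. The helper below is a heuristic sketch only (the function name and returned keys are made up for this example): a string such as `%2520` could be a double-encoded space or a single-encoded literal `%20`, and the pattern alone cannot tell which.

```python
import re

# "%" followed by exactly two hex digits.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")
# "%25" (an encoded "%") followed by two hex digits hints at double encoding.
DOUBLE_ESCAPE = re.compile(r"%25[0-9A-Fa-f]{2}")

def describe_encoding(text: str) -> dict:
    """Return rough indicators of URL encoding in text (heuristic only)."""
    escapes = PERCENT_ESCAPE.findall(text)
    return {
        "escape_count": len(escapes),
        "looks_url_encoded": bool(escapes),
        "looks_double_encoded": bool(DOUBLE_ESCAPE.search(text)),
    }

print(describe_encoding("name=John%20Smith"))
print(describe_encoding("%252e%252e%252f"))
```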
If you are trying a CTF or other challenge and your requests are not working as expected, try changing your encoding to URL encoding and see if it makes a difference.
URL Encoding and Security
What does this apply to within security?
Since this is URL encoding, it applies to anything that’s transmitting URIs over the internet. This obviously includes web-related security, but might also include desktop, mobile, or other applications that are sending or processing URIs.
We can group URL encoding related vulnerabilities in different ways, so you’ll see some overlap between the following sections (for example, XSS style attacks that rely on lack of filtering against double encoding).
CWE-177 and CWE-174
Within MITRE’s CWE (Common Weakness Enumeration) cataloging system, URL encoding vulnerabilities are grouped under CWE-177, improper handling of URL encoding. Double-encoding issues are grouped under CWE-174, double decoding of the same data.
Double Encoding Issues
There are vulnerabilities and CTF challenges where URL-encoding can help bypass a poorly implemented filtering mechanism. However, decoding a URL once isn’t the end of the story: attackers can double-encode their payload.
Double encoding is an attack technique where request parameters are encoded twice, as a way of bypassing filters that only perform one round of user input decoding. For example, ../ could be encoded as %2e%2e%2f, then encoded further by swapping out % for %25: %252e%252e%252f.
The double-encoded payload slips past the filter, but a later decoding step turns it back into the malicious input, which the program then acts on.
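The mechanics can be sketched in a few lines of Python. The filter here is hypothetical and deliberately naive: it decodes its input once and rejects anything containing `../`. Note that `urllib.parse.quote` never encodes `.`, so the sketch uses a small helper to percent-encode every byte.

```python
from urllib.parse import unquote

def percent_encode_all(text: str) -> str:
    """Percent-encode every byte, including unreserved ones such as '.'."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))

def naive_filter_allows(raw: str) -> bool:
    """Hypothetical filter: decode once, then reject directory traversal."""
    return "../" not in unquote(raw)

single = percent_encode_all("../")     # first round of encoding
double = single.replace("%", "%25")    # second round: encode the "%" signs

blocked_single = not naive_filter_allows(single)
allowed_double = naive_filter_allows(double)

# A later component decodes the value the filter already decoded once.
after_filter = unquote(double)
final_value = unquote(after_filter)

print(single, double, final_value)
```

The single-encoded payload is caught because one round of decoding reveals `../`, while the double-encoded payload only becomes `../` after the second decode, by which point the filter has already let it through.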
Examples:
- CVE-2001-0333: a directory traversal vulnerability in IIS 5.0 and earlier where attackers can bypass filtering using double encoding on ../
- CVE-2004-1939: Zaep AntiSpam had an XSS vulnerability that did not remove / (to prevent cross-site scripting) if the attacker double-encoded a slash as %252f.
- CVE-2004-1938: a vulnerability where SQL injection was possible if single quotes were double encoded as %2527.
- CVE-2004-1315: an RCE vulnerability that allowed for code execution by means of double-encoding a “highlight” parameter.
Resources:
- https://owasp.org/www-community/Double_Encoding
- https://subscription.packtpub.com/book/networking-and-servers/9781785284588/1/ch01lvl1sec11/double-encoding
- https://portswigger.net/web-security/reference/obfuscating-attacks-using-encodings
Path Traversal and URL Encoding
One common usage for URL-encoding (or double encoding) is path traversal, where an attacker is able to retrieve files on the filesystem outside of the main web directory.
Often, there is a form of filtering in place, but these filtering methods do not always account for URL-encoding or double encoding.
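One way a check could account for encoding is to decode repeatedly until the value stops changing, normalize the path, and only then compare it against the web root. The sketch below is an assumption-laden illustration, not a hardened implementation: `WEB_ROOT`, the function names, and the five-round limit are all made up, and fully decoding a value can mangle legitimate file names that contain a literal `%`.

```python
import posixpath
from urllib.parse import unquote

WEB_ROOT = "/var/www/html"  # hypothetical web directory

def fully_decode(value: str, max_rounds: int = 5) -> str:
    """Decode until the value stops changing, giving up after max_rounds."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return decoded
        value = decoded
    raise ValueError("too many layers of encoding")

def resolve_request_path(raw_path: str) -> str:
    """Return the normalized path, or raise if it escapes WEB_ROOT."""
    decoded = fully_decode(raw_path)
    candidate = posixpath.normpath(posixpath.join(WEB_ROOT, decoded.lstrip("/")))
    if candidate != WEB_ROOT and not candidate.startswith(WEB_ROOT + "/"):
        raise ValueError("path escapes web root")
    return candidate

print(resolve_request_path("images/logo.png"))
```

Checking the normalized result, rather than searching the raw input for `../`, means the comparison happens on the same path the filesystem would eventually see.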
Examples:
- CVE-2021-41773 & CVE-2021-42013: Apache HTTP Server 2.4.49 did not properly detect encoded forms of “../” (CVE-2021-41773), and the fix in 2.4.50 was incomplete (CVE-2021-42013). Both allowed attackers to read files outside the document root.
Resources:
- https://www.ibm.com/docs/en/snips/4.6.0?topic=categories-path-traversal-attacks
- https://owasp.org/www-community/attacks/Path_Traversal
Cross-Site Scripting (XSS) and URL Encoding
Cross-site scripting vulnerabilities occur when an attacker is able to make a website execute malicious scripts. The web server should filter user input to prevent this from happening, but a poorly implemented filter may not catch characters (such as <) if they are URL-encoded.
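To make that concrete, here is a minimal sketch of such a filter in Python. The filter is hypothetical: it inspects the raw value for a literal `<` and never decodes it, while a later component does.

```python
from urllib.parse import quote, unquote

def naive_filter_allows(raw_value: str) -> bool:
    """Hypothetical filter: rejects input only if it contains a literal '<'."""
    return "<" not in raw_value

payload = "<script>alert(1)</script>"
encoded_payload = quote(payload, safe="")

passes_plain = naive_filter_allows(payload)
passes_encoded = naive_filter_allows(encoded_payload)

# What a later component would see after URL-decoding the accepted value.
rendered = unquote(encoded_payload)

print(encoded_payload)
print(passes_plain, passes_encoded)
```

The plain payload is rejected, but the encoded one contains no `<` at all until something downstream decodes it.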
Examples:
- CVE-2004-1939: Zaep AntiSpam had an XSS vulnerability that did not remove / (to prevent cross-site scripting) if the attacker double-encoded a slash as %252f.
Resources:
- https://trustfoundry.net/browser-url-encoding-decoding-and-xss/
- https://cobalt.io/blog/a-pentesters-guide-to-cross-site-scripting-xss
Null Byte Injection and URL Encoding
You might not expect %00 to matter in most web applications (as opposed to binaries, for example), since websites are usually developed in a higher-level programming language. However, web applications do eventually touch system-level code, typically written in C, where a null byte marks the end of a string.
Null byte injection is an exploitation trick to bypass sanity-checking filters by adding a null byte, URL-encoded as %00, to the end of user input. This may allow an attacker to prematurely terminate a string, and “cut off” a value they can’t control.
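The "cut off" effect can be simulated in Python. Everything here is hypothetical: the handler appends a fixed `.txt` extension to whatever the user supplies, and `as_c_string` merely imitates how C string functions stop reading at the first null byte.

```python
from urllib.parse import unquote

def build_filename(user_input: str) -> bytes:
    """Hypothetical handler: decodes the input, then appends a fixed extension."""
    return (unquote(user_input) + ".txt").encode("utf-8")

def as_c_string(data: bytes) -> bytes:
    """Simulate C string handling: everything after the first null byte is ignored."""
    return data.split(b"\x00", 1)[0]

filename = build_filename("config.php%00")
seen_by_c_code = as_c_string(filename)

print(filename)        # the high-level view still ends in .txt
print(seen_by_c_code)  # the simulated C view stops before the extension
```

This is only a simulation: Python's own `open()` raises `ValueError` on an embedded null byte, so the sketch shows the mismatch between the two views of the string rather than a working exploit.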
Examples:
- CVE-2000-0671: Roxen webservers (version <2.0.69) allowed the bypass of access restrictions, directory listing, and source code reading, all by adding a %00 to the URL.
- CVE-2004-0629: RCE in Adobe Acrobat 5.05 due to a buffer overflow triggered by %00 plus a long string.
Resources:
- https://www.whitehatsec.com/glossary/content/null-byte-injection
- https://owasp.org/www-community/attacks/Embedding_Null_Code
- http://phrack.org/issues/55/7.html#article
- http://projects.webappsec.org/w/page/13246949/Null%20Byte%20Injection
Other
As shown in the double-encoding section, RCEs and SQL injection vulnerabilities can also overlap with improper decoding. Anything that is a user-supplied value that is improperly “handled” (decoded and filtered) can lead to an issue.
CTF Examples
- VirSecCon CTF’s GET Encoded: Using URL encoding to bypass a blocklist
- Advent of CTF Challenge 2: A cookie-related challenge that did not focus on URL-encoding, but needed URL-encoding in CyberChef to properly interpret data
- FireShell CTF’s Vice: a PHP deserialization bug that requires the CTF player to URL-encode their input
How to URL encode
Python
# Python 2
import urllib
result = urllib.quote_plus("<your string here>")
print result
# Python 3
import urllib.parse
result = urllib.parse.quote_plus("<your string here>")
print(result)
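It may also help to know how `quote_plus` differs from plain `quote`, and how to reverse each. The following Python 3 sketch uses an arbitrary sample string to show both styles and their matching decoders.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "a b/c"  # arbitrary sample string

# quote(): spaces become %20 and "/" is left alone by default (path style).
path_style = quote(text)

# quote_plus(): spaces become "+" and "/" is encoded (form/query style).
form_style = quote_plus(text)

# Decode each with the matching function to get the original back.
decoded_path = unquote(path_style)
decoded_form = unquote_plus(form_style)

print(path_style, form_style)
```

Using `unquote` on form-style data would leave the `+` as a literal plus sign, so it is worth matching the decoder to the encoder.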
JavaScript
Open up the Dev Tools console in your browser, then:
console.log(encodeURIComponent("<Your string here>"))
Burp Suite
Burp Suite allows you to highlight a value (shown here in the Repeater tab), right-click, select Convert, and then choose an encoding scheme.
You can also send data to the Decoder tab, and decode URL-encoded values by selecting the right decoding method manually, or clicking “smart decode”:
CyberChef
The fastest way to do URL encoding or decoding if you need a quick “scratch pad” style website is probably CyberChef.
Within CyberChef, find the “URL Encode” module, and add your input to the upper right-hand box.
There are other websites solely dedicated to URL encoding/decoding, such as URLencoding.io.