Can you find the challenge hidden within this polyglot and then do the reverse engineering to solve it? The file even includes a tool to help you solve it.

6 months ago

/*
This is part of a series of CTFs for an awesome security company.  
As these are still part of their hiring process I won't be disclosing their name.
*/
disclaimer

If you'd like to try yourself before reading the CTF write-up you can download the file here: zipfile.pdf.

sha256: 5a98dd851dcac3aa2a2b72c8e34ad502bf5c3ce1210b3d1107984659d66a20b0

Zipfile.pdf is a manual for the NES, or is it a polyglot?  It’s called ‘zipfile’, which sounds like a huge clue.  Let’s see if it is a zip file as well.

Sure enough, zipfile.pdf is also a zip file that contains two more files.
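One way to confirm this without any extra tooling is Python's standard zipfile module. It locates the archive through the end-of-central-directory record near the tail of the file, so the PDF data in front of it does not get in the way. This is a minimal sketch, and the helper name list_embedded_zip is my own:

```python
import zipfile


def list_embedded_zip(path):
    """Return the member names if `path` is (also) a zip archive, else None."""
    # is_zipfile() looks for the zip trailer, so data prepended to the
    # archive (a PDF body, for example) does not prevent detection.
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Calling list_embedded_zip("zipfile.pdf") should return the names of the archive members if the file really is a zip as well.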

Mesen.exe -> NES Emulator
Doc.pdf -> PDF file

I found a small recent article about this in PagedOut:

A guide to ICO/PDF polyglot files: https://pagedout.institute/download/PagedOut_001_beta1.pdf

The article explains a way to make a single file both a valid icon and a valid PDF. Something similar may be going on here. We’ve seen that the PDF is also a zip file, but what header does it have?

Wait, it’s an NES ROM as well?  Damn, ok.  Let’s load up the ROM in our emulator.
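For anyone following along without a hex editor, the same check can be scripted. The sketch below looks for three signatures: the iNES magic NES\x1a at offset 0, a %PDF- marker within the first 1024 bytes, and the zip end-of-central-directory signature PK\x05\x06 near the end of the file. The function name identify_formats and the returned labels are my own, and a signature match is only a hint, not proof that a parser will accept the file.

```python
# A zip end-of-central-directory record is 22 bytes plus an optional
# comment of up to 65535 bytes, so it must start within this tail window.
EOCD_WINDOW = 22 + 65535


def identify_formats(data):
    """Return a list of format labels whose signatures appear in `data`."""
    formats = []
    if data[:4] == b"NES\x1a":           # iNES header magic at offset 0
        formats.append("nes")
    if b"%PDF-" in data[:1024]:          # PDF marker near the start
        formats.append("pdf")
    tail_start = max(0, len(data) - EOCD_WINDOW)
    if data.rfind(b"PK\x05\x06", tail_start) != -1:
        formats.append("zip")
    return formats
```

Reading the file with open("zipfile.pdf", "rb").read() and passing the bytes to identify_formats gives a quick list of the formats worth investigating.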

Yep, it’s a valid ROM image. I pulled out just the NES image from the PDF, so it’s only about 25 KB.
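I won't claim this is exactly how the image was carved out, but one way to do it is to read the sizes from the iNES header. Byte 4 gives the PRG ROM size in 16 KB units, byte 5 gives the CHR ROM size in 8 KB units, and bit 2 of byte 6 signals an optional 512-byte trainer. The sketch assumes the ROM starts at offset 0 of the polyglot and uses the original iNES header layout (not NES 2.0). The names ines_size and carve_rom are hypothetical.

```python
INES_MAGIC = b"NES\x1a"
HEADER_SIZE = 16
TRAINER_SIZE = 512
PRG_BANK = 16 * 1024   # PRG ROM is counted in 16 KB units
CHR_BANK = 8 * 1024    # CHR ROM is counted in 8 KB units


def ines_size(header):
    """Compute the total image size described by a 16-byte iNES header."""
    if len(header) < HEADER_SIZE or header[:4] != INES_MAGIC:
        raise ValueError("not an iNES image")
    prg_banks = header[4]
    chr_banks = header[5]
    has_trainer = bool(header[6] & 0x04)
    trainer = TRAINER_SIZE if has_trainer else 0
    return HEADER_SIZE + trainer + prg_banks * PRG_BANK + chr_banks * CHR_BANK


def carve_rom(data):
    """Return only the leading bytes of `data` that belong to the ROM image."""
    return data[:ines_size(data[:HEADER_SIZE])]
```

As an illustration, a header declaring one PRG bank and one CHR bank yields an image of 24,592 bytes.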

Ok, after playing around in the debugger a bit, it’s clear I need a better view of this statically.  Next step is to get this thing loaded in Ghidra successfully.

Not everything is labeled very well, but I can at least see the disassembly.  

When I start pressing buttons on my keyboard, characters appear at the little $> prompt in the game.

Here’s the controller mapping:

I stared at this for a good moment trying to figure out where the data getting loaded into the accumulator was going... then I saw it’s a dereference, ugh.  There’s some secret data our values are getting XORed with here:

The code walks through the secret data and XORs each byte with a byte of our input, writing the result into a buffer starting at 0x400.

It fills the buffer this way, XORing the encrypted bytes at 0xC6E4 with our input and looping back to the start of the input whenever it runs out.

Once that’s done, it scans for the 8-byte pattern located at 0xCB48: [0x26, 0x2C, 0x21, 0x27, 0x45, 0x37, 0x28, 0x10]

It searches 0x400-0x7FF for this pattern. If it finds it, we get the flag.
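To make the check concrete, here is a small Python model of what the ROM appears to do. It is a sketch of my reading of the disassembly, not a translation of the 6502 code. The function names are my own, and the encrypted argument stands for the bytes you would dump from 0xC6E4 yourself. It also assumes the buffer is exactly 0x400 bytes (0x400 through 0x7FF).

```python
from itertools import cycle

# Pattern stored at 0xCB48 in the ROM (values taken from the write-up).
PATTERN = bytes([0x26, 0x2C, 0x21, 0x27, 0x45, 0x37, 0x28, 0x10])
BUF_SIZE = 0x400  # assumed size of the RAM buffer at 0x400..0x7FF


def fill_buffer(encrypted, user_input):
    """XOR the encrypted bytes with the input, repeating the input as needed."""
    if not user_input:
        raise ValueError("input must not be empty")
    return bytes(c ^ k for c, k in zip(encrypted[:BUF_SIZE], cycle(user_input)))


def input_unlocks_flag(encrypted, user_input):
    """Model of the final check: does the decoded buffer contain PATTERN?"""
    return PATTERN in fill_buffer(encrypted, user_input)
```

With a dump of the encrypted region in hand, input_unlocks_flag(dump, candidate) lets you test a guess offline instead of typing it into the emulator.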

I wrote a Python script to search for an input that produces the pattern.
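The original script isn't reproduced here, so what follows is only a sketch of one way such a search could work, not the script I actually ran. Because the pattern is known, XORing it against the encrypted bytes at any offset yields the key bytes that would have to sit at those positions. For a guessed input length, a candidate survives only if those bytes agree with each other wherever they land on the same key slot and, optionally, fall within a set of allowed values. The input length, the allowed set (which depends on how the controller buttons are encoded), and the name candidate_keys are all assumptions. The encrypted argument is again a dump of the bytes at 0xC6E4.

```python
PATTERN = bytes([0x26, 0x2C, 0x21, 0x27, 0x45, 0x37, 0x28, 0x10])


def candidate_keys(encrypted, key_len, allowed=None, buf_size=0x400):
    """Yield (offset, key) pairs that would place PATTERN at `offset`.

    `key` is a list of length `key_len`; slots the pattern does not
    constrain (possible when key_len > len(PATTERN)) are left as None.
    """
    data = encrypted[:buf_size]
    for off in range(len(data) - len(PATTERN) + 1):
        key = [None] * key_len
        consistent = True
        for i, p in enumerate(PATTERN):
            k = data[off + i] ^ p            # key byte needed at this position
            slot = (off + i) % key_len       # the input repeats every key_len bytes
            if allowed is not None and k not in allowed:
                consistent = False
                break
            if key[slot] is not None and key[slot] != k:
                consistent = False           # two positions disagree on one slot
                break
            key[slot] = k
        if consistent:
            yield off, key
```

Looping key_len over a small range and printing the survivors narrows things down quickly. Each surviving key can then be tried in the emulator.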

All in all, a fun challenge.  I didn’t realize how many file types can be combined into a polyglot, but the possibilities are seemingly endless, largely because PDF readers tolerate the header appearing anywhere within the first 1024 bytes of the file.  On to the last challenge, and probably the scariest.

Colin Senner
