CTF Support
Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Toggle Dark/Light/Auto mode Back to homepage

File Analysis

Introduction

File identification is one of the first steps in a forensics challenge. Often, files may have misleading extensions, be corrupted, or contain hidden data. Knowing how to determine a file’s true type and contents is crucial for further analysis.

Goals:

  • Identify file types reliably
  • Detect mismatched extensions and polyglots
  • Extract useful metadata

Quick Reference

  • Identify file type: file mystery_file
  • View magic bytes: hexdump -C file | head
  • Check for embedded data: binwalk file
  • View metadata: exiftool file
  • Search for strings: strings file | grep "CTF{"

Tools

Tool Purpose
file Detects file type based on magic bytes
hexdump / xxd Inspect raw bytes of a file
binwalk Analyze embedded files and compressed data
exiftool Extract metadata from media and documents
strings Extract readable text

Step-by-Step Guide

Using file

The simplest method is the file command. It looks at the “magic bytes” of a file (the first few bytes that identify a file type).

file mystery_file

Example:

$ file file.zip
file.zip: Zip archive data, at least v2.0 to extract, compression method=deflate

Even if the extension is .jpg, file will detect the true type.

In case of polyglots, using file with the flag -k/--keep-going will identify multiple filetypes.

Inspecting the Header with hexdump / xxd

If file fails, manually check the first few bytes.

Using hexdump:

hexdump -C mystery_file | head -n 10

Using xxd:

xxd -l 32 mystery_file

Common Magic Bytes:

File Type Hex Signature
PNG 89 50 4E 47 0D 0A 1A 0A
ZIP 50 4B 03 04
PDF 25 50 44 46
GIF 47 49 46 38 37 61 / 38 39 61

A comprehensive list can be found on the Wikipedia page List of file signatures.

Using binwalk for Embedded Files

Some files may contain hidden data or multiple embedded files (polyglots).

Using binwalk:

binwalk mystery_file

Example:

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             PNG image, 1920 x 1080
24576         0x6000          Zip archive data, at least v2.0 to extract

This can detect compressed archives, images, or encrypted blobs inside another file.

NOTE: These tools sometimes return false positives, identifying false headers.

Extracting Metadata with exiftool

For documents or media, metadata often reveals hidden information.

exiftool mystery_image.jpg

Check for:

  • Author or device info
  • Comments
  • Encoded/hex strings
  • “FLAG” patterns

Searching for Strings

Human-readable text might contain hints or embedded flags.

strings mystery_file | less

For Unicode strings (common in Windows):

strings -el mystery_file | less

Look for:

  • URLs
  • Encoded/hex strings
  • “FLAG” patterns

Tips

  • Never rely solely on file extensions, they can be faked.
  • Hidden content can exist anywhere, not just in headers (use binwalk or hexdump / xxd).
  • Compressed or encrypted embedded files may require extra tools for extraction.
  • Some files may be corrupted deliberately to confuse solvers, understanding headers is key.