PDF compression in 2026: what actually works
Why some PDFs shrink 90% and others barely move. A practical guide to compression levels, what gets stripped, and when to use which.
You drop a 25 MB PDF into a compressor, expect a small file out, and get back… 24 MB. Why?
Because PDF compression isn’t one thing — it’s a half-dozen techniques applied to a half-dozen kinds of data inside the file. Some PDFs are mostly text (already small). Some are mostly images (lots to gain). Some have invisible cruft (easy wins). This article explains what actually moves the needle, so you can predict the result before you hit Compress.
What’s actually inside a PDF
A PDF file is a tree of objects. The kinds that matter for file size are:
- Text streams — already small; a 100-page novel is maybe 500 KB. Compressing text gains essentially nothing.
- Vector graphics — diagrams, logos. Tiny.
- Embedded images — photos, scans, signatures. Almost always the biggest things in the file.
- Embedded fonts — usually 100 KB–1 MB each; a font-heavy PDF can have several.
- Metadata — XMP, tags for accessibility, form definitions. Usually small.
If your PDF is 50 MB, the bulk of it is almost certainly images, often 95% or more. What you do to those images is the only thing that meaningfully changes the file size.
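To see why the image share dominates, here is a back-of-the-envelope sketch in Python. The function name and the example numbers (the image share, and images shrinking to 20% of their size) are illustrative assumptions, not measurements from the compressor.

```python
def estimated_size_mb(total_mb: float, image_share: float, image_ratio: float) -> float:
    """Estimate output size when only the image bytes shrink.

    image_share: fraction of the file that is image data (0..1), an assumption.
    image_ratio: compressed image size / original image size (0..1), an assumption.
    """
    if not 0.0 <= image_share <= 1.0 or not 0.0 <= image_ratio <= 1.0:
        raise ValueError("image_share and image_ratio must be between 0 and 1")
    image_mb = total_mb * image_share
    other_mb = total_mb - image_mb  # text, fonts, vectors: left as they are
    return other_mb + image_mb * image_ratio


# Image-heavy file: 50 MB, 95% images, images shrink to 20% of their size.
print(round(estimated_size_mb(50, 0.95, 0.20), 1))  # 12.0
# Text-heavy file: 50 MB, 5% images, same image ratio.
print(round(estimated_size_mb(50, 0.05, 0.20), 1))  # 48.0
```

The same image compression that cuts the first file by roughly three quarters barely touches the second, because there is so little image data to work on.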
What our compressor does
Compress PDF uses a fast compiled module that runs in your browser, walks the PDF, finds every image XObject, and recompresses each one. There are three levels:
- Extreme: re-encode every image as JPEG quality 50, target 96 PPI.
- Recommended: JPEG quality 70, target 150 PPI.
- Less compression: JPEG quality 85, target 220 PPI.
PPI (pixels per inch) matters because most “high-res” images in PDFs were captured at print resolution (300 PPI) when 96 PPI is enough for screens. Halving the linear resolution cuts the pixel count to a quarter, and that saving comes before JPEG re-encoding costs any quality.
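The arithmetic behind the levels can be sketched as follows. The quality and PPI values come from the three levels listed above. The `PRESETS` dict, the function names, and the rule that images already at or below the target PPI keep their size are illustrative assumptions, not the compressor's actual code. The effective-PPI calculation relies on the PDF default of 72 points per inch.

```python
PRESETS = {
    "extreme": {"jpeg_quality": 50, "target_ppi": 96},
    "recommended": {"jpeg_quality": 70, "target_ppi": 150},
    "less": {"jpeg_quality": 85, "target_ppi": 220},
}

POINTS_PER_INCH = 72  # default PDF user-space unit


def effective_ppi(pixel_width: int, placed_width_pt: float) -> float:
    """Resolution of an image as it is placed on the page."""
    return pixel_width / (placed_width_pt / POINTS_PER_INCH)


def downsampled_size(width_px: int, height_px: int, source_ppi: float, level: str):
    """Return (new_width, new_height, fraction_of_pixels_kept)."""
    target = PRESETS[level]["target_ppi"]
    scale = min(1.0, target / source_ppi)  # assumption: never upsample
    new_w = max(1, round(width_px * scale))
    new_h = max(1, round(height_px * scale))
    return new_w, new_h, (new_w * new_h) / (width_px * height_px)


# A letter-size scan: 2550 x 3300 pixels placed 612 pt (8.5 in) wide.
print(effective_ppi(2550, 612))                            # 300.0
print(downsampled_size(2550, 3300, 300, "recommended"))    # (1275, 1650, 0.25)
w, h, kept = downsampled_size(2550, 3300, 300, "extreme")
print(w, h, round(kept, 4))                                # 816 1056 0.1024
```

Going from 300 PPI to 96 PPI keeps only about a tenth of the pixels, which is why scans shrink so much at Extreme even before the JPEG quality setting comes into play.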
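Text, fonts, and vector graphics all pass through untouched. Document structure is also left alone, apart from the metadata and tagging that the Extreme level strips (covered in the next section). Nothing rendered on screen disappears.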
What we strip (and what we don’t)
At the Extreme level, we also drop XMP metadata by default: the invisible “Author / Title / Subject” entries Adobe writes when you export. On real-world Adobe PDFs, that alone is often 1–5 KB per page (yes, kilobytes; PDF XMP is verbose).
We don’t drop:
- Form fields — fillable forms keep working.
- Hyperlinks and bookmarks — still clickable.
- Tagged structure (a11y tree) — screen-reader compatibility preserved at Recommended and Less compression. Stripped only at Extreme.
- Form data the user has already entered.
The Extreme level removes the structure tree because for many users, the file just needs to be small enough to email — accessibility metadata they don’t use can go.
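The stripping rules described in this section can be summarized as a small lookup. This is a sketch of the policy as stated here, with hypothetical names (`STRIPPED_AT`, `is_stripped`, `stripped_items`); it is not the compressor's actual code. The text only mentions XMP removal at Extreme, so the sketch assumes the other levels keep it.

```python
LEVELS = ("extreme", "recommended", "less")

# Item -> set of levels at which it is removed, per the text above.
STRIPPED_AT = {
    "xmp_metadata": {"extreme"},    # assumption: kept at the other levels
    "structure_tree": {"extreme"},  # the a11y tree
    "form_fields": set(),
    "form_data": set(),
    "hyperlinks": set(),
    "bookmarks": set(),
}


def is_stripped(item: str, level: str) -> bool:
    """True if the given item is removed at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return level in STRIPPED_AT[item]


def stripped_items(level: str) -> list[str]:
    """All items removed at a level, in alphabetical order."""
    return sorted(item for item in STRIPPED_AT if is_stripped(item, level))


print(stripped_items("extreme"))      # ['structure_tree', 'xmp_metadata']
print(stripped_items("recommended"))  # []
```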
Realistic expectations
Approximate reduction in file size, by source type and level:
| Source PDF type | Extreme | Recommended | Less compression |
|---|---|---|---|
| Scanned book pages (300 PPI grayscale) | 80–90% | 60% | 30% |
| Photo-heavy report | 70% | 50% | 25% |
| Mixed-content business doc | 40–60% | 30% | 15% |
| Pure text (novel, RFC) | 0–5% | 0% | 0% |
| Already-optimized PDF | 5–15% | 0% | 0% |
If you compress at Recommended and barely save anything, your PDF was probably already optimized at export. No compressor can squeeze much more out of a JPEG that is already at quality 70.
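If you want a rough prediction before you hit Compress, the table can be turned into a lookup. The reduction figures below are copied from the table; the dictionary keys and the function name are made up for this sketch, and the result is only as reliable as the table's ballpark numbers.

```python
# Size reductions from the table above, as (low, high) fractions.
EXPECTED_REDUCTION = {
    "scanned": {"extreme": (0.80, 0.90), "recommended": (0.60, 0.60), "less": (0.30, 0.30)},
    "photo_heavy": {"extreme": (0.70, 0.70), "recommended": (0.50, 0.50), "less": (0.25, 0.25)},
    "mixed": {"extreme": (0.40, 0.60), "recommended": (0.30, 0.30), "less": (0.15, 0.15)},
    "pure_text": {"extreme": (0.00, 0.05), "recommended": (0.0, 0.0), "less": (0.0, 0.0)},
    "already_optimized": {"extreme": (0.05, 0.15), "recommended": (0.0, 0.0), "less": (0.0, 0.0)},
}


def expected_output_mb(size_mb: float, pdf_type: str, level: str) -> tuple[float, float]:
    """Return the (smallest, largest) expected output size in MB."""
    low, high = EXPECTED_REDUCTION[pdf_type][level]
    # The largest reduction gives the smallest output, so the bounds swap.
    return size_mb * (1 - high), size_mb * (1 - low)


# A 25 MB scanned book at Extreme.
print(tuple(round(x, 2) for x in expected_output_mb(25, "scanned", "extreme")))    # (2.5, 5.0)
# A 25 MB text-only PDF at Extreme.
print(tuple(round(x, 2) for x in expected_output_mb(25, "pure_text", "extreme")))  # (23.75, 25.0)
```

The second result is the 25 MB in, 24 MB out case from the introduction: a file that is mostly text has almost nothing for the compressor to remove.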
When not to compress
- Printing: if you’ll print the PDF, leave it at high resolution. Compression sized for screen looks soft on paper.
- Archival: PDF/A documents shouldn’t be re-encoded. Use PDF/A-conforming tools, not a generic compressor.
- OCR downstream: if someone will run OCR on the result, keep the resolution high. More pixels mean better text recognition.
Privacy
Everything runs locally. Your PDF is parsed inside your browser tab and never uploaded. The same applies to all our free tools: file privacy is the whole point of doing this in your browser instead of on a server.
Try the PDF compressor: drop your file in and see what it does to yours.