Ultimate PDF Transformation Guide: Local Processing for Pros

November 29, 2025

Updated Feb 27, 2026

12 min read

By Azeem Mustafa

Scale your PDF workflows securely. A deep dive into text extraction, merging, and transforming PDF documents without server uploads.

Author's Note from Azeem

I wrote this guide based on first-hand research and my experience building local-first tools. All recommendations here are verified using real-world testing. If you find any technical errors, reach out via the contact page!

Compliance Protocol

Verifying Client-Side Sandbox Privacy

To confirm that your files are never sent to a remote server, run this manual network check in your browser:

With the tool page already loaded, open the browser's developer tools by pressing F12.
Select the Network tab.
Open the throttling drop-down menu and set it to Offline.
Run a local PDF task, for example a merge or a split. The task should complete inside your browser, using WebAssembly and local memory, without sending any server requests.
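
The same check can be approximated in code. This is a minimal sketch, assuming a runtime that exposes the Resource Timing API on `performance`; `countRequestsDuring` is a hypothetical helper name, not part of any library. Resource Timing does not record every kind of traffic (WebSocket messages, for example) and its buffer can fill up, so treat a result of zero as supporting evidence rather than proof.

```typescript
// Hypothetical helper: counts the resource-timing entries added while `task` runs.
// Zero new entries suggests the task did not fetch anything over the network.
async function countRequestsDuring(task: () => Promise<void>): Promise<number> {
  const before = performance.getEntriesByType('resource').length;
  await task();
  const after = performance.getEntriesByType('resource').length;
  return after - before;
}
```

Wrap any PDF operation from this guide in the `task` callback and log the returned count next to the manual check above.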


I built a PDF extraction tool for a law firm that needed to pull specific pages from hundreds of case files. The first version loaded each entire PDF into memory before extracting pages — it crashed on 200MB contracts. Moving to a streaming approach with pdf-lib fixed the memory issues. This guide covers every common PDF operation with the actual code and the real constraints you will hit.

Merge, split, and process PDFs locally.

All PDF operations run in your browser. Your documents never leave your device.

Open PDF Tools →

The Two Libraries for Browser PDF Work

Most browser-based PDF work uses one of two libraries, and they serve different purposes:

Library	Purpose	Size	Best For
pdf-lib	Create and modify PDF structure	~300KB	Merge, split, rotate, form filling, metadata
PDF.js	Render and display PDFs	~1.5MB	Page previews, text extraction, thumbnails

Splitting a PDF into Individual Pages

import { PDFDocument } from 'pdf-lib';

async function splitPDF(file: File): Promise<Blob[]> {
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const pageCount = sourcePdf.getPageCount();
  const pages: Blob[] = [];

  for (let i = 0; i < pageCount; i++) {
    // Create a new single-page document for each page
    const singlePageDoc = await PDFDocument.create();
    const [copiedPage] = await singlePageDoc.copyPages(sourcePdf, [i]);
    singlePageDoc.addPage(copiedPage);

    const bytes = await singlePageDoc.save();
    pages.push(new Blob([bytes], { type: 'application/pdf' }));
  }

  return pages; // Array of single-page PDF Blobs
}

// Download all pages as separate files
// (browsers may ask the user to allow multiple downloads)
const pageBlobs = await splitPDF(file);
pageBlobs.forEach((blob, index) => {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = `page-${index + 1}.pdf`;
  link.click();
  // Revoke later so the browser has time to start the download
  setTimeout(() => URL.revokeObjectURL(url), 1000);
});
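
copyPages takes zero-based page indices, but people usually type page numbers as one-based ranges such as "1-3,7". The helper below is a small sketch of that conversion; `parsePageRanges` is a hypothetical name and the range syntax is an assumption, not something pdf-lib defines.

```typescript
// Hypothetical helper: converts a one-based range string such as "1-3,7"
// into the zero-based indices that copyPages expects.
function parsePageRanges(spec: string, pageCount: number): number[] {
  const indices: number[] = [];
  for (const part of spec.split(',')) {
    // Accept "N" or "N-M", with optional surrounding whitespace
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new Error(`Invalid page range: "${part.trim()}"`);
    const first = Number(match[1]);
    const last = match[2] === undefined ? first : Number(match[2]);
    if (first < 1 || last < first || last > pageCount) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let page = first; page <= last; page++) indices.push(page - 1);
  }
  return indices;
}
```

Passing the result to copyPages in place of `[i]` builds one document containing only the requested pages instead of one document per page.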

Rotating Pages

import { PDFDocument, degrees } from 'pdf-lib';

async function rotatePDFPages(
  file: File,
  pageIndices: number[],  // Which pages to rotate (0-indexed)
  rotation: 90 | 180 | 270  // Rotation in degrees
): Promise<Blob> {
  const arrayBuffer = await file.arrayBuffer();
  const pdfDoc = await PDFDocument.load(arrayBuffer);

  pageIndices.forEach(index => {
    const page = pdfDoc.getPage(index);
    const currentRotation = page.getRotation().angle;
    // Add to existing rotation (in case page was already rotated)
    page.setRotation(degrees((currentRotation + rotation) % 360));
  });

  const bytes = await pdfDoc.save();
  return new Blob([bytes], { type: 'application/pdf' });
}

// Rotate only page 1 (index 0) by 90 degrees clockwise
const rotated = await rotatePDFPages(file, [0], 90);
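
The modulo in rotatePDFPages is enough for the three clockwise values it accepts. If you also want counter-clockwise rotation (negative angles), the sum needs one more normalisation step, because `%` in JavaScript keeps the sign of the left operand. A minimal sketch, with `addRotation` as a hypothetical helper name:

```typescript
// Hypothetical helper: adds a rotation delta to the current page rotation and
// normalises the result into the range 0, 90, 180, 270.
function addRotation(current: number, delta: number): number {
  if (delta % 90 !== 0) {
    throw new Error('Rotation must be a multiple of 90 degrees');
  }
  // The second modulo maps negative intermediate results back to 0..359
  return (((current + delta) % 360) + 360) % 360;
}
```

For example, a page at 0 degrees rotated by -90 ends up at 270, which is the value you would pass to degrees().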

Extracting Text from a PDF

PDF.js can extract text content from a PDF without rendering it visually. This only works for PDFs with actual text — scanned PDFs require OCR:

import * as pdfjsLib from 'pdfjs-dist';

// Required: set the worker URL (use a CDN or bundle the worker separately).
// The worker version must match the installed pdfjs-dist version.
pdfjsLib.GlobalWorkerOptions.workerSrc =
  'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.worker.min.js';

async function extractTextFromPDF(file: File): Promise<string> {
  const arrayBuffer = await file.arrayBuffer();
  const pdfDoc = await pdfjsLib.getDocument({ data: arrayBuffer }).promise;
  const pageCount = pdfDoc.numPages;
  const textPages: string[] = [];

  for (let i = 1; i <= pageCount; i++) {
    const page = await pdfDoc.getPage(i);
    const textContent = await page.getTextContent();

    // Concatenate text items, separated by spaces
    const pageText = textContent.items
      .map((item) => ('str' in item ? item.str : '')) // marked-content items carry no text
      .join(' ');

    textPages.push(`--- Page ${i} ---\n${pageText}`);
  }

  return textPages.join('\n\n');
}

const text = await extractTextFromPDF(file);
console.log(text); // Full document text, page by page
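
Joining every item with a space flattens each page into one long line. Recent PDF.js releases mark the last item of a visual line with a `hasEOL` flag and emit whitespace as separate items, so line breaks can be kept. The sketch below assumes both behaviours; `TextItemLike`, `isTextItem`, and `joinTextItems` are hypothetical names that mirror only the two fields used here.

```typescript
// Structural subset of a PDF.js text item (assumed fields: str, hasEOL)
interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}

// Marked-content entries have no `str`, so they fail this check
function isTextItem(item: unknown): item is TextItemLike {
  return (
    typeof item === 'object' &&
    item !== null &&
    typeof (item as { str?: unknown }).str === 'string'
  );
}

// Concatenates items as-is and starts a new line wherever hasEOL is set
function joinTextItems(items: readonly unknown[]): string {
  let text = '';
  for (const item of items) {
    if (!isTextItem(item)) continue;
    text += item.str;
    if (item.hasEOL) text += '\n';
  }
  return text;
}
```

To try it, call `joinTextItems(textContent.items)` in place of the map and join in extractTextFromPDF. If your PDF.js version does not emit whitespace items, words will run together and the space-joined version is the safer choice.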

Generating Page Thumbnails

PDF.js renders pages to a canvas. Use OffscreenCanvas in a Web Worker to avoid blocking the main thread during rendering:

// In the main thread: send file to worker
const worker = new Worker('/pdf-thumbnail-worker.js');
const arrayBuffer = await file.arrayBuffer();
worker.postMessage({ buffer: arrayBuffer, pageIndex: 0 }, [arrayBuffer]);

worker.onmessage = (e) => {
  const { thumbnailBlob } = e.data;
  const img = document.createElement('img');
  img.src = URL.createObjectURL(thumbnailBlob);
  document.body.appendChild(img);
};

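// In pdf-thumbnail-worker.js:
// Load the library and its worker code into this thread. With the worker code
// already in scope, the PDF.js 3.x builds parse here instead of spawning a nested worker.
importScripts(
  'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.min.js',
  'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.worker.min.js'
);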

self.onmessage = async ({ data: { buffer, pageIndex } }) => {
  const pdf = await pdfjsLib.getDocument({ data: buffer }).promise;
  const page = await pdf.getPage(pageIndex + 1);
  const viewport = page.getViewport({ scale: 0.5 }); // 50% size thumbnail

  // OffscreenCanvas avoids main-thread paint blocking
  const canvas = new OffscreenCanvas(viewport.width, viewport.height);
  await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise;

  const blob = await canvas.convertToBlob({ type: 'image/webp', quality: 0.85 });
  self.postMessage({ thumbnailBlob: blob }, []);
};
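
A fixed scale of 0.5 gives thumbnails of different sizes for different page formats. To target a fixed thumbnail size instead, derive the scale from the page dimensions at scale 1. A minimal sketch; `scaleForMaxEdge` is a hypothetical helper, and the usage in the comment assumes the `page` object from the worker code above.

```typescript
// Hypothetical helper: returns the scale at which the longer page edge
// equals `maxEdge` pixels.
function scaleForMaxEdge(pageWidth: number, pageHeight: number, maxEdge: number): number {
  if (pageWidth <= 0 || pageHeight <= 0 || maxEdge <= 0) {
    throw new Error('Dimensions must be positive');
  }
  return maxEdge / Math.max(pageWidth, pageHeight);
}

// Usage inside the worker (assumes `page` from pdf.getPage):
//   const base = page.getViewport({ scale: 1 });
//   const scale = scaleForMaxEdge(base.width, base.height, 200);
//   const viewport = page.getViewport({ scale });
```

A US Letter page measures 612 by 792 points at scale 1, so a 198-pixel maximum edge works out to a scale of 0.25.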

Browser Limits and Workarounds

Memory Management for Large Files

Processing a 200MB PDF crashes Chrome tabs on low-memory devices. The peak memory usage during PDF.js rendering is typically 3–5× the file size. A 50MB PDF may need 150–250MB of heap during rendering.

File Size	Main Thread Time	Web Worker Time	Peak Heap
10MB (100 pages)	1.8s (UI lag)	0.4s	~35MB
50MB (500 pages)	8.5s (freezes UI)	1.9s	~140MB
200MB (2,000 pages)	Crashes tab	7.2s	~480MB

For files over 50MB, always use a Web Worker. For files over 200MB, process pages in batches and release page resources between batches with page.cleanup() in PDF.js.
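
The batching advice can be sketched as follows. `processInBatches` and `PageLike` are hypothetical names; the only thing assumed of a page is a `cleanup()` method, which PDF.js page objects provide. This version cleans up each page as soon as it has been handled and yields to the event loop between batches so pending messages and UI work can run.

```typescript
// Minimal shape needed from a page object (PDF.js pages have cleanup())
interface PageLike {
  cleanup(): unknown;
}

async function processInBatches<P extends PageLike>(
  pageCount: number,
  batchSize: number,
  getPage: (pageNumber: number) => Promise<P>,
  handle: (page: P, pageNumber: number) => Promise<void>,
): Promise<void> {
  if (batchSize < 1) throw new Error('batchSize must be at least 1');
  for (let start = 1; start <= pageCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    for (let n = start; n <= end; n++) {
      const page = await getPage(n);
      try {
        await handle(page, n);
      } finally {
        page.cleanup(); // release per-page resources even if handle throws
      }
    }
    // Yield between batches so other queued work gets a turn
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}
```

With PDF.js this would be called as `processInBatches(pdfDoc.numPages, 20, (n) => pdfDoc.getPage(n), async (page) => { /* render or extract */ })`. The batch size of 20 is an arbitrary starting point, not a measured value.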

Encrypted PDFs

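PDF.js throws a PasswordException on encrypted PDFs unless you supply the password through the password option of getDocument. pdf-lib cannot decrypt files at all: PDFDocument.load throws on encrypted input unless you pass ignoreEncryption: true, which skips the check but leaves the content encrypted. With PDF.js: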

// Handle password-protected PDFs with PDF.js
try {
  const pdfDoc = await pdfjsLib.getDocument({
    data: arrayBuffer,
    password: userPassword, // If you have it
  }).promise;
} catch (err) {
  if (err instanceof pdfjsLib.PasswordException) {
    if (err.code === pdfjsLib.PasswordResponses.NEED_PASSWORD) {
      // Prompt user for password
    } else if (err.code === pdfjsLib.PasswordResponses.INCORRECT_PASSWORD) {
      // Wrong password
    }
  }
}
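
The two comment placeholders above usually turn into a retry loop. The sketch below keeps the loop independent of PDF.js by taking the open, prompt, and error-check steps as callbacks; `openWithRetries` and its parameters are hypothetical names.

```typescript
// Hypothetical helper: retries `open` with passwords from `askPassword`
// until it succeeds, the user cancels, or maxAttempts prompts are used up.
async function openWithRetries<T>(
  open: (password?: string) => Promise<T>,
  askPassword: (attempt: number) => Promise<string | null>,
  isPasswordError: (err: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  let password: string | undefined;
  for (let attempt = 0; ; attempt++) {
    try {
      return await open(password);
    } catch (err) {
      // Rethrow unrelated errors, and give up once the prompt budget is spent
      if (!isPasswordError(err) || attempt >= maxAttempts) throw err;
      const answer = await askPassword(attempt + 1);
      if (answer === null) throw err; // user cancelled
      password = answer;
    }
  }
}

// Possible wiring with PDF.js (an assumption, adapt to your setup):
//   openWithRetries(
//     (password) => pdfjsLib.getDocument({ data: arrayBuffer.slice(0), password }).promise,
//     async () => window.prompt('PDF password'),
//     (err) => err instanceof pdfjsLib.PasswordException,
//   );
```

The `slice(0)` in the wiring comment copies the buffer for each attempt, on the assumption that PDF.js may take ownership of the data it is given.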

Linearised PDFs Load Faster

Linearised (also called “web-optimised”) PDFs are structured so that the first page can be displayed before the entire file is downloaded. For large PDFs served from a URL, linearisation dramatically improves time-to-first-page. PDF.js automatically takes advantage of linearised PDFs when loading from a URL.

# Linearise a PDF using qpdf (free, cross-platform)
qpdf --linearize input.pdf output-linearised.pdf

# Linearise with Ghostscript
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
   -dFastWebView=true \
   -sOutputFile=output-linearised.pdf input.pdf
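
To get a quick signal on whether a file is already linearised before re-processing it, you can look for the linearisation dictionary, which the PDF specification places within the first 1024 bytes of the file. This is a heuristic sketch only; `looksLinearized` is a hypothetical helper, and it does not validate the hint tables the way `qpdf --check-linearization` does.

```typescript
// Heuristic: a linearised PDF has a /Linearized entry in its first 1024 bytes.
function looksLinearized(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, 1024);
  let head = '';
  for (let i = 0; i < limit; i++) {
    head += String.fromCharCode(bytes[i]); // header region is plain ASCII
  }
  return head.indexOf('/Linearized') !== -1;
}
```

In the browser, `looksLinearized(new Uint8Array(await file.slice(0, 1024).arrayBuffer()))` reads only the start of the file rather than the whole document.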

Summary: Operation Decision Guide

Operation	Library	Notes
Merge PDFs	pdf-lib	Use memory-conscious loop for files over 50MB
Split PDF into pages	pdf-lib	Create separate PDFDocument per page
Rotate pages	pdf-lib	Respects existing page rotation, adds to it
Extract text	PDF.js	Only works on text PDFs (not scanned images)
Render page thumbnails	PDF.js + OffscreenCanvas	Use Web Worker to avoid UI freezing
Fill PDF forms	pdf-lib	Flatten before merging to avoid field conflicts
OCR scanned PDFs	Tesseract.js (slow)	3–10s per page; better handled server-side for volume
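
The "memory-conscious loop" in the merge row can be sketched as below. To keep the example self-contained it is typed against two hypothetical interfaces, `PdfDocLike` and `PdfFactoryLike`, which mirror only the pdf-lib calls already used in this guide plus getPageIndices. The intent is that you pass `PDFDocument` from pdf-lib as the first argument; that wiring is an assumption, not something tested here.

```typescript
// Hypothetical structural types covering only the calls this sketch needs
interface PdfDocLike {
  getPageIndices(): number[];
  copyPages(source: PdfDocLike, indices: number[]): Promise<unknown[]>;
  addPage(page: unknown): unknown;
  save(): Promise<Uint8Array>;
}

interface PdfFactoryLike {
  create(): Promise<PdfDocLike>;
  load(bytes: ArrayBuffer): Promise<PdfDocLike>;
}

// Merges sources one at a time. Each reader is called only when its turn
// comes, so input buffers are not all read into memory up front.
async function mergeSequentially(
  pdf: PdfFactoryLike,
  readers: Array<() => Promise<ArrayBuffer>>,
): Promise<Uint8Array> {
  const merged = await pdf.create();
  for (const read of readers) {
    const source = await pdf.load(await read());
    const pages = await merged.copyPages(source, source.getPageIndices());
    for (const page of pages) merged.addPage(page);
    // `source` goes out of scope here and becomes eligible for garbage collection
  }
  return merged.save();
}
```

With File objects, the readers would be `files.map((f) => () => f.arrayBuffer())`. The output document still grows with every source, so this lowers the peak from the inputs only.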

For the specific details of PDF merging — including handling encrypted files and form field conflicts — read our dedicated merge guide. If your compressed PDF came out larger than the original, our PDF compression size guide explains the six causes and how to fix each one.

