Greffon
Tutorial

Self-Host Paperless-ngx with Greffon

Every invoice, contract, and tax form you own, scanned and searchable in one place. That archive is worth keeping on hardware you control. Here is the honest setup for Paperless-ngx on a greffer.

Greffon Labs · 5 min read

A document archive is the kind of thing you build once and lean on for years: invoices, contracts, warranties, tax forms, the paper trail of a whole life. That is exactly the data you do not want scattered across a provider you rent. Paperless-ngx turns a pile of scans into an indexed, searchable archive, and Greffon takes the assembly off your plate so it runs on a machine you own.

Why own the archive

Paperless-ngx ingests documents, runs OCR so the text inside scans becomes searchable, and tags and sorts them so you can find a receipt from three years ago in seconds. It is the open project many people reach for when they want to go paperless without handing their financial and legal records to a cloud account.

Self-hosting changes where those records sit. The OCR text, the original PDFs, the tags you build up over time all live on your greffer, not on someone else's storage. For documents this sensitive, that is the whole point.

Graft it from the catalog

On a greffer you do not hand-write a compose file or wire a reverse proxy. Pick Paperless-ngx from the catalog and graft it onto your greffer. Greffon issues the certificate and routes the app, so it comes up reachable over HTTPS from the first start. The Django secret key is generated for you at instance creation, so the one thing you set by hand is the admin password.

Give the admin password at least 12 characters
On first start you set the password for the admin superuser you will log in with. The catalog enforces a 12-character minimum. Pick a strong one: it is the front door to every document you archive. You can add more users from inside Paperless-ngx later.
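If you would rather generate that password than invent one, here is a minimal sketch using Python's standard `secrets` module. The default length of 20 and the punctuation set are assumptions chosen for illustration; only the 12-character floor comes from the catalog.

```python
import secrets
import string

MIN_LENGTH = 12  # the minimum the catalog enforces


def generate_admin_password(length: int = 20) -> str:
    """Return a random password drawn from letters, digits, and punctuation."""
    if length < MIN_LENGTH:
        raise ValueError(f"password must be at least {MIN_LENGTH} characters")
    # Assumed character set; trim the punctuation if a form rejects any of it.
    alphabet = string.ascii_letters + string.digits + "-_.!@#%+="
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_admin_password())
```

Store the result in a password manager before you paste it into the setup form.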

Reach it from anywhere

An archive is most useful when you can pull up a document from your phone at a counter or from a laptop on the road. On the same network as your greffer that works the moment it starts. To reach it from elsewhere you have two honest options.

The simplest is tunnel mode: a greffer connects outbound to the manager's tunnel and serves its apps without opening a single inbound port, which is the answer for a box behind NAT or CGNAT with no public IP. Paperless-ngx is a plain HTTP web app, so it rides the tunnel cleanly. If you would rather expose the greffer directly, port forwarding plus dynamic DNS still works. Either way you reach the archive over HTTPS.

Storage and memory

This is the tradeoff most walkthroughs skip. Paperless-ngx keeps both your original files and the OCR output, so storage grows with every document you feed it. A few thousand scanned pages is modest, but years of high-resolution scans add up. Put it on a greffer with room to grow, and watch the disk.
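To put a rough number on "room to grow", a back-of-the-envelope estimate helps. Every figure in this sketch is an assumption to replace with your own: pages per year, average size per scanned page, and an overhead multiplier of 2 standing in for keeping both the original and the OCR output.

```python
def estimate_archive_gb(
    pages_per_year: int,
    mb_per_page: float,
    years: int,
    overhead: float = 2.0,
) -> float:
    """Estimate archive size in GB.

    overhead is an assumed multiplier for storing the original file
    plus the OCR output; measure your own archive to refine it.
    """
    total_mb = pages_per_year * mb_per_page * years * overhead
    return total_mb / 1024


if __name__ == "__main__":
    # Hypothetical household: 2,000 pages a year at 0.5 MB per page for 10 years.
    print(f"{estimate_archive_gb(2000, 0.5, 10):.1f} GB")  # about 19.5 GB
```

Higher scan resolution raises the size per page quickly, so rerun the estimate with numbers taken from a few of your real scans.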

OCR is also the heaviest thing it does. Crunching the text out of a scan is CPU and memory work, and it leans on a small stack of helpers (a database, a task queue, a cache) that run alongside the app. It is comfortable on a modest box, but it is not as featherweight as a single-process app. A 1 GB greffer will feel tight under a bulk import.

OCR runs in the background
When you drop in a batch of documents, OCR processes them on a queue rather than instantly. A large import can take a while to fully index. That is normal: the document lands first and becomes searchable once its OCR pass finishes.
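One way to confirm a document has finished its OCR pass is to search for a phrase printed inside it. The sketch below queries the Paperless-ngx REST API using only the standard library. The hostname and token are hypothetical, and the `/api/documents/?query=` path and `Token` authorization header reflect the Paperless-ngx API as we understand it, so check them against the API docs for your version.

```python
import json
import urllib.parse
import urllib.request


def search_url(base_url: str, phrase: str) -> str:
    """Build the full-text search URL for a phrase."""
    query = urllib.parse.urlencode({"query": phrase})
    return f"{base_url.rstrip('/')}/api/documents/?{query}"


def search_documents(base_url: str, token: str, phrase: str) -> list[str]:
    """Return the titles of documents whose text matches the phrase."""
    request = urllib.request.Request(
        search_url(base_url, phrase),
        headers={
            "Authorization": f"Token {token}",  # assumed token auth scheme
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Assumes a paginated response with a "results" list of documents.
    return [doc.get("title", "") for doc in payload.get("results", [])]


# Hypothetical usage:
# titles = search_documents("https://paperless.example.com", "YOUR_API_TOKEN", "water bill")
```

If a freshly imported document does not show up yet, its OCR pass may still be waiting on the queue.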

Back it up first

For an archive, backups are the section that matters most. The whole value of going paperless is that the digital copy becomes the copy of record, often after you have shredded the paper. That makes the greffer the single place those documents live. Greffon handles TLS and routing today, and native one-click backups are coming in M2. Until then, bring your own backup tool (restic or borgbackup are the usual choices), back up the Paperless-ngx data and database on a schedule, and keep a copy off the greffer.
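As a sketch of what "on a schedule" can look like with restic, the wrapper below builds and runs a `restic backup` command from Python. The repository location and the data and database dump paths are hypothetical placeholders, since where a greffer keeps Paperless-ngx data is not covered here. It also assumes the repository was already created with `restic init` and that the repository password is supplied through restic's `RESTIC_PASSWORD` or `RESTIC_PASSWORD_FILE` environment variable.

```python
import subprocess


def restic_backup_command(repository: str, paths: list[str]) -> list[str]:
    """Build the argument list for a restic backup of the given paths."""
    return ["restic", "--repo", repository, "backup", *paths]


def run_backup(repository: str, paths: list[str]) -> None:
    """Run the backup and raise CalledProcessError if restic reports failure."""
    subprocess.run(restic_backup_command(repository, paths), check=True)


# Hypothetical usage; every path below is a placeholder:
# run_backup(
#     "/mnt/offsite/paperless-repo",
#     ["/path/to/paperless/data", "/path/to/paperless/db-dump.sql"],
# )
```

Call it from cron or a systemd timer, and point the repository at storage that is not the greffer itself so a dead disk does not take the backup with it.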

A backup you have not restored is not a backup
Before you trust the archive enough to recycle the originals, test a restore once. Confirm the documents and their OCR text come back intact. Five minutes now is cheaper than discovering a broken backup the day you need a tax record.
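A restore test is easier to trust when it is mechanical. This sketch compares a source directory against a restored copy by SHA-256 checksum and lists anything missing, unexpected, or changed. The directory paths are yours to supply, and it only checks files on disk: confirming the OCR text still means opening the restored instance and searching for a phrase.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large scans do not load into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(original: Path, restored: Path) -> list[str]:
    """Return a sorted list of differences; an empty list means a clean restore."""
    before = file_digests(original)
    after = file_digests(restored)
    problems = []
    for name in sorted(before.keys() | after.keys()):
        if name not in after:
            problems.append(f"missing after restore: {name}")
        elif name not in before:
            problems.append(f"unexpected after restore: {name}")
        elif before[name] != after[name]:
            problems.append(f"content differs: {name}")
    return problems
```

Restore into a scratch directory, run `compare_trees` against the live data, and only recycle the paper once the list comes back empty.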

Keep it always-on

You will reach for a document at odd hours from whatever device is in hand, so the archive should be up when you are. Run it on an always-on greffer (a small VPS, a mini-PC, or a free Oracle Cloud box) rather than on a laptop that sleeps at night. The Oracle walkthrough is a good place to get a greffer running before you graft the archive onto it.

FAQ

What does the OCR actually do?
It reads the text inside scanned images and PDFs and makes it searchable. After OCR runs, you can find a document by a phrase printed inside it, not just by its filename or tags.
Can I email documents straight into it?
Yes, but not through the catalog entry. The SMTP settings it exposes are outbound only, used for password resets and notifications. Pulling documents in from a mailbox is a separate feature you set up inside Paperless-ngx itself: once it is up, add a mail account and rules over IMAP.
How much disk and memory does it need?
Disk grows with your documents, since it keeps both the originals and the OCR output, so size the greffer for years of growth. Memory is dominated by OCR and its helper services; a modest box is fine, but a 1 GB greffer will feel tight during a bulk import.
Is it really private if I self-host it?
Yes, in the sense that matters: the documents, OCR text, and tags all live on your greffer rather than a rented account. The parts you own are HTTPS (Greffon handles this), a strong admin password, and tested backups. Get those right and your archive stays yours.
Greffon Labs
We build Greffon, the simplest way to turn any machine into a server you own.