Setup — Paperless-ngx 1.8.0 documentation (2024)

Overview of Paperless-ngx

Compared to paperless, paperless-ngx works a little differently under the hood and has more moving parts that work together. While this increases the complexity of the system, it also brings many benefits.

Paperless consists of the following components:

  • The webserver: This is pretty much the same as in paperless. It serves the administration pages, the API, and the new frontend. This is the main tool you’ll be using to interact with paperless. You may start the webserver with

    $ cd /path/to/paperless/src/
    $ gunicorn -c ../gunicorn.conf.py paperless.wsgi

    or by any other means such as Apache mod_wsgi.

  • The consumer: This is what watches your consumption folder for documents. However, the consumer itself does not really consume your documents. Instead, it notifies a task processor that a new file is ready for consumption. I suppose it should be named differently. The consumer was also used to check your emails, but that is now done elsewhere as well.

    Start the consumer with the management command document_consumer:

    $ cd /path/to/paperless/src/
    $ python3 manage.py document_consumer

  • The task processor: Paperless relies on Django Q for doing most of the heavy lifting. This is a task queue that accepts tasks from multiple sources and processes these in parallel. It also comes with a scheduler that executes certain commands periodically.

    This task processor is responsible for:

    • Consuming documents. When the consumer finds new documents, it notifies the task processor to start a consumption task.

    • The task processor also performs the consumption of any documents you upload through the web interface.

    • Consuming emails. It periodically checks your configured accounts for new emails and notifies the task processor to consume the attachment of an email.

    • Maintaining the search index and the automatic matching algorithm. These are things that paperless needs to do from time to time in order to operate properly.

    This allows paperless to process multiple documents from your consumption folder in parallel! On a modern multi-core system, this makes the consumption process with full OCR considerably faster.

    The task processor comes with a built-in admin interface that you can use to check whether any of the tasks have failed and to inspect the errors (e.g., wrong email credentials, errors while consuming a specific file, etc.).

    You may start the task processor by executing:

    $ cd /path/to/paperless/src/
    $ python3 manage.py qcluster

  • A redis message broker: This is a really lightweight service that is responsible for getting the tasks from the webserver and the consumer to the task scheduler. These run in different processes (maybe even on different machines!), and therefore, this is necessary.

  • Optional: A database server. Paperless supports both PostgreSQL and SQLite for storing its data.
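
To illustrate the pattern these components implement, here is a minimal sketch in plain Python. It is not Paperless code: an in-process queue.Queue stands in for the redis broker, threads stand in for the workers of the task processor, and consume_document, run_workers, and the file names are hypothetical.

```python
import queue
import threading


def consume_document(path):
    # Hypothetical stand-in for the real work (OCR, indexing, ...).
    return f"consumed {path}"


def run_workers(paths, worker_count=2):
    """Process every path with worker_count threads and return sorted results."""
    tasks = queue.Queue()  # stands in for the message broker
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            path = tasks.get()
            if path is None:  # sentinel: no more work for this worker
                break
            outcome = consume_document(path)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    # The "consumer" side: it only announces files, it does not process them.
    for path in paths:
        tasks.put(path)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker

    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_workers(["a.pdf", "b.pdf", "c.pdf"]))
```

The point of the split is that the side announcing work stays cheap, while the number of workers decides how many documents are handled at the same time.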

Installation

You can go multiple routes to set up and run Paperless:

  • Use the easy install docker script

  • Pull the image from Docker Hub

  • Build the Docker image yourself

  • Install Paperless directly on your system manually (bare metal)

The Docker routes are quick and easy, and they are the recommended routes. They configure all of the components above automatically, so that Paperless just works, and they use sensible defaults for all configuration options. Docker beginners can find a cheat sheet here: CLI Basics

The bare metal route is complicated to set up but makes it easier should you want to contribute some code back. You need to configure and run the above-mentioned components yourself.

Install Paperless from Docker Hub using the installation script

Paperless provides an interactive installation script. This script will ask you for a couple of configuration options, download and create the necessary configuration files, pull the docker image, start paperless and create your user account. This script essentially performs all the steps described in Install Paperless from Docker Hub automatically.

  1. Make sure that docker and docker-compose are installed.

  2. Download and run the installation script:

    $ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"

Install Paperless from Docker Hub

  1. Log in with your user and create a folder in your home directory to hold your configuration files and consumption directory, for example with mkdir -v ~/paperless-ngx.

  2. Go to the /docker/compose directory on the project page and download one of the docker-compose.*.yml files, depending on which database backend you want to use. Rename this file to docker-compose.yml. If you want to enable optional support for Office documents, download a file with -tika in the file name. Download the docker-compose.env file and the .env file as well and store them in the same directory.

    Hint

    For new installations, it is recommended to use PostgreSQL as the database backend.

  3. Install Docker and docker-compose.

    Caution

    If you want to use the included docker-compose.*.yml file, you need to have at least Docker version 17.09.0 and docker-compose version 1.17.0. To check, run docker-compose -v or docker -v.

    See the Docker installation guide on how to install the current version of Docker for your operating system or Linux distribution of choice. To get the latest version of docker-compose, follow the docker-compose installation guide if your package repository doesn’t include it.

  4. Modify docker-compose.yml to your preferences. You may want to change the path to the consumption directory. Find the line that specifies where to mount the consumption directory:

    - ./consume:/usr/src/paperless/consume

    Replace the part BEFORE the colon with a local directory of your choice:

    - /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume

    Don’t change the part after the colon or paperless won’t find your documents.

    You may also need to change the port that the webserver will use from the default (8000):

    ports:
      - 8000:8000

    Replace the part BEFORE the colon with a port of your choice:

    ports:
      - 8010:8000

    Don’t change the part after the colon or edit other lines that refer to port 8000. Modifying the part before the colon will map requests on another port to the webserver running on the default port.

    Rootless

    If you want to run Paperless as a rootless container, you will need to do the following in your docker-compose.yml:

    • set the user running the container to map to the paperless user in the container. This value (user_id below) should be the same ID that USERMAP_UID and USERMAP_GID are set to in the next step. See the description of USERMAP_UID and USERMAP_GID in Configuration.

    Your entry for Paperless should contain something like:

    webserver:
      image: ghcr.io/paperless-ngx/paperless-ngx:latest
      user: <user_id>

  5. Modify docker-compose.env, following the comments in the file. The most important change is to set USERMAP_UID and USERMAP_GID to the uid and gid of your user on the host system. Use id -u and id -g to get these.

    This ensures that both the docker container and you on the host machine have write access to the consumption directory. If your UID and GID on the host system are 1000 (the default for the first normal user on most systems), it will work out of the box without any modifications. Run id "username" to check.

    Note

    You can copy any setting from the file paperless.conf.example and paste it here. Have a look at Configuration to see what’s available.

    Note

    You can utilize Docker secrets for some configuration settings by appending _FILE to some configuration values. This is currently supported only by:

    • PAPERLESS_DBUSER

    • PAPERLESS_DBPASS

    • PAPERLESS_SECRET_KEY

    • PAPERLESS_AUTO_LOGIN_USERNAME

    • PAPERLESS_ADMIN_USER

    • PAPERLESS_ADMIN_MAIL

    • PAPERLESS_ADMIN_PASSWORD

    Caution

    Some file systems such as NFS network shares don’t support file system notifications with inotify. When storing the consumption directory on such a file system, paperless will not pick up new files with the default configuration. You will need to use PAPERLESS_CONSUMER_POLLING, which will disable inotify. See the description of this option in Configuration.

  6. Run docker-compose pull, followed by docker-compose up -d. This will pull the image, then create and start the necessary containers.

  7. To be able to log in, you will need a superuser. To create it, execute the following command:

    $ docker-compose run --rm webserver createsuperuser

    This will prompt you to set a username, an optional e-mail address and finally a password (at least 8 characters).

  8. The default docker-compose.yml exports the webserver on your local port 8000. If you did not change this, you should now be able to visit your Paperless instance at http://127.0.0.1:8000 or at port 8000 of your server’s IP address. Use the login credentials you created in the previous step.
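
The Docker secrets note in step 5 follows a common convention: if a variable named <SETTING>_FILE is set, the value is read from the file it points to instead of from <SETTING> itself. The sketch below illustrates that convention only. It is not the actual Paperless implementation, and read_setting is a hypothetical helper.

```python
import os
import tempfile


def read_setting(name, environ=None):
    """Return the value of `name`, preferring `<name>_FILE` if it is set."""
    env = os.environ if environ is None else environ
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # The secret lives in a file (e.g. one mounted by Docker secrets).
        with open(file_path, encoding="utf-8") as handle:
            return handle.read().strip()
    # Fall back to the plain variable; None if neither is set.
    return env.get(name)


if __name__ == "__main__":
    # Demo with a temporary file standing in for a mounted secret.
    with tempfile.NamedTemporaryFile("w", suffix=".secret", delete=False) as handle:
        handle.write("example-password\n")
    demo_env = {"PAPERLESS_DBPASS_FILE": handle.name}
    print(read_setting("PAPERLESS_DBPASS", demo_env))
    os.unlink(handle.name)
```

Reading the value from a file keeps the secret itself out of docker-compose.env and out of the output of commands that print the container environment.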

Build the Docker image yourself

  1. Clone the entire repository of paperless:

    git clone https://github.com/paperless-ngx/paperless-ngx

    The main branch always reflects the latest stable version.

  2. Copy one of the docker/compose/docker-compose.*.yml files to docker-compose.yml in the root folder, depending on which database backend you want to use. Copy docker-compose.env into the project root as well.

  3. In the docker-compose.yml file, find the line that instructs docker-compose to pull the paperless image from Docker Hub:

    webserver:
      image: ghcr.io/paperless-ngx/paperless-ngx:latest

    and replace it with a line that instructs docker-compose to build the image from the current working directory instead:

    webserver:
      build: .

  4. Follow steps 3 to 8 of Install Paperless from Docker Hub. When asked to run docker-compose pull to pull the image, do

    $ docker-compose build

    instead to build the image.

Bare Metal Route

Paperless runs on Linux only. The following procedure has been tested on a minimal installation of Debian/Buster, which is the current stable release at the time of writing. Windows is not and will never be supported.

  1. Install dependencies. Paperless requires the following packages.

    • python3 3.8, 3.9

    • python3-pip

    • python3-dev

    • fonts-liberation for generating thumbnails for plain text files

    • imagemagick >= 6 for PDF conversion

    • gnupg for handling encrypted documents

    • libpq-dev for PostgreSQL

    • libmagic-dev for mime type detection

    • mime-support for mime type detection

    • libzbar0 for barcode detection

    • poppler-utils for barcode detection

    Use this list for your preferred package management:

    python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev libmagic-dev mime-support libzbar0 poppler-utils

    These dependencies are required for OCRmyPDF, which is used for text recognition.

    • unpaper

    • ghostscript

    • icc-profiles-free

    • qpdf

    • liblept5

    • libxml2

    • pngquant (suggested for certain PDF image optimizations)

    • zlib1g

    • tesseract-ocr >= 4.0.0 for OCR

    • tesseract-ocr language packs (tesseract-ocr-eng, tesseract-ocr-deu, etc)

    Use this list for your preferred package management:

    unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr

    On Raspberry Pi, these libraries are required as well:

    • libatlas-base-dev

    • libxslt1-dev

    You will also need build-essential, python3-setuptools and python3-wheel for installing some of the python dependencies.

  2. Install redis >= 5.0 and configure it to start automatically.

  3. Optional. Install postgresql and configure a database, user and password for paperless. If you do not wish to use PostgreSQL, SQLite is available as well.

    Note

    On bare-metal installations using SQLite, ensure the JSON1 extension is enabled. This is usually the case, but not always.

  4. Get the release archive from https://github.com/paperless-ngx/paperless-ngx/releases. If you clone the git repo as it is, you also have to compile the front end by yourself. Extract the archive to a place from where you wish to execute it, such as /opt/paperless.

  5. Configure paperless. See Configuration for details. Edit the included paperless.conf and adjust the settings to your needs. Required settings for getting paperless running are:

    • PAPERLESS_REDIS should point to your redis server, such as redis://localhost:6379.

    • PAPERLESS_DBHOST should be the hostname on which your PostgreSQL server is running. Leave this unset to use SQLite instead. Also configure port, database name, user and password as necessary.

    • PAPERLESS_CONSUMPTION_DIR should point to a folder which paperless should watch for documents. You might want to have this somewhere else. Likewise, PAPERLESS_DATA_DIR and PAPERLESS_MEDIA_ROOT define where paperless stores its data. If you like, you can point both to the same directory.

    • PAPERLESS_SECRET_KEY should be a random sequence of characters. It’s used for authentication. Failing to set it allows third parties to forge authentication credentials.

    • PAPERLESS_URL if you are behind a reverse proxy. This should point to your domain. Please see Configuration for more information.

    Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended for everyone:

    • Set PAPERLESS_OCR_LANGUAGE to the language most of your documents are written in.

    • Set PAPERLESS_TIME_ZONE to your local time zone.

  6. Create a system user under which you wish to run paperless.

    adduser paperless --system --home /opt/paperless --group

  7. Ensure that the following directories exist and that the paperless user has write permissions to them:

    • /opt/paperless/media

    • /opt/paperless/data

    • /opt/paperless/consume

    Adjust as necessary if you configured different folders.

  8. Install python requirements from the requirements.txt file. It is up to you if you wish to use a virtual environment or not. First you should update pip, so that it can install current packages.

    sudo -Hu paperless pip3 install --upgrade pip
    sudo -Hu paperless pip3 install -r requirements.txt

    This will install all python dependencies in the home directory of the new paperless user.

  9. Go to /opt/paperless/src, and execute the following commands:

    # This creates the database schema.
    sudo -Hu paperless python3 manage.py migrate

    # This creates your first paperless user
    sudo -Hu paperless python3 manage.py createsuperuser

  10. Optional: Test that paperless is working by executing

    # This starts the Django development server.
    sudo -Hu paperless python3 manage.py runserver

    and pointing your browser to http://localhost:8000/.

    Warning

    This is a development server which should not be used in production. It is not audited for security and performance is inferior to production ready web servers.

    Hint

    This will not start the consumer. Paperless does this in a separate process.

  11. Set up systemd services to run paperless automatically. You may use the service definition files included in the scripts folder as a starting point.

    Paperless needs the webserver script to run the webserver, the consumer script to watch the input folder, and the scheduler script to run tasks such as email checking and document consumption.

    The socket script enables gunicorn to run on port 80 without root privileges. For this you need to uncomment the Require=paperless-webserver.socket line in the webserver script and configure gunicorn to listen on port 80 (see paperless/gunicorn.conf.py).

    You may need to adjust the path to the gunicorn executable. This will be installed as part of the python dependencies, and is either located in the bin folder of your virtual environment, or in ~/.local/bin/ if no virtual environment is used.

    These services rely on redis and optionally the database server, but don’t need to be started in any particular order. The example files depend on redis being started. If you use a database server, you should add additional dependencies.

    Caution

    The included scripts run a gunicorn standalone server, which is fine for running paperless. It does support SSL; however, the Gunicorn documentation states that you should use a proxy server in front of gunicorn instead.

    For instructions on how to use nginx for that, see the instructions below.

  12. Optional: Install a samba server and make the consumption folder available as a network share.

  13. Configure ImageMagick to allow processing of PDF documents. Most distributions have this disabled by default, since PDF documents can contain malware. If you don’t do this, paperless will fall back to ghostscript for certain steps such as thumbnail generation.

    Edit /etc/ImageMagick-6/policy.xml and adjust

    <policy domain="coder" rights="none" pattern="PDF" />

    to

    <policy domain="coder" rights="read|write" pattern="PDF" />
  14. Optional: Install the jbig2enc encoder. This will reduce the size of generated PDF documents. You’ll most likely need to compile this by yourself, because this software was patented until around 2017 and binary packages are not available for most distributions.
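
Step 5 asks for a random PAPERLESS_SECRET_KEY. One way to produce such a value is the secrets module from the Python standard library, as in the sketch below. The helper name generate_secret_key, the length of 50, and the restriction to letters and digits (chosen so that the value needs no quoting in paperless.conf) are assumptions, not requirements stated by Paperless.

```python
import secrets
import string


def generate_secret_key(length=50):
    """Return a random alphanumeric string from a cryptographic RNG."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    # Prints a line that could be pasted into paperless.conf.
    print(f"PAPERLESS_SECRET_KEY={generate_secret_key()}")
```

The secrets module is used here rather than random because it draws from the operating system's cryptographically secure source of randomness.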

Migrating to Paperless-ngx

Migration is possible both from Paperless-ng or directly from the ‘original’ Paperless.

Migrating from Paperless-ng

Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be trivial for most users, especially when using docker. However, as with any major change, it is recommended to take a full backup first. Once you are ready, simply change the docker image to point to the new source. For example, if using Docker Compose, edit docker-compose.yml and change:

image: jonaswinkler/paperless-ng:latest

to

image: ghcr.io/paperless-ngx/paperless-ngx:latest

and then run docker-compose up -d, which will pull the new image and recreate the container. That’s it!

Users who installed with the bare-metal route should also update their Git clone to point to https://github.com/paperless-ngx/paperless-ngx, e.g. using the command git remote set-url origin https://github.com/paperless-ngx/paperless-ngx, and then pull the latest version.

Migrating from Paperless

At its core, paperless-ngx is still paperless and fully compatible. However, some things have changed under the hood, so you need to adapt your setup depending on how you installed paperless.

This setup describes how to update an existing paperless Docker installation. The important things to keep in mind are as follows:

  • Read the changelog and take note of breaking changes.

  • You should decide if you want to stick with SQLite or want to migrate your database to PostgreSQL. See Moving data from SQLite to PostgreSQL for details on how to move your data from SQLite to PostgreSQL. Both work fine with paperless. However, if you already have a database server running for other services, you might as well use it for paperless as well.

  • The task scheduler of paperless, which is used to execute periodic tasks such as email checking and maintenance, requires a redis message broker instance. The docker-compose route takes care of that.

  • The layout of the folder structure for your documents and data remains the same, so you can just plug your old docker volumes into paperless-ngx and expect it to find everything where it should be.

Migration to paperless-ngx is then performed in a few simple steps:

  1. Stop paperless.

    $ cd /path/to/current/paperless
    $ docker-compose down

  2. Do a backup for two purposes: First, if something goes wrong, you still have your data. Second, if you don’t like paperless-ngx, you can switch back to paperless.

  3. Download the latest release of paperless-ngx. You can either go with the docker-compose files from the /docker/compose directory on the project page or clone the repository to build the image yourself (see above). You can either replace your current paperless folder or put paperless-ngx in a different location.

    Caution

    Paperless-ngx includes a .env file. This will set the project name for docker compose to paperless, which will also define the name of the volumes used by paperless-ngx. However, if you find that paperless-ngx is not using your old paperless volumes, verify the names of your volumes with

    $ docker volume ls | grep _data

    and adjust the project name in the .env file so that it matches the name of the volumes before the _data part.

  4. Download the docker-compose.sqlite.yml file to docker-compose.yml. If you want to switch to PostgreSQL, do that after you have migrated your existing SQLite database.

  5. Adjust docker-compose.yml and docker-compose.env to your needs. See Install Paperless from Docker Hub for details on which edits are advised.

  6. Update paperless.

  7. In order to find your existing documents with the new search feature, you need to invoke a one-time operation that will create the search index:

    $ docker-compose run --rm webserver document_index reindex

    This will migrate your database and create the search index. After that, paperless will take care of maintaining the index by itself.

  8. Start paperless-ngx.

    $ docker-compose up -d

    This will run paperless in the background and automatically start it on system boot.

  9. Paperless installed a permanent redirect to admin/ in your browser. This redirect is still in place and prevents access to the new UI. Clear your browsing cache in order to fix this.

  10. Optionally, follow the instructions below to migrate your existing data to PostgreSQL.
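
The caution in step 3 says that the Compose project name must match the part of your existing volume names before _data. As a rough sketch, that prefix could be extracted from a list of volume names as follows. The function project_names and the sample volume names are hypothetical. COMPOSE_PROJECT_NAME is the standard Docker Compose variable for the project name, and it is assumed here that this is the variable set in the .env file.

```python
def project_names(volume_names):
    """Return the sorted, de-duplicated prefixes of names ending in '_data'."""
    suffix = "_data"
    prefixes = {
        name[: -len(suffix)]
        for name in volume_names
        if name.endswith(suffix) and len(name) > len(suffix)
    }
    return sorted(prefixes)


if __name__ == "__main__":
    # Hypothetical names, as they might appear in `docker volume ls`.
    volumes = ["paperless_data", "paperless_media", "old_paperless_data"]
    for prefix in project_names(volumes):
        print(f"COMPOSE_PROJECT_NAME={prefix}")
```

If more than one prefix is printed, the one that belongs to your old installation is the value to use.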

Moving data from SQLite to PostgreSQL

Moving your data from SQLite to PostgreSQL is done by executing a series of django management commands as below.

Caution

Make sure that your SQLite database is migrated to the latest version. Starting paperless will make sure that this is the case. If you try to load data from an old database schema in SQLite into a newer database schema in PostgreSQL, you will run into trouble.

Warning

On some database fields, PostgreSQL enforces predefined limits on maximum length, whereas SQLite does not. The fields in question are the title of documents (128 characters), names of document types, tags and correspondents (128 characters), and filenames (1024 characters). If you have data in these fields that surpasses these limits, migration to PostgreSQL is not possible and will fail with an error.
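
Such over-long values can be looked for before loading anything into PostgreSQL. The sketch below scans records in the JSON layout produced by the Django dumpdata command (a list of objects with model, pk and fields keys) against the limits quoted above. The model labels and field names in LIMITS are assumptions about the Paperless schema and may need adjusting, and find_oversized is a hypothetical helper.

```python
import json

# Limits quoted in the warning above. The model labels and field names are
# assumptions about the dump layout and may need adjusting.
LIMITS = {
    ("documents.document", "title"): 128,
    ("documents.document", "filename"): 1024,
    ("documents.documenttype", "name"): 128,
    ("documents.tag", "name"): 128,
    ("documents.correspondent", "name"): 128,
}


def find_oversized(records):
    """Return (model, pk, field, length) for every value above its limit."""
    problems = []
    for record in records:
        fields = record.get("fields", {})
        for (model, field), limit in LIMITS.items():
            if record.get("model") != model:
                continue
            value = fields.get(field)
            if isinstance(value, str) and len(value) > limit:
                problems.append((model, record.get("pk"), field, len(value)))
    return problems


if __name__ == "__main__":
    # Inline sample in dumpdata layout; a real dump would be read with json.load.
    sample = json.loads(
        '[{"model": "documents.tag", "pk": 1, "fields": {"name": "%s"}},'
        ' {"model": "documents.document", "pk": 2, "fields": {"title": "Invoice"}}]'
        % ("x" * 130)
    )
    print(find_oversized(sample))
```

An empty result means none of the checked fields exceed the limits. Any reported entries would need to be shortened in Paperless before the data is dumped again.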

  1. Stop paperless, if it is running.

  2. Tell paperless to use PostgreSQL:

    1. With docker, copy the provided docker-compose.postgres.yml file to docker-compose.yml. Remember to adjust the consumption directory, if necessary.

    2. Without docker, configure the database in your paperless.conf file. See Configuration for details.

  3. Open a shell and initialize the database:

    1. With docker, run the following command to open a shell within the paperless container:

      $ cd /path/to/paperless
      $ docker-compose run --rm webserver /bin/bash

      This will launch the container and initialize the PostgreSQL database.

    2. Without docker, remember to activate any virtual environment, switch to the src directory and create the database schema:

      $ cd /path/to/paperless/src
      $ python3 manage.py migrate

      This will not copy any data yet.

  4. Dump your data from SQLite:

    $ python3 manage.py dumpdata --database=sqlite --exclude=contenttypes --exclude=auth.Permission > data.json
  5. Load your data into PostgreSQL:

    $ python3 manage.py loaddata data.json
  6. If operating inside Docker, you may exit the shell now.

    $ exit
  7. Start paperless.

Moving back to Paperless

Let’s say you migrated to Paperless-ngx and used it for a while, but decided that you don’t like it and want to move back (if you do, send me a mail about what part you didn’t like!). You can do that with a few simple steps.

Paperless-ngx modified the database schema slightly. However, these changes can be reverted while keeping your current data, so that your current data will be compatible with original Paperless.

Execute this:

$ cd /path/to/paperless
$ docker-compose run --rm webserver migrate documents 0023

Or without docker:

$ cd /path/to/paperless/src
$ python3 manage.py migrate documents 0023

After that, you need to clear your cookies (Paperless-ngx comes with updated dependencies that do cookie-processing differently) and probably your cache as well.

Considerations for less powerful devices

Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and configuring some options in paperless can help improve performance immensely:

  • Stick with SQLite to save some resources.

  • Consider setting PAPERLESS_OCR_PAGES to 1, so that paperless will only OCR the first page of your documents. In most cases, this page contains enough information to be able to find it.

  • PAPERLESS_TASK_WORKERS and PAPERLESS_THREADS_PER_WORKER are configured to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that paperless will use 2 workers and 2 threads per worker. This may result in sluggish response times during consumption, so you might want to lower these settings (example: 2 workers and 1 thread to always have some computing power left for other tasks).

  • Keep PAPERLESS_OCR_MODE at its default value skip and consider OCR’ing your documents before feeding them into paperless. Some scanners are able to do this! You might even want to specify skip_noarchive to skip archive file generation for already OCR’ed documents entirely.

  • If you want to perform OCR on the device, consider using PAPERLESS_OCR_CLEAN=none. This will speed up OCR times and use less memory at the expense of slightly worse OCR results.

  • If using docker, consider setting PAPERLESS_WEBSERVER_WORKERS to 1. This will save some memory.

For details, refer to Configuration.
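
The worker and thread numbers in the list above can be made concrete with a little arithmetic. The sketch below shows one plausible rule that reproduces the 4-core example (2 workers with 2 threads each). It is an assumption for illustration, not a statement of how Paperless computes its defaults, and all three function names are hypothetical.

```python
import math


def default_workers(cores):
    """Assumed rule: roughly the square root of the core count, at least 1."""
    return max(1, math.isqrt(cores))


def default_threads_per_worker(cores, workers):
    """Assumed rule: spread the cores evenly over the workers, at least 1."""
    return max(1, cores // workers)


def busy_cores(workers, threads_per_worker):
    """Upper bound on threads that can run OCR work at the same time."""
    return workers * threads_per_worker


if __name__ == "__main__":
    cores = 4  # Raspberry Pi 3 and up, as stated above
    workers = default_workers(cores)
    threads = default_threads_per_worker(cores, workers)
    print(workers, threads, busy_cores(workers, threads))
    # The lowered setting suggested above: 2 workers with 1 thread each.
    print(cores - busy_cores(2, 1), "cores left for other tasks")
```

With 2 workers and 2 threads per worker, all 4 cores can be busy at once, which is why the lowered setting of 2 workers and 1 thread leaves headroom for the web interface.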

Note

Updating the automatic matching algorithm takes quite a bit of time. However, the update mechanism checks if your data has changed before doing the heavy lifting. If you experience the algorithm taking too much CPU time, consider changing the schedule in the admin interface to daily. You can also manually invoke the task by changing the date and time of the next run to today/now.

The actual matching of the algorithm is fast and works on Raspberry Pi as well as on any other device.

Using nginx as a reverse proxy

If you want to expose paperless to the internet, you should hide it behind a reverse proxy with SSL enabled.

In addition to the usual configuration for SSL, the following configuration is required for paperless to operate:

http {

    # Adjust as required. This is the maximum size for file uploads.
    # The default value 1M might be a little too small.
    client_max_body_size 10M;

    server {

        location / {

            # Adjust host and port as required.
            proxy_pass http://localhost:8000/;

            # These configuration options are required for WebSockets to work.
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            proxy_redirect off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Host $server_name;
        }
    }
}

The PAPERLESS_URL configuration variable is also required when using a reverse proxy. Please refer to the docs.

Also read this, towards the end of the section.
