matrix-paperless-ingest/README.md
Jeena d5a3528cde ingest: Initial implementation
Bot that monitors a Matrix room for PDF and JPEG files and uploads
them to Paperless-ngx. Supports E2E encrypted attachments via inline
AES keys, historical catchup on startup, exponential backoff retries
with a permanent give-up after max attempts, file format validation
via magic bytes, Uptime Kuma heartbeat monitoring, and email alerts
on errors via SMTP SSL.
2026-03-11 13:45:28 +00:00


# matrix-paperless-ingest
Monitors a Matrix room for PDF and JPEG files and uploads them to Paperless-ngx.
Designed for rooms bridged from WhatsApp via mautrix-whatsapp.
- Processes the full room history on startup (skips files already in Paperless)
- Listens for new files indefinitely
- Retries failed uploads with exponential backoff
- State is tracked in a local SQLite database
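
As a rough illustration of the retry behaviour listed above, the delay before each new attempt can double until it reaches a cap, and the bot can stop retrying after a fixed number of attempts. This is a minimal sketch, not the code in `ingest.py`: the function names, the 60-second base, the one-day cap, and the limit of 5 attempts are all assumptions chosen for the example.

```python
def backoff_delay(retry_count: int, base: float = 60.0, cap: float = 86400.0) -> float:
    """Seconds to wait before the next attempt: base * 2**retry_count, capped.

    The base and cap defaults are illustrative assumptions.
    """
    return min(cap, base * (2 ** retry_count))


def next_retry_at(now: float, retry_count: int) -> float:
    """Unix timestamp at which the next attempt becomes due."""
    return now + backoff_delay(retry_count)


def should_give_up(retry_count: int, max_attempts: int = 5) -> bool:
    """True once the (assumed) maximum number of attempts is reached."""
    return retry_count >= max_attempts
```

Storing the result of `next_retry_at` as a Unix timestamp would match the `datetime(next_retry, 'unixepoch')` conversion used when viewing the retry queue.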
## Requirements
- Python 3.11+
- [uv](https://docs.astral.sh/uv/) (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
- `libolm` and `python-olm`, both installed via the system package manager
because `python-olm`'s build system is incompatible with modern CMake (this is
why Setup uses `--system-site-packages` and `--no-install-package python-olm`)
**Arch Linux:**
```bash
sudo pacman -S libolm python-olm
```
**Ubuntu:**
```bash
sudo apt install libolm3 python3-olm
```
## Setup
### 1. Clone and install dependencies
```bash
git clone <repo>
cd matrix-paperless-ingest
uv venv --system-site-packages
uv sync --no-install-package python-olm
```
### 2. Create a Matrix bot account
Create a new Matrix account for the bot on your homeserver (e.g. via Element),
then invite it to the room you want to monitor and accept the invite.
### 3. Generate a Matrix access token
Log in with the bot account to obtain an access token and device ID:
```bash
curl -XPOST 'https://jeena.net/_matrix/client/v3/login' \
-H 'Content-Type: application/json' \
-d '{
"type": "m.login.password",
"user": "@yourbot:jeena.net",
"password": "yourpassword"
}'
```
Copy `access_token` and `device_id` from the response. You can then delete the
password from your notes — it is not needed again.
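
If you prefer to script this step, the same login call can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `build_login_request` and `extract_credentials` are made up for this example, and the request body simply mirrors the `curl` command above.

```python
import json
import urllib.request


def build_login_request(homeserver: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the password login request shown in the curl example."""
    body = json.dumps(
        {"type": "m.login.password", "user": user, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{homeserver.rstrip('/')}/_matrix/client/v3/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_credentials(response_body: bytes) -> tuple[str, str]:
    """Return (access_token, device_id) from the JSON login response."""
    data = json.loads(response_body)
    return data["access_token"], data["device_id"]


# Usage (performs a real network request, so it is left commented out):
# req = build_login_request("https://jeena.net", "@yourbot:jeena.net", "yourpassword")
# with urllib.request.urlopen(req) as resp:
#     token, device_id = extract_credentials(resp.read())
```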
### 4. Find your Matrix room ID
In Element: open the room → Settings → Advanced → Internal room ID.
It looks like `!abc123:jeena.net`.
### 5. Find your Paperless inbox tag ID
In Paperless-ngx, go to Tags and note the ID of your inbox tag, or look it up
via the API:
```bash
curl -H 'Authorization: Token YOUR_TOKEN' https://paperless.jeena.net/api/tags/
```
### 6. Configure
```bash
cp .env.example .env
$EDITOR .env
```
Fill in all values:
```
MATRIX_HOMESERVER=https://jeena.net
MATRIX_USER=@yourbot:jeena.net
MATRIX_ACCESS_TOKEN=syt_...
MATRIX_DEVICE_ID=ABCDEFGH
MATRIX_ROOM_ID=!abc123:jeena.net
PAPERLESS_URL=https://paperless.jeena.net
PAPERLESS_TOKEN=your_paperless_api_token
PAPERLESS_INBOX_TAG_ID=1
```
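
Before the first run it can help to check that no value was left empty. The following is a minimal sketch of such a check, assuming a plain `KEY=value` file without quoting or variable expansion; `parse_env` and `missing_keys` are hypothetical helpers and say nothing about how `ingest.py` actually loads its configuration.

```python
REQUIRED_KEYS = (
    "MATRIX_HOMESERVER",
    "MATRIX_USER",
    "MATRIX_ACCESS_TOKEN",
    "MATRIX_DEVICE_ID",
    "MATRIX_ROOM_ID",
    "PAPERLESS_URL",
    "PAPERLESS_TOKEN",
    "PAPERLESS_INBOX_TAG_ID",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Usage:
# from pathlib import Path
# print(missing_keys(parse_env(Path(".env").read_text())))
```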
### 7. Test
```bash
uv run --no-sync python ingest.py
```
Watch the logs: the bot first processes the full room history, then listens for
new messages. Press Ctrl-C to stop.
## Install as a systemd service
```bash
# Create a dedicated user
sudo useradd -r -s /bin/false matrix-paperless-ingest
# Copy the project
sudo cp -r . /opt/matrix-paperless-ingest
sudo chown -R matrix-paperless-ingest:matrix-paperless-ingest /opt/matrix-paperless-ingest
# Install and start the service
sudo cp paperless-ingest.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now paperless-ingest
# Check logs
sudo journalctl -u paperless-ingest -f
```
## Viewing the retry queue
```bash
sqlite3 state.db "SELECT filename, status, retry_count, datetime(next_retry, 'unixepoch') FROM processed_events WHERE status = 'failed';"
```
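
The same query can be run from Python, for example to feed a monitoring script. This sketch relies only on the table and column names visible in the command above; the `failed_uploads` helper and the `ORDER BY` clause are additions for illustration.

```python
import sqlite3

# Same selection as the sqlite3 command above, ordered by when the retry is due.
QUERY = """
SELECT filename, status, retry_count, datetime(next_retry, 'unixepoch')
FROM processed_events
WHERE status = 'failed'
ORDER BY next_retry
"""


def failed_uploads(conn: sqlite3.Connection) -> list[tuple]:
    """Return (filename, status, retry_count, next_retry_utc) for failed uploads."""
    return conn.execute(QUERY).fetchall()


# Usage (opens the database read-only so the running bot is not disturbed):
# conn = sqlite3.connect("file:state.db?mode=ro", uri=True)
# for row in failed_uploads(conn):
#     print(row)
```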
## Moving to a new server
1. Copy the project directory (including `.env` and `state.db`)
2. Install `uv`, `libolm`, and `python-olm` on the new server (see Requirements for the package names on your distribution)
3. Run `uv venv --system-site-packages && uv sync --no-install-package python-olm`
4. Install the systemd service as above