ingest: Initial implementation
Bot that monitors a Matrix room for PDF and JPEG files and uploads them to Paperless-ngx. Supports E2E encrypted attachments via inline AES keys, historical catchup on startup, exponential backoff retries with a permanent give-up after max attempts, file format validation via magic bytes, Uptime Kuma heartbeat monitoring, and email alerts on errors via SMTP SSL.
commit d5a3528cde
7 changed files with 1844 additions and 0 deletions
135
README.md
Normal file
@@ -0,0 +1,135 @@
# matrix-paperless-ingest

Monitors a Matrix room for PDF and JPEG files and uploads them to Paperless-ngx.
Designed for rooms bridged from WhatsApp via mautrix-whatsapp.

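The commit description mentions file format validation via magic bytes. As a rough illustration only (not necessarily how `ingest.py` does it), a check for the two supported formats could look like the sketch below. The function name `sniff_format` is hypothetical; the byte signatures are the well-known leading bytes of PDF and JPEG files.

```python
from typing import Optional

# Well-known leading byte signatures for the two supported formats.
PDF_MAGIC = b"%PDF-"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_format(data: bytes) -> Optional[str]:
    """Return "pdf" or "jpeg" based on the leading bytes, or None if neither matches.

    Hypothetical helper for illustration; the real bot may validate differently.
    """
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None
```

Checking the content rather than the filename or MIME type is useful for bridged rooms, where the metadata attached to a file may not be reliable.
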
- Processes the full room history on startup (skips files already in Paperless)
- Listens for new files indefinitely
- Retries failed uploads with exponential backoff
- State is tracked in a local SQLite database

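To make the retry behaviour in the list above concrete, here is a minimal sketch of exponential backoff with a permanent give-up after a maximum number of attempts. The constants and the function name `next_retry_at` are assumptions chosen for illustration; the actual values and logic live in `ingest.py` and may differ.

```python
import time
from typing import Optional

# Assumed values for illustration only; not taken from the project.
BASE_DELAY_S = 60      # delay before the first retry
MAX_DELAY_S = 3600     # upper bound for any single delay
MAX_ATTEMPTS = 8       # after this many failures, stop retrying


def next_retry_at(retry_count: int, now: Optional[float] = None) -> Optional[float]:
    """Return the unix timestamp of the next attempt, or None to give up.

    The delay doubles with every failed attempt and is capped at MAX_DELAY_S.
    """
    if retry_count >= MAX_ATTEMPTS:
        return None
    now = time.time() if now is None else now
    delay = min(BASE_DELAY_S * 2 ** retry_count, MAX_DELAY_S)
    return now + delay
```

Storing the result as a unix timestamp would fit the `next_retry` column queried in the retry queue section, which is formatted there with `datetime(next_retry, 'unixepoch')`.
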
## Requirements

- Python 3.11+
- [uv](https://docs.astral.sh/uv/) (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
- `libolm` and `python-olm`: both must be installed via the system package manager
  because `python-olm`'s build system is incompatible with modern CMake

**Arch Linux:**
```bash
sudo pacman -S libolm python-olm
```

**Ubuntu:**
```bash
sudo apt install libolm3 python3-olm
```

## Setup

### 1. Clone and install dependencies

```bash
git clone <repo>
cd matrix-paperless-ingest
uv venv --system-site-packages
uv sync --no-install-package python-olm
```

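Since the venv is created with `--system-site-packages` and `python-olm` is skipped by `uv sync`, the system-wide package should be visible from inside the venv. A quick way to confirm this is the sketch below; the helper name `module_available` is hypothetical, while `olm` is the top-level module that `python-olm` provides.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # python-olm installs a top-level module named `olm`.
    print("olm available:", module_available("olm"))
```

Saved as, say, `check_olm.py` (a hypothetical filename), it can be run with `uv run --no-sync python check_olm.py`. If it reports `False`, the system package is probably missing or the venv was created without `--system-site-packages`.
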
### 2. Create a Matrix bot account

Create a new Matrix account for the bot on your homeserver (e.g. via Element),
then invite it to the room you want to monitor and accept the invite.

### 3. Generate a Matrix access token

Log in with the bot account to obtain an access token and device ID:

```bash
curl -XPOST 'https://jeena.net/_matrix/client/v3/login' \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "m.login.password",
    "user": "@yourbot:jeena.net",
    "password": "yourpassword"
  }'
```

Copy `access_token` and `device_id` from the response. You can then delete the
password from your notes; it is not needed again.

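If you prefer Python over `curl`, the same login request can be sketched with the standard library only. The function names `build_login_request` and `login` are hypothetical; the endpoint and JSON body mirror the `curl` call above.

```python
import json
import urllib.request


def build_login_request(homeserver: str, user: str, password: str) -> urllib.request.Request:
    """Build the POST request for a Matrix password login (same body as the curl call)."""
    body = json.dumps({
        "type": "m.login.password",
        "user": user,
        "password": password,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{homeserver.rstrip('/')}/_matrix/client/v3/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(homeserver: str, user: str, password: str) -> tuple[str, str]:
    """Send the request and return (access_token, device_id) from the response."""
    with urllib.request.urlopen(build_login_request(homeserver, user, password)) as resp:
        payload = json.load(resp)
    return payload["access_token"], payload["device_id"]
```

Calling `login("https://jeena.net", "@yourbot:jeena.net", "yourpassword")` would return the two values needed for `MATRIX_ACCESS_TOKEN` and `MATRIX_DEVICE_ID` in step 6.
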
### 4. Find your Matrix room ID

In Element: open the room → Settings → Advanced → Internal room ID.
It looks like `!abc123:jeena.net`.

### 5. Find your Paperless inbox tag ID

In Paperless-ngx, go to Tags and note the ID of your inbox tag, or look it up
via the API:

```bash
curl -H 'Authorization: Token YOUR_TOKEN' https://paperless.jeena.net/api/tags/
```

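The API response is JSON, so picking out the ID by tag name can be scripted. The sketch below assumes the response carries a `results` list whose entries have `id` and `name` fields, and it only looks at the first page of results; both function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def find_tag_id(tags_response: dict, name: str) -> Optional[int]:
    """Return the ID of the tag called `name` (case-insensitive), or None.

    Assumes a response shaped like {"results": [{"id": 1, "name": "Inbox"}, ...]}.
    """
    for tag in tags_response.get("results", []):
        if tag.get("name", "").lower() == name.lower():
            return tag["id"]
    return None


def fetch_tags(base_url: str, token: str) -> dict:
    """GET <base_url>/api/tags/ with token authentication, as in the curl call."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/tags/",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `find_tag_id(fetch_tags("https://paperless.jeena.net", "YOUR_TOKEN"), "inbox")` would give the value for `PAPERLESS_INBOX_TAG_ID`, provided your inbox tag is actually named "inbox".
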
### 6. Configure

```bash
cp .env.example .env
$EDITOR .env
```

Fill in all values:

```
MATRIX_HOMESERVER=https://jeena.net
MATRIX_USER=@yourbot:jeena.net
MATRIX_ACCESS_TOKEN=syt_...
MATRIX_DEVICE_ID=ABCDEFGH
MATRIX_ROOM_ID=!abc123:jeena.net

PAPERLESS_URL=https://paperless.jeena.net
PAPERLESS_TOKEN=your_paperless_api_token
PAPERLESS_INBOX_TAG_ID=1
```

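Before the first run it can help to confirm that no value was left empty. The following is a minimal sketch of such a check, assuming a plain `KEY=value` file without quoting. It covers only the eight keys listed above (`.env.example` may define more), and the names `parse_env` and `missing_keys` are hypothetical.

```python
# The keys shown in the README; .env.example may contain additional ones.
REQUIRED_KEYS = (
    "MATRIX_HOMESERVER",
    "MATRIX_USER",
    "MATRIX_ACCESS_TOKEN",
    "MATRIX_DEVICE_ID",
    "MATRIX_ROOM_ID",
    "PAPERLESS_URL",
    "PAPERLESS_TOKEN",
    "PAPERLESS_INBOX_TAG_ID",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty, in README order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Running `missing_keys(parse_env(open(".env").read()))` from the project directory should return an empty list once everything is filled in.
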
### 7. Test

```bash
uv run --no-sync python ingest.py
```

Watch the logs. It will process all historical messages, then listen for new ones.
Press Ctrl-C to stop.

## Install as a systemd service

```bash
# Create a dedicated user
sudo useradd -r -s /bin/false matrix-paperless-ingest

# Copy the project
sudo cp -r . /opt/matrix-paperless-ingest
sudo chown -R matrix-paperless-ingest:matrix-paperless-ingest /opt/matrix-paperless-ingest

# Install and start the service
sudo cp paperless-ingest.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now paperless-ingest

# Check logs
sudo journalctl -u paperless-ingest -f
```

## Viewing the retry queue

```bash
sqlite3 state.db "SELECT filename, status, retry_count, datetime(next_retry, 'unixepoch') FROM processed_events WHERE status = 'failed';"
```

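The same query can be run from Python, which is handy when the `sqlite3` command-line tool is not installed. This sketch uses only the table and column names that appear in the command above; the `ORDER BY` clause and the function name `failed_uploads` are additions for illustration.

```python
import sqlite3

# Same columns as the sqlite3 one-liner; ORDER BY is an addition for readability.
QUERY = """
SELECT filename, status, retry_count, datetime(next_retry, 'unixepoch')
FROM processed_events
WHERE status = 'failed'
ORDER BY next_retry
"""


def failed_uploads(conn: sqlite3.Connection) -> list[tuple]:
    """Return one (filename, status, retry_count, next_retry_utc) tuple per failed upload."""
    return conn.execute(QUERY).fetchall()
```

To use it against the bot's database, open a connection with `sqlite3.connect("state.db")` from the project directory and pass it to `failed_uploads`. The timestamps produced by `datetime(..., 'unixepoch')` are in UTC.
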
## Moving to a new server

1. Copy the project directory (including `.env` and `state.db`)
2. Install `uv`, `libolm3`, and `python3-olm` on the new server
3. Run `uv venv --system-site-packages && uv sync --no-install-package python-olm`
4. Install the systemd service as above