WhatsApp bridge images arrive as RoomMessageImage events with an empty body field, so the previous .jpg/.jpeg extension check silently rejected all of them. Accept all RoomMessageImage events and fall back to "image.jpg" as filename when body is empty. File content is still validated via magic bytes before upload.
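The behaviour described in the commit note above can be sketched in Python roughly as follows. This is a minimal illustration and not the code in ingest.py: the function names `pick_filename` and `sniff_type` are hypothetical, and the only assumption about the file formats is their well-known leading bytes (`%PDF-` for PDF, `FF D8 FF` for JPEG).

```python
from typing import Optional

# Leading "magic bytes" of the two supported formats.
PDF_MAGIC = b"%PDF-"
JPEG_MAGIC = b"\xff\xd8\xff"


def pick_filename(body: Optional[str], default: str = "image.jpg") -> str:
    """Use the event body as the filename, or a default when it is empty."""
    name = (body or "").strip()
    return name if name else default


def sniff_type(content: bytes) -> Optional[str]:
    """Return a MIME type based on the leading bytes, or None if unsupported."""
    if content.startswith(PDF_MAGIC):
        return "application/pdf"
    if content.startswith(JPEG_MAGIC):
        return "image/jpeg"
    return None
```

Deciding by content rather than by extension means an image with an empty body is still accepted, while a file that merely claims to be a JPEG is rejected before upload.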
matrix-paperless-ingest
Monitors a Matrix room for PDF and JPEG files and uploads them to Paperless-ngx. Designed for rooms bridged from WhatsApp via mautrix-whatsapp.
- Processes the full room history on startup (skips files already in Paperless)
- Listens for new files indefinitely
- Retries failed uploads with exponential backoff
- State is tracked in a local SQLite database
Requirements
- Python 3.11+
- libolm + python-olm: must be installed via the system package manager because python-olm's build system is incompatible with modern CMake
Arch Linux:
sudo pacman -S libolm python-olm
Ubuntu:
sudo apt install libolm3 python3-olm
Setup
1. Clone and install dependencies
git clone <repo>
cd matrix-paperless-ingest
python3 -m venv .venv --system-site-packages
.venv/bin/pip install -r requirements.txt
2. Create a Matrix bot account
Create a new Matrix account for the bot on your homeserver (e.g. via Element), then invite it to the room you want to monitor and accept the invite.
3. Generate a Matrix access token
Log in with the bot account to obtain an access token and device ID:
curl -XPOST 'https://jeena.net/_matrix/client/v3/login' \
-H 'Content-Type: application/json' \
-d '{
"type": "m.login.password",
"user": "@yourbot:jeena.net",
"password": "yourpassword"
}'
Copy access_token and device_id from the response. You can then delete the
password from your notes — it is not needed again.
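For those who prefer Python over curl, the same login request can be sketched with the standard library alone. The helper names `build_login_request` and `login` are hypothetical; the endpoint, the payload, and the two response fields are the ones used in the curl example above.

```python
import json
import urllib.request


def build_login_request(homeserver: str, user: str, password: str) -> urllib.request.Request:
    """Build the POST request for /_matrix/client/v3/login without sending it."""
    payload = {"type": "m.login.password", "user": user, "password": password}
    return urllib.request.Request(
        url=f"{homeserver.rstrip('/')}/_matrix/client/v3/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(homeserver: str, user: str, password: str) -> tuple[str, str]:
    """Send the request and return (access_token, device_id)."""
    request = build_login_request(homeserver, user, password)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["access_token"], body["device_id"]
```

Calling `login("https://jeena.net", "@yourbot:jeena.net", "yourpassword")` would return the two values to put into `.env`.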
4. Find your Matrix room ID
In Element: open the room → Settings → Advanced → Internal room ID.
It looks like !abc123:jeena.net.
5. Find your Paperless inbox tag ID
In Paperless-ngx, go to Tags and note the ID of your inbox tag, or look it up via the API:
curl -H 'Authorization: Token YOUR_TOKEN' https://paperless.jeena.net/api/tags/
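Picking the ID out of the API response can also be done in a few lines of Python. This sketch assumes the response is the usual paginated JSON object with a `results` list whose entries carry `id` and `name`; the function name `find_tag_id` and the tag name "Inbox" are placeholders for the example.

```python
from typing import Any, Optional


def find_tag_id(payload: dict[str, Any], name: str) -> Optional[int]:
    """Return the ID of the tag with the given name (case-insensitive), if present."""
    wanted = name.lower()
    for tag in payload.get("results", []):
        if str(tag.get("name", "")).lower() == wanted:
            return tag["id"]
    return None


# Example with a hand-written response of the assumed shape:
sample = {"count": 2, "results": [{"id": 1, "name": "Inbox"}, {"id": 7, "name": "Taxes"}]}
inbox_id = find_tag_id(sample, "inbox")
```

If the instance has many tags the response may span several pages, in which case only the first page is searched here.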
6. Configure
cp .env.example .env
$EDITOR .env
Fill in all values:
MATRIX_HOMESERVER=https://jeena.net
MATRIX_USER=@yourbot:jeena.net
MATRIX_ACCESS_TOKEN=syt_...
MATRIX_DEVICE_ID=ABCDEFGH
MATRIX_ROOM_ID=!abc123:jeena.net
PAPERLESS_URL=https://paperless.jeena.net
PAPERLESS_TOKEN=your_paperless_api_token
PAPERLESS_INBOX_TAG_ID=1
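A missing or mistyped value is easier to diagnose if it is caught at startup. The following is a hedged sketch of such a check, assuming the values from `.env` are already present in the process environment; how ingest.py actually loads them is not shown here, and `load_config` is a hypothetical helper.

```python
import os
from typing import Mapping

# The eight settings listed above.
REQUIRED_KEYS = (
    "MATRIX_HOMESERVER",
    "MATRIX_USER",
    "MATRIX_ACCESS_TOKEN",
    "MATRIX_DEVICE_ID",
    "MATRIX_ROOM_ID",
    "PAPERLESS_URL",
    "PAPERLESS_TOKEN",
    "PAPERLESS_INBOX_TAG_ID",
)


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return all required settings, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    config = {key: env[key] for key in REQUIRED_KEYS}
    # The tag ID must be numeric; int() raises ValueError otherwise.
    int(config["PAPERLESS_INBOX_TAG_ID"])
    return config
```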
7. Test
.venv/bin/python ingest.py
Watch the logs. It will process all historical messages, then listen for new ones. Press Ctrl-C to stop.
Install as a systemd user service
# Enable lingering so the service starts at boot without requiring login
loginctl enable-linger
# Install the service
mkdir -p ~/.config/systemd/user
cp matrix-paperless-ingest.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now matrix-paperless-ingest
# Check logs
journalctl --user -u matrix-paperless-ingest -f
Updating dependencies
If you need to add or update a dependency, edit pyproject.toml and regenerate
the locked requirements.txt:
.venv/bin/pip install pip-tools
.venv/bin/pip-compile pyproject.toml
.venv/bin/pip install -r requirements.txt
Viewing retry queue
sqlite3 state.db "SELECT filename, status, retry_count, datetime(next_retry, 'unixepoch') FROM processed_events WHERE status = 'failed';"
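The same query can be run from Python, for example in a small monitoring script. This sketch relies only on the table and column names that appear in the command above; the rest of the schema is unknown, and `failed_uploads` is a hypothetical helper.

```python
import sqlite3
from datetime import datetime, timezone


def failed_uploads(conn: sqlite3.Connection) -> list[tuple[str, int, str]]:
    """Return (filename, retry_count, next retry as ISO 8601 UTC) for failed rows."""
    rows = conn.execute(
        "SELECT filename, retry_count, next_retry FROM processed_events "
        "WHERE status = 'failed' ORDER BY next_retry"
    ).fetchall()
    result = []
    for filename, retry_count, next_retry in rows:
        # next_retry is stored as a Unix timestamp; it may be NULL.
        when = (
            datetime.fromtimestamp(next_retry, tz=timezone.utc).isoformat()
            if next_retry is not None
            else ""
        )
        result.append((filename, retry_count, when))
    return result
```

Usage would be along the lines of `failed_uploads(sqlite3.connect("state.db"))`.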
Moving to a new server
- Copy the project directory to ~/matrix-paperless-ingest (including .env and state.db)
- Install libolm3 and python3-olm via the system package manager
- Run python3 -m venv .venv --system-site-packages && .venv/bin/pip install -r requirements.txt
- Install the systemd user service as above