Commit graph

6 commits

Author SHA1 Message Date
9663232d84 ingest: Use timestamp-based filenames for WhatsApp files
WhatsApp files arrive with empty or non-descriptive body fields. Rather
than falling back to generic names like "image.jpg" or "document.pdf",
generate names from the event timestamp:

  whatsapp_YYYY-MM-DD_HH-MM-SS.jpg
  whatsapp_YYYY-MM-DD_HH-MM-SS.pdf

If the body contains text (e.g. a caption), it is prepended:

  Test - whatsapp_2026-03-11_23-35-13.pdf

Files whose body already ends in the correct extension are used as-is.
2026-03-12 00:08:26 +00:00
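
The naming rule from 9663232d84 could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name build_filename, its parameters, the use of UTC, and the acceptance of ".jpeg" alongside ".jpg" are all assumptions.

```python
from datetime import datetime, timezone


def build_filename(body: str, timestamp_ms: int, is_pdf: bool) -> str:
    """Hypothetical helper: derive an upload filename from body and event time."""
    # Assumption: ".jpeg" counts as a correct extension for JPEG files.
    extensions = (".pdf",) if is_pdf else (".jpg", ".jpeg")
    body = (body or "").strip()

    # A body that already ends in the correct extension is used as-is.
    if body.lower().endswith(extensions):
        return body

    # Assumption: the event timestamp is in milliseconds and rendered in UTC.
    stamp = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    generated = f"whatsapp_{stamp:%Y-%m-%d_%H-%M-%S}{extensions[0]}"

    # A non-empty body (e.g. a caption) is prepended to the generated name.
    return f"{body} - {generated}" if body else generated
```

With a body of "Test" and an event time of 2026-03-11 23:35:13 UTC, this sketch yields the name shown in the commit message.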
fa4662b5f3 ingest: Determine PDF vs JPEG from event type, not filename
WhatsApp bridge files may have arbitrary body text (e.g. "Test") that
does not end in .pdf, causing the filename-based magic byte check to
apply JPEG validation to PDF files and reject them.

Pass is_pdf through extract_event_fields and process_event based on
the Matrix event type (RoomMessageFile → PDF, RoomMessageImage → JPEG,
BadEvent → inferred from msgtype), so validation and content-type are
always correct regardless of the filename.
2026-03-12 00:03:18 +00:00
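
The event-type mapping described in fa4662b5f3 might look like the sketch below. To stay self-contained it dispatches on the event class name rather than importing the Matrix client library. The function name infer_is_pdf, the decision to raise on unknown input, and the mapping of the msgtype values "m.file" and "m.image" for BadEvent are assumptions, not taken from the commit.

```python
from typing import Optional


def infer_is_pdf(event_type: str, msgtype: Optional[str] = None) -> bool:
    """Hypothetical helper: decide PDF vs JPEG without looking at the filename."""
    if event_type == "RoomMessageFile":
        return True   # file events are treated as PDF
    if event_type == "RoomMessageImage":
        return False  # image events are treated as JPEG
    if event_type == "BadEvent":
        # Assumption: the raw msgtype of the event content is passed in.
        if msgtype == "m.file":
            return True
        if msgtype == "m.image":
            return False
    raise ValueError(f"unsupported event: {event_type!r} / {msgtype!r}")
```

Because the result no longer depends on the body text, a PDF with the body "Test" is validated as a PDF rather than as a JPEG.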
f49ea1dbc5 ingest: Assign uploaded documents to a configurable Paperless owner
The post_document endpoint does not support setting ownership on upload,
so after a successful upload the document is PATCHed to set the owner.

Add optional PAPERLESS_OWNER_ID env var. When set, every newly uploaded
document is assigned to that Paperless user ID via PATCH /api/documents/{id}/.
2026-03-11 23:55:00 +00:00
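
The follow-up PATCH from f49ea1dbc5 could be built along these lines. This is a sketch under stated assumptions: the helper name build_owner_patch, the base URL and token parameters, token-based authentication, and the JSON field name "owner" are not confirmed by the commit message, and the document ID is assumed to be known already. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Optional


def build_owner_patch(
    base_url: str, token: str, document_id: int, owner_id: Optional[int]
) -> Optional[urllib.request.Request]:
    """Hypothetical helper: build the PATCH that assigns a document owner."""
    # PAPERLESS_OWNER_ID is optional; with no owner configured, nothing is sent.
    if owner_id is None:
        return None

    url = f"{base_url.rstrip('/')}/api/documents/{document_id}/"
    # Assumption: the owner is set through an "owner" field holding the user ID.
    payload = json.dumps({"owner": owner_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PATCH",
        headers={
            "Authorization": f"Token {token}",  # assumption: token auth
            "Content-Type": "application/json",
        },
    )
```

The caller would read PAPERLESS_OWNER_ID from the environment, convert it to an integer when present, and pass the resulting request to urllib.request.urlopen.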
0aa044eead ingest: Accept RoomMessageFile events regardless of body content
WhatsApp bridge files (e.g. PDFs) may arrive with an empty body field,
causing the previous .pdf extension check to silently skip them. Accept
all RoomMessageFile events and fall back to "document.pdf" as filename.
File content is still validated via magic bytes before upload.
2026-03-11 23:35:23 +00:00
eec2d076e4 ingest: Accept RoomMessageImage events regardless of body content
WhatsApp bridge images arrive as RoomMessageImage events with an empty
body field, so the previous .jpg/.jpeg extension check silently rejected
all of them. Accept all RoomMessageImage events and fall back to
"image.jpg" as filename when body is empty. File content is still
validated via magic bytes before upload.
2026-03-11 23:32:15 +00:00
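
The magic byte validation mentioned in eec2d076e4 and 0aa044eead can be illustrated as below. The signatures are the standard leading bytes of the two formats; the function name has_valid_magic and its parameters are hypothetical, and the actual check in the repository may be stricter.

```python
PDF_MAGIC = b"%PDF-"          # PDF files begin with this header
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG start-of-image marker plus next marker byte


def has_valid_magic(data: bytes, is_pdf: bool) -> bool:
    """Hypothetical helper: check that the content starts with the expected signature."""
    magic = PDF_MAGIC if is_pdf else JPEG_MAGIC
    return data.startswith(magic)
```

Checking the content rather than the name is what allows events with an empty body to be accepted safely.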
d5a3528cde ingest: Initial implementation
Bot that monitors a Matrix room for PDF and JPEG files and uploads
them to Paperless-ngx. Supports E2E encrypted attachments via inline
AES keys, historical catchup on startup, exponential backoff retries
with a permanent give-up after max attempts, file format validation
via magic bytes, Uptime Kuma heartbeat monitoring, and email alerts
on errors via SMTP SSL.
2026-03-11 13:45:28 +00:00
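
The retry policy named in d5a3528cde (exponential backoff with a permanent give-up) might be sketched as follows. The function name backoff_delay and all numeric defaults are assumptions chosen for illustration; the commit message does not state the actual values.

```python
from typing import Optional


def backoff_delay(
    failed_attempts: int,
    base: float = 2.0,        # assumption: first retry waits 2 seconds
    cap: float = 300.0,       # assumption: delays never exceed 5 minutes
    max_attempts: int = 5,    # assumption: give up after 5 failures
) -> Optional[float]:
    """Hypothetical helper: seconds to wait before the next retry, or None to give up."""
    if failed_attempts >= max_attempts:
        return None  # permanent give-up: the caller stops retrying
    # The delay doubles with every failure: base, 2*base, 4*base, ...
    return min(base * 2 ** (failed_attempts - 1), cap)
```

A caller would sleep for the returned number of seconds after each failure and treat None as the signal to stop retrying that event.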