From 26eb13ab2fb25b03a05add24acb11ab44c21cfa0 Mon Sep 17 00:00:00 2001
From: MajorLinux
Date: Fri, 5 Jun 2026 14:04:22 -0400
Subject: [PATCH] troubleshooting: document majormail client-connectivity incident (2026-06-05)

- New page: Dovecot IMAP vsz_limit OOM from a bloated/corrupt index.log (152M index on an empty folder killed IMAP children with error 83).
- fail2ban IMAP self-ban: add permanent ignoreip-whitelist fix + dynamic-IP caveat.
- firewalld mail ports: add 'submission/587 never added' variant + correct Fedora service name; note Ansible now manages the full mail-service set.
- Index + SUMMARY updated with the new page.
---
 05-troubleshooting/index.md | 1 +
 ...ovecot-imap-oom-vsz-limit-bloated-index.md | 105 ++++++++++++++++++
 .../fail2ban-imap-self-ban-mail-client.md | 17 ++-
 .../networking/firewalld-mail-ports-reset.md | 19 +++-
 SUMMARY.md | 1 +
 5 files changed, 141 insertions(+), 2 deletions(-)
 create mode 100644 05-troubleshooting/networking/dovecot-imap-oom-vsz-limit-bloated-index.md

diff --git a/05-troubleshooting/index.md b/05-troubleshooting/index.md
index 682ba4d..1388a88 100644
--- a/05-troubleshooting/index.md
+++ b/05-troubleshooting/index.md
@@ -14,6 +14,7 @@ Practical fixes for common Linux, networking, and application problems.
 - [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
 - [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
+- [Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log](networking/dovecot-imap-oom-vsz-limit-bloated-index.md)
 - [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
 - [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](networking/tailscale-status-json-hostname-localhost-ios.md)
 - [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](networking/rsync-tailscale-teardown-stall.md)
diff --git a/05-troubleshooting/networking/dovecot-imap-oom-vsz-limit-bloated-index.md b/05-troubleshooting/networking/dovecot-imap-oom-vsz-limit-bloated-index.md
new file mode 100644
index 0000000..d58d576
--- /dev/null
+++ b/05-troubleshooting/networking/dovecot-imap-oom-vsz-limit-bloated-index.md
@@ -0,0 +1,105 @@
+---
+title: "Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log"
+domain: troubleshooting
+category: networking
+tags: [dovecot, imap, oom, vsz_limit, index, maildir, fedora, mail]
+status: published
+created: 2026-06-05
+updated: 2026-06-05
+---
+# Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log
+
+All IMAP clients fail to connect or hang while syncing a particular folder, even though the box has plenty of free RAM and disk. The cause is a corrupt/bloated per-folder `dovecot.index.log` that overflows Dovecot's **per-process** virtual-memory cap (`default_vsz_limit`, 256 MB by default) when it is `mmap`ed — so the IMAP child is killed on every sync attempt.
+
+> First seen on **majormail** (Fedora 44, Dovecot 2.4.4), 2026-06-05. An empty `.Later` folder had a 152 MB `dovecot.index.log`.
+
+## Symptoms
+
+- Multiple/all IMAP clients can't connect, or connect but never finish syncing.
+- Often only **one folder** is the trigger — the client hangs the moment it opens/syncs that folder.
+- The server is otherwise healthy: Postfix delivering, Dovecot `active`, ports listening, TLS valid.
+- `free -h` shows the host has plenty of RAM available — this is **not** a host-level OOM.
+
+## Log Signature
+
+`journalctl -u dovecot` shows, per affected user/folder:
+
+```
+imap(user@dom): Fatal: block_alloc(8388608): Out of memory
+imap(user@dom): Fatal: master: service(imap): child NNN returned error 83
+  (Out of memory (service imap { vsz_limit=256 MB }, you may need to increase it) ...)
+imap(user@dom): Error: Mailbox X: mmap(size=158769660) failed ...: Cannot allocate memory
+imap(user@dom): Error: Mailbox X: Failed to map transaction log .../dovecot.index.log
+  at sync_offset=N after locking: Beginning of the log isn't available
+```
+
+The two tells: **`error 83` naming `vsz_limit`** (Dovecot literally suggests raising it), and an **`mmap(size=…)` value that is huge relative to the folder's real contents**.
+
+## Why It Happens
+
+Each Maildir folder has its own `dovecot.index.log` transaction log. If it grows or corrupts to tens/hundreds of MB (here: 152 MB on a folder with **zero** messages), Dovecot tries to `mmap` the whole thing into the IMAP worker. That worker runs under `default_vsz_limit` (compiled default **256 MB**). The mapping blows the cap, the kernel refuses the allocation, and the child dies with `error 83`. Because every client re-syncs that folder on connect, it fails for **all** of them at once.
+
+Key point: the limit is **per-process virtual size**, not host memory. A box with 2.5 GB free RAM still hits it.
+
+## Diagnosis
+
+```bash
+# 1. The smoking gun — OOM / error 83 mentioning vsz_limit
+journalctl -u dovecot --since "-3h" | grep -iE "out of memory|error 83|vsz_limit"
+
+# 2.
Confirm it is NOT a host OOM (expect plenty free)
+free -h ; df -h /var/vmail
+
+# 3. Current per-process cap (256 M = compiled default, no explicit setting)
+doveconf default_vsz_limit
+
+# 4. Find the bloated index — size wildly out of proportion to message count
+du -sh /var/vmail///.
+ls -lh /var/vmail///./dovecot.index*
+find /var/vmail///./{cur,new} -type f | wc -l   # real message count (ls on two dirs would also count its header lines)
+```
+
+## Fix
+
+Two parts: raise the cap, and repair the bloated index.
+
+```bash
+# (1) Raise default_vsz_limit. Flat Fedora dovecot.conf has no !include conf.d/*,
+#     so add it at top-level scope (after `protocols = ...`):
+#     default_vsz_limit = 1G
+doveconf -n >/dev/null && echo CONFIG_OK   # validate
+systemctl restart dovecot                  # required to apply the new vsz
+doveconf default_vsz_limit                 # -> 1G
+
+# (2a) Rebuild the index from the real messages
+doveadm force-resync -u
+
+# (2b) If force-resync leaves a stale multi-MB index.log AND the folder has
+#      0 message files, it is safe to delete the index files and let Dovecot
+#      regenerate them clean (152 M -> 24 K in the original case):
+L=/var/vmail///.
+rm -f "$L"/dovecot.index "$L"/dovecot.index.log "$L"/dovecot.index.cache "$L"/dovecot.index.backup
+doveadm mailbox status -u "messages vsize"   # regenerates
+```
+
+Verify: `journalctl -u dovecot --since "-2m" | grep -ic "out of memory"` returns `0`, and the folder reads without error.
+
+> **Only delete index files when the folder's `cur/` and `new/` are empty** (or you are certain the messages are intact). The index is rebuildable from the message files; deleting indexes never deletes mail, but verify the count first.
+
+## Codified
+
+majormail's role sets this permanently so the cap survives a config rebuild:
+`roles/majormail/templates/dovecot.conf.j2` → `default_vsz_limit = 1G` (MajorAnsible commit `a69ac5d`).
+
+## Key Notes
+
+- **`error 83` = vsz, not host RAM.** Don't go chasing free memory — read the parenthetical in the error; Dovecot names the exact setting.
+- **A huge index on a tiny/empty folder is the corruption,** not the messages. Resync, and delete the index files if the folder is empty.
+- **`tcpdump` may not be installed** on a minimal Fedora mail host — don't conclude "no packets arrived" from an empty capture without confirming the tool exists (`which tcpdump`).
+- 1 G is a comfortable headroom for large mailboxes; raise further only if a genuinely large single mailbox needs it.
+
+## Related
+
+- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](fail2ban-imap-self-ban-mail-client.md)
+- [firewalld: Mail Ports Wiped After Reload](firewalld-mail-ports-reset.md)
+- [SELinux: Dovecot vmail Context](../selinux-dovecot-vmail-context.md)
diff --git a/05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md b/05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md
index 6acdd1b..a5fa470 100644
--- a/05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md
+++ b/05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md
@@ -5,7 +5,7 @@ category: networking
 tags: [fail2ban, imap, dovecot, email, self-ban]
 status: published
 created: 2026-04-02
-updated: 2026-04-02
+updated: 2026-06-05
 ---
 # Mail Client Stops Receiving: Fail2ban IMAP Self-Ban
 
@@ -79,6 +79,21 @@ fail2ban-client set dovecot-invalid unbanip
 
 Mail should resume immediately without restarting any services.
 
+### Permanent fix — whitelist the trusted IP (`ignoreip`)
+
+Unbanning is temporary: if the client keeps failing auth (wrong password, stale token), the same IP gets re-banned within minutes. For a **known, trusted network** (e.g.
your home egress IP) add it to Fail2ban's `ignoreip` so it can never be banned:
+
+```bash
+# /etc/fail2ban/jail.local — [DEFAULT] section, applies to ALL jails
+ignoreip = 127.0.0.1/8 ::1 100.64.0.0/10
+fail2ban-client reload
+fail2ban-client get postfix-sasl ignoreip # confirm the IP is listed
+```
+
+On majormail this is codified via `fail2ban_ignoreip` in `host_vars/majormail-hetzner/vars.yml` (MajorAnsible commit `fa91fe3`).
+
+> ⚠️ `ignoreip` takes a **public egress** IP, which may be dynamic. If your ISP reassigns it, the whitelist points at a stale address and bans can return — recheck the egress IP first. Use a subnet only if you trust the whole range.
+
 ---
 
 ## 🔁 Why This Happens
diff --git a/05-troubleshooting/networking/firewalld-mail-ports-reset.md b/05-troubleshooting/networking/firewalld-mail-ports-reset.md
index a6e9da2..e51a9fe 100644
--- a/05-troubleshooting/networking/firewalld-mail-ports-reset.md
+++ b/05-troubleshooting/networking/firewalld-mail-ports-reset.md
@@ -5,7 +5,7 @@ category: networking
 tags: [firewalld, mail, imap, fedora, ports]
 status: published
 created: 2026-04-02
-updated: 2026-04-02
+updated: 2026-06-05
 ---
 # firewalld: Mail Ports Wiped After Reload (IMAP + Webmail Outage)
 
@@ -66,8 +66,24 @@ Expected output:
 dhcpv6-client http https imap imaps mdns smtp smtp-submission smtps ssh
 ```
 
+## Variant: One port (587) fails while the rest work — service never added
+
+A subtler version of this: IMAP (993) and implicit-TLS submission (465) work fine, but **only STARTTLS submission on 587 fails** — clients on 587 get "no route to host." This is **not** a reload wipe; the `submission` service was simply never added during initial setup (the box's mail ports were opened by hand and one was missed).
+
+```bash
+# Each mail service, individually — submission will be the odd one out
+for s in smtp smtps submission imap imaps; do printf "%-12s " "$s"; firewall-cmd --query-service=$s; done
+
+# Fix (Fedora 44 / firewalld names the 587 service `submission`, NOT `smtp-submission`)
+firewall-cmd --permanent --zone=public --add-service=submission
+firewall-cmd --reload
+```
+
+> On majormail the full mail-service set is now managed declaratively in `roles/majormail/tasks/postfix.yml` (smtp/smtps/**submission**/imap/imaps), so a hand-edit can't leave 587 behind again (MajorAnsible commit `b75f14a`). Seen 2026-06-05.
+
 ## Key Notes
 
+- **Service name differs by distro/version:** the 587 service is `submission` on current Fedora firewalld; older/other docs may say `smtp-submission`. Verify with `firewall-cmd --get-services | tr ' ' '\n' | grep submission`.
 - **Always use `--permanent`** when adding services to firewalld on a server. Without it, the rule exists only until the next reload.
 - **Fail2ban + firewalld**: Fail2ban uses firewalld as its ban backend (`firewallcmd-rich-rules`). When Fail2ban restarts or crashes, it may trigger a `firewall-cmd --reload`, resetting any runtime-only rules.
 - **Verify after any firewall event**: After Fail2ban restarts, system reboots, or `firewall-cmd --reload`, always confirm mail services are still present with `firewall-cmd --list-services --zone=public`.
@@ -77,3 +93,4 @@ dhcpv6-client http https imap imaps mdns smtp smtp-submission smtps ssh
 
 - [Linux Server Hardening Checklist](../../02-selfhosting/security/linux-server-hardening-checklist.md)
 - [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](fail2ban-imap-self-ban-mail-client.md)
+- [Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log](dovecot-imap-oom-vsz-limit-bloated-index.md)
diff --git a/SUMMARY.md b/SUMMARY.md
index 8730d92..cc6d664 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -80,6 +80,7 @@ updated: 2026-05-15T09:00
  * [Postfix + SendGrid: TLS Handshake Failure (Port 465 vs 587)](05-troubleshooting/networking/postfix-sendgrid-tls-handshake-failure.md)
  * [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
  * [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
+ * [Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log](05-troubleshooting/networking/dovecot-imap-oom-vsz-limit-bloated-index.md)
  * [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
  * [ssh.socket Unreachable After Reboot (Tailscale Race Condition)](05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md)
  * [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)