troubleshooting: document majormail client-connectivity incident (2026-06-05)

- New page: Dovecot IMAP vsz_limit OOM from a bloated/corrupt index.log
  (152M index on an empty folder killed IMAP children with error 83).
- fail2ban IMAP self-ban: add permanent ignoreip-whitelist fix + dynamic-IP caveat.
- firewalld mail ports: add 'submission/587 never added' variant + correct
  Fedora service name; note Ansible now manages the full mail-service set.
- Index + SUMMARY updated with the new page.
Marcus Summers 2026-06-05 14:04:22 -04:00
parent 5260548caa
commit 26eb13ab2f
5 changed files with 141 additions and 2 deletions


@ -14,6 +14,7 @@ Practical fixes for common Linux, networking, and application problems.
- [Apache Outage: Fail2ban Self-Ban + Missing iptables Rules](networking/fail2ban-self-ban-apache-outage.md)
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](networking/fail2ban-imap-self-ban-mail-client.md)
- [firewalld: Mail Ports Wiped After Reload](networking/firewalld-mail-ports-reset.md)
- [Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log](networking/dovecot-imap-oom-vsz-limit-bloated-index.md)
- [Tailscale SSH: Unexpected Re-Authentication Prompt](networking/tailscale-ssh-reauth-prompt.md)
- [iOS Tailscale Clients Report HostName="localhost" — Breaks /etc/hosts Generators](networking/tailscale-status-json-hostname-localhost-ios.md)
- [rsync over Tailscale: Hung in TCP Teardown After Transfer Completes](networking/rsync-tailscale-teardown-stall.md)


@ -0,0 +1,105 @@
---
title: "Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log"
domain: troubleshooting
category: networking
tags: [dovecot, imap, oom, vsz_limit, index, maildir, fedora, mail]
status: published
created: 2026-06-05
updated: 2026-06-05
---
# Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log
All IMAP clients fail to connect or hang while syncing a particular folder, even though the box has plenty of free RAM and disk. The cause is a corrupt/bloated per-folder `dovecot.index.log` that overflows Dovecot's **per-process** virtual-memory cap (`default_vsz_limit`, 256 MB by default) when it is `mmap`ed — so the IMAP child is killed on every sync attempt.
> First seen on **majormail** (Fedora 44, Dovecot 2.4.4), 2026-06-05. An empty `.Later` folder had a 152 MB `dovecot.index.log`.
## Symptoms
- Multiple/all IMAP clients can't connect, or connect but never finish syncing.
- Often only **one folder** is the trigger — the client hangs the moment it opens/syncs that folder.
- The server is otherwise healthy: Postfix delivering, Dovecot `active`, ports listening, TLS valid.
- `free -h` shows the host has plenty of RAM available — this is **not** a host-level OOM.
## Log Signature
`journalctl -u dovecot` shows, per affected user/folder:
```
imap(user@dom): Fatal: block_alloc(8388608): Out of memory
imap(user@dom): Fatal: master: service(imap): child NNN returned error 83
(Out of memory (service imap { vsz_limit=256 MB }, you may need to increase it) ...)
imap(user@dom): Error: Mailbox X: mmap(size=158769660) failed ...: Cannot allocate memory
imap(user@dom): Error: Mailbox X: Failed to map transaction log .../dovecot.index.log
at sync_offset=N after locking: Beginning of the log isn't available
```
The two tells: **`error 83` naming `vsz_limit`** (Dovecot literally suggests raising it), and an **`mmap(size=…)` value that is huge relative to the folder's real contents**.
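
To judge whether an `mmap(size=…)` value is "huge", it helps to convert it to MiB and compare it with the index file size on disk. The following is a minimal sketch that assumes the log line format shown above; `mmap_size_mib` is a hypothetical helper name, not a Dovecot tool.

```bash
# Hypothetical helper: read Dovecot log lines on stdin and print every
# mmap(size=N) value as whole MiB (integer division, so it rounds down).
mmap_size_mib() {
  grep -oE 'mmap\(size=[0-9]+\)' | grep -oE '[0-9]+' | while read -r bytes; do
    echo "$(( bytes / 1024 / 1024 )) MiB"
  done
}

# Example using the log line from this incident:
echo 'imap(user@dom): Error: Mailbox X: mmap(size=158769660) failed ...: Cannot allocate memory' \
  | mmap_size_mib
```

With the incident's log line this prints `151 MiB`, in line with the roughly 152 MB index file found on disk. In live use the input would come from `journalctl -u dovecot` rather than `echo`.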
## Why It Happens
Each Maildir folder has its own `dovecot.index.log` transaction log. If it grows or corrupts to tens/hundreds of MB (here: 152 MB on a folder with **zero** messages), Dovecot tries to `mmap` the whole thing into the IMAP worker. That worker runs under `default_vsz_limit` (compiled default **256 MB**). The mapping blows the cap, the kernel refuses the allocation, and the child dies with `error 83`. Because every client re-syncs that folder on connect, it fails for **all** of them at once.
Key point: the limit is **per-process virtual size**, not host memory. A box with 2.5 GB free RAM still hits it.
## Diagnosis
```bash
# 1. The smoking gun — OOM / error 83 mentioning vsz_limit
journalctl -u dovecot --since "-3h" | grep -iE "out of memory|error 83|vsz_limit"
# 2. Confirm it is NOT a host OOM (expect plenty free)
free -h ; df -h /var/vmail
# 3. Current per-process cap (256 M = compiled default, no explicit setting)
doveconf default_vsz_limit
# 4. Find the bloated index — size wildly out of proportion to message count
du -sh /var/vmail/<domain>/<user>/.<Folder>
ls -lh /var/vmail/<domain>/<user>/.<Folder>/dovecot.index*
find /var/vmail/<domain>/<user>/.<Folder>/{cur,new} -type f | wc -l   # real message count (ls on two dirs would also count its headers)
```
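
When the offending folder is not yet known, step 4 can be widened into a search of the whole mail root. This is a sketch under assumptions: it relies on GNU `find` (for `-printf`), assumes the index files live inside each Maildir folder as they do here, and `find_bloated_index_logs` and the 50 MiB default threshold are illustrative choices, not Dovecot settings.

```bash
# Hypothetical helper: list every dovecot.index.log under a mail root that is
# larger than a threshold (in MiB), biggest first, as "<bytes><TAB><path>".
find_bloated_index_logs() {
  local root="$1" min_mib="${2:-50}"
  find "$root" -type f -name 'dovecot.index.log' -size +"${min_mib}"M \
    -printf '%s\t%p\n' 2>/dev/null | sort -rn
}

# Example (path and threshold are assumptions; adjust to the local layout):
# find_bloated_index_logs /var/vmail 50
```

Any hit still needs the message-count comparison from step 4, since a large index on a genuinely large folder is not necessarily corrupt.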
## Fix
Two parts: raise the cap, and repair the bloated index.
```bash
# (1) Raise default_vsz_limit. Flat Fedora dovecot.conf has no !include conf.d/*,
# so add it at top-level scope (after `protocols = ...`):
# default_vsz_limit = 1G
doveconf -n >/dev/null && echo CONFIG_OK # validate
systemctl restart dovecot # required to apply the new vsz
doveconf default_vsz_limit # -> 1G
# (2a) Rebuild the index from the real messages
doveadm force-resync -u <user@dom> <Folder>
# (2b) If force-resync leaves a stale multi-MB index.log AND the folder has
# 0 message files, it is safe to delete the index files and let Dovecot
# regenerate them clean (152 M -> 24 K in the original case):
L="/var/vmail/<domain>/<user>/.<Folder>"   # quoted: folder names may contain spaces
rm -f -- "$L"/dovecot.index "$L"/dovecot.index.log "$L"/dovecot.index.cache "$L"/dovecot.index.backup
doveadm mailbox status -u <user@dom> "messages vsize" <Folder> # regenerates
```
Verify: `journalctl -u dovecot --since "-2m" | grep -ic "out of memory"` returns `0`, and the folder reads without error.
> **Only delete index files when the folder's `cur/` and `new/` are empty** (or you are certain the messages are intact). The index is rebuildable from the message files; deleting indexes never deletes mail, but verify the count first.
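
The "only when empty" rule can be enforced in the command itself rather than left to memory. The sketch below is one possible guard, not part of the original fix: `purge_index_if_empty` is a hypothetical helper that refuses to delete anything unless the folder has both `cur/` and `new/` and neither contains a file.

```bash
# Hypothetical helper: delete a folder's Dovecot index files only when the
# folder is a Maildir with no message files in cur/ or new/.
# Returns 1 and deletes nothing otherwise.
purge_index_if_empty() {
  local dir="$1" count
  if [ ! -d "$dir/cur" ] || [ ! -d "$dir/new" ]; then
    echo "refusing: $dir has no cur/ and new/ directories" >&2
    return 1
  fi
  # Count regular files only, so the directories themselves are not counted.
  count="$(find "$dir/cur" "$dir/new" -type f | wc -l)"
  if [ "$count" -ne 0 ]; then
    echo "refusing: $count message file(s) in $dir" >&2
    return 1
  fi
  rm -f -- "$dir"/dovecot.index "$dir"/dovecot.index.log \
           "$dir"/dovecot.index.cache "$dir"/dovecot.index.backup
}

# Example (placeholder path, as in the fix above):
# purge_index_if_empty "/var/vmail/<domain>/<user>/.<Folder>"
```

After a successful purge, the `doveadm mailbox status` call from step (2b) regenerates the index as before.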
## Codified
majormail's role sets this permanently so the cap survives a config rebuild:
`roles/majormail/templates/dovecot.conf.j2` contains `default_vsz_limit = 1G` (MajorAnsible commit `a69ac5d`).
## Key Notes
- **`error 83` = vsz, not host RAM.** Don't go chasing free memory — read the parenthetical in the error; Dovecot names the exact setting.
- **A huge index on a tiny/empty folder is the corruption,** not the messages. Resync first; if the folder is empty and the index stays bloated, delete the index files and let Dovecot regenerate them.
- **`tcpdump` may not be installed** on a minimal Fedora mail host — don't conclude "no packets arrived" from an empty capture without confirming the tool exists (`which tcpdump`).
- 1 G is comfortable headroom for large mailboxes; raise it further only if a single, genuinely large mailbox needs more.
## Related
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](fail2ban-imap-self-ban-mail-client.md)
- [firewalld: Mail Ports Wiped After Reload](firewalld-mail-ports-reset.md)
- [SELinux: Dovecot vmail Context](../selinux-dovecot-vmail-context.md)


@ -5,7 +5,7 @@ category: networking
tags: [fail2ban, imap, dovecot, email, self-ban]
status: published
created: 2026-04-02
updated: 2026-04-02 updated: 2026-06-05
---
# Mail Client Stops Receiving: Fail2ban IMAP Self-Ban
@ -79,6 +79,21 @@ fail2ban-client set dovecot-invalid unbanip <IP>
Mail should resume immediately without restarting any services.
### Permanent fix — whitelist the trusted IP (`ignoreip`)
Unbanning is temporary: if the client keeps failing auth (wrong password, stale token), the same IP gets re-banned within minutes. For a **known, trusted network** (e.g. your home egress IP) add it to Fail2ban's `ignoreip` so it can never be banned:
```bash
# In /etc/fail2ban/jail.local, under [DEFAULT] (applies to ALL jails), set:
#   ignoreip = 127.0.0.1/8 ::1 100.64.0.0/10 <home_ip>
fail2ban-client reload
fail2ban-client get postfix-sasl ignoreip # confirm the IP is listed
```
On majormail this is codified via `fail2ban_ignoreip` in `host_vars/majormail-hetzner/vars.yml` (MajorAnsible commit `fa91fe3`).
> ⚠️ The address whitelisted here is a **public egress** IP, which may be dynamic. If your ISP reassigns it, the whitelist points at a stale address and bans can return, so recheck the egress IP first. Whitelist a subnet only if you trust the whole range.
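
Rechecking can be as simple as confirming that the current egress address still appears in the configured list. The sketch below is an illustration only: `ignoreip_lists` is a hypothetical helper, it does an exact string match (no CIDR membership test, no continuation lines), and `203.0.113.7` in the usage comment is a documentation address standing in for the real egress IP.

```bash
# Hypothetical helper: succeed if the given address appears literally on an
# "ignoreip = ..." line of a jail file. Exact string match only.
ignoreip_lists() {
  local ip="$1" file="${2:-/etc/fail2ban/jail.local}"
  awk -v ip="$ip" '
    /^[ \t]*ignoreip[ \t]*=/ {
      sub(/^[^=]*=/, "")                      # keep only the value
      n = split($0, parts, /[ \t,]+/)         # entries may be space or comma separated
      for (i = 1; i <= n; i++) if (parts[i] == ip) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$file"
}

# Example (address is a placeholder for the current egress IP):
# ignoreip_lists 203.0.113.7 /etc/fail2ban/jail.local && echo listed || echo "NOT listed"
```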
---
## 🔁 Why This Happens


@ -5,7 +5,7 @@ category: networking
tags: [firewalld, mail, imap, fedora, ports]
status: published
created: 2026-04-02
updated: 2026-04-02 updated: 2026-06-05
---
# firewalld: Mail Ports Wiped After Reload (IMAP + Webmail Outage)
@ -66,8 +66,24 @@ Expected output:
dhcpv6-client http https imap imaps mdns smtp smtp-submission smtps ssh
``` ```
## Variant: One port (587) fails while the rest work — service never added
A subtler version of this: IMAP (993) and implicit-TLS submission (465) work fine, but **only STARTTLS submission on 587 fails** — clients on 587 get "no route to host." This is **not** a reload wipe; the `submission` service was simply never added during initial setup (the box's mail ports were opened by hand and one was missed).
```bash
# Each mail service, individually — submission will be the odd one out
for s in smtp smtps submission imap imaps; do printf "%-12s " "$s"; firewall-cmd --query-service=$s; done
# Fix (Fedora 44 / firewalld names the 587 service `submission`, NOT `smtp-submission`)
firewall-cmd --permanent --zone=public --add-service=submission
firewall-cmd --reload
```
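
The per-service loop above prints a yes/no for every service; for a scripted check it can be handier to print only what is missing. A minimal sketch, assuming the single-line, space-separated output of `firewall-cmd --list-services`; `missing_services` is a hypothetical helper name, and the expected service names should match whatever `firewall-cmd --get-services` reports on the host.

```bash
# Hypothetical helper: $1 is the space-separated output of
# `firewall-cmd --list-services`; the remaining arguments are the services
# that should be present. Prints each expected service that is absent.
missing_services() {
  local have=" $1 " svc
  shift
  for svc in "$@"; do
    case "$have" in
      *" $svc "*) ;;          # present as a whole word
      *) echo "$svc" ;;
    esac
  done
}

# Example with a list in which 587 was never opened:
missing_services "dhcpv6-client http https imap imaps smtp smtps ssh" \
  smtp smtps submission imap imaps
```

The example prints `submission`. Against a live host, the first argument would be `"$(firewall-cmd --list-services --zone=public)"`.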
> On majormail the full mail-service set is now managed declaratively in `roles/majormail/tasks/postfix.yml` (smtp/smtps/**submission**/imap/imaps), so a hand-edit can't leave 587 behind again (MajorAnsible commit `b75f14a`). Seen 2026-06-05.
## Key Notes
- **Service name differs by distro/version:** the 587 service is `submission` on current Fedora firewalld; older/other docs may say `smtp-submission`. Verify with `firewall-cmd --get-services | tr ' ' '\n' | grep submission`.
- **Always use `--permanent`** when adding services to firewalld on a server. Without it, the rule exists only until the next reload.
- **Fail2ban + firewalld**: Fail2ban uses firewalld as its ban backend (`firewallcmd-rich-rules`). When Fail2ban restarts or crashes, it may trigger a `firewall-cmd --reload`, resetting any runtime-only rules.
- **Verify after any firewall event**: After Fail2ban restarts, system reboots, or `firewall-cmd --reload`, always confirm mail services are still present with `firewall-cmd --list-services --zone=public`.
@ -77,3 +93,4 @@ dhcpv6-client http https imap imaps mdns smtp smtp-submission smtps ssh
- [Linux Server Hardening Checklist](../../02-selfhosting/security/linux-server-hardening-checklist.md)
- [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](fail2ban-imap-self-ban-mail-client.md)
- [Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log](dovecot-imap-oom-vsz-limit-bloated-index.md)


@ -80,6 +80,7 @@ updated: 2026-05-15T09:00
* [Postfix + SendGrid: TLS Handshake Failure (Port 465 vs 587)](05-troubleshooting/networking/postfix-sendgrid-tls-handshake-failure.md)
* [Mail Client Stops Receiving: Fail2ban IMAP Self-Ban](05-troubleshooting/networking/fail2ban-imap-self-ban-mail-client.md)
* [firewalld: Mail Ports Wiped After Reload](05-troubleshooting/networking/firewalld-mail-ports-reset.md)
* [Dovecot IMAP Clients Fail to Sync: vsz_limit OOM from a Bloated Index Log](05-troubleshooting/networking/dovecot-imap-oom-vsz-limit-bloated-index.md)
* [Tailscale SSH: Unexpected Re-Authentication Prompt](05-troubleshooting/networking/tailscale-ssh-reauth-prompt.md)
* [ssh.socket Unreachable After Reboot (Tailscale Race Condition)](05-troubleshooting/networking/ssh-socket-tailscale-race-condition.md)
* [Fail2ban & UFW Rule Bloat Cleanup](05-troubleshooting/networking/fail2ban-ufw-rule-bloat-cleanup.md)