Merge cowork/majorair/wiki-updates-apr25 — 3 new articles + nav updates

Marcus Summers 2026-04-29 22:48:34 -04:00
commit 5d7ce294b6
10 changed files with 1038 additions and 329 deletions


@ -1,203 +0,0 @@
---
title: WSL2 Fedora 43 Training Environment Rebuild
domain: linux
category: distro-specific
tags:
- wsl2
- fedora
- unsloth
- pytorch
- cuda
- majorrig
- majortwin
status: published
created: 2026-03-16
updated: 2026-04-29T22:45
---
# WSL2 Fedora 43 Training Environment Rebuild
How to rebuild the MajorTwin training environment from scratch on MajorRig after a WSL2 loss. Covers Fedora 43 install, Python 3.11 via pyenv, PyTorch with CUDA, Unsloth, and llama.cpp for GGUF conversion.
## The Short Answer
```bash
# 1. Install Fedora 43 and move to D:
wsl --install -d FedoraLinux-43 --no-launch
wsl --export FedoraLinux-43 D:\WSL\fedora43.tar
wsl --unregister FedoraLinux-43
wsl --import FedoraLinux-43 D:\WSL\Fedora43 D:\WSL\fedora43.tar
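# 2. Create your user, then make it the default (as root inside the distro)
useradd -m -G wheel majorlinux && passwd majorlinux
echo "%wheel ALL=(ALL) ALL" > /etc/sudoers.d/wheel
printf '[boot]\nsystemd=true\n[user]\ndefault=majorlinux\n' > /etc/wsl.conf
# Then, from Windows: wsl --terminate FedoraLinux-43 (wsl.conf is read at distro start)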
# 3. Install Python 3.11 via pyenv, PyTorch, Unsloth
# See full steps below
```
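Because the `[user]` `default=` entry must name an account that already exists, a small guard helps when scripting this step. The sketch below assumes bash, running as root inside the distro; `write_wsl_conf` is a hypothetical helper name, not part of WSL:

```bash
# Hypothetical helper: write a wsl.conf that enables systemd and sets the default user.
# Refuses to write anything if the user has not been created yet.
write_wsl_conf() {
  local user="$1" target="${2:-/etc/wsl.conf}"
  if ! id -u "$user" >/dev/null 2>&1; then
    echo "write_wsl_conf: user '$user' does not exist; create it first" >&2
    return 1
  fi
  # printf expands \n the same way in every shell, unlike echo -e
  printf '[boot]\nsystemd=true\n[user]\ndefault=%s\n' "$user" > "$target"
}

# Usage, as root inside the distro:
#   write_wsl_conf majorlinux
# then from Windows: wsl --terminate FedoraLinux-43
```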
## Step 1 — System Packages
```bash
sudo dnf update -y
sudo dnf install -y git curl wget tmux screen htop rsync unzip \
python3 python3-pip python3-devel gcc gcc-c++ make cmake \
ninja-build pkg-config openssl-devel libffi-devel \
gawk patch readline-devel sqlite-devel
```
## Step 2 — Python 3.11 via pyenv
Fedora 43 ships Python 3.14 as its system `python3`, but this training stack is pinned to 3.11 for Unsloth. Install 3.11 alongside it with pyenv:
```bash
curl https://pyenv.run | bash
# Add pyenv to ~/.bashrc (quoted 'EOF' keeps $HOME/$PATH unexpanded until the shell starts)
cat >> ~/.bashrc << 'EOF'
export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init - bash)"
EOF
source ~/.bashrc
pyenv install 3.11.9
pyenv global 3.11.9
```
The tkinter warning during install is harmless — it's not needed for training.
## Step 3 — Training Virtualenv + PyTorch
```bash
mkdir -p ~/majortwin/{staging,datasets,outputs,scripts}
python -m venv ~/majortwin/venv
source ~/majortwin/venv/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
# Verify GPU
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```
Expected output: `True NVIDIA GeForce RTX 3080 Ti`
## Step 4 — Unsloth + Training Stack
```bash
source ~/majortwin/venv/bin/activate
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install transformers datasets accelerate peft trl bitsandbytes \
sentencepiece protobuf scipy einops
# Pin transformers for unsloth-zoo compatibility
pip install "transformers<=5.2.0"
# Verify
python -c "import unsloth; print('Unsloth OK')"
```
> [!warning] Never run `pip install -r requirements.txt` from inside llama.cpp while the training venv is active. It installs CPU-only PyTorch and downgrades transformers, breaking the CUDA setup.
## Step 5 — llama.cpp (CPU-only for GGUF conversion)
CUDA 12.8 is incompatible with Fedora 43's glibc for compiling llama.cpp (math function conflicts in `/usr/include/bits/mathcalls.h`). Build CPU-only — it's sufficient for GGUF conversion, which doesn't need GPU:
```bash
# Install GCC 14 (CUDA 12.8 doesn't support GCC 15 which Fedora 43 ships)
sudo dnf install -y gcc14 gcc14-c++
cd ~/majortwin
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build \
-DGGML_CUDA=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=/usr/bin/gcc-14 \
-DCMAKE_CXX_COMPILER=/usr/bin/g++-14
cmake --build build --config Release -j"$(nproc)" > /tmp/llama_build.log 2>&1 &
tail -f /tmp/llama_build.log   # Ctrl+C stops following the log; the build keeps running
```
Verify:
```bash
ls ~/majortwin/llama.cpp/build/bin/llama-quantize && echo "OK"
ls ~/majortwin/llama.cpp/build/bin/llama-cli && echo "OK"
```
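`ls` succeeds for any file, even a leftover from a failed build. A slightly stricter check, assuming the default `build/bin` layout shown above, tests that each binary exists and is executable (`check_llama_bins` is a hypothetical name):

```bash
# Hypothetical check: every listed llama.cpp tool must exist and be executable.
check_llama_bins() {
  local bindir="${1:-$HOME/majortwin/llama.cpp/build/bin}" missing=0 tool
  for tool in llama-quantize llama-cli; do
    if [[ -x "$bindir/$tool" ]]; then
      echo "OK       $tool"
    else
      echo "MISSING  $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_llama_bins && echo "llama.cpp build looks complete"
```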
## Step 6 — Shell Environment
```bash
cat >> ~/.bashrc << 'EOF'
# MajorInfrastructure Paths
export VAULT="/mnt/c/Users/majli/Documents/MajorVault"
export MAJORANSIBLE="/mnt/d/MajorAnsible"
export MAJORTWIN_D="/mnt/d/MajorTwin"
export MAJORTWIN_WSL="$HOME/majortwin"
export LLAMA_CPP="$HOME/majortwin/llama.cpp"
# Venv
alias mtwin='source $MAJORTWIN_WSL/venv/bin/activate && cd $MAJORTWIN_WSL'
alias vault='cd $VAULT'
alias ll='ls -lah --color=auto'
# SSH Fleet Aliases
alias majorhome='ssh majorlinux@100.120.209.106'
alias dca='ssh root@100.104.11.146'
alias majortoot='ssh root@100.110.197.17'
alias majorlinuxvm='ssh root@100.87.200.5'
alias majordiscord='ssh root@100.122.240.83'
alias majorlab='ssh root@100.86.14.126'
alias majormail='ssh root@100.84.165.52'
alias teelia='ssh root@100.120.32.69'
alias tttpod='ssh root@100.84.42.102'
alias majorrig='ssh majorlinux@100.98.47.29' # port 2222 retired 2026-03-25, fleet uses port 22
# DNF5
alias update='sudo dnf upgrade --refresh'
alias install='sudo dnf install'
alias clean='sudo dnf clean all'
# MajorTwin helpers
stage_dataset() {
cp "$VAULT/20-Projects/MajorTwin/03-Datasets/$1" "$MAJORTWIN_WSL/datasets/"
echo "Staged: $1"
}
export_gguf() {
cp "$MAJORTWIN_WSL/outputs/$1" "$MAJORTWIN_D/models/"
echo "Exported: $1 → $MAJORTWIN_D/models/"
}
EOF
source ~/.bashrc
```
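The two helpers above echo a success message even when `cp` fails (for example, a typo in the dataset name). A variant that only reports success when the copy actually worked is sketched below; it reuses the same variable names, and the stricter behaviour is a suggestion rather than how the originals behave:

```bash
# Variants of stage_dataset/export_gguf that fail loudly instead of always echoing success.
# Assumes VAULT, MAJORTWIN_WSL and MAJORTWIN_D are exported as in the block above.
stage_dataset() {
  local src="$VAULT/20-Projects/MajorTwin/03-Datasets/$1"
  [[ -n "$1" && -f "$src" ]] || { echo "stage_dataset: no such file: $src" >&2; return 1; }
  cp -- "$src" "$MAJORTWIN_WSL/datasets/" && echo "Staged: $1"
}

export_gguf() {
  local src="$MAJORTWIN_WSL/outputs/$1"
  [[ -n "$1" && -f "$src" ]] || { echo "export_gguf: no such file: $src" >&2; return 1; }
  cp -- "$src" "$MAJORTWIN_D/models/" && echo "Exported: $1 → $MAJORTWIN_D/models/"
}
```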
## Key Rules
- **Always activate venv before pip installs:** `source ~/majortwin/venv/bin/activate`
- **Never train from /mnt/c or /mnt/d** — stage files in `~/majortwin/staging/` first
- **Never put ML artifacts inside MajorVault** — models, venvs, artifacts go on D: drive
- **Max viable training model:** 7B at QLoRA 4-bit (RTX 3080 Ti, 12GB VRAM)
- **Current base model:** Qwen2.5-7B-Instruct (ChatML format — stop token: `<|im_end|>` only)
- **Transformers must be pinned:** `pip install "transformers<=5.2.0"` for unsloth-zoo compatibility
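The "never train from /mnt/c or /mnt/d" rule is easy to forget inside a training script. A small guard can be called at the top of a run; this sketch assumes Windows drives are mounted under WSL's default `/mnt/<letter>` automount root, and `require_linux_fs` is a hypothetical name:

```bash
# Hypothetical guard: refuse paths that live on Windows drives mounted into WSL.
# Assumes the default automount root of /mnt; adjust if wsl.conf sets [automount] root.
require_linux_fs() {
  local re='^/mnt/[a-zA-Z](/|$)' path
  for path in "$@"; do
    # Resolve to an absolute path without requiring it to exist yet
    path=$(realpath -m -- "$path")
    if [[ "$path" =~ $re ]]; then
      echo "refusing Windows-mounted path: $path (stage it under ~/majortwin/staging first)" >&2
      return 1
    fi
  done
  return 0
}

# Usage: require_linux_fs "$DATASET" "$OUTPUT_DIR" || exit 1
```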
## D: Drive Layout
```
D:\MajorTwin\
models\ ← finished GGUFs
datasets\ ← dataset archives
artifacts\ ← training run artifacts
training-runs\ ← logs, checkpoints
D:\WSL\
Fedora43\ ← WSL2 VHDX
Backups\ ← weekly WSL2 backup tars
```
## See Also
- [WSL2 Instance Migration](wsl2-instance-migration-fedora43.md)
- [WSL2 Backup via PowerShell](wsl2-backup-powershell.md)


@ -0,0 +1,180 @@
---
title: Pi-hole DoH / DoT Bypass Defense
domain: selfhosting
category: dns-networking
tags:
- pihole
- dns
- doh
- dot
- privacy
- adblock
- bypass
- hagezi
status: published
created: 2026-04-22
updated: 2026-04-23T09:09
---
# Pi-hole DoH / DoT Bypass Defense
## The Problem
A LAN-wide ad/tracker/threat-intel blocklist at the DNS layer is only effective if clients actually use the DNS server doing the blocking. Three classes of client routinely bypass LAN DNS:
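1. **Modern browsers with built-in DNS-over-HTTPS (DoH).** Chrome, Firefox, and Edge ship DoH either enabled by default (Firefox in some regions; Chrome and Edge auto-upgrade when the system resolver supports it) or behind a single settings toggle. When enabled, the browser sends DNS queries over HTTPS directly to Cloudflare / Google / Quad9 / NextDNS, bypassing the OS resolver and every DNS-layer blocklist on the network. Safari has no DoH setting of its own; its equivalent is iCloud Private Relay.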
2. **IoT / smart devices with hardcoded public DNS.** Chromecast, Google Home, Nest, many Samsung TVs, some Amazon devices include hardcoded `8.8.8.8` or `1.1.1.1`. They ignore DHCP-pushed DNS entirely.
3. **Applications using DNS-over-TLS (DoT).** Rarer than DoH but used by some privacy-focused apps and occasional malware C2 — hits Cloudflare / Quad9 on port 853 instead of 53.
Without a defense, a compromised IoT device or a telemetry-hungry app can send its DNS traffic straight past Pi-hole, unfiltered and unlogged, even though Pi-hole is "running."
## What This Guide Covers
- How Pi-hole's `blocking.mode = NULL` structurally prevents the most common fallback-resolver bypass.
- Why the `HaGeZi doh-vpn-proxy-bypass` adlist is the single highest-leverage defense against browser DoH.
- What still leaks and how to assess whether the router-level firewall is worth the effort for your threat model.
## Pi-hole's block mode matters
Pi-hole v6's default `dns.blocking.mode` is `NULL`. A blocked domain resolves to `0.0.0.0` — a **valid** DNS answer, not an NXDOMAIN. Verify on your host:
```bash
dig +short <blocked-domain> @<pihole-ip>
# → 0.0.0.0
```
Why this matters: multi-resolver OSes (macOS, iOS, Windows) only consult fallback resolvers on a **failure** (timeout, SERVFAIL). A valid NULL answer short-circuits that — the client accepts the 0.0.0.0, tries to connect, fails at TCP, and never retries DNS. Even if `/etc/resolv.conf` has `1.1.1.1` as a secondary, it's never queried.
If you've set blocking mode to `NXDOMAIN`, clients **will** fall back — and every telemetry domain on every adlist becomes bypassable through whatever secondary resolver the OS is configured with. **Leave it at NULL.**
Check:
```bash
pihole-FTL --config dns.blocking.mode
# → NULL
```
## HaGeZi DoH/VPN/Proxy Bypass — the biggest single win
HaGeZi maintains `adblock/doh-vpn-proxy-bypass.txt`: ~18,000 hostnames for DoH resolvers, VPNs, and proxies, including the DoH bootstrap domains used by the major browsers:
| Browser | DoH bootstrap |
|---|---|
| Firefox | `mozilla.cloudflare-dns.com` |
| Chrome | `chrome.cloudflare-dns.com`, `dns.google` |
| Safari (iCloud Private Relay) | Not a DoH bootstrap: iCPR uses QUIC through Apple's `mask.icloud.com` / `mask-h2.icloud.com` ingress, which this list *does* block (see Known false positives) |
| Edge | `dns.google`, other public resolvers |
When the bootstrap hostname can't be resolved (Pi-hole answers `0.0.0.0`), the browser's DoH setup fails and it falls back to the system resolver — which is Pi-hole. This flips the default behavior from "browsers can bypass" to "browsers respect LAN DNS."
### Adding it
```bash
NOW=$(date +%s)
sudo pihole-FTL sqlite3 /etc/pihole/gravity.db <<SQL
INSERT INTO adlist (address, enabled, comment, date_added, date_modified, type)
VALUES
('https://cdn.jsdelivr.net/gh/hagezi/dns-blocklists@latest/adblock/doh-vpn-proxy-bypass.txt',
1, 'HaGeZi DoH/VPN/Proxy bypass', $NOW, $NOW, 0);
SQL
sudo pihole -g
```
See [[pihole-v6-adlist-management]] for general adlist CRUD via SQL.
### Verification
After `pihole -g` completes, probe major DoH hostnames:
```bash
for h in mozilla.cloudflare-dns.com dns.google chrome.cloudflare-dns.com dns.quad9.net; do
echo -n "$h → "
dig +short $h @<pihole-ip>
done
# All should return 0.0.0.0
```
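The loop above prints whatever comes back, which is easy to misread when one hostname still resolves. A pass/fail variant is sketched below, assuming Pi-hole runs in `NULL` mode (a block is `0.0.0.0` for A and `::` for AAAA queries). `classify_answer` is a hypothetical helper that takes `dig +short` output as text, so it can also be checked offline:

```bash
# Classify `dig +short` output from a Pi-hole in NULL blocking mode.
#   blocked   -> every answer line is 0.0.0.0 or ::
#   no-answer -> empty output (NXDOMAIN or NODATA; +short hides which)
#   error     -> dig printed a ";;" diagnostic (e.g. timeout)
#   LEAK      -> at least one other line (a real address or CNAME target) came back
classify_answer() {
  local answer="$1" line real=0
  [[ -z "$answer" ]] && { echo "no-answer"; return 0; }
  while IFS= read -r line; do
    [[ "$line" == ";;"* ]] && { echo "error"; return 0; }
    [[ "$line" == "0.0.0.0" || "$line" == "::" ]] || real=1
  done <<< "$answer"
  if (( real )); then echo "LEAK"; else echo "blocked"; fi
}

# Usage against a live Pi-hole (replace <pihole-ip>):
#   for h in mozilla.cloudflare-dns.com dns.google chrome.cloudflare-dns.com dns.quad9.net; do
#     printf '%-30s %s\n' "$h" "$(classify_answer "$(dig +short "$h" @<pihole-ip>)")"
#   done
```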
### Known false positives
The list is aggressive. Expect occasional pushback:
- **`claude.ai`** — gets caught by the broader `pro.txt` or TIF list in some combinations; DoH bypass list itself is usually clean. If you use Claude on LAN and see blocks, allowlist `claude.ai` — note that `api.anthropic.com` is typically **not** on any of these lists, so Claude Code / API traffic is unaffected.
- **Zscaler ZPA / Zscaler Internet Access:** **this will break work-from-home auth if you don't allowlist it.** The DoH/VPN bypass list classifies Zscaler's ZTNA backbone as a "VPN proxy" and blocks it. Symptom: users see a blank / failed page at `https://samlsp.private.zscaler.com/...` during SAML sign-in, and the Zscaler Client Connector fails to authenticate.
The critical piece is that Zscaler's SAML SP hostname is a **CNAME chain**:
```
samlsp.private.zscaler.com. CNAME samlsp.prod.zpath.net.
samlsp.prod.zpath.net. CNAME zapp2saml.gslb.prod.zpath.net.
zapp2saml.gslb.prod.zpath.net. CNAME snico2br.gslb.prod.zpath.net.
snico2br.gslb.prod.zpath.net. A <IP>
```
Pi-hole walks the CNAME chain and blocks on the target (status 9 = `blocked_gravity_cname`), so **an exact-hostname allowlist for `samlsp.private.zscaler.com` will NOT fix it** — you have to allowlist the CNAME target domain. The GSLB subdomains rotate, so use a regex allowlist for the whole `zpath.net` zone:
```sql
INSERT OR IGNORE INTO domainlist (type, domain, enabled, comment)
VALUES (2, '(\.|^)zpath\.net$', 1, 'Zscaler ZPA CNAME backbone — do not block');
```
Don't forget `pihole reloaddns` after. Expect to also need regex allowlists for `zscaler.net`, `zscalertwo.net`, `zscalerthree.net`, `zscalerone.net`, `zscloud.net` if any are gravity-blocked — HaGeZi's lists may cover different combinations over time.
- **iCloud Private Relay** — if you want iCPR to keep working on your Apple devices, allowlist its mask ingresses. The DoH/VPN bypass list blocks `mask.icloud.com`, `mask-h2.icloud.com`, and `mask-api.icloud.com` (Apple's iCPR entrance points). Without them, iCPR silently falls back to standard DNS — which means **Pi-hole is covering the bypass whether you want it to or not**. For hosts where iCPR is desired:
```sql
INSERT OR IGNORE INTO domainlist (type, domain, enabled, comment)
VALUES (2, '(\.|^)mask[a-z0-9-]*\.icloud\.com$', 1, 'iCloud Private Relay ingress');
```
Keep this surgical: do **not** allowlist all of `icloud.com`. Other Apple telemetry hostnames (`metrics.icloud.com`, plus the `init.gc.apple.com` family under `apple.com`) are correctly blocked by the adlists. After allowlist + `pihole reloaddns`, toggle Wi-Fi or flip iCPR off/on in Settings on each Apple device to force DNS re-resolution; iOS/macOS caches DNS aggressively and won't pick up the change otherwise.
- **`dot.txt` companion adlist** — as of April 2026, HaGeZi's separate `adblock/dot.txt` URL returns 403. DoT resolver hostnames are folded into `doh-vpn-proxy-bypass.txt` already.
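Before inserting either regex allowlist entry above, it is worth checking what it actually matches. Pi-hole evaluates these as POSIX extended regular expressions, so `grep -E` is a reasonable offline approximation (an assumption; it is not FTL's exact engine). `regex_matches` is a hypothetical helper:

```bash
# Print "match" or "no match" for each hostname tested against an extended regex.
regex_matches() {
  local re="$1" host
  shift
  for host in "$@"; do
    if grep -Eq -- "$re" <<< "$host"; then
      echo "match     $host"
    else
      echo "no match  $host"
    fi
  done
}

# The Zscaler rule should cover every zpath.net subdomain and nothing else
regex_matches '(\.|^)zpath\.net$' \
  samlsp.prod.zpath.net zapp2saml.gslb.prod.zpath.net zpath.net notzpath.net zpath.net.example.com
# The iCPR rule should cover the mask* ingresses but not other icloud.com hosts
regex_matches '(\.|^)mask[a-z0-9-]*\.icloud\.com$' \
  mask.icloud.com mask-h2.icloud.com mask-api.icloud.com metrics.icloud.com
```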
## What still leaks
The DoH adlist does not defend against:
1. **IoT devices with hardcoded public DNS.** Chromecast et al. send UDP/53 queries directly to `8.8.8.8`. Pi-hole never sees them.
2. **Apps that hardcode a DoH or DoT endpoint by IP.** If an app has `1.1.1.1` baked in rather than `cloudflare-dns.com`, the hostname block can't help.
3. **Apple iCloud Private Relay, if you allowlist its ingress.** Uses QUIC (UDP/443) through Apple's relays with oblivious DNS, so Safari and Apple services route around Pi-hole entirely. Acceptable tradeoff for most users; mostly a privacy win even if it weakens your LAN-side visibility.
Estimated residual gap after the DoH adlist: **~3%** of tracker/telemetry traffic, mostly from hardcoded-DNS IoT.
## Router-level enforcement (optional, higher effort)
To close the remaining 3%, force all outbound classic DNS (`udp/53`, `tcp/53`) through the Pi-hole and reject DoT (`tcp/853`, plus `udp/853` for DNS-over-QUIC) at the router, exempting the Pi-hole's own IP. The rules:
```bash
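# Transparently redirect all LAN :53 traffic to Pi-hole, except Pi-hole itself
iptables -t nat -I PREROUTING -i br0 -p udp --dport 53 ! -s <pihole-ip> -j DNAT --to <pihole-ip>:53
iptables -t nat -I PREROUTING -i br0 -p tcp --dport 53 ! -s <pihole-ip> -j DNAT --to <pihole-ip>:53
# Pi-hole sits on the same LAN, so masquerade only the redirected queries; without this it
# answers the client directly from its own IP and the client (expecting 8.8.8.8) drops the reply.
# Side effect: redirected queries appear in Pi-hole's log under the router's IP.
iptables -t nat -I POSTROUTING -o br0 -d <pihole-ip> -p udp --dport 53 -m conntrack --ctstate DNAT -j MASQUERADE
iptables -t nat -I POSTROUTING -o br0 -d <pihole-ip> -p tcp --dport 53 -m conntrack --ctstate DNAT -j MASQUERADE
# Reject DoT so apps fall back to classic DNS (→ Pi-hole via above)
iptables -I FORWARD -i br0 -p tcp --dport 853 ! -s <pihole-ip> -j REJECT --reject-with tcp-reset
iptables -I FORWARD -i br0 -p udp --dport 853 ! -s <pihole-ip> -j REJECT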
```
Design choices:
- **REDIRECT (DNAT), not DROP, for port 53** — devices with hardcoded `8.8.8.8` receive transparent answers from Pi-hole instead of silently breaking.
- **REJECT, not DROP, for port 853** — DoT clients see a fast error and fall back to classic DNS immediately instead of timing out.
- **Exempt the Pi-hole** — it needs to reach upstream resolvers (`1.1.1.1` etc.) unimpeded.
- **`-i br0` only** — LAN ingress, not WAN.
### Persistence depends on router firmware
- **Asuswrt-Merlin:** add rules to `/jffs/scripts/firewall-start` — runs on every firewall init.
- **Stock AsusWRT 388+:** `/jffs/scripts/firewall-start` is **not** honored. Rules added live persist until the next `restart_firewall` event (reboot, WAN flap, GUI config change). Workarounds: flash to Merlin, use the GUI's "LAN ▸ Network Services Filter" (DROP-only, less flexible), or schedule a cron re-apply in `/jffs/configs/crontab`.
- **OpenWrt / pfSense / OPNsense:** their respective firewall config persistence works out of the box.
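If you go the cron re-apply route, note that `iptables -I` inserts a fresh copy on every run, so rules stack up. A sketch of an idempotent wrapper that checks with `iptables -C` before inserting is below. It avoids bash-only syntax because the router shell is BusyBox; `ensure_rule` is a hypothetical name, and the `IPT` override exists only so the logic can be dry-run (it is not a firmware feature):

```bash
# Insert an iptables rule only if an identical one is not already present.
#   ensure_rule <table> <chain> <rule-spec...>
ensure_rule() {
  table=$1; chain=$2; shift 2
  if "${IPT:-iptables}" -t "$table" -C "$chain" "$@" 2>/dev/null; then
    return 0   # already present: nothing to do
  fi
  "${IPT:-iptables}" -t "$table" -I "$chain" "$@"
}

# Example for the DoT reject rule (replace <pihole-ip>):
#   ensure_rule filter FORWARD -i br0 -p tcp --dport 853 ! -s <pihole-ip> -j REJECT --reject-with tcp-reset
```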
## Summary — minimum viable DoH defense
1. Pi-hole block mode = `NULL` (default — verify).
2. Install HaGeZi `doh-vpn-proxy-bypass` adlist.
3. Run `pihole -g`.
4. Verify major DoH bootstraps return `0.0.0.0`.
5. Optional: add router iptables rules to close the IoT/hardcoded-DNS gap.
Steps 1-4 give you ~97% effectiveness with zero client-side changes and no broken devices. Step 5 is polish for threat models where LAN-wide DNS visibility matters.
## Related
- [[MajorPi]] — local Pi-hole deployment
- [[pihole-v6-adlist-management]] — adlist CRUD via SQL (v5 CLI commands don't work in v6)
- [[Network Overview]] — fleet network context


@ -0,0 +1,180 @@
---
title: "Pi-hole v6 Adlist Management via SQL"
domain: selfhosting
category: dns-networking
tags: [pihole, pihole-v6, adlist, dns, sql, sqlite, gravity, runbook]
status: published
created: 2026-04-22
updated: 2026-04-22
---
# Pi-hole v6 Adlist Management via SQL
## The Problem
Pi-hole v6 removed the `pihole -a adlist` CLI subcommands. The old muscle-memory commands (`pihole -a adlist add <url>`, `pihole -a adlist remove <url>`, `pihole -a adlist list`) all return errors or are no-ops on v6. The Web UI works, but for scripting, Ansible, or SSH-only hosts, you need a CLI-level method.
The answer is to hit the `gravity.db` SQLite database directly. It's simple, idempotent, and scriptable.
## Prerequisites
- Pi-hole v6 installed (`pihole -v` → Core version v6.x).
- `sudo` access — `gravity.db` is owned `pihole:pihole` mode 660.
- `sqlite3` binary is **not** required. Pi-hole ships `pihole-FTL` with a built-in `sqlite3` subcommand that you can use instead:
```bash
sudo pihole-FTL sqlite3 /etc/pihole/gravity.db "SELECT 1;"
```
Use this on any host where you don't want to install the standalone `sqlite3` package (e.g., Raspberry Pi OS minimal).
## Listing adlists
```bash
sudo pihole-FTL sqlite3 -column -header /etc/pihole/gravity.db \
"SELECT id, enabled, address, comment FROM adlist ORDER BY id;"
```
| Column | Meaning |
|---|---|
| `id` | Internal ID (autoincrement, **does not match `queries.list_id`** — see note below) |
| `enabled` | `1` = active, `0` = disabled (still in DB but not compiled into gravity) |
| `address` | The URL fetched by `pihole -g` |
| `comment` | Human-readable label shown in the Web UI |
## Adding an adlist
```bash
NOW=$(date +%s)
sudo pihole-FTL sqlite3 /etc/pihole/gravity.db <<SQL
INSERT INTO adlist (address, enabled, comment, date_added, date_modified, type)
VALUES
('https://example.com/blocklist.txt', 1, 'My Blocklist', $NOW, $NOW, 0);
SQL
```
`type = 0` means a regular blocklist (as opposed to an allowlist). `date_added` and `date_modified` are unix seconds.
**Always follow with `pihole -g`** to fetch the list and rebuild the gravity blob:
```bash
sudo pihole -g
```
This takes 30 seconds to 3 minutes depending on adlist size. Expect output like:
```
[✓] Parsed 0 exact domains and 18121 ABP-style domains (blocking, ignored 0 non-domain entries)
[i] Number of gravity domains: 2669352 (2409506 unique domains)
[✓] Building gravity tree
```
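Running the INSERT above a second time for the same URL errors out, since `adlist.address` is declared `UNIQUE` in the gravity schema (worth confirming on your version with `.schema adlist` inside `pihole-FTL sqlite3`). For scripts and Ansible, a sketch that emits an idempotent `INSERT OR IGNORE` and escapes single quotes the SQL way; `adlist_sql` is a hypothetical name, and its output is meant to be piped into `pihole-FTL sqlite3`:

```bash
# Emit an idempotent INSERT for one adlist. SQL string literals escape ' as ''.
adlist_sql() {
  local url="$1" comment="${2:-}" q="'" now
  now=$(date +%s)
  url=${url//$q/$q$q}
  comment=${comment//$q/$q$q}
  printf '%s\n' "INSERT OR IGNORE INTO adlist (address, enabled, comment, date_added, date_modified, type)"
  printf "VALUES ('%s', 1, '%s', %s, %s, 0);\n" "$url" "$comment" "$now" "$now"
}

# Usage:
#   adlist_sql 'https://example.com/blocklist.txt' "Someone's list" \
#     | sudo pihole-FTL sqlite3 /etc/pihole/gravity.db && sudo pihole -g
```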
## Removing an adlist
By address:
```bash
sudo pihole-FTL sqlite3 /etc/pihole/gravity.db \
"DELETE FROM adlist WHERE address = 'https://example.com/blocklist.txt';"
sudo pihole -g
```
By id:
```bash
sudo pihole-FTL sqlite3 /etc/pihole/gravity.db \
"DELETE FROM adlist WHERE id = 9;"
sudo pihole -g
```
## Enabling / disabling without removing
```bash
sudo pihole-FTL sqlite3 /etc/pihole/gravity.db \
"UPDATE adlist SET enabled=0 WHERE id=9;"
sudo pihole -g
```
This is the right move when you want to toggle an adlist on/off without losing the URL/comment (e.g., a situational blocklist like Disney+ streaming).
## Verifying a new adlist is actually blocking
After `pihole -g` finishes, probe a known domain from the list directly against Pi-hole:
```bash
dig +short <known-blocked-domain> @192.168.50.238
# Expected: 0.0.0.0 (when dns.blocking.mode = NULL)
```
If you get a real answer, either the adlist fetch failed (check `pihole -g` output for 403/404), or the entry isn't in the list you added.
## Common gotchas
### `pihole -g` fails with "Forbidden"
The adlist URL returned HTTP 403 or 404. HaGeZi and OISD in particular reorganize file paths occasionally. Remove the broken entry and either substitute the new URL or drop it:
```bash
sudo pihole-FTL sqlite3 /etc/pihole/gravity.db \
"DELETE FROM adlist WHERE address = '<404-url>';"
```
### `queries.list_id` doesn't match `adlist.id`
In Pi-hole v6's FTL query log, the `list_id` column on `queries`/`query_storage` does **not** reliably point back at the `adlist.id`. For `status=4` (regex), it references a `domainlist.id`. For `status=1` (gravity), it can reference a `gravity` table rowid, not the adlist. Do not assume a bidirectional mapping — treat `list_id` as an opaque debug hint.
### Stale regex after editing `domainlist`
FTL compiles regex rules into memory at process start and on explicit reload. Editing `domainlist` via SQL without calling `pihole reloaddns` afterwards leaves the old compiled regex active. Symptom: `queries.status=4` blocks firing for domains whose `list_id` points at deleted entries.
Fix: always follow `domainlist` edits with:
```bash
sudo pihole reloaddns
```
Verify via the FTL log:
```bash
sudo grep "Compiled .* regex" /var/log/pihole/FTL.log | tail
# → "Compiled N allow and M deny regex for X clients"
```
The numbers should match the count of `enabled=1` entries in `domainlist` by `type`.
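Doing that comparison by eye gets tedious. A sketch that pulls N and M out of the most recent matching log line and sits next to the database counts (`type` 2 = regex allow, 3 = regex deny). The log wording may differ between FTL releases, so treat the pattern as an assumption; `parse_compiled_regex` is a hypothetical name:

```bash
# Read FTL log lines on stdin; print "<allow> <deny>" from the last
# "Compiled N allow and M deny regex ..." line, or nothing if none match.
parse_compiled_regex() {
  sed -nE 's/.*Compiled ([0-9]+) allow and ([0-9]+) deny regex.*/\1 \2/p' | tail -n 1
}

# Usage on the Pi-hole host:
#   sudo grep "Compiled .* regex" /var/log/pihole/FTL.log | parse_compiled_regex
#   sudo pihole-FTL sqlite3 /etc/pihole/gravity.db \
#     "SELECT type, COUNT(*) FROM domainlist WHERE enabled = 1 AND type IN (2, 3) GROUP BY type;"
```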
### No standalone `sqlite3` on the host
Use `pihole-FTL sqlite3` — ships with every Pi-hole install, behaves identically to the standalone binary for the commands shown here. Do not install the `sqlite3` package just to manage Pi-hole.
## Useful introspection queries
**Total gravity domains by adlist:**
```sql
SELECT a.id, a.comment, COUNT(g.domain) AS domains
FROM gravity g
JOIN adlist a ON a.id = g.adlist_id
GROUP BY a.id
ORDER BY domains DESC;
```
**Active regex rules (what FTL SHOULD be running):**
```sql
SELECT * FROM vw_regex_denylist;
SELECT * FROM vw_regex_allowlist;
```
**Blocked queries in the last hour by block type** (run against `/etc/pihole/pihole-FTL.db`, where the query log lives, not `gravity.db`):
```sql
SELECT
CASE status
WHEN 1 THEN 'gravity'
WHEN 4 THEN 'regex_deny'
WHEN 5 THEN 'exact_deny'
WHEN 9 THEN 'gravity_cname'
WHEN 10 THEN 'regex_cname'
WHEN 11 THEN 'exact_cname'
END AS source,
COUNT(*) AS n
FROM queries
WHERE timestamp > strftime('%s','now','-1 hour')
AND status IN (1,4,5,9,10,11)
GROUP BY status;
```
## Related
- [[MajorPi]] — the host running this
- [[pihole-doh-dot-bypass-defense]] — DoH/DoT bypass defense (reasons to add specific adlists)


@ -0,0 +1,143 @@
---
title: Mastodon DB Maintenance — Statuses, Accounts, and VACUUM
domain: selfhosting
category: services
tags:
- mastodon
- database
- postgresql
- maintenance
- tootctl
- majortoot
status: published
created: 2026-04-22
updated: 2026-04-22
---
# Mastodon DB Maintenance
Mastodon aggressively caches remote content — avatars, statuses, follow graphs — from every instance it federates with. On an active instance, this causes substantial PostgreSQL bloat over time. Without periodic maintenance, the database grows unbounded even if S3 handles media.
## The Problem — majortoot at ~3.5 years
| Table | Size | Rows / contents |
|-------|------|------|
| `statuses` | 3.5 GB | 3.6M rows (3.6M remote cached, 37K local) |
| `accounts` | 499 MB | 214,770 remote cached, 18 local |
| `preview_cards` | 837 MB | remote link previews |
| `statuses_tags` | 506 MB | cascades from statuses |
| `conversations` | 436 MB | cascades from statuses |
| `mentions` | 305 MB | cascades from statuses |
The `statuses remove` and `accounts cull` commands address most of this.
---
## Maintenance Tasks
### 1. Cache Clear
Clears the Rails cache (kept in Redis on a standard Mastodon install). Fast (<5 seconds), safe to run anytime.
```bash
tootctl cache clear
```
### 2. Statuses Remove
Removes cached remote statuses (and their cascaded rows in `statuses_tags`, `mentions`, `conversations`, `status_stats`) older than N days. Does **not** touch local statuses.
```bash
tootctl statuses remove --days=90
```
> [!warning] This is the slowest step
> On a 3.6M-row statuses table, the extraction phase alone can take 20 to 40 minutes. PostgreSQL will be under heavy load. Run during off-peak hours.
**What gets removed:** Remote statuses not pinned, not boosted by local users, and not replied to by local users, older than the threshold.
### 3. Accounts Cull
Requests each remote account's URI from its home instance to check whether it still exists. Removes accounts that return HTTP 404 or 410 (Gone). Catches dead instances, deleted accounts, and renamed handles.
```bash
tootctl accounts cull
```
> [!note] Network-bound
> Cull makes HTTP requests to remote servers. It's slower on flaky network conditions and will skip accounts it can't reach (to avoid false deletions).
### 4. VACUUM ANALYZE
After large deletions, PostgreSQL does not immediately reuse or release the space: deleted rows linger as dead tuples. `VACUUM` marks that space reusable for new rows (the table files generally don't shrink), and `ANALYZE` refreshes the query planner statistics.
```bash
sudo -u postgres psql mastodon_production -c "VACUUM ANALYZE;"
```
For recovering actual disk space (not just marking pages free), `VACUUM FULL` is more aggressive but locks tables. Stick with plain `VACUUM ANALYZE` for routine maintenance.
---
## The Maintenance Script
**Location:** `/home/mastodon/maintenance.sh`
**Cron:** `0 2 * * 0` — Sunday 2 AM (runs before media prune at 3 AM)
**Log:** `/var/log/mastodon/maintenance.log`
**Notifications:** Email to `marcus@majorshouse.com` at each step via Postfix → MajorMail
The script runs all four tasks in sequence and sends a notification email:
- **On start** — lists steps and current DB size
- **After cache clear** — confirms done, warns statuses remove will take a while
- **After statuses remove** — summary output + current DB size
- **After accounts cull** — accounts removed + current DB size
- **On completion** — full timing breakdown and final DB size
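The script itself is not reproduced here. As a rough sketch of the per-step timing and logging pattern described above (the `run_step` helper and its log format are hypothetical, not taken from `maintenance.sh`, and the email notifications are left out):

```bash
# Run one maintenance step, appending START/END lines with duration and exit code to a log.
# LOG defaults to the path used on majortoot; override it for testing.
run_step() {
  local name="$1" start end rc
  shift
  local log="${LOG:-/var/log/mastodon/maintenance.log}"
  start=$(date +%s)
  echo "$(date -Is) START $name" >> "$log"
  "$@" >> "$log" 2>&1
  rc=$?
  end=$(date +%s)
  echo "$(date -Is) END $name rc=$rc duration=$((end - start))s" >> "$log"
  return "$rc"
}

# Sketch of the sequence (tootctl stands in for the wrapper under "tootctl Wrapper"):
#   run_step "cache clear"     tootctl cache clear
#   run_step "statuses remove" tootctl statuses remove --days=90
#   run_step "accounts cull"   tootctl accounts cull
#   run_step "vacuum"          sudo -u postgres psql mastodon_production -c "VACUUM ANALYZE;"
```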
### Running Manually
```bash
ssh root@100.110.197.17
bash /home/mastodon/maintenance.sh
```
### Monitoring Progress
```bash
ssh root@100.110.197.17 "tail -f /var/log/mastodon/maintenance.log"
```
### tootctl Wrapper (one-off commands)
The `mastodon` user's rbenv is not on PATH in a login shell. Always use the wrapper:
```bash
su - mastodon -c 'export PATH="/home/mastodon/.rbenv/bin:/home/mastodon/.rbenv/shims:$PATH" && eval "$(rbenv init -)" && cd /home/mastodon/live && RAILS_ENV=production bin/tootctl <command>'
```
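Typing that line by hand invites quoting mistakes once arguments contain spaces. A sketch of a root-side shell function that builds the same command and shell-quotes each argument with `printf %q`; the function name and the `DRY_RUN` switch are hypothetical:

```bash
# Run bin/tootctl as the mastodon user with rbenv initialised, quoting each argument.
tootctl_as_mastodon() {
  local args cmd
  args=$(printf '%q ' "$@")
  # Single quotes keep $PATH and $(rbenv init -) for the mastodon user's shell to expand
  cmd='export PATH="/home/mastodon/.rbenv/bin:/home/mastodon/.rbenv/shims:$PATH" && eval "$(rbenv init -)"'
  cmd+=" && cd /home/mastodon/live && RAILS_ENV=production bin/tootctl ${args% }"
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "$cmd"
  else
    su - mastodon -c "$cmd"
  fi
}

# Usage (as root): tootctl_as_mastodon statuses remove --days=90
```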
---
## Full Cron Schedule on majortoot
| Time | Job | Script |
|------|-----|--------|
| Sun 2 AM | DB maintenance | `/home/mastodon/maintenance.sh` |
| Sun 3 AM | Media prune (S3) | `/home/mastodon/media-prune.sh` |
| Daily 8 AM | Fail2Ban digest | `/usr/local/bin/fail2ban-digest.sh` |
| Monthly | Fail2Ban nginx-botsearch prune | `/usr/local/bin/f2b-prune.sh` |
| Daily | Certbot renewal | `service nginx stop; certbot renew; service nginx start` |
---
## First Run Results (2026-04-22)
First maintenance run ever on majortoot after ~3.5 years of operation. Results pending (job running in background at time of writing). Check `/var/log/mastodon/maintenance.log` for final numbers.
---
## See Also
- [[Mastodon]] — service doc (deployment, access, S3 config)
- [[majortoot]] — server doc (incident log, specs)
- [[mastodon-federation]] — domain blocks, silencing, FediSeer
- [[mastodon-instance-tuning]] — character limits, media cache


@ -0,0 +1,168 @@
---
title: Mastodon Federation — Domain Blocks, Silencing, and FediSeer
domain: selfhosting
category: services
tags:
- mastodon
- federation
- fediverse
- domain-blocks
- fediseer
- majortoot
status: published
created: 2026-04-22
updated: 2026-04-22
---
# Mastodon Federation — Domain Blocks, Silencing, and FediSeer
## Domain Block Severity — Critical Gotcha
The Mastodon admin UI labels severities as **Silence** and **Suspend**, but the integer values stored in the database are **not** ordered by severity: `noop`, the mildest, is `2`, while `0` is a real restriction. The Rails enum is:
```ruby
# app/models/domain_block.rb
enum :severity, { silence: 0, suspend: 1, noop: 2 }, validate: true
```
| DB value | Meaning | Effect |
|----------|---------|--------|
| `0` | **silence** | Instance limited — posts hidden from public timelines; follows require manual approval |
| `1` | **suspend** | Full defederation — all content removed, all follows severed |
| `2` | **noop** | No effect — entry tracked but no federation action taken |
> [!warning] Don't trust raw integer queries
> If you query `domain_blocks` directly via psql, severity `0` looks like "the lowest level" but it's actually **silence** — a meaningful restriction. Always map through the enum. This tripped up a defederation investigation on 2026-04-22 where 13 silenced instances (including mastodon.social) were initially misread as noop.
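One way to avoid misreading raw integers is to map them explicitly, either in a shell helper or by letting PostgreSQL label the rows. A sketch, assuming the enum values shown above and the `mastodon_production` database used elsewhere on majortoot; `severity_name` is a hypothetical convenience function:

```bash
# Map a domain_blocks.severity integer to its Rails enum name (silence: 0, suspend: 1, noop: 2).
severity_name() {
  case "$1" in
    0) echo silence ;;
    1) echo suspend ;;
    2) echo noop ;;
    *) echo "unknown($1)"; return 1 ;;
  esac
}

# Or have psql label the rows directly:
#   sudo -u postgres psql mastodon_production -c "
#     SELECT CASE severity WHEN 0 THEN 'silence' WHEN 1 THEN 'suspend' WHEN 2 THEN 'noop' END
#              AS severity_name, COUNT(*)
#     FROM domain_blocks GROUP BY severity ORDER BY severity;"
```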
### majortoot block inventory (as of 2026-04-22)
| Severity | Count | Notable entries |
|----------|-------|-----------------|
| silence (0) | 13 | mastodon.social, mastodon.world, chaos.social, fosstodon.org, tech.lgbt, threads.net |
| suspend (1) | 413 | Full defederation list |
| noop (2) | 0 | — |
---
## How Silencing Affects Follows
When your instance silences a remote domain, **every follow request from that domain requires manual approval** — even if your account has `locked = false`.
This is enforced in `app/lib/activitypub/activity/follow.rb`:
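```ruby
# app/lib/activitypub/activity/follow.rb (excerpt, simplified)
if target_account.locked? || @account.silenced?
  LocalNotificationWorker.perform_async(target_account.id, follow_request.id, 'FollowRequest', 'follow_request')
else
  # ... otherwise the follow is accepted automatically
end
```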
`@account.silenced?` returns true when the sending account's domain is in your `domain_blocks` at severity=0. The follow goes to the follow_requests queue instead of being automatically accepted.
**Practical effect on majortoot:** mastodon.social is silenced (added 2026-12-11, same day as a FluentInFinance follow-spam report). All follows from mastodon.social accounts appear as pending follow requests requiring manual approval. This is intentional — it's the expected behavior of a silence block.
---
## Checking Defederation Status
### Are major instances blocking you?
Check if your domain appears in another instance's public block list:
```bash
# Check mastodon.social's public block list (397 entries as of 2026-04-22)
curl -s "https://mastodon.social/api/v1/instance/domain_blocks" | \
python3 -c "import sys,json; data=json.load(sys.stdin); \
found=[b for b in data if b['domain']=='toot.majorshouse.com']; \
print('BLOCKED' if found else 'Not in public block list')"
```
Note: instances can mark blocks as private, so absence from the public list is not a guarantee.
### Are you in their peer list?
If you're in an instance's peer list, they've federated with you at some point:
```bash
curl -s "https://mastodon.social/api/v1/instance/peers" | \
python3 -c "import sys,json; data=json.load(sys.stdin); print('toot.majorshouse.com' in data)"
```
### Is the account visible from a remote instance?
```bash
curl -s "https://mastodon.social/api/v1/accounts/lookup?acct=majorlinux@toot.majorshouse.com" | \
python3 -c "import sys,json; d=json.load(sys.stdin); print('limited:', d.get('limited'), 'suspended:', d.get('suspended'))"
```
`limited: true` means the remote instance has silenced toot.majorshouse.com.
### Check federation delivery health (Sidekiq)
```bash
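# Sidekiq keeps its retry and dead sets as sorted sets, so count them with ZCARD (LLEN only works on lists).
# Key names assume no REDIS_NAMESPACE; prefix them if your instance sets one.
ssh root@100.110.197.17 "redis-cli zcard dead; redis-cli zcard retry"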
# Both should be 0 for a healthy instance
```
### Check unavailable domains (delivery consistently failing)
```bash
ssh root@100.110.197.17 "
sudo -u postgres psql mastodon_production -c \
'SELECT domain, updated_at FROM unavailable_domains ORDER BY updated_at DESC LIMIT 20;'"
```
These are domains where ActivityPub delivery has repeatedly failed. Most are dead instances, not active blocks.
---
## FediSeer Registration
[FediSeer](https://fediseer.com) is a community service that tracks censures (formal complaints) against fediverse instances. Registering lets you monitor if any instance formally censures toot.majorshouse.com.
### majortoot status (registered 2026-04-22)
| Field | Value |
|-------|-------|
| Domain | toot.majorshouse.com |
| ID | 5575 |
| State | UP |
| Censures received | 0 |
| Endorsements | 0 |
| Tags | mastodon, selfhosted, leftist, foss |
| Guarantor | none |
| API key | Bitwarden — "FediSeer — toot.majorshouse.com" |
### Claiming / re-claiming your instance
```bash
# Claim (sends API key via DM from @fediseer@fediseer.com)
curl -s -X PUT "https://fediseer.com/api/v1/whitelist/toot.majorshouse.com" \
-H "Content-Type: application/json" \
-d '{"admin": "majorlinux", "pm_proxy": "MASTODON"}'
# The API key arrives as a DM — delete the DM after saving to Bitwarden
```
### Check censures
```bash
curl -s "https://fediseer.com/api/v1/censures/toot.majorshouse.com" | python3 -c \
"import sys,json; d=json.load(sys.stdin); print('Censures:', d.get('total',0))"
```
### Update tags
```bash
curl -s -X PUT "https://fediseer.com/api/v1/tags" \
-H "Content-Type: application/json" \
-H "apikey: <key-from-bitwarden>" \
-d '{"tags_csv": "mastodon,selfhosted,leftist,foss"}'
```
---
## See Also
- [[Mastodon]] — service doc
- [[majortoot]] — server doc
- [[mastodon-db-maintenance]] — statuses remove, accounts cull, vacuum
- [[mastodon-instance-tuning]] — character limits, media cache

@ -0,0 +1,111 @@
---
title: "Fantastical Google Sync Error Flood — Phantom Calendars Fixed via syncselect"
domain: troubleshooting
category: productivity
tags: [fantastical, google-calendar, caldav, sync, macos, syncselect]
status: published
created: 2026-04-24
updated: 2026-04-24
---
# Fantastical Google Sync Error Flood — Phantom Calendars Fixed via syncselect
Fantastical floods its macOS unified log with Google Calendar sync errors, the app shows persistent sync failures in the UI, and re-adding the Google account inside Fantastical doesn't fix it. The cause is usually orphan calendar references — calendars that were deleted from Google Calendar but still enabled in Google's per-account CalDAV sync whitelist.
## The Short Answer
Visit **`https://www.google.com/calendar/syncselect`**, uncheck any calendars that no longer exist or you don't want Fantastical / Apple Calendar to try syncing, click Save. Fantastical's error flood stops within one sync cycle.
This is a per-Google-account page — completely independent of Fantastical's settings, and independent of the calendar list inside Google Calendar's main web UI.
## Background
Google Calendar has **three** separate notions of calendar "visibility" for a given account:
| Surface | What it controls |
|---|---|
| `calendar.google.com` main UI — calendar list in the left sidebar | What you see in Google's own web interface |
| `calendar.google.com/calendar/u/0/r/settings/calendar/...` — per-calendar settings | Sharing, notifications, access control |
| **`google.com/calendar/syncselect`** — sync selection | **What Google exposes to third-party CalDAV/Exchange clients** (Apple Calendar, Fantastical, Outlook, Thunderbird, etc.) |
Fantastical talks to Google via CalDAV. It asks Google for the list of calendars enabled for CalDAV sync. If `syncselect` still has a calendar flagged for sync but the calendar has been deleted from Google (or unshared from you), Google returns an inconsistent response — the CalDAV principal lists the calendar ID but any request for its data returns 404. Fantastical dutifully logs an error and retries next sync cycle. Multiply by the number of orphans and you get a flood.
Deleting a calendar from Google Calendar's main UI does **not** automatically remove it from `syncselect`. That's the gotcha.
## Symptoms
- Fantastical UI shows "Sync Error" or a red badge on the account
- macOS unified log filling with lines like:
```
[FBGooglePrincipalSyncSession.m] Unable to find Google Calendar information:
<calendar-id>@group.calendar.google.com in (<list of real calendars>)
```
- `dataaccessd` logs `Error Domain=kEKAccountErrorDomain Code=0` with `lastSyncStartDate = (null)`
- Fantastical's helper `85C27NK92C.com.flexibits.fantastical2.mac.helper` spams XPC / CoreData token errors every 3 seconds (secondary symptom when the token store gets wedged in the retry loop)
## Diagnosis
### Step 1 — Spot the phantom calendar IDs in the log
```bash
log show --last 5m --style compact \
--predicate 'eventMessage CONTAINS "Unable to find Google Calendar"' 2>/dev/null \
| grep -oE 'information: [a-zA-Z0-9._%@-]+' | sed 's/^information: //' | sort -u
```
Each line returned is a calendar ID Fantastical is asking Google for that Google can't find.
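Here is the same extraction as a Python sketch, useful when you want the bare IDs to feed into the Step 2 loop. It reuses the character class from the `grep` above and also tolerates a line break after `information:`, as in the sample log lines above. `orphan_ids` and `read_recent_log` are hypothetical helpers. The second one only works on macOS, where the `log` command exists.

```python
import re
import subprocess

# Same character class as the grep pipeline; \s* also allows a line break after the colon.
ORPHAN_RE = re.compile(r"Unable to find Google Calendar information:\s*([a-zA-Z0-9._%@-]+)")


def orphan_ids(log_text: str) -> list:
    """Unique calendar IDs from 'Unable to find Google Calendar information' errors, sorted."""
    return sorted(set(ORPHAN_RE.findall(log_text)))


def read_recent_log(minutes: int = 5) -> str:
    """Run macOS `log show` with the predicate used above and return its text output."""
    result = subprocess.run(
        ["log", "show", "--last", f"{minutes}m", "--style", "compact",
         "--predicate", 'eventMessage CONTAINS "Unable to find Google Calendar"'],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Usage on the Mac: `print("\n".join(orphan_ids(read_recent_log())))`.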
### Step 2 — Get calendar names from Fantastical's local DB
The orphan IDs alone look random. To match them to what the calendars were called (so you know what to uncheck in syncselect), query Fantastical's SQLite DB:
```bash
DB="$HOME/Library/Group Containers/85C27NK92C.com.flexibits.fantastical2.mac/Database/Fantastical-8.fcdata"
for id in <each-orphan-id-here>; do
echo "--- $id ---"
strings "$DB" 2>/dev/null | grep "$id" | head -5
done
```
Fantastical stores the calendar's display name near each ID in the binary form. You may see names like `Kitchen Lights`, `Major7`, or other labels that remind you what the calendar was used for — often a deleted smart-home automation trigger, an old device's dedicated calendar, a former coworker's shared calendar, a subscribed sports or holiday calendar that moved.
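If `strings` isn't handy, or you want only the text near each ID, the following Python sketch does roughly what `strings | grep` does. It pulls runs of at least four printable ASCII bytes (the default minimum in `strings(1)`) from a window around each occurrence of the ID. It is a heuristic, not a parser for the `.fcdata` format. Names stored in another encoding won't show up, and a run that straddles the window edge comes back truncated. `strings_near` and the 512-byte window are assumptions.

```python
import re

# Runs of 4+ printable ASCII bytes, like strings(1) with its default minimum length.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def strings_near(data: bytes, calendar_id: str, window: int = 512) -> list:
    """Printable runs within `window` bytes of each occurrence of calendar_id, deduplicated in order."""
    needle = calendar_id.encode()
    found = []
    start = data.find(needle)
    while start != -1:
        lo, hi = max(0, start - window), start + len(needle) + window
        for run in PRINTABLE_RUN.findall(data[lo:hi]):
            text = run.decode("ascii")
            if text not in found:
                found.append(text)
        start = data.find(needle, start + 1)
    return found
```

Usage: `strings_near(open(DB, "rb").read(), "<orphan-id>")`, where `DB` is the same `.fcdata` path as in the shell loop.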
### Step 3 — Visit syncselect
Open `https://www.google.com/calendar/syncselect` in the same browser you're signed in with. You'll see every calendar Google knows about for this account, with a checkbox per entry:
- ✅ Live calendars you want on devices — leave checked
- ❌ Orphans, former smart-home triggers, deleted shared calendars — **uncheck**
- Unsure? Cross-reference against the names from Step 2
Click **Save**.
## Fix
1. Uncheck orphans at `https://www.google.com/calendar/syncselect`, click Save.
2. Let Fantastical complete one more sync cycle (or quit + relaunch for faster turnaround).
3. Verify the log is clean:
```bash
log show --last 2m --style compact \
--predicate 'eventMessage CONTAINS "Unable to find Google Calendar"' 2>/dev/null \
| wc -l
```
Should return 0.
**What you should NOT do as a first attempt:**
- Remove and re-add the Google account inside Fantastical. This fixes some orphans but not all — Fantastical's local event cache keeps references to calendars that have associated cached events, so orphans with historical data survive a standard account re-add. Hit `syncselect` first.
- Delete Fantastical's `.fcdata` SQLite. Nuclear, loses local cache, unnecessary for this specific issue.
## Gotchas & Notes
- **syncselect is per-Google-account**, so if you have multiple Google accounts in Fantastical, each needs its own visit. The URL will use whichever account you're currently signed in with in the browser.
- **Calendar deletion from `calendar.google.com` doesn't propagate to syncselect.** This is a Google quirk, not a Fantastical bug.
- **The same fix applies to Apple Calendar.app** if it's showing the same sync errors — Fantastical and Apple Calendar use identical CalDAV plumbing via macOS's `dataaccessd`.
- The phantom calendar IDs will remain in Fantastical's `.fcdata` for a while even after the fix — Fantastical doesn't aggressively garbage-collect cached event data. This is cosmetic and doesn't re-trigger sync errors as long as syncselect no longer lists them.
- The XPC `Unable to create token NSXPCConnection` loop is downstream of the sync error flood — when Fantastical's helper gets wedged on repeated failed syncs, its CoreData-backed OAuth token store can't initialize cleanly. Fixing syncselect + a full Fantastical quit (menubar → Quit Fantastical, not just `Cmd+Q`) + relaunch clears this too.
## Related
- [[Recap]] skill — uses Google Calendar MCPs that are unaffected by this issue (MCPs go through Google's API directly, not CalDAV)
- Google's syncselect URL: https://www.google.com/calendar/syncselect

@ -0,0 +1,98 @@
---
title: "Fantastical MCP Server: Permission Denied on Launch (macOS Quarantine)"
domain: troubleshooting
category: productivity
tags: [fantastical, mcp, claude, macos, gatekeeper, quarantine, cowork]
status: published
created: 2026-04-26
updated: 2026-04-26
---
# Fantastical MCP Server: Permission Denied on Launch (macOS Quarantine)
Fantastical's MCP server fails to connect in Claude/Cowork with a `Server disconnected` error and no dialog or prompt to explain why. The binary is installed but macOS Gatekeeper silently blocks it from executing.
## The Short Answer
```bash
xattr -d com.apple.quarantine "/Users/majorlinux/Library/Application Support/Claude/Claude Extensions/ant.dir.gh.flexibits.fantastical-mcp/server/FantasticalMCP.app"
```
Fully quit and reopen Cowork. Fantastical MCP reconnects cleanly.
If the quarantine attribute isn't present, the binary may be missing its executable bit instead. Set it:
```bash
chmod +x "/Users/majorlinux/Library/Application Support/Claude/Claude Extensions/ant.dir.gh.flexibits.fantastical-mcp/server/FantasticalMCP.app/Contents/MacOS/FantasticalMCP"
```
## Why This Happens
macOS automatically tags downloaded files with a `com.apple.quarantine` extended attribute. When you launch an app yourself, macOS shows a Gatekeeper dialog — click Open, and the quarantine flag is cleared. But the FantasticalMCP binary is never launched by the user directly; Claude/Cowork spawns it as a subprocess. There's no dialog, and Gatekeeper just returns `Permission denied`. Claude sees the process die immediately and logs `Server disconnected`.
This recurs after any Fantastical update that replaces the MCP binary — the new binary comes in quarantined again.
## Diagnosis
The log tells the whole story. Check:
```bash
tail -n 50 ~/Library/Logs/Claude/mcp-server-Fantastical.log
```
If you see this sequence, it's the quarantine issue:
```
Server started and connected successfully
Failed to spawn process: Permission denied
Server transport closed
Server disconnected.
```
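You can also script the check in Python. This sketch looks only at the most recent connection attempt, meaning everything after the last `Server started` line. That way an older failure still sitting in the tail doesn't raise a false alarm, and the same check works for Step 4 below. The function names and the 50-line default are assumptions that mirror the `tail` command above.

```python
from pathlib import Path

LOG = Path.home() / "Library/Logs/Claude/mcp-server-Fantastical.log"
STARTED = "Server started and connected successfully"
DENIED = "Failed to spawn process: Permission denied"


def quarantine_suspected(lines: list) -> bool:
    """True if the most recent connection attempt was followed by a spawn 'Permission denied'."""
    starts = [i for i, line in enumerate(lines) if STARTED in line]
    if not starts:
        return False
    return any(DENIED in line for line in lines[starts[-1] + 1:])


def tail(path: Path = LOG, n: int = 50) -> list:
    """Last n lines of the log, like `tail -n 50` (reads the whole file; fine for small logs)."""
    return path.read_text(errors="replace").splitlines()[-n:]
```

Usage on the Mac: `print(quarantine_suspected(tail()))`.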
## Full Fix
**Step 1 — Remove the quarantine flag:**
```bash
xattr -d com.apple.quarantine \
"/Users/majorlinux/Library/Application Support/Claude/Claude Extensions/ant.dir.gh.flexibits.fantastical-mcp/server/FantasticalMCP.app"
```
**Step 2 — Verify the attribute is gone:**
```bash
xattr "/Users/majorlinux/Library/Application Support/Claude/Claude Extensions/ant.dir.gh.flexibits.fantastical-mcp/server/FantasticalMCP.app"
```
Should return empty or only non-quarantine attributes. If `com.apple.quarantine` is still listed, re-run step 1.
**Step 3 — Fully quit and reopen Cowork:**
Cmd+Q (not just close the window). Closing the window leaves the MCP host process running — it won't retry the failed server until the app fully relaunches.
**Step 4 — Verify connection:**
Check the log again:
```bash
tail -n 10 ~/Library/Logs/Claude/mcp-server-Fantastical.log
```
You should see `Server started and connected successfully` with no `Permission denied` line following it.
## After a Fantastical Update
If this breaks again after Fantastical auto-updates, re-run the `xattr -d` command from Step 1. The update replaces the binary and macOS re-quarantines the new one.
## MCP Log Locations
| Log | Path |
|-----|------|
| Fantastical MCP | `~/Library/Logs/Claude/mcp-server-Fantastical.log` |
| All MCP servers | `~/Library/Logs/Claude/mcp*.log` |
| Combined MCP log | `~/Library/Logs/Claude/mcp.log` |
## Related
- [[Fantastical Google Sync Error Flood — Phantom Calendars Fixed via syncselect]]
- MCP debugging docs: https://modelcontextprotocol.io/docs/tools/debugging

@ -1,126 +0,0 @@
---
title: "Fedora usrmerge: ebtables Symlink Blocks Directory Consolidation"
domain: troubleshooting
category: fedora
tags: [fedora, usrmerge, ebtables, update-alternatives, ansible, dnf]
status: published
created: 2026-04-19
updated: 2026-04-19
---
# Fedora usrmerge: ebtables Symlink Blocks Directory Consolidation
## Symptom
Every `dnf upgrade` on Fedora 43 (and some earlier Fedora releases) emits a warning partway through the transaction:
```
/usr/sbin cannot be merged yet, /usr/sbin/ebtables points to /etc/alternatives/ebtables
```
When the upgrade is driven by Ansible, the warning contaminates the module's JSON output and surfaces in a play run as:
```
TASK [Upgrade all packages on CentOS/Fedora servers] ***
changed: [majorlab]
[WARNING]: Module invocation had junk after the JSON data:
/usr/sbin cannot be merged yet, /usr/sbin/ebtables points to /etc/alternatives/ebtables
changed: [majordiscord]
```
The upgrade succeeds — the warning is cosmetic — but it keeps firing on every run until the underlying state is cleaned up.
## Why It Happens
Fedora's `usrmerge` transition turns `/usr/sbin` into a symlink to `/usr/bin`. The `filesystem` package's post-install scriptlet enforces that at every transaction: it walks `/usr/sbin` looking for any entity still pinned to the old path and refuses to consolidate until they're removed.
`ebtables` triggers this because `update-alternatives` can create registrations at `/usr/sbin/<cmd>` with targets in `/etc/alternatives/<cmd>`. Those symlinks:
- Are **not owned by any rpm** (confirmable with `rpm -qf /usr/sbin/ebtables` → "not owned")
- Predate the usrmerge — they were created when `/usr/sbin` was still a real directory
- Point to a target (`/etc/alternatives/ebtables`) that in turn points back into `/usr/sbin/ebtables-legacy` or `/usr/bin/ebtables-nft`
Because these live outside rpm, no package upgrade can clean them up. The filesystem scriptlet detects the blocker and backs off.
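To illustrate the blocking condition, here is a Python sketch that lists entries in `/usr/sbin` that are symlinks into `/etc/alternatives`, which is the shape of the ebtables blocker. It is not the `filesystem` scriptlet's actual logic, only a way to find candidates on a host. It returns nothing once `/usr/sbin` is already a symlink. `alternatives_blockers` is a hypothetical name.

```python
import os
from pathlib import Path


def alternatives_blockers(sbin: str = "/usr/sbin", alt_dir: str = "/etc/alternatives") -> list:
    """Names of symlinks directly in `sbin` whose target sits in `alt_dir`."""
    root = Path(sbin)
    if root.is_symlink():
        # Already merged: /usr/sbin points at bin, nothing left to block.
        return []
    blockers = []
    for entry in sorted(root.iterdir()):
        if not entry.is_symlink():
            continue
        target = Path(os.readlink(entry))  # read the link itself, don't follow it
        if target.is_absolute() and target.parent == Path(alt_dir):
            blockers.append(entry.name)
    return blockers
```

Usage: `print(alternatives_blockers())` on an affected host should list `ebtables`.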
## Investigation
1. Confirm which hosts are affected:
```bash
ansible fedora -m shell -a '[ -e /usr/sbin/ebtables ] && ls -la /usr/sbin/ebtables'
```
2. Inspect the alternatives registration:
```bash
update-alternatives --display ebtables
```
Note whether the link points at `/usr/bin/ebtables-nft` (nft backend) or `/usr/sbin/ebtables-legacy` (legacy backend). Different Fedora images ship with different defaults.
3. Confirm ownership:
```bash
rpm -qf /usr/sbin/ebtables /etc/alternatives/ebtables
```
Both should report "not owned by any package." That's the signal.
## Fix
Tear down the alternative, delete the blocker symlinks, then re-register with **`/usr/bin` paths on both sides of the registration** so the scriptlet has nothing left to complain about.
```bash
# Capture current provider first (nft or legacy)
update-alternatives --display ebtables
# Remove the stale registration
update-alternatives --remove-all ebtables
# Clear the blocking symlinks (not rpm-owned)
rm -f /usr/sbin/ebtables /etc/alternatives/ebtables
# Re-register with /usr/bin paths — example for nft backend
update-alternatives --install /usr/bin/ebtables ebtables /usr/bin/ebtables-nft 10 \
--slave /usr/bin/ebtables-restore ebtables-restore /usr/bin/ebtables-nft-restore \
--slave /usr/bin/ebtables-save ebtables-save /usr/bin/ebtables-nft-save \
--slave /usr/share/man/man8/ebtables.8.gz ebtables.8.gz /usr/share/man/man8/ebtables-nft.8.gz
# For legacy backend, swap -nft suffixes for -legacy
```
Verify:
```bash
which ebtables # should resolve to /usr/bin/ebtables
ebtables -V # should print the version without error
test -e /usr/sbin/ebtables && echo BLOCKER || echo clean
```
Next `dnf upgrade` will consolidate `/usr/sbin` cleanly with no warning.
## Ansible Playbook
`MajorAnsible/fix_ebtables_usrmerge.yml` handles this fleet-wide:
- Detects the backend (nft vs legacy) per host via `update-alternatives --display`
- Uses `check_mode: false` on the detection query — otherwise `ansible.builtin.command` is skipped in `--check`, the detection fact defaults, and downstream conditionals misfire (see [Ansible Check Mode False Positives](ansible-check-mode-false-positives.md) for the broader pattern)
- Safety check: bails out if `/usr/bin/ebtables-<backend>` is missing before touching anything
- Idempotent on re-run — no alternative registered → `end_host`
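The backend detection step can be sketched in Python. The sketch assumes the `update-alternatives --display` output contains a line of the form `link currently points to <path>`, as printed by Fedora's `alternatives` tool. It maps a target ending in `ebtables-nft` or `ebtables-legacy` to a backend. This is an illustration, not the playbook's code, and `detect_backend` is a hypothetical name. A `None` result corresponds roughly to the "no alternative registered" case.

```python
import re
import subprocess
from typing import Optional

# Assumed output line format: " link currently points to /usr/bin/ebtables-nft"
CURRENT_RE = re.compile(r"link currently points to (\S+)")


def detect_backend(display_output: str) -> Optional[str]:
    """Return 'nft' or 'legacy' from `update-alternatives --display ebtables` output, else None."""
    match = CURRENT_RE.search(display_output)
    if match is None:
        return None
    target = match.group(1)
    for backend in ("nft", "legacy"):
        if target.endswith(f"ebtables-{backend}"):
            return backend
    return None


def display_ebtables() -> str:
    """Run the detection query; a non-zero exit simply yields empty or partial output."""
    result = subprocess.run(
        ["update-alternatives", "--display", "ebtables"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```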
Applied 2026-04-19 across the four Fedora hosts:
| Host | Backend |
|---|---|
| majorlab | nft (`ebtables v1.8.11 nf_tables`) |
| majorhome | nft |
| majormail | legacy (`ebtables v2.0.11 (legacy)`) |
| majordiscord | legacy |
## Why not just remove ebtables?
Tempting, since nothing on the fleet currently writes L2 bridge firewall rules. But:
- `ebtables` is a transitive dependency of iptables/libvirt/networking packages on Fedora — removing it fights the package manager
- The package itself isn't the problem; the **stale alternatives state** is
Cleaning up the registration is cheaper than untangling the dependency graph.
## Related
- [Ansible Check Mode False Positives in Verify/Assert Tasks](ansible-check-mode-false-positives.md)
- Playbook: `MajorAnsible/fix_ebtables_usrmerge.yml`
- Fedora usrmerge background: `man file-hierarchy`, Fedora Change page "UsrMove"

@ -0,0 +1,60 @@
---
title: "Python smtplib: Missing Date/Message-ID Headers Break Mail Clients"
domain: troubleshooting
category: general
tags: [email, python, smtplib, spam, rfc, spark]
status: published
created: 2026-04-29
updated: 2026-04-29
---
# Python smtplib: Missing Date/Message-ID Headers Break Mail Clients
## Problem
Emails sent via Python's `smtplib` and `EmailMessage` appear on some mail clients but not others. The emails are delivered to the server and visible in Maildir, but specific clients silently suppress them.
## Root Cause
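Python's `EmailMessage` does **not** automatically add `Date:` or `Message-ID:` headers, and `smtplib`'s `send_message()` doesn't add them either. RFC 5322 requires `Date:` and says every message SHOULD carry a `Message-ID:`. Without them: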
- **SpamAssassin** flags `MISSING_DATE` and `MISSING_MID`, and may set `X-Spam-Flag: YES` even if the overall score is below the spam threshold
- **Mail clients** (e.g., Spark) may filter on the spam flag header and silently hide the message — no Junk folder, just invisible
- **Other clients** (e.g., iPhone Mail, some Spark builds) may be more lenient and display the message anyway
This creates a confusing situation where the same email appears on one device but not another, despite both using the same IMAP account.
## Fix
Always include `Date` and `Message-ID` headers when constructing emails with `EmailMessage`:
```python
import smtplib
from email.message import EmailMessage
from email.utils import formatdate, make_msgid
msg = EmailMessage()
msg['Subject'] = 'Your subject here'
msg['From'] = 'sender@example.com'
msg['To'] = 'recipient@example.com'
msg['Date'] = formatdate(localtime=True)
msg['Message-ID'] = make_msgid(domain='example.com')
msg.set_content('Email body here')
with smtplib.SMTP('mail.example.com', 25) as s:
s.send_message(msg)
```
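When a script builds messages in several places, a small helper can add the two headers only when they are missing. The guard matters because `EmailMessage` never replaces a header on assignment. For single-instance headers like `Date` and `Message-ID`, the default policy raises `ValueError` if one is already present. `ensure_rfc5322_headers` is a hypothetical name, and `domain` should be a domain you actually send from.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def ensure_rfc5322_headers(msg: EmailMessage, domain: str) -> EmailMessage:
    """Add Date and Message-ID only when the message doesn't already carry them."""
    if msg["Date"] is None:
        msg["Date"] = formatdate(localtime=True)
    if msg["Message-ID"] is None:
        msg["Message-ID"] = make_msgid(domain=domain)
    return msg
```

Usage: call `ensure_rfc5322_headers(msg, "example.com")` just before `s.send_message(msg)`.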
## Verification
After applying the fix, check that SpamAssassin no longer flags the headers:
```bash
# Check email headers on the mail server
grep -E 'MISSING_DATE|MISSING_MID|X-Spam' /var/vmail/domain/user/cur/<message-file>
```
A clean message should show `X-Spam-Status: No` with no `MISSING_DATE` or `MISSING_MID` in the test list.
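`grep` can miss test names when SpamAssassin folds a long `X-Spam-Status` header onto a continuation line. The following Python sketch avoids that by parsing the headers with the standard `email` package and collapsing the folding whitespace before it looks for `MISSING_DATE` and `MISSING_MID`. The returned field names are assumptions. The sketch also expects SpamAssassin's usual `Yes|No, score=... tests=...` layout.

```python
import re
from email import policy
from email.parser import BytesHeaderParser


def spam_report(raw: bytes) -> dict:
    """Summarize SpamAssassin headers from a raw message (e.g. a Maildir file's bytes)."""
    headers = BytesHeaderParser(policy=policy.default).parsebytes(raw)
    # Collapse folding whitespace so a test list split across lines reads as one string.
    status = " ".join(str(headers.get("X-Spam-Status", "")).split())
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S*)", status)
    tests = set(match.group(1).split(",")) if match else set()
    return {
        "verdict": status.split(",", 1)[0] or None,
        "flag": headers.get("X-Spam-Flag"),
        "missing_date": "MISSING_DATE" in tests,
        "missing_mid": "MISSING_MID" in tests,
    }
```

Usage: `spam_report(open(path_to_message_file, "rb").read())`.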
## Key Takeaway
Python's `EmailMessage` is a low-level builder — it trusts you to set all required headers. Unlike higher-level mail libraries or webmail interfaces, it will happily send a message with no date or message ID. Always add both explicitly in any script that sends email via `smtplib`.

@ -0,0 +1,98 @@
---
title: "Ubuntu dist-upgrade Quarantines Third-Party Repos"
domain: troubleshooting
category: ubuntu
tags: [ubuntu, apt, dist-upgrade, repositories, tailscale, digitalocean]
status: published
created: 2026-04-28
updated: 2026-04-28
---
# Ubuntu dist-upgrade Quarantines Third-Party Repos
## Problem
When running `do-release-upgrade` (e.g., Jammy 22.04 to Noble 24.04), Ubuntu renames all third-party `.list` files in `/etc/apt/sources.list.d/` to `.list.distUpgrade`. This silently disables every third-party repo — packages from those repos stop receiving updates with no warning.
The upgrade process does this intentionally because it can't guarantee third-party repos will have packages for the new release. Some repos get re-added as `.sources` files during the upgrade, but many don't.
## Symptoms
- `apt list --upgradable` shows nothing for packages you know have updates (e.g., Tailscale stuck on an old version)
- `apt list --installed` shows packages as `[installed,local]` instead of `[installed]` — the "local" tag means apt has no repo to check for updates
- `.distUpgrade` files accumulate in `/etc/apt/sources.list.d/` indefinitely
## Diagnosis
Check for quarantined repos:
```bash
ls /etc/apt/sources.list.d/*.distUpgrade
```
For each file, check whether a replacement `.list` or `.sources` file already exists:
```bash
ls /etc/apt/sources.list.d/*.list /etc/apt/sources.list.d/*.sources
```
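To triage many hosts, the following Python sketch sorts each `.distUpgrade` file into the three cases handled in the Fix section:

- **Replaced:** a live `.list` or `.sources` file already points at the same repo URL.
- **Distro-specific:** a source line names a release codename.
- **Distro-agnostic:** everything else.

The codename set, the URL matching, and the function names are all assumptions. A repo that changed hosts won't be recognized as replaced, for example an old `ppa.launchpad.net` URL versus a new `ppa.launchpadcontent.net` one.

```python
import re
from pathlib import Path

# Assumption: codenames worth flagging; extend for your fleet.
CODENAMES = {"bionic", "focal", "jammy", "noble"}
URL_RE = re.compile(r"https?://\S+")


def repo_lines(text: str) -> list:
    """Source lines (deb, deb-src, or DEB822 URIs/Suites), uncommented, trailing comments dropped."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip().lstrip("#").strip().split(" #", 1)[0]
        if line.startswith(("deb ", "deb-src ", "URIs:", "Suites:")):
            lines.append(line)
    return lines


def repo_urls(lines: list) -> set:
    """Repository URLs found in source lines, with trailing slashes normalized away."""
    return {u.rstrip("/") for line in lines for u in URL_RE.findall(line)}


def classify(path: Path) -> str:
    """Map one .distUpgrade file to a fix case: replaced, distro-specific, or distro-agnostic."""
    lines = repo_lines(path.read_text(errors="replace"))
    urls = repo_urls(lines)
    # Does any live .list/.sources file already point at the same repo URL?
    for live in list(path.parent.glob("*.list")) + list(path.parent.glob("*.sources")):
        if urls & repo_urls(repo_lines(live.read_text(errors="replace"))):
            return "replaced"
    words = set(re.findall(r"[a-z]+", " ".join(lines)))
    return "distro-specific" if words & CODENAMES else "distro-agnostic"


def audit(sources_dir: str = "/etc/apt/sources.list.d") -> dict:
    """Classify every .distUpgrade file in `sources_dir`."""
    return {p.name: classify(p) for p in sorted(Path(sources_dir).glob("*.distUpgrade"))}
```

Usage: `print(audit())` on each upgraded host.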
## Fix
### Distro-agnostic repos (e.g., DigitalOcean agents)
If the repo URL doesn't reference a distro codename (jammy/noble), just rename:
```bash
mv /etc/apt/sources.list.d/digitalocean-agent.list.distUpgrade \
/etc/apt/sources.list.d/digitalocean-agent.list
```
### Distro-specific repos (e.g., Tailscale, ondrej-php)
The quarantined file references the old distro (jammy). Re-run the upstream install script to get a correct entry for the new release:
```bash
# Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
# Or manually: update the codename
sed 's/jammy/noble/' /etc/apt/sources.list.d/tailscale.list.distUpgrade \
> /etc/apt/sources.list.d/tailscale.list
apt update && apt upgrade tailscale
```
### Already replaced by .sources
If the upgrade process already created a `.sources` replacement (common for ubuntu-esm-apps, ondrej-php), the `.distUpgrade` file is just clutter — delete it:
```bash
rm /etc/apt/sources.list.d/ondrej-ubuntu-php-jammy.list.distUpgrade
```
### After all fixes
```bash
apt update
apt list --upgradable # should now show pending updates
apt upgrade
```
## Real-World Example: MajorsHouse Fleet (2026-04-28)
Five Ubuntu 24.04 servers were dist-upgraded from Jammy in October 2024. The `.distUpgrade` quarantine was discovered about 18 months later, while investigating why Tailscale's website wouldn't load. Pi-hole turned out to be blocking subdomains, but the investigation also revealed that teelia was stuck on Tailscale 1.76.0, 20 minor versions behind, because its repo had been disabled since the upgrade.
| Host | Quarantined files | Impact |
|------|------------------|--------|
| dcaprod | 8 | Tailscale, DO agents, MySQL, ondrej-php, ESM, vector |
| teelia | 4 | Tailscale (stuck on 1.76.0), DO agents, certbot bionic PPA |
| majorlinux | 8 | Tailscale, DO agents, MySQL, ondrej-php, ESM, apt-fast |
| majortoot | 11 | Tailscale, DO agents, nodesource, PostgreSQL, vector, zabbix, ESM |
| tttpod | 0 | Clean — was likely rebuilt rather than upgraded |
All files were audited: stale ones were deleted, distro-agnostic repos were renamed, and distro-specific repos were re-added via upstream install scripts. The DO agents were upgraded from 3.16.11 to 3.18.12, and teelia's Tailscale jumped from 1.76.0 to 1.96.4.
## Prevention
- **Post-upgrade audit:** After any `do-release-upgrade`, immediately run `ls /etc/apt/sources.list.d/*.distUpgrade` and resolve each file.
- **Prefer `.sources` format:** When adding new third-party repos, use the DEB822 `.sources` format — it's what Ubuntu itself uses on Noble and is handled more gracefully during upgrades.
- **Ansible playbook:** Consider a post-upgrade play that checks for `.distUpgrade` files and alerts or auto-fixes distro-agnostic repos.