majorwiki/02-selfhosting/services/mastodon-s3-acl-upload-failures.md
MajorLinux 4e63d8546c mastodon: document S3 ACL upload failures + bulk avatar restore
New article mastodon-s3-acl-upload-failures.md: a BucketOwnerEnforced S3
bucket plus a stale S3_PERMISSION/S3_ACL in .env.production makes every
Mastodon upload fail with AccessControlListNotSupported, silently. Covers
symptoms (incl. why a missing object returns 403 not 404), diagnosis,
the fix (S3_PERMISSION= empty, public read via bucket policy), recovery,
a synthetic-write health check, and Ansible enforcement.

Extend mastodon-prune-profiles-trap.md: add a "Bulk restore at scale"
procedure (list existing keys, null missing DB refs, enqueue
RedownloadAvatar/HeaderWorker), a "storage-level deletion without DB
de-ref" section, and a stronger recommendation to disable automated
profile pruning (and scheduled accounts refresh --all) entirely.

Link both from SUMMARY.md and the selfhosting index.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 15:45:23 -04:00

---
title: Mastodon on S3 — Silent Upload Failures When the Bucket Disables ACLs
description: Why a BucketOwnerEnforced S3 bucket plus a stale S3_PERMISSION/S3_ACL in .env.production makes every Mastodon media upload fail with AccessControlListNotSupported, how to diagnose it, and how to fix and monitor it.
domain: selfhosting
category: services
tags:
- mastodon
- fediverse
- self-hosting
- aws
- s3
- paperclip
- troubleshooting
status: published
created: 2026-06-01
updated: 2026-06-01
---
# Mastodon on S3 — Silent Upload Failures When the Bucket Disables ACLs
If your Mastodon instance stores media on S3 and you switch the bucket to **Object Ownership = `BucketOwnerEnforced`** (which AWS now recommends, and which the console nudges you toward), every media upload can start failing **silently** unless you also remove the object-ACL setting from `.env.production`. New avatars, headers, and attachments stop appearing; old ones keep working; nothing obvious is logged. This article is the diagnosis and fix.
## TL;DR
- `BucketOwnerEnforced` **disables ACLs entirely** on the bucket. Any request that carries an `x-amz-acl` header is rejected with `AccessControlListNotSupported: The bucket does not allow ACLs`.
- Mastodon (via Paperclip) attaches `x-amz-acl` to every upload **if** `S3_PERMISSION` (or `S3_ACL`) is set in `.env.production`. The common value `S3_PERMISSION=public-read` — or a migration leftover like `S3_PERMISSION=private` — triggers the rejection.
- Result: **every new upload fails**, but the database row is still updated, so Mastodon believes it has the file. The object never lands → broken image. Objects written *before* the bucket changed keep serving fine, which masks the problem.
- **Fix:** set `S3_PERMISSION=` (empty) and remove any `S3_ACL=` line, then restart `mastodon-web` + `mastodon-sidekiq`. Public read is now served by the **bucket policy**, not per-object ACLs.
## Symptoms
- Newly changed avatars/headers show as broken images; attachments on new posts fail to display.
- Avatars that were cached **before** the bucket setting changed still work — so "some work, some don't."
- `tootctl` and the web UI report success; Sidekiq doesn't obviously error.
- Direct fetch of a broken object's URL returns **403 AccessDenied** (not 404 — see below).
## Why a missing object returns 403, not 404
A typical Mastodon S3 bucket policy grants public `s3:GetObject` but **not** `s3:ListBucket`. Without `ListBucket`, S3 hides whether a key exists: a `GET` on a **missing** key returns **403 AccessDenied**, identical to a permissions denial. So "403" here usually means *the object isn't there*, not *the object is forbidden*. This is why the failure reads like a permissions problem when it's really a failed write.
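The rule above can be written down as a small decision helper. This is only a sketch: `interpret_fetch_status` and its `list_bucket_granted` flag are hypothetical names used for illustration, not part of Mastodon or the AWS SDK.

```ruby
# Hypothetical helper: interpret the HTTP status of a GET/HEAD on a media URL,
# given whether the caller has been granted s3:ListBucket on the bucket.
def interpret_fetch_status(status, list_bucket_granted: false)
  case status
  when 200
    :present
  when 404
    :missing
  when 403
    # Without ListBucket, S3 answers 403 for a missing key as well as for a
    # real denial, so the two cases cannot be told apart from the outside.
    list_bucket_granted ? :denied : :missing_or_denied
  else
    :unexpected
  end
end

puts interpret_fetch_status(403)                            # typical public-read policy
puts interpret_fetch_status(403, list_bucket_granted: true) # a genuine denial
puts interpret_fetch_status(404, list_bucket_granted: true) # key really is absent
```

With the typical policy, a 403 on a broken avatar should be treated as "probably missing" and confirmed with an authenticated `head_object`, as in the Diagnosis section.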
## Diagnosis
Run these with the instance's own S3 credentials (e.g. via `bin/rails runner`, which loads `.env.production`):
```ruby
require "aws-sdk-s3"

c = Aws::S3::Client.new(
  region: ENV["S3_REGION"],
  access_key_id: ENV["AWS_ACCESS_KEY_ID"],
  secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"]
)
b = ENV["S3_BUCKET"]

# 1. Is the bucket ACL-disabled?
puts c.get_bucket_ownership_controls(bucket: b).ownership_controls.rules.map(&:object_ownership).inspect
# => ["BucketOwnerEnforced"]   <-- ACLs are OFF

# 2. Does an upload WITH an ACL fail, and WITHOUT one succeed?
begin
  c.put_object(bucket: b, key: "tmp/acltest", body: "x", acl: "public-read")
  puts "PUT+acl: OK"
rescue => e
  puts "PUT+acl FAILS: #{e.class} / #{e.message}" # AccessControlListNotSupported
end
c.delete_object(bucket: b, key: "tmp/acltest") # clean up in case the ACL'd PUT was accepted
c.put_object(bucket: b, key: "tmp/noacltest", body: "x") # succeeds
c.delete_object(bucket: b, key: "tmp/noacltest")

# 3. Confirm a "broken" avatar's object is actually missing
key = Account.find_by(username: "someuser", domain: "remote.tld").avatar.path.sub(%r{^/}, "")
begin
  c.head_object(bucket: b, key: key)
  puts "EXISTS"
rescue Aws::S3::Errors::NotFound
  puts "MISSING"
rescue Aws::S3::Errors::Forbidden
  # Same 403-vs-404 ambiguity as above if these credentials lack s3:ListBucket
  puts "MISSING or denied"
end
```
If #1 shows `BucketOwnerEnforced` and #2 shows the ACL'd PUT failing while the plain PUT succeeds, you've confirmed it.
Check `.env.production` for the offending settings:
```bash
grep -E '^S3_(ACL|PERMISSION|NO_INHERIT)' /home/mastodon/live/.env.production
# S3_ACL=private <-- remove
# S3_PERMISSION=private <-- set empty
```
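The same check can be scripted, for example as part of a monitoring job. The sketch below is illustrative only: `offending_s3_settings` is a hypothetical helper, and it assumes a plain `KEY=value` env file in which only a non-empty `S3_ACL` or `S3_PERMISSION` is a problem.

```ruby
# Hypothetical helper: return the active lines of an env file that would
# make uploads carry an ACL, following the rules described in this article.
def offending_s3_settings(env_text)
  env_text.each_line.map(&:strip).select do |line|
    next false if line.empty? || line.start_with?("#") # skip blanks and comments

    key, value = line.split("=", 2)
    value = value.to_s.strip.gsub(/\A["']|["']\z/, "") # tolerate quoted values
    %w[S3_ACL S3_PERMISSION].include?(key) && !value.empty?
  end
end

sample = <<~ENVFILE
  S3_ENABLED=true
  S3_PERMISSION=private
  # S3_ACL=public-read
  S3_ACL=private
ENVFILE

puts offending_s3_settings(sample)
# Against the real file (path as used elsewhere in this article):
#   puts offending_s3_settings(File.read("/home/mastodon/live/.env.production"))
```

An empty result means nothing in the file asks for an ACL; any line printed needs the treatment described in the fix.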
## The fix
1. Edit `.env.production`:
   - `S3_PERMISSION=` (empty, so Paperclip sends no `x-amz-acl` header)
   - remove or comment out any `S3_ACL=` line
2. Restart so the env is reloaded: `systemctl restart mastodon-sidekiq mastodon-web`
3. Verify the previously failing write path now works: reprocess any existing avatar and confirm it serves 200:
   ```ruby
   a = Account.local.where.not(avatar_file_name: nil).first # a local account that actually has an avatar
   a.avatar.reprocess! # used to raise AccessControlListNotSupported; now succeeds
   puts a.avatar.url   # fetch this URL and expect HTTP 200
   ```
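Public readability is now provided by the **bucket policy** (grant `s3:GetObject` on `arn:aws:s3:::your-bucket/*` to `Principal: "*"`). For that policy to be accepted and honored, the policy-related **Block Public Access** settings (`BlockPublicPolicy` and `RestrictPublicBuckets`) must be off at both the account and the bucket level. The ACL-related settings (`BlockPublicAcls` and `IgnorePublicAcls`) no longer matter once ACLs are disabled. You do **not** need per-object ACLs at all.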
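For reference, a policy of that shape can be generated in Ruby. This is a minimal sketch: `public_read_policy` and the `Sid` value are hypothetical, and `your-bucket` is the same placeholder used above.

```ruby
require "json"

# Hypothetical helper: build a bucket policy that lets anyone read objects.
# It deliberately grants s3:GetObject only, not s3:ListBucket.
def public_read_policy(bucket)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        "Sid" => "PublicReadMedia",
        "Effect" => "Allow",
        "Principal" => "*",
        "Action" => "s3:GetObject",
        "Resource" => "arn:aws:s3:::#{bucket}/*"
      }
    ]
  }
end

puts JSON.pretty_generate(public_read_policy("your-bucket"))
# To apply it with the aws-sdk-s3 client `c` and bucket `b` from the Diagnosis section:
#   c.put_bucket_policy(bucket: b, policy: JSON.generate(public_read_policy(b)))
```

Because the policy omits `s3:ListBucket`, anonymous requests for missing keys will keep returning 403, as described earlier.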
### Recovering the avatars that broke while it was failing
Any media that failed to upload during the broken window is gone from S3 while the DB still references it. Because Mastodon's redownload workers **skip accounts whose `*_file_name` is already set**, you must null the dead reference first, then enqueue the worker. See [Mastodon — The `--prune-profiles` Trap and How to Recover](mastodon-prune-profiles-trap.md#bulk-restore-at-scale) for the bulk procedure.
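The "find the dead references first" step can be sketched as a pure function. Everything here is illustrative: `accounts_with_dead_avatar` is a hypothetical helper, the keys are made-up examples, and the commented Rails usage assumes the `RedownloadAvatarWorker` named in the linked article. Treat the linked procedure as the authoritative one.

```ruby
# Hypothetical helper: given { account_id => s3_key } and a block that says
# whether a key exists in the bucket, return the ids whose object is missing.
def accounts_with_dead_avatar(keys_by_account_id)
  keys_by_account_id.reject { |_id, key| yield(key) }.keys
end

# Stand-in data so the sketch runs without S3 or a database.
existing_keys = ["accounts/avatars/1/original/a.png"]
references = {
  1 => "accounts/avatars/1/original/a.png",
  2 => "accounts/avatars/2/original/b.png"
}

dead_ids = accounts_with_dead_avatar(references) { |key| existing_keys.include?(key) }
p dead_ids

# Inside `bin/rails runner`, the block could wrap head_object, and each dead
# reference could then be nulled before the worker is enqueued, roughly:
#   Account.where(id: dead_ids).find_each do |account|
#     account.update_columns(avatar_file_name: nil)
#     RedownloadAvatarWorker.perform_async(account.id)
#   end
```

Keeping the existence check separate from the database update makes it easy to review the list of affected accounts before changing anything.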
## Don't let it happen silently again — monitor uploads
The worst part of this bug is the silence. Add a periodic **synthetic write check** that uploads a tiny object with the app's own credentials, confirms it, deletes it, and alerts on failure:
```ruby
# s3 is an Aws::S3::Client and b the bucket name, built as in the Diagnosis section
s3.put_object(bucket: b, key: "health/upload-check", body: "ok") # no acl
s3.head_object(bucket: b, key: "health/upload-check")
s3.delete_object(bucket: b, key: "health/upload-check")
# any exception -> email an alert
```
Pair it with an HTTP check that your **local** account avatars all return 200 (they always should). Run both every few hours from cron. A regression then pages you in hours instead of being discovered by a user weeks later.
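One way to package the write check for cron is to wrap the three calls in a function that reports success or the error. This is a sketch under assumptions: `upload_check` and `FakeBucketClient` are hypothetical names, and the fake client exists only so the example runs without AWS. In production you would pass a real `Aws::S3::Client`.

```ruby
# Hypothetical wrapper: `client` is anything that responds to put_object,
# head_object and delete_object, e.g. an Aws::S3::Client.
def upload_check(client, bucket, key: "health/upload-check")
  client.put_object(bucket: bucket, key: key, body: "ok") # deliberately no acl:
  client.head_object(bucket: bucket, key: key)
  client.delete_object(bucket: bucket, key: key)
  { ok: true, error: nil }
rescue StandardError => e
  { ok: false, error: "#{e.class}: #{e.message}" }
end

# In-memory stand-in for the S3 client, for demonstration only.
class FakeBucketClient
  def initialize
    @objects = {}
  end

  def put_object(bucket:, key:, body:)
    @objects[key] = body
  end

  def head_object(bucket:, key:)
    @objects.fetch(key) # raises KeyError if the object never landed
  end

  def delete_object(bucket:, key:)
    @objects.delete(key)
  end
end

result = upload_check(FakeBucketClient.new, "your-bucket")
puts result[:ok] ? "upload check OK" : "upload check FAILED: #{result[:error]}"
```

A cron wrapper could exit non-zero or send mail whenever `result[:ok]` is false, so a reintroduced ACL setting surfaces on the next run.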
## Ansible enforcement
If you manage the host with Ansible, enforce the safe values so a future template render can't reintroduce the ACL header:
```yaml
- name: Ensure S3_PERMISSION is empty (no x-amz-acl on uploads)
  ansible.builtin.lineinfile:
    path: /home/mastodon/live/.env.production
    regexp: '^S3_PERMISSION='
    line: 'S3_PERMISSION='
  notify: Restart Mastodon services

- name: Remove any active S3_ACL line (ACLs unsupported on this bucket)
  ansible.builtin.lineinfile:
    path: /home/mastodon/live/.env.production
    regexp: '^S3_ACL=.+'
    state: absent
  notify: Restart Mastodon services
```
## Related
- [Mastodon — The `--prune-profiles` Trap and How to Recover](mastodon-prune-profiles-trap.md) — the other way avatars go missing, plus the bulk-restore script
- [Mastodon Post-Install Hardening (Permissions + Account)](mastodon-post-install-hardening.md)
- [AWS S3 Cost Management](../cloud/aws-s3-cost-management.md) — pruning attachments to control bucket size (safely)