MajorLinux 1c17bdb60a Add: Castopod federation — stale cached avatar URL fix

When a remote actor updates their avatar, Mastodon (Paperclip) deletes the
old S3 object and stores only the new filename. Castopod 2.0.0 caches the
URL of every federated actor in cp_fediverse_actors and never refetches,
so its admin templates emit a dead link forever (the resulting S3 403 is
anti-enumeration, hiding what is really a 404). Article documents the
diagnosis pattern and three fixes (manual UPDATE, DELETE-and-refetch,
bulk audit), plus the Mastodon-side query for sourcing the correct URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 01:51:18 -04:00


Castopod: Stale Federated Avatar URLs After Remote Profile Updates

🛑 Problem

Your Castopod admin pages, most visibly the notifications list (/cp-admin/podcasts/<id>/notifications), show broken avatars for federated actors. Requesting the avatar URL directly (in the browser dev tools, or with curl -I) returns:

HTTP/1.1 403 Forbidden
Server: AmazonS3

…with the response body:

<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  ...
</Error>

The URL points at the remote instance's S3 bucket (e.g. s3.amazonaws.com/<their-bucket>/accounts/avatars/...). Other actors in the same notifications list (those with avatars on Mastodon's own CDN, or on instances using path-stable storage) render fine.

This article explains why the error code is misleading, what's actually broken, and how to fix it on Castopod.

🔬 Why "AccessDenied" is misleading

S3 returns 403 AccessDenied instead of 404 NoSuchKey for a missing object whenever the requester lacks s3:ListBucket permission on the bucket. This is by design, as anti-enumeration: anonymous users typically don't have that permission, so S3 won't reveal whether the key is missing or merely forbidden. Both cases produce the same 403.

So when you see 403 AccessDenied on a remote avatar URL, the actual problem is almost always that the object no longer exists. The bucket is fine; the file is gone.
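The reasoning above can be written down as a small lookup. This is a minimal sketch, not part of Castopod or the AWS CLI: diagnose_avatar_status is a hypothetical helper that takes an HTTP status code and, optionally, the value of the Server response header, and prints the most likely interpretation.

```shell
# Hypothetical helper: interpret the status of a request for a remote avatar URL.
# $1 = HTTP status code, $2 = value of the Server header (optional).
diagnose_avatar_status() {
  code=$1
  server=${2:-}
  case "$code" in
    200) echo "ok" ;;
    404) echo "missing" ;;
    403)
      # An anonymous 403 from S3 usually hides a missing object.
      case "$server" in
        AmazonS3*) echo "probably-missing" ;;
        *)         echo "forbidden" ;;
      esac ;;
    *) echo "other" ;;
  esac
}
```

The status code can be captured with curl -s -o /dev/null -w '%{http_code}' "$url" and passed in as the first argument.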

Verifying that interpretation

If you have access to the remote instance (or to S3 credentials for that bucket):

aws s3api head-object --bucket <bucket> --key accounts/avatars/.../<filename>.jpeg

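If you see An error occurred (404) when calling the HeadObject operation: Not Found, the object is genuinely gone, most likely because the upstream user has updated their avatar. A 404 here requires credentials that include s3:ListBucket; without that permission, HeadObject returns 403 for a missing key as well.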

🔍 What's actually broken

Mastodon (and most ActivityPub servers using Paperclip-style storage) deletes the old object on avatar replacement and stores only the current filename in the DB. The remote instance is functioning normally — its current <img> URL points to a different filename and serves correctly.

Castopod 2.0.0 (verified up to 2.0.0-next.4) caches the avatar URL of every federated actor in cp_fediverse_actors.avatar_image_url when it first sees activity from that actor — and never refetches. The admin templates (e.g. themes/cp_admin/podcast/notifications.php) emit that stored URL directly into <img src>. Once the upstream replaces the avatar:

Old object deleted → S3 returns 403 to anonymous fetchers
Castopod still renders the dead URL forever
Every cached page using that template shows a broken image

The same pattern applies to cover_image_url (header).

✅ Fix

You have three options: a manual update for a single row, a delete-and-refetch, and a bulk audit for many rows.

Option 1 — Manual SQL update (one-shot)

Recommended for one or two stale actors. Get the current URL from the upstream instance.

If the upstream is your own Mastodon instance:

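# domain IS NULL restricts the match to local accounts (remote accounts can share a username)
sudo -u postgres psql mastodon_production -t -A \
  -c "SELECT id, avatar_file_name, header_file_name FROM accounts WHERE username='<their-username>' AND domain IS NULL"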

Construct the canonical URL using the standard Paperclip path scheme. For an account ID like 109326168175475699, the path is built by chunking the ID three digits at a time:

accounts/avatars/109/326/168/175/475/699/original/<avatar_file_name>
accounts/headers/109/326/168/175/475/699/original/<header_file_name>
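The chunking rule can be sketched as a small shell function. This is a minimal sketch, assuming the ID is numeric and its length is a multiple of three (as in the 18-digit example above); other lengths are rejected rather than guessed at. paperclip_id_path and avatar_key are hypothetical names, not Mastodon or Paperclip tools.

```shell
# Hypothetical helper: split a numeric account ID into three-digit path segments.
paperclip_id_path() {
  id=$1
  # Accept only non-empty, all-digit input.
  case "$id" in
    ''|*[!0-9]*) echo "not a numeric ID: $id" >&2; return 1 ;;
  esac
  # Assumption: the length is a multiple of three, as in the example ID.
  if [ $(( ${#id} % 3 )) -ne 0 ]; then
    echo "ID length is not a multiple of 3: $id" >&2
    return 1
  fi
  # Append "/" after every three digits, then drop the trailing "/".
  printf '%s\n' "$id" | sed -e 's#[0-9]\{3\}#&/#g' -e 's#/$##'
}

# Build the avatar object key. $1 = account ID, $2 = avatar_file_name.
avatar_key() {
  part=$(paperclip_id_path "$1") || return 1
  printf 'accounts/avatars/%s/original/%s\n' "$part" "$2"
}
```

Calling avatar_key with the ID and avatar_file_name returned by the query gives the key to append to https://<s3-host>/<bucket>/.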

Then UPDATE the Castopod row:

mysql -u "$CP_DB_USER" -p"$CP_DB_PASS" "$CP_DB_NAME" <<'SQL'
UPDATE cp_fediverse_actors
SET avatar_image_url = 'https://<s3-host>/<bucket>/accounts/avatars/109/326/168/175/475/699/original/<new>.jpeg',
    cover_image_url  = 'https://<s3-host>/<bucket>/accounts/headers/109/326/168/175/475/699/original/<new>.jpg',
    updated_at       = NOW()
WHERE username = '<their-username>'
  AND domain   = '<their-domain>';
SQL

Then clear the Castopod cache so any cached HTML rerenders:

cd /var/www/html/castopod
sudo -u www-data php spark cache:clear

Verify:

curl -sI 'https://<new-url>' | head -n 1   # expect a 200 status line, e.g. HTTP/1.1 200 OK

Option 2 — Delete and let Castopod refetch

For a one-shot self-healing fix, delete the actor row entirely:

DELETE FROM cp_fediverse_actors WHERE username='<u>' AND domain='<d>';

Castopod will repopulate the row from the next inbound activity from that actor (favourite, boost, mention, follow…). Caveat — verify foreign-key cascades first: cp_fediverse_favourites, cp_fediverse_follows, cp_fediverse_posts, and cp_fediverse_notifications all reference actor_id. Depending on the migration version, ON DELETE may cascade or restrict. Check with:

mysql -u "$CP_DB_USER" -p"$CP_DB_PASS" "$CP_DB_NAME" -e "
  SELECT TABLE_NAME, CONSTRAINT_NAME, DELETE_RULE
  FROM information_schema.REFERENTIAL_CONSTRAINTS
  WHERE CONSTRAINT_SCHEMA = '$CP_DB_NAME'
    AND REFERENCED_TABLE_NAME = 'cp_fediverse_actors';
"

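If any rule is CASCADE, the delete also removes the activity history attributed to that actor; use Option 1 instead. If a rule is RESTRICT or NO ACTION, the DELETE fails for as long as dependent rows exist.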
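To make that check scriptable, the output of the DELETE_RULE query can be filtered for cascading constraints. This is a minimal sketch: cascading_tables is a hypothetical helper that expects tab-separated TABLE_NAME, CONSTRAINT_NAME, DELETE_RULE rows on stdin (the shape mysql produces in batch mode) and prints the tables whose rows would be deleted along with the actor.

```shell
# Hypothetical helper: list tables whose foreign key to cp_fediverse_actors
# uses ON DELETE CASCADE. Reads tab-separated rows
# (TABLE_NAME, CONSTRAINT_NAME, DELETE_RULE) on stdin.
# Exit status is 0 if at least one cascading rule was found, 1 otherwise.
cascading_tables() {
  awk -F '\t' '$3 == "CASCADE" { print $1; found = 1 } END { exit !found }'
}
```

Piping the query through it with mysql -BN gives a quick answer: any output means Option 2 would destroy history in the listed tables.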

Option 3 — Bulk audit and update

If multiple federated actors have likely-stale avatars (any old enough that an upstream user might have refreshed their profile picture), audit them all:

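# List every cached avatar URL and report those that no longer return 200.
mysql -u "$CP_DB_USER" -p"$CP_DB_PASS" "$CP_DB_NAME" -BNe "
  SELECT id, username, domain, avatar_image_url
  FROM cp_fediverse_actors
  WHERE avatar_image_url IS NOT NULL
" | while IFS=$'\t' read -r id user dom url; do
    # --max-time keeps one unresponsive host from stalling the audit (code 000 on timeout)
    code=$(curl -s -o /dev/null --max-time 15 -w "%{http_code}" "$url")
    if [ "$code" != "200" ]; then echo "BROKEN $code $id $user@$dom $url"; fi
done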

For each broken row, fetch the upstream's current actor JSON and update from icon.url / image.url:

curl -s -H 'Accept: application/activity+json' \
  "https://<their-domain>/users/<their-username>" | jq '{icon, image}'

Then run the Option 1 SQL update with the fresh URLs.
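With several broken rows, typing the Option 1 statement by hand gets error-prone, so it can help to generate it. The sketch below assumes the default MySQL sql_mode (backslash acts as an escape character inside string literals); sql_quote and build_actor_update are hypothetical helpers, not Castopod commands, and the generated statement should be reviewed before it is run.

```shell
# Hypothetical helper: escape a value for use inside a single-quoted
# MySQL string literal (doubles backslashes and single quotes).
sql_quote() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

# Hypothetical helper: print the Option 1 UPDATE for one actor.
# $1 = username, $2 = domain, $3 = new avatar URL, $4 = new header URL
build_actor_update() {
  printf "UPDATE cp_fediverse_actors SET avatar_image_url = '%s', cover_image_url = '%s', updated_at = NOW() WHERE username = '%s' AND domain = '%s';\n" \
    "$(sql_quote "$3")" "$(sql_quote "$4")" "$(sql_quote "$1")" "$(sql_quote "$2")"
}
```

The arguments are the username and domain from the audit output plus the icon.url and image.url values from the actor JSON. The printed statement can be piped into the same mysql invocation used in Option 1.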

🧪 Why this isn't fixable on the upstream side

Once the old object is deleted, you can't restore the URL without re-uploading bytes to the exact original key — which Mastodon won't do, because its DB only knows about the new filename. Trying to "fix" it on the Mastodon side means resurrecting a file Mastodon has no record of and that no fresh ActivityPub request would emit a URL for. The fix has to live on the consumer (Castopod) because Castopod is the one holding the stale reference.

This applies to every federation consumer that caches URLs by reference rather than fetching bytes locally. Mastodon, Pleroma, Akkoma, and Misskey all cache the bytes; that's why they self-heal across remote avatar swaps. Castopod 2.0.0 currently does not.

🛠 Long-term mitigations

This is a Castopod design issue worth raising upstream:

Add a last_refreshed_at to cp_fediverse_actors and a worker that refetches actor JSON on a schedule.
Or fetch and store avatars locally on first sight, the way Mastodon does.

A fediverse:refresh-actor spark command would also let admins fix stale rows without writing SQL.

If you have a recurring case (you update your Mastodon avatar often, and you also operate a Castopod instance under your own control), keep the Option 1 SQL handy as a one-liner. After your own avatar update, run it within minutes and the dead-URL window closes before it spreads to many cached pages.

📚 References

Castopod source (themes/cp_admin/podcast/notifications.php) — uses avatar_image_url directly in <img src>
AWS S3 anti-enumeration: 403 vs 404 is bucket-policy-dependent; see GetObject — Permissions Required
Mastodon Paperclip storage layout: accounts/avatars/<3-digit chunks of account id>/original/<file_name>
Related fix patterns: Tuning Netdata web_log_1m_successful for Redirect-Heavy WordPress Sites — shares the "the alarm is technically correct, but means something different than you think" theme
