github wrote in changelog, 2026-03-12 05:08 pm

[dreamwidth/dreamwidth] 64b109: Remove dead utf8convert links, handle invalid UTF-...

Branch: refs/heads/main
Home: https://github.com/dreamwidth/dreamwidth
Commit: 64b109f6fdd36a9130ef4a90057e71e07be5ec86
https://github.com/dreamwidth/dreamwidth/commit/64b109f6fdd36a9130ef4a90057e71e07be5ec86
Author: Mark Smith mark@dreamwidth.org
Date: 2026-03-12 (Thu, 12 Mar 2026)

Changed paths:
M bin/upgrading/deadphrases.dat
M cgi-bin/DW/Controller/Create.pm
M cgi-bin/DW/Controller/Manage/Profile.pm
M cgi-bin/LJ/TextUtil.pm
M t/plack-request.t
M t/textutil.t
M views/create/setup.tt
M views/manage/profile.tt
M views/manage/profile.tt.text

Log Message:


Remove dead utf8convert links, handle invalid UTF-8 in profiles (#3535)

  • Remove dead utf8convert links and handle invalid UTF-8 in profiles

The utf8convert page was removed years ago, but the profile editing and account creation pages still linked to it when a user's name or bio contained invalid UTF-8. This left users unable to edit those fields at all.

Instead of hiding fields behind a dead link, clean invalid UTF-8 byte sequences on load using a new LJ::clean_utf8() utility function. This strips broken sequences while preserving valid multi-byte characters, so the edit fields are always shown.
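
As a rough illustration of the cleaning behavior described above, here is a minimal Python sketch. It is not the Perl code added to LJ::TextUtil: the function name is borrowed only for readability, and the use of Python's lenient decoder is an assumption about how equivalent behavior could be obtained, not a description of how the commit does it.

```python
def clean_utf8(data):
    """Hypothetical Python analogue of the behavior described for LJ::clean_utf8.

    Takes raw bytes (or None) and returns text with any invalid UTF-8
    byte sequences dropped. Valid multi-byte characters pass through
    unchanged.
    """
    if data is None:
        # Treat a missing value as an empty field.
        return ""
    # errors="ignore" skips bytes that do not form a valid UTF-8 sequence
    # and keeps decoding the rest of the input.
    return data.decode("utf-8", errors="ignore")


# A name containing a valid two-byte character followed by two stray bytes.
raw_name = b"caf\xc3\xa9 \xff\xfe ok"
cleaned = clean_utf8(raw_name)
```

With a cleaner like this applied on load, the edit form always has displayable text to show, which is the property the commit relies on to drop the dead-link fallback.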

  • Add LJ::clean_utf8() to LJ::TextUtil
  • Clean name/bio on load in profile and create controllers
  • Remove text_in/is_utf8 conditionals from profile.tt and setup.tt
  • Remove name_absent/bio_absent hidden input fallback logic
  • Mark dead translation strings in deadphrases.dat
  • Add 16 regression tests for text_in, text_trim, and clean_utf8

Fixes #1894

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • Add tests for undef input and 4-byte UTF-8 (emoji) in clean_utf8

Cover edge cases: undef returns empty string, emoji (4-byte sequences) are preserved, and truncated 4-byte sequences are properly stripped while preserving valid preceding characters.
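
The three edge cases listed above can be expressed as a small table of inputs and expected outputs. The sketch below is in Python rather than Perl and checks a hypothetical stand-in cleaner, so it illustrates the cases only and does not reproduce the tests added to t/textutil.t.

```python
def clean_utf8(data):
    # Hypothetical stand-in for the Perl function, assumed to drop invalid bytes.
    return "" if data is None else data.decode("utf-8", errors="ignore")


EMOJI = "\U0001F600"   # a character that needs four bytes in UTF-8
E_ACUTE = "\u00e9"     # a character that needs two bytes in UTF-8

EDGE_CASES = [
    # (raw input, expected cleaned text)
    (None, ""),                                      # missing input
    (EMOJI.encode("utf-8"), EMOJI),                  # complete 4-byte sequence
    (E_ACUTE.encode("utf-8") + b"\xf0\x9f\x98", E_ACUTE),  # truncated 4-byte tail
]


def failing_cases():
    """Return the (input, expected, actual) triples that do not match."""
    return [
        (raw, want, clean_utf8(raw))
        for raw, want in EDGE_CASES
        if clean_utf8(raw) != want
    ]
```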

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com