[dreamwidth/dreamwidth] 39a949: SearchCopier: stream from cluster DBs, match Sphin...
Branch: refs/heads/main
Home: https://github.com/dreamwidth/dreamwidth
Commit: 39a9497745cdab9f36c8d8cd669f7457a3595fd6
https://github.com/dreamwidth/dreamwidth/commit/39a9497745cdab9f36c8d8cd669f7457a3595fd6
Author: Mark Smith mark@dreamwidth.org
Date: 2026-04-23 (Thu, 23 Apr 2026)
Changed paths: M cgi-bin/DW/Task/SearchCopier.pm
Log Message:
SearchCopier: stream from cluster DBs, match SphinxCopier logging
importfull was doing selectall_arrayref on both log2+logtext2 and talk2+talktext2 for the journal, which loads every row into Perl memory before any processing starts. Workers were OOMing on real-world accounts.
Switched both loops to prepare + execute + fetchrow_hashref with mysql_use_result=1 so DBD::mysql actually streams rather than buffering the full result set client-side. Also merged the old "fetch metadata, then fetch text in batches of 1000" comment path into a single talk2 + talktext2 join, since we're streaming now. Working memory is bounded at one row at a time plus the %entry_bits map (jitemid -> bits arrayref) kept around for comment security inheritance.
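The memory shape described above can be sketched outside Perl. The following is a minimal illustration only, not the code in SearchCopier.pm: it uses Python with an in-memory sqlite3 database as a stand-in for the cluster DB, and iterates the cursor row by row instead of calling a fetch-everything method. The column names, the `security_bits` encoding, the `index` callback, and the choice to skip comments whose entry was not seen are all assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for the four tables (column names are assumed)."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.executescript("""
        CREATE TABLE log2 (journalid INT, jitemid INT, security TEXT, allowmask INT);
        CREATE TABLE logtext2 (journalid INT, jitemid INT, event TEXT);
        CREATE TABLE talk2 (journalid INT, jtalkid INT, nodeid INT);
        CREATE TABLE talktext2 (journalid INT, jtalkid INT, body TEXT);
        INSERT INTO log2 VALUES (1, 1, 'public', 0), (1, 2, 'private', 0),
                                (1, 3, 'usemask', 6), (2, 1, 'public', 0);
        INSERT INTO logtext2 VALUES (1, 1, 'hello'), (1, 2, 'secret'),
                                    (1, 3, 'friends'), (2, 1, 'other journal');
        INSERT INTO talk2 VALUES (1, 10, 2), (1, 11, 3), (1, 12, 99);
        INSERT INTO talktext2 VALUES (1, 10, 'reply to secret'),
                                     (1, 11, 'reply to friends'),
                                     (1, 12, 'orphan');
    """)
    return db


def security_bits(security, allowmask):
    """Hypothetical encoding: a marker for public/private, else the set bit positions."""
    if security in ("public", "private"):
        return [security]
    return [i for i in range(64) if (allowmask >> i) & 1]


def copy_journal(db, journalid, index):
    """Stream entries, then comments, holding one row at a time plus entry_bits."""
    entry_bits = {}  # jitemid -> bits, kept so comments can inherit entry security

    entries = db.execute(
        "SELECT l.jitemid, l.security, l.allowmask, t.event "
        "FROM log2 l JOIN logtext2 t "
        "  ON t.journalid = l.journalid AND t.jitemid = l.jitemid "
        "WHERE l.journalid = ?", (journalid,))
    for row in entries:  # iterating the cursor fetches rows lazily
        bits = security_bits(row["security"], row["allowmask"])
        entry_bits[row["jitemid"]] = bits
        index(("entry", row["jitemid"], bits, row["event"]))

    comments = db.execute(
        "SELECT c.jtalkid, c.nodeid, t.body "
        "FROM talk2 c JOIN talktext2 t "
        "  ON t.journalid = c.journalid AND t.jtalkid = c.jtalkid "
        "WHERE c.journalid = ?", (journalid,))
    for row in comments:
        bits = entry_bits.get(row["nodeid"])
        if bits is None:
            continue  # assumption: skip comments whose entry was not seen
        index(("comment", row["jtalkid"], bits, row["body"]))
```

Calling `copy_journal(make_demo_db(), 1, print)` emits each entry and comment as it is read. The only state that grows with the journal is `entry_bits`, which holds one small list per entry rather than any entry or comment text.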
Also upgraded work()'s logging to match SphinxCopier's verbosity so
it's actually possible to tell who a job is for from the logs:
"Search copier started for [Unknown site tag](
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
