github ([personal profile] github) wrote in [site community profile] changelog on 2026-04-25 12:18 pm

[dreamwidth/dreamwidth] 94382a: search-tool/SearchCopier: depth-based dispatch, sk...

Branch: refs/heads/main
Home: https://github.com/dreamwidth/dreamwidth
Commit: 94382a85a692de7e4c9184cc808301b916ab406d
https://github.com/dreamwidth/dreamwidth/commit/94382a85a692de7e4c9184cc808301b916ab406d
Author: Mark Smith <mark@dreamwidth.org>
Date: 2026-04-25 (Sat, 25 Apr 2026)

Changed paths:
M bin/search-tool
M cgi-bin/DW/Task/SearchCopier.pm

Log Message:


search-tool/SearchCopier: depth-based dispatch, skip controls, timestamped logs

bin/search-tool import-all:
- Pace dispatches against SQS queue depth instead of a fixed inter-job sleep. Default cap is --max-depth 100; when local-tracked depth hits the cap, the dispatcher waits for SQS-reported drain. Reconciliation with the real ApproximateNumberOfMessages happens every 50 dispatches and is logged when drift exceeds 20.
- Keep a small per-dispatch sleep (50ms) as a cushion against ApproximateNumberOfMessages staleness: the real depth can lag behind the local counter, and without a sleep we can overshoot the cap in tight loops.
- Fix queuedepth: a queue with 0 visible messages was being treated as "no depth signal" because '0' is falsy; use defined-check.
- Show MAX(userid) up front for ETA scope (PK-index lookup, vs COUNT(*) which would scan).
- Timestamp every log line "[YYYY-MM-DD HH:MM:SS] ...". Progress lines emit current count, userid position, jobs/min, ETA duration, local depth. Wait/resume transitions log the wait duration. Final line reports total elapsed.
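The pacing described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code from bin/search-tool: the names `dispatch_all`, `send`, `queue_depth`, and `poll_interval` are hypothetical, and only the constants (cap of 100, reconcile every 50, drift threshold of 20, 50ms sleep) come from the commit message. The `is not None` test plays the role of the defined-check, since 0 is falsy in Python as well.

```python
import time
from typing import Callable, Iterable, Optional

MAX_DEPTH = 100           # default --max-depth from the commit message
RECONCILE_EVERY = 50      # reconcile with the reported depth every 50 dispatches
DRIFT_LOG_THRESHOLD = 20  # log only when drift exceeds 20
DISPATCH_SLEEP = 0.05     # 50ms cushion per dispatch


def log(msg: str) -> None:
    """Print a line prefixed with "[YYYY-MM-DD HH:MM:SS] "."""
    print(time.strftime("[%Y-%m-%d %H:%M:%S] ") + msg)


def dispatch_all(
    jobs: Iterable[object],
    send: Callable[[object], None],
    queue_depth: Callable[[], Optional[int]],
    max_depth: int = MAX_DEPTH,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 1.0,
) -> int:
    """Send every job, keeping the locally tracked depth under max_depth.

    queue_depth() is a hypothetical callable returning the queue's reported
    depth, or None when no reading is available.
    """
    local_depth = 0
    sent = 0
    for job in jobs:
        if local_depth >= max_depth:
            wait_start = time.monotonic()
            while True:
                depth = queue_depth()
                # 0 is a real reading (empty queue), so test against None
                # rather than truthiness.
                if depth is not None and depth < max_depth:
                    local_depth = depth
                    break
                sleep(poll_interval)
            waited = time.monotonic() - wait_start
            log(f"resumed after {waited:.1f}s wait, depth {local_depth}")

        send(job)
        sent += 1
        local_depth += 1

        # Periodically replace the local counter with the reported depth.
        if sent % RECONCILE_EVERY == 0:
            depth = queue_depth()
            if depth is not None:
                drift = abs(depth - local_depth)
                if drift > DRIFT_LOG_THRESHOLD:
                    log(f"depth drift {drift}: local {local_depth}, reported {depth}")
                local_depth = depth

        sleep(DISPATCH_SLEEP)
    return sent
```

In this sketch a `queue_depth` that never yields a reading would keep the dispatcher waiting at the cap indefinitely; whether the real tool guards against that is not stated in the commit message.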

cgi-bin/DW/Task/SearchCopier.pm:
- @LJ::SKIP_SEARCH_IMPORT: list of journalids to short-circuit. Checked at the top of work(), so it applies to all task types: full recopies, chunks, single-item updates. Adding a journalid drains any queued chunks for that journal fast (workers immediately return COMPLETED for them) so the operator can defer specific large journals while the rest of an import-all run progresses.
- $LJ::SEARCH_MAX_COMMENT_RECOPY: limit for the comment recopy pass. In copy_comment's mass-dispatch path, if MAX(jtalkid) for the journal exceeds this value, the comment recopy is skipped (entries still process). Unset means no limit.

Both are additive config options; defaults are unset so prior behavior is unchanged until an operator opts in.
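The two opt-in checks can be sketched as below. This is a hedged Python illustration of the behavior described, not the Perl in SearchCopier.pm: `work`, `process`, `should_recopy_comments`, and the `COMPLETED` constant are hypothetical stand-ins, with `skip_ids` standing for the SKIP_SEARCH_IMPORT list and `limit` for $LJ::SEARCH_MAX_COMMENT_RECOPY.

```python
from typing import Callable, Iterable, Optional

COMPLETED = "COMPLETED"  # hypothetical stand-in for the task's completed status


def work(task: dict, skip_ids: Iterable[int], process: Callable[[dict], None]) -> str:
    """Short-circuit skipped journals before doing any work."""
    if task["journalid"] in set(skip_ids):
        # Queued tasks for a skipped journal drain immediately.
        return COMPLETED
    process(task)
    return COMPLETED


def should_recopy_comments(max_jtalkid: int, limit: Optional[int]) -> bool:
    """Return False when the journal's MAX(jtalkid) exceeds the limit."""
    if limit is None:
        # Unset means no limit, so prior behavior is unchanged.
        return True
    return max_jtalkid <= limit
```

With an empty skip list and `limit` left as `None`, both functions fall through to the normal path, matching the note that the defaults leave prior behavior unchanged.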

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
