[dreamwidth/dreamwidth] 94382a: search-tool/SearchCopier: depth-based dispatch, sk...
Branch: refs/heads/main
Home: https://github.com/dreamwidth/dreamwidth
Commit: 94382a85a692de7e4c9184cc808301b916ab406d
https://github.com/dreamwidth/dreamwidth/commit/94382a85a692de7e4c9184cc808301b916ab406d
Author: Mark Smith mark@dreamwidth.org
Date: 2026-04-25 (Sat, 25 Apr 2026)
Changed paths:
  M bin/search-tool
  M cgi-bin/DW/Task/SearchCopier.pm
Log Message:
search-tool/SearchCopier: depth-based dispatch, skip controls, timestamped logs
bin/search-tool import-all:
- Pace dispatches against SQS queue depth instead of a fixed inter-job sleep. The default cap is --max-depth 100; when the locally tracked depth hits the cap, the dispatcher waits for the SQS-reported depth to drain. The local count is reconciled with the real ApproximateNumberOfMessages every 50 dispatches, and the drift is logged when it exceeds 20.
- Keep a small per-dispatch sleep (50ms) as a cushion against ApproximateNumberOfMessages staleness: the real depth can lag behind the local counter, and without the sleep a tight loop can overshoot the cap.
- Fix queuedepth: a queue with 0 visible messages was being treated as "no depth signal" because '0' is falsy; use a defined-check instead.
- Show MAX(userid) up front to scope the ETA (a primary-key index lookup, whereas COUNT(*) would scan).
- Timestamp every log line as "[YYYY-MM-DD HH:MM:SS] ...". Progress lines report the current count, userid position, jobs/min, ETA duration, and local depth. Wait/resume transitions log the wait duration, and the final line reports total elapsed time.
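A minimal Python sketch of the pacing loop described above, for illustration only; it is not the actual bin/search-tool code. The class `DepthPacer`, the `get_depth`/`send` callables, and the 5-second poll interval while waiting at the cap are hypothetical. `get_depth` stands in for reading ApproximateNumberOfMessages and is assumed to return `None` when there is no depth signal. The cap (100), reconcile interval (50), drift threshold (20), 50ms cushion, and log timestamp format come from the commit message.

```python
import time
from typing import Callable, Optional


def stamp(msg: str) -> str:
    """Prefix a message with a local "[YYYY-MM-DD HH:MM:SS] " timestamp."""
    return time.strftime("[%Y-%m-%d %H:%M:%S] ") + msg


class DepthPacer:
    """Hypothetical sketch of depth-based dispatch pacing."""

    def __init__(
        self,
        get_depth: Callable[[], Optional[int]],  # queue depth reading; None = no signal
        max_depth: int = 100,
        reconcile_every: int = 50,
        drift_threshold: int = 20,
        cushion: float = 0.05,  # per-dispatch sleep in seconds (50ms)
        poll: float = 5.0,  # assumed wait between depth checks while at the cap
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
        log: Callable[[str], None] = lambda m: print(stamp(m)),
    ) -> None:
        self.get_depth = get_depth
        self.max_depth = max_depth
        self.reconcile_every = reconcile_every
        self.drift_threshold = drift_threshold
        self.cushion = cushion
        self.poll = poll
        self.sleep = sleep
        self.clock = clock
        self.log = log
        self.local_depth = 0  # jobs we believe are still queued
        self.dispatched = 0

    def reconcile(self) -> None:
        real = self.get_depth()
        if real is None:  # no signal: keep the local estimate
            return
        if abs(real - self.local_depth) > self.drift_threshold:
            self.log(f"depth drift: local={self.local_depth} sqs={real}")
        self.local_depth = real

    def dispatch(self, send: Callable[[], None]) -> None:
        if self.local_depth >= self.max_depth:
            started = self.clock()
            self.log(f"waiting: local depth {self.local_depth} >= cap {self.max_depth}")
            while self.local_depth >= self.max_depth:
                self.sleep(self.poll)
                real = self.get_depth()
                # Defined-check: a reading of 0 must end the wait. A truthiness
                # test (`if real:`) would ignore it and spin on a drained queue.
                if real is not None:
                    self.local_depth = real
            self.log(f"resumed after {self.clock() - started:.1f}s")
        send()
        self.local_depth += 1
        self.dispatched += 1
        if self.dispatched % self.reconcile_every == 0:
            self.reconcile()
        self.sleep(self.cushion)  # cushion against stale depth readings
```

Injecting `sleep`, `clock`, and `log` keeps the loop testable without a real queue. The defined-check in the wait loop is the part that matters most: if a reading of 0 were treated as "no signal", a fully drained queue would never end the wait.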
cgi-bin/DW/Task/SearchCopier.pm:
- $LJ::SKIP_SEARCH_IMPORT: list of journalids to short-circuit. Checked
at the top of work(), so it applies to all task types: full
recopies, chunks, and single-item updates. Adding a journalid quickly
drains any queued chunks for that journal (workers immediately
return COMPLETED for them), so the operator can defer specific
large journals while the rest of an import-all run progresses.
- $LJ::SEARCH_MAX_COMMENT_RECOPY: limit for the comment recopy pass.
In copy_comment's mass-dispatch path, if MAX(jtalkid) for the
journal exceeds this value, the comment recopy is skipped (entries
are still processed). Unset means no limit.
Both are additive config options; defaults are unset so prior behavior is unchanged until an operator opts in.
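To illustrate how the two options compose with unset defaults, here is a hypothetical Python sketch of the gating logic; it is not the SearchCopier.pm code. `work()` is reduced to returning which passes a full recopy would run, `skip_list` and `comment_limit` stand in for $LJ::SKIP_SEARCH_IMPORT and $LJ::SEARCH_MAX_COMMENT_RECOPY, `COMPLETED` is a stand-in for the task framework's status, and treating a journal with no comments (`max_jtalkid` of `None`) as under the limit is an assumption.

```python
from typing import Collection, List, Optional, Tuple

COMPLETED = "COMPLETED"  # hypothetical stand-in for the task framework's status


def work(
    journalid: int,
    max_jtalkid: Optional[int],
    skip_list: Optional[Collection[int]] = None,  # stands in for $LJ::SKIP_SEARCH_IMPORT
    comment_limit: Optional[int] = None,  # stands in for $LJ::SEARCH_MAX_COMMENT_RECOPY
) -> Tuple[str, List[str]]:
    """Return (status, passes run) for a full recopy. Hypothetical sketch."""
    # Checked before anything else, so any task for a listed journal
    # completes immediately without doing work.
    if skip_list is not None and journalid in skip_list:
        return COMPLETED, []
    passes = ["entries"]  # entries always process
    # Unset limit means no limit; only a MAX(jtalkid) above the limit skips comments.
    if comment_limit is None or max_jtalkid is None or max_jtalkid <= comment_limit:
        passes.append("comments")
    return COMPLETED, passes
```

With both options left as `None`, every journal runs both passes, which mirrors the point that behavior is unchanged until an operator opts in.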
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
