Background job slowdown
Incident Report for Clockwork Recruiting Status Page
Resolved
Incident Details:
Background jobs slowed down, which delayed the processing of notifications, indexing, reports, and exports.

Root Cause Analysis:
A firm requested an export of all of their people data (around 200K records). For every such request, we create a background job that carries the list of people IDs to be exported.

Furthermore, because of the large size of the job (200K * 40 ≈ 8 MB), fetching other records from the background jobs table also slowed down, as the database sort buffer filled up. As a result, other job workers, such as the ones for indexing and notifications, slowed down as well.
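
The size figure above can be reproduced with a quick calculation. This is only a rough sketch: it assumes the "40" in the estimate is the approximate number of bytes each serialized person ID occupies in the job payload, which the report does not state explicitly. The function and constant names are hypothetical.

```python
# Rough payload-size estimate for the export job described above.
# Assumption: each serialized person ID takes about 40 bytes in the payload.
PEOPLE_COUNT = 200_000
BYTES_PER_ID = 40


def estimated_payload_bytes(record_count: int, bytes_per_id: int = BYTES_PER_ID) -> int:
    """Return the approximate size in bytes of a job's ID list."""
    return record_count * bytes_per_id


size_bytes = estimated_payload_bytes(PEOPLE_COUNT)
size_mb = size_bytes / 1_000_000  # decimal megabytes
print(f"{size_bytes} bytes (about {size_mb:.0f} MB)")
```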

Immediate Resolution:
- The problematic export jobs were stashed, and extra workers were spawned to clear the backlog.
- A limit of 50K records was added for data exports.
- The Excel and CSV export jobs were split into two separate jobs to simplify processing.
- Checks were added to prevent the creation of unnecessary jobs.
- An index was added on the background jobs table for faster access within a queue, to limit the impact of such huge jobs on other background processing.
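
As an illustration of the 50K export limit, a guard of roughly the following shape could reject oversized requests before a background job is created. This is a minimal sketch and not the actual implementation: `MAX_EXPORT_RECORDS`, `ExportTooLargeError`, and `validate_export_request` are hypothetical names, and only the 50K figure comes from this report.

```python
# Hypothetical guard that enforces the export limit before a job is enqueued.
MAX_EXPORT_RECORDS = 50_000  # limit stated in the report


class ExportTooLargeError(ValueError):
    """Raised when an export request exceeds the record limit."""


def validate_export_request(person_ids: list) -> int:
    """Return the number of records if within the limit, otherwise raise."""
    count = len(person_ids)
    if count > MAX_EXPORT_RECORDS:
        raise ExportTooLargeError(
            f"Export of {count} records exceeds the limit of {MAX_EXPORT_RECORDS}"
        )
    return count
```

With such a check in place, a request for around 200K records would be rejected up front instead of producing a single oversized job.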
Posted Mar 22, 2023 - 18:00 UTC