Redis Feature Flag Error
Incident Report for Clockwork Recruiting Status Page
Postmortem

On the morning of October 20, Clockwork views and application feature settings reverted to an unset state for approximately 2 hours. This is the first time we have had a feature flag reversion. We discovered the issue immediately and remediated it as quickly as possible. We understand this was a significant inconvenience, and we take it very seriously. Below we share our postmortem and the actions we are taking to prevent this from occurring in the future.

  • Incident Details

All feature flags stored in Redis were wiped. As a result, users saw old Clockwork views and were unable to find certain functionality.

  • Investigation Summary

The team reviewed the code and the Redis server configuration, and identified that the same Redis instance is shared between the Demo and Production environments.

The team worked on restoring the data from backup Redis servers.

  • Data Exposure Summary

No data was exposed. UI settings data stored in Redis was deleted, but it has been recovered from the backup.

  • Remediation Summary 

Review the feature flagging system and use a separate Redis namespace for Demo.

Upgrade from Redis 5.0 (the current version) to Redis 6.x and use its improved security features, including ACLs.

Set up a Redis cluster with multi-AZ replication.
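
The namespace separation in the first item could look roughly like the sketch below. It is only an illustration, not Clockwork's actual code: the class names, the "<environment>:flag:<name>" key layout, and the in-memory backend are all assumptions made for the example. The backend stands in for a Redis client and exposes only get, set, delete, and scan_iter, so the sketch runs without a server.

```python
import fnmatch
from typing import Dict, Iterator, Optional


class InMemoryBackend:
    """Hypothetical stand-in for a Redis client (get/set/delete/scan_iter only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, *keys: str) -> int:
        # Returns how many of the given keys actually existed.
        removed = 0
        for key in keys:
            if self._data.pop(key, None) is not None:
                removed += 1
        return removed

    def scan_iter(self, match: str = "*") -> Iterator[str]:
        # Build a snapshot list so callers may delete while iterating.
        return iter([k for k in self._data if fnmatch.fnmatchcase(k, match)])


class NamespacedFlags:
    """Feature flags whose keys are all prefixed with one environment name."""

    def __init__(self, backend, environment: str) -> None:
        # Reject names that could collide with the separator or glob syntax.
        if not environment or any(ch in environment for ch in ":*?[]"):
            raise ValueError("environment must be a plain, non-empty name")
        self._backend = backend
        self._prefix = f"{environment}:flag:"  # assumed key layout

    def enable(self, name: str) -> None:
        self._backend.set(self._prefix + name, "1")

    def is_enabled(self, name: str) -> bool:
        # A real Redis client returns bytes unless configured to decode
        # responses; this sketch assumes string values.
        return self._backend.get(self._prefix + name) == "1"

    def clear_all(self) -> int:
        # Deletes only keys under this environment's prefix, so a reset
        # in one environment cannot remove another environment's flags.
        keys = list(self._backend.scan_iter(match=self._prefix + "*"))
        return self._backend.delete(*keys) if keys else 0


shared = InMemoryBackend()  # one instance shared by both environments
production = NamespacedFlags(shared, "production")
demo = NamespacedFlags(shared, "demo")

production.enable("new_views")
demo.enable("new_views")
demo.clear_all()  # resetting Demo leaves Production untouched
print(production.is_enabled("new_views"), demo.is_enabled("new_views"))
```

With prefixed keys, a Demo reset touches only Demo keys even while the instance is still shared. The same prefix could also serve as the key pattern in a Redis 6 ACL rule (for example ~demo:*), which would enforce the boundary on the server rather than relying on application code alone.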

Posted Oct 21, 2022 - 17:42 UTC

Resolved
Feature flags have been restored. All page views are rendered as expected.
Posted Oct 20, 2022 - 17:05 UTC
Identified
The issue has been identified. Redis feature flags were removed, and we are restoring them from backup. Restoration should be complete within 30 minutes.
Posted Oct 20, 2022 - 16:56 UTC
Investigating
We’re currently experiencing a service disruption.
Our DevOps team is working to identify the root cause and implement a solution.
Users may be experiencing errors in page views and feature flags.
Posted Oct 20, 2022 - 16:09 UTC
This incident affected: Clockwork Production Application.