During the scheduled update of our In-memory database, we observed the following issues:
- Ingestion service downtime: Between 8:50 AM and 9:05 AM (UTC-5), the ingestion service was unavailable. This occurred because authentication tokens, which are stored in the In-memory database, were inaccessible during the update, so authentication requests failed with 401 errors.
- Events engine backlog: Between 7:30 AM and 8:00 AM (UTC-5), the events engine, which also relies on the In-memory database, accumulated a backlog of queued events. As a consequence, some events may have been triggered incorrectly (false positives).
To avoid similar issues in future updates of the In-memory database:
- For the events engine, all workers and event streams must be stopped before the In-memory database instances it depends on are updated. This prevents invalid executions from building up and reduces the risk of false-positive triggers.
These measures will be integrated into our maintenance playbook and strictly followed in future scheduled updates to ensure service continuity.
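The events engine procedure described above can be sketched roughly as follows. This is a minimal illustration, not our actual tooling: `Stoppable`, `update_with_engine_paused`, and the `update` callable are hypothetical names, and the real workers and event streams will expose their own stop and start controls. The sketch stops every component, verifies that nothing is still running, and only then runs the update.

```python
from typing import Callable, List


class Stoppable:
    """Hypothetical stand-in for an events-engine worker or event stream."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def stop(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True


def update_with_engine_paused(
    components: List[Stoppable], update: Callable[[], None]
) -> None:
    """Stop all workers and streams, run the update, then restart them.

    `update` is an assumed callable that performs the database update.
    """
    # Step 1: stop every worker and event stream before touching the database.
    for component in components:
        component.stop()

    # Step 2: refuse to proceed if anything is still running.
    still_running = [c.name for c in components if c.running]
    if still_running:
        raise RuntimeError(f"Components still running: {still_running}")

    # Step 3: run the update. If it raises, the components stay stopped
    # so that an operator can inspect the state before resuming.
    update()

    # Step 4: resume event processing only after a successful update.
    for component in components:
        component.start()
```

In this sketch a failed update leaves the workers and streams stopped on purpose, on the assumption that resuming against a partially updated database would risk the same invalid executions the procedure is meant to prevent.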