The MQTT socket connection service was degraded for 13 minutes on the morning of April 13, 2019 (UTC).
Medium: some socket connections from external devices could not be established.
The number of MQTT socket connections has grown in recent months because of the number of new users with an active Ubidots account. To support this growth and avoid future issues, the DevOps team decided to increase the socket connections available per server and to deploy the change on April 13.
Once the change was deployed, a large volume of logs began to be written to the hard disk, which slightly delayed access to certain areas of the disk. This was expected, but RAM usage also rose suddenly to unexpected levels. As a result, the server began to use swap memory, which lives on the same hard drive that was already slowed down by the new log writes. This slowness caused socket timeout exceptions to be raised, and some of the new socket connections were rejected.
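The chain described above (RAM fills up, the server falls back to swap) can be spotted early by watching memory and swap usage. The following is a minimal sketch, assuming a Linux host that exposes `/proc/meminfo`; the function names and the 90% RAM and 10% swap thresholds are illustrative assumptions and are not taken from the actual Ubidots tooling.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_pressure(info):
    """Return (ram_used_fraction, swap_used_fraction) from parsed meminfo."""
    total = info["MemTotal"]
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    ram_used = (total - available) / total
    swap_used = (swap_total - swap_free) / swap_total if swap_total else 0.0
    return ram_used, swap_used


def should_alert(info, ram_limit=0.90, swap_limit=0.10):
    """Hypothetical policy: alert when RAM or swap usage crosses a limit."""
    ram_used, swap_used = memory_pressure(info)
    return ram_used >= ram_limit or swap_used >= swap_limit
```

On a Linux server, the input would be the contents of `/proc/meminfo`, for example `should_alert(parse_meminfo(open("/proc/meminfo").read()))`.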
A latent bug triggered by a sudden increase in RAM usage.
The RAM cache was cleared, and the server went back to using RAM instead of the swap space on the hard disk.
Detected by the automated internal service health checker.
| Free the RAM cache to avoid using swap memory | mitigate | gustavo firstname.lastname@example.org | DONE |
| Script to free the RAM cache every 3 hours | prevent | gustavo email@example.com | DONE |
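The report does not describe how the cache-freeing script works. One possible shape for it is sketched below, assuming a Linux server and root privileges: flush dirty pages to disk, then write to the kernel's `/proc/sys/vm/drop_caches` interface. The function name and the `path` parameter are hypothetical and exist only to make the sketch testable.

```python
import os

# Standard Linux kernel interface for dropping clean caches.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_page_cache(path=DROP_CACHES_PATH, mode="3"):
    """Ask the kernel to drop clean caches.

    mode "1" drops the page cache, "2" drops dentries and inodes,
    and "3" drops both. Writing to the real path requires root.
    """
    if mode not in ("1", "2", "3"):
        raise ValueError("mode must be '1', '2', or '3'")
    os.sync()  # flush dirty pages first so more of the cache can be freed
    with open(path, "w") as fh:
        fh.write(mode + "\n")
    return mode
```

To run it every 3 hours, a scheduler such as cron could invoke a script that calls `drop_page_cache()`, for example with the schedule expression `0 */3 * * *`.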
The automated health checker alerted the DevOps team as soon as the issue appeared.
RAM cache overflow was an unexpected state in the normal operation workflow. Future updates must account for it.