From March 20, 10:11:47 UTC, to March 22, 17:46:10 UTC, we experienced issues with our MQTT broker that prevented devices subscribed to any topic from receiving new incoming data. Retained values worked properly and were updated. The issue affected only devices that already had an open socket subscribed to an MQTT topic.
Major. None of the topics that should have delivered new values to their subscribers received new incoming data.
We experienced issues with one of the servers in charge of the MQTT subscribe routines. The server's RAM filled up, so our REDIS database could no longer properly handle updates to the different topics in the broker. This kind of issue is usually fixed by rebooting the REDIS cluster. This time, however, the service did not become available again even after a reset.
The server's RAM filled up, which caused an outage of the REDIS database.
We deployed a new server dedicated exclusively to MQTT subscription routines.
Detected through bug reports from Ubidots users.
| Action | Type | Owner | Ticket / Status |
|---|---|---|---|
| Deploy a new server for MQTT subscribe update routines | mitigate | Gustavo firstname.lastname@example.org | DONE |
| Update the MQTT subscribe check to monitor not only retained values but also updates coming from subscribed topics | prevent | Jose email@example.com | SD-3792, IN PROGRESS |
| Specify the architecture for the REDIS memory checks to be deployed | prevent | Gustavo firstname.lastname@example.org | DEV-1671, IN PROGRESS |
| Migrate our current REDIS DB to REDIS cluster | prevent | Juan email@example.com | DEV-1672, IN PROGRESS |
| Create new checks to monitor the uptime of the ingestion service and the IoT protocols service independently | prevent | Jose firstname.lastname@example.org | DEV-1673, IN PROGRESS |
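The gap targeted by SD-3792 is that a check on retained values passes even when live delivery to already-open subscriptions is broken. During this incident, that was exactly the situation. Below is a minimal sketch of the decision logic for an end-to-end check. It assumes a probe that publishes a unique marker value, then records two things: what a fresh subscribe reads back as the retained value, and whether (and when) a subscriber that was already connected received the marker. `ProbeResult`, `evaluate_probe` and the 30-second timeout are hypothetical names and values, not our actual monitoring code. The MQTT publish/subscribe plumbing is deliberately left out so that only the pass/fail logic is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Outcome of one run of a hypothetical end-to-end MQTT probe."""
    published_at: float                # epoch seconds when the marker was published
    retained_value: Optional[str]      # value read back through a fresh (retained) subscribe
    live_value: Optional[str]          # value delivered to the already-open subscription
    live_received_at: Optional[float]  # epoch seconds of live delivery, None if never received


def evaluate_probe(result: ProbeResult, marker: str, timeout_s: float = 30.0) -> list[str]:
    """Return the list of failure reasons; an empty list means the probe passed."""
    failures = []
    # A retained-values-only check effectively stops here, and it kept passing during the incident.
    if result.retained_value != marker:
        failures.append("retained value not updated")
    # Extra condition: the subscriber with an already-open socket must also get the marker in time.
    if result.live_value != marker or result.live_received_at is None:
        failures.append("live subscriber did not receive marker")
    elif result.live_received_at - result.published_at > timeout_s:
        failures.append("live delivery exceeded timeout")
    return failures


# Incident-like scenario: the retained value was updated, but the live subscriber got nothing.
print(evaluate_probe(ProbeResult(100.0, "m-1", None, None), "m-1"))
# ['live subscriber did not receive marker']
```

Keeping the retained and live conditions as separate failure reasons lets an alert distinguish "the broker is down" from "only live delivery is broken", which is the case this incident fell into.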
We do not see anything that went well. The issue lasted an unusually long time (more than two days).
Our automated checks did not monitor the subscribe service in its entirety, only retained values.
We were not monitoring the available memory on some of our REDIS databases.
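The missing memory monitoring (DEV-1671) can be approached by polling the `INFO memory` section that Redis exposes. Below is a minimal sketch. It assumes the raw text of `INFO memory` is already available, for example from `redis-cli INFO memory`. The helper names and the 0.85 threshold are hypothetical choices for illustration, not the architecture being specified in DEV-1671. The check reads `used_memory` and compares it against `maxmemory`. When `maxmemory` is 0 (no limit configured), it falls back to `total_system_memory`, which recent Redis versions report.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse the key:value lines of Redis INFO output, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_ratio(info: dict[str, str]) -> float:
    """Fraction of the memory limit currently used by Redis."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    if limit == 0:
        # maxmemory 0 means "no limit", so compare against the host's physical RAM instead.
        limit = int(info["total_system_memory"])
    return used / limit


def check_memory(info_text: str, threshold: float = 0.85) -> tuple[bool, float]:
    """Return (healthy, ratio); healthy is False once usage reaches the hypothetical threshold."""
    ratio = memory_usage_ratio(parse_info(info_text))
    return ratio < threshold, ratio


sample = "# Memory\r\nused_memory:900\r\nmaxmemory:0\r\ntotal_system_memory:1000\r\n"
print(check_memory(sample))
# (False, 0.9)
```

Comparing against host RAM when no `maxmemory` is set matters here, because the failure was the server's RAM filling up. If memory fragmentation is high, `used_memory_rss` may be a more faithful numerator than `used_memory`.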