MQTT Subscribe
Incident Report for Ubidots
Postmortem

Token Authentication issues

Date

2021-03-29

Authors

Jose Garcia

Status

Complete

Summary

From March 20, 10:11:47 UTC to March 22, 17:46:10 UTC, our MQTT broker had an issue that prevented devices subscribed to any topic from receiving new incoming data. Retained values worked properly and were updated; the issue affected only devices that already had an open socket subscribed to an MQTT topic.
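To illustrate why retained values kept working while already-connected devices received nothing, here is a toy in-memory model of the two MQTT delivery paths. It is a hypothetical sketch, not Ubidots' broker: `ToyBroker`, the `fanout_healthy` flag and the `device/temp` topic are invented for illustration. It follows MQTT 3.1.1 semantics, where a retained message is sent with RETAIN=1 on a new subscription, and messages forwarded to established subscriptions carry RETAIN=0.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (topic, payload, retain_flag) -> None
Callback = Callable[[str, bytes, bool], None]


class ToyBroker:
    """Hypothetical in-memory model of MQTT retained vs. live delivery.

    Not Ubidots' broker: it only separates the two delivery paths so the
    failure mode from the summary can be reproduced.
    """

    def __init__(self) -> None:
        self.retained: Dict[str, bytes] = {}
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)
        # Simulates the failed component: when False, publishes still update
        # the retained store but are not pushed to open subscriptions.
        self.fanout_healthy = True

    def subscribe(self, topic: str, callback: Callback) -> None:
        self.subscribers[topic].append(callback)
        # On a new subscription, the retained message is delivered with RETAIN=1.
        if topic in self.retained:
            callback(topic, self.retained[topic], True)

    def publish(self, topic: str, payload: bytes, retain: bool = True) -> None:
        if retain:
            self.retained[topic] = payload
        if self.fanout_healthy:
            # Messages forwarded to established subscriptions carry RETAIN=0.
            for callback in self.subscribers[topic]:
                callback(topic, payload, False)


# Reproduce the incident: a device subscribed before the outage misses the
# update, while a client that subscribes afterwards still sees it as retained.
existing: List[Tuple[bytes, bool]] = []
late: List[Tuple[bytes, bool]] = []

broker = ToyBroker()
broker.subscribe("device/temp", lambda t, p, r: existing.append((p, r)))
broker.fanout_healthy = False  # the subscribe routines stop working
broker.publish("device/temp", b"21.5")
broker.subscribe("device/temp", lambda t, p, r: late.append((p, r)))

print(existing)  # []
print(late)      # [(b'21.5', True)]
```

In this model, any check that only opens a fresh subscription and reads the retained value passes, even though every long-lived subscriber is starved.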

Impact

Major: devices subscribed to any topic stopped receiving new incoming data for the whole duration of the incident.

Root Causes

One of our servers in charge of managing the MQTT subscribe routines ran into problems. The server's RAM filled up, so our REDIS database could no longer properly handle the updates to the different topics in the broker. This sort of issue is usually fixed by rebooting the REDIS cluster, but this time the service did not recover even after a reset.
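A memory check of the kind planned under DEV-1671 could start from the fields Redis reports in its `INFO memory` section. The sketch below is an assumption, not the architecture Ubidots chose: it parses the text reply of `INFO`, computes `used_memory` as a fraction of `maxmemory` (falling back to `total_system_memory` when `maxmemory` is 0, meaning no limit is set), and flags usage above a hypothetical 80% threshold. The sample numbers are made up.

```python
from typing import Dict


def parse_info(raw: str) -> Dict[str, str]:
    """Parse the text reply of the Redis INFO command into a dict."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_ratio(info: Dict[str, str]) -> float:
    """Fraction of the memory budget that Redis has allocated."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    # maxmemory == 0 means no limit is configured, so the host's RAM is the
    # effective ceiling.
    if limit == 0:
        limit = int(info["total_system_memory"])
    return used / limit


def check_memory(raw_info: str, threshold: float = 0.8) -> str:
    """Return "ALERT" once usage crosses the (hypothetical) threshold."""
    ratio = memory_usage_ratio(parse_info(raw_info))
    return "ALERT" if ratio >= threshold else "OK"


# Made-up INFO memory excerpt: 7 GiB used on an 8 GiB host, no maxmemory.
sample = (
    "# Memory\r\n"
    "used_memory:7516192768\r\n"
    "maxmemory:0\r\n"
    "total_system_memory:8589934592\r\n"
)
print(memory_usage_ratio(parse_info(sample)))  # 0.875
print(check_memory(sample))                    # ALERT
```

`used_memory` counts what Redis's allocator has handed out; `used_memory_rss` can be noticeably higher because of fragmentation, so a production check might track both. With the redis-py client, `info("memory")` already returns a parsed dict, so only `memory_usage_ratio` would be needed.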

Trigger

The server's RAM filled up, which caused an outage of the REDIS database.

Resolution

We deployed a new server dedicated exclusively to MQTT subscription routines.

Detection

Detected through bug reports from Ubidots users.

Action Items

| Action Item | Type | Owner | Bug / Status |
|---|---|---|---|
| Deploy a new server for MQTT subscribe update routines | mitigate | Gustavo (woakas@ubidots.com) | DONE |
| Update the MQTT subscribe check to monitor not only retained values but also the updates coming from subscribed topics | prevent | Jose (jose.garcia@ubidots.com) | SD-3792, IN PROGRESS |
| Specify the architecture for the REDIS memory checks to be deployed | prevent | Gustavo (woakas@ubidots.com) | DEV-1671, IN PROGRESS |
| Migrate our current REDIS DB to REDIS cluster | prevent | Juan (juan.agudelo@ubidots.com) | DEV-1672, IN PROGRESS |
| Create new checks to monitor the uptime of the ingestion service and the IoT protocols service independently | prevent | Jose (jose.garcia@ubidots.com) | DEV-1673, IN PROGRESS |
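One way the updated subscribe check (SD-3792) could tell the two delivery paths apart is sketched below. This is a hypothetical design, not the check Ubidots implemented: a monitor keeps a subscription open, publishes a unique probe value, then compares what the open subscription received with what a fresh subscription reads as the retained value. The function names and result strings are invented; the MQTT client plumbing is left out so the evaluation logic stays self-contained.

```python
import uuid
from typing import Optional, Sequence


def new_probe_value() -> str:
    """Unique payload, so a stale retained value can never pass the check."""
    return f"probe-{uuid.uuid4()}"


def evaluate_subscribe_probe(
    expected: str,
    live_payloads: Sequence[str],
    retained_payload: Optional[str],
) -> str:
    """Classify one probe run (hypothetical check).

    live_payloads: messages received on a subscription opened *before* the
        probe value was published (the path that failed in March 2021).
    retained_payload: value seen by a fresh subscription opened afterwards
        (the only path the previous check covered).
    """
    live_ok = expected in live_payloads
    retained_ok = retained_payload == expected
    if live_ok and retained_ok:
        return "OK"
    if retained_ok:
        return "FAIL: retained value updated, but open subscriptions got nothing"
    if live_ok:
        return "FAIL: live delivery works, but the retained value is stale"
    return "FAIL: probe value not delivered at all"


probe = new_probe_value()
# Signature of the March 2021 incident: retained value fine, live path silent.
print(evaluate_subscribe_probe(probe, [], probe))
# FAIL: retained value updated, but open subscriptions got nothing
```

Reporting the retained and live results separately also fits DEV-1673: a failure in one path is not hidden by success in the other.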

Lessons Learned

What went well

We do not see anything that went well: the issue lasted for an extended period of time (more than two days).

What went wrong

  • Our automated checks did not monitor the subscribe service in its entirety, only retained values

  • We were not monitoring the available memory for some of our REDIS databases

Supporting Information

Support: support@ubidots.com

Posted Mar 29, 2021 - 10:30 UTC

Resolved
Posted Mar 24, 2021 - 12:23 UTC