We experienced issues with our ingestion engine exposed through HTTP during three time windows on February 2nd: 02:28-02:31, 02:55-02:57, and 03:13-03:17 UTC. Access to both owners' and end users' web apps was also affected, which prevented users from visualizing data stored at Ubidots.
During these time windows, our HTTP servers responded with standard 50x errors, so users should have been able to catch the error and store the values read by their devices to be sent later.
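As an illustration of the client-side handling described above, here is a minimal sketch of a sender that keeps readings in a local queue while the server returns 5xx errors and delivers them in order once requests succeed again. The `BufferedSender` class and the `post` callable are hypothetical names introduced for this example and are not part of any Ubidots library; `post` stands in for whatever function a device uses to send one payload and is assumed to return the HTTP status code.

```python
from collections import deque
from typing import Callable, Deque, Dict


class BufferedSender:
    """Queue readings locally and resend them once the server stops returning 5xx."""

    def __init__(self, post: Callable[[Dict], int]) -> None:
        # `post` is a hypothetical callable: it sends one payload and
        # returns the HTTP status code of the response.
        self.post = post
        self.pending: Deque[Dict] = deque()

    def send(self, payload: Dict) -> int:
        """Queue a new reading, then try to deliver everything queued so far."""
        self.pending.append(payload)
        return self.flush()

    def flush(self) -> int:
        """Deliver queued payloads oldest first; return how many left the queue."""
        delivered = 0
        while self.pending:
            status = self.post(self.pending[0])
            if 500 <= status < 600:
                # Server-side failure: keep the payload and try again later.
                break
            # Any other status removes the payload from the queue
            # (a 4xx response would not succeed on retry either).
            self.pending.popleft()
            delivered += 1
        return delivered
```

A device loop would call `send()` for each new reading and could call `flush()` periodically, ideally with a delay between attempts, so that queued values are delivered after the service recovers.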
Critical: all requests from external devices and scripts that used our REST API to ingest data were rejected.
We couldn't find the root cause of the issue with the database, so we decided to resolve it by creating additional replication instances.
SIGKILL and reindex of our Postgres DB instance
Detected by the automated internal service health checker.
|DB reindex|Mitigate|gustavo email@example.com|Done|
|Create backup DB instances|Prevent|gustavo firstname.lastname@example.org|Done|
The automated health checker alerted the DevOps team as soon as the issue occurred.
Posted 6 minutes ago. May 21, 2019 - 16:00 UTC
This incident has been resolved.
Posted 1 day ago. May 20, 2019 - 09:53 UTC
A fix has been implemented and we are monitoring the results.
Posted 1 day ago. May 20, 2019 - 09:50 UTC
Events engine, MQTT, HTTP, TCP/UDP, and login app services affected.
Posted 1 day ago. May 20, 2019 - 06:58 UTC