Data Ingestion

Incident Report for Ubidots

Resolved

# Report

We experienced issues with our ingestion engine during two different time windows:

## Data ingestion workers: From 11:39 UTC - 15:43 UTC

Even though our REST API responded with a 200 or 201 response code, saving Dots and reflecting them in our users' accounts took longer than usual. This issue affected data visualization in dashboards and variable views. Synthetic variables and the events engine were also impacted by these delays.
GET requests did not return the updated data during this time window.

## REST API response: From 14:08 UTC - 15:42 UTC

During this time window, our REST API responded with 50x error codes over HTTP, and data sent over other protocols may have been lost, as follows:

### From 14:08 UTC - 14:27 UTC

We responded to about 70% of the data received through TCP, UDP, or HTTP with a 50x response code. Over MQTT, no ACK message was sent in response. It is not possible to recover these Dots. Users may see data gaps in their variables.

### From 14:27 UTC - 14:54 UTC

We responded to about 100% of the data received through TCP, UDP, or HTTP with a 50x response code. Over MQTT, no ACK message was sent in response. It is not possible to recover these Dots. All our users will see data gaps in their variables during this time window.

**Details per protocol:**

* **HTTP, from 14:48 UTC - 15:21 UTC:** About 10% of the data requests got a 50x response code. It is not possible to recover these Dots. Users may see data gaps in their variables.
* **MQTT/TCP/UDP, from 14:27 UTC - 15:21 UTC:** About 100% of the data received through MQTT did not get an ACK message. It is not possible to recover these Dots. All our users will see data gaps in their variables during this time window.

Please remember that a 50x response code, or a missing ACK message from our server, means that something went wrong with the request, and the request or socket message must be sent again.
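As an illustration of the resend rule above, here is a minimal client-side sketch in Python. It is not part of any Ubidots library: `send_with_retry`, `is_retryable`, and the `send` callable are hypothetical names, and the sketch assumes that `send` returns the HTTP status code it received, or `None` when no response or ACK arrived. The exponential backoff between attempts is our own assumption, not a documented requirement.

```python
import time
from typing import Callable, Optional


def is_retryable(status: Optional[int]) -> bool:
    """Return True when the message should be sent again.

    `None` stands for "no response / no ACK received" (assumption of this
    sketch); any 5xx status is treated as a server-side failure.
    """
    return status is None or 500 <= status <= 599


def send_with_retry(
    send: Callable[[], Optional[int]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call `send` until it succeeds or `max_attempts` is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts. Returns the last status observed, so the caller can decide
    whether to store the message locally for a later resend.
    """
    status: Optional[int] = None
    for attempt in range(max_attempts):
        status = send()
        if not is_retryable(status):
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status
```

In practice `send` would wrap the actual HTTP request or socket write. If the returned status is still retryable after the last attempt, the device can buffer the message and try again once the service has recovered.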

Posted Nov 23, 2020 - 20:07 UTC

Monitoring

All our ingestion services are up and running again. We are working on recovering the data from this outage window.

Posted Nov 23, 2020 - 15:51 UTC

Update

We are currently experiencing issues with our data ingestion service. We are working to fix it as soon as possible.

Posted Nov 23, 2020 - 15:05 UTC

Update

We are continuing to investigate this issue.

Posted Nov 23, 2020 - 14:05 UTC

Update

We are still looking into the root cause of this issue. We are currently working on deploying a data logger to check whether the issue is related to our microservice environment.

Posted Nov 23, 2020 - 14:05 UTC

Investigating

We are currently experiencing an issue with the ingestion of data received through MQTT, TCP, UDP, or HTTP. Some Dots are not being reflected in the user's database. No Dots have been lost so far.

Posted Nov 23, 2020 - 13:09 UTC

This incident affected: America (HTTP Post, TCP, MQTT Publish, MQTT Subscribe, Events Engine, UDP).