Mobile SDK endpoints and mobile push impaired in the European and Australian hosting regions

Write-up

Summary

On June 11, 2026, between approximately 09:34 UTC and 10:18 UTC, customers hosted in our European and Australian regions experienced failures in mobile push notification delivery and in API operations that merge user records. The incident was caused by a database table, the one that stores mobile device push tokens, being unintentionally deleted in both regions during the final phase of a planned database migration.

No data was lost. For end-user devices with push notifications enabled, logins to the Mobile Messenger failed during the incident. The Messenger relays this failure back to the host app via the mobile SDK’s login method callbacks. Users who were already logged into the Messenger when the incident began had a degraded experience, including being unable to send messages.

A much smaller cohort of affected users were those on web and mobile who were undergoing a user merge, which happens when a previously unidentified visitor is identified as an existing user. The device tokens table is queried during this operation, and that query failed during the incident. These users were left in a recoverable pre-merge state, and their experience returned to normal once the incident was resolved.

Messages sent from the Inbox during the incident were unaffected. However, mobile push notifications alerting users to those messages were not sent while the table was unavailable.

Recovery happened in two phases. In Australia, the migration that removed the table was still within its reversal window, and we reverted it. Error rates in the AU region returned to normal by 10:06 UTC. In Europe, the migration was past the point where it could be reverted. However, a dropped table is not deleted immediately: it is renamed and preserved for a period of time, which made it possible to restore it with manual intervention from the PlanetScale team. We renamed the table back into place, and error rates in the EU region returned to normal by 10:18 UTC, with no loss of data from the time of the deletion.

An equivalent migration was queued in the US region, but we intervened before it was deployed. As a result, no US customers were impacted.

We understand that Fin is core to how you support your own customers. We sincerely apologize for the disruption this caused.

Root cause

Intercom uses PlanetScale (built on Vitess) as our high-scale database layer. As part of ongoing work to rebalance load across our database infrastructure, we were moving the table that stores device tokens from one database to another. Application traffic had already been switched to the new location the previous day, and on the morning of June 11 we completed the final steps of the move in the Australian and European regions.

Our schema changes are applied through PlanetScale's deploy request workflow. A deploy request works by comparing the proposed schema against a snapshot of the current production schema and generating the set of changes needed to reconcile them. On this morning, a routine schema change to an unrelated table was deployed through this workflow. The snapshot it compared against was stale: it predated the creation of the device tokens table in its new database, because that table had been created manually on the platform's backend as part of the move, and a manual backend operation does not refresh the snapshot that deploy requests rely on.

From the deploy request's point of view, the device tokens table looked like it should have been removed. The generated change set therefore included a command to drop it, alongside the intended routine change. When the deploy request was applied in the Australian and European regions, the table serving live production traffic was deleted.
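
To make the failure mode concrete, here is a toy model in Python of how a schema diff built on a stale view can produce a drop. It is a simplified sketch, not PlanetScale's implementation: the function plan_changes, the table names and the column lists are all hypothetical, and it assumes that the target schema was derived from the stale snapshot and then reconciled against the live database.

```python
def plan_changes(deployed, target):
    """Return the statements that turn the `deployed` schema into `target`.

    Each schema is a dict mapping table name -> column definition string.
    This is an illustrative model only, not a real migration planner.
    """
    statements = []
    # Tables present in the database but absent from the target get dropped.
    for table in sorted(deployed.keys() - target.keys()):
        statements.append(f"DROP TABLE {table}")
    # Tables present only in the target get created.
    for table in sorted(target.keys() - deployed.keys()):
        statements.append(f"CREATE TABLE {table} ({target[table]})")
    # Tables present in both are altered only when their definitions differ.
    for table in sorted(deployed.keys() & target.keys()):
        if deployed[table] != target[table]:
            statements.append(
                f"ALTER TABLE {table} /* new definition: {target[table]} */"
            )
    return statements


# Live schema: includes the table created manually during the move.
live = {"device_tokens": "id, user_id, token", "audit_log": "id, event"}

# Stale snapshot (hypothetical): taken before device_tokens existed here.
stale_snapshot = {"audit_log": "id, event"}

# The routine change to an unrelated table, built on top of the stale view.
stale_target = dict(stale_snapshot, audit_log="id, event, created_at")

# The same routine change, built on top of an accurate view of production.
fresh_target = dict(live, audit_log="id, event, created_at")

stale_plan = plan_changes(live, stale_target)
fresh_plan = plan_changes(live, fresh_target)

for statement in stale_plan:
    print(statement)
```

In this model the plan built from the stale view contains a DROP TABLE for device_tokens next to the intended ALTER, while the plan built from the accurate view contains only the ALTER.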

Timeline (UTC)

09:34 - A routine schema migration is applied in the EU region
09:35 - Automated availability checks detect elevated errors on the EU API. An incident is declared and engineering is paged
09:36 - The same migration is applied in the AU region
09:39 - Root cause is identified as an in-progress table move unrelated to the schema change. Impact in the AU region is confirmed shortly after
09:46 - An urgent support case is opened with PlanetScale and an Incident Commander is engaged
09:51 - Status page posted for the EU and AU regions
09:58 - PlanetScale restores routing rules, recovering write traffic while reads are still failing
10:06 - Recovery in the AU region. The migration is reverted in AU, using normal PlanetScale processes, restoring the table there
10:12 - PlanetScale manually restores the EU table, opening a recovery path with no data loss
10:18 - Recovery in the EU region. The EU table is renamed back into place and restored
10:33 - Customer impact confirmed to be resolved. The status page is closed
10:44 - The US deploy request is confirmed and cancelled, removing any risk to the US region. The incident is resolved

Next steps

Completed remediation

The device tokens table was fully restored in both affected regions, with no data loss.
The equivalent deploy request in the US region was cancelled before it could execute, and we verified that both PlanetScale and our own tooling reflect the cancelled state.
Schema snapshots have been manually refreshed in the European, Australian, and US databases, so deploy requests in those regions now compute their changes against an accurate view of production.
We updated our table move runbook so that deploy requests are locked for the duration, and so that schema snapshots are verified as refreshed in every region after any backend schema operation.
On June 12, the migration was rerun and completed successfully.
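
The snapshot verification step in the updated runbook could be sketched as a per-region drift check like the one below. This is a minimal illustration under assumptions: the function names, region keys and table lists are hypothetical, and how the live and snapshot table lists are actually obtained from the platform is not shown.

```python
def snapshot_drift(live_tables, snapshot_tables):
    """Compare the tables in a live database with those in its snapshot."""
    live, snapshot = set(live_tables), set(snapshot_tables)
    return {
        # Tables that exist in production but that the snapshot has not seen.
        "missing_from_snapshot": sorted(live - snapshot),
        # Tables the snapshot still lists but that no longer exist.
        "missing_from_live": sorted(snapshot - live),
    }


def stale_regions(regions):
    """Return drift details for every region whose snapshot is out of date.

    `regions` maps a region name to a (live_tables, snapshot_tables) pair.
    """
    stale = {}
    for region, (live_tables, snapshot_tables) in regions.items():
        drift = snapshot_drift(live_tables, snapshot_tables)
        if drift["missing_from_snapshot"] or drift["missing_from_live"]:
            stale[region] = drift
    return stale


# Hypothetical input: only the EU snapshot predates the manual table creation.
regions = {
    "eu": (["audit_log", "device_tokens"], ["audit_log"]),
    "au": (["audit_log", "device_tokens"], ["audit_log", "device_tokens"]),
}

report = stale_regions(regions)
print(report)
```

A runbook step built on a check like this would keep deploy requests locked until the report is empty for every region.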

Ongoing improvements

We are introducing a guardrail that blocks any migration request containing a ‘drop table’ unless it has four-eyes approval from a member of our datastores team.
We are working with PlanetScale to understand why the snapshot was not refreshed as part of the backend table creation, whether future moves should create tables through the deploy request workflow instead, and how long dropped tables remain recoverable. Their findings will feed back into our procedures.
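
As an illustration of the ‘drop table’ guardrail listed above, the check could look something like the following Python sketch. It is not our production tooling: the function names, the usernames and the way approvals are represented are assumptions, and the simple pattern only catches statements that begin with DROP TABLE.

```python
import re

# Matches statements that start with DROP TABLE, ignoring case and
# leading whitespace. Statements hidden behind comments are not handled.
DROP_TABLE = re.compile(r"^\s*DROP\s+TABLE\b", re.IGNORECASE)


def destructive_statements(statements):
    """Return the statements in a change set that drop a table."""
    return [s for s in statements if DROP_TABLE.match(s)]


def may_deploy(statements, author, approvers, datastores_team):
    """Allow a change set unless it drops a table without a second reviewer.

    A drop requires approval from a datastores team member other than the
    author, which is the four-eyes requirement in this sketch.
    """
    if not destructive_statements(statements):
        return True
    return any(
        approver != author and approver in datastores_team
        for approver in approvers
    )


# Hypothetical change set and people.
change_set = [
    "ALTER TABLE audit_log ADD COLUMN created_at DATETIME",
    "drop table device_tokens",
]
team = {"dana", "eli"}

blocked = may_deploy(change_set, author="sam", approvers=["sam"], datastores_team=team)
allowed = may_deploy(change_set, author="sam", approvers=["dana"], datastores_team=team)
print(blocked, allowed)
```

A change set with no drops passes without extra approval, so routine migrations are not slowed down by the check.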

We remain committed to the reliability you expect from Fin. The root cause of this incident is well understood, the affected table has been fully restored with no data loss, and we are changing both our own procedures and, together with PlanetScale, the platform safeguards so that a stale view of our schema can never again translate into a destructive change in production.