API Issue

Incident Report for Transifex

Postmortem

API v3 Partial Downtime
Date: 02 April 2026
Service Impacted: api-v3
Impact Scope: Services depending on api-v3 experienced degraded functionality

Summary
On April 2nd, api-v3 experienced partial downtime due to a database performance issue. A database query caused contention within the database, leading to increased resource utilization and degraded responsiveness of api-v3 pods.
As a result, api-v3 instances were intermittently failing health checks and restarting, which further amplified database load and connection pressure.

Impact

Degraded performance and intermittent unavailability of api-v3
Increased error rates for dependent services
Elevated database CPU usage and connection counts

‌

Root Cause
The incident was caused by a database query that partially blocked operations within the database. This led to:

Query contention and locking behavior
Increased execution times for other queries
Accumulation of database connections from api-v3 pods

This combination resulted in reduced system responsiveness and instability in api-v3.

Mitigation & Resolution

Problematic long-running queries were identified and terminated
The affected query was optimized to improve performance and reduce locking behavior

Following these actions, system performance returned to normal and stability was restored.

Follow-up Actions

Optimize database query performance related to the incident
Improve monitoring and alerting for long-running or blocking queries
Evaluate database connection handling and limits in api-v3
Consider implementing specific query timeouts and safeguards to prevent similar issues

‌

Current Status
The issue has been resolved, and the system has remained stable since the fix was applied.

Posted Apr 07, 2026 - 08:29 UTC

Resolved

The issue has been fully resolved, and services are now operating normally.
We apologize for any inconvenience caused and thank you for your patience and understanding.

Posted Apr 02, 2026 - 19:25 UTC

Monitoring

The fix has been successfully applied, and system performance has stabilized.
We are continuing to monitor all systems closely to ensure sustained stability.

Posted Apr 02, 2026 - 18:18 UTC

Identified

The issue has been identified, and a fix has been applied.
Service has been restored, and performance is returning to normal.

Posted Apr 02, 2026 - 17:54 UTC

Update

We are still actively investigating the issue and working to identify the root cause.

We sincerely apologize for the delays and the API errors you are experiencing, and we understand the impact this may have on your workflows.

Our team is fully engaged, and we will share another update as soon as we have more information.

Posted Apr 02, 2026 - 15:16 UTC

Investigating

We are currently investigating the issue

Posted Apr 02, 2026 - 12:50 UTC

This incident affected: Transifex API.