API Issue

Incident Report for Transifex

Postmortem

API v3 Partial Downtime
Date: 02 April 2026
Service Impacted: api-v3
Impact Scope: Services depending on api-v3 experienced degraded functionality

Summary
On April 2, 2026, api-v3 experienced partial downtime caused by a database performance issue. A database query created contention within the database, which increased resource utilization and slowed the responses of api-v3 pods.
As a result, api-v3 instances intermittently failed health checks and restarted, which further increased database load and connection pressure.

Impact

  • Degraded performance and intermittent unavailability of api-v3
  • Increased error rates for dependent services
  • Elevated database CPU usage and connection counts

Root Cause
The incident was caused by a database query that partially blocked other operations in the database. This led to:

  • Query contention and locking behavior
  • Increased execution times for other queries
  • Accumulation of database connections from api-v3 pods

This combination resulted in reduced system responsiveness and instability in api-v3.

Mitigation & Resolution

  • Problematic long-running queries were identified and terminated
  • The affected query was optimized to improve performance and reduce locking behavior

Following these actions, system performance returned to normal and stability was restored.
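
The identification step above can be sketched in Python. This is a minimal illustration only: the report does not name the database engine or the tooling that was used, so the `Session` record, the `blocking_long_runners` helper, and the 30-second threshold are all hypothetical. The sketch assumes a snapshot of active sessions is available and picks out the sessions that have run past a threshold while blocking at least one other session.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Session:
    """One row of a database session snapshot (hypothetical shape)."""
    pid: int
    runtime_s: float                  # how long the current query has been running
    blocked_by: Optional[int] = None  # pid of the session holding the lock, if any


def blocking_long_runners(sessions: List[Session],
                          min_runtime_s: float = 30.0) -> List[int]:
    """Return pids that exceed the runtime threshold and block another
    session, longest-running first."""
    blockers = {s.blocked_by for s in sessions if s.blocked_by is not None}
    candidates = [s for s in sessions
                  if s.pid in blockers and s.runtime_s >= min_runtime_s]
    ordered = sorted(candidates, key=lambda s: s.runtime_s, reverse=True)
    return [s.pid for s in ordered]


snapshot = [
    Session(pid=101, runtime_s=420.0),
    Session(pid=102, runtime_s=35.0, blocked_by=101),
    Session(pid=103, runtime_s=2.0, blocked_by=101),
    Session(pid=104, runtime_s=90.0),  # slow, but not blocking anyone
]
print(blocking_long_runners(snapshot))  # prints [101]
```

Filtering on "blocks someone else" as well as on runtime keeps slow but harmless queries, such as session 104 in the example, off the termination list.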

Follow-up Actions

  • Optimize the performance of the database queries involved in the incident
  • Improve monitoring and alerting for long-running or blocking queries
  • Evaluate database connection handling and limits in api-v3
  • Consider implementing specific query timeouts and safeguards to prevent similar issues
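
One possible safeguard for the connection-handling item above is a per-process cap on concurrent database connections that fails fast once the cap is reached. The sketch below is an assumption, not a description of how api-v3 is built: `ConnectionGate`, `PoolExhausted`, and the limits shown are hypothetical names and values chosen for illustration.

```python
import threading
from contextlib import contextmanager


class PoolExhausted(RuntimeError):
    """Raised when no connection slot frees up within the wait budget."""


class ConnectionGate:
    """Caps the database connections one process may hold at a time
    (hypothetical helper)."""

    def __init__(self, max_connections: int, wait_timeout_s: float):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._wait_timeout_s = wait_timeout_s

    @contextmanager
    def slot(self):
        # Fail fast instead of queueing without bound when the database is slow.
        if not self._slots.acquire(timeout=self._wait_timeout_s):
            raise PoolExhausted("no free connection slot")
        try:
            yield
        finally:
            self._slots.release()


gate = ConnectionGate(max_connections=1, wait_timeout_s=0.01)
with gate.slot():
    try:
        with gate.slot():  # second request while the only slot is busy
            pass
    except PoolExhausted:
        print("rejected quickly instead of piling up")
```

With a cap like this, a slow database causes some requests to be rejected early rather than letting each pod accumulate connections, which is the pressure pattern described under Root Cause.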

Current Status
The issue has been resolved, and the system has remained stable since the fix was applied.

Posted Apr 07, 2026 - 08:29 UTC

Resolved

The issue has been fully resolved, and services are now operating normally.
We apologize for any inconvenience caused and thank you for your patience and understanding.
Posted Apr 02, 2026 - 19:25 UTC

Monitoring

The fix has been successfully applied, and system performance has stabilized.
We are continuing to monitor all systems closely to ensure sustained stability.
Posted Apr 02, 2026 - 18:18 UTC

Identified

The issue has been identified, and a fix has been applied.
Service has been restored, and performance is returning to normal.
Posted Apr 02, 2026 - 17:54 UTC

Update

We are still actively investigating the issue and working to identify the root cause.

We sincerely apologize for the delays and the API errors you are experiencing, and we understand the impact this may have on your workflows.

Our team is fully engaged, and we will share another update as soon as we have more information.
Posted Apr 02, 2026 - 15:16 UTC

Investigating

We are currently investigating the issue.
Posted Apr 02, 2026 - 12:50 UTC
This incident affected: Transifex API.