Overview
To improve the resilience and reliability of your monitoring workflows, Confluent is updating the error-handling behavior of the Confluent Cloud Metrics API /export endpoint. This change, driven by customer feedback, will prevent entire metric queries from failing due to a single inaccessible resource, thereby ensuring the continuous delivery of critical monitoring data.
Why we are making this change
We heard from many customers that monitoring and alerting can be brittle when a single resource in a large Metrics query is decommissioned or becomes inaccessible. A failed query often leads to missing metrics data and 'no data' alerts firing, causing visibility gaps and a poor user experience. This update directly addresses that feedback by ensuring that customers always receive metrics for accessible resources.
What is changing?
We are moving from an "all-or-nothing" failure model to a "partial success" model.
- Before: If a request to the /export endpoint includes an inaccessible (e.g., deleted or unauthorized) resource, the entire request fails with a 403 HTTP error code.
- After: The /export endpoint will return a 200 OK status as long as at least one queried resource is accessible. The response body, in Prometheus format, will include metrics for all accessible resources, plus an error status metric for each inaccessible resource. This lets you identify individual resource failures without losing visibility into the rest of your fleet.
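For illustration, a single multi-resource request might be built as below. This is a sketch under stated assumptions: the base URL `https://api.telemetry.confluent.cloud/v2/metrics/cloud/export` and the convention of repeating a `resource.<type>.id` query parameter once per resource should be verified against the Metrics API reference, the helper `build_export_url` is hypothetical, and the resource IDs are placeholders. Under the new behavior, if one of these IDs is deleted or unauthorized, the request still returns 200 OK and reports that ID through the error status metric.

```python
from urllib.parse import urlencode

# Assumed base URL of the Metrics API export endpoint; verify against the
# Confluent Cloud Metrics API reference before use.
EXPORT_URL = "https://api.telemetry.confluent.cloud/v2/metrics/cloud/export"


def build_export_url(resources: dict[str, list[str]]) -> str:
    """Build one /export URL that queries several resources at once.

    `resources` maps a resource type (e.g. "resource.kafka.id") to the IDs
    to include; each ID becomes its own repeated query parameter.
    """
    params = [
        (resource_type, resource_id)
        for resource_type, ids in resources.items()
        for resource_id in ids
    ]
    return f"{EXPORT_URL}?{urlencode(params)}"


# Placeholder IDs for illustration only.
url = build_export_url({
    "resource.kafka.id": ["lkc-11111", "lkc-22222"],
    "resource.connect.id": ["lcc-33333"],
})
print(url)
```

Repeating the parameter per resource keeps the whole fleet in one scrape, which is exactly the case where a single inaccessible ID used to fail the entire request.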
Here is a sample response showing the error status metric:
# HELP confluent_scrape_resource_access_error Error status of resources queried. A value of 1 signifies that the resource was not found, whereas a value of 0 signifies that the resource was successfully found and the user principal was authorized to access it.
confluent_scrape_resource_access_error{resource_type="resource.kafka.id",resource_id="lkc-12345"} 1.0
confluent_scrape_resource_access_error{resource_type="resource.connect.id",resource_id="lcc-12345"} 0.0
How will this affect me?
This update will make your monitoring dashboards and alerts more robust. You will no longer experience data gaps when a single resource in a multi-resource query becomes inaccessible.
While this change is beneficial, it may require updates to your integrations if you have custom automation. Client applications that rely exclusively on 403 HTTP status codes to detect failures will need to be modified, as the endpoint will now return a 200 OK as long as the query has at least one accessible resource.
Recommended actions
To ensure a smooth transition, we recommend reviewing the applications, scripts, and monitoring tools that use the Metrics API /export endpoint, paying particular attention to their error-handling logic. If a tool relies solely on a 4xx HTTP status code to detect failures, update it to also parse the Prometheus response body and check the confluent_scrape_resource_access_error metric for each queried resource. A value of 1 marks a resource that could not be accessed, so your tool can still detect and report the specific resources that are inaccessible.
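The updated check can be sketched as follows. It is a minimal, standard-library-only illustration, not an official client: the function name `check_export_response` and its return shape are hypothetical, and the regular expression handles only simple single-line samples like the one shown earlier rather than the full Prometheus text format. Following the metric's HELP text, 0 means the resource was found and authorized, so any other value is treated as inaccessible. A non-200 status is still treated as a failure of the whole request.

```python
import re

# Matches one error-status sample line, e.g.
#   confluent_scrape_resource_access_error{resource_type="...",resource_id="..."} 1.0
# Simplified: assumes label values contain no escaped quotes.
SAMPLE_RE = re.compile(
    r'^confluent_scrape_resource_access_error\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def check_export_response(status_code: int, body: str) -> tuple[str, list[tuple[str, str]]]:
    """Classify an /export response (hypothetical helper).

    Returns ("failed", []) for a non-200 status, ("partial", [...]) when at
    least one resource reports an access error, and ("ok", []) otherwise.
    """
    if status_code != 200:
        # Treat any non-200 status as a failure of the whole request.
        return "failed", []

    inaccessible: list[tuple[str, str]] = []
    for line in body.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # skip HELP/TYPE comments and all other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        # 0 = found and authorized; anything else = inaccessible.
        if float(match.group("value")) != 0.0:
            inaccessible.append(
                (labels.get("resource_type", ""), labels.get("resource_id", ""))
            )
    return ("partial" if inaccessible else "ok"), inaccessible


# Illustrative response body (placeholder IDs).
body = """\
# HELP confluent_scrape_resource_access_error Error status of resources queried.
confluent_scrape_resource_access_error{resource_type="resource.kafka.id",resource_id="lkc-12345"} 1.0
confluent_scrape_resource_access_error{resource_type="resource.connect.id",resource_id="lcc-12345"} 0.0
"""
status, failed = check_export_response(200, body)
print(status, failed)  # partial [('resource.kafka.id', 'lkc-12345')]
```

Returning "partial" separately from "failed" lets an alerting pipeline keep using the metrics for accessible resources while raising a targeted alert for each inaccessible one.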
Timeline
This change will take effect on April 16, 2026. If you have any questions or concerns, please contact our support team.
Limited Availability for Interested Customers
If you are interested in enabling this change sooner than April 16, 2026, please reach out to our support team, who can help you activate the updated Metrics API /export endpoint behavior.