Description
This article describes the following error, which can occur when using a fully managed MongoDB Atlas Source connector hosted on Confluent Cloud:
The connector failed to resume change stream, as the resume point may no longer be in the oplog.
Please increase the size of your oplog and recreate the connector.
Applies To
MongoDB Atlas Source Connector
Confluent Cloud
Cause
This error occurs when the connector's resume token does not correspond to any entry in the MongoDB oplog. Without a matching entry, the connector cannot determine where to resume processing the MongoDB change stream. This generally happens for one of the following reasons:
- The connector is inactive or paused for longer than the oplog retention period, so the oplog entry that the resume token points to is purged.
- The MongoDB namespace receives infrequent or no updates for an extended period, so the connector's stored resume token is not refreshed and the entry it points to is eventually purged from the oplog.
- The oplog on the database has reached its maximum configured size, causing the entry for the resume token in question to be purged.
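All three causes come down to the same condition: the point recorded in the resume token is older than the oldest entry still held in the oplog. A minimal Python sketch of that check follows. The function name and the timestamps are hypothetical; in practice the oldest oplog time would come from your cluster's oplog metrics, and the resume point from the connector's stored offset.

```python
from datetime import datetime, timedelta, timezone


def resume_point_available(resume_time: datetime, oldest_oplog_time: datetime) -> bool:
    """Return True if the resume point is still covered by the oplog.

    Hypothetical helper for illustration only; it is not part of the
    connector or of any MongoDB driver.
    """
    return resume_time >= oldest_oplog_time


# Illustrative values (assumptions): a 24-hour oplog window, one connector
# paused for 36 hours and one that processed an event an hour ago.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
oldest_oplog_time = now - timedelta(hours=24)
paused_resume_time = now - timedelta(hours=36)
recent_resume_time = now - timedelta(hours=1)

print(resume_point_available(paused_resume_time, oldest_oplog_time))  # False
print(resume_point_available(recent_resume_time, oldest_oplog_time))  # True
```

When the check is False, the connector has no oplog entry to resume from and reports the error described in this article.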
Resolution
To recover from this error, you can use one of the following options:
- Configure your connector to tolerate errors temporarily: set the `mongo.errors.tolerance` option to `all`, and then produce a change stream event that updates your connector's resume token. Once this is done, revert the value to `none`. The caveat is that your connector may briefly ignore errors unrelated to the invalid resume token.
- Alternatively, set `offset.partition.name` (as documented here) to a partition name that does not exist on your cluster. This resets your Kafka Connect offset data, which contains your resume token, and allows your connector to resume processing your change stream.
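The two recovery options can be sketched as configuration overrides. The Python snippet below only builds the key/value pairs to apply to the connector. The helper names and the partition name `reset-2024-01` are hypothetical, and the step of submitting the update to Confluent Cloud is deliberately left out.

```python
import json


def tolerate_errors_override(enabled: bool) -> dict:
    """Option 1: temporarily tolerate all errors, then revert to none."""
    return {"mongo.errors.tolerance": "all" if enabled else "none"}


def reset_offset_override(new_partition_name: str, names_in_use: set) -> dict:
    """Option 2: choose an offset partition name that is not already in use.

    names_in_use is a hypothetical set of names you know have been used
    before; the connector does not expose it through this helper.
    """
    if new_partition_name in names_in_use:
        raise ValueError("partition name already exists; choose an unused one")
    return {"offset.partition.name": new_partition_name}


# Option 1: apply, produce a change event in MongoDB, then revert.
print(json.dumps(tolerate_errors_override(True)))
print(json.dumps(tolerate_errors_override(False)))

# Option 2: "reset-2024-01" is a made-up name for illustration.
print(json.dumps(reset_offset_override("reset-2024-01", names_in_use=set())))
```

Keeping the revert step for option 1 next to the apply step makes it harder to leave the connector tolerating all errors by accident.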
Once the connector has recovered and is running again, consider the following options to prevent this error from recurring:
- Increase the oplog retention period. This helps when the resume token was purged because the connector was inactive for longer than the oplog retention period.
- For infrequent namespace updates, adjust `heartbeat.interval.ms` and `heartbeat.topic.name`. The connector writes heartbeat messages, which contain the oplog resume token, to a dedicated topic. This feature is intended for low-throughput namespaces.
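As a rough illustration of the heartbeat settings, the sketch below builds the two properties and applies basic validation. The interval of 10000 ms and the topic name `mongo-heartbeats` are assumed example values, not recommendations from the connector documentation, and the helper name is hypothetical.

```python
def heartbeat_override(interval_ms: int, topic_name: str) -> dict:
    """Build the heartbeat properties for a low-throughput namespace.

    Values are returned as strings on the assumption that the connector
    configuration is submitted as string key/value pairs.
    """
    if interval_ms <= 0:
        # This sketch requires a positive interval so that heartbeats are sent.
        raise ValueError("interval_ms must be positive")
    if not topic_name:
        raise ValueError("topic_name must not be empty")
    return {
        "heartbeat.interval.ms": str(interval_ms),
        "heartbeat.topic.name": topic_name,
    }


# Assumed example values for illustration only.
config = heartbeat_override(10000, "mongo-heartbeats")
print(config)
```

Whatever interval you pick, it should be much shorter than the oplog retention window so that the stored resume token is refreshed well before its oplog entry can be purged.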