Code Review
This pull request introduces tracking for retry attempts within OpenTelemetry spans for BigQuery operations. It utilizes a ContextKey to store an AtomicInteger in the OpenTelemetry context, allowing HttpTracingRequestInitializer to increment and record the http.request.resend_count attribute on outgoing request spans. The PR also includes visibility adjustments for testing and adds new integration and unit tests to verify the telemetry data. One piece of feedback suggests removing a redundant null check in BigQueryRetryHelper to improve code clarity.
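The mechanism summarized above relies on one property: a value stored in the context under a key is visible to code further down the call stack while a scope is open, and disappears once that scope closes. In this PR, that code is an AtomicInteger and HttpTracingRequestInitializer. The real OpenTelemetry API provides ContextKey.named(...), Context.current().with(key, value) and makeCurrent() for this. The following is only a stdlib-only sketch of the idea. Key, Scope, MiniContext and AttemptTrackerPropagation are hypothetical stand-ins, not the OpenTelemetry classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a context key: compared by identity.
final class Key<T> {
  final String name;

  Key(String name) {
    this.name = name;
  }
}

// Hypothetical stand-in for a scope: closing it restores the previous context.
interface Scope extends AutoCloseable {
  @Override
  void close();
}

// Hypothetical, simplified immutable context whose "current" value lives in a ThreadLocal.
final class MiniContext {
  private static final ThreadLocal<MiniContext> CURRENT =
      ThreadLocal.withInitial(() -> new MiniContext(new HashMap<>()));

  private final Map<Key<?>, Object> values;

  private MiniContext(Map<Key<?>, Object> values) {
    this.values = values;
  }

  static MiniContext current() {
    return CURRENT.get();
  }

  // Returns a new context with the extra entry; this context is left unchanged.
  <T> MiniContext with(Key<T> key, T value) {
    Map<Key<?>, Object> copy = new HashMap<>(values);
    copy.put(key, value);
    return new MiniContext(copy);
  }

  @SuppressWarnings("unchecked")
  <T> T get(Key<T> key) {
    return (T) values.get(key);
  }

  // Makes this context current on this thread until the returned Scope is closed.
  Scope makeCurrent() {
    MiniContext previous = CURRENT.get();
    CURRENT.set(this);
    return () -> CURRENT.set(previous);
  }
}

final class AttemptTrackerPropagation {
  static final Key<AtomicInteger> ATTEMPT_TRACKER = new Key<>("attempt-tracker");

  // Plays the role of code deep in the HTTP layer (e.g. a request initializer):
  // it never receives the tracker as a parameter; it looks it up in the current context.
  static AtomicInteger lookUpTracker() {
    return MiniContext.current().get(ATTEMPT_TRACKER);
  }
}
```

Inside a `try (Scope ignored = MiniContext.current().with(ATTEMPT_TRACKER, tracker).makeCurrent())` block, lookUpTracker() returns the same tracker instance. After the block it returns null, so attempts belonging to a different operation never share a counter.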
...query/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/BigQueryRetryHelper.java
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```java
if (attemptTracker != null) {
  int attempt = attemptTracker.getAndIncrement();
  if (attempt > 0) {
    span.setAttribute(HTTP_REQUEST_RESEND_COUNT, (long) attempt);
  }
}
```
This logic looks good to me. However, is there a way to implement it without using OpenTelemetry Context so it can be reused for other things?
For example, metrics may also need this attribute in the future. We could call Context.current() when traces are enabled, but customers may enable only metrics and not traces.
I see three options for implementing this:

1. Add a retry counter object per API request. This would require modifying every API call, because retries are currently handled through a static call to BigQueryRetryHelper.runWithRetries. This does not scale, and I would prefer not to implement it.
2. The solution proposed in this PR: use the OpenTelemetry Context (which uses its own ThreadLocal for storage).
3. Create our own ThreadLocal in BigQueryRetryHelper that stores this value, so that metrics can also access it.

Let me know if you are okay with implementing option 3.
cc @lqiu96, in case you have some input.
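Option 3 could look roughly like the sketch below, assuming a Supplier-based retry loop. ThreadLocalRetrySketch and its methods are hypothetical and much simpler than the real BigQueryRetryHelper: there is no backoff and no retry predicate. The part that matters is the finally block. The value has to be removed on every exit path, or it outlives the request that set it.

```java
import java.util.function.Supplier;

// Hypothetical sketch of option 3: the retry helper publishes the current attempt
// number in its own ThreadLocal so tracing and metrics code can both read it.
final class ThreadLocalRetrySketch {
  private static final ThreadLocal<Integer> ATTEMPT = new ThreadLocal<>();

  // Attempt number (0 = initial request) while a retry loop runs on this thread, else null.
  static Integer currentAttempt() {
    return ATTEMPT.get();
  }

  static <T> T runWithRetries(Supplier<T> call, int maxAttempts) {
    try {
      for (int attempt = 0; ; attempt++) {
        ATTEMPT.set(attempt);
        try {
          return call.get();
        } catch (RuntimeException e) {
          if (attempt + 1 >= maxAttempts) {
            throw e; // out of attempts: surface the last failure
          }
          // otherwise loop and retry (a real helper would also back off here)
        }
      }
    } finally {
      // Always clear the value; a pooled thread would otherwise carry it into unrelated work.
      ATTEMPT.remove();
    }
  }
}
```

Any code running on the same thread during the call, whether a span decorator or a metrics recorder, can read currentAttempt() without the counter being passed through every API method. That is the reuse argument for option 3.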
I'm not too familiar with OTel's Context, so I don't know when to use it and when not to. If there are concerns with it, we don't have to go this route.
Option 3 isn't my favorite (using static methods and thread locals to track retries), but to Blake's point, it can be reused for both metrics and traces (assuming that Context is trace-specific).
Since metrics are in scope, one route forward could be to use Option 3 now and migrate to Option 1 in the future. WDYT?
OK, I implemented option 3. I think we should consider option 1 carefully before moving to it: it adds repetitive code to each client call, and it adds another configuration point for metrics that can easily be missed when implementing new API calls.
Reverted to option 2, as we are not comfortable with the potential for leakage when using a ThreadLocal. Waiting on @westarle to say whether he thinks it's worth moving forward with this implementation, or whether we should skip it for this initial launch.
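The leakage concern is easy to reproduce with a single-thread pool, whose one worker thread is reused across tasks. The stdlib-only demo below is a minimal illustration. ThreadLocalLeakDemo is hypothetical and is not taken from the PR. It sets a value on a pooled thread, optionally skips remove(), and reports what an unrelated later task on the same thread sees.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo of ThreadLocal leakage on a pooled (reused) thread.
final class ThreadLocalLeakDemo {
  private static final ThreadLocal<Integer> ATTEMPT = new ThreadLocal<>();

  // Returns the value a second, unrelated task observes on the same worker thread.
  static Integer valueSeenByNextTask(boolean cleanUp) {
    ExecutorService pool = Executors.newSingleThreadExecutor(); // exactly one reused worker
    try {
      Runnable earlierRequest = () -> {
        ATTEMPT.set(3); // e.g. the last retry attempt of an earlier request
        if (cleanUp) {
          ATTEMPT.remove();
        }
      };
      Callable<Integer> laterRequest = ATTEMPT::get;
      pool.submit(earlierRequest).get();
      return pool.submit(laterRequest).get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

valueSeenByNextTask(false) returns the stale 3, while valueSeenByNextTask(true) returns null. Any ThreadLocal-based design therefore depends on every code path, including exceptional ones, clearing the value.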
Discussed with @westarle: there are currently no plans to add this attribute to metrics. This implementation looks good to me.
…urned off # Conflicts: # java-bigquery/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/BigQueryRetryHelper.java
…pan is turned off" This reverts commit a7fded2.
🤖 I have created a release *beep* *boop*

---

<details><summary>1.83.0</summary>

## [1.83.0](v1.82.0...v1.83.0) (2026-04-07)

### Features

* [aiplatform] [Memorystore for Redis Cluster] Add support for ([0bd7666](0bd7666))
* [aiplatform] Add container_spec to Reasoning Engine public protos ([0bd7666](0bd7666))
* [aiplatform] Add container_spec to Reasoning Engine public protos ([3ba3854](3ba3854))
* [aiplatform] add evaluation metrics and autorater configuration to ([0bd7666](0bd7666))
* [backupdr] Adding new workload specific fields for AlloyDB ([6344cb0](6344cb0))
* [ces] update public libraries for CES v1 ([6344cb0](6344cb0))
* [ces] update public libraries for CES v1beta ([0bd7666](0bd7666))
* [chat] Addition of Section and SectionItem APIs ([0bd7666](0bd7666))
* [chat] Support app authentication with admin-consent scopes for ([0bd7666](0bd7666))
* [databasecenter] A new value `SUB_RESOURCE_TYPE_READ_POOL` is ([6344cb0](6344cb0))
* [dataflow] Add Pausing/Yaml capabilities to public protos ([3ba3854](3ba3854))
* [dataflow] add sha256 field to Package proto ([0bd7666](0bd7666))
* [dataflow] add sha256 field to Package proto ([3ba3854](3ba3854))
* [dataform] add folders and teamFolders related changes to v1 ([6344cb0](6344cb0))
* [datalineage] add configmanagement v1 module ([#12355](#12355)) ([2def625](2def625))
* [datamanager] add INVALID_MERCHANT_ID to the ErrorReason enum for ([6344cb0](6344cb0))
* [dialogflow-cx] updated v3 dialogflow client libraries with ([6344cb0](6344cb0))
* [dialogflow] updated v2 dialogflow client libraries ([6344cb0](6344cb0))
* [dialogflow] updated v2beta1 dialogflow client libraries ([6344cb0](6344cb0))
* [dlp] added support for detecting key-value pairs in client ([e5e22ed](e5e22ed))
* [document-ai] Added a fields for image and table annotation output ([0bd7666](0bd7666))
* [geocode] new module for geocode ([#12343](#12343)) ([474efb1](474efb1))
* [netapp] Add ONTAP passthrough APIs ([6344cb0](6344cb0))
* [network-security] Publish proto definitions for AuthzPolicy, ([6344cb0](6344cb0))
* [redis-cluster] [Memorystore for Redis Cluster] Add support for ([0bd7666](0bd7666))
* [redis-cluster] [Memorystore for Redis Cluster] Add support for ([3ba3854](3ba3854))
* [securesourcemanager] Add CustomHostConfig to configure custom ([6344cb0](6344cb0))
* [storage] populate the `persisted_data_checksums` field with ([e5e22ed](e5e22ed))
* [texttospeech] Support safety settings for Gemini voices and ([0bd7666](0bd7666))
* [translate] A new field `mime_type` is added to message ([e5e22ed](e5e22ed))
* [valkey] [Memorystore for Valkey] Add support for Flexible CA ([0bd7666](0bd7666))
* [valkey] [Memorystore for Valkey] Add support for Flexible CA ([3ba3854](3ba3854))
* Add getProjectId getter for ComputeEngineCredentials ([#1833](#1833)) ([0a7895a](0a7895a))
* **bigguery:** add url.domain to span tracing ([#12208](#12208)) ([6f79c2d](6f79c2d))
* **bigquery observability:** add version attribute to span tracing ([#12132](#12132)) ([95c3eb8](95c3eb8))
* **bigquery:** add gcp.resource.destination.id for span tracing ([#12134](#12134)) ([5f31ded](5f31ded))
* **bigquery:** add opentelemetry W3C Trace Context to headers ([#12203](#12203)) ([965761a](965761a))
* **bigquery:** add resend attribute to span tracing + integration tests ([#12313](#12313)) ([167722d](167722d))
* **bigquery:** add url.full attribute to span tracing ([#12176](#12176)) ([7fdf9ff](7fdf9ff))
* **bigquery:** add url.template to span tracing ([#12181](#12181)) ([30f8afb](30f8afb))
* **bigquery:** added error attributes to span tracing ([#12115](#12115)) ([863d23b](863d23b))
* Extract resource name from unary requests for tracing ([#4159](#4159)) ([23b16b7](23b16b7))
* **gapic-generator-java:** Extract resource name heuristicly ([#12207](#12207)) ([f46480a](f46480a))
* **gax:** Actionable Errors Logging API Tracer ([#12202](#12202)) ([8d23279](8d23279))
* **gax:** Add error attributes to golden signal metrics. ([#12564](#12564)) ([063dfe5](063dfe5))
* **gax:** add utility for logging actionable errors ([#4144](#4144)) ([54fb8a5](54fb8a5))
* **gax:** Implement trace context extraction and injection with integration test ([#12625](#12625)) ([6675310](6675310))
* **observability:** Implement gcp.client.service attribute ([#12315](#12315)) ([e99812f](e99812f))
* **observability:** implement url.domain attribute ([#12316](#12316)) ([0a865bf](0a865bf))
* **sdk-platform-java:** Add CompositeTracer and CompositeTracerFactory. ([#12321](#12321)) ([4b5e8af](4b5e8af))
* Switch Eef metrics to using built in open telemetry ([#4385](#4385)) ([759bb22](759bb22))

### Bug Fixes

* Add error attributes to logging ([#12685](#12685)) ([a9198ee](a9198ee))
* **bq jdbc:** allow & ignore unknown parameters ([#12352](#12352)) ([2332ff1](2332ff1))
* **bq jdbc:** ensure getMoreResults() always moves the cursor ([#12353](#12353)) ([ac1f0f4](ac1f0f4))
* **ci:** consolidate duplicate yaml keys in github actions workflows ([#12306](#12306)) ([f644a19](f644a19))
* Clean up attributes for traces and metrics ([#12677](#12677)) ([914f97f](914f97f))
* fix getLong on NUMERIC ([#2420](#2420)) ([75ec5c2](75ec5c2))
* **gax:** Implement lazy resource name evaluation in ApiTracerContext ([#12618](#12618)) ([5e47749](5e47749))
* Handle null server address ([#12184](#12184)) ([435dd8c](435dd8c))
* **hermetic-build:** do not add release please comments on vertexai poms ([#12559](#12559)) ([5e161a7](5e161a7))
* **o11y:** create noop tracer when artifact ID is not set ([#12307](#12307)) ([630d83d](630d83d))
* **o11y:** do not record error.type in successful runs ([#12620](#12620)) ([28eebf0](28eebf0))
* **o11y:** remove `gpc.client.language` attribute ([#12621](#12621)) ([40d2e43](40d2e43))
* **oauth2:** mask sensitive tokens in HTTP logs ([#1900](#1900)) ([3e4ccb7](3e4ccb7))
* **release:** add Version.java as extra files in release-please ([#12617](#12617)) ([f5101d9](f5101d9))
* **spanner:** enforce READY-only location aware routing and add endpoint lifecycle management ([ecb86fd](ecb86fd))
* **spanner:** enforce READY-only location aware routing and add endpoint lifecycle management ([#12678](#12678)) ([ca9edb9](ca9edb9))
* **spanner:** improve grpc-gcp affinity cleanup and location-aware retries ([a157c2f](a157c2f))
* **spanner:** improve grpc-gcp affinity cleanup and location-aware retries ([#12682](#12682)) ([aca0428](aca0428))
* use dynamic tracer name instead of hardcoded gax-java ([#12190](#12190)) ([dea24db](dea24db))

### Dependencies

* bump jackson version to 2.18.3 ([#12351](#12351)) ([50304c1](50304c1))
* update dependencies.txt for grpc-gcp to 1.9.2 ([#4164](#4164)) ([f336fdc](f336fdc))
* update dependency com.google.apis:google-api-services-storage to v1-rev20260204-2.0.0 ([#1750](#1750)) ([340ecbe](340ecbe))
* update dependency com.google.apis:google-api-services-storage to v1-rev20260204-2.0.0 ([#3519](#3519)) ([1531107](1531107))
* update dependency com.google.cloud:google-cloud-storage to v2.64.1 ([#1752](#1752)) ([8fb6935](8fb6935))
* update dependency com.google.cloud:sdk-platform-java-config to v3.58.0 ([#1751](#1751)) ([9cc3e22](9cc3e22))
* update dependency com.google.cloud:sdk-platform-java-config to v3.58.0 ([#3523](#3523)) ([26d772a](26d772a))
* update dependency node to v24 ([#3509](#3509)) ([f308477](f308477))
* update gcr.io/cloud-devrel-public-resources/storage-testbench docker tag to v0.62.0 ([#3526](#3526)) ([ca29d5e](ca29d5e))
* update googleapis/sdk-platform-java action to v2.68.0 ([#3522](#3522)) ([abae1ac](abae1ac))

### Reverts

* ci: only run default list of graalvm tests if too many modules are touched ([#12292](#12292)) ([92bcdf4](92bcdf4))

### Documentation

* [dataplex] Change Dataplex library from `ALPHA` to `GA` ([6344cb0](6344cb0))
* [run] An existing repeated string field custom_audiences is marked ([015d9a1](015d9a1))
* **hermetic-build:** improve usability of development guide ([#12362](#12362)) ([5944127](5944127))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: chingor13 <chingor@google.com>
This adds the http.request.resend_count attribute to span tracing when a request is retried. We achieve this by adding an attempt counter to the parent span's context that tracks all retries. The attribute is not set on the initial attempt and is then set to the incremented count (1, 2, ...) on each subsequent retry.
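The following stdlib-only simulation illustrates the "not set on the first attempt, then 1, 2, ..." behavior. ResendCountSimulation is hypothetical, and spans are modeled as plain attribute maps. The per-attempt logic copies the reviewed snippet. In the actual code, the attribute is recorded on an OpenTelemetry span, and the tracker comes from the Context rather than a parameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation: one tracker per logical BigQuery call,
// one "span" (modeled as an attribute map) per HTTP attempt.
final class ResendCountSimulation {
  static final String HTTP_REQUEST_RESEND_COUNT = "http.request.resend_count";

  // Mirrors the reviewed snippet: the first attempt reads 0 and records nothing.
  static Map<String, Long> startRequestSpan(AtomicInteger attemptTracker) {
    Map<String, Long> spanAttributes = new HashMap<>();
    if (attemptTracker != null) {
      int attempt = attemptTracker.getAndIncrement();
      if (attempt > 0) {
        spanAttributes.put(HTTP_REQUEST_RESEND_COUNT, (long) attempt);
      }
    }
    return spanAttributes;
  }

  // Returns the attributes of each request span for a call that needs `attempts` tries.
  static List<Map<String, Long>> simulate(int attempts) {
    AtomicInteger tracker = new AtomicInteger(); // fresh for each logical operation
    List<Map<String, Long>> spans = new ArrayList<>();
    for (int i = 0; i < attempts; i++) {
      spans.add(startRequestSpan(tracker));
    }
    return spans;
  }
}
```

For a call that succeeds on its third try, simulate(3) yields one span without the attribute, followed by spans with resend counts 1 and 2. Because getAndIncrement is atomic, the count stays correct even if attempts for the same operation run on different threads.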
This PR also introduces integration tests to fully validate the resend attribute.
[Image: example trace with the resend attribute set]