…sages Include HTTP status code, error code and error_description from Microsoft token endpoint response in the OneDriveClientException message so that error details propagate to Datadog logs via runner-sync-api. Previously only a generic 'Authentication failed' message was visible, making it impossible to distinguish between expired refresh tokens, missing admin consent, conditional access policies, etc. Also improve error messages in get_site_id_from_url() and get_document_libraries() to include site_url and response details. Co-Authored-By: Vojta Tuma <vojta.tuma@keboola.com>
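A minimal sketch of how such a message might be assembled, assuming the token endpoint returns the standard OAuth 2.0 error fields `error` and `error_description` in a JSON body. `describe_token_error` is a hypothetical helper name, not the function used in the component.

```python
import json


def describe_token_error(status_code: int, body: str) -> str:
    """Build an error message from a token endpoint error response.

    Falls back to placeholders when the body is not a JSON object, so the
    HTTP status code is always reported.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    error = payload.get("error", "unknown_error")
    description = payload.get("error_description", "no description provided")
    return f"Authentication failed (HTTP {status_code}, {error}): {description}"
```

Keeping the status code, error code, and description in one string means whatever forwards the exception message to the logs carries all three without further changes.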
The _get_client method was catching OneDriveClientException silently and raising a new UserException with a hardcoded generic message, discarding the improved error details from _get_request_tokens(). Now preserves the last OneDriveClientException message and passes it through to the UserException so Microsoft error codes are visible. Co-Authored-By: Vojta Tuma <vojta.tuma@keboola.com>
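The pass-through described here could look roughly like the sketch below. It assumes `_get_client` tries several client constructors in turn, which the source does not state; `get_client`, the `factories` argument, and the two exception classes are stand-ins defined only for this example.

```python
class OneDriveClientException(Exception):
    """Stand-in for the client-level error raised during authentication."""


class UserException(Exception):
    """Stand-in for the user-facing error raised by the component."""


def get_client(factories):
    """Try each client factory in turn and return the first client built.

    If every factory fails, raise a UserException that carries the message
    of the last OneDriveClientException instead of a generic text.
    """
    last_error = None
    for factory in factories:
        try:
            return factory()
        except OneDriveClientException as exc:
            last_error = exc  # remember the details instead of discarding them
    raise UserException(f"Authentication failed: {last_error}") from last_error
```

Chaining with `from last_error` also keeps the original traceback available to anyone debugging locally.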
Single-tenant Enterprise Apps (like CSAS KeboolaOneWriter-EGPROD) reject the /common endpoint with AADSTS50194. Use self.authority which is already correctly set to the tenant-specific URL for SharePoint and OneDrive for Business clients. Co-Authored-By: Vojta Tuma <vojta.tuma@keboola.com>
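As a rough illustration of the difference between the two endpoints, the sketch below derives the authority from an optional tenant ID. `build_auth_url` and `token_endpoint` are hypothetical names, and the use of the v2.0 token path (`/oauth2/v2.0/token`) is an assumption about the component, not something the commit states.

```python
from typing import Optional

LOGIN_BASE = "https://login.microsoftonline.com"


def build_auth_url(tenant_id: Optional[str] = None) -> str:
    """Return the tenant-specific authority, or /common when no tenant is known."""
    return f"{LOGIN_BASE}/{tenant_id}" if tenant_id else f"{LOGIN_BASE}/common"


def token_endpoint(auth_url: str) -> str:
    """Append the token path (assumed v2.0) to an authority URL."""
    return f"{auth_url.rstrip('/')}/oauth2/v2.0/token"
```

With this shape, the refresh call only needs the authority that was chosen when the client was constructed, so a single-tenant app never touches `/common`.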
828d1e7 to a86713d
… self.base_url Co-Authored-By: Vojta Tuma <vojta.tuma@keboola.com>
soustruh
left a comment
LGTM, thanks Vojta! ദ്ദി(•ᴗ•)
a864ec4 to c6d701f
d60883b to 79503f6
79503f6 to c8fa5c0
3036016 to efc2aaf
matyas-jirat-keboola
left a comment
Please do the changes, but they are minor, so approved
tags:
- '*' # Skip the workflow on the main branch without tags
- "*" # Skip the workflow on the main branch without tags
workflow_dispatch:
Why workflow_dispatch? Can't we just put there the new CI?
workflow_dispatch was added per @soustruh's review request — he asked to fix the workflow triggers as an interim step before unifying to the new component-ci base. If you'd prefer to switch to the new CI in this PR instead, I can do that — just let me know which base workflow to use.
Summary
Fixes three distinct OAuth/auth bugs in the OneDrive client that surfaced together on a long-running job: the access token expired mid-run, every subsequent request 401'd, and the retry logic broke in three different ways depending on the call site.
1. Tenant-specific auth URL for token refresh
Token refresh in _get_request_tokens() used the hardcoded /common endpoint, which single-tenant Enterprise Apps reject with AADSTS50194. It now uses self.auth_url, set per client type:
- https://login.microsoftonline.com/common
- https://login.microsoftonline.com/{tenant_id}
(self.authority → self.auth_url rename for consistency with self.base_url.)
2. Infinite recursion on HTTP 401
When the access token expired mid-run, get_request() recursed: 401 → refresh token → retry via recursion → 401 → recurse, until RecursionError. The @backoff decorator on _download_file_from_onedrive_url then retried the whole thing 5 times. Replaced with an iterative 2-attempt loop that refreshes once and surfaces the Microsoft error on the second 401. Added OneDriveTransientException so @backoff only retries genuine transient errors (429/5xx), not auth failures.
3. Bearer header on pre-signed download URLs
@microsoft.graph.downloadUrl is a pre-signed URL with its own tempauth JWT in the query string. It does not need the Graph API Bearer header, and personal OneDrive's microsoftpersonalcontent.com CDN actively rejects it. The old code routed downloads through self.get_request(), which always attaches the Bearer token. Now uses self.get_raw(url, is_absolute_path=True, stream=True, ignore_auth=True), which keeps the HttpClient retry session and drops the Bearer header. Business/SharePoint accounts tolerated the extra header, which is why telemetry only showed personal accounts failing.
Other improvements
- … info to debug to cut noise.
- Found N items (X files, Y subfolders[, Z unknown]) in '/path': previously zero visibility into traversal scale.
- … urlparse(url).path) so tempauth JWTs can never leak even if a future caller routes a pre-signed URL through get_request().
Testing
… 1.0.9-fix-tenant-endpoint for the tenant-URL fix, re-tested successfully with 1777018079-improve-error-logging-26 covering all three fixes.
Review checklist
- … /common endpoint) still works
- … info level is readable: client init, folder scan, item counts, download mask
Link to Devin session: https://app.devin.ai/sessions/8b70f85ef2e0408c96eb91f970f5e217
Requested by: @yustme
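For reviewers, the iterative two-attempt loop from section 2 can be sketched as follows. This is an illustration under stated assumptions, not the component's code: `get_with_single_refresh`, the `send` and `refresh` callables, and the `(status, body)` tuple are hypothetical, and the exception classes are local stand-ins for the ones named in the description.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status_code, body), a stand-in for an HTTP response


class OneDriveClientException(Exception):
    """Stand-in for a non-retryable client error such as an auth failure."""


class OneDriveTransientException(Exception):
    """Stand-in for a retryable error (HTTP 429 or 5xx)."""


def get_with_single_refresh(send: Callable[[], Response],
                            refresh: Callable[[], None]) -> str:
    """Send a request; on HTTP 401 refresh the token once and retry once."""
    for attempt in range(2):
        status, body = send()
        if status == 401:
            if attempt == 0:
                refresh()  # one refresh only, then loop for the second attempt
                continue
            # Second 401: surface the server's error instead of recursing.
            raise OneDriveClientException(
                f"Authentication failed after token refresh: {body}")
        if status == 429 or status >= 500:
            raise OneDriveTransientException(f"Transient error {status}: {body}")
        if status >= 400:
            raise OneDriveClientException(f"Request failed with {status}: {body}")
        return body
    raise AssertionError("unreachable: the loop always returns or raises")
```

Because auth failures and transient failures raise different types, a retry decorator configured for the transient type alone would leave a second 401 untouched.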
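The effect of ignore_auth=True from section 3 can be illustrated with the standard library alone. The sketch below is not the HttpClient implementation; `build_download_request` is a hypothetical helper and the URLs in the usage note are placeholders.

```python
import urllib.request


def build_download_request(url: str, access_token: str,
                           ignore_auth: bool = False) -> urllib.request.Request:
    """Build a GET request, attaching the Bearer header unless ignore_auth is set.

    Pre-signed download URLs carry their own credential in the query string,
    so they are requested without the Authorization header.
    """
    headers = {} if ignore_auth else {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers)
```

A Graph API call would use `build_download_request("https://graph.example/items", token)`, while a pre-signed URL would pass `ignore_auth=True` so only the query-string credential is sent.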
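The URL masking mentioned under Other improvements relies on urlparse(url).path. A minimal sketch, with `loggable_url` as a hypothetical helper name and a placeholder URL in the usage note:

```python
from urllib.parse import urlparse


def loggable_url(url: str) -> str:
    """Return only the path component so query-string tokens are never logged."""
    return urlparse(url).path
```

For example, `loggable_url("https://cdn.example/download/file.bin?tempauth=SECRET")` yields `/download/file.bin`, dropping the host along with the query string.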