Fix truncating file handle #38425

Open
shunping wants to merge 4 commits into apache:master from shunping:fix-truncating-file-handle
@shunping (Collaborator) commented May 9, 2026

A few tests in the Python Precommit suite, including YamlTestingTest::test_create, IOTest::test_read_fwf, and InlinePythonTest::test_line_69, have been reporting ValueError and AttributeError exceptions during execution (https://github.com/apache/beam/actions/runs/25576871651/job/75086176283).

The traceback is shown below:

  Traceback (most recent call last):
    File "/Users/runner/work/beam/beam/sdks/python/apache_beam/dataframe/io.py", line 587, in flush
      self._underlying.flush()
      ~~~~~~~~~~~~~~~~~~~~~~^^
  ValueError: I/O operation on closed file.
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/Users/runner/work/beam/beam/sdks/python/target/.tox/py313-macos/lib/python3.13/site-packages/apache_beam/runners/worker/bundle_processor.py", line 235, in process_encoded
      self.output(decoded_value)
      ~~~~~~~~~~~^^^^^^^^^^^^^^^
  AttributeError: '_TruncatingFileHandle' object has no attribute 'close'. Did you mean: 'closed'?

The issue occurs when flush() is called on a _TruncatingFileHandle after the underlying file has already been closed, which raises a ValueError. That exception then triggers a secondary AttributeError because _TruncatingFileHandle lacked a close() method.

This PR fixes the handling of close and flush operations within _TruncatingFileHandle.
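
For readers who want to see the failure mode in isolation, here is a minimal sketch. `NaiveHandle` and `reproduce` are hypothetical names invented for illustration; this is not the actual `_TruncatingFileHandle` code from `apache_beam/dataframe/io.py`. It only assumes a wrapper that forwards `flush()` unconditionally and defines no `close()`.

```python
import io


class NaiveHandle:
    """Hypothetical stand-in for a wrapper without close(); not Beam's class."""

    def __init__(self, underlying):
        self._underlying = underlying

    def flush(self):
        # Forwards unconditionally, so a closed stream raises ValueError.
        self._underlying.flush()


def reproduce():
    """Return the names of the exceptions raised, in order."""
    stream = io.BytesIO()
    handle = NaiveHandle(stream)
    stream.close()  # the underlying file is closed before the wrapper is used

    seen = []
    try:
        handle.flush()
    except ValueError as exc:
        seen.append(type(exc).__name__)
    try:
        handle.close()  # the naive wrapper defines no close() method
    except AttributeError as exc:
        seen.append(type(exc).__name__)
    return seen
```

Calling `reproduce()` surfaces the same two exception types that appear in the traceback, one from the forwarded flush and one from the missing method.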

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses stability issues in the Apache Beam Python SDK's _TruncatingFileHandle class. By ensuring that file operations are only performed if the underlying stream is open and adding necessary error handling, the changes prevent intermittent crashes during I/O operations that were previously causing failures in the Precommit test suite.

Highlights

  • Improved File Handle State Management: Updated the closed property in _TruncatingFileHandle to correctly reflect the state of the underlying file object.
  • Robust Flush and Close Operations: Added defensive checks to flush() and implemented a close() method in _TruncatingFileHandle to prevent ValueError and AttributeError exceptions when interacting with already closed files.
  • Regression Testing: Added a new test case test_truncating_filehandle_flush_on_closed_stream to verify that flushing a closed stream no longer triggers exceptions.
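
The three highlights above can be pictured with a small sketch. `GuardedHandle` is a hypothetical class written for illustration under the assumption that the wrapper simply delegates to the wrapped stream; the real changes live in the PR diff and may differ.

```python
import io


class GuardedHandle:
    """Hypothetical wrapper illustrating the highlights; not Beam's code."""

    def __init__(self, underlying):
        self._underlying = underlying

    @property
    def closed(self):
        # Report the live state of the wrapped stream, not a fixed value.
        return self._underlying.closed

    def flush(self):
        # Flushing an already closed stream is treated as a no-op.
        if not self._underlying.closed:
            self._underlying.flush()

    def close(self):
        # Safe to call repeatedly; only closes a stream that is still open.
        if not self._underlying.closed:
            self._underlying.close()


stream = io.BytesIO()
handle = GuardedHandle(stream)
stream.close()   # close the underlying stream first
handle.flush()   # no ValueError
handle.close()   # no AttributeError
```

Checking `closed` before delegating keeps the wrapper quiet in the closed case without hiding errors raised by a stream that is still open.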

@github-actions bot added the python label May 9, 2026

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request updates the _TruncatingFileHandle class to dynamically report its closed status based on the underlying stream and adds safer implementations for flush and close to handle closed streams gracefully. A new test case was also added to verify flush behavior. The review feedback suggests narrowing the exception handling in the close method to avoid suppressing unrelated errors and extending the unit test to explicitly cover the newly added close functionality.
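
As a rough illustration of the two review suggestions, the sketch below catches only `ValueError` (the exception a closed file raises on flush) inside `close()`, and adds tests that exercise `close()` directly. `NarrowCloseHandle`, `BrokenStream`, and `CloseBehaviorTest` are hypothetical names; which exception types the PR actually catches is not shown in this conversation, so treat the choice of `ValueError` as an assumption.

```python
import io
import unittest


class NarrowCloseHandle:
    """Hypothetical wrapper; not the code under review."""

    def __init__(self, underlying):
        self._underlying = underlying

    def close(self):
        try:
            self._underlying.flush()
        except ValueError:
            # Assumed narrow case: flush on an already closed file.
            pass
        self._underlying.close()


class BrokenStream:
    """Hypothetical stream whose flush fails for an unrelated reason."""

    closed = False

    def flush(self):
        raise OSError("simulated write failure")

    def close(self):
        self.closed = True


class CloseBehaviorTest(unittest.TestCase):
    def test_close_on_closed_stream(self):
        stream = io.BytesIO()
        stream.close()
        NarrowCloseHandle(stream).close()  # must not raise

    def test_unrelated_errors_propagate(self):
        with self.assertRaises(OSError):
            NarrowCloseHandle(BrokenStream()).close()
```

With a broad `except Exception`, the second test would fail because the simulated `OSError` would be swallowed, which is the risk the review comment points at.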

Comment thread sdks/python/apache_beam/dataframe/io.py
Comment thread sdks/python/apache_beam/dataframe/io_test.py

@codecov bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.75%. Comparing base (81828fd) to head (7e4ee0a).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sdks/python/apache_beam/dataframe/io.py | 50.00% | 6 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #38425      +/-   ##
============================================
+ Coverage     55.74%   55.75%   +0.01%     
  Complexity     2095     2095              
============================================
  Files          1099     1099              
  Lines        172219   172287      +68     
  Branches       1350     1350              
============================================
+ Hits          95995    96065      +70     
+ Misses        73829    73827       -2     
  Partials       2395     2395              

| Flag | Coverage Δ |
|---|---|
| python | 79.83% <50.00%> (+0.01%) ⬆️ |


@github-actions (Contributor) commented May 9, 2026

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".
