Skip to content

Commit ae8ea4d

Browse files
jorisvandenbosschepitrou
authored andcommitted
GH-38618: [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories (#38845)
### Rationale for this change See #38618 (comment) and below for the analysis. When deleting the dir contents, we use a GetFileInfo with recursive FileSelector to list all objects to delete, but when doing that the file paths for directories don't end in a trailing `/`, so for deleting explicitly created directories we need to add the `kSep` here as well to properly delete the object. ### Are these changes tested? I tested them manually with an actual S3 bucket. The problem is that MinIO doesn't have the same problem, and so it's not actually tested with the test I added using our MinIO testing setup. ### Are there any user-facing changes? Fixes the regression * Closes: #38618 Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
1 parent 44238dd commit ae8ea4d

2 files changed

Lines changed: 42 additions & 1 deletion

File tree

cpp/src/arrow/filesystem/s3fs.cc

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2408,7 +2408,16 @@ class S3FileSystem::Impl : public std::enable_shared_from_this<S3FileSystem::Imp
24082408
std::vector<std::string> file_paths;
24092409
for (const auto& file_info : file_infos) {
24102410
DCHECK_GT(file_info.path().size(), bucket.size());
2411-
file_paths.push_back(file_info.path().substr(bucket.size() + 1));
2411+
auto file_path = file_info.path().substr(bucket.size() + 1);
2412+
if (file_info.IsDirectory()) {
2413+
// The selector returns FileInfo objects for directories with a
2414+
// a path that never ends in a trailing slash, but for AWS the file
2415+
// needs to have a trailing slash to recognize it as directory
2416+
// (https://github.com/apache/arrow/issues/38618)
2417+
DCHECK_OK(internal::AssertNoTrailingSlash(file_path));
2418+
file_path = file_path + kSep;
2419+
}
2420+
file_paths.push_back(std::move(file_path));
24122421
}
24132422
scheduler->AddSimpleTask(
24142423
[=, file_paths = std::move(file_paths)] {

python/pyarrow/tests/test_fs.py

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -760,6 +760,38 @@ def test_delete_dir(fs, pathfn):
760760
fs.delete_dir(d)
761761

762762

763+
def test_delete_dir_with_explicit_subdir(fs, pathfn):
764+
# GH-38618: regression with AWS failing to delete directories,
765+
# depending on whether they were created explicitly. Note that
766+
# Minio doesn't reproduce the issue, so this test is not a regression
767+
# test in itself.
768+
skip_fsspec_s3fs(fs)
769+
770+
d = pathfn('directory/')
771+
nd = pathfn('directory/nested/')
772+
773+
# deleting dir with explicit subdir
774+
fs.create_dir(d)
775+
fs.create_dir(nd)
776+
fs.delete_dir(d)
777+
dir_info = fs.get_file_info(d)
778+
assert dir_info.type == FileType.NotFound
779+
780+
# deleting dir with blob in explicit subdir
781+
d = pathfn('directory2')
782+
nd = pathfn('directory2/nested')
783+
f = pathfn('directory2/nested/target-file')
784+
785+
fs.create_dir(d)
786+
fs.create_dir(nd)
787+
with fs.open_output_stream(f) as s:
788+
s.write(b'data')
789+
790+
fs.delete_dir(d)
791+
dir_info = fs.get_file_info(d)
792+
assert dir_info.type == FileType.NotFound
793+
794+
763795
def test_delete_dir_contents(fs, pathfn):
764796
skip_fsspec_s3fs(fs)
765797

0 commit comments

Comments
 (0)