3 changes: 2 additions & 1 deletion Lib/encodings/__init__.py
@@ -61,7 +61,8 @@ def normalize_encoding(encoding):
         if c.isalnum() or c == '.':
             if punct and chars:
                 chars.append('_')
-            chars.append(c)
+            if c.isascii():
+                chars.append(c)
Member

I wanted to ask you to add a ".. versionchanged:: 3.10" entry in the documentation, but then I noticed that the encodings module was never documented! Oh!

Member Author

If end users use this function or module, I can try to write the documentation, but I will need some time to do it :)

Member

It can and must be addressed in a separate PR anyway. The lack of documentation should not hold up this change.

Member Author

ok, copy that.

             punct = False
         else:
             punct = True
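
For readers who want to try the effect of this hunk outside the CPython tree, here is a minimal standalone sketch. It is not the library's actual function: the name `normalize_sketch`, the `ascii_only` switch, and the lines surrounding the loop body (the initial `chars`/`punct` values, the `for` loop, and the final join) are assumptions filled in for illustration. Only the loop body mirrors the diff.

```python
def normalize_sketch(encoding, ascii_only=True):
    """Hypothetical stand-in for encodings.normalize_encoding().

    ascii_only=True mirrors the patched branch shown in the hunk;
    ascii_only=False mirrors the behaviour before the change.
    """
    chars = []
    punct = False
    for c in encoding:
        if c.isalnum() or c == '.':
            # Collapse a run of punctuation into a single underscore,
            # but never emit a leading underscore.
            if punct and chars:
                chars.append('_')
            # Patched behaviour: skip alphanumerics that are not ASCII.
            if c.isascii() or not ascii_only:
                chars.append(c)
            punct = False
        else:
            punct = True
    return ''.join(chars)


name = 'utf\xe9\u20ac\U0010ffff-8'
print(normalize_sketch(name))                    # patched behaviour
print(normalize_sketch(name, ascii_only=False))  # previous behaviour
```

With the check in place the non-ASCII letter is dropped and the name collapses to `utf_8`. Without it, the letter `\xe9` survives in the result.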
16 changes: 16 additions & 0 deletions Lib/test/test_codecs.py
@@ -3440,5 +3440,21 @@ def search_function(encoding):
         self.assertEqual(NOT_FOUND, codecs.lookup('a\xe9\u20ac-8'))


+class EncodingNormalizationTest(unittest.TestCase):
+
+    def test_normalization(self):
+        # encodings.normalize_encoding() ignores non-ASCII letters.
+        out = encodings.normalize_encoding('utf\xE9\u20AC\U0010ffff-8')
+        self.assertEqual(out, 'utf_8')
+        out = encodings.normalize_encoding('utf_8')
+        self.assertEqual(out, 'utf_8')
+        out = encodings.normalize_encoding('utf 8')
+        self.assertEqual(out, 'utf_8')
+        out = encodings.normalize_encoding('UTF 8')
Member

Would you mind adding a comment to explain that the function does not convert uppercase letters to lowercase, just to make the purpose of this test even more explicit?

Member Author

@shihai1991, Oct 12, 2020

To be honest, I don't know exactly how to explain it~
I found a case in https://github.com/python/cpython/blob/master/Lib/locale.py#L358.
It looks like it would be fine to update encodings.normalize_encoding() to convert to lowercase.

Member Author

Just describe the fact, lol~

Member Author

Maybe we can enhance this encodings.normalize_encoding()? I am not sure~
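
The case question in this thread can be checked directly. The following is a minimal sketch that uses only the standard-library `encodings.normalize_encoding()`. The helper `normalized_lower` is a hypothetical name and not part of the PR. It shows one way a caller could lowercase the result without changing the function itself.

```python
import encodings


def normalized_lower(name):
    # Hypothetical helper: normalize first, then lowercase in the caller.
    return encodings.normalize_encoding(name).lower()


# normalize_encoding() collapses the space but keeps the original case.
print(encodings.normalize_encoding('UTF 8'))
# Lowercasing is left to the caller in this sketch.
print(normalized_lower('UTF 8'))
```

Leaving the lowercasing to the caller keeps `normalize_encoding()` unchanged. Whether the function itself should lowercase is the open question in the comments above.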

+        self.assertEqual(out, 'UTF_8')
+        out = encodings.normalize_encoding('utf...8')
+        self.assertEqual(out, 'utf...8')
+
+
 if __name__ == "__main__":
     unittest.main()
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
:func:`encodings.normalize_encoding` now ignores non-ASCII letters.