Merged
Changes from 9 commits
5 changes: 3 additions & 2 deletions Doc/glossary.rst
@@ -710,8 +710,9 @@ Glossary

       On Windows, it is the ANSI code page (ex: ``cp1252``).

-      ``locale.getpreferredencoding(False)`` can be used to get the locale
-      encoding.
+      On Android and VxWorks, return ``"UTF-8"``.
+
+      ``locale.getencoding()`` can be used to get the locale encoding.

       Python uses the :term:`filesystem encoding and error handler` to convert
       between Unicode filenames and bytes filenames.
Member

IMO it would be helpful to mention the Python UTF-8 Mode here, since it ignores the locale encoding and always uses UTF-8. What do you think?

Member Author

I don't think so. This paragraph describes what the locale encoding is, not where it is used.
With this pull request, the locale encoding remains the locale encoding even in UTF-8 Mode.

On the other hand, the following paragraph looks unnecessary:

      Python uses the :term:`filesystem encoding and error handler` to convert
      between Unicode filenames and bytes filenames.

I will replace it with "See also :term:`filesystem encoding and error handler`."
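
The distinction discussed in this thread can be observed from Python itself. The following is a minimal sketch, not part of the pull request: it starts a child interpreter with `-X utf8` and prints both values. The names `CHILD_CODE` and `encodings_in_utf8_mode` are hypothetical, and `locale.getencoding()` is assumed to exist only on Python 3.11 or later, so the child prints a placeholder on older versions.

```python
import subprocess
import sys

# Code run in a child interpreter started with the Python UTF-8 Mode enabled.
# locale.getencoding() only exists on Python 3.11+, so the child prints a
# placeholder on older versions (assumption: sys.executable is a usable path).
CHILD_CODE = """
import locale
print(locale.getpreferredencoding(False))
print(locale.getencoding() if hasattr(locale, "getencoding") else "unavailable")
"""


def encodings_in_utf8_mode():
    """Return (preferred_encoding, locale_encoding) as seen under -X utf8."""
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    preferred, locale_encoding = result.stdout.splitlines()
    return preferred, locale_encoding


if __name__ == "__main__":
    preferred, locale_encoding = encodings_in_utf8_mode()
    print("getpreferredencoding(False):", preferred)
    print("getencoding():", locale_encoding)
```

Under UTF-8 Mode the first value should be UTF-8 (the exact letter case depends on the Python version), while the second reports whatever the platform locale provides.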

18 changes: 17 additions & 1 deletion Doc/library/locale.rst
@@ -334,10 +334,26 @@ The :mod:`locale` module defines the following exception and functions:
    locale. See also the :term:`filesystem encoding and error handler`.
Member

I suggest copying this paragraph into the getencoding() doc.


    .. versionchanged:: 3.7
-      The function now always returns ``UTF-8`` on Android or if the
+      The function now always returns ``"UTF-8"`` on Android or if the
Member

Suggested change
-      The function now always returns ``"UTF-8"`` on Android or if the
+      The function now always returns ``"utf-8"`` on Android or if the

       :ref:`Python UTF-8 Mode <utf8-mode>` is enabled.


+.. function:: getencoding()
+
+   Get the current :term:`locale encoding`:
+
+   * On Android and VxWorks, return ``"UTF-8"``.
+   * On Unix, return the encoding of the current :data:`LC_CTYPE` locale.
+     Return ``"UTF-8"`` if ``nl_langinfo(CODESET)`` returns an empty string:
+     for example, if the current LC_CTYPE locale is not supported.
+   * On Windows, return the ANSI code page.
+
+   This function is same to ``getpreferredencoding(False)`` except this
+   function ignore the :ref:`UTF-8 Mode <utf8-mode>`.
+
+   .. versionadded:: 3.11
+
+
 .. function:: normalize(localename)

    Returns a normalized locale code for the given locale name. The returned locale
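
Because the documented function is new in 3.11, code that must also run on older versions needs a fallback. The sketch below is one possible compatibility shim, not something this pull request provides; the helper name `get_locale_encoding` is hypothetical.

```python
import locale


def get_locale_encoding():
    """Return the locale encoding, preferring locale.getencoding()."""
    if hasattr(locale, "getencoding"):
        # Python 3.11+: not affected by the Python UTF-8 Mode.
        return locale.getencoding()
    # Older versions: the same value unless the Python UTF-8 Mode is enabled,
    # in which case this returns "UTF-8" instead of the locale encoding.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(get_locale_encoding())
```

On versions before 3.11 the fallback cannot see past UTF-8 Mode, so the shim is only an approximation there.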
6 changes: 6 additions & 0 deletions Doc/whatsnew/3.11.rst
@@ -274,6 +274,12 @@ inspect
 * Add :func:`inspect.ismethodwrapper` for checking if the type of an object is a
   :class:`~types.MethodWrapperType`. (Contributed by Hakan Çelik in :issue:`29418`.)

+locale
+------
+
+* Add :func:`locale.getencoding` that is same to
+  ``locale.getpreferredencoding(False)`` but ignores :ref:`UTF-8 Mode <utf8-mode>`.
Member

I like repeating "Python" as a reminder that the mode is unrelated to the Unix locale: it is something specific to Python that ignores the locale.

Suggested change
-  ``locale.getpreferredencoding(False)`` but ignores :ref:`UTF-8 Mode <utf8-mode>`.
+  ``locale.getpreferredencoding(False)`` but ignores the :ref:`Python UTF-8 Mode <utf8-mode>`.


math
----

20 changes: 10 additions & 10 deletions Lib/locale.py
@@ -28,7 +28,7 @@
            "setlocale", "resetlocale", "localeconv", "strcoll", "strxfrm",
            "str", "atof", "atoi", "format", "format_string", "currency",
            "normalize", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MONETARY",
-           "LC_NUMERIC", "LC_ALL", "CHAR_MAX"]
+           "LC_NUMERIC", "LC_ALL", "CHAR_MAX", "getencoding"]

 def _strcoll(a,b):
     """ strcoll(string,string) -> int.
@@ -637,27 +637,27 @@ def resetlocale(category=LC_ALL):


 try:
-    from _locale import _get_locale_encoding
+    from _locale import getencoding
 except ImportError:
-    def _get_locale_encoding():
+    def getencoding():
         if hasattr(sys, 'getandroidapilevel'):
             # On Android langinfo.h and CODESET are missing, and UTF-8 is
             # always used in mbstowcs() and wcstombs().
             return 'UTF-8'
-        if sys.flags.utf8_mode:
-            return 'UTF-8'
         encoding = getdefaultlocale()[1]
         if encoding is None:
-            # LANG not set, default conservatively to ASCII
-            encoding = 'ascii'
+            # LANG not set, default to UTF-8
+            encoding = 'UTF-8'
         return encoding

 try:
     CODESET
 except NameError:
     def getpreferredencoding(do_setlocale=True):
         """Return the charset that the user is likely using."""
-        return _get_locale_encoding()
+        if sys.flags.utf8_mode:
+            return 'UTF-8'
+        return getencoding()
 else:
     # On Unix, if CODESET is available, use that.
     def getpreferredencoding(do_setlocale=True):
@@ -667,15 +667,15 @@ def getpreferredencoding(do_setlocale=True):
             return 'UTF-8'

         if not do_setlocale:
-            return _get_locale_encoding()
+            return getencoding()

         old_loc = setlocale(LC_CTYPE)
         try:
             try:
                 setlocale(LC_CTYPE, "")
             except Error:
                 pass
-            return _get_locale_encoding()
+            return getencoding()
         finally:
             setlocale(LC_CTYPE, old_loc)
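
The layering in this file after the change can be summarized as two small pure functions. This is only a model for reading the diff, under the assumption that the UTF-8 Mode check now lives in `getpreferredencoding()` and no longer in the locale-encoding helper. The names `fallback_locale_encoding` and `preferred_encoding` and their parameters are hypothetical and do not exist in CPython.

```python
def fallback_locale_encoding(is_android, default_locale_encoding):
    """Model the pure-Python fallback getencoding() shown in the diff."""
    if is_android:
        # Android always uses UTF-8 in mbstowcs() and wcstombs().
        return "UTF-8"
    if default_locale_encoding is None:
        # LANG not set: the diff defaults to UTF-8 instead of ASCII.
        return "UTF-8"
    return default_locale_encoding


def preferred_encoding(utf8_mode, locale_encoding):
    """Model getpreferredencoding(): the UTF-8 Mode check sits on top."""
    if utf8_mode:
        return "UTF-8"
    return locale_encoding


if __name__ == "__main__":
    locale_encoding = fallback_locale_encoding(False, "cp1252")
    print(preferred_encoding(False, locale_encoding))
    print(preferred_encoding(True, locale_encoding))
```

The point of the split is that the first function never consults UTF-8 Mode, so callers that want the real locale encoding can get it even when the mode is enabled.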

@@ -0,0 +1 @@
Add :func:`locale.getencoding`.
Member

You can copy/paste the Doc/whatsnew/3.11.rst entry.

9 changes: 8 additions & 1 deletion Modules/_io/textio.c
@@ -1145,7 +1145,14 @@ _io_TextIOWrapper___init___impl(textio *self, PyObject *buffer,
         }
     }
     if (encoding == NULL && self->encoding == NULL) {
-        self->encoding = _Py_GetLocaleEncodingObject();
+        if (_PyRuntime.preconfig.utf8_mode) {
+            _Py_DECLARE_STR(utf_8, "utf-8");
+            self->encoding = &_Py_STR(utf_8);
+            Py_INCREF(self->encoding);
+        }
+        else {
+            self->encoding = _Py_GetLocaleEncodingObject();
+        }
         if (self->encoding == NULL) {
             goto error;
         }
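
The effect of this hunk on user code can be checked from Python. The sketch below is an illustration rather than a test from the pull request: it asks a child interpreter which encoding `io.TextIOWrapper` picks when none is given, with UTF-8 Mode switched on or off through `-X utf8` and `-X utf8=0`. The names `CHILD_CODE` and `default_text_encoding` are hypothetical.

```python
import subprocess
import sys

# Child code: create a TextIOWrapper without an encoding argument and
# report the default it chose.
CHILD_CODE = """
import io
wrapper = io.TextIOWrapper(io.BytesIO())
print(wrapper.encoding)
"""


def default_text_encoding(utf8_mode):
    """Return TextIOWrapper's default encoding with UTF-8 Mode on or off."""
    flag = "utf8" if utf8_mode else "utf8=0"
    result = subprocess.run(
        [sys.executable, "-X", flag, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("UTF-8 Mode on: ", default_text_encoding(True))
    print("UTF-8 Mode off:", default_text_encoding(False))
```

With the mode on, the result should be UTF-8 regardless of the locale; with it off, the result depends on the platform.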
8 changes: 4 additions & 4 deletions Modules/_localemodule.c
@@ -773,14 +773,14 @@ _locale_bind_textdomain_codeset_impl(PyObject *module, const char *domain,


 /*[clinic input]
-_locale._get_locale_encoding
+_locale.getencoding

 Get the current locale encoding.
 [clinic start generated code]*/

 static PyObject *
-_locale__get_locale_encoding_impl(PyObject *module)
-/*[clinic end generated code: output=e8e2f6f6f184591a input=513d9961d2f45c76]*/
+_locale_getencoding_impl(PyObject *module)
+/*[clinic end generated code: output=86b326b971872e46 input=6503d11e5958b360]*/
 {
     return _Py_GetLocaleEncodingObject();
 }
@@ -811,7 +811,7 @@ static struct PyMethodDef PyLocale_Methods[] = {
     _LOCALE_BIND_TEXTDOMAIN_CODESET_METHODDEF
 #endif
 #endif
-    _LOCALE__GET_LOCALE_ENCODING_METHODDEF
+    _LOCALE_GETENCODING_METHODDEF
   {NULL, NULL}
 };

16 changes: 8 additions & 8 deletions Modules/clinic/_localemodule.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 6 additions & 4 deletions Python/fileutils.c
@@ -93,6 +93,12 @@ _Py_device_encoding(int fd)

     return PyUnicode_FromFormat("cp%u", (unsigned int)cp);
 #else
+    if (_PyRuntime.preconfig.utf8_mode) {
+        _Py_DECLARE_STR(utf_8, "utf-8");
+        PyObject *encoding = &_Py_STR(utf_8);
+        Py_INCREF(encoding);
+        return encoding;
+    }
     return _Py_GetLocaleEncodingObject();
 #endif
 }
@@ -890,10 +896,6 @@ _Py_GetLocaleEncoding(void)
     // and UTF-8 is always used in mbstowcs() and wcstombs().
     return _PyMem_RawWcsdup(L"UTF-8");
 #else
-    const PyPreConfig *preconfig = &_PyRuntime.preconfig;
-    if (preconfig->utf8_mode) {
-        return _PyMem_RawWcsdup(L"UTF-8");
-    }

 #ifdef MS_WINDOWS
     wchar_t encoding[23];
8 changes: 7 additions & 1 deletion Python/initconfig.c
@@ -1779,7 +1779,13 @@ static PyStatus
 config_get_locale_encoding(PyConfig *config, const PyPreConfig *preconfig,
                            wchar_t **locale_encoding)
 {
-    wchar_t *encoding = _Py_GetLocaleEncoding();
+    wchar_t *encoding;
+    if (preconfig->utf8_mode) {
+        encoding = _PyMem_RawWcsdup(L"UTF-8");
+    }
+    else {
+        encoding = _Py_GetLocaleEncoding();
+    }
     if (encoding == NULL) {
         return _PyStatus_NO_MEMORY();
     }