UTF-8 in the Windows API

The May 2019 update of Windows 10 introduced the possibility of setting the ActiveCodePage property of an executable to UTF-8. This is done via the application manifest. The documentation is super-vague on the technical details and history, and in usual Microsoft fashion the functionality is obscured and the little desirable kill-a-gnat can only be done by costly nuclear bombing, so to speak — why let something simple be simple if it can be wrapped in military standard complexity?

But it means that with Visual C++ 2019 one can now use UTF-8 encoding for GUI applications, and for the output of console programs, without any encoding conversions in the code.

In particular, with UTF-8 active process codepage the arguments of main now come handily UTF-8 encoded, which means that they can now represent general filenames also in Windows. Hurray! Yippi!

However, interactive console input of UTF-8 is still limited to ASCII at the API-level. And the MinGW g++ 9.2 compiler’s default standard library implementation doesn’t support UTF-8 in the C and C++ locale machinery, e.g in setlocale, probably because it employs an old version of Microsoft’s runtime library. That means that FILE* or iostreams UTF-8 console output with MinGW g++ 9.2 only works for the default “C” locale.

I experimented by setting the ANSI codepage default in the registry to 65001, the UTF-8 codepage number. After rebooting the console windows came up with active codepage 65001, even though the OEM codepage default was the same old one (850 in my case). That indicates an effort on Microsoft’s part to support UTF-8 all the way in Windows, which if so is fantastically good.


An example application manifest.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
    <assemblyIdentity type="win32" name="UTF-8 app example" version="6.0.0.0"/>
    <application>
        <windowsSettings>
            <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings"
                >UTF-8</activeCodePage>
        </windowsSettings>
    </application>
    <dependency>
        <dependentAssembly>
            <assemblyIdentity
                type="win32"
                name="Microsoft.Windows.Common-Controls"
                version="6.0.0.0"
                processorArchitecture="*"
                publicKeyToken="6595b64144ccf1df"
                language="*"
                />
        </dependentAssembly>
    </dependency>
</assembly>

The second assemblyIdentity part has nothing to do with the UTF-8 support, it just corrects a practically unusable default for the look and feel of buttons etc. Essentially this manifest corrects the two “wrong” defaults: the narrow text encoding, and the look ’n feel. From within an application with this manifest it looks as both CP_ACP (the global default) and CP_THREAD_ACP (the mysterious thread codepage) are UTF-8, codepage 65001.

In my experimentation UTF-8 had to be specified in uppercase, and it did not work with whitespace such as space or newline at either side.


An example resource script:

#include <windows.h>
1 RT_MANIFEST "resources/application-manifest.xml"

One comment on “UTF-8 in the Windows API

  1. Pingback: A <windows.h> wrapper for UTF-8 Windows apps. | Alf on programming (mostly C++)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s