Non-crashing Python 3.x output in Windows

Problem

The following little Python 3.x program crashes with an unhandled UnicodeEncodeError under CPython 3.3.4 in a default-configured English Windows:

crash.py3
#encoding=utf-8
print( "Blåbærsyltetøy!")

H:\personal\web\blog alf on programming at wordpress\001\test>chcp 437
Active code page: 437

H:\personal\web\blog alf on programming at wordpress\001\test>type crash.py3 | display_utf8
#encoding=utf-8
print( "Blåbærsyltetøy!")

H:\personal\web\blog alf on programming at wordpress\001\test>crash.py3
Traceback (most recent call last):
  File "H:\personal\web\blog alf on programming at wordpress\001\test\crash.py3", line 2, in <module>
    print( "Blåbærsyltet\xf8y!")
  File "C:\Program Files\CPython 3_3_4\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf8' in position 12: character maps to <undefined>

H:\personal\web\blog alf on programming at wordpress\001\test>_

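Here codepage 437 is the original IBM PC character set, which is the default interpretation of narrow (byte-oriented) text in an English Windows console window. It has no “ø” (U+00F8, the '\xf8' in the error message), so encoding the string for console output fails.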
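The failure happens in the encoding step, not in the console itself, so it can be reproduced on any platform. A minimal sketch (the helper name can_encode is made up for this illustration):

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Codepage 437 has 'å' and 'æ', but not 'ø'.
print(can_encode("Blåbær", "cp437"))
print(can_encode("Blåbærsyltetøy!", "cp437"))
```

The print function performs this same encoding step on the way to sys.stdout, which is where the traceback above comes from.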

A partial solution is to change the default console codepage to Windows ANSI (codepage 1252 in a Western European installation). At least for CPython, this then matches the encoding used for output to a pipe or file, and the consistency is nice. But Windows ANSI also has a severely limited character set, so any unsupported character can still crash the program.
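Both halves of that can be illustrated with a small sketch. It assumes that Windows ANSI means codepage 1252, which holds for Western European installations but not for every locale; the helper name encode_without_crashing is made up for this example, and the errors="replace" handler is just one possible way to avoid the exception.

```python
ANSI = "cp1252"     # Assumption: the Western European Windows ANSI codepage.

def encode_without_crashing(text, codec=ANSI):
    """Encode text, substituting '?' for characters the codec lacks."""
    return text.encode(codec, errors="replace")

norwegian = "Blåbærsyltetøy!"
greek = "π ≈ 3.14"

ansi_bytes = norwegian.encode(ANSI)     # Fine: cp1252 has all of these characters.

try:
    greek.encode(ANSI)                  # cp1252 has neither 'π' nor '≈'.
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(len(ansi_bytes), strict_ok, encode_without_crashing(greek))
```

The replacement handler trades the crash for lost information, which is why it is only a stopgap compared to real Unicode output.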

Direct console output

Unicode text limited to the Basic Multilingual Plane (essentially the original 16-bit Unicode) can be output to a Windows console via the WriteConsoleW Windows API function.
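The reason the Basic Multilingual Plane matters is that WriteConsoleW takes its length argument in 16-bit UTF-16 units, while len() of a Python 3.3+ string counts code points. The two counts agree only for BMP text. A small sketch to check a string before handing it over (both helper names are made up for this illustration):

```python
def utf16_code_units(text):
    """Number of 16-bit units that text occupies when encoded as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def is_bmp_only(text):
    """True if no character lies outside the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)

bmp_text = "Blåbærsyltetøy!"
astral_text = "\U0001F600"      # A single code point outside the BMP.

print(utf16_code_units(bmp_text), len(bmp_text), is_bmp_only(bmp_text))
print(utf16_code_units(astral_text), len(astral_text), is_bmp_only(astral_text))
```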

The standard Python ctypes module provides access to the API:

Direct_console_io.py
import ctypes
from ctypes import wintypes
class Object: pass

winapi = Object()
winapi.STD_INPUT_HANDLE     = -10
winapi.STD_OUTPUT_HANDLE    = -11
winapi.STD_ERROR_HANDLE     = -12
winapi.GetStdHandle         = ctypes.windll.kernel32.GetStdHandle
winapi.WriteConsoleW        = ctypes.windll.kernel32.WriteConsoleW

# Declare the signatures so that handles are not truncated in 64-bit Python.
winapi.GetStdHandle.restype     = wintypes.HANDLE
winapi.GetStdHandle.argtypes    = [wintypes.DWORD]
winapi.WriteConsoleW.restype    = wintypes.BOOL
winapi.WriteConsoleW.argtypes   = [
    wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
    ctypes.POINTER( wintypes.DWORD ), wintypes.LPVOID
    ]

class Direct_console_io:
    def write( self, s ) -> int:
        # The length is in UTF-16 units; len( s ) matches only for BMP text.
        n_written = ctypes.c_ulong()
        winapi.WriteConsoleW(
            self.std_output_handle, s, len( s ), ctypes.byref( n_written ), None
            )
        return n_written.value      # Stays 0 if the call failed.

    # No __del__: the standard handles belong to the process, and closing them
    # here would break every other user of the same handles.

    def __init__( self ):
        self.std_input_handle   = winapi.GetStdHandle( winapi.STD_INPUT_HANDLE )
        self.std_output_handle  = winapi.GetStdHandle( winapi.STD_OUTPUT_HANDLE )
        self.std_error_handle   = winapi.GetStdHandle( winapi.STD_ERROR_HANDLE )

Implementing input is left as an exercise for the reader.
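For readers who want a starting point for that exercise, here is a rough sketch based on the ReadConsoleW API function. It is not part of the original code and has not been tested as thoroughly: the function names, the 1024-character default and the line-ending handling are all assumptions of this example, and it does nothing special for end-of-file (Ctrl+Z).

```python
import ctypes
import sys

STD_INPUT_HANDLE = -10

def normalize_console_line(raw):
    """Convert CR LF line endings, as delivered by the console, to plain LF."""
    return raw.replace("\r\n", "\n")

def read_console_line(max_chars=1024):
    """Read one line of Unicode text directly from the console (Windows only)."""
    if sys.platform != "win32":
        raise OSError("ReadConsoleW is only available in Windows")

    # A private WinDLL instance, so that the settings below stay local.
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)

    buffer = ctypes.create_unicode_buffer(max_chars)
    n_read = ctypes.c_ulong()
    ok = kernel32.ReadConsoleW(
        ctypes.c_void_p(handle), buffer, max_chars, ctypes.byref(n_read), None
        )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())

    # Slice by the reported count instead of trusting a terminating NUL.
    return normalize_console_line(buffer[:n_read.value])
```

In the default line input mode, ReadConsoleW returns when the user presses Enter, so one call yields one line, including its line ending, unless the line is longer than the buffer.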

Overriding the standard streams to use direct i/o and UTF-8

In addition to the silly crashing behavior, the standard streams in CPython 3.x, like sys.stdout, default to Windows ANSI for output to a file or pipe. In Python 2.7 this could be reset to the more useful UTF-8 by reloading the sys module, which brought back the dynamically removed sys.setdefaultencoding method. That is no longer possible in Python 3.x, so this code just creates new stream objects:

Utf8_standard_streams.py
import io
import sys
from Direct_console_io import Direct_console_io

class Dcio_raw_iobase( io.RawIOBase ):
    def writable( self ) -> bool:
        return True

    def write( self, seq_of_bytes ) -> int:
        b = bytes( seq_of_bytes )
        self.dcio.write( b.decode( 'utf-8' ) )
        return len( b )         # RawIOBase.write reports the number of bytes consumed.

    def __init__( self, use_error_handle = False ):
        super().__init__()
        self.dcio = Direct_console_io()
        if use_error_handle:
            # Direct_console_io.write always targets std_output_handle, so swap it.
            self.dcio.std_output_handle = self.dcio.std_error_handle

class Dcio_buffered_writer( io.BufferedWriter ):
    def write( self, seq_of_bytes ) -> int:
        # Bypass the buffer so that the text goes straight to the console.
        return self.raw_stream.write( seq_of_bytes )

    def flush( self ):
        pass

    def __init__( self, raw_iobase ):
        super().__init__( raw_iobase )
        self.raw_stream = raw_iobase

# Module initialization:
def __init__():
    using_console_output = sys.stdout.isatty()
    using_console_error = sys.stderr.isatty()

    if using_console_output:
        raw_io = Dcio_raw_iobase()
        buf_io = Dcio_buffered_writer( raw_io )
        # Line buffering makes each printed line appear at once.
        sys.stdout = io.TextIOWrapper( buf_io, encoding = 'utf-8', line_buffering = True )
        sys.stdout.isatty = lambda: True
    else:
        # 'utf-8-sig' starts the output with a byte order mark.
        sys.stdout = io.TextIOWrapper( sys.stdout.detach(), encoding = 'utf-8-sig' )

    if using_console_error:
        raw_io = Dcio_raw_iobase( use_error_handle = True )
        buf_io = Dcio_buffered_writer( raw_io )
        sys.stderr = io.TextIOWrapper( buf_io, encoding = 'utf-8', line_buffering = True )
        sys.stderr.isatty = lambda: True
    else:
        sys.stderr = io.TextIOWrapper( sys.stderr.detach(), encoding = 'utf-8-sig' )
    return

__init__()
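What the two TextIOWrapper encodings used above do to the byte stream can be checked on any platform by wrapping an in-memory io.BytesIO instead of a real standard stream. This is only an illustrative sketch; the helper name bytes_written is made up here.

```python
import io

def bytes_written(text, encoding):
    """Return the bytes that a TextIOWrapper with this encoding produces."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="")
    wrapper.write(text)
    wrapper.flush()             # Push the pending text down to the byte stream.
    data = raw.getvalue()
    wrapper.detach()            # Leave raw open; the wrapper is unusable after this.
    return data

plain = bytes_written("ø", "utf-8")
with_bom = bytes_written("ø", "utf-8-sig")
print(plain)
print(with_bom)
```

The three extra bytes at the start of the second result are the byte order mark that 'utf-8-sig' adds in the file and pipe case. The console case uses plain 'utf-8', since those bytes are decoded back to text before they reach WriteConsoleW.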

Disclaimer: It’s been a long time since I fiddled with Python, so possibly I’m breaking a number of conventions and doing things in some less than optimal way. But this was the first path I found through the jungle of apparently arbitrary io class derivations etc. It worked well enough for my purposes (in a little script to convert NRK’s HTML-format subtitles to SubRip format), so I gather it can also be useful for you – at least as a basis for more robust and/or more general code.
