Non-crashing Python 3.x output in Windows
Problem
The following little Python 3.x program just crashes with CPython 3.3.4 in a default-configured English Windows:
#encoding=utf-8
print( "Blåbærsyltetøy!")
H:\personal\web\blog alf on programming at wordpress\001\test>chcp 437
Active code page: 437
H:\personal\web\blog alf on programming at wordpress\001\test>type crash.py3 | display_utf8
#encoding=utf-8
print( "Blåbærsyltetøy!")
H:\personal\web\blog alf on programming at wordpress\001\test>crash.py3
Traceback (most recent call last):
  File "H:\personal\web\blog alf on programming at wordpress\001\test\crash.py3", line 2, in <module>
    print( "Blåbærsyltet\xf8y!")
  File "C:\Program Files\CPython 3_3_4\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf8' in position 12: character maps to <undefined>
H:\personal\web\blog alf on programming at wordpress\001\test>_
Here codepage 437 is the original IBM PC character set, which is the default narrow text interpretation in an English Windows console window.
A partial solution is to change the default console codepage to Windows ANSI, which at least for CPython then matches the encoding used for output to a pipe or file, and consistency is nice. But Windows ANSI also has a severely limited character set, so any unsupported character can still crash the program.
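The crash can be reproduced without any console, since it is just the codec refusing a character it cannot map. The following is a minimal sketch of that, assuming that Windows ANSI on an English system means codepage 1252; `try_encode` is a helper name invented here for illustration.

```python
text = "Blåbærsyltetøy!"

def try_encode(s, codec):
    """Return the encoded bytes, or None if the codec lacks a character."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError:
        return None

# Codepage 437 has 'å' and 'æ' but no 'ø', so encoding fails.
as_cp437 = try_encode(text, "cp437")

# Codepage 1252 (assumed here to be the Windows ANSI codepage) has all three.
as_ansi = try_encode(text, "cp1252")

# But codepage 1252 has no Greek letters, so the same failure is still possible.
greek_ansi = try_encode("Ωmega", "cp1252")

# An error handler such as "replace" trades the exception for a '?' placeholder.
lossy = text.encode("cp437", errors="replace")
```

So switching codepage only moves the problem to a different set of characters, and an error handler avoids the crash at the cost of losing text.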
Direct console output
Unicode text limited to the Basic Multilingual Plane (essentially original 16-bit Unicode) can be output to a Windows console via the WriteConsoleW Windows API function.
The standard Python ctypes module provides access to the API:
import ctypes

# Simple namespace object for the few Windows API names used here.
class Object: pass

winapi = Object()
winapi.STD_INPUT_HANDLE     = -10
winapi.STD_OUTPUT_HANDLE    = -11
winapi.STD_ERROR_HANDLE     = -12
winapi.GetStdHandle         = ctypes.windll.kernel32.GetStdHandle
winapi.CloseHandle          = ctypes.windll.kernel32.CloseHandle
winapi.WriteConsoleW        = ctypes.windll.kernel32.WriteConsoleW

class Direct_console_io:
    def write( self, s ) -> int:
        # Returns the number of characters that the console reports as written.
        n_written = ctypes.c_ulong()
        ret = winapi.WriteConsoleW(
            self.std_output_handle, s, len( s ), ctypes.byref( n_written ), 0
            )
        return n_written.value

    def __del__( self ):
        if not winapi: return       # Looks like a bug in CPython 3.x
        winapi.CloseHandle( self.std_error_handle )
        winapi.CloseHandle( self.std_output_handle )
        winapi.CloseHandle( self.std_input_handle )

    def __init__( self ):
        # Keeps a reference to winapi so that it is still around in __del__.
        self.dependency = winapi
        self.std_input_handle   = winapi.GetStdHandle( winapi.STD_INPUT_HANDLE )
        self.std_output_handle  = winapi.GetStdHandle( winapi.STD_OUTPUT_HANDLE )
        self.std_error_handle   = winapi.GetStdHandle( winapi.STD_ERROR_HANDLE )
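One plausible reason for the Basic Multilingual Plane limit in this code: WriteConsoleW takes its length argument in 16-bit UTF-16 units, while `len( s )` in Python 3.3 and later counts code points. The two numbers agree only for text within the BMP. A small sketch of the difference, where `utf16_units` is a helper name invented here:

```python
def utf16_units(s):
    """Number of 16-bit UTF-16 code units needed to represent s."""
    return len(s.encode("utf-16-le")) // 2

bmp_text = "Blåbærsyltetøy!"
astral_text = "\U0001F600"      # A single code point outside the BMP.

# (code points, UTF-16 units) for each string.
bmp_counts = (len(bmp_text), utf16_units(bmp_text))
astral_counts = (len(astral_text), utf16_units(astral_text))
```

Passing the UTF-16 unit count instead of `len( s )` might lift the restriction, but that is an untested assumption, and whether the console can display such characters is a separate question.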
Implementing input is left as an exercise for the reader.
Overriding the standard streams to use direct i/o and UTF-8
In addition to the silly crashing behavior, the standard streams in CPython 3.x, like sys.stdout, default to Windows ANSI for output to a file or pipe. In Python 2.7 this could be reset to the more useful UTF-8 by reloading the sys module in order to get back a dynamically removed method that could set the default encoding. That is no longer possible in Python 3.x, so this code just creates new stream objects:
import io
import sys
from Direct_console_io import Direct_console_io

class Dcio_raw_iobase( io.RawIOBase ):
    def writable( self ) -> bool:
        return True

    def write( self, seq_of_bytes ) -> int:
        b = bytes( seq_of_bytes )
        return self.dcio.write( b.decode( 'utf-8' ) )

    def __init__( self ):
        self.dcio = Direct_console_io()

class Dcio_buffered_writer( io.BufferedWriter ):
    def write( self, seq_of_bytes ) -> int:
        return self.raw_stream.write( seq_of_bytes )

    def flush( self ):
        pass

    def __init__( self, raw_iobase ):
        super().__init__( raw_iobase )
        self.raw_stream = raw_iobase

# Module initialization:
def __init__():
    using_console_input = sys.stdin.isatty()
    using_console_output = sys.stdout.isatty()
    using_console_error = sys.stderr.isatty()

    if using_console_output:
        raw_io = Dcio_raw_iobase()
        buf_io = Dcio_buffered_writer( raw_io )
        sys.stdout = io.TextIOWrapper( buf_io, encoding = 'utf-8' )
        sys.stdout.isatty = lambda: True
    else:
        sys.stdout = io.TextIOWrapper( sys.stdout.detach(), encoding = 'utf-8-sig' )

    if using_console_error:
        raw_io = Dcio_raw_iobase()
        buf_io = Dcio_buffered_writer( raw_io )
        sys.stderr = io.TextIOWrapper( buf_io, encoding = 'utf-8' )
        sys.stderr.isatty = lambda: True
    else:
        sys.stderr = io.TextIOWrapper( sys.stderr.detach(), encoding = 'utf-8-sig' )
    return

__init__()
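The same layering can be tried on any platform by swapping the console out for something that just records what it receives. The sketch below is not the module above: `Capturing_raw` is a hypothetical stand-in for the console-backed raw stream, and it uses a plain io.BufferedWriter instead of the pass-through subclass. It also shows what the redirected branch does with 'utf-8-sig'.

```python
import io

class Capturing_raw(io.RawIOBase):
    """Hypothetical stand-in for the console stream: records decoded text."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, data):
        b = bytes(data)
        self.chunks.append(b.decode("utf-8"))
        return len(b)       # A raw stream reports the number of bytes consumed.

# Console-like branch: text -> UTF-8 bytes -> raw stream -> decoded text.
raw = Capturing_raw()
stream = io.TextIOWrapper(io.BufferedWriter(raw), encoding="utf-8", newline="\n")
print("Blåbærsyltetøy!", file=stream)
stream.flush()
captured = "".join(raw.chunks)

# Redirected branch: 'utf-8-sig' puts a byte order mark first in the output.
sink = io.BytesIO()
redirected = io.TextIOWrapper(sink, encoding="utf-8-sig", newline="\n")
redirected.write("ø")
redirected.flush()
file_bytes = sink.getvalue()
```

Here `newline="\n"` is only there to make the result the same on every platform. The byte order mark in `file_bytes` is what lets many Windows tools recognize a redirected file as UTF-8.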
Disclaimer: It’s been a long time since I fiddled with Python, so possibly I’m breaking a number of conventions plus doing things in some less than optimal way. But this was the first path I found through the jungle of apparently arbitrary io class derivations etc. It worked well enough for my purposes (in a little script to convert NRK’s HTML-format subtitles to SubRip format), so, I gather it can be useful also for you – at least as a basis for more robust and/or more general code.
3.4.2 64bit Win7 UK English does not crash, but it does mess up the terminal.