The ZStr ownership-transferring string class that I introduced in my first Xerces posting may have seemed like total overkill for the job. In C++98 it’s difficult to do correctly, and it requires some less-than-commonly available support machinery. And so if Xerces string handling was the only reason to do it, I wouldn’t have done it.
But once a string class like ZStr is available you (or at least I) find that it’s a natural first choice for many tasks. The beauty of it is that it allows you to defer the decision of trading efficiency for functionality, because the ownership can be transferred to an immutable sharing string type at any point. Or the string can be copied to a copying mutable string type like std::string, whatever.
With a type like ZStr, if you’re implementing library-like functionality, the decision of which “rich” string type to use does not have to be imposed on the client code. Instead of trading away the efficiency up front, you’re giving the client code the choice, including the choice of just using ZStr.
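The deferral can be sketched with standard library components standing in for the cppx types. This is only a minimal sketch under stated assumptions: OwnedStr (a std::unique_ptr to a const character array) plays the role of ZStr, and makeGreeting, toShared and toStdString are hypothetical names that are not part of cppx.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Assumption: a std::unique_ptr stands in for ZStr< char > as the sole owner
// of a zero-terminated, immutable string.
typedef std::unique_ptr< char const[] > OwnedStr;

// Hypothetical library-like function: it hands out ownership without
// committing the caller to any particular "rich" string type.
inline OwnedStr makeGreeting()
{
    char const text[] = "Hello";
    char* const buffer = new char[sizeof( text )];
    std::memcpy( buffer, text, sizeof( text ) );    // Includes the terminating zero.
    char const* const frozen = buffer;              // From here on the string is const.
    return OwnedStr( frozen );
}

// Client choice 1: transfer ownership to an immutable shared string, no copying.
inline std::shared_ptr< char const > toShared( OwnedStr s )
{
    assert( s.get() != 0 );
    return std::shared_ptr< char const >(
        s.release(), std::default_delete< char const[] >()
        );
}

// Client choice 2: copy into a mutable std::string. The original array is
// deallocated when the by-value parameter goes out of scope.
inline std::string toStdString( OwnedStr s )
{
    assert( s.get() != 0 );
    return std::string( s.get() );
}
```

The library function pays only for the allocation it needs anyway. Whether to share or to copy is decided later, by whoever receives the OwnedStr.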
Why not just use an Ownership directly?
The main difference between a ZStr<T> and a plain Ownership<T const> is that the ZStr strongly implies that the contained pointer points to an array of T with a zero at the end, a zero-terminated string.
Ideally this should have been guaranteed by the language’s type system, but C++’s type system isn’t powerful enough for that. So, it’s a convention, like much else in C++. But it’s a communicated convention, one that is explicitly there in the code, and hence it’s a restriction that you can rely on: with a ZStr you know.
And ZStr does offer some support routines that assume that it’s a zero-terminated string. But apart from basics such as obtaining the string length there are none of the usual string operations. The idea is that you have the choice of any more “rich” string type you want for doing more advanced things than just passing strings around.
The current implementation
The current implementation looks like this:
File [progrock/cppx/text/ZStr.h], complete code:
```cpp
// Copyright (c) Alf P. Steinbach, 2010.
// #include <progrock/cppx/text/ZStr.h>
//
// A ZStr represents ownership of a zero-terminated string.

#ifndef PROGROCK_CPPX_TEXT_ZSTR_H
#define PROGROCK_CPPX_TEXT_ZSTR_H
#include <progrock/cppx/devsupport/better_experience.h>


//-------------------------------- Dependencies:

#include <progrock/cppx/pointers/Ownership.h>
#include <progrock/cppx/collections/iterator_range_util.h>
#include <progrock/cppx/text/codepoint_types.h>


//-------------------------------- Interface:

namespace progrock{ namespace cppx{

    template< class CodePointType >
    class ZStr      // Zero-terminated string.
        : public cppx::OwnershipTransferring< ZStr< CodePointType > >
    {
    private:
        cppx::Ownership< CodePointType const >  myArray;

        ZStr( ZStr& );                          // No such.
        ZStr& operator=( ZStr const& );         // No such.

    protected:
        typedef cppx::OwnershipTransferring< ZStr< CodePointType > >    Base;
        typedef typename Base::Ref                                      Ref;

    public:
        typedef typename Deleter< CodePointType >::Func     DeleterFunc;

        ZStr( CodePointType const* s, DeleterFunc deleter )
            : myArray( s, deleter )
        {
            assert( deleter != 0 );
        }

        ZStr( CodePointType const* s, DeleteAsArray daa )
            : myArray( s, daa )
        {}

        ZStr( cppx::Ownership< CodePointType const > o )
            : myArray( o.transfer() )
        {}

        ZStr( CodePointType const* s, NoDelete nd )
            : myArray( s, nd )
        {}

        ZStr( AsCopyOf, CodePointType const* s )
            : myArray( dupStr( s ), deleteAsArray )
        {}

        ZStr( Ref other ): myArray( 0 ) { swapWith( *other.p ); }

        void swapWith( ZStr& other ) { myArray.swapWith( other.myArray ); }

        CodePointType const* ptr() const { return myArray.ptr(); }
        operator CodePointType const* () const { return ptr(); }

        DeleterFunc deleter() const { return myArray.deleter(); }

        CodePointType const* release() { return myArray.release(); }

        Size length() const { return strLen( myArray.ptr() ); }
    };

    typedef ZStr< CodePoint8 >      ZStr8;
    typedef ZStr< CodePoint16 >     ZStr16;
    typedef ZStr< CodePoint32 >     ZStr32;

    typedef ZStr< char >            ZStrChar;
    typedef ZStr< wchar_t >         ZStrWChar;

    template< class ResultCPType, class SourceCPType >
    inline ZStr< ResultCPType > reinterpretAs( ZStr< SourceCPType > s )
    {
        // Doesn't check that types are built-in, but probably good enough.
        CPPX_STATIC_ASSERT( sizeof( ResultCPType ) == sizeof( SourceCPType ) );
        CPPX_STATIC_ASSERT( std::numeric_limits< SourceCPType >::is_integer );
        CPPX_STATIC_ASSERT( std::numeric_limits< ResultCPType >::is_integer );

        typedef ZStr< ResultCPType >    ZStrR;
        typedef ZStr< SourceCPType >    ZStrS;

        typename ZStrS::DeleterFunc const   d   = s.deleter();

        return ZStrR(
            reinterpret_cast< ResultCPType const* >( s.release() ),
            reinterpret_cast< typename ZStrR::DeleterFunc >( d )
            );
    }

    template< typename CodePointType >
    struct It< ZStr< CodePointType > >
    {
        typedef CodePointType const*    T;
    };

    template< typename CodePointType >
    struct It< ZStr< CodePointType > const >
    {
        typedef CodePointType const*    T;
    };

    template< typename CodePointType >
    inline CodePointType const* startOf( ZStr< CodePointType >& s )
    {
        return s.ptr();
    }

    template< typename CodePointType >
    inline CodePointType const* startOf( ZStr< CodePointType > const& s )
    {
        return s.ptr();
    }

    template< typename CodePointType >
    inline CodePointType const* endOf( ZStr< CodePointType >& s )
    {
        return s.ptr() + s.length();
    }

    template< typename CodePointType >
    inline CodePointType const* endOf( ZStr< CodePointType > const& s )
    {
        return s.ptr() + s.length();
    }

    template< typename CodePointType >
    inline Size size( ZStr< CodePointType >& s )
    {
        return s.length();
    }

    template< typename CodePointType >
    inline Size size( ZStr< CodePointType > const& s )
    {
        return s.length();
    }

} }  // namespace progrock::cppx

#endif
```
I emphasized current implementation because ZStr is still evolving. I’m not embarrassed to tell you that I have not yet written a unit test for ZStr, although I do have a unit test for Ownership. The testing has consisted of using this class in other code, which uncovers missing functionality (design level slip-ups) as opposed to uncovering bugs (coding slip-ups). It is perhaps controversial, but I firmly believe in the adage that “no paper design should survive contact with reality”. Put another way, we humans are very fallible, including in our limited ability to see our own fallibility, which implies that writing comprehensive unit tests before coding and trial usage is not necessarily a good idea.
For example, looking at the code that I just pasted I notice that there’s no global namespace swap, which really should be there. 🙂
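One possible shape for such a swap is a namespace-level function template that forwards to swapWith, so that it is found by argument-dependent lookup. The sketch below is not the cppx code: MiniStr is a hypothetical, non-owning stand-in for ZStr, and exchange is a hypothetical piece of generic client code.

```cpp
#include <cassert>
#include <cstring>
#include <utility>      // std::swap

namespace demo {
    // Hypothetical stand-in for ZStr. It holds a pointer that it does not
    // delete, which is enough to show the swap plumbing.
    template< class CodePointType >
    class MiniStr
    {
    private:
        CodePointType const* myArray;
        MiniStr( MiniStr const& );              // No such.
        MiniStr& operator=( MiniStr const& );   // No such.
    public:
        explicit MiniStr( CodePointType const* s ): myArray( s ) {}
        void swapWith( MiniStr& other ) { std::swap( myArray, other.myArray ); }
        CodePointType const* ptr() const { return myArray; }
    };

    // Namespace-level swap, found by argument-dependent lookup.
    template< class CodePointType >
    inline void swap( MiniStr< CodePointType >& a, MiniStr< CodePointType >& b )
    {
        a.swapWith( b );
    }
}

// Generic client code in the usual style: an unqualified call, with
// std::swap as the fallback for types that have no swap of their own.
template< class Type >
inline void exchange( Type& a, Type& b )
{
    using std::swap;
    swap( a, b );
}
```

Since MiniStr is not copyable, the generic std::swap could not do the job here; the call in exchange resolves to demo::swap.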
Why is the contained string presented as const?
Dealing with constness for an ownership transferring type is a bit complex. I do that in Ownership, but for ZStr I chose the easy way out: once you’ve given ownership of a string to a ZStr, it’s effectively const. Since Ownership supports “constification” it’s no problem to deal safely with a mutable buffer before transferring it to a ZStr, and in the opposite direction, when you want to do string value manipulation you’ll generally transfer ownership to a “rich” immutable shared string class object, or copy the string to e.g. a std::string.
Why two variants of size etc.?
Why are there two variants of e.g. size, like

```cpp
template< typename CodePointType >
inline Size size( ZStr< CodePointType >& s )
{
    return s.length();
}

template< typename CodePointType >
inline Size size( ZStr< CodePointType > const& s )
{
    return s.length();
}
```
Since ZStr is an ownership transferring class a function like size can’t take the argument by value. For that would transfer the ownership, and bye bye, string (happily with ZStr such an ownership transfer has to be explicitly requested, except for a temporary where it does not matter). So size has to take the argument by reference, at least if it should support a ZStr used directly as actual argument.
However, the two definitions above are overloads of a general size definition. The general definition deals with raw arrays and standard containers. And so to steer the compiler’s attention away from the general template (which won’t compile for a ZStr argument) both possible inferred formal argument types have to be provided. Otherwise the general template would be a better match in some cases, and with a modern compiler an avalanche of incomprehensible and mostly meaningless diagnostics would result.
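The effect can be reproduced in miniature. The sketch below assumes that the general template takes its argument by reference to non-const, which the post does not state; MiniStr and chosen are hypothetical names, and each function merely reports which overload was selected instead of computing a size.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for ZStr; only its type matters here.
template< class CodePointType >
struct MiniStr {};

namespace one_overload {
    // General template, standing in for the one that handles raw arrays
    // and standard containers.
    template< class Container >
    inline char const* chosen( Container& ) { return "general"; }

    // Only the reference-to-const overload is supplied for MiniStr.
    template< class CodePointType >
    inline char const* chosen( MiniStr< CodePointType > const& ) { return "MiniStr"; }
}

namespace two_overloads {
    template< class Container >
    inline char const* chosen( Container& ) { return "general"; }

    // Both possible deduced argument types are covered.
    template< class CodePointType >
    inline char const* chosen( MiniStr< CodePointType >& ) { return "MiniStr"; }

    template< class CodePointType >
    inline char const* chosen( MiniStr< CodePointType > const& ) { return "MiniStr"; }
}
```

For a non-const MiniStr variable s, one_overload::chosen( s ) selects the general template, because binding to Container& needs no added const and is therefore the better match. With both overloads present, the match is equally good and the more specialized MiniStr overload wins.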
I didn’t figure this out before writing the code; it’s quite subtle.
I did encounter that diagnostics avalanche! 🙂
Why implicit conversion to pointer, isn’t that Evil™?
Ownership quite sensibly and conventionally does not offer implicit conversion to its pointer type, but ZStr does. I added this implicit conversion quite late, when I saw that using ZStr for Xerces strings became annoyingly verbose without it. Generally such implicit conversions are pitfalls, dangerous, and in some cases downright Evil™, but, in particular, ZStr does not have any constructor that takes a single such pointer.
I did not originally design ZStr with this in mind, but it just turned out to support such conversion well because I had endeavoured to make most operations explicit instead of implicit, so that when an operation finally “needed” to be implicit it didn’t clash with other operations.
The advantages of clear, clean & concise usage do in my opinion outweigh the more academic problems with the implicit conversion (e.g. the resulting implicit conversion to bool).
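Both sides of that trade-off fit in a few lines. In this sketch MiniStr is a hypothetical, non-owning stand-in for ZStr< char >, the NonOwning tag is likewise hypothetical (it only ensures that no constructor takes just a pointer), and lengthOf stands in for any C-style API function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct NonOwning {};    // Hypothetical tag, so that no constructor takes just a pointer.

// Hypothetical stand-in for ZStr< char >, non-owning to keep the sketch short.
class MiniStr
{
private:
    char const* myArray;
public:
    MiniStr( char const* s, NonOwning ): myArray( s ) {}
    char const* ptr() const { return myArray; }
    operator char const* () const { return ptr(); }
};

// Stands in for a C-style API function that expects a raw pointer.
inline std::size_t lengthOf( char const* s ) { return std::strlen( s ); }

// The convenience: a MiniStr can be passed directly where a pointer is expected.
inline std::size_t convenientLength( MiniStr const& s ) { return lengthOf( s ); }

// The pitfall: via the pointer a MiniStr also converts to bool, so an empty
// string tests as true. Only a null pointer would test as false.
inline bool testsAsTrue( MiniStr const& s ) { return s ? true : false; }
```

Because there is no single-pointer constructor, a raw pointer never silently turns back into a MiniStr, which is the property that keeps the conversion manageable.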
A usage example (Windows)
Few if any standard library functions produce dynamically allocated strings that must be deallocated in a special way. But many libraries (e.g. Xerces) do, and then ZStr comes in very handy. The example below is instead for the Windows API, and yes, this time I’ve tested with both the MinGW g++ and the Visual C++ compilers! 🙂
Example using some Windows API functions, complete code:
```cpp
#include <progrock/cppx/text/ZStr.h>
#include <iostream>
#include <stdexcept>
#include <stdlib.h>         // EXIT_SUCCESS, EXIT_FAILURE

#ifndef STRICT
#   define STRICT
#endif
#ifndef NOMINMAX
#   define NOMINMAX
#endif
#ifndef UNICODE
#   define UNICODE
#endif
#include <windows.h>

using namespace progrock;

bool throwX( char const s[] ) { throw std::runtime_error( s ); }

void disposeComString( wchar_t* p ) { CoTaskMemFree( p ); }

bool ok( HRESULT hr ) { return SUCCEEDED( hr ); }

struct MsComLib
{
    MsComLib() { ok( CoInitialize( 0 ) ) || throwX( "CoInitialize failed" ); }
    ~MsComLib() { CoUninitialize(); }
};

// Fudge output of known-to-be-Latin-1 wide string, since MingW g++ has no std::wcout.
std::ostream& operator<<( std::ostream& stream, wchar_t const* s )
{
    while( *s ) { stream << char( *s++ ); }
    return stream;
}

cppx::ZStrWChar stringFrom( CLSID const& classId )
{
    wchar_t*        pStr;
    HRESULT const   hr  = StringFromCLSID( classId, &pStr );

    ok( hr ) || throwX( "StringFromCLSID failed" );
    return cppx::ZStrWChar( pStr, &disposeComString );
}

cppx::ZStrWChar progIdFrom( CLSID const& classId )
{
    wchar_t*        pStr;
    HRESULT const   hr  = ProgIDFromCLSID( classId, &pStr );

    ok( hr ) || throwX( "ProgIDFromCLSID failed" );
    return cppx::ZStrWChar( pStr, &disposeComString );
}

int main()
{
    // Just an example of a class id that's probably also on your Windows system.
    CLSID const aClassId =
    {
        0x00021401, 0x0000, 0x0000,
        { 0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46 }
    };

    try
    {
        MsComLib const  comLibUsage;

        std::cout
            << "Class id " << stringFrom( aClassId )
            << " is a \"" << progIdFrom( aClassId ) << "\"."
            << std::endl;
        return EXIT_SUCCESS;
    }
    catch( std::exception const& x )
    {
        std::cerr << "!" << x.what() << std::endl;
    }
    return EXIT_FAILURE;
}
```
Here the ZStr function results avoid any copying of the strings produced by the API functions: the strings can just be passed around, and will be deallocated properly, automatically.
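For readers without a Windows compiler at hand, the same idea can be tried out portably. The sketch below swaps in a different technique, a std::unique_ptr with a function-pointer deleter, rather than ZStr itself. The fakelib namespace is a made-up library that stands in for the StringFromCLSID and CoTaskMemFree pair, and LibStr and nameFromLib are hypothetical names. Unlike ZStr, this stand-in does not present the string as const.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <utility>

// Made-up library with its own allocation scheme. The counter exists only
// so that the deallocation can be observed.
namespace fakelib {
    inline int& liveStrings() { static int count = 0; return count; }

    inline char* newName()
    {
        char const text[] = "hello";
        char* const p = static_cast< char* >( std::malloc( sizeof( text ) ) );
        if( p == 0 ) { return 0; }
        std::memcpy( p, text, sizeof( text ) );     // Includes the terminating zero.
        ++liveStrings();
        return p;
    }

    // Must be used instead of delete for strings from newName.
    inline void freeName( char* p ) { std::free( p ); --liveStrings(); }
}

// Owner that carries the matching deallocation function along with the pointer.
typedef std::unique_ptr< char, void (*)( char* ) > LibStr;

// Wraps the library call so that the result can only be deallocated correctly.
inline LibStr nameFromLib()
{
    return LibStr( fakelib::newName(), &fakelib::freeName );
}
```

A LibStr returned from nameFromLib can be moved from function to function without copying the characters, and fakelib::freeName runs exactly once, when the last owner goes out of scope.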
Cheers, & … enjoy!