[cppx] ZStr, an ownership transferring string type

The ZStr ownership transferring string class that I introduced in my first Xerces posting may have seemed like total overkill for the job. In C++98 it’s difficult to do correctly, and it requires some support machinery that is less than commonly available. So if Xerces string handling were the only reason for it, I wouldn’t have done it.

But once a string class like ZStr is available, you (or at least I) find that it’s a natural first choice for many tasks. The beauty of it is that it lets you defer the decision of trading efficiency for functionality, because ownership can be transferred to an immutable sharing string type at any point. Or the string can be copied to a copying, mutable string type like std::string, whatever suits the task.

With a type like ZStr, if you’re implementing library-like functionality, the choice of which “rich” string type to use does not have to be imposed on the client code. Instead of trading away efficiency up front, you give the client code the choice, including the choice of just using ZStr.
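To make the “defer the decision” point concrete, here is a minimal C++11 sketch. It does not use ZStr: as an assumption for illustration, a std::unique_ptr with a custom deleter stands in for it, and the names OwnedZStr, makeGreeting, asStdString and asShared are hypothetical. A library-like function returns an owned zero-terminated string, and the client decides whether to copy it into a std::string or to transfer ownership, without copying the characters, to a sharing std::shared_ptr<char const>.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a ZStr-like type: owns a zero-terminated
// string and carries the function that must deallocate it.
typedef std::unique_ptr<char const, void (*)(char const*)> OwnedZStr;

void deleteCharArray(char const* p) { delete[] p; }

// Library-like function: hands over ownership of a new string without
// imposing any "rich" string type on the client.
OwnedZStr makeGreeting()
{
    char const text[] = "hello";
    char* p = new char[sizeof text];
    std::memcpy(p, text, sizeof text);   // copies the terminating zero too
    return OwnedZStr(p, &deleteCharArray);
}

// Client choice 1: copy into a mutable, copying string type.
std::string asStdString(OwnedZStr s) { return std::string(s.get()); }

// Client choice 2: transfer ownership, without copying the characters,
// to a sharing owner. The custom deleter moves along with the pointer.
std::shared_ptr<char const> asShared(OwnedZStr s)
{
    return std::shared_ptr<char const>(std::move(s));
}
```

Both client functions take the owner by value, so for a named variable the transfer has to be requested explicitly with std::move, much as ZStr requires an explicit transfer.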

Why not just use an Ownership directly?

The main difference between a ZStr<T> and a plain Ownership<T const> is that a ZStr strongly implies that the contained pointer points to an array of T with a zero at the end, i.e. a zero-terminated string.

Ideally this should have been guaranteed by the language’s type system, but C++’s type system isn’t powerful enough for that. So it’s a convention, like much else in C++. But it’s a communicated convention, one that is explicitly there in the code, and hence a restriction that you can rely on: with a ZStr, you know.

And ZStr does offer some support routines that assume a zero-terminated string. But apart from basics such as obtaining the string length, none of the usual string operations are there. The idea is that you can pick any “richer” string type you want for doing more advanced things than just passing strings around.
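ZStr.h relies on helpers named strLen and dupStr (used by length and by the AsCopyOf constructor) that are not shown in this posting. As a hedged sketch of what such helpers can look like, here are two generic function templates that work for any integer code point type; the names strLenOf and dupStrOf are hypothetical, chosen so as not to suggest that this is the cppx code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical generic length of a zero-terminated string, for any
// integer code point type (char, wchar_t, char16_t, char32_t, ...).
template <class CodePoint>
std::size_t strLenOf(CodePoint const* s)
{
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != CodePoint(0)) { ++n; }
    return n;
}

// Hypothetical generic duplicate. The result must be deallocated with
// delete[], which is what an "as copy of" constructor would pair it with.
template <class CodePoint>
CodePoint* dupStrOf(CodePoint const* s)
{
    std::size_t const n = strLenOf(s);
    CodePoint* const result = new CodePoint[n + 1];
    for (std::size_t i = 0; i <= n; ++i) { result[i] = s[i]; }   // includes the zero
    return result;
}
```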

The current implementation

The current implementation looks like this:

File [progrock/cppx/text/ZStr.h], complete code:

// Copyright (c) Alf P. Steinbach, 2010.
// #include <progrock/cppx/text/ZStr.h>
//
// A ZStr represents ownership of a zero-terminated string.

#ifndef PROGROCK_CPPX_TEXT_ZSTR_H
#define PROGROCK_CPPX_TEXT_ZSTR_H
#include <progrock/cppx/devsupport/better_experience.h>

//-------------------------------- Dependencies:

#include    <progrock/cppx/pointers/Ownership.h>
#include    <progrock/cppx/collections/iterator_range_util.h>
#include    <progrock/cppx/text/codepoint_types.h>

#include    <assert.h>          // assert
#include    <limits>            // std::numeric_limits

//-------------------------------- Interface:

namespace progrock{ namespace cppx{

    template< class CodePointType >
    class ZStr      // Zero-terminated string.
        : public cppx::OwnershipTransferring< ZStr< CodePointType > >
    {
    private:
        cppx::Ownership< CodePointType const >   myArray;

        ZStr( ZStr& );                      // No such.
        ZStr& operator=( ZStr const& );     // No such.

    protected:
        typedef cppx::OwnershipTransferring< ZStr< CodePointType > >    Base;
        typedef typename Base::Ref                                      Ref;

    public:
        typedef typename Deleter< CodePointType >::Func  DeleterFunc;

        ZStr( CodePointType const* s, DeleterFunc deleter )
            : myArray( s, deleter )
        { assert( deleter != 0 ); }

        ZStr( CodePointType const* s, DeleteAsArray daa )
            : myArray( s, daa )
        {}

        ZStr( cppx::Ownership< CodePointType const > o )
            : myArray( o.transfer() )
        {}

        ZStr( CodePointType const* s, NoDelete nd )
            : myArray( s, nd )
        {}

        ZStr( AsCopyOf, CodePointType const* s )
            : myArray( dupStr( s ), deleteAsArray )
        {}

        ZStr( Ref other ): myArray( 0 )         { swapWith( *other.p ); }
        void swapWith( ZStr& other )            { myArray.swapWith( other.myArray ); }

        CodePointType const* ptr() const        { return myArray.ptr(); }
        operator CodePointType const* () const  { return ptr(); }
        DeleterFunc deleter() const             { return myArray.deleter(); }

        CodePointType const* release()          { return myArray.release(); }

        Size length() const                     { return strLen( myArray.ptr() ); }
    };

    typedef ZStr< CodePoint8 >      ZStr8;
    typedef ZStr< CodePoint16 >     ZStr16;
    typedef ZStr< CodePoint32 >     ZStr32;
    typedef ZStr< char >            ZStrChar;
    typedef ZStr< wchar_t >         ZStrWChar;

    template< class ResultCPType, class SourceCPType >
    inline ZStr< ResultCPType > reinterpretAs( ZStr< SourceCPType > s )
    {
        // Doesn't check that types are built-in, but probably good enough.
        CPPX_STATIC_ASSERT( sizeof( ResultCPType ) == sizeof( SourceCPType ) );
        CPPX_STATIC_ASSERT( std::numeric_limits< SourceCPType >::is_integer );
        CPPX_STATIC_ASSERT( std::numeric_limits< ResultCPType >::is_integer );

        typedef ZStr< ResultCPType >    ZStrR;
        typedef ZStr< SourceCPType >    ZStrS;

        typename ZStrS::DeleterFunc const   d   = s.deleter();
        return ZStrR(
            reinterpret_cast< ResultCPType const* >( s.release() ),
            reinterpret_cast< typename ZStrR::DeleterFunc >( d )
            );
    }

    template< typename CodePointType >
    struct It< ZStr< CodePointType > >
    {
        typedef CodePointType const*     T;
    };

    template< typename CodePointType >
    struct It< ZStr< CodePointType > const >
    {
        typedef CodePointType const*     T;
    };

    template< typename CodePointType >
    inline CodePointType const*
    startOf( ZStr< CodePointType >& s )         { return s.ptr(); }

    template< typename CodePointType >
    inline CodePointType const*
    startOf( ZStr< CodePointType > const& s )   { return s.ptr(); }

    template< typename CodePointType >
    inline CodePointType const*
    endOf( ZStr< CodePointType >& s )           { return s.ptr() + s.length(); }

    template< typename CodePointType >
    inline CodePointType const*
    endOf( ZStr< CodePointType > const& s )     { return s.ptr() + s.length(); }

    template< typename CodePointType >
    inline Size
    size( ZStr< CodePointType >& s )            { return s.length(); }

    template< typename CodePointType >
    inline Size
    size( ZStr< CodePointType > const& s )      { return s.length(); }

} }  // namespace progrock::cppx

#endif

I emphasized “current implementation” because ZStr is still evolving. I’m not embarrassed to tell you that I have not yet written a unit test for ZStr, although I do have one for Ownership. The testing has consisted of using this class in other code, which uncovers missing functionality (design-level slip-ups) rather than bugs (coding slip-ups). It is perhaps controversial, but I firmly believe in the adage that “no paper design should survive contact with reality”. Put another way, we humans are very fallible, including in our limited ability to see that fallibility, which implies that writing comprehensive unit tests before coding and trial usage is not necessarily a good idea.

For example, looking at the code that I just pasted, I notice that there’s no namespace-level (non-member) swap function, which really should be there. 🙂
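A namespace-level swap could look roughly like the sketch below. It is a hypothetical, self-contained stand-in: the namespace demo and the class ZStrLike are invented for illustration, and ZStrLike does not own anything. The point is only the free swap that forwards to swapWith, so that the usual “using std::swap; swap( a, b );” idiom finds it via argument-dependent lookup. For a type whose copy assignment operator is private, as ZStr’s is, the generic C++98 std::swap, which copies, would not even compile.

```cpp
#include <cassert>
#include <utility>

namespace demo {   // hypothetical; stands in for progrock::cppx

    // Minimal non-owning stand-in with the same swapWith member as ZStr.
    template <class CodePoint>
    class ZStrLike
    {
    private:
        CodePoint const* myPtr;
    public:
        explicit ZStrLike(CodePoint const* s): myPtr(s) {}
        void swapWith(ZStrLike& other)  { std::swap(myPtr, other.myPtr); }
        CodePoint const* ptr() const    { return myPtr; }
    };

    // The missing piece: a namespace-level swap that forwards to swapWith.
    // Argument-dependent lookup finds it for unqualified calls to swap.
    template <class CodePoint>
    inline void swap(ZStrLike<CodePoint>& a, ZStrLike<CodePoint>& b)
    {
        a.swapWith(b);
    }

}  // namespace demo
```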

Why is the contained string presented as const?

Dealing with constness for an ownership transferring type is a bit complex. I do that in Ownership, but for ZStr I chose the easy way out: once you’ve given ownership of a string to a ZStr, it’s effectively const. Since Ownership supports “constification”, it’s no problem to deal safely with a mutable buffer before transferring it to a ZStr. In the opposite direction, when you want to manipulate the string value, you’ll generally transfer ownership to a “rich” immutable shared string class object, or copy the string to e.g. a std::string.
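The “mutable buffer first, then const” flow can be sketched with standard C++11 types. This is an analogy under stated assumptions, not the Ownership API: std::unique_ptr<char[]> plays the mutable buffer, std::unique_ptr<char const[]> plays the ZStr-like const owner, and makeRepeated is a hypothetical example function.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical example: build the text in a mutable buffer, then hand
// ownership to a const-only owner. After the transfer it is read-only.
std::unique_ptr<char const[]> makeRepeated(char c, std::size_t n)
{
    assert(c != '\0');                                  // a zero would end the string early
    std::unique_ptr<char[]> buffer(new char[n + 1]);    // mutable phase
    for (std::size_t i = 0; i < n; ++i) { buffer[i] = c; }
    buffer[n] = '\0';
    // "Constify": release the mutable pointer into a const owner.
    return std::unique_ptr<char const[]>(static_cast<char const*>(buffer.release()));
}
```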

Why two variants of size etc.?

Why are there two variants of, for example, size:

template< typename CodePointType >
inline Size
size( ZStr< CodePointType >& s )            { return s.length(); }

template< typename CodePointType >
inline Size
size( ZStr< CodePointType > const& s )      { return s.length(); }

Since ZStr is an ownership transferring class, a function like size can’t take the argument by value, for that would transfer the ownership, and bye bye, string. (Happily, with ZStr such an ownership transfer has to be explicitly requested, except for a temporary, where it does not matter.) So size has to take the argument by reference, at least if it should support a ZStr used directly as an actual argument.
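The by-value problem is easy to demonstrate with C++11’s std::unique_ptr, used here as a stand-in for ZStr (OwnedChars, copyOf, lengthByValue and lengthByRef are hypothetical names). Passing the owner by value moves ownership into the function, and the string is deallocated when the function returns. As with ZStr, a named owner must be explicitly handed over (here with std::move), while a temporary can be passed directly since nothing is lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

typedef std::unique_ptr<char[]> OwnedChars;

OwnedChars copyOf(char const* s)
{
    OwnedChars result(new char[std::strlen(s) + 1]);
    std::strcpy(result.get(), s);
    return result;
}

// Takes ownership: the string is deallocated when s goes out of scope.
std::size_t lengthByValue(OwnedChars s)        { return std::strlen(s.get()); }

// Only looks: the caller keeps ownership.
std::size_t lengthByRef(OwnedChars const& s)   { return std::strlen(s.get()); }
```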

However, the two definitions above are overloads of a general size definition, which deals with raw arrays and standard containers. To steer the compiler away from the general template (which won’t compile for a ZStr argument), both possible inferred formal argument types have to be provided. Otherwise the general template would be a better match in some cases, and with a modern compiler an avalanche of incomprehensible and mostly meaningless diagnostics would result.

I didn’t figure this out before writing the code; it’s quite subtle.

I did encounter that diagnostics avalanche! 🙂
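Here is a hedged, self-contained reconstruction of the situation. The general size templates are assumed, as the text suggests, to come in one version per constness (Container& and Container const&); ZText is a hypothetical non-copyable stand-in for ZStr without a size() member, and Size is assumed to be a signed type. Everything lives in a namespace demo and is called qualified, so that argument-dependent lookup does not also drag in C++17’s std::size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>     // for trying the general templates with a container

namespace demo {

    typedef std::ptrdiff_t Size;   // assumption: a signed size type

    // Assumed general definitions for standard containers.
    template <class Container>
    Size size(Container& c)        { return static_cast<Size>(c.size()); }

    template <class Container>
    Size size(Container const& c)  { return static_cast<Size>(c.size()); }

    // Hypothetical non-copyable zero-terminated string, no size() member.
    template <class CodePoint>
    class ZText
    {
    private:
        CodePoint const* myPtr;
        ZText(ZText const&);              // No copying.
        ZText& operator=(ZText const&);   // No copying.
    public:
        explicit ZText(CodePoint const* s): myPtr(s) {}
        Size length() const
        {
            Size n = 0;
            while (myPtr[n] != CodePoint(0)) { ++n; }
            return n;
        }
    };

    // Both overloads are needed. Without the non-const one, size(nonConstText)
    // would pick the general size(Container&) with Container = ZText<char>,
    // an exact match, and fail to compile since ZText has no size() member.
    template <class CodePoint>
    Size size(ZText<CodePoint>& s)        { return s.length(); }

    template <class CodePoint>
    Size size(ZText<CodePoint> const& s)  { return s.length(); }

}  // namespace demo
```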

Why implicit conversion to pointer, isn’t that Evil™?

Ownership quite sensibly and conventionally does not offer implicit conversion to its pointer type, but ZStr does. I added this implicit conversion quite late, when I saw that using ZStr for Xerces strings became annoyingly verbose without it. Generally such implicit conversions are pitfalls, dangerous, and in some cases downright Evil™. But, in particular, ZStr does not have any constructor that takes a single such pointer, so a raw pointer can never be implicitly converted to a ZStr that would then take ownership of it.

I did not originally design ZStr with this in mind, but it turned out to support such a conversion well because I had endeavoured to make most operations explicit rather than implicit, so that when an operation finally “needed” to be implicit it didn’t clash with other operations.

The advantages of clear, clean & concise usage do, in my opinion, outweigh the more academic problems with the implicit conversion (e.g. to bool).
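The design choice can be illustrated with a small hypothetical class (OwnedText and its Copy tag are invented for this sketch; it is not ZStr). It converts implicitly to char const*, so it can be passed straight to C functions such as std::strlen, but it has no constructor taking a single pointer, which a C++11 static_assert with std::is_convertible checks. The “to bool” issue also shows up: an OwnedText can be used directly as a condition, via the pointer.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical class: implicit conversion *to* the pointer type, but no
// single-pointer constructor, so a raw pointer never silently becomes an owner.
class OwnedText
{
private:
    char const* myPtr;
    OwnedText(OwnedText const&);              // No copying (C++98 style).
    OwnedText& operator=(OwnedText const&);   // No copying.
public:
    struct Copy {};                            // Hypothetical tag type.

    OwnedText(Copy, char const* s)
        : myPtr(std::strcpy(new char[std::strlen(s) + 1], s)) {}
    ~OwnedText() { delete[] myPtr; }

    operator char const*() const { return myPtr; }   // implicit, by design
};

// Not convertible from a raw pointer: there is no single-pointer constructor.
static_assert(!std::is_convertible<char const*, OwnedText>::value,
              "a raw pointer must not implicitly become an owner");
```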

A usage example (Windows)

Few if any standard library functions produce dynamically allocated strings that must be deallocated in a special way. But many libraries (e.g. Xerces) do, and then ZStr comes in very handy. The example below is instead for the Windows API, and yes, this time I’ve tested it with both the MinGW g++ and the Visual C++ compilers! 🙂

Example using some Windows API functions, complete code:

#include <progrock/cppx/text/ZStr.h>
#include <iostream>
#include <stdexcept>
#include <stdlib.h>     // EXIT_SUCCESS, EXIT_FAILURE

#ifndef STRICT
#   define STRICT
#endif
#ifndef NOMINMAX
#   define NOMINMAX
#endif
#ifndef UNICODE
#   define UNICODE
#endif
#include <windows.h>        // Link with ole32 (e.g. -lole32 with g++) for the COM functions.

using namespace progrock;

bool throwX( char const s[] )           { throw std::runtime_error( s ); }
void disposeComString( wchar_t* p )     { CoTaskMemFree( p ); }
bool ok( HRESULT hr )                   { return SUCCEEDED( hr ); }

struct MsComLib
{
    MsComLib()  { ok( CoInitialize( 0 ) ) || throwX( "CoInitialize failed" ); }
    ~MsComLib() { CoUninitialize(); }
};

// Fudge output of known-to-be-Latin-1 wide string, since MinGW g++ has no std::wcout.
std::ostream& operator<<( std::ostream& stream, wchar_t const* s )
{
    while( *s ) { stream << char( *s++ ); }
    return stream;
}

cppx::ZStrWChar stringFrom( CLSID const& classId )
{
    wchar_t*        pStr;
    HRESULT const   hr  = StringFromCLSID( classId, &pStr );

    ok( hr ) || throwX( "StringFromCLSID failed" );
    return cppx::ZStrWChar( pStr, &disposeComString );
}

cppx::ZStrWChar progIdFrom( CLSID const& classId )
{
    wchar_t*        pStr;
    HRESULT const   hr  = ProgIDFromCLSID( classId, &pStr );

    ok( hr ) || throwX( "ProgIDFromCLSID failed" );
    return cppx::ZStrWChar( pStr, &disposeComString );
}

int main()
{
    // Just an example of a class id that's probably also on your Windows system.
    CLSID const aClassId    =
    {
        0x00021401, 0x0000, 0x0000,
        { 0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46 }
    };

    try
    {
        MsComLib const  comLibUsage;

        std::cout
            << "Class id " << stringFrom( aClassId )
            << " is a \"" << progIdFrom( aClassId ) << "\"."
            << std::endl;
        return EXIT_SUCCESS;
    }
    catch( std::exception const& x )
    {
        std::cerr << "!" << x.what() << std::endl;
    }
    return EXIT_FAILURE;
}

Here the ZStr function results avoid any copying of the strings produced by the API functions: the strings can just be passed around, and will be deallocated properly, automatically.
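The same pattern works with any C-style API whose strings need a special deallocation function. As a portable sketch under stated assumptions (legacyMakeName is a hypothetical stand-in for a function like StringFromCLSID, std::free plays the role of CoTaskMemFree, and std::unique_ptr with a function-pointer deleter plays the role of ZStr), the raw result is wrapped immediately, so no characters are copied and the right deallocation function runs automatically.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <string>

// Hypothetical C-style API: returns a malloc'ed string that the caller
// must release with std::free.
char* legacyMakeName()
{
    char const text[] = "example";
    char* p = static_cast<char*>(std::malloc(sizeof text));
    if (p != 0) { std::memcpy(p, text, sizeof text); }
    return p;
}

void freeChars(char const* p) { std::free(const_cast<char*>(p)); }

typedef std::unique_ptr<char const, void (*)(char const*)> OwnedCString;

// Wraps the raw result at once, like stringFrom above: no copy is made,
// and freeChars runs automatically when the owner is destroyed.
OwnedCString nameFromLegacyApi()
{
    char* p = legacyMakeName();
    if (p == 0) { throw std::bad_alloc(); }
    return OwnedCString(p, &freeChars);
}
```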

Cheers, & … enjoy!
