The time is 65535:65535 PM, and the temperature is a nice comfy 4294967295 degrees… Oops. Clearly some signed integer values have been promoted to unsigned somewhere, or perhaps the program has just used unsigned integer arithmetic incorrectly.

There are two main cures for that:

- make sure that you’re only using
*unsigned arithmetic*in expressions that involve unsigned type numerical values, or … - make sure that you’re only using
*signed arithmetic*in expressions that involve unsigned type numerical values.

As of 2010 most programmers still choose the first cure, attempting to use unsigned arithmetic wherever there are unsigned numerical values involved, e.g. being careful to use `size_t`

instead of `int`

for some loop counter. I think the main reason is that it’s an old convention (which is propagated by e.g. the C++ standard library), a case of “frozen history”, but it may also have something to do with each intrusion of unsigned numerical values being viewed as a purely local problem, suggesting that a purely local fix, each time, is appropriate. This then leads to a **bug-attracting mix** of signed and unsigned type expression, which moreover are difficult to recognize as respectively signed and unsigned type, and are easy to get wrong.

Here I’m advocating the second cure, a single global convention: *do not mix signed and unsigned numerical types*. The unsigned types are for bit-level operations (only), while the signed types are for numerical values (only). But since the standard library was designed before the advantages of this clear usage separation were fully realized, and indeed at a time when some computer architectures still conferred some advantages to e.g. an unsigned `size()`

, adopting this more rational and work-saving convention requires some support.

### A signed size type

It’s only on a 32-bit architecture that `int`

can be safe instead of `size_t`

. In particular, in 64-bit Windows `long int`

is 32-bit while `size_t`

is 64-bit, although in 64-bit *nix `long int`

is typically 64-bit (see e.g. Wikipedia). Hence, some other signed type is needed for sizes and indexes, and that type is the standard library’s `ptrdiff_t`

. It’s guaranteed to be large enough for any array except for an array of `char`

filling more than half of the available address space. And that single special case is not only so rare as to not occur in practice, at all (except possibly for some embedded computer programming); it’s also a case that’s simply not fully supported by current C++.

The standard library defines two types, `size_t`

and `ptrdiff_t`

, not only to provide a choice between signed and unsigned but because these types are *used* differently: the names indicate different usages. `size_t`

is used for sizes, corresponding to the cppx ** Size** type below. And

`ptrdiff_t`

is used for array indices and pointer offsets, corresponding to the cppx **type below.**

`Index`

In [progrock/cppx/primitive_types.h]:

namespace progrock{ namespace cppx{ // Some more stuff (e.g. definition of Byte) first, then ... namespace signedSize { CPPX_STATIC_ASSERT( sizeof( size_t ) == sizeof( ptrdiff_t ) ); typedef ptrdiff_t T; enum { maximum = ptrdiff_t( size_t( -1 ) / 2 ), minimum = -maximum - 1 // Assumes two's complement form. }; // Assert that at least ptrdiff_t is two's complement form. This assertion // is stronger, asserting also that there's no bounds checking. But AFAIK // it's all that's available from the standard library. CPPX_STATIC_ASSERT( std::numeric_limits<T>::is_modulo ); } // namespace signedSize typedef signedSize::T Size; typedef signedSize::T Index; } } // namespace progrock::cppx

### Wrapper functions

C-style casts are generally bad, e.g. easily hiding or inadvertently introducing some reinterpretation or `const`

-casting, so writing `Size(v.size())`

is ungood. And C++-style casts are verbose, so writing `static_cast<Size>(v.size())`

is also ungood. Hence, a wrapper function is called for, writing just

:-).**size**(v)

Wrapper functions for size- and indexing- related things also allow you to treat raw arrays and container classes in a uniform way, writing e.g. just

, instead of **endOf**(a)`a+sizeof(a)/sizeof(*a)`

if `a`

is a raw array and `a.end()`

if `a`

is e.g. a `std::vector`

. Similarly, in cppx I define a

function. These names are intentionally slightly unlike **startOf**`begin`

and `end`

, so as to avoid any confusion (for `size`

I found it difficult to come up with some different name, though):

In [progrock/cppx/collections/iterator_range_util.h]:

namespace progrock{ namespace cppx{ template< typename T > inline Size size( T const& c ) { return static_cast<Size>( c.size() ); } template< typename T, Size N > inline Size size( T (&)[N] ) { return N; } template< typename T > inline typename It<T>::T startOf( T& c ) { return c.begin(); } template< typename T, Size N > inline T* startOf( T (&a)[N] ) { return a; } template< typename T > inline typename It<T>::T endOf( T& c ) { return c.end(); } template< typename T, Size N > inline T* endOf( T (&a)[N] ) { return a + N; } #define CPPX_ALL_OF( v ) \ ::progrock::cppx::startOf( v), ::progrock::cppx::endOf( v ) } } // namespace progrock::cppx

It’s possible to redesign `size`

so that it can be used at compile time, then wrapped via a macro, but although it’s an interesting academic exercise I have never needed that in practice, and the necessary macro for that complicates things. Another limitation is that with C++98 these functions can’t be used for locally defined classes, since in C++98 such classes have no linkage and template instantiation requires external linkage. However, like the first point this is not a problem in practice, and anyway as I understand it it’s been fixed in C++0x.

Summing up, with the above you can write e.g. `for (Index i = size(s)-1; i >= 0; --i)`

, and other such natural constructions, with no fear of value wrapping bugs. :-)

### Automatic iterator type deduction

For e.g. a `std::vector<int> const`

the `startOf`

and `endOf`

functions above automatically deduce the proper iterator type, via the cppx

utility class template:**It**

In [progrock/cppx/collections/It.h]:

namespace progrock{ namespace cppx{ template< typename Type > struct It { typedef typename Type::iterator T; }; template< typename Type > struct It< Type const > { typedef typename Type::const_iterator T; }; template< typename ElemType, Size N > struct It< ElemType[N] > { typedef ElemType* T; }; } } // namespace progrock::cppx

Cheers, & … enjoy! :-)

Very interesting perspective. I definitely fall into the first category of “use unsigned as much as possible”. I have never thought of using signed value for all numerical values.

The main reason for me to use unsigned type is because stuff such as count and size feels more natural to be unsigned.

This is thought provoking.

About 64-bit and size_t:

Article. About size_t and ptrdiff_t

Lessons on development of 64-bit C/C++ applications

The time is -1:-1 PM

ops

You used a negative value…

Yes, it might seem to be the same. I only hinted at the problems since they are well known. You shouldn’t take that hint too literally… ;-)

For those unfamiliar with the problems of unsigned arithmetic, here are some exercises that I think illustrate some of it:

1) Write a loop that counts down from 5 to 0 inclusive, and prints each number. First with signed type loop counter, and then with unsigned type loop counter. There are many solutions for the latter (none very intuitive).

2) For your C++ compiler and values of x in the range 3 through 7, what is the value of (x-5)/10 when x is respectively

`int`

and`unsigned`

?3) For general numbers, if you have a < b+k, then you also have a-k < b, and vice versa. Assume that k = 42. For what values of a and b is it safe to rewrite a < b+k as a-k < b when a and b are signed integers? What about unsigned?

Cheers & hth.,

– Alf

The more I think about it, the more I think you are right. I have been doing it wrong the whole time.

I like the argument in part 3. With signed type, it is always safe to rewrite a < b+k as a-k < b. There is no hole.

I have recently encountered a few bugs with subtraction and unsigned integer. Since most of the application uses unsigned type, the bug was difficult to track down. It wouldn't even be a problem if we consistently use signed type.

… Alan

It all depends on what you are doing.

There are counter examples :

1) a point for you :)

2) And what is it for x = -2147483643?

3) As you probably know, for large set of values for a and k, (a-k) is causing UB.

I don’t know if it is interesting but

Scientific and Engineering C++by Barton & Nackman also used`ptrdiff_t`

:But only for subscripts; sizes were still represented by some unsigned:

(don’t ask me why it is “short”: I should recheck the book :-))

PS:

> for size I found it difficult to come up with some different name, though

I call it “count” (it’s a term I came up with since the C days, when I wanted to distinguish between the “count” of an array of chars and its “length” as a string).

Thanks. That’s much better than

`size`

! :-) I’ve consequently changed my cppx source to`countOf`

. Happily not much other code was affected! I added the “of” suffix because pure`count`

might easily clash with client code names; because the “of” reads nicely; and because it matches`startOf`

and`endOf`

.The problem is that — as far as the standard is concerned — ptrdiff_t is merely guaranteed to be at least 16 bits (although in practice it usually follows the bitness of the machine and thus, as you say, works for all arrays except those that are extremely large whose element type is of size 1). Subtracting two pointers that are too far apart results in undefined behavior (see [expr.add]/6).

Furthermore, while there are gotchas to be aware of, you rarely perform dangerous operations on indexes, sizes or counts in general. I challenge you to find real use case where you perform (x-5)/10 where x is an index to an array.

As for “a < b+k", indeed one needs to be careful there and make sure that the expression on either side is still an index/size/count.

He he. :-) No, the limited guaranteed range for

`ptrdiff_t`

is not a problem on 32-bit or 64-bit systems, any more than it is a problem for for`size_t`

. As you note,`ptrdiff_t`

is of guaranteed sufficient range for any valid pointer subtraction.But it can be a problem on 16-bit machines (which happily are quite rare!), because at least according to the N869 committee draft for C99,

`ptrdiff_t`

is required to be at least17 bits.So, on such a machine it might introduce needless inefficiency to explicitly use

`ptrdiff_t`

, or indeed to do any pointer subtraction, since that’s defined to yield`ptrdiff_t`

result…But programming for a 16-bit machine you have to make many adjustments anyway, writing pretty machine-specific code and/or doing things in ways imposed by the limits of the machine, so it’s not like it’s practically relevant.

I agree that dangerous operations on indexes, sizes or counts in general are rare. But they’re there. Like, checking the result of

`.size()`

or`.length()`

against some computed limit. For example,`s.length() > -5`

will in practice almost always yield (unexpected)`false`

. That’s due to implicit promotion.Happily most compilers will warn about comparing signed and unsigned.

But as of 2010 they generally don’t warn about computed numerical results.

Cheers, & thanks for commenting!,

– Alf

Pingback:2010 in review | Alf on programming (mostly C++)