The time is 65535:65535 PM, and the temperature is a nice comfy 4294967295 degrees… Oops. Clearly some signed integer values have been promoted to unsigned somewhere, or perhaps the program has just used unsigned integer arithmetic incorrectly.
There are two main cures for that:
- make sure that you’re only using unsigned arithmetic in expressions that involve unsigned type numerical values, or …
- make sure that you’re only using signed arithmetic for numerical values, converting any unsigned values to signed.
As of 2010 most programmers still choose the first cure, attempting to use unsigned arithmetic wherever there are unsigned numerical values involved, e.g. being careful to use `size_t` instead of `int` for some loop counter. I think the main reason is that it’s an old convention (propagated by e.g. the C++ standard library), a case of “frozen history”, but it may also have something to do with each intrusion of unsigned numerical values being viewed as a purely local problem, suggesting that a purely local fix, each time, is appropriate. This leads to a bug-attracting mix of signed and unsigned type expressions, which moreover are difficult to recognize as respectively signed and unsigned type, and are easy to get wrong.
Here I’m advocating the second cure, a single global convention: do not mix signed and unsigned numerical types. The unsigned types are for bit-level operations (only), while the signed types are for numerical values (only). But since the standard library was designed before the advantages of this clear usage separation were fully realized, and indeed at a time when some computer architectures still conferred some advantages on e.g. an unsigned `size()`, adopting this more rational and work-saving convention requires some support.
A signed size type
It’s only on a 32-bit architecture that `int` can be safe instead of `size_t`. In particular, on 64-bit Windows `long int` is 32-bit while `size_t` is 64-bit, although on 64-bit *nix `long int` is typically 64-bit (see e.g. Wikipedia). Hence, some other signed type is needed for sizes and indexes, and that type is the standard library’s `ptrdiff_t`. It’s guaranteed to be large enough for any array except an array of `char` filling more than half of the available address space. And that single special case is not only so rare as to not occur in practice, at all (except possibly for some embedded computer programming); it’s also a case that’s simply not fully supported by current C++.
The standard library defines two types, `size_t` and `ptrdiff_t`, not only to provide a choice between signed and unsigned but because these types are used differently: the names indicate different usages. `size_t` is used for sizes, corresponding to the cppx `Size` type below. And `ptrdiff_t` is used for array indices and pointer offsets, corresponding to the cppx `Index` type below.
In [progrock/cppx/primitive_types.h]:
```cpp
namespace progrock{ namespace cppx{
    // Some more stuff (e.g. definition of Byte) first, then ...

    namespace signedSize {
        CPPX_STATIC_ASSERT( sizeof( size_t ) == sizeof( ptrdiff_t ) );
        typedef ptrdiff_t T;
        enum {
            maximum = ptrdiff_t( size_t( -1 ) / 2 ),
            minimum = -maximum - 1      // Assumes two's complement form.
        };
        // Assert that at least ptrdiff_t is two's complement form. This assertion
        // is stronger, asserting also that there's no bounds checking. But AFAIK
        // it's all that's available from the standard library.
        CPPX_STATIC_ASSERT( std::numeric_limits<T>::is_modulo );
    }  // namespace signedSize

    typedef signedSize::T   Size;
    typedef signedSize::T   Index;
} }  // namespace progrock::cppx
```
Wrapper functions
C-style casts are generally bad, e.g. easily hiding or inadvertently introducing some reinterpretation or `const`-casting, so writing `Size(v.size())` is ungood. And C++-style casts are verbose, so writing `static_cast<Size>(v.size())` is also ungood. Hence, a wrapper function is called for, writing just `size(v)` :-).
Wrapper functions for size- and indexing-related things also allow you to treat raw arrays and container classes in a uniform way, writing e.g. just `endOf(a)`, instead of `a + sizeof(a)/sizeof(*a)` if `a` is a raw array and `a.end()` if `a` is e.g. a `std::vector`. Similarly, in cppx I define a `startOf` function. These names are intentionally slightly unlike `begin` and `end`, so as to avoid any confusion (for `size` I found it difficult to come up with some different name, though):
In [progrock/cppx/collections/iterator_range_util.h]:
```cpp
namespace progrock{ namespace cppx{
    template< typename T >
    inline Size size( T const& c )              { return static_cast<Size>( c.size() ); }

    template< typename T, Size N >
    inline Size size( T (&)[N] )                { return N; }

    template< typename T >
    inline typename It<T>::T startOf( T& c )    { return c.begin(); }

    template< typename T, Size N >
    inline T* startOf( T (&a)[N] )              { return a; }

    template< typename T >
    inline typename It<T>::T endOf( T& c )      { return c.end(); }

    template< typename T, Size N >
    inline T* endOf( T (&a)[N] )                { return a + N; }

    #define CPPX_ALL_OF( v ) \
        ::progrock::cppx::startOf( v ), ::progrock::cppx::endOf( v )
} }  // namespace progrock::cppx
```
It’s possible to redesign `size` so that it can be used at compile time, then wrapped via a macro, but although it’s an interesting academic exercise I have never needed that in practice, and the necessary macro for that complicates things. Another limitation is that with C++98 these functions can’t be used for locally defined classes, since in C++98 such classes have no linkage and template instantiation requires external linkage. However, like the first point this is not a problem in practice, and anyway, as I understand it, it’s been fixed in C++0x.
Summing up, with the above you can write e.g. `for (Index i = size(s)-1; i >= 0; --i)`, and other such natural constructions, with no fear of value-wrapping bugs. 🙂
Automatic iterator type deduction
For e.g. a `std::vector<int> const` the `startOf` and `endOf` functions above automatically deduce the proper iterator type, via the cppx `It` utility class template:
In [progrock/cppx/collections/It.h]:
```cpp
namespace progrock{ namespace cppx{
    template< typename Type >
    struct It                   { typedef typename Type::iterator          T; };

    template< typename Type >
    struct It< Type const >     { typedef typename Type::const_iterator    T; };

    template< typename ElemType, Size N >
    struct It< ElemType[N] >    { typedef ElemType*                        T; };
} }  // namespace progrock::cppx
```
Cheers, & … enjoy! 🙂
Very interesting perspective. I definitely fall into the first category of “use unsigned as much as possible”. I have never thought of using signed values for all numerical values.
The main reason for me to use unsigned type is because stuff such as count and size feels more natural to be unsigned.
This is thought provoking.
About 64-bit and size_t:
Article. About size_t and ptrdiff_t
Lessons on development of 64-bit C/C++ applications
The time is -1:-1 PM
Oops.
You used a negative value…
Yes, it might seem to be the same. I only hinted at the problems since they are well known. You shouldn’t take that hint too literally… 😉
For those unfamiliar with the problems of unsigned arithmetic, here are some exercises that I think illustrate some of it:
1) Write a loop that counts down from 5 to 0 inclusive, and prints each number. First with signed type loop counter, and then with unsigned type loop counter. There are many solutions for the latter (none very intuitive).
2) For your C++ compiler and values of x in the range 3 through 7, what is the value of (x-5)/10 when x is respectively `int` and `unsigned`?
3) For general numbers, if you have a < b+k, then you also have a-k < b, and vice versa. Assume that k = 42. For what values of a and b is it safe to rewrite a < b+k as a-k < b when a and b are signed integers? What about unsigned?
Cheers & hth.,
– Alf
The more I think about it, the more I think you are right. I have been doing it wrong the whole time.
I like the argument in part 3. With signed type, it is always safe to rewrite a < b+k as a-k < b. There is no hole.
I have recently encountered a few bugs with subtraction and unsigned integers. Since most of the application uses unsigned types, the bugs were difficult to track down. It wouldn't even be a problem if we consistently used signed types.
… Alan
It all depends on what you are doing.
There are counter examples :
1) a point for you 🙂
2) And what is it for x = -2147483643?
3) As you probably know, for a large set of values of a and k, (a-k) causes UB.
I don’t know if it is interesting, but Scientific and Engineering C++ by Barton & Nackman also used `ptrdiff_t`: but only for subscripts; sizes were still represented by some unsigned type (don’t ask me why it is “short”: I should recheck the book :-)).
PS:
> for size I found it difficult to come up with some different name, though
I call it “count” (it’s a term I came up with since the C days, when I wanted to distinguish between the “count” of an array of chars and its “length” as a string).
Thanks. That’s much better than `size`! 🙂 I’ve consequently changed my cppx source to `countOf`. Happily not much other code was affected! I added the “of” suffix because pure `count` might easily clash with client code names; because the “of” reads nicely; and because it matches `startOf` and `endOf`.
The problem is that, as far as the standard is concerned, ptrdiff_t is merely guaranteed to be at least 16 bits (although in practice it usually follows the bitness of the machine and thus, as you say, works for all arrays except those that are extremely large and whose element type is of size 1). Subtracting two pointers that are too far apart results in undefined behavior (see [expr.add]/6).
Furthermore, while there are gotchas to be aware of, you rarely perform dangerous operations on indexes, sizes or counts in general. I challenge you to find real use case where you perform (x-5)/10 where x is an index to an array.
As for “a < b+k", indeed one needs to be careful there and make sure that the expression on either side is still an index/size/count.
He he. 🙂 No, the limited guaranteed range for `ptrdiff_t` is not a problem on 32-bit or 64-bit systems, any more than it is a problem for `size_t`. As you note, `ptrdiff_t` is of guaranteed sufficient range for any valid pointer subtraction.
But it can be a problem on 16-bit machines (which happily are quite rare!), because at least according to the N869 committee draft for C99, `ptrdiff_t` is required to be at least 17 bits.
So, on such a machine it might introduce needless inefficiency to explicitly use `ptrdiff_t`, or indeed to do any pointer subtraction, since that’s defined to yield a `ptrdiff_t` result…
But programming for a 16-bit machine you have to make many adjustments anyway, writing pretty machine-specific code and/or doing things in ways imposed by the limits of the machine, so it’s not like it’s practically relevant.
I agree that dangerous operations on indexes, sizes or counts in general are rare. But they’re there. Like, checking the result of `.size()` or `.length()` against some computed limit. For example, `s.length() > -5` will in practice almost always yield (unexpected) `false`. That’s due to implicit promotion.
Happily most compilers will warn about comparing signed and unsigned.
But as of 2010 they generally don’t warn about computed numerical results.
Cheers, & thanks for commenting!,
– Alf