Part I of III: The LSP and value assignment.
Abstract:
Liskov’s principle of substitution, a principle of generally good class design for polymorphism, strongly implies the slicing behavior of C++ value assignment. Destination value slicing can cause a partial assignment, which can easily break data integrity. Three solutions are (1) static checking of assignment, (2) dynamic checking of assignment, and (3), generally easiest, prohibiting assignment, then possibly providing cloning functionality as an alternative.
Contents:
- > A short review of the LSP.
- > How the LSP (almost) forces slicing in C++.
- > The danger of slicing: type constraint violations.
- > The possible design alternatives wrt. providing value assignment.
- > How to provide assignment with STATIC CHECKING of type constraints.
- > How to provide assignment with DYNAMIC CHECKING of type constraints.
- > How to provide CLONING as an alternative to assignment.
> A short review of the LSP.
The Liskov substitution principle, often referred to as just the LSP, identifies one particular mathematical notion of subtype with the programming notion of class extension such as C++ class derivation.
It’s a principle of generally good class design for polymorphism, namely, that when a class Derived
extends a class Base
, then with respect to the properties guaranteed by the Base
specification,
code that refers to a Base
object should work just as well if that object is actually a Derived
object,
except for checking of dynamic type such as use of C++’s typeid
.
The original formulation, from Liskov’s keynote address at the 1987 OOPSLA conference:
– What is wanted here is something like the following substitution property: If for each object o1 of type S there is an object o2 of type T such that for all programs P defined in terms of T, the behavior of P is unchanged when o1 is substituted for o2 then S is a subtype of T.
This original tentative definition fails to accommodate direct checks of dynamic type such as C++’s typeid
and dynamic_cast
. Secondly, it fails to consider the mixing of S and T in the same program, such as with C++ slicing (I discuss that in the next section). And third, perhaps most crucially, it fails to make explicit that what it’s about is that subtype is a very desirable property of class extension in programming. Instead it reads more like a dry mathematical definition of the term subtype. But with just a bit of intelligence and positive interpretation applied to it it is, in my humble opinion, a quite good definition, pointing out the way.
A simplified formulation by Liskov and Wing in 1994:
– Subtype requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be provable for objects y of type S where S is a subtype of T.
This now famous and classic quote is, for example, the quote used in Wikipedia’s article on the LSP. A main improvement on Liskov’s original definition is that this is much more clearly a constraint on what a subtype can be, what a subtype is allowed to be, and not just one possibility of subtype, but this improved clarity is offset by a lack of existential/universal quantification (e.g. does “objects” mean “some objects” or “all objects”?). And, it still fails to accommodate direct checks of dynamic type, it appears to still fail to consider mixing of S and T, and even though the article contains many very suggestive examples, it still fails to make explicit that it’s about how to design good class extensions in programming.
Robert C. Martin a.k.a. uncle Bob’s more practically oriented 1996 explanation of the LSP from his Engineering Notebook column in the C++ Report, finally makes the connection to class extension explicit, and does properly consider mixing of types in the same program:
– Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.
The term subtype, which only served to anchor the mathematically oriented definitions, has been dropped entirely, simplifying things. And it seems that all that remains is to accommodate checks of dynamic type. I hope that my explanation which does that, given at the start above, can serve you well.
> How the LSP (almost) forces slicing in C++.
How does the Liskov substitution principle fare in a value-based statically typed language like C++?
Well, when I first learned about the C++ rules for assignment I did not think in terms of the LSP but more in terms of the low level memory layout. And from that point of view a value of a derived class would in general be larger, more bytes, than a value of the base class, and would just not fit into a base class variable. So, I just could not believe that the rules were like I read that they were. What, you can assign a value of a derived class (C++ class extension) to a variable of the base class? But there’s no room, for Pete’s sake!
Behind that thinking was an assumption that I failed to think critically about, namely an (incorrect) assumption that assignment should not lose information about the assigned value. Essentially this incorrect assumption resulted from an incorrect perception of the ideal requirement that after a
=
b;
an assert(
a
==
b
)
should hold. My perception was incorrect because with static type checking and non-virtual call binding, as C++ supports, this assertion can and will in general hold even when the assignment loses the derived class specific information about the b
value:
#include <assert.h> // assert #include <string> // std::string using namespace std; struct Person { string name; bool isEqualTo( Person const& other ) const { return (name == other.name); } Person( string const& _name ) : name( _name ) {} }; bool operator==( Person const& a, Person const& b ) { return a.Person::isEqualTo( b ); } struct Student : Person { int studentId; bool isEqualTo( Student const& other ) const { return (Person::isEqualTo( other ) && studentId == other.studentId); } Student( string const& _name, int const _studentId ) : Person( _name ), studentId( _studentId ) {} }; bool operator==( Student const& a, Student const& b ) { return a.Student::isEqualTo( b ); } int main() { Person a( "Hillary Clinton" ); Student b( "John Smith", 12345 ); a = b; // I once expected this assignment to NOT COMPILE! // Here there is no `a.studentId`. The Student value has been SLICED. assert( a == b ); // The Person class' isEqualTo still says "equal". }However, this C++ slicing behavior where only the
Person
base class part of theStudent
object is copied, which at first can be pretty counter-intuitive & surprising, is not at all an arbitrary language design choice: it is a logical consequence of the LSP and more lower level rules of C++.First, the LSP requires that in a sound design any
Student
object is aPerson
object, because classStudent
is an extension ofPerson
(modeling a subtype relationship). This means in particular that anyPerson
reference can be initialized to refer to aStudent
object. And secondly, tying in to that, in the absence of a user-declared copy assignment operator for a class T, C++ provides an automatically generated one. The automatically generated copy assignment operator for a class T takes an argument of type Tconst&
. And for T asPerson
, the LSP requires that in a sound design this reference-to-Person
formal argument can be initialized to refer to anyStudent
actual argument…The generated
operator=
copy assignment operator is a member function that also has a T&
result in order to support assignment chaining, but for clarity & also for good programming habits (even though this choice is probably unusual to many readers), let’s pretend that its result type isvoid
:[value_assignment.v2.cpp] ← Just click the link to compile and run the program#include <assert.h> // assert #include <string> // std::string using namespace std; struct Person { string name; void setValue( Person const& other ) { name = other.name; } void operator=( Person const& other ) { setValue( other ); } Person( string const& _name ) : name( _name ) {} }; struct Student : Person { int studentId; void setValue( Student const& other ) { Person::setValue( other ); // Copies the name part. studentId = other.studentId; // Copies the id part. } void operator=( Student const& other ) { setValue( other ); } Student( string const& _name, int const _studentId ) : Person( _name ), studentId( _studentId ) {} }; int main() { Person a( "Hillary Clinton" ); Student b( "John Smith", 12345 ); a = b; // Only the Person::operator= matches the arguments. // Here there is no `a.studentId`. The Student value has been SLICED. // The executed operator=() function copies only the name attribute. assert( a.name == b.name ); }With the above simulation of what the C++ rules generate for
=
, hopefully it’s clear that slicing can occur in just about any programming language, even one that’s wholly based on reference semantics.If you don’t see that immediately, then take some time to study the above code, to make sure of how it works.
> The danger of slicing: type constraint violations.
In order to make the
Person
andStudent
classes more safe, the programmer now makes the dynamic state (member variables)private
, and addspublic
inspector member functionsPerson::name
andStudent::studentId
. The main idea is to ensure data integrity by restricting the possible operations to a defined set of operations for each type. The access restrictions are intended to prevent client code from e.g. inadvertently changing the name of a student without updating the student id correspondingly.I.e., the design with the access restrictions is intended to enforce a very strong coupling between name and student id.
Unfortunately, a copy assignment
operator=
with unrestricted access is among the operations that C++ provides automatically, by default…[value_assignment.v3.cpp] ← Just click the link to compile and run the program#include <string> #include <iostream> using namespace std; class Person { private: string name_; public: string name() const { return name_; } Person( string const& name ) : name_( name ) {} }; class Student : public Person { private: int studentId_; public: int studentId() const { return studentId_; } Student( string const& name, int const studentId ) : Person( name ), studentId_( studentId ) {} }; int main() { Person a( "Hillary Clinton" ); Student b( "John Smith", 12345 ); Person& bAsPerson = b; bAsPerson = a; cout << "Name: " << b.name() << ". Student id: " << b.studentId() << "." << endl; }Result:
Name: Hillary Clinton. Student id: 12345.Uh oh. Hillary Clinton has suddenly, perhaps inexplicably, become a student at some university. It’s not that she doesn’t have anything more to learn, or that she has no time to waste on studying, but the access declarations were intended to prevent this!
The type constraint that is violated is one that is not explicitly expressed in the code above, namely that after construction of a
Student
value, it should not be possible to modify parts of the value. For example, it should ideally not be possible to create new values via partial assignment. Yet that’s what happened above; if the first slicing example, which was a bit counter-intuitive, can be called source value slicing, then what we have here, which breaks critical assumptions about data integrity, may be called destination value slicing.> The possible design alternatives wrt. providing value assignment.
There are two main design alternatives with respect to assignment: prohibit assignment, or permit assignment.
When assignment is permitted then for a class that does permit extension (having derived classes) there are further sub-alternatives: no checking of type constraints, dynamic (run time) checking of type constraints, and static (compile time) checking of type constraints:
Assignment: | Checking: | ||
---|---|---|---|
✗ nope | ← the practical choice | ||
✓ yes | none | ← the previous section’s example | |
✓ yes | dynamic | ||
✓ yes | static |
The previous section’s example program was in the category of permitting assignment, with no checking.
> How to provide assignment with STATIC CHECKING of type constraints.
The LSP practically forces slicing for value assignment, which we’ve now seen generally opens up for type constraint violations, because extensions of the assignable class generally can exist and an instance of such a class can be used as the base class’s assignment operator’s this
argument. In order to prevent that calamity with compile time enforcement, it’s necessary that each assignable class is final, i.e. that it cannot be further extended. Happily C++11 provides a simple way to ensure this, namely simply by declaring the class final
.
One little problem with that, though, is that none of the compilers that I have installed yet support the C++11 final
feature, so in the code below there is a long and rather awkward C++03-compatible workaround…
However one implements the final
functionality, the main idea is then to let the main hierarchy of classes be non-assignable, but to provide assignable final
leaf-classes whose assignment operators will only accept actual arguments of those exact classes:
// Remove "x" to invoke feature. #define xTEST_BAD_ASSIGN #if defined( CPP11 ) # define OVERRIDE override # define NOEXCEPT noexcept #else # define OVERRIDE # define NOEXCEPT throw() #endif //--------------------------------------------------------------------------- #include <algorithm> // std::swap #include <iostream> // std::wcout, std::endl #include <string> // std::string #include <utility> // std::forward using namespace std; #if defined( CPP11 ) template< class Base > class Assignable_ final : public Base { public: using Base::Base; using Base::operator=; }; #else template< class Base > class Assignable_; class IsFinalAssignable { template< class > friend class Assignable_; private: IsFinalAssignable() {} }; template< class Base > class Assignable_ : private virtual IsFinalAssignable , public Base { public: Assignable_(): Base() {} template< class Arg1 > Assignable_( Arg1 const& a1 ) : Base( a1 ) {} template< class Arg1, class Arg2 > Assignable_( Arg1 const& a1, Arg2 const& a2 ) : Base( a1, a2 ) {} using Base::operator=; }; #endif class Person { private: string name_; protected: void swapWith( Person& other ) NOEXCEPT { swap( name_, other.name_ ); } void operator=( Person other ) { swapWith( other ); } public: typedef Assignable_<Person> Assignable; string name() const { return name_; } Person( string const& name ) : name_( name ) {} }; class Student : public Person { private: int studentId_; protected: void swapWith( Student& other ) NOEXCEPT { Person::swapWith( other ); swap( studentId_, other.studentId_ ); } void operator=( Student other ) { swapWith( other ); } public: typedef Assignable_<Student> Assignable; int studentId() const { return studentId_; } Student( string const& name, int const studentId ) : Person( name ), studentId_( studentId ) {} }; #ifdef TEST_EXTENSION class Extension : public Person::Assignable { public: Extension(): Person::Assignable( "Somebody" ) {} }; #endif wostream& operator<<( wostream& stream, string const& s ) { return (stream << s.c_str()); } int main() { Person::Assignable a( "Hillary Clinton" ); Student::Assignable b1( "John Smith", 12345 ); Student b2( "Mary Jones", 88888 ); Person& bAsPerson = b1; // OK. wcout << "Name: " << b1.name() << ". Student id: " << b1.studentId() << "." << endl; #ifdef TEST_BAD_ASSIGN bAsPerson = a; // Does not compile. 🙂 #else b1 = b2; // Good/allowed assignment. #endif wcout << "Name: " << b1.name() << ". Student id: " << b1.studentId() << "." << endl; }It may perhaps look dangerous that
Assignable_<
T>::operator=
actually is the T::operator=
, which usually takes a Tconst&
formal argument, since that allows slicing, and hey, wasn’t slicing the problem that this was intended to avoid?However, it only permits source value slicing, which is only a problem in that it is a bit counter-intuitive – and may be useful to the client code. The left hand side of the assignment is restricted to
Assignable_<
T>&
. I.e., the destination value slicing that could cause dangerous partial assignments that could violate the data integrity, is avoided.> How to provide assignment with DYNAMIC CHECKING of type constraints.
Dynamic checking of assignment is easier to implement than static checking, yields fewer classes (because one doesn’t have to provide otherwise unneeded leaf classes), and provides more uniform properties for the classes. But this is all paid for by a huge cost: that error detection is moved to run-time. That is the opposite of what C++ is all about relative to C, namely better static type checking and early compile time error detection, and it means a lot more work in systematic testing.
The dynamic checking is easier to implement because all that’s needed is to use virtual assignment, with a single override in each directly derived class. If that override is ever called, then there is a destination value slicing in the making, and so a logic error should be reported. The best way to report a logic error depends, of course, but in general it’s a good idea to terminate because very little can then be relied on.
For simplicity I let the example program below terminate via an
assert
:[value_assignment.v5.cpp] ← Just click the link to compile and run the program// Remove "x" to invoke feature. #define xTEST_BAD_ASSIGN #if defined( CPP11 ) # define OVERRIDE override # define NOEXCEPT noexcept #else # define OVERRIDE # define NOEXCEPT throw() #endif //--------------------------------------------------------------------------- #include <assert.h> // assert #include <algorithm> // std::swap #include <exception> // std::terminate #include <string> // std::string #include <iostream> // std::wcout, std::wcerr, std::endl using namespace std; #define FATAL_ERROR( s ) \ assert( !s ); wcerr << s << endl; terminate(); class Person { private: string name_; public: string name() const { return name_; } virtual void swapWith( Person& other ) NOEXCEPT { using std::swap; swap( name_, other.name_ ); } void operator=( Person other ) { swapWith( other ); } Person( string const& name ) : name_( name ) {} }; class Student : public Person { private: int studentId_; virtual void swapWith( Person& ) NOEXCEPT OVERRIDE { FATAL_ERROR( "Student::swapWith( Person& ): dest. value slicing" ); } public: int studentId() const { return studentId_; } virtual void swapWith( Student& other ) NOEXCEPT { Person::swapWith( other ); using std::swap; swap( studentId_, other.studentId_ ); } void operator=( Student other ) { swapWith( other ); } Student( string const& name, int const studentId ) : Person( name ), studentId_( studentId ) {} }; wostream& operator<<( wostream& stream, string const& s ) { return (stream << s.c_str()); } int main() { Person a( "Hillary Clinton" ); Student b1( "John Smith", 12345 ); Student b2( "Mary Jones", 88888 ); Person& bAsPerson = b1; // OK. wcout << "Name: " << b1.name() << ". Student id: " << b1.studentId() << "." << endl; #ifdef TEST_BAD_ASSIGN bAsPerson = a; // Terminates at run-time (i.e. crash). #else b1 = b2; // Good/allowed assignment. #endif wcout << "Name: " << b1.name() << ". Student id: " << b1.studentId() << "." << endl; }> How to provide CLONING as an alternative to assignment.
The C++11 way to absolutely remove copy assignment is to write
=
delete
after a copy assignment declaration. For good measure it can be madeprivate
. In C++03, or if your compiler does not yet support=
delete
, do make itprivate
and then just remove the=
delete
and refrain from implementing the operator.Still, it might desirable to have a way to copy instances, and for that you can provide a safe cloning method, a method that creates a dynamically allocated copy of the object that it’s called on. In order to be safe it should return a smart pointer such as C++11
std::unique_ptr<
T>
. However, by the rules of C++ a method returningunique_ptr<Derived>
will not be regarded as an override of a method returningunique_ptr<Base>
– i.e. C++ does not support so called covariance for such result types – so some trickery is necessary.A simple solution is to let the
public
interface have a non-virtual cloning method returningunique_ptr<
T>
, and to have aprivate
orprotected
virtual cloning method that returns a T*
:[value_assignment.v6.cpp] ← Just click the link to compile and run the program// Remove "x" to invoke feature. #define xTEST_ABSTRACT_CLONE #if defined( CPP11 ) # define OVERRIDE override # define IS_DELETED = delete # include <memory> template< class Type > struct Ownership { using Ptr = std::unique_ptr<Type>; }; #else # define OVERRIDE # define IS_DELETED # include <memory> // Note: std::auto_ptr is deprecated. template< class Type > struct Ownership { typedef std::auto_ptr<Type> Ptr; }; #endif //--------------------------------------------------------------------------- #include <assert.h> // assert #include <iostream> // std::wcout, std::wcerr, std::endl #include <memory> // std::unique_ptr #include <string> // std::string #include <typeinfo> // std::typeinfo, i.e. typeid usage. using namespace std; class Person { private: string name_; Person& operator=( Person const& ) IS_DELETED; virtual Person* virtualClone() const { assert( typeid( *this ) == typeid( Person ) ); return new Person( *this ); } public: string name() const { return name_; } Ownership<Person>::Ptr clone() const { return Ownership<Person>::Ptr( virtualClone() ); } Person( string const& name ) : name_( name ) {} }; class Student : public Person { private: int studentId_; Student& operator=( Student const& ) IS_DELETED; virtual Student* virtualClone() const OVERRIDE { assert( typeid( *this ) == typeid( Student ) ); return new Student( *this ); } public: int studentId() const { return studentId_; } Ownership<Person>::Ptr clone() const { return Ownership<Person>::Ptr( virtualClone() ); } Student( string const& name, int const studentId ) : Person( name ), studentId_( studentId ) {} }; wostream& operator<<( wostream& stream, string const& s ) { return (stream << s.c_str()); } int main() { Person a( "Hillary Clinton" ); Student b1( "John Smith", 12345 ); Student b2( "Mary Jones", 88888 ); Person& bAsPerson = b1; // OK. Ownership<Person>::Ptr pClone; wcout << "Name: " << b1.name() << ". Student id: " << b1.studentId() << "." << endl; #ifdef TEST_ABSTRACT_CLONE pClone = bAsPerson.clone(); // OK, clones John Smith the Student. #else pClone = b2.clone(); // More obviously (?) OK, clones Mary. #endif Student& studentRef = dynamic_cast<Student&>( *pClone ); wcout << "Name: " << studentRef.name() << ". Student id: " << studentRef.studentId() << "." << endl; }This works nicely, but the code above has much undesirable redundancy. In each class there is a mindless repetition of the definitions of
virtualClone
andclone
. As of 2012 this is known as a violation of the DRY principle: Don’t Repeat Yourself.In an earlier blog posting I discussed three ways to abstract up a common implementation of cloning, namely dominance in sideways inheritance, middleman class inheritance, and a macro. I then recommended the macro approach, for the good reasons of being simplest and smallest. However, with the constructor inheritance support of C++11 the middleman class approach – a class inserted in the inheritance chain – becomes much more attractive.
Since none of my installed compilers support C++11 constructor inheritance, in the code below I just use forwarding of arguments. And I don’t even use the C++11 support for argument forwarding, since as of this writing Visual C++ (version 10.0 and 11 beta) does not yet support that. But in your mind you can envision a `using Base::Base;` there instead of the primitive argument forwarding constructors:
[value_assignment.v7.cpp] ← Just click the link to compile and run the program// Remove "x" to invoke feature. #define xTEST_ABSTRACT_CLONE #if defined( CPP11 ) # define OVERRIDE override # define IS_DELETED = delete # include <memory> template< class Type > struct Ownership { using Ptr = std::unique_ptr<Type>; }; #else # define OVERRIDE # define IS_DELETED # include <memory> // Note: std::auto_ptr is deprecated. template< class Type > struct Ownership { typedef std::auto_ptr<Type> Ptr; }; #endif //--------------------------------------------------------------------------- #include <assert.h> // assert #include <iostream> // std::wcout, std::wcerr, std::endl #include <memory> // std::unique_ptr #include <string> // std::string #include <typeinfo> // std::typeinfo, i.e. typeid usage. using namespace std; class Cloneable { private: virtual Cloneable* virtualClone() const = 0; protected: virtual ~Cloneable() {} }; template< class DerivedClass, class BaseClass > class WithCloningOf_ : public BaseClass { private: WithCloningOf_& operator=( WithCloningOf_ const& ) IS_DELETED; virtual WithCloningOf_* virtualClone() const { assert( typeid( *this ) == typeid( DerivedClass ) ); return new DerivedClass( *static_cast<DerivedClass const*>( this ) ); } DerivedClass* derivedClassClone() const { return static_cast<DerivedClass*>( virtualClone() ); } public: typename Ownership<DerivedClass>::Ptr clone() const { return typename Ownership<DerivedClass>::Ptr( derivedClassClone() ); } WithCloningOf_(): BaseClass() {} template< class Arg1 > WithCloningOf_( Arg1 const& a1 ) : BaseClass( a1 ) {} template< class Arg1, class Arg2 > WithCloningOf_( Arg1 const& a1, Arg2 const& a2 ) : BaseClass( a1, a2 ) {} }; class Person : public WithCloningOf_<Person, Cloneable> { typedef WithCloningOf_<Person, Cloneable> Base; private: string name_; Person& operator=( Person const& ) IS_DELETED; public: string name() const { return name_; } Person( string const& name ) : name_( name ) {} }; class Student : public WithCloningOf_<Student, Person> { typedef WithCloningOf_<Student, Person> Base; private: int studentId_; Student& operator=( Student const& ) IS_DELETED; public: int studentId() const { return studentId_; } Student( string const& name, int const studentId ) : Base( name ), studentId_( studentId ) {} }; wostream& operator<<( wostream& stream, string const& s ) { return (stream << s.c_str()); } int main() { Person a( "Hillary Clinton" ); Student b1( "John Smith", 12345 ); Student b2( "Mary Jones", 88888 ); Person& bAsPerson = b1; // OK. Ownership<Person>::Ptr pClone; wcout << "Name: " << b1.name() << ". Student id: " << b1.studentId() << "." << endl; #ifdef TEST_ABSTRACT_CLONE pClone = bAsPerson.clone(); // OK, clones John Smith the Student. #else pClone = b2.clone(); // More obviously (?) OK, clones Mary. #endif Student& studentRef = dynamic_cast<Student&>( *pClone ); wcout << "Name: " << studentRef.name() << ". Student id: " << studentRef.studentId() << "." << endl; }With this approach there are two ways to copy an object: copy construction, and cloning.
Neither of these operations can produce the effect of a partial assignment, i.e., they’re safe. Cloning doesn’t slice at all, as the above two programs illustrate. Copy construction may incur a source value slicing, but that’s safe: it’s destination value slicing that’s dangerous, and that you can avoid by using e.g. the techniques I’ve exemplified in this article.
The plan, such as it is, is to discuss covariance and contravariance in the next installment, and then in a third article, C++ construction (a problem I bumped into while implementing some thread support). However, what will be, will be. We shall see. 🙂
Cheers, & enjoy!,
– Alf
Good explanation, but the code formatting is very bad. Bad formatting starts after the first code snippet
sorry, wordpress has evidently changed its rules about how it replaces the posted HTML with its own idea of what it should be. argh! i am sorry i don’t have the time to fix this (as you can see i haven’t had the time to continue the series, as i wanted to)