Unions
Unions, a low-level feature inherited from the C programming language, allow more than one member to share the same memory area.
Unions are very similar to structs with the following main differences:
- Unions are defined by the union keyword.
- The members of a union are not independent; they share the same memory area.
Just like structs, unions can have member functions as well.
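As a minimal sketch of that last point, the following hypothetical union (the names Number and isEvenInt are made up for illustration) defines a member function that reads only its int member:

```d
import std.stdio;

// Hypothetical union used only for this illustration
union Number {
    int i;
    double d;

    // A member function of the union; it reads only the 'int' member
    bool isEvenInt() const {
        return (i % 2) == 0;
    }
}

void main() {
    auto n = Number(42);       // initializes the first member, 'i'
    writeln(n.isEvenInt());    // 'i' is the valid member here
}
```

The function is meaningful only while the int member is the one in use; nothing in the union itself enforces that.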
The examples below produce different results depending on whether they are compiled in a 32-bit or a 64-bit environment. To avoid confusing results, please use the -m32 compiler switch when compiling the examples in this chapter. Otherwise, your results may differ from mine due to alignment, which we will see in a later chapter.
Naturally, struct objects are as large as necessary to accommodate all of their members:
// Note: Please compile with the -m32 compiler switch
struct S {
    int i;
    double d;
}

// ...

    writeln(S.sizeof);
Since int is 4 bytes long and double is 8 bytes long, the size of that struct is the sum of their sizes:
12
In contrast, the size of a union with the same members is only as large as its largest member:
union U {
    int i;
    double d;
}

// ...

    writeln(U.sizeof);
The 4-byte int and the 8-byte double share the same area. As a result, the size of the entire union is the same as its largest member:
8
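The shared layout can also be checked at compile time. The following sketch repeats the definition of U and uses the offsetof property to confirm that both members start at the beginning of the object. Unlike the struct size above, these particular checks are not expected to depend on the -m32 switch:

```d
union U {
    int i;
    double d;
}

// Every member of a union starts at the beginning of the object
static assert(U.i.offsetof == 0);
static assert(U.d.offsetof == 0);

// The union is as large as its largest member
static assert(U.sizeof == double.sizeof);

void main() {
    // Nothing to do at run time; the checks above are
    // evaluated by the compiler.
}
```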
Unions are not a memory-saving feature. It is impossible to fit multiple pieces of data into the same memory location. The purpose of a union is to use the same area for different types of data at different times. Only one of the members can be used reliably at one time. However, although doing so may not be portable to different platforms, union members can be used for accessing fragments of other members.
One of the examples below takes advantage of typeid to disallow access to members other than the one that is currently valid.
The following diagram shows how the 8 bytes of the union above are shared by its members:
       0      1      2      3      4      5      6      7
───┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬───
   │<───  4 bytes for int  ───>│                           │
   │<──────────────── 8 bytes for double ─────────────────>│
───┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴───
Either all of the 8 bytes are used for the double member, or only the first 4 bytes are used for the int member and the other 4 bytes are unused.
Unions can have as many members as needed. All of the members would share the same memory location.
The fact that the same memory location is used for all of the members can have surprising effects. For example, let's initialize a union object by its int member and then access its double member:
    auto u = U(42);    // initializing the int member
    writeln(u.d);      // accessing the double member
Initializing the int member by the value 42 sets just the first 4 bytes, and this affects the double member in an unpredictable way:
2.07508e-322
Depending on the endianness of the microprocessor, the 4 bytes may be arranged in memory as 0|0|0|42, 42|0|0|0, or in some other order. For that reason, the value of the double member may differ from one platform to another.
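To see where a value like the one above comes from, here is a sketch that assumes a little-endian platform with IEEE 754 doubles (the name smallest is introduced only for this illustration). On such a platform the int occupies the low-order bytes of the double, so the bit pattern of 42 is read as 42 times the smallest positive subnormal double, which is 2 to the power of -1074:

```d
import std.stdio;

union U {
    int i;
    double d;
}

void main() {
    auto u = U(42);    // sets the first 4 bytes; the rest are zero

    // Assumption for this sketch: 'smallest' is the smallest
    // positive subnormal double (2^-1022 times 2^-52)
    enum double smallest = double.min_normal * double.epsilon;

    // The check applies only to little-endian platforms
    version (LittleEndian) {
        assert(u.d == 42 * smallest);
    }

    writeln(u.d);
}
```

On a big-endian platform the same four bytes would land in the high-order part of the double and the check would not hold, which is why it is guarded by version (LittleEndian).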
Anonymous unions
Anonymous unions specify what members of a user-defined type share the same area:
struct S {
    int first;

    union {
        int second;
        int third;
    }
}

// ...

    writeln(S.sizeof);
The last two members of S share the same area. So, the size of the struct is a total of two ints: 4 bytes needed for first and another 4 bytes to be shared by second and third:
8
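A small sketch can confirm the sharing. It repeats the definition of S and compares member offsets at compile time; because second and third have the same type and occupy the same bytes, a value written through one can be read through the other:

```d
import std.stdio;

struct S {
    int first;

    union {
        int second;
        int third;
    }
}

// 'second' and 'third' start at the same offset; 'first' does not
static assert(S.second.offsetof == S.third.offsetof);
static assert(S.first.offsetof != S.second.offsetof);

void main() {
    S s;
    s.second = 7;
    writeln(s.third);    // reads the same 4 bytes as 'second'
}
```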
Dissecting other members
Unions can be used for accessing individual bytes of variables of other types. For example, they make it easy to access the 4 bytes of an IPv4 address individually.
The 32-bit value of the IPv4 address and a fixed-length array can be defined as the two members of a union:
union IpAddress {
    uint value;
    ubyte[4] bytes;
}
The members of that union would share the same memory area as in the following figure:
        0          1          2          3
───┬──────────┬──────────┬──────────┬──────────┬───
   │ <───── 32 bits of the IPv4 address ─────> │
   │ bytes[0] │ bytes[1] │ bytes[2] │ bytes[3] │
───┴──────────┴──────────┴──────────┴──────────┴───
For example, when an object of this union is initialized by 0xc0a80102 (the value that corresponds to the dotted form 192.168.1.2), the elements of the bytes array would automatically have the values of the four octets:
import std.stdio;

void main() {
    auto address = IpAddress(0xc0a80102);
    writeln(address.bytes);
}
When run on a little-endian system, the octets would appear in reverse of their dotted form:
[2, 1, 168, 192]
The reverse order of the octets is another example of how accessing different members of a union may produce unpredictable results. This is because the behavior of a union is guaranteed only if that union is used through just one of its members. There are no guarantees on the values of the members other than the one that the union has been initialized with.
Although it is not directly related to this chapter, bswap from the core.bitop module is useful in dealing with endianness issues. bswap returns its parameter after swapping its bytes. Also taking advantage of the endian value from the std.system module, the octets of the previous IPv4 address can be printed in the expected order after swapping its bytes:
import std.system;
import core.bitop;

// ...

    if (endian == Endian.littleEndian) {
        address.value = bswap(address.value);
    }
The output:
[192, 168, 1, 2]
Please take the IpAddress type as a simple example; in general, it would be better to consider a dedicated networking module for non-trivial programs.
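As an alternative to the union and bswap combination, the octets can be obtained in a fixed order without a union at all. The following sketch uses nativeToBigEndian from the std.bitmanip module, which returns the bytes of its argument with the most significant byte first on any platform. The function name dottedForm is hypothetical and exists only for this illustration:

```d
import std.bitmanip : nativeToBigEndian;
import std.format : format;
import std.stdio;

// Hypothetical helper: formats a 32-bit IPv4 value in dotted form
string dottedForm(uint address) {
    // Most significant byte first, regardless of the platform
    ubyte[4] octets = nativeToBigEndian(address);

    // "%(%d.%)" formats each element with %d, separated by dots
    return format("%(%d.%)", octets[]);
}

void main() {
    writeln(dottedForm(0xc0a80102));    // prints 192.168.1.2
}
```

This avoids relying on how the members of a union overlap, at the cost of no longer demonstrating unions.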
Examples
Communication protocol
In some protocols like TCP/IP, the meanings of certain parts of a protocol packet depend on a specific value inside the same packet. Usually, it is a field in the header of the packet that determines the meanings of successive bytes. Unions can be used for representing such protocol packets.
The following design represents a protocol packet that has two kinds:
struct Host {
    // ...
}

struct ProtocolA {
    // ...
}

struct ProtocolB {
    // ...
}

enum ProtocolType { A, B }

struct NetworkPacket {
    Host source;
    Host destination;
    ProtocolType type;

    union {
        ProtocolA aParts;
        ProtocolB bParts;
    }

    ubyte[] payload;
}
The struct above can use its type member to determine whether aParts or bParts of the union should be used.
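The following sketch shows one way such a dispatch might look. It is a simplified stand-in for the design above, not a real protocol: the SimplePacket type, the sequence and checksum fields, and the describe function are all hypothetical names introduced for illustration.

```d
import std.format : format;
import std.stdio;

// Hypothetical protocol-specific parts
struct ProtocolA {
    ushort sequence;
}

struct ProtocolB {
    uint checksum;
}

enum ProtocolType { A, B }

// Simplified stand-in for NetworkPacket
struct SimplePacket {
    ProtocolType type;

    union {
        ProtocolA aParts;
        ProtocolB bParts;
    }
}

// Reads only the union member that 'type' declares to be valid
string describe(const SimplePacket packet) {
    final switch (packet.type) {
    case ProtocolType.A:
        return format("A, sequence %s", packet.aParts.sequence);

    case ProtocolType.B:
        return format("B, checksum %s", packet.bParts.checksum);
    }
}

void main() {
    SimplePacket packet;
    packet.type = ProtocolType.B;
    packet.bParts = ProtocolB(0xabcd);

    writeln(describe(packet));    // prints B, checksum 43981
}
```

Using final switch means that the compiler rejects the code if a new ProtocolType value is added without a corresponding case. Keeping type and the union consistent remains the programmer's responsibility.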
Discriminated union
A discriminated union is a data structure that adds type safety to a regular union. Unlike a union, it does not allow accessing the members other than the one that is currently valid.
The following is a simple discriminated union type that supports only two types: int and double. In addition to a union to store the data, it maintains a TypeInfo member to know which one of the two union members is valid.
import std.stdio;
import std.exception;

struct Discriminated {
private:

    TypeInfo validType_;

    union {
        int i_;
        double d_;
    }

public:

    this(int value) {
        // This is a call to the property function below:
        i = value;
    }

    // Setter for 'int' data
    void i(int value) {
        i_ = value;
        validType_ = typeid(int);
    }

    // Getter for 'int' data
    int i() const {
        enforce(validType_ == typeid(int),
                "The data is not an 'int'.");
        return i_;
    }

    this(double value) {
        // This is a call to the property function below:
        d = value;
    }

    // Setter for 'double' data
    void d(double value) {
        d_ = value;
        validType_ = typeid(double);
    }

    // Getter for 'double' data
    double d() const {
        enforce(validType_ == typeid(double),
                "The data is not a 'double'.");
        return d_;
    }

    // Identifies the type of the valid data
    const(TypeInfo) type() const {
        return validType_;
    }
}

unittest {
    // Let's start with 'int' data
    auto var = Discriminated(42);

    // The type should be reported as 'int'
    assert(var.type == typeid(int));

    // 'int' getter should work
    assert(var.i == 42);

    // 'double' getter should fail
    assertThrown(var.d);

    // Let's replace 'int' with 'double' data
    var.d = 1.5;

    // The type should be reported as 'double'
    assert(var.type == typeid(double));

    // Now 'double' getter should work ...
    assert(var.d == 1.5);

    // ... and 'int' getter should fail
    assertThrown(var.i);
}
This is just an example. You should consider using Algebraic and Variant from the std.variant module in your programs. Additionally, this code could take advantage of other features of D like templates and mixins to reduce code duplication.
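Besides std.variant, Phobos releases from 2.097 onward also provide a ready-made discriminated union in the std.sumtype module. The following is a minimal sketch under the assumption that your compiler ships that module; the alias Number and the function describe are hypothetical names for this illustration:

```d
import std.stdio;
import std.sumtype;

// Hypothetical alias: a value that is either an int or a double
alias Number = SumType!(int, double);

// 'match' calls the handler that corresponds to the stored type
string describe(Number n) {
    return n.match!(
        (int i)    => "int",
        (double d) => "double"
    );
}

void main() {
    Number n = 42;
    writeln(describe(n));    // prints int

    n = 1.5;
    writeln(describe(n));    // prints double
}
```

Unlike the hand-written type above, which throws at run time when the wrong getter is called, match requires a handler for every possible type and reports a missing one at compile time.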
Regardless of the data that is being stored, there is only one Discriminated type. (An alternative template solution could take the data type as a template parameter, in which case each instantiation of the template would be a distinct type.) For that reason, it is possible to have an array of Discriminated objects, effectively enabling a collection where elements can be of different types. However, the user must still know the valid member before accessing it. For example, the following function determines the actual type of the valid data with the type property of Discriminated:
void main() {
    Discriminated[] arr = [ Discriminated(1), Discriminated(2.5) ];

    foreach (value; arr) {
        if (value.type == typeid(int)) {
            writeln("Working with an 'int' : ", value.i);

        } else if (value.type == typeid(double)) {
            writeln("Working with a 'double': ", value.d);

        } else {
            assert(0);
        }
    }
}
Working with an 'int' : 1
Working with a 'double': 2.5