Pointers
Pointers are variables that provide access to other variables. The value of a pointer is the address of the variable that it provides access to.
Pointers can point at any type of variable, object, and even other pointers. In this chapter, I will refer to all of these simply as variables.
Pointers are low level features of microprocessors. They are an important part of system programming.
The syntax and semantics of pointers in D are inherited directly from C. Although pointers are notoriously the most difficult feature of C to comprehend, they should not be as difficult in D. This is because other features of D that are semantically close to pointers are more useful in situations where pointers would have to be used in other languages. When the ideas behind pointers are already understood from those other features of D, pointers should be easier to grasp.
The short examples throughout the most of this chapter are decidedly simple. The programs at the end of the chapter will be more realistic.
The names like ptr
(short for "pointer") that I have used in these examples should not be considered as useful names in general. As always, names must be chosen to be more meaningful and explanatory in actual programs.
The concept of a reference
Although we have encountered references many times in the previous chapters, let's summarize this concept one more time.
The ref
variables in foreach
loops
As we have seen in the foreach
Loop chapter, normally the loop variables are copies of elements:
import std.stdio; void main() { int[] numbers = [ 1, 11, 111 ]; foreach (number; numbers) { number = 0; // ← the copy changes, not the element } writeln("After the loop: ", numbers); }
The number
that gets assigned 0 each time is a copy of one of the elements of the array. Modifying that copy does not modify the element:
After the loop: [1, 11, 111]
When the actual elements need to be modified, the foreach
variable must be defined as ref
:
foreach (ref number; numbers) { number = 0; // ← the actual element changes }
This time number
is a reference to an actual element in the array:
After the loop: [0, 0, 0]
ref
function parameters
As we have seen in the Function Parameters chapter, the parameters of value types are normally copies of the arguments:
import std.stdio; void addHalf(double value) { value += 0.5; // ← Does not affect 'value' in main } void main() { double value = 1.5; addHalf(value); writeln("The value after calling the function: ", value); }
Because the function parameter is not defined as ref
, the assignment inside the function affects only the local variable there. The variable in main()
is not affected:
The value after calling the function: 1.5
The ref
keyword would make the function parameter a reference to the argument:
void addHalf(ref double value) { value += 0.5; }
This time the variable in main()
gets modified:
The value after calling the function: 2
Reference types
Some types are reference types. Variables of such types provide access to separate variables:
- Class variables
- Slices
- Associative arrays
We have seen this distinction in the Value Types and Reference Types chapter. The following example demonstrates reference types by two class
variables:
import std.stdio; class Pen { double ink; this() { ink = 15; } void use(double amount) { ink -= amount; } } void main() { auto pen = new Pen; auto otherPen = pen; // ← Now both variables provide // access to the same object writefln("Before: %s %s", pen.ink, otherPen.ink); pen.use(1); // ← the same object is used otherPen.use(2); // ← the same object is used writefln("After : %s %s", pen.ink, otherPen.ink); }
Because classes are reference types, the class variables pen
and otherPen
provide access to the same Pen
object. As a result, using either of those class variables affects the same object:
Before: 15 15 After : 12 12
That single object and the two class variables would be laid out in memory similar to the following figure:
(The Pen object) pen otherPen ───┬───────────────────┬─── ───┬───┬─── ───┬───┬─── │ ink │ │ o │ │ o │ ───┴───────────────────┴─── ───┴─│─┴─── ───┴─│─┴─── ▲ │ │ │ │ │ └────────────────────┴────────────┘
References point at actual variables as pen
and otherPen
do above.
Programming languages implement the reference and pointer concepts by special registers of the microprocessor, which are specifically for pointing at memory locations.
Behind the scenes, D's higher-level concepts (class variables, slices, associative arrays, etc.) are all implemented by pointers. As these higher-level features are already efficient and convenient, pointers are rarely needed in D programming. Still, it is important for D programmers to understand pointers well.
Syntax
The pointer syntax of D is mostly the same as in C. Although this can be seen as an advantage, the peculiarities of C's pointer syntax are necessarily inherited by D as well. For example, the different meanings of the *
character may be confusing.
With the exception of void
pointers, every pointer is associated with a certain type and can point at only variables of that specific type. For example, an int
pointer can only point at variables of type int
.
The pointer definition syntax consists of the associated type and a *
character:
type_to_point_at * name_of_the_pointer_variable;
Accordingly, a pointer variable that would be pointing at int
variables would be defined like this:
int * myPointer;
The *
character in that syntax may be pronounced as "pointer". So, the type of myPointer
above is an "int pointer". The spaces before and after the *
character are optional. The following syntaxes are common as well:
int* myPointer; int *myPointer;
When it is specifically a pointer type that is being mentioned as in "int pointer", it is common to write the type without any spaces as in int*
.
Pointer value and the address-of operator &
Being variables themselves pointers have values as well. The default value of a pointer is the special value null
, which means that the pointer is not pointing at any variable yet (i.e. does not provide access to any variable).
To make a pointer provide access to a variable, the value of the pointer must be set to the address of that variable. The pointer starts pointing at the variable that is at that specific address. From now on, I will call that variable the pointee.
The &
operator which we have used many times before with readf
has also been briefly mentioned in the Value Types and Reference Types chapter. This operator produces the address of the variable that is written after it. Its value can be used when initializing a pointer:
int myVariable = 180; int * myPointer = &myVariable;
Initializing myPointer
by the address of myVariable
makes myPointer
point at myVariable
.
The value of the pointer is the same as the address of myVariable
:
writeln("The address of myVariable: ", &myVariable); writeln("The value of myPointer : ", myPointer);
The address of myVariable: 7FFF2CE73F10 The value of myPointer : 7FFF2CE73F10
Note: The address value is likely to be different every time the program is started.
The following figure is a representation of these two variables in memory:
myVariable at myPointer at address 7FFF2CE73F10 some other address ───┬────────────────┬─── ───┬────────────────┬─── │ 180 │ │ 7FFF2CE73F10 │ ───┴────────────────┴─── ───┴────────│───────┴─── ▲ │ │ │ └─────────────────────────────┘
The value of myPointer
is the address of myVariable
, conceptually pointing at the variable that is at that location.
Since pointers are variables as well, the &
operator can produce the address of the pointer as well:
writeln("The address of myPointer : ", &myPointer);
The address of myPointer : 7FFF2CE73F18
Since the difference between the two addresses above is 8, remembering that an int
takes up 4 bytes, we can deduce that myVariable
and myPointer
are 4 bytes apart in memory.
After removing the arrow that represented the concept of pointing at, we can picture the contents of memory around these addresses like this:
7FFF2CE73F10 7FFF2CE73F14 7FFF2CE73F18 : : : : ───┬────────────────┬────────────────┬────────────────┬─── │ 180 │ (unused) │ 7FFF2CE73F10 │ ───┴────────────────┴────────────────┴────────────────┴───
The names of variables, functions, classes, etc. and keywords are not parts of programs of compiled languages like D. The variables that have been defined by the programmer in the source code are converted to bytes that occupy memory or registers of the microprocessor.
Note: The names (a.k.a. symbols) may actually be included in programs to help with debugging but those names do not affect the operation of the program.
The access operator *
We have seen above that the *
character which normally represents multiplication is also used when defining pointers. A difficulty with the syntax of pointers is that the same character has a third meaning: It is also used when accessing the pointee through the pointer.
When it is written before the name of a pointer, it means the variable that the pointer is pointing at (i.e. the pointee):
writeln("The value that it is pointing at: ", *myPointer);
The value that it is pointing at: 180
The .
(dot) operator to access a member of the pointee
If you know pointers from C, this operator is the same as the ->
operator in that language.
We have seen above that the *
operator is used for accessing the pointee. That is sufficiently useful for pointers of fundamental types like int*
: The value of a fundamental type is accessed simply by writing *myPointer
.
However, when the pointee is a struct or a class object, the same syntax becomes inconvenient. To see why, let's consider the following struct:
struct Coordinate { int x; int y; string toString() const { return format("(%s,%s)", x, y); } }
The following code defines an object and a pointer of that type:
auto center = Coordinate(0, 0); Coordinate * ptr = ¢er; // pointer definition writeln(*ptr); // object access
That syntax is convenient when accessing the value of the entire Coordinate
object:
(0,0)
However, the code becomes complicated when accessing a member of an object through a pointer and the *
operator:
// Adjust the x coordinate
(*ptr).x += 10;
That expression modifies the value of the x
member of the center
object. The left-hand side of that expression can be explained by the following steps:
ptr
: The pointer that points atcenter
*ptr
: Accessing the object (i.e.center
itself)(*ptr)
: Parentheses so that the.
(dot) operator is applied to the object, not to the pointer(*ptr).x
: Thex
member of the object thatptr
is pointing at
To reduce the complexity of pointer syntax in D, the .
(dot) operator is transferred to the pointee and provides access to the member of the object. (The exceptions to this rule are at the end of this section.)
So, the previous expression is normally written as:
ptr.x += 10;
Since the pointer itself does not have a member named x
, .x
is applied to the pointee and the x
member of center
gets modified:
(10,0)
Note that this is the same as the use of the .
(dot) operator with classes. When the .
(dot) operator is applied to a class variable, it provides access to a member of the class object:
class ClassType { int member; } // ... // Variable on the left, object on the right ClassType variable = new ClassType; // Applied to the variable but accesses the member of // the object variable.member = 42;
As you remember from the Classes chapter, the class object is constructed by the new
keyword on the right-hand side. variable
is a class variable that provides access to it.
Realizing that it is the same with pointers is an indication that class variables and pointers are implemented similarly by the compiler.
There is an exception to this rule both for class variables and for pointers. Type properties like .sizeof
are applied to the type of the pointer, not to the type of the pointee:
char c; char * p = &c; writeln(p.sizeof); // size of the pointer, not the pointee
.sizeof
produces the size of p
, which is a char*
, not the size of c
, which is a char
. On a 64-bit system pointers are 8-byte long:
8
Modifying the value of a pointer
The values of pointers can be incremented or decremented and they can be used in addition and subtraction:
++ptr; --ptr; ptr += 2; ptr -= 2; writeln(ptr + 3); writeln(ptr - 3);
Different from their arithmetic counterparts, these operations do not modify the actual value by the specified amount. Rather, the value of the pointer gets modified so that it now points at the variable that is a certain number of variables beyond the current one. The amount of the increment or the decrement specifies how many variables away should the pointer now point at.
For example, incrementing the value of a pointer makes it point at the next variable:
++ptr; // Starts pointing at a variable that is next in // memory from the old variable
For that to work correctly, the actual value of the pointer must be incremented by the size of the variable. For example, because the size of int
is 4, incrementing a pointer of type int*
changes its value by 4. The programmer need not pay attention to this detail; the pointer value is modified by the correct amount automatically.
Warning: It is undefined behavior to point at a location that is not a valid byte that belongs to the program. Even if it is not actually used to access any variable there, it is invalid for a pointer to point at a nonexistent variable. (The only exception to this rule is that it is valid to point at the imaginary element one past the end of an array. This will be explained later below.)
For example, it is invalid to increment a pointer that points at myVariable
, because myVariable
is defined as a single int
:
++myPointer; // ← undefined behavior
Undefined behavior means that it cannot be known what the behavior of the program will be after that operation. There may be systems where the program crashes after incrementing that pointer. However, on most modern systems the pointer is likely to point at the unused memory location that has been shown as being between myVariable
and myPointer
in the previous figure.
For that reason, the value of a pointer must be incremented or decremented only if there is a valid object at the new location. (As we will see below, pointing at the element one past the end of an array is valid as well). Arrays (and slices) have that property: The elements of an array are side by side in memory.
A pointer that is pointing at an element of a slice can be incremented safely as long as it is not used to access an element beyond the end of the slice. Incrementing such a pointer by the ++
operator makes it point at the next element:
import std.stdio; import std.string; import std.conv; enum Color { red, yellow, blue } struct Crayon { Color color; double length; string toString() const { return format("%scm %s crayon", length, color); } } void main() { writefln("Crayon objects are %s bytes each.", Crayon.sizeof); Crayon[] crayons = [ Crayon(Color.red, 11), Crayon(Color.yellow, 12), Crayon(Color.blue, 13) ]; Crayon * ptr = &crayons[0]; // (1) for (int i = 0; i != crayons.length; ++i) { writeln("Pointer value: ", ptr); // (2) writeln("Crayon: ", *ptr); // (3) ++ptr; // (4) } }
- Definition: The pointer is initialized by the address of the first element.
- Using its value: The value of the pointer is the address of the element that it is pointing at.
- Accessing the element that is being pointed at.
- Pointing at the next element.
The output:
Crayon objects are 16 bytes each. Pointer value: 7F37AC9E6FC0 Crayon: 11cm red crayon Pointer value: 7F37AC9E6FD0 Crayon: 12cm yellow crayon Pointer value: 7F37AC9E6FE0 Crayon: 13cm blue crayon
Note that the loop above is iterated a total of crayons.length
times so that the pointer is always used for accessing a valid element.
Pointers are risky
The compiler and the D runtime environment cannot guarantee that the pointers are always used correctly. It is the programmer's responsibility to ensure that a pointer is either null
or points at a valid memory location (at a variable, at an element of an array, etc.).
For that reason, it is always better to consider higher-level features of D before thinking about using pointers.
The element one past the end of an array
It is valid to point at the imaginary element one past the end of an array.
This is a useful idiom that is similar to number ranges. When defining a slice with a number range, the second index is one past the elements of the slice:
int[] values = [ 0, 1, 2, 3 ]; writeln(values[1 .. 3]); // 1 and 2 included, 3 excluded
This idiom can be used with pointers as well. It is a common function design in C and C++ where a function parameter points at the first element and another one points at the element after the last element:
import std.stdio; void tenTimes(int * begin, int * end) { while (begin != end) { *begin *= 10; ++begin; } } void main() { int[] values = [ 0, 1, 2, 3 ]; // The address of the second element: int * begin = &values[1]; // The address of two elements beyond that one tenTimes(begin, begin + 2); writeln(values); }
The value begin + 2
means two elements after the one that begin
is pointing at (i.e. the element at index 3).
The tenTimes()
function takes two pointer parameters. It uses the element that the first one is pointing at but it never accesses the element that the second one is pointing at. As a result, only the elements at indexes 1 and 2 get modified:
[0, 10, 20, 3]
Such functions can be implemented by a for
loop as well:
for ( ; begin != end; ++begin) {
*begin *= 10;
}
Two pointers that define a range can also be used with foreach
loops:
foreach (ptr; begin .. end) {
*ptr *= 10;
}
For these methods to be applicable to all of the elements of a slice, the second pointer must necessarily point after the last element:
// The second pointer is pointing at the imaginary element // past the end of the array: tenTimes(begin, begin + values.length);
That is the reason why it is legal to point at the imaginary element one beyond the last element of an array.
Using pointers with the array indexing operator []
Although it is not absolutely necessary in D, pointers can directly be used for accessing the elements of an array by an index value:
double[] floats = [ 0.0, 1.1, 2.2, 3.3, 4.4 ]; double * ptr = &floats[2]; *ptr = -100; // direct access to what it points at ptr[1] = -200; // access by indexing writeln(floats);
The output:
[0, 1.1, -100, -200, 4.4]
In that syntax, the element that the pointer is pointing at is thought of being the first element of an imaginary slice. The []
operator provides access to the specified element of that slice. The ptr
above initially points at the element at index 2 of the original floats
slice. ptr[1]
is a reference to the element 1 of the imaginary slice that starts at ptr
(i.e. index 3 of the original slice).
Although this behavior may seem complicated, there is a very simple conversion behind that syntax. Behind the scenes, the compiler converts the pointer[index]
syntax to the *(pointer + index)
expression:
ptr[1] = -200; // slice syntax *(ptr + 1) = -200; // the equivalent of the previous line
As I have mentioned earlier, the compiler may not guarantee that this expression refers to a valid element. D's slices provide a much safer alternative and should be considered instead:
double[] slice = floats[2 .. 4];
slice[0] = -100;
slice[1] = -200;
Normally, index values are checked for slices at run time:
slice[2] = -300; // Runtime error: accessing outside of the slice
Because the slice above does not have an element at index 2, an exception would be thrown at run time (unless the program has been compiled with the -release
compiler switch):
core.exception.RangeError@deneme(8391): Range violation
Producing a slice from a pointer
Pointers are not as safe or as useful as slices because although they can be used with the slice indexing operator, they are not aware of the valid range of elements.
However, when the number of valid elements is known, a pointer can be used to construct a slice.
Let's assume that the makeObjects()
function below is inside a C library. Let's assume that makeObjects
makes specified number of Struct
objects and returns a pointer to the first one of those objects:
Struct * ptr = makeObjects(10);
The syntax that produces a slice from a pointer is the following:
/* ... */ slice = pointer[0 .. count];
Accordingly, a slice to the 10 objects that are returned by makeObjects()
can be constructed by the following code:
Struct[] slice = ptr[0 .. 10];
After that definition, slice
is ready to be used safely in the program just like any other slice:
writeln(slice[1]); // prints the second element
void*
can point at any type
Although it is almost never needed in D, C's special pointer type void*
is available in D as well. void*
can point at any type:
int number = 42; double otherNumber = 1.25; void * canPointAtAnything; canPointAtAnything = &number; canPointAtAnything = &otherNumber;
The void*
above is able to point at variables of two different types: int
and double
.
void*
pointers are limited in functionality. As a consequence of their flexibility, they cannot provide access to the pointee. When the actual type is unknown, its size is not known either:
*canPointAtAnything = 43; // ← compilation ERROR
Instead, its value must first be converted to a pointer of the correct type:
int number = 42; // (1) void * canPointAtAnything = &number; // (2) // ... int * intPointer = cast(int*)canPointAtAnything; // (3) *intPointer = 43; // (4)
- The actual variable
- Storing the address of the variable in a
void*
- Assigning that address to a pointer of the correct type
- Modifying the variable through the new pointer
It is possible to increment or decrement values of void*
pointers, in which case their values are modified as if they are pointers of 1-byte types like ubyte
:
++canPointAtAnything; // incremented by 1
void*
is sometimes needed when interacting with libraries that are written in C. Since C does not have higher level features like interfaces, classes, templates, etc. C libraries must rely on the void*
type.
Using pointers in logical expressions
Pointers can automatically be converted to bool
. Pointers that have the value null
produce false
and the others produce true
. In other words, pointers that do not point at any variable are false
.
Let's consider a function that prints objects to the standard output. Let's design this function so that it also provides the number of bytes that it has just output. However, let's have it produce this information only when specifically requested.
It is possible to make this behavior optional by checking whether the value of a pointer is null
or not:
void print(Crayon crayon, size_t * numberOfBytes) { immutable info = format("Crayon: %s", crayon); writeln(info); if (numberOfBytes) { *numberOfBytes = info.length; } }
When the caller does not need this special information, they can pass null
as the argument:
print(Crayon(Color.yellow, 7), null);
When the number of bytes is indeed important, then a non-null
pointer value must be passed:
size_t numberOfBytes; print(Crayon(Color.blue, 8), &numberOfBytes); writefln("%s bytes written to the output", numberOfBytes);
Note that this is just an example. Otherwise, it would be better for a function like print()
to return the number of bytes unconditionally:
size_t print(Crayon crayon) { immutable info = format("Crayon: %s", crayon); writeln(info); return info.length; }
new
returns a pointer for some types
new
, which we have been using only for constructing class objects can be used with other types as well: structs, arrays, and fundamental types. The variables that are constructed by new
are called dynamic variables.
new
first allocates space from the memory for the variable and then constructs the variable in that space. The variable itself does not have a symbolic name in the compiled program; it would be accessed through the reference that is returned by new
.
The reference that new
returns is a different kind depending on the type of the variable:
- For class objects, it is a class variable:
Class classVariable = new Class;
- For struct objects and variables of fundamental types, it is a pointer:
Struct * structPointer = new Struct; int * intPointer = new int;
- For arrays, it is a slice:
int[] slice = new int[100];
This distinction is usually not obvious when the type is not spelled-out on the left-hand side:
auto classVariable = new Class; auto structPointer = new Struct; auto intPointer = new int; auto slice = new int[100];
The following program prints the return type of new
for different kinds of variables:
import std.stdio; struct Struct { } class Class { } void main() { writeln(typeof(new int ).stringof); writeln(typeof(new int[5]).stringof); writeln(typeof(new Struct).stringof); writeln(typeof(new Class ).stringof); }
new
returns pointers for structs and fundamental types:
int* int[] Struct* Class
When new
is used for constructing a dynamic variable of a value type, then the lifetime of that variable is extended as long as there is still a reference (e.g. a pointer) to that object in the program. (This is the default situation for reference types.)
The .ptr
property of arrays
The .ptr
property of arrays and slices is the address of the first element. The type of this value is a pointer to the type of the elements:
int[] numbers = [ 7, 12 ]; int * addressOfFirstElement = numbers.ptr; writeln("First element: ", *addressOfFirstElement);
This property is useful especially when interacting with C libraries. Some C functions take the address of the first of a number of consecutive elements in memory.
Remembering that strings are also arrays, the .ptr
property can be used with strings as well. However, note that the first element of a string need not be the first letter of the string; rather, the first Unicode code unit of that letter. As an example, the letter é is stored as two code units in a char
string.
When accessed through the .ptr
property, the code units of strings can be accessed individually. We will see this in the examples section below.
The in
operator of associative arrays
Actually, we have used pointers earlier in the Associative Arrays chapter. In that chapter, I had intentionally not mentioned the exact type of the in
operator and had used it only in logical expressions:
if ("purple" in colorCodes) { // there is an element for key "purple" } else { // no element for key "purple" }
In fact, the in
operator returns the address of the element if there is an element for the specified key; otherwise, it returns null
. The if
statement above actually relies on the automatic conversion of the pointer value to bool
.
When the return value of in
is stored in a pointer, the element can be accessed efficiently through that pointer:
import std.stdio; void main() { string[int] numbers = [ 0 : "zero", 1 : "one", 2 : "two", 3 : "three" ]; int number = 2; auto element = number in numbers; // (1) if (element) { // (2) writefln("I know: %s.", *element); // (3) } else { writefln("I don't know the spelling of %s.", number); } }
The pointer variable element
is initialized by the value of the in
operator (1) and its value is used in a logical expression (2). The value of the element is accessed through that pointer (3) only if the pointer is not null
.
The actual type of element
above is a pointer to the same type of the elements (i.e. values) of the associative array. Since the elements of numbers
above are of type string
, in
returns a string*
. Accordingly, the type could have been spelled out explicitly:
string * element = number in numbers;
When to use pointers
Pointers are rare in D. As we have seen in the Reading from the Standard Input chapter, readf
can in fact be used without explicit pointers.
When required by libraries
Pointers can appear on C and C++ library bindings. For example, the following function from the GtkD library takes a pointer:
GdkGeometry geometry; // ... set the members of 'geometry' ... window.setGeometryHints(/* ... */, &geometry, /* ... */);
When referencing variables of value types
Pointers can be used for referring to local variables. The following program counts the outcomes of flipping a coin. It takes advantage of a pointer when referring to one of two local variables:
import std.stdio; import std.random; void main() { size_t headsCount = 0; size_t tailsCount = 0; foreach (i; 0 .. 100) { size_t * theCounter = (uniform(0, 2) == 1) ? &headsCount : &tailsCount; ++(*theCounter); } writefln("heads: %s tails: %s", headsCount, tailsCount); }
Obviously, there are other ways of achieving the same goal. For example, using the ternary operator in a different way:
uniform(0, 2) ? ++headsCount : ++tailsCount;
By using an if
statement:
if (uniform(0, 2)) { ++headsCount; } else { ++tailsCount; }
As member variables of data structures
Pointers are essential when implementing many data structures.
Unlike the elements of an array being next to each other in memory, elements of many other data structures are apart. Such data structures are based on the concept of their elements pointing at other elements.
For example, each node of a linked list points at the next node. Similarly, each node of a binary tree points at the left and right branches under that node. Pointers are encountered in most other data structures as well.
Although it is possible to take advantage of D's reference types, pointers may be more natural and efficient in some cases.
We will see examples of pointer members below.
When accessing memory directly
Being low-level microprocessor features, pointers provide byte-level access to memory locations. Note that such locations must still belong to valid variables. It is undefined behavior to attempt to access a random memory location.
Examples
A simple linked list
The elements of linked lists are stored in nodes. The concept of a linked list is based on each node pointing at the node that comes after it. The last node has no other node to point at, so it is set to null
:
first node next node last node ┌─────────┬───┐ ┌─────────┬───┐ ┌─────────┬──────┐ │ element │ o────▶ │ element │ o────▶ ... │ element │ null │ └─────────┴───┘ └─────────┴───┘ └─────────┴──────┘
The figure above may be misleading: In reality, the nodes are not side-by-side in memory. Each node does point to the next node but the next node may be at a completely different location.
The following struct
can be used for representing the nodes of such a linked list of int
s:
struct Node { int element; Node * next; // ... }
Note: Because it contains a reference to the same type as itself, Node
is a recursive type.
The entire list can be represented by a single pointer that points at the first node, which is commonly called the head:
struct List { Node * head; // ... }
To keep the example short, let's define just one function that adds an element to the head of the list:
struct List { Node * head; void insertAtHead(int element) { head = new Node(element, head); } // ... }
The line inside insertAtHead()
keeps the nodes linked by adding a new node to the head of the list. (A function that adds to the end of the list would be more natural and more useful. We will see that function later in one of the problems.)
The right-hand side expression of that line constructs a Node
object. When this new object is constructed, its next
member is initialized by the current head of the list. When the head
member of the list is assigned to this newly linked node, the new element ends up being the first element.
The following program tests these two structs:
import std.stdio; import std.conv; import std.string; struct Node { int element; Node * next; string toString() const { string result = to!string(element); if (next) { result ~= " -> " ~ to!string(*next); } return result; } } struct List { Node * head; void insertAtHead(int element) { head = new Node(element, head); } string toString() const { return format("(%s)", head ? to!string(*head) : ""); } } void main() { List numbers; writeln("before: ", numbers); foreach (number; 0 .. 10) { numbers.insertAtHead(number); } writeln("after : ", numbers); }
The output:
before: () after : (9 -> 8 -> 7 -> 6 -> 5 -> 4 -> 3 -> 2 -> 1 -> 0)
Observing the contents of memory by ubyte*
The data stored at each memory address is a byte. Every variable is constructed on a piece of memory that consists of as many bytes as the size of the type of that variable.
A suitable pointer type to observe the content of a memory location is ubyte*
. Once the address of a variable is assigned to a ubyte
pointer, then all of the bytes of that variable can be observed by incrementing the pointer.
Let's consider the following integer that is initialized by the hexadecimal notation so that it will be easy to understand how its bytes are placed in memory:
int variable = 0x01_02_03_04;
A pointer that points at that variable can be defined like this:
int * address = &variable;
The value of that pointer can be assigned to a ubyte
pointer by the cast
operator:
ubyte * bytePointer = cast(ubyte*)address;
Such a pointer allows accessing the four bytes of the int
variable individually:
writeln(bytePointer[0]); writeln(bytePointer[1]); writeln(bytePointer[2]); writeln(bytePointer[3]);
If your microprocessor is little-endian like mine, you should see the bytes of the value 0x01_02_03_04
in reverse:
4 3 2 1
Let's use that idea in a function that will be useful when observing the bytes of all types of variables:
import std.stdio; void printBytes(T)(ref T variable) { const ubyte * begin = cast(ubyte*)&variable; // (1) writefln("type : %s", T.stringof); writefln("value : %s", variable); writefln("address: %s", begin); // (2) writef ("bytes : "); writefln("%(%02x %)", begin[0 .. T.sizeof]); // (3) writeln(); }
- Assigning the address of the variable to a
ubyte
pointer. - Printing the value of the pointer.
- Obtaining the size of the type by
.sizeof
and printing the bytes of the variable. (Note how a slice is produced from thebegin
pointer and then that slice is printed directly bywritefln()
.)
Another way of printing the bytes would be to apply the *
operator individually:
foreach (bytePointer; begin .. begin + T.sizeof) { writef("%02x ", *bytePointer); }
The value of bytePointer
would change from begin
to begin + T.sizeof
to visit all of the bytes of the variable. Note that the value begin + T.sizeof
is outside of the range and is never accessed.
The following program calls printBytes()
with various types of variables:
struct Struct { int first; int second; } class Class { int i; int j; int k; this(int i, int j, int k) { this.i = i; this.j = j; this.k = k; } } void main() { int integerVariable = 0x11223344; printBytes(integerVariable); double doubleVariable = double.nan; printBytes(doubleVariable); string slice = "a bright and charming façade"; printBytes(slice); int[3] array = [ 1, 2, 3 ]; printBytes(array); auto structObject = Struct(0xaa, 0xbb); printBytes(structObject); auto classVariable = new Class(1, 2, 3); printBytes(classVariable); }
The output of the program is informative:
type : int value : 287454020 address: 7FFF19A83FB0 bytes : 44 33 22 11 ← (1) type : double value : nan address: 7FFF19A83FB8 bytes : 00 00 00 00 00 00 f8 7f ← (2) type : string value : a bright and charming façade address: 7FFF19A83FC0 bytes : 1d 00 00 00 00 00 00 00 e0 68 48 00 00 00 00 00 ← (3) type : int[3LU] value : [1, 2, 3] address: 7FFF19A83FD0 bytes : 01 00 00 00 02 00 00 00 03 00 00 00 ← (1) type : Struct value : Struct(170, 187) address: 7FFF19A83FE8 bytes : aa 00 00 00 bb 00 00 00 ← (1) type : Class value : deneme.Class address: 7FFF19A83FF0 bytes : 80 df 79 d5 97 7f 00 00 ← (4)
Observations:
- Although in reverse order on little-endian systems, the bytes of some of the types are as one would expect: The bytes are laid out in memory side by side for
int
s, fixed-length arrays (int[3]
), and struct objects. - Considering that the bytes of the special value of
double.nan
are also in reverse order in memory, we can see that it is represented by the special bit pattern 0x7ff8000000000000. string
is reported to be consisting of 16 bytes but it is impossible to fit the letters"a bright and charming façade"
into so few bytes. This is due to the fact that behind the scenesstring
is actually implemented as a struct. Prefixing its name by__
to stress the fact that it is an internal type used by the compiler, that struct is similar to the following one:struct __string { size_t length; char * ptr; // the actual characters }
The evidence of this fact is hidden in the bytes that are printed for
string
above. Note that because ç is made up of two UTF-8 code units, the 28 letters of the string"a bright and charming façade"
consists of a total of 29 bytes. The value 0x000000000000001d, the first 8 of the bytes of the string in the output above, is also 29. This is a strong indicator that strings are indeed laid out in memory as in the struct above.- Similarly, it is not possible to fit the three
int
members of the class object in 8 bytes. The output above hints at the possibility that behind the scenes a class variable is implemented as a single pointer that points at the actual class object:struct __Class_VariableType { __Class_ActualObjecType * object; }
Let's now consider a more flexible function. Instead of printing the bytes of a variable, let's define a function that prints specified number of bytes at a specified location:
import std.stdio; import std.ascii; void printMemory(T)(T * location, size_t length) { const ubyte * begin = cast(ubyte*)location; foreach (address; begin .. begin + length) { char c = (isPrintable(*address) ? *address : '.'); writefln("%s: %02x %s", address, *address, c); } }
Since some of the UTF-8 code units may correspond to control characters of the terminal and disrupt its output, we print only the printable characters by first checking them individually by std.ascii.isPrintable()
. The non-printable characters are printed as a dot.
We can use that function to print the UTF-8 code units of a string
through its .ptr
property:
import std.stdio; void main() { string s = "a bright and charming façade"; printMemory(s.ptr, s.length); }
As seen in the output, the letter ç consists of two bytes:
47B4F0: 61 a 47B4F1: 20 47B4F2: 62 b 47B4F3: 72 r 47B4F4: 69 i 47B4F5: 67 g 47B4F6: 68 h 47B4F7: 74 t 47B4F8: 20 47B4F9: 61 a 47B4FA: 6e n 47B4FB: 64 d 47B4FC: 20 47B4FD: 63 c 47B4FE: 68 h 47B4FF: 61 a 47B500: 72 r 47B501: 6d m 47B502: 69 i 47B503: 6e n 47B504: 67 g 47B505: 20 47B506: 66 f 47B507: 61 a 47B508: c3 . 47B509: a7 . 47B50A: 61 a 47B50B: 64 d 47B50C: 65 e
Exercises
- Fix the following function so that the values of the arguments that are passed to it are swapped. For this exercise, do not specify the parameters as
ref
but take them as pointers:void swap(int lhs, int rhs) { int temp = lhs; lhs = rhs; rhs = temp; } void main() { int i = 1; int j = 2; swap(i, j); // Their values should be swapped assert(i == 2); assert(j == 1); }
When you start the program you will notice that the
assert
checks currently fail. - Convert the linked list that we have defined above to a template so that it can be used for storing elements of any type.
- It is more natural to add elements to the end of a linked list. Modify
List
so that it is possible to add elements to the end as well.For this exercise, an additional pointer member variable that points at the last element will be useful.