Programming in D - Pointers

Pointers

Pointers are variables that provide access to other variables. The value of a pointer is the address of the variable that it provides access to.

Pointers can point at any type of variable, object, and even other pointers. In this chapter, I will refer to all of these simply as variables.

Pointers are low level features of microprocessors. They are an important part of system programming.

The syntax and semantics of pointers in D are inherited directly from C. Although pointers are notoriously the most difficult feature of C to comprehend, they should not be as difficult in D. This is because other features of D that are semantically close to pointers are more useful in situations where pointers would have to be used in other languages. When the ideas behind pointers are already understood from those other features of D, pointers should be easier to grasp.

The short examples throughout the most of this chapter are decidedly simple. The programs at the end of the chapter will be more realistic.

The names like ptr (short for "pointer") that I have used in these examples should not be considered as useful names in general. As always, names must be chosen to be more meaningful and explanatory in actual programs.

The concept of a reference

Although we have encountered references many times in the previous chapters, let's summarize this concept one more time.

The `ref` variables in `foreach` loops

As we have seen in the foreach Loop chapter, normally the loop variables are copies of elements:

import std.stdio;

void main() {
    int[] numbers = [ 1, 11, 111 ];

    foreach (number; numbers) {
        number = 0;     // ← the copy changes, not the element
    }

    writeln("After the loop: ", numbers);
}

The number that gets assigned 0 each time is a copy of one of the elements of the array. Modifying that copy does not modify the element:

After the loop: [1, 11, 111]

When the actual elements need to be modified, the foreach variable must be defined as ref:

    foreach (ref number; numbers) {
        number = 0;     // ← the actual element changes
    }

This time number is a reference to an actual element in the array:

After the loop: [0, 0, 0]

`ref` function parameters

As we have seen in the Function Parameters chapter, the parameters of value types are normally copies of the arguments:

import std.stdio;

void addHalf(double value) {
    value += 0.5;        // ← Does not affect 'value' in main
}

void main() {
    double value = 1.5;

    addHalf(value);

    writeln("The value after calling the function: ", value);
}

Because the function parameter is not defined as ref, the assignment inside the function affects only the local variable there. The variable in main() is not affected:

The value after calling the function: 1.5

The ref keyword would make the function parameter a reference to the argument:

void addHalf(ref double value) {
    value += 0.5;
}

This time the variable in main() gets modified:

The value after calling the function: 2

Reference types

Some types are reference types. Variables of such types provide access to separate variables:

Class variables
Slices
Associative arrays

We have seen this distinction in the Value Types and Reference Types chapter. The following example demonstrates reference types by two class variables:

import std.stdio;

class Pen {
    double ink;

    this() {
        ink = 15;
    }

    void use(double amount) {
        ink -= amount;
    }
}

void main() {
    auto pen = new Pen;
    auto otherPen = pen;  // ← Now both variables provide
                          //   access to the same object

    writefln("Before: %s %s", pen.ink, otherPen.ink);

    pen.use(1);          // ← the same object is used
    otherPen.use(2);     // ← the same object is used

    writefln("After : %s %s", pen.ink, otherPen.ink);
}

Because classes are reference types, the class variables pen and otherPen provide access to the same Pen object. As a result, using either of those class variables affects the same object:

Before: 15 15
After : 12 12

That single object and the two class variables would be laid out in memory similar to the following figure:

      (The Pen object)            pen        otherPen
 ───┬───────────────────┬───  ───┬───┬───  ───┬───┬───
    │        ink        │        │ o │        │ o │
 ───┴───────────────────┴───  ───┴─│─┴───  ───┴─│─┴───
              ▲                    │            │
              │                    │            │
              └────────────────────┴────────────┘

References point at actual variables as pen and otherPen do above.

Programming languages implement the reference and pointer concepts by special registers of the microprocessor, which are specifically for pointing at memory locations.

Behind the scenes, D's higher-level concepts (class variables, slices, associative arrays, etc.) are all implemented by pointers. As these higher-level features are already efficient and convenient, pointers are rarely needed in D programming. Still, it is important for D programmers to understand pointers well.

Syntax

The pointer syntax of D is mostly the same as in C. Although this can be seen as an advantage, the peculiarities of C's pointer syntax are necessarily inherited by D as well. For example, the different meanings of the * character may be confusing.

With the exception of void pointers, every pointer is associated with a certain type and can point at only variables of that specific type. For example, an int pointer can only point at variables of type int.

The pointer definition syntax consists of the associated type and a * character:

    type_to_point_at * name_of_the_pointer_variable;

Accordingly, a pointer variable that would be pointing at int variables would be defined like this:

    int * myPointer;

The * character in that syntax may be pronounced as "pointer". So, the type of myPointer above is an "int pointer". The spaces before and after the * character are optional. The following syntaxes are common as well:

    int* myPointer;
    int *myPointer;

When it is specifically a pointer type that is being mentioned as in "int pointer", it is common to write the type without any spaces as in int*.

Pointer value and the address-of operator `&`

Being variables themselves pointers have values as well. The default value of a pointer is the special value null, which means that the pointer is not pointing at any variable yet (i.e. does not provide access to any variable).

To make a pointer provide access to a variable, the value of the pointer must be set to the address of that variable. The pointer starts pointing at the variable that is at that specific address. From now on, I will call that variable the pointee.

The & operator which we have used many times before with readf has also been briefly mentioned in the Value Types and Reference Types chapter. This operator produces the address of the variable that is written after it. Its value can be used when initializing a pointer:

    int myVariable = 180;
    int * myPointer = &myVariable;

Initializing myPointer by the address of myVariable makes myPointer point at myVariable.

The value of the pointer is the same as the address of myVariable:

    writeln("The address of myVariable: ", &myVariable);
    writeln("The value of myPointer   : ", myPointer);

The address of myVariable: 7FFF2CE73F10
The value of myPointer   : 7FFF2CE73F10

Note: The address value is likely to be different every time the program is started.

The following figure is a representation of these two variables in memory:

      myVariable at                myPointer at
  address 7FFF2CE73F10          some other address
───┬────────────────┬───     ───┬────────────────┬───
   │      180       │           │  7FFF2CE73F10  │
───┴────────────────┴───     ───┴────────│───────┴───
           ▲                             │
           │                             │
           └─────────────────────────────┘

The value of myPointer is the address of myVariable, conceptually pointing at the variable that is at that location.

Since pointers are variables as well, the & operator can produce the address of the pointer as well:

    writeln("The address of myPointer : ", &myPointer);

The address of myPointer : 7FFF2CE73F18

Since the difference between the two addresses above is 8, remembering that an int takes up 4 bytes, we can deduce that myVariable and myPointer are 4 bytes apart in memory.

After removing the arrow that represented the concept of pointing at, we can picture the contents of memory around these addresses like this:

    7FFF2CE73F10     7FFF2CE73F14     7FFF2CE73F18
    :                :                :                :
 ───┬────────────────┬────────────────┬────────────────┬───
    │      180       │    (unused)    │  7FFF2CE73F10  │
 ───┴────────────────┴────────────────┴────────────────┴───

The names of variables, functions, classes, etc. and keywords are not parts of programs of compiled languages like D. The variables that have been defined by the programmer in the source code are converted to bytes that occupy memory or registers of the microprocessor.

Note: The names (a.k.a. symbols) may actually be included in programs to help with debugging but those names do not affect the operation of the program.

The access operator `*`

We have seen above that the * character which normally represents multiplication is also used when defining pointers. A difficulty with the syntax of pointers is that the same character has a third meaning: It is also used when accessing the pointee through the pointer.

When it is written before the name of a pointer, it means the variable that the pointer is pointing at (i.e. the pointee):

    writeln("The value that it is pointing at: ", *myPointer);

The value that it is pointing at: 180

The `.` (dot) operator to access a member of the pointee

If you know pointers from C, this operator is the same as the -> operator in that language.

We have seen above that the * operator is used for accessing the pointee. That is sufficiently useful for pointers of fundamental types like int*: The value of a fundamental type is accessed simply by writing *myPointer.

However, when the pointee is a struct or a class object, the same syntax becomes inconvenient. To see why, let's consider the following struct:

struct Coordinate {
    int x;
    int y;

    string toString() const {
        return format("(%s,%s)", x, y);
    }
}

The following code defines an object and a pointer of that type:

    auto center = Coordinate(0, 0);
    Coordinate * ptr = &center;    // pointer definition
    writeln(*ptr);                 // object access

That syntax is convenient when accessing the value of the entire Coordinate object:

(0,0)

However, the code becomes complicated when accessing a member of an object through a pointer and the * operator:

    // Adjust the x coordinate
    (*ptr).x += 10;

That expression modifies the value of the x member of the center object. The left-hand side of that expression can be explained by the following steps:

ptr: The pointer that points at center
*ptr: Accessing the object (i.e. center itself)
(*ptr): Parentheses so that the . (dot) operator is applied to the object, not to the pointer
(*ptr).x: The x member of the object that ptr is pointing at

To reduce the complexity of pointer syntax in D, the . (dot) operator is transferred to the pointee and provides access to the member of the object. (The exceptions to this rule are at the end of this section.)

So, the previous expression is normally written as:

    ptr.x += 10;

Since the pointer itself does not have a member named x, .x is applied to the pointee and the x member of center gets modified:

(10,0)

Note that this is the same as the use of the . (dot) operator with classes. When the . (dot) operator is applied to a class variable, it provides access to a member of the class object:

class ClassType {
    int member;
}

// ...

    // Variable on the left, object on the right
    ClassType variable = new ClassType;

    // Applied to the variable but accesses the member of
    // the object
    variable.member = 42;

As you remember from the Classes chapter, the class object is constructed by the new keyword on the right-hand side. variable is a class variable that provides access to it.

Realizing that it is the same with pointers is an indication that class variables and pointers are implemented similarly by the compiler.

There is an exception to this rule both for class variables and for pointers. Type properties like .sizeof are applied to the type of the pointer, not to the type of the pointee:

    char c;
    char * p = &c;

    writeln(p.sizeof);  // size of the pointer, not the pointee

.sizeof produces the size of p, which is a char*, not the size of c, which is a char. On a 64-bit system pointers are 8-byte long:

Modifying the value of a pointer

The values of pointers can be incremented or decremented and they can be used in addition and subtraction:

    ++ptr;
    --ptr;
    ptr += 2;
    ptr -= 2;
    writeln(ptr + 3);
    writeln(ptr - 3);

Different from their arithmetic counterparts, these operations do not modify the actual value by the specified amount. Rather, the value of the pointer gets modified so that it now points at the variable that is a certain number of variables beyond the current one. The amount of the increment or the decrement specifies how many variables away should the pointer now point at.

For example, incrementing the value of a pointer makes it point at the next variable:

    ++ptr;  // Starts pointing at a variable that is next in
            // memory from the old variable

For that to work correctly, the actual value of the pointer must be incremented by the size of the variable. For example, because the size of int is 4, incrementing a pointer of type int* changes its value by 4. The programmer need not pay attention to this detail; the pointer value is modified by the correct amount automatically.

Warning: It is undefined behavior to point at a location that is not a valid byte that belongs to the program. Even if it is not actually used to access any variable there, it is invalid for a pointer to point at a nonexistent variable. (The only exception to this rule is that it is valid to point at the imaginary element one past the end of an array. This will be explained later below.)

For example, it is invalid to increment a pointer that points at myVariable, because myVariable is defined as a single int:

    ++myPointer;       // ← undefined behavior

Undefined behavior means that it cannot be known what the behavior of the program will be after that operation. There may be systems where the program crashes after incrementing that pointer. However, on most modern systems the pointer is likely to point at the unused memory location that has been shown as being between myVariable and myPointer in the previous figure.

For that reason, the value of a pointer must be incremented or decremented only if there is a valid object at the new location. (As we will see below, pointing at the element one past the end of an array is valid as well). Arrays (and slices) have that property: The elements of an array are side by side in memory.

A pointer that is pointing at an element of a slice can be incremented safely as long as it is not used to access an element beyond the end of the slice. Incrementing such a pointer by the ++ operator makes it point at the next element:

import std.stdio;
import std.string;
import std.conv;

enum Color { red, yellow, blue }

struct Crayon {
    Color color;
    double length;

    string toString() const {
        return format("%scm %s crayon", length, color);
    }
}

void main() {
    writefln("Crayon objects are %s bytes each.", Crayon.sizeof);

    Crayon[] crayons = [ Crayon(Color.red, 11),
                         Crayon(Color.yellow, 12),
                         Crayon(Color.blue, 13) ];

    Crayon * ptr = &crayons[0];                   // (1)

    for (int i = 0; i != crayons.length; ++i) {
        writeln("Pointer value: ", ptr);          // (2)

        writeln("Crayon: ", *ptr);                // (3)
        ++ptr;                                    // (4)
    }
}

Definition: The pointer is initialized by the address of the first element.
Using its value: The value of the pointer is the address of the element that it is pointing at.
Accessing the element that is being pointed at.
Pointing at the next element.

The output:

Crayon objects are 16 bytes each.
Pointer value: 7F37AC9E6FC0
Crayon: 11cm red crayon
Pointer value: 7F37AC9E6FD0
Crayon: 12cm yellow crayon
Pointer value: 7F37AC9E6FE0
Crayon: 13cm blue crayon

Note that the loop above is iterated a total of crayons.length times so that the pointer is always used for accessing a valid element.

Pointers are risky

The compiler and the D runtime environment cannot guarantee that the pointers are always used correctly. It is the programmer's responsibility to ensure that a pointer is either null or points at a valid memory location (at a variable, at an element of an array, etc.).

For that reason, it is always better to consider higher-level features of D before thinking about using pointers.

The element one past the end of an array

It is valid to point at the imaginary element one past the end of an array.

This is a useful idiom that is similar to number ranges. When defining a slice with a number range, the second index is one past the elements of the slice:

    int[] values = [ 0, 1, 2, 3 ];
    writeln(values[1 .. 3]);   // 1 and 2 included, 3 excluded

This idiom can be used with pointers as well. It is a common function design in C and C++ where a function parameter points at the first element and another one points at the element after the last element:

import std.stdio;

void tenTimes(int * begin, int * end) {
    while (begin != end) {
        *begin *= 10;
        ++begin;
    }
}

void main() {
    int[] values = [ 0, 1, 2, 3 ];

    // The address of the second element:
    int * begin = &values[1];

    // The address of two elements beyond that one
    tenTimes(begin, begin + 2);

    writeln(values);
}

The value begin + 2 means two elements after the one that begin is pointing at (i.e. the element at index 3).

The tenTimes() function takes two pointer parameters. It uses the element that the first one is pointing at but it never accesses the element that the second one is pointing at. As a result, only the elements at indexes 1 and 2 get modified:

[0, 10, 20, 3]

Such functions can be implemented by a for loop as well:

    for ( ; begin != end; ++begin) {
        *begin *= 10;
    }

Two pointers that define a range can also be used with foreach loops:

    foreach (ptr; begin .. end) {
        *ptr *= 10;
    }

For these methods to be applicable to all of the elements of a slice, the second pointer must necessarily point after the last element:

    // The second pointer is pointing at the imaginary element
    // past the end of the array:
    tenTimes(begin, begin + values.length);

That is the reason why it is legal to point at the imaginary element one beyond the last element of an array.

Using pointers with the array indexing operator `[]`

Although it is not absolutely necessary in D, pointers can directly be used for accessing the elements of an array by an index value:

    double[] floats = [ 0.0, 1.1, 2.2, 3.3, 4.4 ];

    double * ptr = &floats[2];

    *ptr = -100;      // direct access to what it points at
    ptr[1] = -200;    // access by indexing

    writeln(floats);

The output:

[0, 1.1, -100, -200, 4.4]

In that syntax, the element that the pointer is pointing at is thought of being the first element of an imaginary slice. The [] operator provides access to the specified element of that slice. The ptr above initially points at the element at index 2 of the original floats slice. ptr[1] is a reference to the element 1 of the imaginary slice that starts at ptr (i.e. index 3 of the original slice).

Although this behavior may seem complicated, there is a very simple conversion behind that syntax. Behind the scenes, the compiler converts the pointer[index] syntax to the *(pointer + index) expression:

    ptr[1] = -200;      // slice syntax
    *(ptr + 1) = -200;  // the equivalent of the previous line

As I have mentioned earlier, the compiler may not guarantee that this expression refers to a valid element. D's slices provide a much safer alternative and should be considered instead:

    double[] slice = floats[2 .. 4];
    slice[0] = -100;
    slice[1] = -200;

Normally, index values are checked for slices at run time:

    slice[2] = -300;  // Runtime error: accessing outside of the slice

Because the slice above does not have an element at index 2, an exception would be thrown at run time (unless the program has been compiled with the -release compiler switch):

core.exception.RangeError@deneme(8391): Range violation

Producing a slice from a pointer

Pointers are not as safe or as useful as slices because although they can be used with the slice indexing operator, they are not aware of the valid range of elements.

However, when the number of valid elements is known, a pointer can be used to construct a slice.

Let's assume that the makeObjects() function below is inside a C library. Let's assume that makeObjects makes specified number of Struct objects and returns a pointer to the first one of those objects:

    Struct * ptr = makeObjects(10);

The syntax that produces a slice from a pointer is the following:

    /* ... */ slice = pointer[0 .. count];

Accordingly, a slice to the 10 objects that are returned by makeObjects() can be constructed by the following code:

    Struct[] slice = ptr[0 .. 10];

After that definition, slice is ready to be used safely in the program just like any other slice:

    writeln(slice[1]);    // prints the second element

`void*` can point at any type

Although it is almost never needed in D, C's special pointer type void* is available in D as well. void* can point at any type:

    int number = 42;
    double otherNumber = 1.25;
    void * canPointAtAnything;

    canPointAtAnything = &number;
    canPointAtAnything = &otherNumber;

The void* above is able to point at variables of two different types: int and double.

void* pointers are limited in functionality. As a consequence of their flexibility, they cannot provide access to the pointee. When the actual type is unknown, its size is not known either:

    *canPointAtAnything = 43;     // ← compilation ERROR

Instead, its value must first be converted to a pointer of the correct type:

    int number = 42;                                  // (1)
    void * canPointAtAnything = &number;              // (2)

    // ...

    int * intPointer = cast(int*)canPointAtAnything;  // (3)
    *intPointer = 43;                                 // (4)

The actual variable
Storing the address of the variable in a void*
Assigning that address to a pointer of the correct type
Modifying the variable through the new pointer

It is possible to increment or decrement values of void* pointers, in which case their values are modified as if they are pointers of 1-byte types like ubyte:

    ++canPointAtAnything;    // incremented by 1

void* is sometimes needed when interacting with libraries that are written in C. Since C does not have higher level features like interfaces, classes, templates, etc. C libraries must rely on the void* type.

Using pointers in logical expressions

Pointers can automatically be converted to bool. Pointers that have the value null produce false and the others produce true. In other words, pointers that do not point at any variable are false.

Let's consider a function that prints objects to the standard output. Let's design this function so that it also provides the number of bytes that it has just output. However, let's have it produce this information only when specifically requested.

It is possible to make this behavior optional by checking whether the value of a pointer is null or not:

void print(Crayon crayon, size_t * numberOfBytes) {
    immutable info = format("Crayon: %s", crayon);
    writeln(info);

    if (numberOfBytes) {
        *numberOfBytes = info.length;
    }
}

When the caller does not need this special information, they can pass null as the argument:

    print(Crayon(Color.yellow, 7), null);

When the number of bytes is indeed important, then a non-null pointer value must be passed:

    size_t numberOfBytes;
    print(Crayon(Color.blue, 8), &numberOfBytes);
    writefln("%s bytes written to the output", numberOfBytes);

Note that this is just an example. Otherwise, it would be better for a function like print() to return the number of bytes unconditionally:

size_t print(Crayon crayon) {
    immutable info = format("Crayon: %s", crayon);
    writeln(info);

    return info.length;
}

`new` returns a pointer for some types

new, which we have been using only for constructing class objects can be used with other types as well: structs, arrays, and fundamental types. The variables that are constructed by new are called dynamic variables.

new first allocates space from the memory for the variable and then constructs the variable in that space. The variable itself does not have a symbolic name in the compiled program; it would be accessed through the reference that is returned by new.

The reference that new returns is a different kind depending on the type of the variable:

For class objects, it is a class variable:
```
    Class classVariable = new Class;
```

For struct objects and variables of fundamental types, it is a pointer:

    Struct * structPointer = new Struct;
    int * intPointer = new int;

For arrays, it is a slice:
```
    int[] slice = new int[100];
```

This distinction is usually not obvious when the type is not spelled-out on the left-hand side:

    auto classVariable = new Class;
    auto structPointer = new Struct;
    auto intPointer = new int;
    auto slice = new int[100];

The following program prints the return type of new for different kinds of variables:

import std.stdio;

struct Struct {
}

class Class {
}

void main() {
    writeln(typeof(new int   ).stringof);
    writeln(typeof(new int[5]).stringof);
    writeln(typeof(new Struct).stringof);
    writeln(typeof(new Class ).stringof);
}

new returns pointers for structs and fundamental types:

int*
int[]
Struct*
Class

When new is used for constructing a dynamic variable of a value type, then the lifetime of that variable is extended as long as there is still a reference (e.g. a pointer) to that object in the program. (This is the default situation for reference types.)

The `.ptr` property of arrays

The .ptr property of arrays and slices is the address of the first element. The type of this value is a pointer to the type of the elements:

    int[] numbers = [ 7, 12 ];

    int * addressOfFirstElement = numbers.ptr;
    writeln("First element: ", *addressOfFirstElement);

This property is useful especially when interacting with C libraries. Some C functions take the address of the first of a number of consecutive elements in memory.

Remembering that strings are also arrays, the .ptr property can be used with strings as well. However, note that the first element of a string need not be the first letter of the string; rather, the first Unicode code unit of that letter. As an example, the letter é is stored as two code units in a char string.

When accessed through the .ptr property, the code units of strings can be accessed individually. We will see this in the examples section below.

The `in` operator of associative arrays

Actually, we have used pointers earlier in the Associative Arrays chapter. In that chapter, I had intentionally not mentioned the exact type of the in operator and had used it only in logical expressions:

    if ("purple" in colorCodes) {
        // there is an element for key "purple"

    } else {
        // no element for key "purple"
    }

In fact, the in operator returns the address of the element if there is an element for the specified key; otherwise, it returns null. The if statement above actually relies on the automatic conversion of the pointer value to bool.

When the return value of in is stored in a pointer, the element can be accessed efficiently through that pointer:

import std.stdio;

void main() {
    string[int] numbers =
        [ 0 : "zero", 1 : "one", 2 : "two", 3 : "three" ];

    int number = 2;
    auto element = number in numbers;             // (1)

    if (element) {                                // (2)
        writefln("I know: %s.", *element);        // (3)

    } else {
        writefln("I don't know the spelling of %s.", number);
    }
}

The pointer variable element is initialized by the value of the in operator (1) and its value is used in a logical expression (2). The value of the element is accessed through that pointer (3) only if the pointer is not null.

The actual type of element above is a pointer to the same type of the elements (i.e. values) of the associative array. Since the elements of numbers above are of type string, in returns a string*. Accordingly, the type could have been spelled out explicitly:

    string * element = number in numbers;

When to use pointers

Pointers are rare in D. As we have seen in the Reading from the Standard Input chapter, readf can in fact be used without explicit pointers.

When required by libraries

Pointers can appear on C and C++ library bindings. For example, the following function from the GtkD library takes a pointer:

    GdkGeometry geometry;
    // ... set the members of 'geometry' ...

    window.setGeometryHints(/* ... */, &geometry, /* ... */);

When referencing variables of value types

Pointers can be used for referring to local variables. The following program counts the outcomes of flipping a coin. It takes advantage of a pointer when referring to one of two local variables:

import std.stdio;
import std.random;

void main() {
    size_t headsCount = 0;
    size_t tailsCount = 0;

    foreach (i; 0 .. 100) {
        size_t * theCounter = (uniform(0, 2) == 1)
                               ? &headsCount
                               : &tailsCount;
        ++(*theCounter);
    }

    writefln("heads: %s  tails: %s", headsCount, tailsCount);
}

Obviously, there are other ways of achieving the same goal. For example, using the ternary operator in a different way:

        uniform(0, 2) ? ++headsCount : ++tailsCount;

By using an if statement:

        if (uniform(0, 2)) {
            ++headsCount;

        } else {
            ++tailsCount;
        }

As member variables of data structures

Pointers are essential when implementing many data structures.

Unlike the elements of an array being next to each other in memory, elements of many other data structures are apart. Such data structures are based on the concept of their elements pointing at other elements.

For example, each node of a linked list points at the next node. Similarly, each node of a binary tree points at the left and right branches under that node. Pointers are encountered in most other data structures as well.

Although it is possible to take advantage of D's reference types, pointers may be more natural and efficient in some cases.

We will see examples of pointer members below.

When accessing memory directly

Being low-level microprocessor features, pointers provide byte-level access to memory locations. Note that such locations must still belong to valid variables. It is undefined behavior to attempt to access a random memory location.

Examples

A simple linked list

The elements of linked lists are stored in nodes. The concept of a linked list is based on each node pointing at the node that comes after it. The last node has no other node to point at, so it is set to null:

   first node           next node                 last node
 ┌─────────┬───┐     ┌─────────┬───┐          ┌─────────┬──────┐
 │ element │ o────▶  │ element │ o────▶  ...  │ element │ null │
 └─────────┴───┘     └─────────┴───┘          └─────────┴──────┘

The figure above may be misleading: In reality, the nodes are not side-by-side in memory. Each node does point to the next node but the next node may be at a completely different location.

The following struct can be used for representing the nodes of such a linked list of ints:

struct Node {
    int element;
    Node * next;

    // ...
}

Note: Because it contains a reference to the same type as itself, Node is a recursive type.

The entire list can be represented by a single pointer that points at the first node, which is commonly called the head:

struct List {
    Node * head;

    // ...
}

To keep the example short, let's define just one function that adds an element to the head of the list:

struct List {
    Node * head;

    void insertAtHead(int element) {
        head = new Node(element, head);
    }

    // ...
}

The line inside insertAtHead() keeps the nodes linked by adding a new node to the head of the list. (A function that adds to the end of the list would be more natural and more useful. We will see that function later in one of the problems.)

The right-hand side expression of that line constructs a Node object. When this new object is constructed, its next member is initialized by the current head of the list. When the head member of the list is assigned to this newly linked node, the new element ends up being the first element.

The following program tests these two structs:

import std.stdio;
import std.conv;
import std.string;

struct Node {
    int element;
    Node * next;

    string toString() const {
        string result = to!string(element);

        if (next) {
            result ~= " -> " ~ to!string(*next);
        }

        return result;
    }
}

struct List {
    Node * head;

    void insertAtHead(int element) {
        head = new Node(element, head);
    }

    string toString() const {
        return format("(%s)", head ? to!string(*head) : "");
    }
}

void main() {
    List numbers;

    writeln("before: ", numbers);

    foreach (number; 0 .. 10) {
        numbers.insertAtHead(number);
    }

    writeln("after : ", numbers);
}

The output:

before: ()
after : (9 -> 8 -> 7 -> 6 -> 5 -> 4 -> 3 -> 2 -> 1 -> 0)

Observing the contents of memory by `ubyte*`

The data stored at each memory address is a byte. Every variable is constructed on a piece of memory that consists of as many bytes as the size of the type of that variable.

A suitable pointer type to observe the content of a memory location is ubyte*. Once the address of a variable is assigned to a ubyte pointer, then all of the bytes of that variable can be observed by incrementing the pointer.

Let's consider the following integer that is initialized by the hexadecimal notation so that it will be easy to understand how its bytes are placed in memory:

    int variable = 0x01_02_03_04;

A pointer that points at that variable can be defined like this:

    int * address = &variable;

The value of that pointer can be assigned to a ubyte pointer by the cast operator:

    ubyte * bytePointer = cast(ubyte*)address;

Such a pointer allows accessing the four bytes of the int variable individually:

    writeln(bytePointer[0]);
    writeln(bytePointer[1]);
    writeln(bytePointer[2]);
    writeln(bytePointer[3]);

If your microprocessor is little-endian like mine, you should see the bytes of the value 0x01_02_03_04 in reverse:

Let's use that idea in a function that will be useful when observing the bytes of all types of variables:

import std.stdio;

void printBytes(T)(ref T variable) {
    const ubyte * begin = cast(ubyte*)&variable;    // (1)

    writefln("type   : %s", T.stringof);
    writefln("value  : %s", variable);
    writefln("address: %s", begin);                 // (2)
    writef  ("bytes  : ");

    writefln("%(%02x %)", begin[0 .. T.sizeof]);    // (3)

    writeln();
}

Assigning the address of the variable to a ubyte pointer.
Printing the value of the pointer.
Obtaining the size of the type by .sizeof and printing the bytes of the variable. (Note how a slice is produced from the begin pointer and then that slice is printed directly by writefln().)

Another way of printing the bytes would be to apply the * operator individually:

    foreach (bytePointer; begin .. begin + T.sizeof) {
        writef("%02x ", *bytePointer);
    }

The value of bytePointer would change from begin to begin + T.sizeof to visit all of the bytes of the variable. Note that the value begin + T.sizeof is outside of the range and is never accessed.

The following program calls printBytes() with various types of variables:

struct Struct {
    int first;
    int second;
}

class Class {
    int i;
    int j;
    int k;

    this(int i, int j, int k) {
        this.i = i;
        this.j = j;
        this.k = k;
    }
}

void main() {
    int integerVariable = 0x11223344;
    printBytes(integerVariable);

    double doubleVariable = double.nan;
    printBytes(doubleVariable);

    string slice = "a bright and charming façade";
    printBytes(slice);

    int[3] array = [ 1, 2, 3 ];
    printBytes(array);

    auto structObject = Struct(0xaa, 0xbb);
    printBytes(structObject);

    auto classVariable = new Class(1, 2, 3);
    printBytes(classVariable);
}

The output of the program is informative:

type   : int
value  : 287454020
address: 7FFF19A83FB0
bytes  : 44 33 22 11                             ← (1)

type   : double
value  : nan
address: 7FFF19A83FB8
bytes  : 00 00 00 00 00 00 f8 7f                 ← (2)

type   : string
value  : a bright and charming façade
address: 7FFF19A83FC0
bytes  : 1d 00 00 00 00 00 00 00 e0 68 48 00 00 00 00 00
                                                 ← (3)
type   : int[3LU]
value  : [1, 2, 3]
address: 7FFF19A83FD0
bytes  : 01 00 00 00 02 00 00 00 03 00 00 00     ← (1)

type   : Struct
value  : Struct(170, 187)
address: 7FFF19A83FE8
bytes  : aa 00 00 00 bb 00 00 00                 ← (1)

type   : Class
value  : deneme.Class
address: 7FFF19A83FF0
bytes  : 80 df 79 d5 97 7f 00 00                 ← (4)

Observations:

Although in reverse order on little-endian systems, the bytes of some of the types are as one would expect: The bytes are laid out in memory side by side for ints, fixed-length arrays (int[3]), and struct objects.
Considering that the bytes of the special value of double.nan are also in reverse order in memory, we can see that it is represented by the special bit pattern 0x7ff8000000000000.
string is reported to be consisting of 16 bytes but it is impossible to fit the letters "a bright and charming façade" into so few bytes. This is due to the fact that behind the scenes string is actually implemented as a struct. Prefixing its name by __ to stress the fact that it is an internal type used by the compiler, that struct is similar to the following one:
```
struct __string {
    size_t length;
    char * ptr;    // the actual characters
}
```
The evidence of this fact is hidden in the bytes that are printed for string above. Note that because ç is made up of two UTF-8 code units, the 28 letters of the string "a bright and charming façade" consists of a total of 29 bytes. The value 0x000000000000001d, the first 8 of the bytes of the string in the output above, is also 29. This is a strong indicator that strings are indeed laid out in memory as in the struct above.
Similarly, it is not possible to fit the three int members of the class object in 8 bytes. The output above hints at the possibility that behind the scenes a class variable is implemented as a single pointer that points at the actual class object:
```
struct __Class_VariableType {
    __Class_ActualObjecType * object;
}
```

Let's now consider a more flexible function. Instead of printing the bytes of a variable, let's define a function that prints specified number of bytes at a specified location:

import std.stdio;
import std.ascii;

void printMemory(T)(T * location, size_t length) {
    const ubyte * begin = cast(ubyte*)location;

    foreach (address; begin .. begin + length) {
        char c = (isPrintable(*address) ? *address : '.');

        writefln("%s:  %02x  %s", address, *address, c);
    }
}

Since some of the UTF-8 code units may correspond to control characters of the terminal and disrupt its output, we print only the printable characters by first checking them individually by std.ascii.isPrintable(). The non-printable characters are printed as a dot.

We can use that function to print the UTF-8 code units of a string through its .ptr property:

import std.stdio;

void main() {
    string s = "a bright and charming façade";
    printMemory(s.ptr, s.length);
}

As seen in the output, the letter ç consists of two bytes:

47B4F0:  61  a
47B4F1:  20   
47B4F2:  62  b
47B4F3:  72  r
47B4F4:  69  i
47B4F5:  67  g
47B4F6:  68  h
47B4F7:  74  t
47B4F8:  20   
47B4F9:  61  a
47B4FA:  6e  n
47B4FB:  64  d
47B4FC:  20   
47B4FD:  63  c
47B4FE:  68  h
47B4FF:  61  a
47B500:  72  r
47B501:  6d  m
47B502:  69  i
47B503:  6e  n
47B504:  67  g
47B505:  20   
47B506:  66  f
47B507:  61  a
47B508:  c3  .
47B509:  a7  .
47B50A:  61  a
47B50B:  64  d
47B50C:  65  e

Exercises

Fix the following function so that the values of the arguments that are passed to it are swapped. For this exercise, do not specify the parameters as ref but take them as pointers:

void swap(int lhs, int rhs) {
    int temp = lhs;
    lhs = rhs;
    rhs = temp;
}

void main() {
    int i = 1;
    int j = 2;

    swap(i, j);

    // Their values should be swapped
    assert(i == 2);
    assert(j == 1);
}

When you start the program you will notice that the assert checks currently fail.

Convert the linked list that we have defined above to a template so that it can be used for storing elements of any type.
It is more natural to add elements to the end of a linked list. Modify List so that it is possible to add elements to the end as well.
For this exercise, an additional pointer member variable that points at the last element will be useful.

... the solutions

[ ↢ Prev ] [ Next ↣ ]

Pointers

The concept of a reference

The ref variables in foreach loops

ref function parameters

Reference types

Syntax

Pointer value and the address-of operator &

The access operator *

The . (dot) operator to access a member of the pointee

Modifying the value of a pointer

Pointers are risky

The element one past the end of an array

Using pointers with the array indexing operator []

Producing a slice from a pointer

void* can point at any type

Using pointers in logical expressions

new returns a pointer for some types

The .ptr property of arrays

The in operator of associative arrays

When to use pointers

When required by libraries

When referencing variables of value types

As member variables of data structures

When accessing memory directly

Examples

A simple linked list

Observing the contents of memory by ubyte*

Exercises

The `ref` variables in `foreach` loops

`ref` function parameters

Pointer value and the address-of operator `&`

The access operator `*`

The `.` (dot) operator to access a member of the pointee

Using pointers with the array indexing operator `[]`

`void*` can point at any type

`new` returns a pointer for some types

The `.ptr` property of arrays

The `in` operator of associative arrays

Observing the contents of memory by `ubyte*`