Programming in D - Slices and Other Array Features

Slices and Other Array Features

We have seen in the Arrays chapter how elements are grouped as a collection in an array. That chapter was intentionally brief, leaving most of the features of arrays to this chapter.

Before going any further, here are a few brief definitions of some of the terms that happen to be close in meaning:

Array: The general concept of a group of elements that are located side by side and are accessed by indexes.
Fixed-length array (static array): An array with a fixed number of elements. This type of array owns its elements.
Dynamic array: An array that can gain or lose elements. This type of array provides access to elements that are owned by the D runtime environment.
Slice: Another name for dynamic array.

When I write slice, I specifically mean a slice; when I write array, I mean either a slice or a fixed-length array, with no distinction.

Slices

Slices are the same feature as dynamic arrays. They are called dynamic arrays because they are used like arrays, and slices because they provide access to portions of other arrays. They allow using those portions as if they were separate arrays.

Slices are defined by the number range syntax, which corresponds to the indexes that specify the beginning and the end of the range:

  beginning_index .. one_beyond_the_end_index

In the number range syntax, the beginning index is a part of the range but the end index is outside of the range:

/* ... */ = monthDays[0 .. 3];  // 0, 1, and 2 are included; but not 3

Note: Number ranges are different from Phobos ranges. Phobos ranges are about struct and class interfaces. We will see these features in later chapters.

As an example, we can slice the monthDays array to be able to use its parts as four smaller arrays:

    int[12] monthDays =
        [ 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 ];

    int[] firstQuarter  = monthDays[0 .. 3];
    int[] secondQuarter = monthDays[3 .. 6];
    int[] thirdQuarter  = monthDays[6 .. 9];
    int[] fourthQuarter = monthDays[9 .. 12];

The four variables in the code above are slices; they provide access to four parts of an already existing array. An important point worth stressing here is that those slices do not have their own elements. They merely provide access to the elements of the actual array. Modifying an element of a slice modifies the element of the actual array. To see this, let's modify the first element of each slice and then print the actual array:

    firstQuarter[0]  = 1;
    secondQuarter[0] = 2;
    thirdQuarter[0]  = 3;
    fourthQuarter[0] = 4;

    writeln(monthDays);

The output:

[1, 28, 31, 2, 31, 30, 3, 31, 30, 4, 30, 31]

Each slice modifies its first element, and the corresponding element of the actual array is affected.

We have seen earlier that valid array indexes are from 0 to one less than the length of the array. For example, the valid indexes of a 3-element array are 0, 1, and 2. Similarly, the end index in the slice syntax specifies one beyond the last element that the slice will be providing access to. For that reason, when the last element of an array needs to be included in a slice, the length of the array must be specified as the end index. For example, a slice of all elements of a 3-element array would be array[0..3].

An obvious limitation is that the beginning index cannot be greater than the end index:

    int[3] array = [ 0, 1, 2 ];
    int[] slice = array[2 .. 1];  // ← run-time ERROR

It is legal for the beginning and the end indexes to be equal. In that case the slice is empty. Assuming that index is a valid index of anArray:

    int[] slice = anArray[index .. index];
    writeln("The length of the slice: ", slice.length);

The output:

The length of the slice: 0

Using `$`, instead of `array.length`

When indexing, $ is a shorthand for the length of the array:

    writeln(array[array.length - 1]);  // the last element
    writeln(array[$ - 1]);             // the same thing
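
Since $ is valid anywhere inside the [] characters, it can also be used in the slice syntax and in arithmetic. The following is a minimal sketch; the array and its values are made up for illustration:

```d
import std.stdio;

void main() {
    int[] array = [ 10, 20, 30, 40, 50 ];

    writeln(array[0 .. $]);        // all elements
    writeln(array[$ / 2 .. $]);    // from the middle to the end
    writeln(array[$ - 2 .. $]);    // the last two elements
}
```

The output:

[10, 20, 30, 40, 50]
[30, 40, 50]
[40, 50]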

Using `.dup` to copy

Short for "duplicate", the .dup property makes a new array from the copies of the elements of an existing array:

    double[] array = [ 1.25, 3.75 ];
    double[] theCopy = array.dup;

As an example, let's define an array that contains the number of days of the months of a leap year. One way is to take a copy of the non-leap-year array and then increment the element that corresponds to February:

import std.stdio;

void main() {
    int[12] monthDays =
        [ 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 ];

    int[] leapYear = monthDays.dup;

    ++leapYear[1];   // increments the days in February

    writeln("Non-leap year: ", monthDays);
    writeln("Leap year    : ", leapYear);
}

The output:

Non-leap year: [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
Leap year    : [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

Assignment

We have seen so far that the assignment operator modifies values of variables. It is the same with fixed-length arrays:

    int[3] a = [ 1, 1, 1 ];
    int[3] b = [ 2, 2, 2 ];

    a = b;        // the elements of 'a' become 2
    writeln(a);

The output:

[2, 2, 2]
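
Because the values are copied, the two fixed-length arrays remain independent after the assignment. A small sketch that continues the example above (the later modification of b is added only for illustration):

```d
import std.stdio;

void main() {
    int[3] a = [ 1, 1, 1 ];
    int[3] b = [ 2, 2, 2 ];

    a = b;        // copies the values of the elements of 'b'
    b[0] = 42;    // modifies only 'b'

    writeln(a);
    writeln(b);
}
```

The output:

[2, 2, 2]
[42, 2, 2]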

The assignment operation has a completely different meaning for slices: It makes the slice start providing access to new elements:

    int[] odds = [ 1, 3, 5, 7, 9, 11 ];
    int[] evens = [ 2, 4, 6, 8, 10 ];

    int[] slice;   // not providing access to any elements yet

    slice = odds[2 .. $ - 2];
    writeln(slice);

    slice = evens[1 .. $ - 1];
    writeln(slice);

Above, slice does not provide access to any elements when it is defined. It is then used to provide access to some of the elements of odds, and later to some of the elements of evens:

[5, 7]
[4, 6, 8]

Making a slice longer may terminate sharing

Since the length of a fixed-length array cannot be changed, the concept of termination of sharing is only about slices.

It is possible to access the same elements by more than one slice. For example, the first two of the eight elements below are being accessed through three slices:

import std.stdio;

void main() {
    int[] slice = [ 1, 3, 5, 7, 9, 11, 13, 15 ];
    int[] half = slice[0 .. $ / 2];
    int[] quarter = slice[0 .. $ / 4];

    quarter[1] = 0;     // modify through one slice

    writeln(quarter);
    writeln(half);
    writeln(slice);
}

The effect of the modification to the second element of quarter is seen through all slices:

[1, 0]
[1, 0, 5, 7]
[1, 0, 5, 7, 9, 11, 13, 15]

When viewed this way, slices provide shared access to elements. This sharing raises the question of what happens when a new element is added to one of the slices. Since multiple slices can provide access to the same elements, there may not be room to add elements to a slice without stomping on the elements of others.

D disallows element stomping and answers this question by terminating the sharing relationship if there is no room for the new element: The slice that has no room to grow leaves the sharing. When this happens, all of the existing elements of that slice are copied to a new place automatically and the slice starts providing access to these new elements.

To see this in action, let's add an element to quarter before modifying its second element:

    quarter ~= 42;    // this slice leaves the sharing because
                      // there is no room for the new element

    quarter[1] = 0;   // for that reason this modification
                      // does not affect the other slices

The output of the program shows that the modification to the quarter slice does not affect the others:

[1, 0, 42]
[1, 3, 5, 7]
[1, 3, 5, 7, 9, 11, 13, 15]

Explicitly increasing the length of a slice makes it leave the sharing as well:

    ++quarter.length;       // leaves the sharing

    quarter.length += 5;    // leaves the sharing
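
As a minimal sketch of the second form, the following program lengthens quarter by two elements and then modifies it. The increment of 2 and the value 100 are arbitrary choices for illustration. The added elements get the initial value of int, which is 0:

```d
import std.stdio;

void main() {
    int[] slice = [ 1, 3, 5, 7, 9, 11, 13, 15 ];
    int[] quarter = slice[0 .. $ / 4];

    quarter.length += 2;    // leaves the sharing; the new
                            // elements are initialized to 0

    quarter[0] = 100;       // does not affect 'slice'

    writeln(quarter);
    writeln(slice);
}
```

The output:

[100, 3, 0, 0]
[1, 3, 5, 7, 9, 11, 13, 15]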

On the other hand, shortening a slice does not affect sharing. Shortening the slice merely means that the slice now provides access to fewer elements:

    int[] a = [ 1, 11, 111 ];
    int[] d = a;

    d = d[1 .. $];  // shortening from the beginning
    d[0] = 42;      // modifying the element through the slice

    writeln(a);     // printing the other slice

As can be seen from the output, the modification through d is seen through a; the sharing is still in effect:

[1, 42, 111]

Reducing the length in different ways does not terminate the sharing either:

    d = d[0 .. $ - 1];         // shortening from the end
    --d.length;                // same thing
    d.length = d.length - 1;   // same thing

Sharing of elements is still in effect.

Using `capacity` to determine whether sharing will be terminated

There are cases when slices continue sharing elements even after an element is added to one of them. This happens when the element is added to the longest slice and there is room at the end of it:

import std.stdio;

void main() {
    int[] slice = [ 1, 3, 5, 7, 9, 11, 13, 15 ];
    int[] half = slice[0 .. $ / 2];
    int[] quarter = slice[0 .. $ / 4];

    slice ~= 42;      // adding to the longest slice ...
    slice[1] = 0;     // ... and then modifying an element

    writeln(quarter);
    writeln(half);
    writeln(slice);
}

As seen in the output, although the added element increases the length of a slice, the sharing has not been terminated, and the modification is seen through all slices:

[1, 0]
[1, 0, 5, 7]
[1, 0, 5, 7, 9, 11, 13, 15, 42]

The capacity property of slices determines whether the sharing will be terminated if an element is added to a particular slice. (capacity is actually a function but this distinction does not have any significance in this discussion.)

The value of capacity has the following meanings:

When its value is 0, it means that this is not the longest original slice. In this case, adding a new element would definitely relocate the elements of the slice and the sharing would terminate.
When its value is nonzero, it means that this is the longest original slice. In this case capacity denotes the total number of elements that this slice can hold without needing to be copied. The number of new elements that can be added can be calculated by subtracting the actual length of the slice from the capacity value. If the length of the slice equals its capacity, then the slice will be copied to a new location if one more element is added.

Accordingly, a program that needs to determine whether the sharing will terminate should use a logic similar to the following:

    if (slice.capacity == 0) {
        /* Its elements would be relocated if one more element
         * is added to this slice. */

        // ...

    } else {
        /* This slice may have room for new elements before
         * needing to be relocated. Let's calculate how
         * many: */
        auto howManyNewElements = slice.capacity - slice.length;

        // ...
    }
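
Applied to the slices of the earlier program, the two cases can be observed as in the following sketch. The exact nonzero capacity depends on the runtime, so the program prints only the results of comparisons rather than the values themselves:

```d
import std.stdio;

void main() {
    int[] slice = [ 1, 3, 5, 7, 9, 11, 13, 15 ];
    int[] half = slice[0 .. $ / 2];

    // 'half' is not the longest slice; appending to it
    // would relocate its elements
    writeln(half.capacity == 0);

    // 'slice' is the longest slice; its capacity covers at
    // least its current elements
    writeln(slice.capacity >= slice.length);
}
```

The output:

true
true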

An interesting corner case is when there is more than one slice to all elements. In such a case all slices report having capacity:

import std.stdio;

void main() {
    // Three slices to all elements
    int[] s0 = [ 1, 2, 3, 4 ];
    int[] s1 = s0;
    int[] s2 = s0;

    writeln(s0.capacity);
    writeln(s1.capacity);
    writeln(s2.capacity);
}

All three have capacity:

7
7
7

However, as soon as an element is added to one of the slices, the capacity of the others drops to 0:

    s1 ~= 42;    // ← s1 becomes the longest

    writeln(s0.capacity);
    writeln(s1.capacity);
    writeln(s2.capacity);

Since the slice with the added element is now the longest, it is the only one with capacity:

0
7        ← now only s1 has capacity
0

Reserving room for elements

Both copying elements and allocating new memory to increase capacity have some cost. For that reason, appending an element can be an expensive operation. When the number of elements to append is known beforehand, it is possible to reserve capacity for the elements:

import std.stdio;

void main() {
    int[] slice;

    slice.reserve(20);
    writeln(slice.capacity);

    foreach (element; 0 .. 17) {
        slice ~= element;  // ← these elements will not be moved
    }
}

The output:

31        ← Capacity for at least 20 elements

The elements of slice would be moved only after there are more than 31 elements.
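
One way of observing that the elements stay in place is to compare the address of the first element before and after appending. The sketch below relies on the .ptr property and the is operator, neither of which is introduced in this chapter; .ptr is the address of the first element that the slice provides access to:

```d
import std.stdio;

void main() {
    int[] slice;
    slice.reserve(20);

    auto before = slice.ptr;    // where the reserved room begins

    foreach (element; 0 .. 20) {
        slice ~= element;       // fits in the reserved room
    }

    // The same address means that no element has been moved
    writeln(slice.ptr is before);
}
```

The output:

true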

Operations on all elements

This feature is for both fixed-length arrays and slices.

The [] characters written after the name of an array mean all elements. This feature simplifies the program when certain operations need to be applied to all of the elements of an array.

import std.stdio;

void main() {
    double[3] a = [ 10, 20, 30 ];
    double[3] b = [  2,  3,  4 ];

    double[3] result = a[] + b[];

    writeln(result);
}

The output:

[12, 23, 34]

The addition operation in that program is applied to the corresponding elements of both of the arrays in order: First the first elements are added, then the second elements are added, etc. A natural requirement is that the lengths of the two arrays must be equal.

The operator can be one of the arithmetic operators +, -, *, /, %, and ^^; one of the bitwise operators ^, &, and |; as well as the unary operators - and ~ that are typed in front of an array. We will see some of these operators in later chapters.

The assignment versions of these operators can also be used: =, +=, -=, *=, /=, %=, ^^=, ^=, &=, and |=.
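
The following sketch tries a few of these operators on two small arrays. The arrays and their values are made up for illustration:

```d
import std.stdio;

void main() {
    int[4] a = [ 1, 2, 3, 4 ];
    int[4] b = [ 10, 20, 30, 40 ];
    int[4] result;

    result[] = b[] - a[];    // element-wise subtraction
    writeln(result);

    result[] = a[] * b[];    // element-wise multiplication
    writeln(result);

    result[] = -a[];         // unary minus on every element
    writeln(result);

    result[] += b[];         // adds to the current values
    writeln(result);
}
```

The output:

[9, 18, 27, 36]
[10, 40, 90, 160]
[-1, -2, -3, -4]
[9, 18, 27, 36]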

This feature works not only using two arrays; it can also be used with an array and a compatible expression. For example, the following operation divides all elements of an array by four:

    double[3] a = [ 10, 20, 30 ];
    a[] /= 4;

    writeln(a);

The output:

[2.5, 5, 7.5]

To assign a specific value to all elements:

    a[] = 42;
    writeln(a);

The output:

[42, 42, 42]

This feature requires great attention when used with slices. Although there is no apparent difference in element values, the following two expressions have very different meanings:

    slice2 = slice1;      // ← slice2 starts providing access
                          //   to the same elements that
                          //   slice1 provides access to

    slice3[] = slice1;    // ← the values of the elements of
                          //   slice3 change

The assignment of slice2 makes it share the same elements as slice1. On the other hand, since slice3[] means all elements of slice3, the values of its elements become the same as the values of the elements of slice1. The effect of the presence or absence of the [] characters cannot be ignored.

We can see an example of this difference in the following program:

import std.stdio;

void main() {
    double[] slice1 = [ 1, 1, 1 ];
    double[] slice2 = [ 2, 2, 2 ];
    double[] slice3 = [ 3, 3, 3 ];

    slice2 = slice1;      // ← slice2 starts providing access
                          //   to the same elements that
                          //   slice1 provides access to

    slice3[] = slice1;    // ← the values of the elements of
                          //   slice3 change

    writeln("slice1 before: ", slice1);
    writeln("slice2 before: ", slice2);
    writeln("slice3 before: ", slice3);

    slice2[0] = 42;       // ← the value of an element that
                          //   it shares with slice1 changes

    slice3[0] = 43;       // ← the value of an element that
                          //   only it provides access to
                          //   changes

    writeln("slice1 after : ", slice1);
    writeln("slice2 after : ", slice2);
    writeln("slice3 after : ", slice3);
}

The modification through slice2 affects slice1 too:

slice1 before: [1, 1, 1]
slice2 before: [1, 1, 1]
slice3 before: [1, 1, 1]
slice1 after : [42, 1, 1]
slice2 after : [42, 1, 1]
slice3 after : [43, 1, 1]

The danger here is that the potential bug may not be noticed until after the value of a shared element is changed.
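
One way of noticing the difference before any element is modified is the is operator, which is not covered in this chapter. For slices, is checks whether both sides provide access to the very same elements, while == compares only the element values. A minimal sketch under that assumption:

```d
import std.stdio;

void main() {
    double[] slice1 = [ 1, 1, 1 ];
    double[] slice2 = [ 2, 2, 2 ];
    double[] slice3 = [ 3, 3, 3 ];

    slice2 = slice1;      // shares the elements of slice1
    slice3[] = slice1;    // copies the values of the elements

    writeln(slice2 is slice1);    // the same elements?
    writeln(slice3 is slice1);    // the same elements?
    writeln(slice3 == slice1);    // equal values?
}
```

The output:

true
false
true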

Multi-dimensional arrays

So far we have used arrays with only fundamental types like int and double. The element type can actually be any other type, including other arrays. This enables the programmer to define complex containers like arrays of arrays. Arrays of arrays are called multi-dimensional arrays.

The elements of all of the arrays that we have defined so far have been written in the source code from left to right. To help us understand the concept of a two-dimensional array, let's define an array from top to bottom this time:

    int[] array = [
                    10,
                    20,
                    30,
                    40
                  ];

As you remember, most spaces in the source code are used to help with readability and do not change the meaning of the code. The array above could have been defined on a single line and would have the same meaning.

Let's now replace every element of that array with another array:

  /* ... */ array = [
                      [ 10, 11, 12 ],
                      [ 20, 21, 22 ],
                      [ 30, 31, 32 ],
                      [ 40, 41, 42 ]
                    ];

We have replaced elements of type int with elements of type int[]. To make the code conform to the array definition syntax, we must now specify the type of the elements as int[] instead of int:

    int[][] array = [
                      [ 10, 11, 12 ],
                      [ 20, 21, 22 ],
                      [ 30, 31, 32 ],
                      [ 40, 41, 42 ]
                    ];

Such arrays are called two-dimensional arrays because they can be seen as having rows and columns.

Two-dimensional arrays are used the same way as any other array as long as we remember that each element is an array itself and is used in array operations:

    array ~= [ 50, 51 ]; // adds a new element (i.e. a slice)
    array[0] ~= 13;      // adds to the first element

The new state of the array:

[[10, 11, 12, 13], [20, 21, 22], [30, 31, 32], [40, 41, 42], [50, 51]]
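
Individual rows and elements are reached by applying the [] operator more than once: the first index selects a row and the second index selects an element of that row. A short sketch that starts again from the original four rows:

```d
import std.stdio;

void main() {
    int[][] array = [
                      [ 10, 11, 12 ],
                      [ 20, 21, 22 ],
                      [ 30, 31, 32 ],
                      [ 40, 41, 42 ]
                    ];

    writeln(array.length);       // the number of rows
    writeln(array[1]);           // the second row
    writeln(array[1].length);    // the length of the second row
    writeln(array[1][2]);        // the third element of that row
}
```

The output:

4
[20, 21, 22]
3
22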

Arrays and elements can be fixed-length as well. The following is a three-dimensional array where all dimensions are fixed-length:

    int[2][3][4] array;  // 2 columns, 3 rows, 4 pages

The definition above can be seen as four pages of three rows of two columns of integers. As an example, such an array can be used to represent a 4-story building in an adventure game, each story consisting of 2x3=6 rooms.

For example, the number of items in the first room of the second floor can be incremented as follows:

    int[2][3][4] itemCounts;  // 4 floors of 3x2 rooms

    // The index of the second floor is 1, and the first room
    // of that floor is accessed by [0][0]
    ++itemCounts[1][0][0];
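
Note that the indexes are used in the reverse order of the lengths in the definition: the first index selects among the 4 pages, the second among the 3 rows, and the third among the 2 columns. The following sketch prints the length of each dimension to make that visible; the element that it increments is an arbitrary choice:

```d
import std.stdio;

void main() {
    int[2][3][4] array;

    writeln(array.length);          // 4 pages
    writeln(array[0].length);       // 3 rows in each page
    writeln(array[0][0].length);    // 2 columns in each row

    ++array[3][2][1];    // the last column of the last row
                         // of the last page

    writeln(array[3]);   // the last page
}
```

The output:

4
3
2
[[0, 0], [0, 0], [0, 1]]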

In addition to the syntax above, the new expression can also be used to create a slice of slices. The following example uses only two dimensions:

import std.stdio;

void main() {
    int[][] s = new int[][](2, 3);
    writeln(s);
}

The new expression above creates 2 slices containing 3 elements each and returns a slice that provides access to those slices and elements. The output:

[[0, 0, 0], [0, 0, 0]]
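
Each of the inner slices can then be used on its own. In the following sketch, which uses arbitrary values, modifying or lengthening one row does not affect the other:

```d
import std.stdio;

void main() {
    int[][] s = new int[][](2, 3);

    s[0][1] = 7;    // modifies only the first row
    s[1] ~= 42;     // lengthens only the second row

    writeln(s);
}
```

The output:

[[0, 7, 0], [0, 0, 0, 42]]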

Summary

Fixed-length arrays own their elements; slices provide access to elements that don't belong exclusively to them.
Within the [] operator, $ is the equivalent of array_name.length.
.dup makes a new array that consists of the copies of the elements of an existing array.
With fixed-length arrays, the assignment operation changes the values of elements; with slices, it makes the slices start providing access to other elements.
Slices that get longer may stop sharing elements and start providing access to newly copied elements. capacity determines whether this will be the case.
The syntax array[] means all elements of the array; the operation that is applied to it is applied to each element individually.
Arrays of arrays are called multi-dimensional arrays.

Exercise

Iterate over the elements of an array of doubles and halve the ones that are greater than 10. For example, given the following array:

    double[] array = [ 1, 20, 2, 30, 7, 11 ];

Modify it as the following:

[1, 10, 2, 15, 7, 5.5]

Although there are many solutions to this problem, try to use only the features of slices. You can start with a slice that provides access to all elements. Then you can shorten the slice from the beginning and always use the first element.

The following expression shortens the slice from the beginning:

        slice = slice[1 .. $];
