Programming in D - More Functions

More Functions

Functions have been covered in several earlier chapters of the book. This chapter covers more of their features.

Return type attributes

Functions can be marked as auto, ref, inout, and auto ref. These attributes are about return types of functions.

`auto` functions

The return types of auto functions need not be specified:

auto add(int first, double second) {
    double result = first + second;
    return result;
}

The return type is deduced by the compiler from the return expression. Since the type of result is double, the return type of add() is double.

If there is more than one return statement, then the return type of the function is their common type. (We have seen common type in the Ternary Operator ?: chapter.) For example, because the common type of int and double is double, the return type of the following auto function is double as well:

auto func(int i) {
    if (i < 0) {
        return i;      // returns 'int' here
    }

    return i * 1.5;    // returns 'double' here
}

void main() {
    // The return type of the function is 'double'
    auto result = func(42);
    static assert(is (typeof(result) == double));
}

`ref` functions

Normally, the expression that is returned from a function is copied to the caller's context. ref specifies that the expression should be returned by-reference instead.

For example, the following function returns the greater of its two parameters:

int greater(int first, int second) {
    return (first > second) ? first : second;
}

Normally, both the parameters and the return value of that function are copied:

import std.stdio;

void main() {
    int a = 1;
    int b = 2;
    int result = greater(a, b);
    result += 10;                // ← neither a nor b changes
    writefln("a: %s, b: %s, result: %s", a, b, result);
}

Because the return value of greater() is copied to result, adding to result affects only that variable; neither a nor b changes:

a: 1, b: 2, result: 12

ref parameters are passed by reference instead of being copied. The same keyword has the same effect on return values:

ref int greater(ref int first, ref int second) {
    return (first > second) ? first : second;
}

This time, the returned reference would be an alias to one of the arguments and mutating the returned reference would modify either a or b:

    int a = 1;
    int b = 2;
    greater(a, b) += 10;         // ← either a or b changes
    writefln("a: %s, b: %s", a, b);

Note that the returned reference is incremented directly. As a result, the greater of the two arguments changes:

a: 1, b: 12

Local reference requires a pointer: An important point is that although the return type is marked as ref, a and b would still not change if the return value were assigned to a local variable:

    int result = greater(a, b);
    result += 10;                // ← only result changes

Although greater() returns a reference to a or b, that reference gets copied to the local variable result, and again neither a nor b changes:

a: 1, b: 2, result: 12

For result to be a reference to a or b, it has to be defined as a pointer:

    int * result = &greater(a, b);
    *result += 10;
    writefln("a: %s, b: %s, result: %s", a, b, *result);

This time result would be a reference to either a or b and the mutation through it would affect the actual variable:

a: 1, b: 12, result: 12

It is not possible to return a reference to a local variable: The ref return value is an alias to one of the arguments that start their lives even before the function is called. That means, regardless of whether a reference to a or b is returned, the returned reference refers to a variable that is still alive.

Conversely, it is not possible to return a reference to a variable that is not going to be alive upon leaving the function:

ref string parenthesized(string phrase) {
    string result = '(' ~ phrase ~ ')';
    return result;    // ← compilation ERROR
} // ← the lifetime of result ends here

The lifetime of local result ends upon leaving the function. For that reason, it is not possible to return a reference to that variable:

Error: escaping reference to local variable result
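
By contrast, a reference to something that outlives the call can be returned without any problem. The following is a minimal sketch of that case; largest() is a hypothetical helper that is not part of this chapter. It returns a reference to an element of the caller's slice, which stays alive after the function returns:

```d
import std.stdio;

// Hypothetical helper: returns a reference to the largest element.
// The elements belong to the caller's slice, so they outlive the call.
ref int largest(int[] values) {
    assert(values.length > 0, "values must not be empty");

    size_t best = 0;
    foreach (i, value; values) {
        if (value > values[best]) {
            best = i;
        }
    }

    return values[best];
}

void main() {
    int[] numbers = [ 3, 7, 5 ];
    largest(numbers) = 0;    // overwrites the 7 through the reference
    writeln(numbers);        // prints [3, 0, 5]
}
```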

`auto ref` functions

auto ref helps with functions like parenthesized() above. Similar to auto, the return type of an auto ref function is deduced by the compiler. Additionally, if the returned expression can be a reference, that variable is returned by reference as opposed to being copied.

parenthesized() can be compiled if the return type is auto ref:

auto ref string parenthesized(string phrase) {
    string result = '(' ~ phrase ~ ')';
    return result;                  // ← compiles
}

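Whether the function returns a copy or a reference is determined from all of its return statements: it returns by reference only if every returned expression is an lvalue that can safely be returned by reference; otherwise it returns a copy.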

auto ref is more useful in function templates where template parameters may be references or copies depending on context.
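
As a rough illustration of the template case, the sketch below uses a hypothetical function template named passThrough(), which is not from this chapter. Its parameter is also marked auto ref, so it is a reference for lvalue arguments and a copy for rvalue arguments, and the return value follows suit:

```d
// Hypothetical function template, used only for illustration.
// 'auto ref' on the parameter makes it a reference for lvalue arguments
// and a copy for rvalue arguments; the 'auto ref' return follows.
auto ref passThrough(T)(auto ref T value) {
    return value;
}

void main() {
    int a = 1;
    passThrough(a) += 10;        // lvalue argument: returned by reference
    assert(a == 11);

    int b = passThrough(5);      // rvalue argument: returned by copy
    assert(b == 5);
}
```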

`inout` functions

The inout keyword appears for parameter and return types of functions. It works like a template for const, immutable, and mutable.

Let's rewrite the previous function as taking string (i.e. immutable(char)[]) and returning string:

string parenthesized(string phrase) {
    return '(' ~ phrase ~ ')';
}

// ...

    writeln(parenthesized("hello"));

As expected, the code works with that string argument:

(hello)

However, as it works only with immutable strings, the function can be seen as being less useful than it could have been:

    char[] m;    // has mutable elements
    m ~= "hello";
    writeln(parenthesized(m));    // ← compilation ERROR

Error: function deneme.parenthesized (string phrase)
is not callable using argument types (char[])

The same limitation applies to const(char)[] strings as well.

One solution for this usability issue is to overload the function for const and mutable strings:

char[] parenthesized(char[] phrase) {
    return '(' ~ phrase ~ ')';
}

const(char)[] parenthesized(const(char)[] phrase) {
    return '(' ~ phrase ~ ')';
}

That design would be less than ideal due to the obvious code duplications. Another solution would be to define the function as a template:

T parenthesized(T)(T phrase) {
    return '(' ~ phrase ~ ')';
}

Although that would work, this time it may be seen as being too flexible and potentially requiring template constraints.

inout is very similar to the template solution. The difference is that not the entire type but just the mutability attribute is deduced from the parameter:

inout(char)[] parenthesized(inout(char)[] phrase) {
    return '(' ~ phrase ~ ')';
}

inout transfers the deduced mutability attribute to the return type.

When the function is called with char[], it gets compiled as if inout is not specified at all. On the other hand, when called with immutable(char)[] or const(char)[], inout means immutable or const, respectively.

The following code demonstrates this by printing the type of the returned expression:

    char[] m;
    writeln(typeof(parenthesized(m)).stringof);

    const(char)[] c;
    writeln(typeof(parenthesized(c)).stringof);

    immutable(char)[] i;
    writeln(typeof(parenthesized(i)).stringof);

The output:

char[]
const(char)[]
string
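
inout matters most when the returned value shares data with the argument. The following sketch assumes a hypothetical helper named skipFirst() that returns part of the slice it receives; since no new array is created, the result must carry the same mutability as the argument:

```d
// Hypothetical helper: returns the slice without its first element.
// The result shares its elements with the argument.
inout(int)[] skipFirst(inout(int)[] values) {
    return values.length ? values[1 .. $] : values;
}

void main() {
    int[] m = [ 1, 2, 3 ];
    const(int)[] c = m;
    immutable(int)[] i = [ 1, 2, 3 ];

    static assert(is (typeof(skipFirst(m)) == int[]));
    static assert(is (typeof(skipFirst(c)) == const(int)[]));
    static assert(is (typeof(skipFirst(i)) == immutable(int)[]));

    skipFirst(m)[0] = 42;    // allowed only for the mutable argument
    assert(m == [ 1, 42, 3 ]);
}
```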

Behavioral attributes

pure, nothrow, and @nogc are about function behaviors.

`pure` functions

As we have seen in the Functions chapter, functions can produce return values and side effects. When possible, return values should be preferred over side effects because functions that do not have side effects are easier to make sense of, which in turn helps with program correctness and maintainability.

A similar concept is the purity of a function. Purity is defined differently in D from most other programming languages: In D, a function that does not access mutable global or static state is pure. (Since input and output streams are considered as mutable global state, pure functions cannot perform input or output operations either.)

In other words, a function is pure if it produces its return value and side effects only by accessing its parameters, local variables, and immutable global state.

An important aspect of purity in D is that pure functions can mutate their parameters.

Additionally, the following operations that mutate the global state of the program are explicitly allowed in pure functions:

Allocate memory with the new expression
Terminate the program
Access the floating point processing flags
Throw exceptions

The pure keyword specifies that a function should behave according to those conditions and the compiler guarantees that it does so.

Naturally, since impure functions do not provide the same guarantees, a pure function cannot call impure functions.

The following program demonstrates some of the operations that a pure function can and cannot perform:

import std.stdio;
import std.exception;

int mutableGlobal;
const int constGlobal;
immutable int immutableGlobal;

void impureFunction() {
}

int pureFunction(ref int i, int[] slice) pure {
    // Can throw exceptions:
    enforce(slice.length >= 1);

    // Can mutate its parameters:
    i = 42;
    slice[0] = 43;

    // Can access immutable global state:
    i = constGlobal;
    i = immutableGlobal;

    // Can use the new expression:
    auto p = new int;

    // Cannot access mutable global state:
    i = mutableGlobal;    // ← compilation ERROR

    // Cannot perform input and output operations:
    writeln(i);           // ← compilation ERROR

    static int mutableStatic;

    // Cannot access mutable static state:
    i = mutableStatic;    // ← compilation ERROR

    // Cannot call impure functions:
    impureFunction();     // ← compilation ERROR

    return 0;
}

void main() {
    int i;
    int[] slice = [ 1 ];
    pureFunction(i, slice);
}

Although they are allowed to, some pure functions do not mutate their parameters. Following from the rules of purity, the only observable effect of such a function would be its return value. Further, since the function cannot access any mutable global state, the return value would be the same for a given set of arguments, regardless of when and how many times the function is called during the execution of the program. This fact gives both the compiler and the programmer optimization opportunities. For example, instead of calling the function a second time for a given set of arguments, its return value from the first call can be cached and used instead of actually calling the function again.
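
One way for the programmer to apply such caching is std.functional.memoize from the standard library. The sketch below is only an illustration; fibonacci() and cachedFibonacci are names made up for this example:

```d
import std.functional : memoize;

// No side effects and no access to mutable global state:
// the result depends only on the argument.
ulong fibonacci(ulong n) pure nothrow @safe {
    return (n < 2) ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// Caches results per argument. Only calls made through this alias
// are cached; the recursive calls inside fibonacci() are not.
alias cachedFibonacci = memoize!fibonacci;

void main() {
    assert(cachedFibonacci(20) == 6765);    // computed
    assert(cachedFibonacci(20) == 6765);    // served from the cache
}
```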

Since the exact code that gets generated for a template instantiation depends on the actual template arguments, whether the generated code is pure depends on the arguments as well. For that reason, the purity of a template is inferred by the compiler from the generated code. (The pure keyword can still be specified by the programmer.) Similarly, the purity of an auto function is inferred.

As a simple example, since the following function template would be impure when N is zero, it would not be possible to call templ!0() from a pure function:

import std.stdio;

// This template is impure when N is zero
void templ(size_t N)() {
    static if (N == 0) {
        // Prints when N is zero:
        writeln("zero");
    }
}

void foo() pure {
    templ!0();    // ← compilation ERROR
}

void main() {
    foo();
}

The compiler infers that the 0 instantiation of the template is impure and rejects calling it from the pure function foo():

Error: pure function 'deneme.foo' cannot call impure function
'deneme.templ!0.templ'

However, since the instantiation of the template for values other than zero is pure, the program can be compiled for such values:

void foo() pure {
    templ!1();    // ← compiles
}

We have seen above that input and output functions like writeln() cannot be used in pure functions because they access global state. Sometimes such limitations are too restrictive, e.g. when needing to print a message temporarily during debugging. For that reason, the purity rules are relaxed for code that is marked as debug:

import std.stdio;

debug size_t fooCounter;

void foo(int i) pure {
    debug ++fooCounter;

    if (i == 0) {
        debug writeln("i is zero");
        i = 42;
    }

    // ...
}

void main() {
    foreach (i; 0..100) {
        if ((i % 10) == 0) {
            foo(i);
        }
    }

    debug writefln("foo is called %s times", fooCounter);
}

The pure function above mutates the global state of the program by modifying a global variable and printing a message. Despite those impure operations, it still can be compiled because those operations are marked as debug.

Note: Remember that those statements are included in the program only if the program is compiled with the -debug command line switch.

Member functions can be marked as pure as well. Subclasses can override impure functions as pure but the reverse is not allowed:

interface Iface {
    void foo() pure;    // Subclasses must define foo as pure.

    void bar();         // Subclasses may define bar as pure.
}

class Class : Iface {
    void foo() pure {   // Required to be pure
        // ...
    }

    void bar() pure {   // pure although not required
        // ...
    }
}

Delegates and anonymous functions can be pure as well. Similar to templates, whether a function or delegate literal, or auto function is pure is inferred by the compiler:

import std.stdio;

void foo(int delegate(double) pure dg) {
    int i = dg(1.5);
}

void main() {
    foo(a => 42);                // ← compiles

    foo((a) {                    // ← compilation ERROR
            writeln("hello");
            return 42;
        });
}

foo() above requires that its parameter be a pure delegate. The compiler infers that the lambda a => 42 is pure and allows it as an argument for foo(). However, since the other delegate is impure it cannot be passed to foo():

Error: function deneme.foo (int delegate(double) pure dg)
is not callable using argument types (void)

One benefit of pure functions is that their return values can be used to initialize immutable variables. Although the array produced by makeNumbers() below is mutable, it is not possible for its elements to be changed by any code outside of that function. For that reason, the initialization works.

int[] makeNumbers() pure {
    int[] result;
    result ~= 42;
    return result;
}

void main() {
    immutable array = makeNumbers();
}

`nothrow` functions

We saw the exception mechanism in the Exceptions chapter.

It would be good practice for functions to document the types of exceptions that they may throw under specific error conditions. However, as a general rule, callers should assume that any function can throw any exception.

Sometimes it is more important to know that a function does not emit any exception at all. For example, some algorithms can take advantage of the fact that certain of their steps cannot be interrupted by an exception.

nothrow guarantees that a function does not emit any exception:

int add(int lhs, int rhs) nothrow {
    // ...
}

Note: Remember that it is not recommended to catch Error nor its base class Throwable. What is meant here by "any exception" is "any exception that is defined under the Exception hierarchy." A nothrow function can still emit exceptions that are under the Error hierarchy, which represents irrecoverable error conditions that should preclude the program from continuing its execution.

Such a function can neither throw an exception itself nor call a function that may throw an exception:

int add(int lhs, int rhs) nothrow {
    writeln("adding");    // ← compilation ERROR
    return lhs + rhs;
}

The compiler rejects the code because add() violates the no-throw guarantee:

Error: function 'deneme.add' is nothrow yet may throw

This is because writeln is not (and cannot be) a nothrow function.

The compiler can infer that a function can never emit an exception. The following implementation of add() is nothrow because it is obvious to the compiler that the try-catch block prevents any exception from escaping the function:

int add(int lhs, int rhs) nothrow {
    int result;

    try {
        writeln("adding");    // ← compiles
        result = lhs + rhs;

    } catch (Exception error) {   // catches all exceptions
        // ...
    }

    return result;
}

As mentioned above, nothrow does not include exceptions that are under the Error hierarchy. For example, although accessing an element of an array with [] can throw RangeError, the following function can still be defined as nothrow:

int foo(int[] arr, size_t i) nothrow {
    return 10 * arr[i];
}

As with purity, the compiler automatically deduces whether a template, delegate, or anonymous function is nothrow.
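
The inference can be observed by calling a template from a nothrow function. In the following sketch, twice() and quadruple() are hypothetical names; twice() carries no attributes, yet its int instantiation can be called from nothrow code:

```d
// Hypothetical template with no explicit attributes.
T twice(T)(T value) {
    return value + value;
}

// Compiles only because twice!int is inferred to be nothrow.
int quadruple(int value) nothrow {
    return twice(twice(value));
}

void main() {
    assert(quadruple(3) == 12);
}
```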

`@nogc` functions

D is a garbage-collected language. Many data structures and algorithms in most D programs take advantage of dynamic memory blocks that are managed by the garbage collector (GC). The GC reclaims such memory blocks through a process called garbage collection.

Some commonly used D operations take advantage of the GC as well. For example, elements of arrays live on dynamic memory blocks:

// A function that takes advantage of the GC indirectly
int[] append(int[] slice) {
    slice ~= 42;
    return slice;
}

If the slice does not have sufficient capacity, the ~= operator above allocates a new memory block from the GC.

Although the GC is a significant convenience for data structures and algorithms, memory allocation and garbage collection are costly operations that make the execution of some programs noticeably slow.

@nogc means that a function cannot use the GC directly or indirectly:

void foo() @nogc {
    // ...
}

The compiler guarantees that a @nogc function does not involve GC operations. For example, the following function cannot call append() above, which does not provide the @nogc guarantee:

void foo() @nogc {
    int[] slice;
    // ...
    append(slice);    // ← compilation ERROR
}

Error: @nogc function 'deneme.foo' cannot call non-@nogc function
'deneme.append'
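
A common way of satisfying @nogc is to work on memory that the caller provides. The following sketch assumes a hypothetical function named fillWith42() that writes into an existing buffer instead of growing a slice:

```d
// Hypothetical alternative to append(): fills storage provided by the
// caller instead of growing a slice, so no GC allocation can take place.
int[] fillWith42(int[] buffer) @nogc nothrow {
    foreach (ref element; buffer) {
        element = 42;
    }
    return buffer;
}

void main() @nogc {
    int[3] storage;                          // lives on the stack
    int[] filled = fillWith42(storage[]);
    assert(filled.length == 3);
    assert(storage[0] == 42 && storage[2] == 42);
}
```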

Code safety attributes

@safe, @trusted, and @system are about the code safety that a function provides. As with purity, the compiler infers the safety level of templates, delegates, anonymous functions, and auto functions.

`@safe` functions

A class of programming errors involves corrupting data at unrelated locations in memory by writing at those locations unintentionally. Such errors are mostly due to mistakes made in using pointers and applying type casts.

@safe functions guarantee that they do not contain any operation that may corrupt memory. The compiler does not allow the following operations in @safe functions:

Pointers cannot be converted to pointer types other than void*.
A non-pointer expression cannot be converted to a pointer value.
Pointer values cannot be changed (no pointer arithmetic; however, assigning a pointer to another pointer of the same type is safe).
Unions that have pointer or reference members cannot be used.
Functions marked as @system cannot be called.
Exceptions that are not descended from Exception cannot be caught.
Inline assembler cannot be used.
Mutable variables cannot be cast to immutable.
immutable variables cannot be cast to mutable.
Thread-local variables cannot be cast to shared.
shared variables cannot be cast to thread-local.
Addresses of function-local variables cannot be taken.
__gshared variables cannot be accessed.
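
As a small illustration of one item in the list, the sketch below checks with __traits(compiles) that converting a non-pointer value to a pointer is rejected inside a @safe function literal but accepted inside a @system one. The sum() function and the address value are made up for this example:

```d
// Bounds-checked slice access is allowed in @safe code.
int sum(const(int)[] values) @safe {
    int total;
    foreach (value; values) {
        total += value;
    }
    return total;
}

void main() {
    // Converting a non-pointer value to a pointer is rejected
    // in @safe code...
    static assert(!__traits(compiles, () @safe {
        size_t address = 0x1000;
        auto p = cast(int*) address;
    }));

    // ...but the same code is accepted when it is @system.
    static assert(__traits(compiles, () @system {
        size_t address = 0x1000;
        auto p = cast(int*) address;
    }));

    assert(sum([ 1, 2, 3 ]) == 6);
}
```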

`@trusted` functions

Some functions may actually be safe but cannot be marked as @safe for various reasons. For example, a function may have to call a library written in C, a language that has no support for such safety guarantees.

Some other functions may actually perform operations that are not allowed in @safe code, but may be well tested and trusted to be correct.

@trusted tells the compiler that, although the function cannot be marked as @safe, it should be considered safe. The compiler trusts the programmer and treats @trusted code as if it were safe. For example, it allows @safe code to call @trusted code.
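
The following sketch shows what such a wrapper might look like. copyInts() is a hypothetical function; it calls the C function memmove() from core.stdc.string and is marked @trusted on the assumption that deriving the byte count from the slice lengths keeps the call within bounds:

```d
import core.stdc.string : memmove;

// Hypothetical wrapper around a C function, which is @system in D.
// The byte count is derived from the slice lengths, so memmove()
// cannot read or write out of bounds.
size_t copyInts(int[] destination, const(int)[] source) @trusted {
    const count = (destination.length < source.length)
                  ? destination.length : source.length;
    if (count == 0) {
        return 0;
    }

    memmove(destination.ptr, source.ptr, count * int.sizeof);
    return count;
}

void main() @safe {
    auto destination = new int[3];
    const(int)[] source = [ 10, 20 ];

    // @safe code may call @trusted code:
    const copied = copyInts(destination, source);
    assert(copied == 2);
    assert(destination == [ 10, 20, 0 ]);
}
```

The responsibility for correctness lies entirely with the programmer here: the compiler does not verify the body of a @trusted function.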

`@system` functions

Any function that is not marked as @safe or @trusted is considered @system, which is the default safety attribute.

Compile time function execution (CTFE)

In many programming languages, computations that are performed at compile time are very limited. Such computations are usually as simple as calculating the length of a fixed-length array or simple arithmetic operations:

    writeln(1 + 2);

The 1 + 2 expression above is compiled as if it has been written as 3; there is no computation at runtime.

D has CTFE, which allows any function to be executed at compile time as long as it is possible to do so.

Let's consider the following program that prints a menu to the output:

import std.stdio;
import std.string;
import std.range;

string menuLines(string[] choices) {
    string result;

    foreach (i, choice; choices) {
        result ~= format(" %s. %s\n", i + 1, choice);
    }

    return result;
}

string menu(string title,
            string[] choices,
            size_t width) {
    return format("%s\n%s\n%s",
                  title.center(width),
                  '='.repeat(width),    // horizontal line
                  menuLines(choices));
}

void main() {
    enum drinks =
        menu("Drinks",
             [ "Coffee", "Tea", "Hot chocolate" ], 20);

    writeln(drinks);
}

Although the same result can be achieved in different ways, the program above performs non-trivial operations to produce the following string:

       Drinks       
====================
 1. Coffee
 2. Tea
 3. Hot chocolate

Remember that the initial value of enum constants like drinks must be known at compile time. That fact is sufficient for menu() to be executed at compile time. The value that it returns at compile time is used as the initial value of drinks. As a result, the program is compiled as if that value is written explicitly in the program:

    // The equivalent of the code above:
    enum drinks = "       Drinks       \n"
                  "====================\n"
                  " 1. Coffee\n"
                  " 2. Tea\n"
                  " 3. Hot chocolate\n";

For a function to be executed at compile time, it must appear in an expression that in fact is needed at compile time:

Initializing a static variable
Initializing an enum variable
Calculating the length of a fixed-length array
Calculating a template value argument
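
Some of these contexts can be sketched with a hypothetical function named triangular(), which is not part of the menu program above. Nothing marks the function as special; it runs at compile time simply because its result is needed then:

```d
// An ordinary function; nothing marks it as special for CTFE.
size_t triangular(size_t n) {
    size_t sum;
    foreach (i; 1 .. n + 1) {
        sum += i;
    }
    return sum;
}

void main() {
    // Each of these contexts needs the value at compile time,
    // so triangular() is executed by the compiler:
    static immutable fromStatic = triangular(3);
    enum fromEnum = triangular(4);
    int[triangular(2)] fixedLength;

    assert(fromStatic == 6);
    assert(fromEnum == 10);
    assert(fixedLength.length == 3);

    // The same function also runs at run time:
    size_t n = 5;
    assert(triangular(n) == 15);
}
```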

Clearly, it would not be possible to execute every function at compile time. For example, a function that accesses a global variable cannot be executed at compile time because the global variable does not start its life until run time. Similarly, since stdout is available only at run time, functions that print cannot be executed at compile time.

The `__ctfe` variable

It is a powerful aspect of CTFE that the same function is used for both compile time and run time depending on when its result is needed. Although the function need not be written in any special way for CTFE, some operations in the function may make sense only at compile time or only at run time. The special variable __ctfe can be used to distinguish the code that is only for compile time from the code that is only for run time. The value of this variable is true when the function is being executed for CTFE, false otherwise:

import std.stdio;

size_t counter;

int foo() {
    if (!__ctfe) {
        // This code is for execution at run time
        ++counter;
    }

    return 42;
}

void main() {
    enum i = foo();
    auto j = foo();
    writefln("foo is called %s times.", counter);
}

As counter lives only at run time, it cannot be incremented at compile time. For that reason, the code above attempts to increment it only for run-time execution. Since the value of i is determined at compile time and the value of j is determined at run time, foo() is reported to have been called just once during the execution of the program:

foo is called 1 times.

Summary

The return type of an auto function is deduced automatically.
The return value of a ref function is a reference to an existing variable.
The return value of an auto ref function is a reference if possible, a copy otherwise.
inout carries the const, immutable, or mutable attribute of the parameter to the return type.
A pure function cannot access mutable global or static state. The compiler infers the purity of templates, delegates, anonymous functions, and auto functions.
nothrow functions cannot emit exceptions. The compiler infers whether a template, delegate, anonymous function, or auto function is no-throw.
@nogc functions cannot involve GC operations.
@safe functions cannot corrupt memory. The compiler infers the safety attributes of templates, delegates, anonymous functions, and auto functions.
@trusted functions are indeed safe but cannot be specified as such; they are considered @safe both by the programmer and the compiler.
@system functions can use every D feature. @system is the default safety attribute.
Functions can be executed at compile time as well (CTFE). This can be differentiated by the value of the special variable __ctfe.
