Programming in D - Literals

Literals

Programs achieve their tasks by manipulating the values of variables and objects. They produce new values and new objects by using them with functions and operators.

Some values need not be produced during the execution of the program; they are instead written directly into the source code. For example, the floating point value 0.75 and the string value "Total price: " below are not calculated by the program:

    discountedPrice = actualPrice * 0.75;
    totalPrice += count * discountedPrice;
    writeln("Total price: ", totalPrice);

Values that are typed directly into the source code are called literals. We have used many literals in the programs that we have written so far. This chapter covers all of the types of literals and their syntax rules.

Integer literals

Integer literals can be written in one of four ways: the decimal system that we use in our daily lives; the hexadecimal and binary systems, which are more suitable for certain computing tasks; and the octal system, which may be needed in very rare cases.

In order to make the code more readable, it is possible to insert _ characters anywhere after the first digit of integer literals. For example, we can use it to form groups of three digits, as in 1_234_567. Another example would be if we measured some value in cents of a currency, and used it to separate the currency units from the cents, as in 199_99. These characters are optional; they are ignored by the compiler.

In the decimal system: The literals are specified by the decimal numerals in exactly the same way as we are used to in our daily lives, such as 12. When using the decimal system in D the first digit cannot be 0. Such a leading zero digit is often used in other programming languages to indicate the octal system, so this constraint helps to prevent bugs that are caused by this easily overlooked difference. This does not preclude 0 on its own: 0 is zero.

In the hexadecimal system: The literals start with 0x or 0X and include the numerals of the hexadecimal system: "0123456789abcdef" and "ABCDEF" as in 0x12ab00fe.

In the octal system: The literals are specified using the octal template from the std.conv module and include the numerals of the octal system: "01234567" as in octal!576.

In the binary system: The literals start with 0b or 0B and include the numerals of the binary system: 0 and 1 as in 0b01100011.
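As a small sketch of the four notations, the following program writes the value 99 in each system and checks that the literals are equal. The names decimalValue, hexValue, octalValue, and binaryValue are chosen only for this illustration.

```d
import std.conv : octal;
import std.stdio : writeln;

// The same value, 99, written in each of the four systems.
// (These names are hypothetical; they exist only for this example.)
enum decimalValue = 99;
enum hexValue     = 0x63;
enum octalValue   = octal!143;
enum binaryValue  = 0b0110_0011;

void main() {
    // All four literals denote the same value.
    assert(decimalValue == hexValue);
    assert(hexValue == octalValue);
    assert(octalValue == binaryValue);

    // The optional _ characters do not change the value.
    assert(1_234_567 == 1234567);
    assert(199_99 == 19999);

    writeln(decimalValue, " ", hexValue, " ",
            octalValue, " ", binaryValue);
}
```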

The types of integer literals

Just like any other value, every literal is of a certain type. The types of literals are not specified explicitly as int, double, etc. The compiler infers the type from the value and syntax of the literal itself.

Although most of the time the types of literals are not important, sometimes the types may not match the expressions that they are used in. In such cases the type must be explicitly specified.

By default, integer literals are inferred to be of type int. When the value happens to be too large to be represented by an int, the compiler uses the following logic to decide on the type of the literal:

If the value of the literal does not fit an int and it is specified in the decimal system, then its type is long.
If the value of the literal does not fit an int and it is specified in any other system, then the type becomes the first of the following types that can accommodate the value: uint, long, and ulong.

To see this logic in action, let's try the following program that takes advantage of typeof and stringof:

import std.stdio;

void main() {
    writeln("\n--- these are written in decimal ---");

    // fits an int, so the type is int
    writeln(       2_147_483_647, "\t\t",
            typeof(2_147_483_647).stringof);

    // does not fit an int and is decimal, so the type is long
    writeln(       2_147_483_648, "\t\t",
            typeof(2_147_483_648).stringof);

    writeln("\n--- these are NOT written in decimal ---");

    // fits an int, so the type is int
    writeln(       0x7FFF_FFFF, "\t\t",
            typeof(0x7FFF_FFFF).stringof);

    // does not fit an int and is not decimal, so the type is uint
    writeln(       0x8000_0000, "\t\t",
            typeof(0x8000_0000).stringof);

    // does not fit a uint and is not decimal, so the type is long
    writeln(       0x1_0000_0000, "\t\t",
            typeof(0x1_0000_0000).stringof);

    // does not fit a long and is not decimal, so the type is ulong
    writeln(       0x8000_0000_0000_0000, "\t\t",
            typeof(0x8000_0000_0000_0000).stringof);
}

The output:

--- these are written in decimal ---
2147483647		int
2147483648		long

--- these are NOT written in decimal ---
2147483647		int
2147483648		uint
4294967296		long
9223372036854775808		ulong

The `L` suffix

Regardless of the magnitude of the value, if it ends with L as in 10L, the type is long.

The `U` suffix

If the literal ends with U as in 10U, then its type is unsigned: uint if the value fits in a uint, and ulong otherwise. Lowercase u can also be used.

The L and U specifiers can be used together in any order. For example, 7UL and 8LU are both of type ulong.
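The effect of the suffixes can be checked at compile time. The sketch below uses static assert and the is expression, which are not introduced in this chapter; they are used here only to confirm the types.

```d
import std.stdio : writeln;

void main() {
    // Without a suffix, a small value is an int.
    static assert(is(typeof(10) == int));

    // L makes the literal a long regardless of its magnitude.
    static assert(is(typeof(10L) == long));

    // U (or u) makes the literal unsigned.
    static assert(is(typeof(10U) == uint));
    static assert(is(typeof(10u) == uint));

    // A U literal that does not fit a uint becomes ulong.
    static assert(is(typeof(5_000_000_000U) == ulong));

    // L and U can be combined in either order.
    static assert(is(typeof(7UL) == ulong));
    static assert(is(typeof(8LU) == ulong));

    writeln(typeof(10L).stringof, " ",
            typeof(10U).stringof, " ",
            typeof(7UL).stringof);
}
```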

Floating point literals

The floating point literals can be specified in either the decimal system, as in 1.234, or in the hexadecimal system, as in 0x9a.bcp0 (a hexadecimal floating point literal must end with an exponent, which is explained below).

In the decimal system: An exponent may be appended after the character e or E, meaning "times 10 to the power of". For example, 3.4e5 means "3.4 times 10 to the power of 5", or 340000.

The - character typed before the value of the exponent changes the meaning to be "divided by 10 to the power of". For example, 7.8e-3 means "7.8 divided by 10 to the power of 3". A + character may also be specified before the value of the exponent, but it has no effect. For example, 5.6e2 and 5.6e+2 are the same.

In the hexadecimal system: The value starts with either 0x or 0X and the parts before and after the point are specified in the numerals of the hexadecimal system. Since e and E are valid numerals in this system, the exponent is specified by p or P.

Another difference is that the exponent does not mean "10 to the power of", but instead "2 to the power of". For example, the P4 part in 0xabc.defP4 means "2 to the power of 4".

Floating point literals almost always have a point but it may be omitted if an exponent is specified. For example, 2e3 is a floating point literal with the value 2000.

The value before the point may be omitted if it is zero. For example, .25 is a literal having the value 0.25, a quarter.

The optional _ characters may be used with floating point literals as well, as in 1_000.5.
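The rules above can be tried out with a short program. This is only a sketch: values that are exactly representable are compared with ==, while the others are compared with isClose from std.math, which is an assumption of this example and not something the chapter relies on.

```d
import std.math : isClose;
import std.stdio : writeln;

void main() {
    // Decimal exponents: e5 multiplies by 10 to the power of 5,
    // e-3 divides by 10 to the power of 3.
    assert(isClose(3.4e5, 340_000.0));
    assert(isClose(7.8e-3, 0.0078));

    // A + before the exponent has no effect.
    assert(5.6e2 == 5.6e+2);

    // Hexadecimal exponents: p4 multiplies by 2 to the power of 4.
    static assert(0x1p4 == 16.0);
    static assert(0xA.8p0 == 10.5);

    // Multiplying by 16 shifts the point by one hexadecimal digit.
    static assert(0xabc.defP4 == 0xabcd.efP0);

    // The point may be omitted when there is an exponent, and the
    // part before the point may be omitted when it is zero.
    static assert(2e3 == 2000.0);
    static assert(.25 == 0.25);

    // The optional _ characters are ignored.
    static assert(1_000.5 == 1000.5);

    writeln(2e3, " ", .25, " ", 0x1p4);
}
```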

The types of floating point literals

Unless explicitly specified, the type of a floating point literal is double. The f and F specifiers mean float, and the L specifier means real. For example, 1.2 is double, 3.4f is float, and 5.6L is real.
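As with integer literals, these types can be confirmed with typeof. The static assert checks below are an illustration added for this purpose.

```d
import std.stdio : writeln;

void main() {
    // No specifier: double
    static assert(is(typeof(1.2) == double));

    // f or F: float
    static assert(is(typeof(3.4f) == float));
    static assert(is(typeof(3.4F) == float));

    // L: real
    static assert(is(typeof(5.6L) == real));

    writeln(typeof(1.2).stringof, " ",
            typeof(3.4f).stringof, " ",
            typeof(5.6L).stringof);
}
```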

Character literals

Character literals are specified within single quotes as in 'a', '\n', '\x21', etc.

As the character itself: The character may be typed directly by the keyboard or copied from a separate text: 'a', 'ş', etc.

As the character specifier: The character literal may be specified by a backslash character followed by a special character. For example, the backslash character itself can be specified by '\\'. The following character specifiers are accepted:

Syntax	Definition
\'	single quote
\"	double quote
\?	question mark
\\	backslash
\a	alert (bell sound on some terminals)
\b	backspace
\f	form feed (new page)
\n	new-line
\r	carriage return
\t	tab
\v	vertical tab

As the extended ASCII character code: Character literals can also be specified by their codes. The codes can be specified either in the hexadecimal system or in the octal system. When using the hexadecimal system, the literal must start with \x and must use two digits for the code, and when using the octal system the literal must start with \ and have up to three digits. For example, the literals '\x21' and '\41' are both the exclamation point.

As the Unicode character code: When the literal is specified with \u followed by 4 hexadecimal digits, then its type is wchar. When it is specified with \U followed by 8 hexadecimal digits, then its type is dchar. For example, '\u011e' and '\U0000011e' are both the Ğ character, having the type wchar and dchar, respectively.

As named character entity: Characters that have entity names can be specified by that name using the HTML character entity syntax '\&name;'. D supports all character entities from HTML 5. For example, '\&euro;' is €, '\&hearts;' is ♥, and '\&copy;' is ©.
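The different ways of writing a character literal can be compared directly. The following sketch assumes that the source file is saved as UTF-8 so that characters such as Ğ and € can be typed as themselves.

```d
import std.stdio : writeln;

void main() {
    // A character typed directly is a char when it fits in one.
    static assert(is(typeof('a') == char));

    // Character specifier: '\\' is a single backslash character.
    static assert('\\' == 0x5C);

    // Hexadecimal and octal character codes.
    static assert('\x21' == '!');
    static assert('\41' == '!');

    // Unicode character codes: \u gives wchar, \U gives dchar.
    static assert(is(typeof('\u011e') == wchar));
    static assert(is(typeof('\U0000011e') == dchar));
    static assert('\u011e' == 'Ğ');
    static assert('\U0000011e' == 'Ğ');

    // Named character entities.
    static assert('\&euro;' == '€');
    static assert('\&hearts;' == '♥');
    static assert('\&copy;' == '©');

    writeln('\x21', '\u011e', '\&hearts;');
}
```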

String literals

String literals are a combination of character literals and can be specified in a variety of ways.

Double-quoted string literals

The most common way of specifying string literals is by typing their characters within double quotes as in "hello". Individual characters of string literals follow the rules of character literals. For example, the literal "A4 ka\u011fıt: 3\&frac12;TL" is the same as "A4 kağıt: 3½TL".

Wysiwyg string literals

When string literals are specified using back-quotes, the individual characters of the string do not obey the special syntax rules of character literals. For example, the literal `c:\nurten` can be a directory name on the Windows operating system. If it were written using double quotes, the '\n' part would mean the new-line character:

    writeln(`c:\nurten`);
    writeln("c:\nurten");

c:\nurten  ← wysiwyg (what you see is what you get)
c:         ← the character literal is taken as new-line
urten

Wysiwyg string literals can alternatively be specified using double quotes but prepended with the r character: r"c:\nurten" is also a wysiwyg string literal.
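The difference between the two kinds of literals also shows in their lengths. In this sketch the names backQuoted, rPrefixed, and doubleQuoted are hypothetical and exist only for the comparison.

```d
import std.stdio : writeln;

// (Names chosen for this illustration only.)
enum backQuoted   = `c:\nurten`;
enum rPrefixed    = r"c:\nurten";
enum doubleQuoted = "c:\nurten";

void main() {
    // Both wysiwyg forms produce the same nine characters.
    static assert(backQuoted == rPrefixed);
    static assert(backQuoted.length == 9);

    // In the double-quoted form, \n is a single new-line character.
    static assert(doubleQuoted.length == 8);

    // The wysiwyg value written with double quotes needs \\.
    static assert(backQuoted == "c:\\nurten");

    writeln(backQuoted.length, " ", doubleQuoted.length);
}
```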

Delimited string literals

The string literal may contain delimiters that are typed right inside the double quotes. These delimiters are not considered to be parts of the value of the literal. Delimited string literals start with a q before the opening double quote. For example, the value of q".hello." is "hello"; the dots are not parts of the value. As long as it ends with a new-line, the delimiter can have more than one character:

writeln(q"MY_DELIMITER
first line
second line
MY_DELIMITER");

MY_DELIMITER is not a part of the value:

first line
second line

Such a multi-line string literal including all the indentation is called a heredoc.
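The exact value of a delimited literal can be verified by comparing it with an equivalent double-quoted literal. The sketch below uses the hypothetical names dotted and heredoc; note that the new-line before the closing delimiter is a part of the value, and that the closing delimiter is written at the very beginning of its line.

```d
import std.stdio : write;

// (Names chosen for this illustration only.)
enum dotted = q".hello.";

enum heredoc = q"MY_DELIMITER
first line
second line
MY_DELIMITER";

void main() {
    // The dots are delimiters, not parts of the value.
    assert(dotted == "hello");

    // The delimiter lines are excluded; the line breaks are kept.
    assert(heredoc == "first line\nsecond line\n");

    // write() is enough because the value ends with a new-line.
    write(heredoc);
}
```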

Token string literals

String literals that start with q and that use { and } as delimiters can contain only valid D tokens; in other words, their contents must look like D source code:

    auto str = q{int number = 42; ++number;};
    writeln(str);

The output:

int number = 42; ++number;

This feature is particularly useful to help text editors display the contents of the string as syntax highlighted D code.
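A token string is still an ordinary string value. The sketch below checks that, and then goes one step beyond this chapter by compiling the contents with a string mixin; mixin is a separate D feature that is assumed here only to show why having D code inside a string can be useful.

```d
import std.stdio : writeln;

// (The name code is chosen for this illustration only.)
enum code = q{int number = 42; ++number;};

void main() {
    // The literal is a string with exactly these characters.
    static assert(is(typeof(code) == string));
    static assert(code == "int number = 42; ++number;");

    // Assumption beyond this chapter: a string mixin compiles the
    // contents of the string as if they were typed at this point.
    mixin(code);
    assert(number == 43);

    writeln(number);
}
```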

Types of string literals

By default the type of a string literal is immutable(char)[]. An appended c, w, or d character specifies the type of the string explicitly as immutable(char)[], immutable(wchar)[], or immutable(dchar)[], respectively. For example, the characters of "hello"d are of type immutable(dchar).

We have seen in the Strings chapter that these three string types are aliased as string, wstring, and dstring, respectively.
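The following checks are a brief illustration of the c, w, and d specifiers and of the corresponding aliases.

```d
import std.stdio : writeln;

void main() {
    // Without a specifier, and with c, the type is string.
    static assert(is(typeof("hello") == string));
    static assert(is(typeof("hello"c) == immutable(char)[]));

    // w and d select the wider character types.
    static assert(is(typeof("hello"w) == wstring));
    static assert(is(typeof("hello"d) == dstring));

    // Each character of "hello"d is an immutable(dchar).
    static assert(is(typeof("hello"d[0]) == immutable(dchar)));

    writeln(typeof("hello").stringof, " ",
            typeof("hello"w).stringof, " ",
            typeof("hello"d).stringof);
}
```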

Literals are calculated at compile time

It is possible to specify literals as expressions. For example, instead of writing the total number of seconds in January as 2678400 or 2_678_400, it is possible to specify it by the terms that make up that value, namely 60 * 60 * 24 * 31. The multiplication operations in that expression do not affect the run-time speed of the program; the program is compiled as if 2678400 were written instead.

The same applies to string literals. For example, the concatenation operation in "hello " ~ "world" is executed at compile time, not at run time. The program is compiled as if the code contained the single string literal "hello world".
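One way of observing this is static assert, which is evaluated by the compiler and never at run time. The sketch below assumes enum constants named secondsInJanuary and greeting, both invented for this example.

```d
import std.stdio : writeln;

// (Names chosen for this illustration only.)
enum secondsInJanuary = 60 * 60 * 24 * 31;
enum greeting = "hello " ~ "world";

// These checks are performed during compilation, which is possible
// only because both expressions are calculated at compile time.
static assert(secondsInJanuary == 2_678_400);
static assert(greeting == "hello world");

void main() {
    writeln(secondsInJanuary);
    writeln(greeting);
}
```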

Exercises

The following line causes a compilation error:
```
    int amount = 10_000_000_000;    // ← compilation ERROR
```
Change the program so that the line compiles and amount equals ten billion.
Write a program that increases the value of a variable and prints it continuously. Make the value always be printed on the same line, overwriting the previous value:
```
Number: 25774  ← always on the same line
```
A special character literal other than '\n' may be useful here.

