Expression Category Taxonomy in C++

A computation is any calculation that follows a well-defined algorithm. An expression is a sequence of operators and operands that specifies a computation. In other words, an expression is an identifier or a literal, or a sequence of both, joined by operators. In programming, an expression can result in a value and/or cause a side effect. Every expression belongs to exactly one of three primary categories: lvalue, xvalue, or prvalue. Two further categories, glvalue and rvalue, group the primary ones. Each of these categories is a set of expressions. Each set has a definition and particular situations where its meaning prevails, differentiating it from another set. Each set is called a value category.

Note: A value or literal is still an expression, so these terms classify expressions and not really values.

glvalue and rvalue are the two overlapping subsets of the big set, expression. glvalue divides into two further subsets: lvalue and xvalue. rvalue, the other subset of expression, also divides into two further subsets: xvalue and prvalue. So, xvalue is a subset of both glvalue and rvalue: that is, xvalue is the intersection of glvalue and rvalue. The taxonomy diagram in the C++ specification shows these relationships as a tree: expression branches into glvalue and rvalue; glvalue branches into lvalue and xvalue; rvalue branches into xvalue and prvalue.

prvalue, xvalue, and lvalue are the primary value categories. glvalues are the union of lvalues and xvalues, while rvalues are the union of xvalues and prvalues.

You need basic knowledge of C++ in order to understand this article; you also need knowledge of Scope in C++.

Basics

To really understand the expression category taxonomy, you need to recall or know the following basic features first: location and object, storage and resource, initialization, identifier and reference, lvalue and rvalue references, pointer, free store, and re-using of a resource.

Location and Object

Consider the following declaration:

int ident;

This is a declaration that identifies a location in memory. A location is a particular set of consecutive bytes in memory. A location can consist of one byte, two bytes, four bytes, sixty-four bytes, etc. On most current platforms, the location for an int is four bytes, although the C++ standard only requires an int to be at least 16 bits wide. Also, the location can be identified by an identifier.

In the above declaration, the location has not been given a value. If ident is a local variable, its content is indeterminate, and reading it before a value is stored is undefined behavior; if ident is declared at namespace scope, it is zero-initialized. So, an identifier identifies a location (a small contiguous space). When the location is given a particular content, the identifier then identifies both the location and the content; that is, the identifier then identifies both the location and the value.

Consider the following statements:

int ident1 = 5;

int ident2 = 100;

Each of these statements is a declaration and a definition. The first identifier has the value (content) 5, and the second identifier has the value 100. On a platform with a four-byte int, each of these locations is four bytes long. The first identifier identifies both a location and a value. The second identifier also identifies both.

An object is a region of storage in memory; it may or may not have a name. So, an object is either a location whose value has not yet been set or a location with a value.

Object Storage and Resource

The location for an object is also called the storage or resource of the object.

Initialization

Consider the following code segment:

int ident;

ident = 8;

The first line declares an identifier. This declaration provides a location (storage or resource) for an integer object, identifying it with the name, ident. The next line puts the value 8 (in bits) into the location identified by ident. Strictly speaking, this second statement is an assignment and not an initialization: initialization gives an object its first value as part of its declaration, as in int ident = 8;.

The following statement defines a vector with content, {1, 2, 3, 4, 5}, identified by vtr:

std::vector<int> vtr{1, 2, 3, 4, 5};

Here, the initialization with {1, 2, 3, 4, 5} is done in the same statement as the definition (declaration). The = sign is not used. The following statement defines an array with content {1, 2, 3, 4, 5}:

int arr[] = {1, 2, 3, 4, 5};

This time, the = sign appears, but it is part of the initialization syntax; it is not the assignment operator, because the statement is a declaration.

Identifier and Reference

Consider the following code segment:

int ident = 4;

int& ref1 = ident;

int& ref2 = ident;

cout<< ident <<' '<< ref1 <<' '<< ref2 << '\n';

The output is:

4 4 4

ident is an identifier, while ref1 and ref2 are references; they reference the same location. A reference is an alias for an existing object. Conventionally, ref1 and ref2 are different names of one object, while ident is the identifier of the same object. However, ident can still be called the name of the object, which means ident, ref1, and ref2 name the same location.

The main difference shows when an argument is passed to a function. If the parameter is declared as a plain int (pass by value), a copy of the argument is made for use in the function; if the parameter is declared as a reference (pass by reference), the same location is used within the function. So, passing by value ends up with two locations, while passing by reference ends up with the same one location.

lvalue Reference and rvalue Reference

The normal way to create a reference is as follows:

int ident;

ident = 4;

int& ref = ident;

The storage (resource) is located and identified first (with a name such as ident), and then a reference (with a name such as ref) is made. When either name is passed to a function that takes its parameter by value, a copy is made in the function; when the parameter is a reference, the original location is used (referred to) in the function.

Since C++11, it is also possible to have a reference to an object that has no identifier of its own. The reference is created directly, without first declaring an identifier for the location. This uses &&, as shown in the following statement:

int&& ref = 4;

Here, there is no preceding identification. To access the value of the object, simply use ref as you would use the ident above.

With the && declaration, the object has no name of its own, so ref is the only way to reach it. The name ref is itself an lvalue expression, so it can be passed to a function by value (a copy is made) or by lvalue reference (the same location is used). To pass it to a parameter of type int&&, it must first be converted back to an rvalue, for example with std::move(ref).

A reference declared with & is called an lvalue reference. A reference declared with && is called an rvalue reference. An rvalue reference can bind to a prvalue, as in the statement above, or to an xvalue (see below).

Pointer

Consider the following code:

int ptdInt = 5;

int *ptrInt;

ptrInt = &ptdInt;

cout<< *ptrInt <<'\n';

The output is 5.

Here, ptdInt is an identifier like the ident above. There are two objects (locations) here instead of one: the pointed object, identified by ptdInt, and the pointer object, identified by ptrInt. &ptdInt returns the address of the pointed object, and that address is stored as the value of the pointer object, ptrInt. To obtain the value of the pointed object, dereference the pointer, as in “*ptrInt”.

Note: ptdInt is an identifier and not a reference, while the name, ref, mentioned previously, is a reference.

The second and third lines in the above code can be reduced to one line, leading to the following code:

int ptdInt = 5;

int *ptrInt = &ptdInt;

cout<< *ptrInt <<'\n';

Note: When a pointer is incremented, it points to the next location of the pointed-to type; the address grows by the size of that type in bytes, not by 1. When a pointer is decremented, it points to the previous location; the address shrinks by the size of the type, not by 1.

Free Store

An operating system allocates memory for each program that is running. The free store is the pool of memory from which a running program can request storage at run time with the new expression and return it with the delete expression. The expression that returns the address of a new integer location from the free store is:

new int

This returns a location for an integer that is not identified. The following code illustrates how to use the pointer with the free store:

int *ptrInt = new int;

*ptrInt = 12;

cout<< *ptrInt  <<'\n';

The output is 12.

To destroy the object, use the delete expression as follows:

delete ptrInt;

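The operand of the delete expression is a pointer. The following code illustrates its use:

int *ptrInt = new int;
*ptrInt = 12;
cout<< *ptrInt <<'\n';
delete ptrInt;
ptrInt = nullptr;

The output is 12. After delete, the object no longer exists, and the pointer would otherwise hold a stale address. delete does not reset the location to 0 or to any default value; it only ends the lifetime of the object and returns the storage to the free store. Dereferencing the pointer after delete is undefined behavior: the program may appear to print 0 or 12, print garbage, or crash. Setting the pointer to nullptr after delete makes an accidental later use easier to detect.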

Re-using a Resource

In expression category taxonomy, reusing a resource is the same as reusing a location or storage for an object. The following code illustrates how a location from the free store can be reused for a second value before it is released:

int *ptrInt = new int;
*ptrInt = 12;
cout<< *ptrInt <<'\n';
*ptrInt = 24;
cout<< *ptrInt <<'\n';
delete ptrInt;

The output is:

12

24

A value of 12 is first assigned to the unidentified location. The value of 24 is then assigned to the same location, replacing 12. The location must not be read or written after delete ptrInt; doing so is undefined behavior, even if a particular compiler appears to print 0 or the old value.

The following program shows how the location of an integer, reached through a reference returned by a function, is reused:

#include <iostream>
using namespace std;

int& fn()
{
    static int i = 5;   // static storage duration: i outlives the call
    int& j = i;
    return j;
}

int main()
{
    int& myInt = fn();
    cout<< myInt <<'\n';
    myInt = 17;
    cout<< myInt <<'\n';
    return 0;
}

The output is:

5

17

An ordinary object declared in a local scope (function scope) ceases to exist at the end of that scope, and a reference returned to it would dangle; using such a reference is undefined behavior. Because i is declared static here, it lives until the program ends, so fn() can safely return a reference to it. Through this returned reference, the name, myInt in the main() function, reuses the location identified by i for the value 17.

lvalue

An lvalue is an expression whose evaluation determines the identity of an object, bit-field, or function. The identity can come from an identifier like ident above, a reference name, a dereferenced pointer, or the name of a function. Consider the following code, which works:

int myInt = 512;

int& myRef = myInt;

int* ptr = &myInt;

int fn()

{

++ptr; --ptr;

return myInt;

}

Here, myInt is an lvalue; myRef is also an lvalue, because it names the same object as myInt; *ptr is an lvalue expression because it designates the object that ptr points to; ++ptr or --ptr is an lvalue expression because its result is ptr itself, holding its new address; and fn is an lvalue (expression) because it names a function. The call fn(), in contrast, returns an int by value and is a prvalue.

Consider the following code segment:

int a = 2, b = 8;

int c = a + 16 + b + 64;

In the second statement, the location for ‘a’ has 2 and is identifiable by ‘a’, and so is an lvalue. The location for b has 8 and is identifiable by b, and so is an lvalue. The location for c will have the sum, and is identifiable by c, and so is an lvalue. In the second statement, the expressions or values of 16 and 64 are rvalues (see below).

Consider the following code segment:

char seq[5];

seq[0]='l', seq[1]='o', seq[2]='v', seq[3]='e', seq[4]='\0';

cout<< seq[2] <<'\n';

The output is v.

seq is an array. The location for ‘v’ or any similar value in the array is identified by seq[i], where i is an index. So, the expression, seq[i], is an lvalue expression. seq, which is the identifier for the whole array, is also an lvalue.

prvalue

A prvalue is an expression whose evaluation initializes an object or a bit-field or computes the value of the operand of an operator, as specified by the context in which it appears.

In the statement,

int myInt = 256;

256 is a prvalue (prvalue expression) that initializes the object identified by myInt. This object is not referenced.

In the statement,

int&& ref = 4;

4 is a prvalue (prvalue expression) that initializes a temporary object, and ref is bound to that object. The object has no identifier of its own. ref is declared as an rvalue reference, but the expression ref, being a name, is an lvalue.

Consider the following code segment:

int ident;

ident = 6;

int& ref = ident;

6 is a prvalue that is assigned to the object identified by ident; the object is also referenced by ref. Here, ref is an lvalue reference, and the expression ref is an lvalue.

Consider the following code segment:

int a = 2, b = 8;

int c = a + 15 + b + 63;

15 and 63 are each a constant that computes to itself, producing an operand (in bits) for the addition operator. So, 15 or 63 is a prvalue expression.

Any literal, except the string literal, is a prvalue (i.e., a prvalue expression). So, a literal such as 58 or 58.53, or true or false, is a prvalue. A literal can be used to initialize an object, or it can supply the value (in bits) of an operand for an operator. In the above code, the literal 2 initializes the object, a, while the literals 15 and 63 are operands of the addition operator.

Why is a string literal not a prvalue? Consider the following code:

char str[] = "love not hate";

cout << str <<'\n';

cout << str[5] <<'\n';

The output is:

love not hate

n

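str identifies the whole array, which is initialized with a copy of the string literal. So, the expression, str, and not what it identifies, is an lvalue. Each character in the string can be identified by str[i], where i is an index. The expression, str[5], and not the character it identifies, is an lvalue. The string literal itself, "love not hate", is also an lvalue and not a prvalue: it is stored as an array of const char with static storage duration, so it has a location whose address can be taken, unlike a literal such as 58.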

Consider the following expressions:

ptrInt++ or ptrInt--

Here, ptrInt is a pointer to an integer location. The whole expression, and not the location it points to, is a prvalue (expression). This is because the postfix expression, ptrInt++ or ptrInt--, evaluates to a copy of the original value of ptrInt, while ptrInt itself is changed to the new value. On the other hand, ++ptrInt or --ptrInt is an lvalue because it evaluates to ptrInt itself, which holds the only value of interest, the new one.

In the second statement of the following code, a and b are lvalues, yet each is used only for its value:

int a = 2, b = 8;
int c = a + 15 + b + 63;

So, a or b in the second statement is an lvalue because it identifies an object. The addition operator, however, needs prvalue operands, so the lvalue-to-rvalue conversion is applied: the value stored in the object is read, and that value, a prvalue, becomes the operand. The expression a itself remains an lvalue.

(new int), and not the location it establishes, is a prvalue. In the following statement, the returned address of the location is used to initialize a pointer object:

int *ptrInt = new int;

Here, *ptrInt is an lvalue, while (new int) is a prvalue. Remember, an lvalue or a prvalue is an expression. (new int) does not identify any object; it evaluates to an address. Returning the address does not mean identifying the object with a name (such as ident, above). *ptrInt designates the object that ptrInt points to, so *ptrInt is an lvalue. On the other hand, (new int) is a prvalue, as it computes the address value that initializes ptrInt.

xvalue

Historically, lvalue stood for "left value", the operand on the left of an assignment; today it is often read as "locator value". prvalue stands for “pure” rvalue (see what rvalue stands for below), and xvalue stands for “eXpiring” lvalue.

The definition of xvalue, quoted from the C++ specification, is as follows:

“An xvalue is a glvalue that denotes an object or bit-field whose resources can be reused (usually because it is near the end of its lifetime). [Example: Certain kinds of expressions involving rvalue references yield xvalues, such as a call to a function whose return type is an rvalue reference or a cast to an rvalue reference type— end example]”

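What this means is that an xvalue identifies an object, as an lvalue does, but the expression also signals that the resources of the object may be taken over by other code. Re-using storage can first be seen with an ordinary lvalue. The following code (adapted from above) shows the storage (resource) designated by the lvalue, *ptrInt, being re-used for a second value before it is released:

int *ptrInt = new int;
*ptrInt = 12;
cout<< *ptrInt <<'\n';
*ptrInt = 24;
cout<< *ptrInt <<'\n';
delete ptrInt;

The output is:

12

24

The location must not be read or written after delete ptrInt; doing so is undefined behavior.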

The following program (adapted from above) shows how storage reached through an lvalue reference returned by a function is reused in the main() function:

#include <iostream>
using namespace std;

int& fn()
{
    static int i = 5;   // static storage duration: i outlives the call
    int& j = i;
    return j;
}

int main()
{
    int& myInt = fn();
    cout<< myInt <<'\n';
    myInt = 17;
    cout<< myInt <<'\n';
    return 0;
}

The output is:

5

17

An ordinary local object is destroyed when it goes out of scope, and a reference returned to it would dangle. Here, i is declared static, so it survives the call to fn(), and its storage is safely reused in the main() function.

The above two code samples illustrate the re-use of storage designated by lvalues. An expression becomes an xvalue when it explicitly marks an object as available for such re-use, for example through a cast to an rvalue reference type or a call to a function whose return type is an rvalue reference, as the quoted definition states.

The following quote concerning xvalue is from the C++ specification:

“In general, the effect of this rule is that named rvalue references are treated as lvalues and unnamed rvalue references to objects are treated as xvalues. rvalue references to functions are treated as lvalues whether named or not.” (see later).

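So, an xvalue is a glvalue whose resources (storage) can be reused. Like an lvalue, it has identity; like a prvalue, it can be moved from. xvalues are the intersection of glvalues and rvalues. No expression is both an lvalue and a prvalue.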

There is more to xvalue than what has been addressed in this article. However, xvalue deserves a whole article on its own, and so the extra specifications for xvalue are not addressed in this article.

Expression Category Taxonomy Set

Another quotation from the C++ specification:

Note: Historically, lvalues and rvalues were so-called because they could appear on the left- and right-hand side of an assignment (although this is no longer generally true); glvalues are “generalized” lvalues, prvalues are “pure” rvalues, and xvalues are “eXpiring” lvalues. Despite their names, these terms classify expressions, not values. — end note”

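So, glvalues are the union of lvalues and xvalues, and rvalues are the union of xvalues and prvalues. xvalues are the intersection of glvalues and rvalues.

The expression category taxonomy can also be pictured as a Venn diagram of two overlapping circles, glvalue and rvalue: the part of glvalue outside the overlap is lvalue, the part of rvalue outside the overlap is prvalue, and the overlap is xvalue.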

Conclusion

An lvalue is an expression whose evaluation determines the identity of an object, bit-field, or function.

A prvalue is an expression whose evaluation initializes an object or a bit-field or computes the value of the operand of an operator, as specified by the context in which it appears.

An xvalue is a glvalue that denotes an object or bit-field whose resources (storage) can be reused. It is both a glvalue and an rvalue, but it is neither an lvalue nor a prvalue.

The C++ specification illustrates expression category taxonomy with a tree diagram. Because xvalue appears under both glvalue and rvalue, the categories overlap rather than form a strict hierarchy, so a Venn diagram is used by some authors, as it shows the overlap more directly than the tree diagram.

About the author

Chrysanthus Forcha

Discoverer of mathematics Integration from First Principles and related series. Master’s Degree in Technical Education, specializing in Electronics and Computer Software. BSc Electronics. I also have knowledge and experience at the Master’s level in Computing and Telecommunications. Out of 20,000 writers, I was the 37th best writer at devarticles.com. I have been working in these fields for more than 10 years.