In this tutorial, we will discuss the concept of endianness in computer. Endian means byte ordering. Two types of endianness are present in computer: Big Indian and Little Indian. When the data is stored in memory, the bits are stored byte-by-byte in byte. Byte is a unit of memory data which consists of a group of 8 bits. The computer memory is byte-addressable, so only one byte can be present inside the memory location. When the data is only one byte, this byte is stored in a single memory location. But when the data is more than one byte, these bytes may be stored in memory location in two different ways.
Examples to Understand:
Two byte binary representation of 513 is 0000001000000001.
MSB | LSB |
00000010 | 00000001 |
Memory is byte addressable. One byte is stored in one memory location and this memory location has an address. If one byte is stored in the address “a”, the next byte is stored in the next address which is “a+1”, and so on. Now, the bytes may be stored in the memory from the left-most bytes to the right-most bytes or from the right-most bytes to the left-most bytes.
Here, the starting memory address is “a”. So, if the bytes are stored in the memory from the left-most bytes to the right-most bytes, the left most byte “00000010” is stored in the memory location “a” and the right-most byte “00000001” is stored in the memory location “a+1”.
a – 1 | |
a | 00000010 |
a + 1 | 00000001 |
a + 2 |
What Is Big Endian?
When the Most Significant Byte is present in the smallest memory location, the next byte is stored in the next memory address, and so on. This type of byte order is called “Big Endian”.
If the bytes are stored in the memory from the right-most bytes to the left-most bytes, the right most byte “00000001” is stored in the memory location “a” and the left-most byte “00000010” is stored in the memory location “a+1”.
a – 1 | |
a | 00000001 |
a + 1 | 00000010 |
a + 2 |
What Is Little Endian?
When the Least Significant Byte is stored in the smallest memory location, the previous byte is stored in the next memory address, and so on. This type of byte order is called “Little Endian”.
a – 1 | |
a | 00000010 |
a + 1 | 00000001 |
a + 2 |
We can check if a computer is Big Endian or Little Endian using the following C code examples:
Example 1:
Output:
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$ ./Example1
Machine Endianness => Little Endian
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$
Here, if the integer is two bytes, the binary representation of 1 is 00000000 00000001. We convert the pointer to a char pointer which is one byte long. So, if we access the first memory location and we get the value of 1, it means that the right-most byte is stored in the lowest memory location. It is then a Little Endian machine. Otherwise, it will be a Big Endian machine.
Example 2:
Output:
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$ ./Example2
Little Endian
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$
In this example, we use union. If 1 is stored in the 0th location of the array, the machine must be Little Endian. Otherwise, the machine will be Big Endian.
Problem in Endianness
A computer stores and retrieves the data in the same endianness in which the result is the same. The issue arises in endianness when the data transfers from one machine to another machine. If the two machines are in different byte sex, it means that one computer uses the Big Endian and another computer uses the Little Endian. When the data transfers from one to another, the actual problems come up. This issue is called NUXI problem: The “UNIX” string might look like “NUXI” on a machine with a different byte sex.
Another problem comes up when we use typecast in our program. For example: if we create a character array of arr[4] and typecast it to an int of size 4 byte, we will get different result on different endian machine. Let’s discuss it in the next example.
Example 3:
int main()
{
char arr[4] = {0x01, 0x00,0x00,0x00};
int x = *(int *)arr;
printf("0x%x\n", x);
return 0;
}
Output:
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$ ./Example3
0x1
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$
In this this example, “arr” is a character array. We typecast it to a 4 byte integer x. If we compile the program in a little endian machine, we get the value of 0x00000001. But if we compile the program in a big endian machine, we get the value of 0x01000000.
0x01 | 0x00 | 0x00 | 0x00 |
arr[0] | arr[1] | arr[2] | arr[3] |
Example 4:
int main()
{
char arr[4] = {0x00, 0x00,0x00,0x01};
int x = *(int *)arr;
printf("0x%x\n", x);
return 0;
}
Output:
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$ ./Example4
0x1000000
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$
This example is the same as in Example 3. Only change the value to understand it in simpler way. If we compile the program in a little endian machine, we get the value of 0x01000000. But if we compile the program in a big endian machine, we get the value of 0x00000001.
0x00 | 0x00 | 0x00 | 0x01 |
arr[0] | arr[1] | arr[2] | arr[3] |
When the data is transferred from one endian machine to another endian machine, we need to swap the data accordingly. Now, let’s see how to swap the data in the following examples.
Example 5:
In this example, we use the bitwise operation to swap the data. In bitwise operation, there is no effect of endianness.
#include <inttypes.h>
uint32_t ByteSwap(uint32_t value)
{
uint32_t result = 0;
result = result | (value & 0x000000FF) << 24;
result = result | (value & 0x0000FF00) << 8;
result = result | (value & 0x00FF0000) >> 8;
result = result | (value & 0xFF000000) >> 24;
return result;
}
int main()
{
uint32_t data = 0x44445555;
uint32_t resultData =0;
resultData = ByteSwap(data);
printf("Original Data => 0x%x\n", data);
printf("Converted Data => 0x%x\n", resultData);
return 0;
}
Output:
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$ ./Example3
Original Data => 0x44445555
Converted Data => 0x55554444
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$
Example 6:
In this example, we will do the same thing as we did in example 5. But here, we use Macros instead of function.
#include <inttypes.h>
#define CHANGE_ENDIANNESS(A) ((((uint32_t)(A) & 0xFF000000) >> 24) \
| (((uint32_t)(A) & 0x00FF0000) >> 8) \
| (((uint32_t)(A) & 0x0000FF00) << 8) \
| (((uint32_t)(A) & 0x000000FF) << 24))
int main(){
uint32_t data = 0x44445555;
uint32_t resultData =0;
resultData = CHANGE_ENDIANNESS(data);
printf("Original Data => 0x%x\n", data);
printf("Converted Data => 0x%x\n", resultData);
return 0;
}
Output:
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$ ./Example6
Original Data => 0x44445555
Converted Data => 0x55554444
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$
Example 7:
In this example, we will do the same thing as we did in the previous example. But here, we use union.
#include <inttypes.h>
typedef union
{
uint32_t RawData;
uint8_t DataBuff[4];
} RawData;
uint32_t CHANGE_ENDIANNESS(uint32_t Value)
{
RawData Change,Orginal;
Orginal.u32RawData = Value;
//change the value
Change.DataBuff[0] = Orginal.DataBuff[3];
Change.DataBuff[1] = Orginal.DataBuff[2];
Change.DataBuff[2] = Orginal.DataBuff[1];
Change.DataBuff[3] = Orginal.DataBuff[0];
return (Change.RawData);
}
int main(){
uint32_t data = 0x44445555;
uint32_t resultData =0;
resultData = CHANGE_ENDIANNESS(data);
printf("Original Data => 0x%x\n", data);
printf("Converted Data => 0x%x\n", resultData);
return 0;
}
Output:
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$ ./Example5
Original Data => 0x44445555
Converted Data => 0x55554444
somnath@somnath-VirtualBox:~/Desktop/c_prog/Endian$
When we send the data over the network, the data is sent in a common format. When data is sent over the network, we don’t know the endianness of the sender and receiver. So, we use a special order that is big endian. This order is called “Network Order”.
If the sender is a little endian machine, it converts the data to a big endian order before sending the data. If the receiver is a little endian machine, it converts the data to a little endian format after receiving the data.
When we send the data to the network, the following macros are used:
htons() – “Host to Network Short”
htonl() – “Host to Network Long”
ntohs() – “Network to Host Short”
ntohl() – “Network to Host Long”
htons() – This macro is read as “Host to Network Short”. A 16-bit unsigned value’s bytes should be rearranged like the following:
processor order -> network order.
htonl() – This macro is read as “Host to Network Long”. A 32-bit unsigned value’s bytes should be rearranged like the following:
processor order -> network order.
ntohs() – This macro is read as “Network to Host Short “. A 16-bit unsigned value’s bytes should be rearranged like the following:
network order -> processor order.
ntohl() – This macro is read as “Network to Host Long “. A 32-bit unsigned value’s bytes should be rearranged like the following:
network order -> processor order.
Conclusion
In this tutorial, we learned the details about endianness in computer. It appears that utilizing one endianness approach over the other has no advantage. They are both still utilized by various architectures. Majority of personal computers and laptops today employ the little-endian-based CPUs (and their clones), making the little endian as the predominant architecture for desktop computers.