Metadata is data about data. The apparent size is the size of the data that matters to the ordinary user of the computer: the content (text) of the user's letters, images, videos, etc. This data is not kept arbitrarily in the computer; it must be kept in a controlled fashion. It must be identifiable, it must be complete, and it has to meet other requirements as well. Some extra data is needed to meet these requirements, and this extra data is the metadata.
Remember, there is only one root directory in a volume; all the other directories are sub-directories. The root directory gives rise to sub-directories, which in turn give rise to further sub-directories going down. However, sub-directories are usually simply called directories. And so, there is only one directory tree.
So, “ls -s” is not useful for obtaining the size of a directory. Which command, then, is useful? The du command. “du” stands for Disk Usage. It reports the disk space used by a directory and, recursively, by each of its sub-directories.
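The difference can be seen with a small Bash sketch, assuming GNU coreutils on Linux. It builds a throwaway directory with mktemp and writes about 100 KiB of random data into a file inside it. The file name data.bin is only illustrative. It then compares ls -sd, which reports just the blocks of the directory entry, with du -s, which totals the directory and everything inside it. The exact numbers depend on the file system.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory, removed automatically when the script exits.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# About 100 KiB of random (hence incompressible) data inside it.
head -c 102400 /dev/urandom > "$root/data.bin"

# ls -sd: blocks allocated to the directory entry itself (first field).
ls_size=$(ls -sd "$root" | awk '{print $1}')

# du -s: the directory plus everything below it, in 1024-byte units by default.
du_size=$(du -s "$root" | cut -f1)

echo "ls -sd: $ls_size   du -s: $du_size"
```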
This article explains the different features of the du command in Linux, which gives the programmer different ways to find the sizes of directories and their sub-directories. Bash is the shell used for the code samples in this article.
- du without Option or Argument
- Size of Other Directories
- The sudo Command
- Excluding Entries by Size
du without Option or Argument
The current working directory is the directory the user is currently working in; the prompt normally shows it. Typing du without any option or argument and then pressing the Enter key displays the disk usage of the whole sub-tree of the current working directory: every sub-directory at every level below it, plus the current working directory itself. In the display, a dot represents the current working directory.
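To try this without touching real data, here is a Bash sketch that assumes GNU du. It creates a small hypothetical tree, a/b/c and x, in a temporary directory, makes that directory the current working directory, and runs du with no option or argument. The directories are empty, so the sizes are only those of the directory entries and differ between file systems.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tree a/b/c and x inside a throwaway directory.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/a/b/c" "$root/x"

cd "$root"      # make the tree the current working directory
listing=$(du)   # no option, no argument
echo "$listing"
```

Each line holds a size, a tab and a path. Because du finishes a directory's sub-directories before the directory itself, the line for the dot comes last.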
Each path of the sub-tree is represented by a line in the display. Each line begins with the size of the directory (the last name in the path), followed by the path. The display may be something like:
Notice that no unit is printed, so it is not clear whether the sizes are in bytes, kilobytes, megabytes or gigabytes (by default, GNU du counts in units of 1,024 bytes). The symbol K stands for kilobytes, where 1K is 1,024 bytes; M stands for megabytes, 1,048,576 bytes; and G stands for gigabytes, 1,073,741,824 bytes. For these multiples to be indicated, the -h option (switch) should be used, as follows:
The display would then look like so:
When the -h option is used, the sizes are printed with their unit suffixes, such as 4.0K, and are said to be in human-readable form.
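The following is a minimal sketch of the difference, assuming GNU du. It fills a temporary directory with about 2 MiB of random data and prints the plain size and the -h size side by side. The file name big.bin is illustrative. The last line only spells out the multiples K, M and G in bytes with shell arithmetic.

```bash
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
# About 2 MiB of random data (illustrative file name).
head -c $((2 * 1024 * 1024)) /dev/urandom > "$root/big.bin"

plain=$(du -s "$root" | cut -f1)    # a bare number, in 1024-byte units
human=$(du -sh "$root" | cut -f1)   # the same size with a unit suffix

echo "du -s : $plain"
echo "du -sh: $human"

# The multiples behind the K, M and G suffixes.
echo "K = $((1024)) bytes, M = $((1024 ** 2)) bytes, G = $((1024 ** 3)) bytes"
```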
Note: with the --all (-a) option, the du command also reports disk usage for files; however, disk usage for files is not addressed in this article.
Size of Other Directories
A typical absolute path for a Linux volume is as follows:
The first / is the root directory. This directory has immediate sub-directories, including the home directory. The home directory contains a directory for each user; if the user's name is John, his directory is usually named john. The shell treats ~ as shorthand for the user's directory, so the user can use the command “cd ~” to reach his directory from any directory. dirOne is a directory created by the user, who can also create other directories at this level. dirTwo, dirThree and dirFour are also created by the user, each a sub-directory of the one before it.
The user can find the size of any other directory and its sub-directories (its sub-tree) from any directory by passing the absolute path as an argument. For example, if disk usage is needed,
then the command would be:
where ~ represents the user’s directory.
A relative path is resolved from the current working directory, so to use one, the user must already be in the directory the path starts from, here the corresponding parent directory. For example, if the prompt is showing,
meaning the user is in the directory /home/john/dirOne, then the following command gives the same result as the command above:
The sizes would be the same, but the paths in the display would be relative. To display the same information for the current directory, use no argument, or use the dot.
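The equivalence of the absolute and the relative form can be checked with a sketch like the following. It recreates a dirOne/dirTwo/dirThree/dirFour chain inside a temporary directory instead of /home/john, so the names mirror the example but the location is an assumption. GNU du is also assumed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway stand-in for the dirOne/dirTwo/dirThree/dirFour chain.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/dirOne/dirTwo/dirThree/dirFour"
head -c 65536 /dev/urandom > "$root/dirOne/dirTwo/dirThree/dirFour/data.bin"

# Absolute path as the argument: works from any current directory.
abs=$(du "$root/dirOne/dirTwo")

# Relative path: resolved from the current directory, so move to dirOne first.
cd "$root/dirOne"
rel=$(du dirTwo)

echo "$abs"
echo "$rel"
```

Both listings report the same sizes. Only the paths differ: the relative call prints them relative to dirOne.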
This scheme reports only the last directory of a path (with its sub-tree), each line showing the path that leads to it; the directories before it in the path are not reported. It is possible to get the size of a directory in the middle of a path; see “--exclude=PATTERN” below.
A grand total of all the directories involved can be produced with the -c (--total) option. For the above situation, the command would be:
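As a hedged illustration, the next sketch assumes GNU du and runs du -c on two sibling directories in a throwaway tree. The names dirTwo and other are illustrative. The final line, labelled total, is the sum of the two directory totals.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway tree; dirTwo and other are illustrative sibling directories.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/dirOne/dirTwo" "$root/dirOne/other"
cd "$root/dirOne"

# -c (--total) adds a final "SIZE<TAB>total" line covering all the arguments.
report=$(du -c dirTwo other)
echo "$report"
```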
The apparent size is usually smaller than the disk usage, because disk space is allocated in whole blocks. However, in some situations the apparent size is bigger than the disk usage; the reason is given later. The apparent sizes are obtained with the --apparent-size option; for the relative path above, the command would be:
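Both situations can be reproduced in a temporary directory, assuming GNU du on a Linux file system. The sketch puts a 1-byte file in one sub-directory and a 10 MiB sparse file, made with truncate, in another. It then prints each directory's disk usage and apparent size in bytes (-B1 is short for --block-size=1). The names small, holes, tiny.txt and sparse.img are illustrative. On a file system without support for holes, the sparse file uses its full size on disk.

```bash
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir "$root/small" "$root/holes"   # illustrative directory names

# Usual case: a 1-byte file still occupies a whole block on most file systems.
printf 'x' > "$root/small/tiny.txt"

# Unusual case: truncate sets a 10 MiB size without writing data blocks,
# so where holes are supported the apparent size exceeds the disk usage.
truncate -s 10M "$root/holes/sparse.img"

for d in small holes; do
  usage=$(du -s -B1 "$root/$d" | cut -f1)
  apparent=$(du -s -B1 --apparent-size "$root/$d" | cut -f1)
  echo "$d: disk usage $usage bytes, apparent size $apparent bytes"
done
```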
With --max-depth=0, du prints the size of only the current working directory (or of the directory given as the argument). With --max-depth=1, it also prints the sizes of all the first-level sub-directories; with --max-depth=2, it adds the second-level sub-directories; with --max-depth=3, it adds the third-level sub-directories; and so on as the value of max-depth increases. Each printed size still includes the whole sub-tree of that directory. An example of its use is:
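The sketch below shows the effect of increasing depths, assuming GNU du. It uses a hypothetical tree in a temporary directory: a goes three levels down (a/b/c), and x has no sub-directories. GNU du documents --max-depth=0 as the same as -s (--summarize).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tree: a/b/c is three levels deep, x is one level deep.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/a/b/c" "$root/x"
cd "$root"

for depth in 0 1 2 3; do
  printf '%s\n' "--max-depth=$depth"
  du --max-depth="$depth"
done
```

Depth 0 prints only the dot. Depth 1 adds ./a and ./x, depth 2 adds ./a/b, and depth 3 adds ./a/b/c.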
The sudo Command
One of the directories in the root directory that has its own sub-directories is var. If the user types
and presses Enter, he will see that permission is denied for some directories, so du cannot report their sizes. The permission is denied because the user is not the superuser, who has the privilege to read those directories. To acquire that privilege, the user has to use the sudo command, as follows:
If sudo asks for the user's password, the user must type it in and press Enter. With the sudo command, an ordinary user (programmer) who is allowed to use sudo can see the sizes of all the directories in the var directory and in similar directories.
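sudo itself cannot be shown safely in a self-contained script, but the “Permission denied” situation can be reproduced without touching /var. The sketch below is a hypothetical setup that assumes GNU du. It creates a sub-directory with no permissions at all, runs du on its parent, and records the exit status and the error messages. Run as an ordinary user, du reports the locked directory as unreadable and exits with a non-zero status. Run as root, which is the user sudo runs commands as by default, it reads the directory normally.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical setup: one readable and one completely locked sub-directory.
root=$(mktemp -d)
mkdir "$root/open" "$root/locked"
chmod 000 "$root/locked"   # no read/execute permission for ordinary users
trap 'chmod 700 "$root/locked"; rm -rf "$root"' EXIT

# Sizes go to the terminal; error messages are collected in errors.txt.
status=0
du "$root" 2> "$root/errors.txt" || status=$?

echo "du exit status: $status"
cat "$root/errors.txt"

# On a real system the privileged form would be, for example:  sudo du /var
```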
Excluding Entries by Size
The “--threshold=SIZE” option leaves out of the listing any directory whose size is less than SIZE. For the path,
with the user in the home directory (the prompt ending in “:~$”), then
where 12K means 12 kilobytes, will not display the line for any directory whose disk usage is less than 12K.
This option can therefore be used to leave out directory lines that the user does not want in the listing.
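The following sketch shows the option at work, assuming GNU du. Its temporary tree has one sub-directory holding about 200 KiB of random data and one empty sub-directory. The names big and small are illustrative. With --threshold=100K, the line for the empty one disappears.

```bash
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir "$root/big" "$root/small"                      # illustrative names
head -c 204800 /dev/urandom > "$root/big/data.bin"   # about 200 KiB; small stays empty
cd "$root"

printf '%s\n' "Without a threshold:"
du
printf '%s\n' "With --threshold=100K:"
kept=$(du --threshold=100K)
echo "$kept"
```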
To omit the line for the last directory, dirFour, of the path
the command should be:
The result will be something like,
Note: the sizes do not include the size of the last-level directory (dirFour) of the path.
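The exact command is not shown here, so the following is only a sketch of one way to drop dirFour. It assumes GNU du's --exclude=PATTERN option, where a pattern with no slash, such as dirFour, matches a path whose last component is that name. The dirOne/dirTwo/dirThree/dirFour chain is recreated in a temporary directory, with about 200 KiB of random data in dirFour.

```bash
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/dirOne/dirTwo/dirThree/dirFour"
head -c 204800 /dev/urandom > "$root/dirOne/dirTwo/dirThree/dirFour/data.bin"
cd "$root"

printf '%s\n' "Full listing:"
du dirOne
printf '%s\n' "With --exclude=dirFour:"
# A pattern without a slash matches any path whose last component is dirFour.
trimmed=$(du --exclude=dirFour dirOne)
echo "$trimmed"
```

With the exclusion, dirFour has no line of its own, and the totals of dirThree, dirTwo and dirOne no longer include the data inside it.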
To get only the sizes of the upper-level directories and their sub-trees, keep the lower-level directories out of the listing with the option. So with the command,
the output will be something like,
Note: the sizes exclude the sizes of the lower-level directories of the tree.
Consider again the absolute path,
The following command obtains the disk usage of only the dirTwo directory, which is a directory in the middle of the path:
The argument is the path down to the directory in question. The value of --exclude is the same path followed by /*, that is, it ends with * just after the directory in question. Here, * matches all the sub-directories at that level (and their sub-trees), so they are left out. The result will be something like,
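The same idea can be sketched in a throwaway copy of the directory chain, assuming GNU du. The argument is the path to dirTwo, and the --exclude value is that path followed by /*. Quoting the pattern keeps the shell from expanding the * itself, so du receives it unchanged.

```bash
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/dirOne/dirTwo/dirThree/dirFour"
head -c 204800 /dev/urandom > "$root/dirOne/dirTwo/dirThree/dirFour/data.bin"

target="$root/dirOne/dirTwo"
# Quoted so the shell passes the * to du unexpanded; it matches everything
# below dirTwo, leaving only dirTwo itself in the listing.
only=$(du --exclude="$target/*" "$target")
echo "$only"
```

Only one line is printed, for dirTwo. Its size is that of the directory entry alone.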
Trying to find the size of a directory with the “ls -s” command is misleading: it reports only the space taken by the directory itself (its metadata), not by the contents of the directory. To know the disk usage of a directory, the du command should be used. With the -h option, the sizes of the directories are human-readable. The apparent size can be obtained with the --apparent-size option. Without any option or argument, the du command displays the sizes of all the sub-directories of the current directory, including that of the current directory itself. The argument to du is a path, which may begin from the root. Options and their values decide exactly which directories are addressed. The sudo command lets an ordinary user who is permitted to use it run du with superuser privileges.