What happens when you type “ls -l” in the Linux shell?

Connor Brereton
6 min read · Nov 20, 2018

Authors: Bennett Dixon and Connor Brereton

What exactly is a shell in UNIX systems?

All computers rely on an operating system to get tasks done, and on UNIX systems people often interact with the operating system through a CLI (command line interface). A command line interface is a text-based user interface. The shell is the program behind it: it reads your commands and asks the operating system’s kernel (in this case the Linux kernel) to carry them out. Take a look at the diagram below as we dive into more details on the inner workings of the Linux shell architecture.

Source: http://hpssociety.info/news/shell-in-linux.html
  • In Linux, the shell uses the POSIX API to communicate with the kernel.
  • The kernel communicates directly with the computer’s hardware.

Much like a mobile app, a shell is essentially an infinite loop that runs until it is interrupted. On mobile devices these interruptions are obvious (phone call, battery dying, etc.), but for a shell it is less obvious how the process gets interrupted. A shell ends when the user runs the exit built-in, when it reads EOF (end of file, typed as ^D), or when it receives a terminating signal such as the one sent by kill.

Why is the `-l` flag useful?

The -l flag lists the current directory in long format. That means the output in your terminal shows extra detail about each file in the directory, detail that users usually don’t need day to day.

Some examples of the additional information shown include:

  • permissions of a given file
  • number of hard links
  • owner (name)
  • group (name)
  • file size (in bytes)
  • date/time of last modification
  • filename
Source: https://www.guru99.com/must-know-linux-commands.html
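To make the first few columns concrete, here is a minimal sketch of how a program could gather them with the POSIX lstat() call. This is an illustration only, not a claim about how ls itself is written; the helper names mode_string() and long_entry() are made up for this example, and file types other than directories and symlinks are simplified to '-'.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: render type and permission bits like the first
 * column of a long listing, e.g. "-rw-r--r--". Other file types are
 * simplified to '-'. */
void mode_string(mode_t mode, char out[11])
{
    assert(out != NULL);
    static const char rwx[] = "rwxrwxrwx";

    out[0] = S_ISDIR(mode) ? 'd' : S_ISLNK(mode) ? 'l' : '-';
    for (int i = 0; i < 9; i++)            /* 0400 is the owner-read bit */
        out[i + 1] = (mode & (0400 >> i)) ? rwx[i] : '-';
    out[10] = '\0';
}

/* Hypothetical helper: write "<mode> <links> <size> <path>" into buf.
 * Returns 0 on success, -1 if lstat() fails (errno says why). */
int long_entry(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL);
    struct stat st;
    if (lstat(path, &st) == -1)
        return -1;

    char mode[11];
    mode_string(st.st_mode, mode);
    snprintf(buf, size, "%s %lu %lld %s", mode,
             (unsigned long)st.st_nlink, (long long)st.st_size, path);
    return 0;
}
```

Owner and group names, and the modification time, come from the same struct stat (st_uid, st_gid, st_mtime) and need extra lookups to turn into readable text, so they are left out of the sketch.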

How does `ls` actually work?

Step 1: The shell is prompted.

Step 2: The user enters a command, which the shell reads from standard input using getline(). This is a C library function (not a system call; it relies on read() underneath) that reads an entire line, up to and including the \n character, into a buffer. The \n character is placed in standard input when you press the RETURN key.
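Steps 1 and 2, wrapped in the infinite loop described earlier, might look roughly like the sketch below. The function name shell_loop() and the "$ " prompt are assumptions made for this example. Taking the input and output streams as parameters is also just a choice that makes the loop easy to try out; a real shell would use stdin and stdout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: prompt and read commands from `in` until EOF or
 * "exit". Returns how many non-empty commands were read. */
int shell_loop(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    int commands = 0;

    for (;;) {
        fputs("$ ", out);                      /* Step 1: show the prompt */
        fflush(out);

        ssize_t len = getline(&line, &cap, in); /* Step 2: read one line */
        if (len == -1)                         /* EOF (^D) or read error */
            break;
        if (len > 0 && line[len - 1] == '\n')  /* drop the trailing newline */
            line[len - 1] = '\0';
        if (strcmp(line, "exit") == 0)
            break;
        if (line[0] != '\0')
            commands++;   /* a real shell would parse and run the line here */
    }
    free(line);
    return commands;
}
```

Calling shell_loop(stdin, stdout) from main() gives an interactive prompt that ends on exit or ^D.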

Step 3: Next, the line in the buffer is tokenized, which is just a formal way of saying that the string is broken up into individual words, for example with the strtok() function. A hand-written tokenizer does this by counting how many words (separated by the given delimiters) are in the input string, measuring how long each word is, and then walking the input again and copying each word into newly allocated memory. The end result is a newly allocated array of pointers to newly allocated character arrays, each containing one word terminated by a null byte.

For example, running strtok() on the string echo hi && echo everyone with & as the delimiter breaks it up into

|e|c|h|o|SPACE|h|i|SPACE|\0|

and

|SPACE|e|c|h|o|SPACE|e|v|e|r|y|o|n|e|\0|

strtok() overwrites the first & with \0 (a null byte) to end the first token, and it skips the second & because it treats a run of delimiters as a single separator. The result is two separate command strings that can each be tokenized again on spaces and later executed with execvp(), a library wrapper around the execve system call. A shell splits on characters like & and | to support logical operators (think || and &&).
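The same split can be reproduced with a few lines of C. This is a minimal sketch, and the helper name split_on_ampersand() is made up for the example. Note that strtok() writes into the buffer it is given, so the input must be a modifiable array, and that the pieces keep the spaces that surrounded the operator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on '&'. Stores up to `max`
 * pointers into `line` in `parts` and returns how many were stored. */
size_t split_on_ampersand(char *line, char *parts[], size_t max)
{
    assert(line != NULL && parts != NULL);
    size_t n = 0;

    /* strtok() returns NULL once no tokens remain. */
    for (char *tok = strtok(line, "&"); tok != NULL && n < max;
         tok = strtok(NULL, "&"))
        parts[n++] = tok;
    return n;
}
```

Because strtok() collapses runs of delimiters, this approach cannot tell && apart from a single &. A shell that needs that distinction has to scan the line itself.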

The most efficient way to store these strings so that they can be passed between functions is a double pointer (char **). The double pointer points to the first element of an array of pointers, and each pointer in that array (a char *) points to the first character of one substring. Like this:

Source: https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2010/hh184278(v=vs.88)
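Here is a minimal sketch of a tokenizer that builds exactly this structure using the count, allocate, and copy approach from Step 3. The names tokenize() and free_tokens() are hypothetical. It uses strspn() and strcspn() instead of strtok() so that the input string is left untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `line` on any character in `delims`.
 * Returns a NULL-terminated, heap-allocated array of heap-allocated
 * words, or NULL if memory runs out. */
char **tokenize(const char *line, const char *delims)
{
    assert(line != NULL && delims != NULL);

    /* Pass 1: count the words so the pointer array can be sized. */
    size_t count = 0;
    const char *p = line;
    while (*p != '\0') {
        p += strspn(p, delims);            /* skip delimiters */
        size_t len = strcspn(p, delims);   /* length of the next word */
        if (len == 0)
            break;
        count++;
        p += len;
    }

    char **words = malloc((count + 1) * sizeof *words);
    if (words == NULL)
        return NULL;

    /* Pass 2: copy each word into its own null-terminated buffer. */
    p = line;
    for (size_t i = 0; i < count; i++) {
        p += strspn(p, delims);
        size_t len = strcspn(p, delims);
        words[i] = malloc(len + 1);
        if (words[i] == NULL) {            /* undo partial work on failure */
            while (i > 0)
                free(words[--i]);
            free(words);
            return NULL;
        }
        memcpy(words[i], p, len);
        words[i][len] = '\0';
        p += len;
    }
    words[count] = NULL;   /* the terminator that exec-style calls expect */
    return words;
}

/* Release everything tokenize() allocated. */
void free_tokens(char **words)
{
    if (words == NULL)
        return;
    for (char **w = words; *w != NULL; w++)
        free(*w);
    free(words);
}
```

The trailing NULL pointer is what lets a caller walk the array without knowing its length in advance.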

Step 4: The shell checks whether the first token is an alias. An alias is a user-defined name that the shell replaces with another command, or set of commands, before running it.

Here is an example of creating an alias: the ls -l command we’ve been using throughout the post is attached to the name lll, which has no meaning until we create an alias for it.

`lll` is an alias we created

Step 5: At this step the shell knows that the token is not an alias, so it next checks whether the command is a built-in. Built-in commands are implemented inside the shell itself rather than as separate programs on disk. Examples are cd, exit, and read.

`cd` is a built in command
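The order of the checks in Steps 4 and 5 can be sketched as a small lookup. Everything here is illustrative: the tables are hard-coded, and the names alias_lookup(), classify(), and the cmd_kind values are invented for this example. A real shell fills its alias table at run time and has many more built-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum cmd_kind { CMD_ALIAS, CMD_BUILTIN, CMD_EXTERNAL };

struct alias {
    const char *name;
    const char *expansion;
};

/* Illustrative tables only, using the examples from this post. */
static const struct alias aliases[] = { { "lll", "ls -l" } };
static const char *const builtins[] = { "cd", "exit", "read" };

/* Returns the alias expansion for `word`, or NULL if it is not an alias. */
const char *alias_lookup(const char *word)
{
    assert(word != NULL);
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcmp(word, aliases[i].name) == 0)
            return aliases[i].expansion;
    return NULL;
}

/* Applies the checks in order: alias first, then built-in, else external. */
enum cmd_kind classify(const char *word)
{
    if (alias_lookup(word) != NULL)
        return CMD_ALIAS;
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcmp(word, builtins[i]) == 0)
            return CMD_BUILTIN;
    return CMD_EXTERNAL;   /* must be looked up on disk */
}
```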

Step 6: At this point the command is neither an alias nor a built-in, so the shell treats it as an external program that it has to find on disk. The shell locates it as follows:

  • the PATH environment variable is passed to the tokenizer strtok() with : as the delimiter, which breaks the path into one token per directory. The command name is then appended to each directory.

Input:

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games

Output (tokenized):

/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
/usr/games
/usr/local/games

Output (command appended):

/usr/local/sbin/ls
/usr/local/bin/ls
/usr/sbin/ls
/usr/bin/ls
/sbin/ls
/bin/ls
/usr/games/ls
/usr/local/games/ls
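The two outputs above can be produced in one pass. The following is a minimal sketch: path_candidates() and the CAND_LEN limit are assumptions made for this example, and the PATH value is passed in as a plain string rather than read from the environment. strtok() modifies its input, so the function works on a copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAND_LEN 256   /* illustrative limit on one candidate path */

/* Hypothetical helper: split `path_var` on ':' and append "/<cmd>" to
 * each directory. Fills at most `max` rows of `out`; returns the count. */
size_t path_candidates(const char *path_var, const char *cmd,
                       char out[][CAND_LEN], size_t max)
{
    assert(path_var != NULL && cmd != NULL && out != NULL);

    char *copy = strdup(path_var);   /* strtok() writes into its input */
    if (copy == NULL)
        return 0;

    size_t n = 0;
    for (char *dir = strtok(copy, ":"); dir != NULL && n < max;
         dir = strtok(NULL, ":")) {
        int len = snprintf(out[n], CAND_LEN, "%s/%s", dir, cmd);
        if (len > 0 && len < CAND_LEN)   /* skip paths that would be cut off */
            n++;
    }
    free(copy);
    return n;
}
```

In a real shell the first argument would typically come from getenv("PATH").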

Next, the new string is passed into the access() system call like so:

const char *foo = "/usr/local/sbin/ls";

access(foo, X_OK);

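where the X_OK flag asks access() to do the following:

  • resolve the path, which fails if the file does not exist
  • check that the calling process has execute permission on it (read and write permission are tested with separate flags, R_OK and W_OK)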

If access() returns 0, the check was successful. If it returns -1, the check failed (the file does not exist or is not executable by this user) and errno is set to indicate why. The shell tries each candidate in order and uses the first one that passes.
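Trying the candidates in order might look like the sketch below. The helper name first_executable() is hypothetical; access() and X_OK are the real POSIX call and flag discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: return the first candidate path that the calling
 * process is allowed to execute, or NULL if none qualifies. */
const char *first_executable(const char *const candidates[], size_t count)
{
    assert(candidates != NULL);
    for (size_t i = 0; i < count; i++)
        if (access(candidates[i], X_OK) == 0)   /* 0 means the check passed */
            return candidates[i];
    return NULL;   /* errno holds the reason for the last failure */
}
```

One caveat: access() with X_OK also succeeds for a directory the user can search, so a stricter shell would additionally confirm that the match is a regular file.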

Step 7: If the executable is found, the shell calls fork() to create a child process. The shell is the parent, and the child starts out as a duplicate of it. The two processes run concurrently, and each has its own PID (process ID). The diagram below does an excellent job visualizing what is going on at the operating system level.

Source: http://www.it.uu.se/education/course/homepage/os/vt18/module-2/process-management/
  • The parent calls wait() to block until the child process finishes
  • The child process calls execve() to replace itself with the requested program; when that program exits, the kernel wakes the waiting parent
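The fork, exec, and wait sequence can be sketched as a single function. run_program() is a hypothetical name, and the sketch uses waitpid() rather than plain wait() so that it waits for this specific child. Returning 127 when execve() fails follows a common shell convention and is a choice made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the current process's environment */

/* Hypothetical helper: run the program at `path` with arguments `argv`
 * (NULL-terminated). Returns its exit status, or -1 on error. */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;                   /* fork failed, no child was created */

    if (pid == 0) {                  /* child: become the new program */
        execve(path, argv, environ);
        _exit(127);                  /* only reached if execve() failed */
    }

    int status;                      /* parent: wait for the child */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the command in this post, the call would be run_program("/bin/ls", argv) with argv holding "ls", "-l", and a terminating NULL pointer.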

Here is the flow described in terms that we have been discussing:

  • > ls -l → executable /bin/ls → tokenize {“/bin/ls”, “-l”, NULL} → (see below)
Result of `ls -l`

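Step 8: Finally, the shell runs the end-of-command procedures that the engineer built into it: (i) it frees the heap memory it allocated for the line and its tokens (stack memory is released automatically when functions return), (ii) it returns control to the main loop, and (iii) it prints the prompt symbol again from start_shell().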

And there you have it: you just learned a lot about operating systems and how they handle commands entered into the terminal. For more information on operating systems, I highly recommend checking out this repository, which provides access to many universities’ notes on this subject and many more!
