Everything is a File
"On a UNIX system, everything is a file; if something is not a file, it is a process."
"Everything is a file" is a core design principle of Unix (and hence Linux) architecture. It can be taken to mean two things
Everything is a stream of bytes
Everything is a stream of bytes that you can read from and/or write to: not just documents but also storage devices such as a CD-ROM, and in fact almost any device. This is in keeping with the Unix philosophy, which favors simplicity, universality and plain text. It means that you can treat devices and other parts of the system much as you would an ordinary file. To demonstrate this, plug in an external mouse and type
sudo cat /dev/input/mice
in the terminal, then move your mouse around. The output is raw binary data, so it may be easier to read if you pipe it through hexdump
sudo cat /dev/input/mice | hexdump
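The same idea can be tried without root access or a mouse. The sketch below is only an illustration: it reads from /dev/urandom (chosen here because it never blocks; it is not one of the devices named in this article) and uses od in place of hexdump, on the assumption that od is more likely to be installed everywhere.

```shell
# Read 16 bytes from the kernel's random device exactly as if it were a file.
# /dev/urandom is an assumption of this sketch, picked because it never blocks.
head -c 16 /dev/urandom | od -An -tx1

# Ordinary tools work on the device too: count the bytes that were read.
byte_count=$(head -c 16 /dev/urandom | wc -c)
echo "read $byte_count bytes"
```

Nothing in these commands is specific to devices; head, od and wc behave as they would on any regular file.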
It can even be said that Unix treats the user as a file. At the command line, user interaction consists of reading and writing a stream of text. More specifically, Unix is centered around three standard input/output streams: standard input (stdin), standard output (stdout) and standard error (stderr).
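As a rough illustration of the three streams, the sketch below separates stdout from stderr using redirection. The function name report and its messages are made up for the example.

```shell
# A hypothetical function that writes one line to each output stream.
report() {
    echo "result: ok"              # goes to stdout (file descriptor 1)
    echo "warning: low disk" >&2   # goes to stderr (file descriptor 2)
}

# Capture only stdout; stderr is discarded by routing it to /dev/null.
out=$(report 2>/dev/null)

# Capture only stderr: 2>&1 first points stderr at the capture,
# then >/dev/null sends stdout away.
err=$(report 2>&1 >/dev/null)

# stdin is a stream as well: the function's stdout becomes wc's stdin.
lines=$(report 2>/dev/null | wc -l)

echo "stdout said: $out"
echo "stderr said: $err"
```

Because each stream is handled like a file, the same redirection syntax works whether the destination is a regular file, a device such as /dev/null, or another program.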
Most files in Unix are regular files: they contain ordinary data, e.g. text files, images, executables or program input/output. There are exceptions, however. These can be said to be "more than files": they have special properties, although they can still be handled like regular files, e.g. listed with ls. They are:
- Directories: lists of other files.
- Special files: most of these are in /dev/. They are often used for input and output, for example /dev/input/mice. There are also special files that have a particular purpose, e.g. /dev/random exists as a source of randomness and /dev/full exists so programmers can test what happens when a disk is full. (Try echo "I ate too many donuts" > /dev/full. If you are on a diet, just route it to /dev/null instead.)
- Links: files that are a link to another file (or directory). These allow it to be seen elsewhere or under a different name. Most often we talk about and use symlinks, which have the advantage that if you delete a symlink the original does not get deleted as well.
- Unix domain sockets: these enable two processes on the same computer to communicate (especially in client-server relationships). They are similar to named pipes but are full duplex (i.e. information can flow both ways), and they also allow for datagrams, which are discrete packets of data. More than one client can connect to a server, for example. When used in datagram mode they are also connectionless: programs can communicate without having to keep a connection open. Despite the name they have very little to do with networking. Because they are treated as files, security can be implemented using standard file permissions.
- Named pipes (a.k.a. FIFO pipes): these also allow for inter-process communication, like domain sockets. They are more limited but simpler to deal with. Much like an actual water pipe, they allow a one-way flow from one place to another. Two processes can access a pipe: one to write to it, one to read from it. They are explicitly created using the mkfifo command.
mkfifo named_pipe
gzip -c < named_pipe > out.gz &   # run in the background; waits for a writer
cat my_file > named_pipe
rm named_pipe                     # deletes the pipe like any other file
Note that here gzip is only reading from the pipe; it is writing to the file out.gz.
Pipes only work when they are connected at both ends. This is a useful property: you can feed the output of a command into a pipe and the command will wait until something is connected to the other end to read from it (and vice versa).
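The kinds of file listed above can be told apart from the shell, and the waiting behaviour of a pipe can be observed directly. The following is a minimal sketch rather than a definitive recipe: classify, the file names and the message are all hypothetical, and everything happens in a throwaway directory.

```shell
# Report which kind of file a path is, using the shell's test operators.
# classify is a hypothetical helper name.
classify() {
    # -L is checked first because the other tests follow symlinks.
    if   [ -L "$1" ]; then echo "symlink"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -p "$1" ]; then echo "named pipe"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -c "$1" ] || [ -b "$1" ]; then echo "device"
    elif [ -f "$1" ]; then echo "regular file"
    else echo "unknown"
    fi
}

workdir=$(mktemp -d)    # throwaway directory
echo "some text" > "$workdir/original"
ln -s "$workdir/original" "$workdir/alias"
mkfifo "$workdir/demo_pipe"

kinds="$(classify "$workdir"), $(classify "$workdir/alias"), $(classify "$workdir/demo_pipe"), $(classify /dev/null)"

# Deleting the symlink leaves the file it pointed to untouched.
rm "$workdir/alias"
survivor=$(cat "$workdir/original")

# The writer waits in the background until a reader opens the pipe.
echo "hello through the pipe" > "$workdir/demo_pipe" &
received=$(cat "$workdir/demo_pipe")    # opening for reading releases the writer
wait

rm -r "$workdir"
echo "$kinds"
echo "$received"
```

The writer is started in the background for the reason described above: with only one end connected, it would otherwise wait and the script would never reach the reader.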
Unix File System as a Universal Name Space
Today the Unix file system name space is universally recognized, even if it is not known as such: it forms the basis of a URL, e.g. http://www.freegeek.org/volunteer/. Unix and Linux file systems tend to be alike (though there are exceptions, e.g. Mac OS X). There is a well-established, coherent layout for files that has been replicated across many different varieties of Unix and Linux (parts of it can even be seen on some mobile phones). This makes them easy to work on, as they are familiar and do not tend to change from machine to machine. Having a clear, consistent hierarchy of files and directories provides a mechanism for addressing a resource regardless of its nature. For example, you can refer to a directory as /home/oem, a file as /home/oem/example.pdf, a hard disk partition as /dev/sda1, a mouse as /dev/mouse, etc. It also means that if you add a resource such as a hard drive, it can be placed anywhere in the file hierarchy, not just at the root level. This is very flexible in contrast to, say, the drive letter system historically used by Windows, where each drive appeared as a separate entity without a useful label. To Unix, a hard drive is transparent to the end user. For example, you could mount a hard drive as /home/oem/Music and it would act just like an ordinary directory.
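One way to see this transparency is to ask which filesystem lies behind a given path. The sketch below is an illustration only. The mount command appears as a comment because it needs root and a real device, and /dev/sdb1 is a hypothetical partition name.

```shell
# Mounting a drive into the hierarchy would look something like this
# (hypothetical device name, requires root, so it is not executed here):
#   sudo mount /dev/sdb1 /home/oem/Music

# Any path can be asked which filesystem backs it. -P selects the portable
# POSIX output format: one header line, then one line per filesystem.
backing_device=$(df -P / | awk 'NR==2 {print $1}')
mount_point=$(df -P / | awk 'NR==2 {print $NF}')
echo "/ is on $backing_device, mounted at $mount_point"
```

Running df -P on a directory such as /home/oem/Music after a mount like the one above would report the drive rather than the root filesystem, even though the directory is used in exactly the same way as before.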
See Linux File System for much more info about the layout of the file system and what you typically find there.