File Types

From COMP15212 Wiki

The file store holds a large number of files which represent many different things; that is, the files contain different types of data. There are various ways in which file types can be classified.

Types within the filing system

Unix, Windows etc. show various filenames in a directory (“folder”) which are not necessarily immediately distinguishable to the user as ‘regular’ files or subdirectories. This is, of course, an important distinction for the filing system itself and the information is kept within the file attributes.

For Unix files this is shown by (e.g.) ls -l, where the first character of each line gives the type:

 -rwx------   ...   file_1       A "regular file"
 drwx------   ...   file_2       A directory
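
The same attribute can be read by a program. The following is a minimal sketch in Python, assuming a Unix-like system; the function name kind_of is made up for illustration, while os.lstat and the stat module come from the Python standard library.

```python
import os
import stat

def kind_of(path):
    """Return a short description of the filing-system type of `path`."""
    # lstat reports on the name itself and does not follow symbolic links
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    # Anything else (link, device, pipe, ...) is lumped together here
    return "other"
```

For a display in the style of ls -l, stat.filemode(os.lstat(path).st_mode) returns the ten-character string whose first character is the type.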

You can also explore this in the exercise on files (Exercise_Files).

There are other items which might appear as a file of a different type in a Unix-like system, such as symbolic links, devices and named pipes; these are not always true files, just made to appear that way.

The filing-system type does not distinguish an audio file from a photograph from a text document from a compiled program binary; in Unix all of these are “regular files”. Some filing systems do include more attributes, which can be useful in knowing how to process a file, but there is an ever-growing list of data types which users want to store.

File browsers (and similar tools) often try to deduce what a file actually contains; a similar function is implemented in the utility file <filename>, whose man page says, in part:

      file tests each argument in an attempt to classify it.  There are three
      sets of tests, performed in this order: filesystem tests, magic tests,
      and language tests.  The first test that succeeds causes the file type to
      be printed.
  • Filesystem tests use the attribute information available, as described above.
  • “Magic” tests look for a “magic number” at a known place in the file.
    For example an ELF file (a common binary file format) begins with the byte 0x7F (7F in hexadecimal) followed by the characters “ELF”. This is a useful way to guess a file's type, although it is not 100% reliable.
  • Language tests are applied if the type of a file has not yet been determined. They check whether the file looks like text (which byte values are used and which are not) and may then try some keywords to see if it seems to be source code in a particular programming language (for example). This, too, is not 100% reliable!
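
The order of the three tests can be illustrated with a short sketch. This is not how the file utility is implemented: the function name classify, the 512-byte read, the single magic number and the single keyword are all assumptions chosen to keep the example small.

```python
import os
import stat

ELF_MAGIC = b"\x7fELF"   # byte 0x7F followed by the ASCII letters "ELF"

def classify(path):
    """Guess a file type using three tests in order; the first success wins."""
    # 1. Filesystem test: the file attributes alone may be enough.
    mode = os.lstat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if not stat.S_ISREG(mode):
        return "special file"

    # Only regular files are opened; read a small prefix of the contents.
    with open(path, "rb") as f:
        head = f.read(512)
    if not head:
        return "empty"

    # 2. Magic test: look for a known byte sequence at a known place.
    if head.startswith(ELF_MAGIC):
        return "ELF"

    # 3. Language test: does it look like text, and if so, what kind?
    try:
        text = head.decode("ascii")
    except UnicodeDecodeError:
        return "data"
    if "\x00" in text:
        return "data"
    if "#include" in text:
        return "C source text (guess)"
    return "text"
```

Each stage can be fooled: a text file that happens to start with the ELF bytes would be reported as ELF, which is why such results are only guesses.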

User types

It is not uncommon for users to imply the type of a file with its name, often with a suffix (an “extension”). For example, a ‘text’ file may have a name like file.txt. In some systems this extension was logically separate from the file name and formed part of the metadata. On common modern systems (Unix, Windows) this is not the case: the extension is simply part of the name, and the filing system attaches no meaning to it.
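
Because the extension is only a naming convention on these systems, a program that wants it has to pick it out of the name as a string. A small sketch, where split_extension is a hypothetical helper wrapping the standard library function os.path.splitext:

```python
import os.path

def split_extension(filename):
    """Split a name into (root, extension); only the last suffix counts."""
    return os.path.splitext(filename)

# The result says nothing about the contents: renaming file.txt to
# file.jpg changes the extension but not a single byte of the data.
root, ext = split_extension("file.txt")
```

Note that a name may carry several suffixes (only the last is returned) or none at all, and a leading dot, as in a Unix “hidden” file, is not treated as an extension.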