Filing System: Difference between revisions

From COMP15212 Wiki
Yuron (talk | contribs)
m (1 revision imported)
E47796is (talk | contribs)
 
(2 intermediate revisions by 2 users not shown)
Line 3: Line 3:
-->{{#invoke:Dependencies|add|Resources,4|User,3}}
-->{{#invoke:Dependencies|add|Resources,4|User,3}}
<blockquote>
<blockquote>
You should be (at least) reasonably familiar with filing systems –
You should be (at least) reasonably familiar with filing systems – from a user point of view – by now.  Thus, this introduction will be quite cursory!
from a user point of view – by now.  Thus, this introduction will
be quite cursory!
</blockquote>
</blockquote>
A typical computer might have a number of ‘levels’ of
A typical computer might have a number of ‘levels’ of storage.  The <em>machine’s</em> (hardware) view is usually something like:
storage.  The <em>machine’s</em> (hardware) view is usually something like:


*Processor [[Extra:ISA|registers]]
*Processor [[Extra:ISA|registers]]
Line 26: Line 23:
*Internet
*Internet


These are listed from small-&-fast to big-&-slow.  There is also the
These are listed from small-&-fast to big-&-slow.  There is also the issue of data <em>persistence</em>; due largely to the technology (currently) employed, only file-store and the Internet (someone else’s file-store) retain data indefinitely.
issue of data <em>persistence</em>; due largely to the technology (currently)
employed, only file-store and the Internet (someone else’s file-store)
retain data indefinitely.


The memory is of (more or less) limited size, set by the machine
The memory is of (more or less) limited size, set by the machine architecture.  For example a 32-bit machine will typically have a limit of 4 GiB … each location with its own address.
architecture.  For example a 32-bit machine will typically have a
limit of 4 GiB … each location with its own address.


File-store is provided to supply the following requirements:
File-store is provided to supply the following requirements:
Line 42: Line 34:


=== Large capacity ===
=== Large capacity ===
Rather than storing bytes or words – each with an explicit address –
Rather than storing bytes or words – each with an explicit address – file-store can be larger because it stores ‘items’ as
file-store can be larger because it stores ‘items’ as
[[files]] – of indeterminate size – rather than as (lots of) single bytes.  The ‘address’ of an item is a <em>filename</em>
[[files]] – of indeterminate size – rather than as (lots of)
rather than a ‘numeric’ address.  In practice there is a limit on both the number of different files a particular filing system
single bytes.  The ‘address’ of an item is a <em>filename</em>
can handle and the maximum size of any one file but these are both typically large numbers.  The total data capacity of the latest systems is still larger than the highest capacity disk drives (although more than one physical drive may be used).
rather than a ‘numeric’ address.  In practice there is a
limit on both the number of different files a particular filing system
can handle and the maximum size of any one file but these are both
typically large numbers.  The total data capacity of the latest
systems is still larger than the highest capacity disk drives
(although more than one physical drive may be used).


For example: the [https://en.wikipedia.org/wiki/Ext4 Linux Ext4 system] (2008) will support up to
For example: the [https://en.wikipedia.org/wiki/Ext4 Linux Ext4 system] can support up to a million terabytes, compared with (for example) a [https://www.zdnet.com/article/worlds-largest-ssd-hits-100tb/ large SSD] (2018) holding 100 terabytes.  In 2019, 14TB hard drives can be obtained on Amazon for reasonable prices, with the best capacity-to-price being offered for 4-8 TB hard drives.
a million terabytes, compared with (for example) a [https://petapixel.com/2015/08/15/samsung-16tb-ssd-is-the-worlds-largest-hard-drive/ large SSD]
(2015) holding 16 terabytes.  In 2017 a typical, domestic HDD (Hard
Disk Drive) might hold about 1 TB (1000 MB).


To add some perspective:
To add some perspective:
Line 63: Line 46:
*Digital Versatile Disc (DVD): 5 GB - 17 GB (1995)
*Digital Versatile Disc (DVD): 5 GB - 17 GB (1995)
*Blu-ray: 25/50 GB (2006)
*Blu-ray: 25/50 GB (2006)
*USB Flash drive: ~1 TB (as of 2018)
*USB Flash drive: up to 2 TB, though typical sizes are around 128GB to 256GB (as of 2019)


<blockquote>
<blockquote>
The Internet is even bigger and <em>addresses</em> items via
The Internet is even bigger and <em>addresses</em> items via “URL”s, compound addresses specifying a server machine plus a <em>notional</em> file on that server.  It’s total capacity is a mystery but must be several exabytes (10<sup>18</sup> bytes)
“URL”s, compound addresses specifying a server machine
plus a <em>notional</em> file on that server.  It’s total capacity is a
mystery but must be several exabytes (10<sup>18</sup> bytes)
(2017) and growing; fortunately it is outside our scope here.
(2017) and growing; fortunately it is outside our scope here.
</blockquote>
</blockquote>


=== Persistence ===
=== Persistence ===
Most RAM technologies hold data only whilst continuously powered.
Most RAM technologies hold data only whilst continuously powered. This has various consequences in run-time [[Power_Management|power management]] but the major issue for most users is that the primary memory data is lost when the power goes off. Secondary storage is <em>persistent</em>.  This means it uses different technologies from the ‘main’ memory.
This has various consequences in run-time [[Power_Management|power
management]] but the major issue for most users is
that the primary memory data is lost when the power goes off.
Secondary storage is <em>persistent</em>.  This means it uses different
technologies from the ‘main’ memory.


The chief contemporary technologies are:
The chief contemporary technologies are:
Line 87: Line 62:
*Optical disc – e.g. DVD
*Optical disc – e.g. DVD


All of these technologies are <strong>much slower</strong> than the main memory and
All of these technologies are <strong>much slower</strong> than the main memory and favour access in <em>blocks</em> rather than true <em>random</em> access.
favour access in <em>blocks</em> rather than true <em>random</em> access.


Flash memory offers some compactness in relatively small storage
Flash memory offers some compactness in relatively small storage applications; it is semiconductor store but is significantly slower to read and much <em>much</em> slower to write to than the main RAM.  It also has ‘lifetime’ issues, typically only being guaranteed for a limited number of write operations in each ‘block’. This number – somewhere around the 10<sup>5</sup>-10<sup>6</sup> is satisfactory for many applications – e.g. SD cards – for file-store but not for other secondary applications such as supporting [[paging]].
applications; it is semiconductor store but is significantly slower to
read and much <em>much</em> slower to write to than the main RAM.  It also
has ‘lifetime’ issues, typically only being guaranteed for
a limited number of write operations in each ‘block’.
This number – somewhere around the 10<sup>5</sup>-10<sup>6</sup> is
satisfactory for many applications – e.g. SD cards – for file-store
but not for other secondary applications such as supporting
[[paging]].


=== Access rights ===
=== Access rights ===
In primary storage there may be [[threads]] and
In primary storage there may be [[threads]] and [[processes]].  Within a process, threads have common access to memory; processes are specifically isolated from each other with protection being enforced in hardware by an [[Memory Management Unit (MMU)|MMU]].
[[processes]].  Within a process, threads have common access
to memory; processes are specifically isolated from each other with
protection being enforced in hardware by an [[Memory Management Unit (MMU)|MMU]].


The philosophy in file-store organisation is typically different; as
The philosophy in file-store organisation is typically different; as files are intended to out-live processes it makes no sense for a particular process to ‘own’ a file.  Instead, files provide a means of [[Interprocess_Communication|communication]] both
files are intended to out-live processes it makes no sense for a
particular process to ‘own’ a file.  Instead, files
provide a means of [[Interprocess_Communication|communication]] both
through time and space.
through time and space.


At the same time, some [[security]] is important, especially
At the same time, some [[security]] is important, especially on shared systems.  The [[File_Permissions|access control systems]] for files are typically more sophisticated than for processes as they operate much less frequently than primary memory accesses and can be run in (operating system) software.
on shared systems.  The [[File_Permissions|access control systems]] for
files are typically more sophisticated than for processes as they
operate much less frequently than primary memory accesses and can be
run in (operating system) software.


=== Filing system ===
=== Filing system ===
The filing system lives between the user applications – which want to
The filing system lives between the user applications – which want to handle <strong>files</strong> and the disk (or other) [[Device_Drivers|device drivers]] which move blocks of data to and fro.  All processes should see the same files so this common software is clearly part of the operating system.  In a ‘layered’ model the filing system is quite a high-level O.S. layer.  In a [[Kernel|microkernel]] it may well be run (by the O.S.) in user mode.
handle <strong>files</strong> and the disk (or other) [[Device_Drivers|device drivers]] which move blocks of data to and fro.  All
processes should see the same files so this common software is clearly
part of the operating system.  In a ‘layered’ model the filing system
is quite a high-level O.S. layer.  In a [[Kernel|microkernel]] it may
well be run (by the O.S.) in user mode.


[[Image:file_system.png|link=|alt=File system]]
[[Image:file_system.png|link=|alt=File system]]


We can – to some extent – decouple the <em>structure</em> of a filing
We can – to some extent – decouple the <em>structure</em> of a filing system, as seen by users and outlined here, from the
system, as seen by users and outlined here, from the
[[Filing_System_Implementation|implementation]], which is the more ‘technical’ side.
[[Filing_System_Implementation|implementation]], which is the more
‘technical’ side.


=== File access ===
=== File access ===
The O.S. provides [[System_Calls|system calls]] for access to files:
The O.S. provides [[System_Calls|system calls]] for access to files: basically operations for reading and writing files … without harmful interactions from different client actions.
basically operations for reading and writing files … without harmful
interactions from different client actions.


[[File_Access|File access]] is developed further in another article.
[[File_Access|File access]] is developed further in another article.


=== Overview ===
=== Overview ===
The simplest file-stores are ‘flat’: i.e. there is one place where
The simplest file-stores are ‘flat’: i.e. there is one place where <em>all</em> the files are kept.  This soon becomes inconveniently crowded.
<em>all</em> the files are kept.  This soon becomes inconveniently crowded.


Another mechanism is to identify separate <em>devices</em>, such as
Another mechanism is to identify separate <em>devices</em>, such as “A:” or “C:” evolving through generations and
“A:” or “C:” evolving through generations and
probably ‘familiar’ these days from <strong>Windows</strong>.  In modern systems these may be <em>virtual</em> rather than physically separate devices.
probably ‘familiar’ these days from <strong>Windows</strong>.  In
modern systems these may be <em>virtual</em> rather than physically separate
devices.
<blockquote>
<blockquote>
Although these were satisfactory for <em>small</em> file-stores – maybe
Although these were satisfactory for <em>small</em> file-stores – maybe using interchangeable media – they don’t work well with <em>millions</em> of files.
using interchangeable media – they don’t work well with <em>millions</em>
of files.
</blockquote>
</blockquote>
A modern file-store is most likely to be <em>hierarchical</em> with a
A modern file-store is most likely to be <em>hierarchical</em> with a tree-like structure of arbitrary depth.  Each tree has a single ‘root’ which branches repeatedly.  <strong>Windows</strong> systems still retain separate “roots” for each (virtual) device though.
tree-like structure of arbitrary depth.  Each tree has a single
‘root’ which branches repeatedly.  <strong>Windows</strong> systems
still retain separate “roots” for each (virtual) device
though.


[[Image:file_tree_Windows.png|link=|alt=Windows file trees]]
[[Image:file_tree_Windows.png|link=|alt=Windows file trees]]


<strong>Unix</strong> (on the other hand) file-store
<strong>Unix</strong> (on the other hand) file-store [https://en.wikipedia.org/wiki/mount_(Unix) <em>mounts</em>] devices in a single ‘tree’, so different disks (and other stuff) appear as branches.
[https://en.wikipedia.org/wiki/mount_(Unix) <em>mounts</em>] devices in a single ‘tree’, so different disks (and other stuff) appear as branches.


[[Image:file_tree_Unix.png|link=|alt=Unix file tree]]
[[Image:file_tree_Unix.png|link=|alt=Unix file tree]]
Line 171: Line 107:


<blockquote>
<blockquote>
“Tree” is an expedient simplification for the moment.
“Tree” is an expedient simplification for the moment. When [[links]] are considered, the file-store can look like a
When [[links]] are considered, the file-store can look like a
directed graph.
directed graph.
</blockquote>
</blockquote>
In a Unix system, when file-store is distributed over an network
In a Unix system, when file-store is distributed over a network different machines may see the same files in a different structure. They may also have different [[File_Permissions|properties]]: for example we mount some systems as <em>read-only</em> on the student network which appear (possibly with a different path/name) as writeable on staff machines: this means every individual file does not need its permission settings checked.
different machines may see the same files in a different structure.
They may also have different [[File_Permissions|properties]]: for
example we mount some systems as <em>read-only</em> on the student network
which appear (possibly with a different path/name) as writeable on
staff machines: this means every individual file does not need its
permission settings checked.
----
----


===== A note on Unix directories =====
===== A note on Unix directories =====
Every Unix directory contains at least two files: these are
Every Unix directory contains at least two files: these are “<code>.</code>” and “<code>..</code>”.  Thus the parent directory
“<code>.</code>” and “<code>..</code>”.  Thus the parent directory
is explicitly specified.
is explicitly specified.


Line 193: Line 121:
Note <code>four</code> is a [[Links|symbolic link]].
Note <code>four</code> is a [[Links|symbolic link]].


Try setting up a structure as in the figure and <code>cd</code> to directory <code>two</code>.
Try setting up a structure as in the figure and <code>cd</code> to directory <code>two</code>. Then try <code>ls four/..</code>.
Then try <code>ls four/..</code>.


What did you expect?
What did you expect?
----
----
{{PageGraph}}
{{PageGraph}}
{{Category|Filing System}}
{{Category|Filing System}}

Latest revision as of 09:34, 25 May 2021


You should be (at least) reasonably familiar with filing systems – from a user point of view – by now. Thus, this introduction will be quite cursory!

A typical computer might have a number of ‘levels’ of storage. The machine’s (hardware) view is usually something like:

  • Processor registers
  • Primary memory – chiefly RAM
    This probably has some internal cache hierarchy
  • Secondary storage – typically disks although there may be a variety of physical devices
  • Network – not really part of the computer but a source of data nevertheless

[Figure: All memory]

From the user’s perspective the view is simpler.

[Figure: Memory hierarchy]

  • Internal context (“registers” etc.)
    Only relevant for assembler programmers or compiler writers
  • Memory
  • Filestore
  • Internet

These are listed from small-&-fast to big-&-slow. There is also the issue of data persistence; due largely to the technology (currently) employed, only file-store and the Internet (someone else’s file-store) retain data indefinitely.

The memory is of (more or less) limited size, set by the machine architecture. For example, a machine with 32-bit addresses can typically address at most 2^32 bytes (4 GiB), each location having its own address.
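As a rough illustration of where the 4 GiB figure comes from, the sketch below counts the locations reachable with a given address width. It assumes byte-addressable memory; the function name is ours, chosen for illustration only.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses available with this address width."""
    return 2 ** address_bits


GIB = 2 ** 30  # one gibibyte, in bytes

# A 32-bit address can name 2^32 different bytes.
print(addressable_bytes(32) // GIB, "GiB")
```

A file-store avoids this limit because an item is named by its filename rather than by a fixed-width numeric address.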

File-store is provided to supply the following requirements:

  • Large storage capacity
  • Data persistence
  • Inter-process communication

Large capacity

Rather than storing bytes or words – each with an explicit address – file-store can be larger because it stores ‘items’ as files – of indeterminate size – rather than as (lots of) single bytes. The ‘address’ of an item is a filename rather than a ‘numeric’ address. In practice there is a limit on both the number of different files a particular filing system can handle and the maximum size of any one file but these are both typically large numbers. The total data capacity of the latest systems is still larger than the highest capacity disk drives (although more than one physical drive may be used).

For example: the Linux Ext4 system can support up to a million terabytes, compared with (for example) a large SSD (2018) holding 100 terabytes. In 2019, 14 TB hard drives could be obtained on Amazon for reasonable prices, with the best capacity-to-price ratio being offered by 4-8 TB hard drives.

To add some perspective:

  • Compact Disc (CD): 700 MB (1982)
  • Digital Versatile Disc (DVD): 5 GB - 17 GB (1995)
  • Blu-ray: 25/50 GB (2006)
  • USB Flash drive: up to 2 TB, though typical sizes are around 128 GB to 256 GB (as of 2019)
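To put these figures side by side, the following sketch works out how many of each medium would be needed to reach the "million terabytes" quoted above for Ext4. It assumes decimal units (1 TB = 10^12 bytes) and uses only capacities mentioned in the text; the helper name is illustrative.

```python
MB, GB, TB = 10**6, 10**9, 10**12  # decimal units, in bytes (an assumption)

EXT4_LIMIT = 1_000_000 * TB  # "a million terabytes", as quoted in the text

media = {
    "CD (700 MB)": 700 * MB,
    "Blu-ray (50 GB)": 50 * GB,
    "large SSD (100 TB)": 100 * TB,
}


def units_needed(total: int, unit: int) -> int:
    """Smallest whole number of units whose combined size is at least total."""
    return -(-total // unit)  # ceiling division on integers


for name, size in media.items():
    print(f"{name}: {units_needed(EXT4_LIMIT, size):,}")
```

Even the 100 TB SSD would be needed ten thousand times over, which is why the filing system limit is rarely the practical constraint.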

The Internet is even bigger and addresses items via “URL”s, compound addresses specifying a server machine plus a notional file on that server. Its total capacity is a mystery but must be several exabytes (1 exabyte = 10^18 bytes) (2017) and growing; fortunately it is outside our scope here.

Persistence

Most RAM technologies hold data only whilst continuously powered. This has various consequences in run-time power management but the major issue for most users is that the primary memory data is lost when the power goes off. Secondary storage is persistent. This means it uses different technologies from the ‘main’ memory.

The chief contemporary technologies are:

  • Magnetic disk – Hard Disk Drive (HDD)
  • Flash memory – Solid State Drive (SSD)
  • Optical disc – e.g. DVD

All of these technologies are much slower than the main memory and favour access in blocks rather than true random access.

Flash memory offers some compactness in relatively small storage applications; it is semiconductor store but is significantly slower to read and much, much slower to write to than the main RAM. It also has ‘lifetime’ issues, typically only being guaranteed for a limited number of write operations in each ‘block’. This number, somewhere around 10^5 to 10^6, is satisfactory for many file-store applications (e.g. SD cards) but not for other secondary-storage applications such as supporting paging.
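A back-of-envelope sketch may show why the write limit matters more for paging than for file-store. The rewrite rates below are hypothetical, picked only to contrast the two kinds of use, and the model deliberately ignores wear levelling and any other controller behaviour.

```python
def block_lifetime_seconds(endurance_writes: float, writes_per_second: float) -> float:
    """Time until one block uses up its guaranteed writes at a steady rate."""
    return endurance_writes / writes_per_second


ENDURANCE = 1e5  # lower end of the 10^5 to 10^6 range quoted in the text
HOUR = 3600
YEAR = 365 * 24 * HOUR

# Hypothetical rates: a file block rewritten once an hour,
# versus a paging block rewritten ten times a second.
file_store = block_lifetime_seconds(ENDURANCE, 1 / HOUR)
paging = block_lifetime_seconds(ENDURANCE, 10)

print(f"file-store-like use: {file_store / YEAR:.1f} years")
print(f"paging-like use:     {paging / HOUR:.1f} hours")
```

Under these assumptions the same block lasts roughly a decade in one case and only a few hours in the other.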

Access rights

In primary storage there may be threads and processes. Within a process, threads have common access to memory; processes are specifically isolated from each other with protection being enforced in hardware by an MMU.

The philosophy in file-store organisation is typically different; as files are intended to out-live processes it makes no sense for a particular process to ‘own’ a file. Instead, files provide a means of communication both through time and space.
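The idea of communicating "through time" can be sketched as follows: one process writes a file and terminates, and a different process reads it afterwards. The file name and helper function are hypothetical, and a temporary directory stands in for real file-store.

```python
import os
import subprocess
import sys
import tempfile


def pass_message_via_file(message: str) -> str:
    """Have a child process write a file, then read it after the child has exited."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "note.txt")  # hypothetical file name
        writer = (
            "import sys\n"
            "with open(sys.argv[1], 'w') as f:\n"
            "    f.write(sys.argv[2])\n"
        )
        # The writer is a separate process; run() returns only once it has finished.
        subprocess.run([sys.executable, "-c", writer, path, message], check=True)
        # The file has outlived the process that created it.
        with open(path) as f:
            return f.read()


print(pass_message_via_file("hello from a finished process"))
```

Neither process "owns" the file: it remains in the file-store until something removes it (here, the temporary directory clean-up).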

At the same time, some security is important, especially on shared systems. The access control systems for files are typically more sophisticated than for processes as they operate much less frequently than primary memory accesses and can be run in (operating system) software.
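As a small illustration of file access control, the sketch below sets and then inspects the permission bits of a file. It assumes a Unix-like system (permission bits behave differently elsewhere); the file name and helper are illustrative.

```python
import os
import stat
import tempfile


def permission_string(path: str) -> str:
    """Render a file's mode bits in the style used by 'ls -l'."""
    return stat.filemode(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("data")
    # Owner may read and write, group may read, others have no access.
    os.chmod(path, 0o640)
    mode_text = permission_string(path)
    print(mode_text)
```

These checks are made by operating system software when a file is opened, not by hardware on every access.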

Filing system

The filing system lives between the user applications – which want to handle files – and the disk (or other) device drivers, which move blocks of data to and fro. All processes should see the same files so this common software is clearly part of the operating system. In a ‘layered’ model the filing system is quite a high-level O.S. layer. In a microkernel it may well be run (by the O.S.) in user mode.

[Figure: File system]

We can – to some extent – decouple the structure of a filing system, as seen by users and outlined here, from the implementation, which is the more ‘technical’ side.

File access

The O.S. provides system calls for access to files: basically operations for reading and writing files … without harmful interactions from different client actions.
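A minimal sketch of these operations, using Python's os module, whose open, write, read and close functions are thin wrappers around the corresponding operating system calls. The helper name and file name are illustrative, and the payload is assumed small enough that a single write and a single read suffice.

```python
import os
import tempfile


def write_then_read(path: str, data: bytes) -> bytes:
    """Write data through a file descriptor, then read it back through another."""
    # Open for writing, creating the file if needed and discarding old contents.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)

    # Open again, read-only, and fetch the same number of bytes.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(data))
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as workdir:
    result = write_then_read(os.path.join(workdir, "demo.bin"), b"block of data")
    print(result)
```

The application never touches the device itself; it only ever holds a descriptor issued by the operating system.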

File access is developed further in another article.

Overview

The simplest file-stores are ‘flat’: i.e. there is one place where all the files are kept. This soon becomes inconveniently crowded.

Another mechanism is to identify separate devices, such as “A:” or “C:”, a scheme which has evolved through generations of systems and is probably ‘familiar’ these days from Windows. In modern systems these may be virtual rather than physically separate devices.

Although these were satisfactory for small file-stores – maybe using interchangeable media – they don’t work well with millions of files.

A modern file-store is most likely to be hierarchical with a tree-like structure of arbitrary depth. Each tree has a single ‘root’ which branches repeatedly. Windows systems still retain separate “roots” for each (virtual) device though.

[Figure: Windows file trees]

Unix (on the other hand) file-store mounts devices in a single ‘tree’, so different disks (and other stuff) appear as branches.
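One way to see mounting from a program is to ask which directory is the mount point for a given path. The sketch below does this with os.path.ismount; the helper name is ours, and the answer naturally depends on how the machine running it is set up.

```python
import os


def mount_point_of(path: str) -> str:
    """Walk up from path until a directory that is a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


root = os.path.abspath(os.sep)  # "/" on Unix
print(root, "is a mount point:", os.path.ismount(root))
print("current directory lives under:", mount_point_of(os.getcwd()))
```

The search always terminates because the root of the tree is itself a mount point.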

[Figure: Unix file tree]

  • Branching points are directories – sometimes now called “folders” but we shall stick to “directories” which is more usual in the O.S. context.
    A directory can branch zero (empty) or more ways; there is no particular logical limit.
  • Terminal points on the tree are files, containing data.
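The branching structure described above can be explored with a short sketch that builds a small tree and then walks it. All directory and file names here are made up for illustration.

```python
import os
import tempfile


def build_example_tree(root: str) -> None:
    """Create a few nested directories and files under root."""
    os.makedirs(os.path.join(root, "docs", "drafts"))
    os.makedirs(os.path.join(root, "empty"))  # a directory that branches zero ways
    for relative in ("readme.txt", os.path.join("docs", "notes.txt")):
        with open(os.path.join(root, relative), "w") as f:
            f.write("data")


def count_nodes(root: str) -> tuple:
    """Return (directories, files) found below root."""
    directories = files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        directories += len(dirnames)  # branching points
        files += len(filenames)       # terminal points
    return directories, files


with tempfile.TemporaryDirectory() as root:
    build_example_tree(root)
    counts = count_nodes(root)
    print("directories, files:", counts)
```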

“Tree” is an expedient simplification for the moment. When links are considered, the file-store can look like a directed graph.
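A sketch of why links spoil the pure tree picture: below, one file is given a second name in a different directory using a hard link, so two branches lead to the same item. The names are hypothetical, and the example assumes a file system that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "a"))
    os.mkdir(os.path.join(root, "b"))
    first = os.path.join(root, "a", "report.txt")
    second = os.path.join(root, "b", "same_report.txt")

    with open(first, "w") as f:
        f.write("shared contents")

    # Give the existing file a second name in another directory.
    os.link(first, second)

    same_file = os.path.samefile(first, second)  # do both names reach one file?
    link_count = os.stat(first).st_nlink         # number of names for the file
    with open(second) as f:
        contents_via_second = f.read()

    print(same_file, link_count, contents_via_second)
```

With two parents for one item, the structure is a graph rather than a tree.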

In a Unix system, when file-store is distributed over a network, different machines may see the same files in a different structure. They may also have different properties: for example, we mount some systems as read-only on the student network which appear (possibly with a different path/name) as writeable on staff machines; this means that the permission settings need not be checked for every individual file.


A note on Unix directories

Every Unix directory contains at least two entries: “.”, which refers to the directory itself, and “..”, which refers to its parent. Thus the parent directory is explicitly specified.
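The two special entries can be checked with a short sketch, using a temporary directory and a made-up subdirectory name. Note that Python's os.listdir leaves “.” and “..” out of its results, so the check is made by resolving paths through them instead.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    child = os.path.join(root, "child")  # hypothetical directory name
    os.mkdir(child)

    # "child/." should be child itself; "child/.." should be its parent.
    dot_is_self = os.path.samefile(os.path.join(child, "."), child)
    dotdot_is_parent = os.path.samefile(os.path.join(child, ".."), root)

    print(dot_is_self, dotdot_is_parent)
```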

[Figure: Unix directories and pointers]

Note four is a symbolic link.

Try setting up a structure as in the figure and cd to directory two. Then try “ls four/..”.

What did you expect?