With all this in mind we are now ready to embark on the layout. I have based this on my own method developed when I got hold of 3 old SCSI disks and boggled over the possibilities.
The tables in the appendices are designed to simplify the mapping process. They are meant to help you work through the optimization process and to serve as a useful log in case of system repair. A few examples are also given.
Determine your needs and make a list of all the parts of the file system you want on separate partitions. Sort the list in descending order of speed requirement, and note how much space you want to give each partition.
The table in Appendix A is a useful tool for selecting which directories to put on different partitions. It is sorted in a logical order, with space for your own additions and for notes about mount points and additional systems. It is therefore NOT sorted in order of speed; instead, the speed requirements are indicated by bullets ('o').
If you plan to use RAID, make a note of the disks you want to use and which partitions you want to RAID. Remember that the various RAID solutions offer different speeds and degrees of reliability.
(To keep it simple, I will assume we have a set of identical SCSI disks and no RAID.)
Next we place the partitions onto the physical disks. The point of the following algorithm is to maximise parallelism and bus capacity. In this example the drives are A, B and C and the partitions are 987654321, where 9 is the partition with the highest speed requirement. Starting at one drive, we 'meander' the partition line back and forth over the drives like this:
A : 9 4 3
B : 8 5 2
C : 7 6 1
This keeps the 'sum of speed requirements' roughly equal across the drives.
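The meander can be written down as a small routine. The sketch below is a minimal Python illustration, assuming the partitions are listed from highest to lowest speed requirement and that all drives are equal; `meander` is a hypothetical name, not a tool from this HOWTO. It reverses direction on every other pass over the drives, which reproduces the A/B/C layout above.

```python
def meander(partitions, drives):
    """Deal partitions (highest speed requirement first) over the drives
    in serpentine order: A, B, C, C, B, A, A, B, C, ..."""
    layout = {drive: [] for drive in drives}
    n = len(drives)
    for i, part in enumerate(partitions):
        sweep, pos = divmod(i, n)
        # Even sweeps go A -> C, odd sweeps go back C -> A.
        idx = pos if sweep % 2 == 0 else n - 1 - pos
        layout[drives[idx]].append(part)
    return layout


if __name__ == "__main__":
    plan = meander([9, 8, 7, 6, 5, 4, 3, 2, 1], ["A", "B", "C"])
    for drive, parts in plan.items():
        print(drive, ":", *parts, "  sum =", sum(parts))
```

The sums come out as 16, 15 and 14. The meander is a quick heuristic rather than an optimum: 9+5+1, 8+4+3 and 7+6+2 would balance the sums exactly at 15 each, so some manual shuffling may still pay off.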
Use the table in Appendix B to select which drive to use for each partition in order to optimize for parallelism.
Note the speed characteristics of your drives and note each directory under the appropriate column. Be prepared to shuffle directories, partitions and drives around a few times before you are satisfied.
After that, it is recommended to select the partition numbering for each drive.
Use the table in Appendix C to select partition numbers that optimize for track characteristics. At the end of this you should have a table sorted by ascending partition number. Fill these numbers back into the tables in Appendices A and B.
You will find these tables useful when running the partitioning program (fdisk or cfdisk) and when doing the installation.
After this there are usually a few partitions that have to be 'shuffled' over the drives, either to make them fit or because of special considerations regarding speed, reliability, special file systems etc. Nevertheless, this gives what this author believes is a good starting point for the complete setup of the drives and the partitions. Since we have made so many assumptions, it is actual use that will in the end determine the real needs, and you should expect that at some point after the system goes into service a repartitioning will be beneficial.
For instance, if one of the 3 drives in the example above is very slow compared to the other two, a better plan would be as follows:
A : 9 6 5
B : 8 7 4
C : 3 2 1
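The adjusted plan can also be sketched in code. This is a minimal illustration under the same assumptions as before, plus one more: the slow drive is given a fixed number of the least demanding partitions (3 here, as in the plan above), and the meander runs over the fast drives only. `plan_with_slow_drive` and its `slow_count` parameter are hypothetical names.

```python
from itertools import cycle


def plan_with_slow_drive(partitions, fast_drives, slow_drive, slow_count):
    """Meander the most demanding partitions over the fast drives and give
    the `slow_count` least demanding ones to the slow drive."""
    ordered = sorted(partitions, reverse=True)  # highest speed requirement first
    cut = len(ordered) - slow_count
    layout = {drive: [] for drive in fast_drives}
    # Serpentine visiting order over the fast drives: A, B, B, A, A, B, ...
    order = cycle(list(fast_drives) + list(fast_drives)[::-1])
    for drive, part in zip(order, ordered[:cut]):
        layout[drive].append(part)
    layout[slow_drive] = ordered[cut:]
    return layout


if __name__ == "__main__":
    plan = plan_with_slow_drive(range(1, 10), ["A", "B"], "C", slow_count=3)
    for drive, parts in plan.items():
        print(drive, ":", *parts)
```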
Drives can often be similar in apparent overall speed, yet some advantage can still be gained by matching drives to the file size distribution and frequency of access. Binaries, for instance, suit drives with fast access times that offer command queueing, while libraries are better suited to drives with high transfer rates, where IDE offers good performance for the money.
Avoid drive contention by looking at tasks: for instance, if you are accessing /usr/local/bin, chances are you will soon also need files from /usr/local/lib, so placing these on separate drives allows less seeking, possible parallel operation and better use of drive caching. It is quite possible that choosing what may appear to be less than ideal drive characteristics will still be advantageous if you can gain parallel operations. Identify common tasks and the partitions they use, and try to keep these on separate physical drives.
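Once you have written down which directories each task touches and which drive each directory lives on, this check can be mechanised. The Python sketch below flags tasks whose directories share a drive; `contended_tasks`, the task list and the `sda`/`sdb`/`sdc` drive mapping are hypothetical examples, not recommendations.

```python
def contended_tasks(tasks, drive_of):
    """Return {task: {drive: [directories]}} for every task that uses
    two or more directories on the same physical drive."""
    clashes = {}
    for task, dirs in tasks.items():
        by_drive = {}
        for directory in dirs:
            by_drive.setdefault(drive_of[directory], []).append(directory)
        shared = {drive: ds for drive, ds in by_drive.items() if len(ds) > 1}
        if shared:
            clashes[task] = shared
    return clashes


if __name__ == "__main__":
    # Hypothetical example data.
    tasks = {
        "run local tools": ["/usr/local/bin", "/usr/local/lib"],
        "compile": ["/usr/src", "/usr/include", "/usr/lib"],
    }
    drive_of = {
        "/usr/local/bin": "sda",
        "/usr/local/lib": "sdb",
        "/usr/src": "sdc",
        "/usr/include": "sdb",
        "/usr/lib": "sdb",
    }
    for task, shared in contended_tasks(tasks, drive_of).items():
        for drive, dirs in shared.items():
            print(f"{task}: {', '.join(dirs)} all on {drive}")
```

With this data only the compile task is flagged, because /usr/include and /usr/lib both sit on sdb.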
Just to illustrate my point I will give a few examples of task analysis here.
such as editing, word processing and spreadsheets are typical examples of low intensity software, in terms of both CPU and disk use. However, if you have a single server for a huge number of users, do not forget that most such software has auto save facilities, which cause extra traffic, usually on the home directories. Splitting users over several drives would reduce contention.
readers also auto save to home directories, so ISPs should consider separating home directories.
News spools are notorious for their deeply nested directories and their large number of very small files. Losing a news spool partition is also not a big problem for most people, so news spools are good candidates for a RAID 0 setup with many small disks to distribute the many seeks among multiple spindles. The manuals and FAQs for the INN news server recommend putting the news spool and .overview files on separate drives for larger installations.
There is also a web page dedicated to INN optimisation that is well worth reading.
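To see why striping spreads the seeks, here is a minimal sketch of how a plain RAID 0 array maps a logical byte offset to a member disk. It assumes simple round-robin striping with a fixed chunk size; the 64 KiB chunk and 4 disks are arbitrary example values. It then simulates random small reads, like those a news server issues, and counts where they land.

```python
import random
from collections import Counter

CHUNK_SIZE = 64 * 1024  # example chunk (stripe unit) size in bytes
N_DISKS = 4             # example number of member disks


def raid0_locate(offset, chunk_size=CHUNK_SIZE, n_disks=N_DISKS):
    """Map a logical byte offset to (disk index, byte offset on that disk)
    for round-robin striping."""
    chunk, within = divmod(offset, chunk_size)
    disk = chunk % n_disks     # consecutive chunks go to consecutive disks
    stripe = chunk // n_disks  # number of full stripes before this chunk
    return disk, stripe * chunk_size + within


if __name__ == "__main__":
    random.seed(42)
    array_size = 1 << 30  # pretend 1 GiB array
    hits = Counter(raid0_locate(random.randrange(array_size))[0] for _ in range(10000))
    for disk in range(N_DISKS):
        print(f"disk {disk}: {hits[disk]} reads")
```

With random offsets the reads come out close to evenly split across the disks, so several seeks can proceed in parallel. Real RAID implementations add details this sketch ignores.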
applications can be demanding in terms of both drive usage and speed. The details are naturally application specific, so read the documentation carefully with disk requirements in mind. Also consider RAID for both performance and reliability.
reading and sending involve home directories as well as incoming and outgoing spool files. If possible, keep home directories and spool files on separate drives. If you run a mail server or a mail hub, consider putting the incoming and outgoing spool directories on separate drives.
Losing mail is an extremely bad thing if you are an ISP or major hub. Think about RAIDing your mail spool, and consider frequent backups.
can require a large number of directories for binaries, libraries, include files as well as source and project files. If possible, split these across separate drives. On small systems you can place /usr/src and project files on the same drive as the home directories.
is becoming more and more popular. Many browsers have a local cache which can grow to rather large volumes. As this cache is used when reloading pages or returning to the previous page, speed is quite important here. If, however, you are connected via a well configured proxy server, you typically do not need more than a few megabytes per user for a session. See also the sections on Home Directories and WWW.
When you get a box of 10 or so CD-ROMs with a Linux distribution and the entire contents of the big FTP sites, it can be tempting to install as much as your drives can take. Soon, however, you would find that this leaves little room to grow and that it is easy to bite off more than you can chew, at least in polite company. Therefore I will comment on a few points to keep in mind when you plan out your system. Comments here are actively sought.
Linux is simple, and you don't even need a hard disk to try it out: if you can get the boot floppies to work, you are likely to get it to work on your hardware. If the standard kernel does not work for you, do not forget that special boot disk versions are often available for unusual hardware combinations; these can solve your initial problems until you can compile your own kernel.
about operating systems is something Linux excels in: there is plenty of documentation and the source is available. A single drive with 50 MB is enough to get you started with a shell and a few of the most frequently used commands and utilities.
use or more serious learning requires more commands and utilities, but a single drive is still all it takes; 500 MB should give you plenty of room, also for sources and documentation.
software development or just serious hobby work requires even more space. At this stage you probably have a mail and news feed that requires spool files and plenty of space, and you have probably already gotten hold of a few drives too. Separate drives for various tasks will begin to show a benefit. Drive requirements get harder to estimate, but I would expect 2-4 GB to be plenty, even for a small server.
come in many flavours, ranging from mail servers to full sized ISP servers. A base of 2 GB for the main system should be sufficient; then add space, and perhaps also drives, for the separate features you will offer. Cost is the main limiting factor here, but be prepared to spend a bit if you wish to justify the "S" in ISP. Admittedly, not all do.
Big tasks require big drives and a separate section here. If possible, keep as much as possible on separate drives. Some of the appendices detail the setup of a small departmental server for 10-100 users. Here I will present a few considerations for higher end servers. In general you should not be afraid of using RAID, not only because it is fast and safe but also because it can make growth a little less painful. All the notes below come in addition to the points mentioned earlier.
Popular servers rarely just happen; rather, they grow over time, and this demands both generous amounts of disk space and a good net connection. In many of these cases it might be a good idea to reserve entire SCSI drives, singly or as arrays, for each task. This way you can move the data should the computer fail. Note that transferring drives across computers is not simple and might not always work, especially in the case of IDE drives. Drive arrays require careful setup in order to reconstruct the data correctly, so you might want to keep a paper copy of your fstab file as well as a note of SCSI IDs.
Estimate how many drives you will need. If this is more than 2, I would strongly recommend RAID. If not, you should spread users across the drives dedicated to users, based on some kind of simple hashing algorithm. For instance, you could use the first 2 letters of the user name, so jbloggs is put in /u/j/b/jbloggs, where /u/j is a symbolic link to a physical drive; this way you can get a balanced load on your drives.
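The hashing scheme is simple to express in code. This is a sketch assuming the two-letter scheme from the text; `home_path` and `assign_letters` are hypothetical helpers and the drive mount points are made-up names. To decide which drive each /u/<letter> link should point to, it uses a greedy "largest group to least loaded drive" heuristic, which is a swapped-in choice for illustration and not part of the scheme described above.

```python
import heapq
import os
from collections import Counter


def home_path(user, root="/u", depth=2):
    """Hashed home directory, e.g. jbloggs -> /u/j/b/jbloggs."""
    if len(user) < depth:
        raise ValueError(f"user name {user!r} is shorter than the hash depth")
    return os.path.join(root, *user[:depth], user)


def assign_letters(users, drives):
    """Choose a drive for each /u/<first letter> symlink.

    Greedy: the letter with the most users goes to the drive holding
    the fewest users so far."""
    counts = Counter(user[0] for user in users)
    heap = [(0, drive) for drive in drives]  # (users so far, drive)
    heapq.heapify(heap)
    mapping = {}
    for letter, n in counts.most_common():
        load, drive = heapq.heappop(heap)
        mapping[letter] = drive
        heapq.heappush(heap, (load + n, drive))
    return mapping


if __name__ == "__main__":
    users = ["jbloggs", "jsmith", "adent", "aperson", "zed"]
    print(home_path("jbloggs"))
    for letter, drive in sorted(assign_letters(users, ["/mnt/disk1", "/mnt/disk2"]).items()):
        # Each /u/<letter> would be a symlink to a directory on the chosen drive.
        print(f"/u/{letter} -> {drive}/{letter}")
```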
This is an essential service if you are serious about service. Good servers are well maintained, documented, kept up to date, and immensely popular no matter where in the world they are located. The big server ftp.funet.fi is an excellent example of this.
In general this is not a question of CPU but of network bandwidth. Size is hard to estimate; it is mainly a question of ambition and service attitudes. I believe the big archive at ftp.cdrom.com is a *BSD machine with a 50 GB disk. Memory is also important for a dedicated FTP server: about 256 MB RAM would be sufficient for a very big server, whereas smaller servers can get the job done well with 64 MB RAM. Network connections would still be the most important factor.
For many this is the main reason to get onto the Internet; in fact many now seem to equate the two. In addition to being network intensive, this also involves a fair bit of drive activity, mainly for the caches. Keeping the cache on a separate, fast drive would be beneficial. Even better would be installing a caching proxy server. This way you can reduce the cache size for each user and speed up the service, while at the same time cutting down on the bandwidth requirements.
With a caching proxy server you need a fast set of drives; RAID 0 would be ideal, as reliability is not important here. Higher capacity is better, but about 2 GB should be sufficient for most. Remember to match the cache period to the capacity and demand. Too long periods would, on the other hand, be a disadvantage; if possible, try to adjust the period based on the URL. For more information check out the most used servers such as Harvest, Squid and the one from Netscape.
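Adjusting the cache period by URL amounts to a list of pattern rules checked in order. The sketch below illustrates the idea in Python; `max_age`, the patterns and the ages are invented examples, not tuned values. Squid expresses the same kind of rule with its `refresh_pattern` directive, but this function is not Squid's actual freshness algorithm.

```python
import re

# Hypothetical rules: (URL pattern, maximum cache age in minutes).
# Rules are checked in order and the first match wins.
RULES = [
    (re.compile(r"\.(gif|jpe?g|png)$", re.IGNORECASE), 7 * 24 * 60),  # images change rarely
    (re.compile(r"/cgi-bin/|\?"), 0),                                   # dynamic pages: do not cache
    (re.compile(r"\.html?$", re.IGNORECASE), 24 * 60),                  # static pages: one day
]
DEFAULT_MAX_AGE = 60  # minutes, for anything not matched above


def max_age(url):
    """Return how many minutes a cached copy of `url` may be served."""
    for pattern, minutes in RULES:
        if pattern.search(url):
            return minutes
    return DEFAULT_MAX_AGE


if __name__ == "__main__":
    for url in ("http://example.com/logo.GIF",
                "http://example.com/cgi-bin/search?q=linux",
                "http://example.com/index.html",
                "http://example.com/"):
        print(f"{max_age(url):>6} min  {url}")
```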
Handling mail is something most machines do to some extent. The big mail servers, however, are in a class of their own. This is a demanding task, and a big server can be slow even when connected to fast drives and a good net feed. In the Linux world the big server at vger.rutgers.edu is a well known example. Unlike a news service, which is distributed and can partially reconstruct its spool using other machines as a feed, mail servers are centralised. This makes safety much more important, so for a major server you should consider a RAID solution with emphasis on reliability. Size is hard to estimate; it all depends on how many lists you run as well as how many subscribers you have.
This is definitely a high volume task, and very dependent on which newsgroups you subscribe to. On Nyx there is a fairly complete feed and the spool files consume about 17 GB. The biggest groups are no doubt in the alt.binaries.* hierarchy, so if for some reason you decide not to get these, you can provide a good service with perhaps 12 GB. Still others, who shall remain nameless, feel 2 GB is sufficient to claim ISP status. In this case news expires so fast that I feel the spelling IsP is barely justified. A full newsfeed means traffic of a few GB every day, and this number is ever growing.
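The link between spool size and how long articles survive is simple arithmetic: with a steady daily volume V and a spool of size S, articles can be kept for roughly S / V days. The sketch below assumes, purely for illustration, a feed of 3 GB per day as a stand-in for the "few GB every day" mentioned above; `retention_days` is a hypothetical helper.

```python
def retention_days(spool_gb, daily_gb):
    """Approximate days an article survives before expiry frees its space,
    assuming a steady daily volume and uniform expiry across groups."""
    if daily_gb <= 0:
        raise ValueError("daily volume must be positive")
    return spool_gb / daily_gb


if __name__ == "__main__":
    DAILY_GB = 3.0  # assumed full-feed volume, for illustration only
    for spool in (17, 2):
        days = retention_days(spool, DAILY_GB)
        print(f"{spool:>2} GB spool: about {days:.1f} days ({days * 24:.0f} hours)")
```

Under that assumption a 2 GB spool holds well under a day of a full feed.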
There are many services available on the net, and even though many have been put somewhat in the shadow of the web, services like archie, gopher and wais, just to name a few, still exist and remain valuable tools on the net. If you are serious about starting a major server you should also consider these services. Determining the required volumes is hard; it all depends on popularity and demand. Providing good service inevitably has its costs, and disk space is just one of them.
The dangers of splitting up everything into separate partitions are briefly mentioned in the section about volume management. Still, several people have asked me to emphasize this point more strongly: when one partition fills up it cannot grow any further, even if there is plenty of space in other partitions.
In particular, look out for explosive growth in the news spool (/var/spool/news). For multi user machines with quotas, keep an eye on /tmp and /var/tmp, as some people try to hide their files there; just look out for filenames ending in gif or jpeg...
In fact, for single physical drives this scheme offers very little gain at all, other than making file growth monitoring easier (using 'df') and physical track positioning. Most importantly, there is no scope for parallel disk access. A freely available volume management system would solve this, but this is still some time in the future. However, when more specialised file systems become available, even a single disk could benefit from being divided into several partitions.
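Watching partition growth, as you would with 'df', is easy to automate. The sketch below uses Python's `shutil.disk_usage` to compute a df-style usage percentage and report partitions above a threshold; `percent_used`, `over_threshold`, the 90% threshold and the list of watched paths are example choices.

```python
import os
import shutil


def percent_used(path):
    """Usage in percent, computed like df's Use% column: used / (used + available).

    shutil.disk_usage's `free` is the space available to ordinary users,
    so blocks reserved for root are left out, as df does (df also rounds up)."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    return 100.0 * usage.used / denominator if denominator else 0.0


def over_threshold(paths, threshold=90.0):
    """Return (path, percent) for each path whose file system is at least
    `threshold` percent full. Paths on the same partition report the same figure."""
    report = []
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold:
            report.append((path, round(pct, 1)))
    return report


if __name__ == "__main__":
    watched = ["/", "/tmp", "/var/tmp", "/var/spool/news"]
    existing = [p for p in watched if os.path.exists(p)]
    for path, pct in over_threshold(existing, threshold=90.0):
        print(f"warning: {path} is {pct}% full")
```

Run from cron, something like this gives early warning before a fixed partition fills up.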
One way to avoid the aforementioned pitfalls is to only set off fixed partitions for directories with a fairly well known size, such as swap, /tmp and /var/tmp, and to group the remainder into the remaining partitions using symbolic links.
Example: a slow disk (slowdisk), a fast disk (fastdisk) and an assortment of files. Having set up swap and tmp on fastdisk, and /home and root on slowdisk, we have the (fictitious) directories /a/slow, /a/fast, /b/slow and /b/fast left to allocate on the partitions /mnt.slowdisk and /mnt.fastdisk, which represent the remaining partitions of the two drives.
Putting /a or /b directly on either drive gives the same properties to both of its subdirectories. We could make all 4 directories separate partitions, but would lose some flexibility in managing the size of each directory. A better solution is to make these 4 directories symbolic links to appropriate directories on the respective drives.
Thus we make
/a/fast point to /mnt.fastdisk/a/fast or /mnt.fastdisk/a.fast
/a/slow point to /mnt.slowdisk/a/slow or /mnt.slowdisk/a.slow
/b/fast point to /mnt.fastdisk/b/fast or /mnt.fastdisk/b.fast
/b/slow point to /mnt.slowdisk/b/slow or /mnt.slowdisk/b.slow
and we get all fast directories on the fast drive without having to set up a partition for each of the 4 directories. The second (right hand) alternative gives us a flatter file system, which in this case can make it simpler to keep an overview of the structure.
The disadvantage is that this scheme is complicated to set up and plan in the first place, and that all mount points and partitions have to be defined before the system installation.
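The link layout from the example can be scripted. The sketch below creates the four links using the flatter right hand naming, under a scratch root directory so it can be tried without touching the real file system. On a real system the root would be '/', the /mnt.fastdisk and /mnt.slowdisk partitions would already be mounted, and you would run it as root. The directory names are the fictitious ones from the text, and `make_links` is a hypothetical helper.

```python
import os
import tempfile

# Link -> target, both relative to `root`; names are the fictitious ones
# from the example, using the flatter "a.fast" style of target.
LINKS = {
    "a/fast": "mnt.fastdisk/a.fast",
    "a/slow": "mnt.slowdisk/a.slow",
    "b/fast": "mnt.fastdisk/b.fast",
    "b/slow": "mnt.slowdisk/b.slow",
}


def make_links(root, links=LINKS):
    """Create each target directory and a symbolic link pointing to it."""
    for link, target in links.items():
        target_path = os.path.join(root, target)
        link_path = os.path.join(root, link)
        os.makedirs(target_path, exist_ok=True)                 # real directory on the drive
        os.makedirs(os.path.dirname(link_path), exist_ok=True)  # the /a or /b parent
        if not os.path.lexists(link_path):                      # keep reruns harmless
            os.symlink(target_path, link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        make_links(scratch)
        for link in sorted(LINKS):
            print(link, "->", os.readlink(os.path.join(scratch, link)))
```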