Administrators (and, if they log onto the servers, users) have to care about security of the system. Why? Maybe you are concerned about attackers breaking in, or maybe auditors or insurers require you to maintain a certain level of security. Make sure that everyone understands what the drivers are. This seems obvious but is surprisingly rare. Security staff tend to see security as intrinsically a good thing that needs no justification, so they forget to give one. But administrators tend to see security as intrinsically an annoyance which makes it more difficult for them to do their jobs, so they very much need one. Of course, telling all your staff where the security weak points are in detail may be counterproductive, but an overview that includes real-life examples of security breaches can get people interested in the topic.
Having understood why security is important to your organisation, the administration staff need to know what the rules are. There are a couple of things you can be pretty certain of: Your administrators will think that some of the security rules are pointless, and they will think that some of the rules are completely unworkable. They will probably be entirely correct in some of their concerns. A rule that seems entirely workable when devised often turns out to have a major flaw in real life. For example, a rule might ban NFS, but if the company has just invested heavily in an application that requires it, security planners are going to have to give some ground.
Listen to your administrators. If they say that a security rule isn't workable, don't ignore them. Find out their reasons. Ideally, everyone should end up in agreement (if grudging) that either the rule needs amending or there is a workable way to implement it. Not involving the administrators is just asking for the rules to be bypassed, and whilst it might seem easy to blame the staff for this, the security people are at least as much at fault. In some cases, making a rule workable might require having a vendor make software changes. If the vendor is one of the big boys and you're not a major customer, this may be unrealistic (but you can try). If the vendor is a smaller company or the software is open source, you should have a lot more luck getting changes put in, or at least getting a good explanation of why it can't be done that way with suggestions of alternatives.
Because of the lack of security updates, security rules that prevent unauthorised command line access are especially important, as they protect the server perimeter (not to be confused with the network perimeter). On the technical side, this means rules that prevent passwords going across accessible networks in the clear, and rules which prevent a server being tricked into a trusted relationship with another host. They cover protocols such as telnet, rsh, ftp, DNS, NIS, and NFS. They cover password rules: technical rules preventing weak passwords and enforcing password aging, for example. Most Unix/Linux flavours have other advanced account controls you can use as well, such as the ability to check passwords against a dictionary, prevent a password being reused within a time limit, and expire passwords after a period of time (normally one month).
Enforcing strong passwords is critical. Running a password cracker such as Crack or John the Ripper can be a real eye-opener: If you haven't trained your users in choosing good passwords, it's very likely that you'll crack at least a few accounts within a minute.
Ross Anderson at Cambridge University has done some research on different methods for choosing passwords. He looked at randomly chosen simple passwords to see how secure and how easy to remember (Post-it Note resistant) they were. He found one method that was easy to remember and difficult to crack. In this initial letter method, the password is made up of the initial letters of a phrase that is meaningful to the user, with some numbers and punctuation substituted for letters. For example, if the phrase was "Hit me baby one more time -- Britney Spears," the password might be *Mb1mtBs.
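The initial letter method is easy to sketch in code. The following is only an illustration of the idea, not a password generator to rely on: the function name and the substitution table are assumptions made for this example, and the substitutions are chosen to reproduce the password given above.

```python
def initial_letter_password(phrase, substitutions=None):
    """Build a password from the first letter of each word in a phrase.

    `substitutions` maps a lower-cased word to the character that should
    stand in for its initial letter, e.g. {"one": "1"}.
    """
    substitutions = substitutions or {}
    chars = []
    for word in phrase.split():
        # Strip surrounding punctuation so "time," still yields "t".
        core = word.strip(".,;:!?\"'-")
        if not core:
            continue  # skip tokens that were pure punctuation, such as "--"
        chars.append(substitutions.get(core.lower(), core[0]))
    return "".join(chars)


phrase = "Hit me baby one more time -- Britney Spears"
# Hypothetical substitutions picked by the user; they are what makes the
# result hard to guess even for someone who knows the phrase.
password = initial_letter_password(
    phrase, {"hit": "*", "me": "M", "one": "1", "spears": "s"}
)
print(password)  # *Mb1mtBs
```

In practice the phrase and the substitutions live in the user's head, not in a script; the code only makes the rule explicit.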
Encourage administrators to use the initial letter method of choosing passwords, and run regular cracking sessions to spot users who don't. A quiet notification that someone's password has been cracked, along with a reminder of how to choose a strong password and the reasons for doing so, should be enough to get the user to change his ways.
With all this security in place across a large server environment, the risk goes up that either the security or the functionality of the system will get broken by someone either making a mistake or changing something on purpose. For example, file permissions may be changed, or a configuration file might be incorrectly amended. There has to be some ongoing check for compliance, and an audit every two years won't be enough. There doesn't have to be anything too complex or expensive. A simple shell script that runs daily or weekly should be sufficient. It should check everything it can, report non-compliance, and, if at all possible, automatically correct problems to bring the server in line with the standards. A great deal of caution is appropriate when having a script make automatic corrections to a server, but the alternative -- having someone manually fix the problems -- probably will not stand the test of time.
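As a rough illustration of such a check, here is a minimal sketch in Python rather than shell. The table of paths and permitted modes is a hypothetical standard invented for this example, and automatic correction is off by default, in keeping with the caution above.

```python
import os
import stat

# Hypothetical standard: path -> permission bits the file is allowed to have.
EXPECTED_MODES = {
    "/etc/passwd": 0o644,
    "/etc/hosts": 0o644,
}


def check_permissions(expected, fix=False):
    """Return (path, actual_mode, allowed_mode) for each non-compliant file.

    A file is non-compliant if it has any permission bit set that the
    standard does not allow. With fix=True the extra bits are removed.
    """
    findings = []
    for path, allowed in expected.items():
        try:
            actual = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            continue  # this sketch ignores missing files
        if actual & ~allowed:
            findings.append((path, actual, allowed))
            if fix:
                # Clear only the bits that exceed the standard.
                os.chmod(path, actual & allowed)
    return findings
```

Run from cron, something like `check_permissions(EXPECTED_MODES)` would produce the report, and passing `fix=True` would also correct what it finds. A real script would cover far more than file modes.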
Having everyone protect their user accounts is important, but it may not be enough if there are accounts no one is looking after. These may be accounts of people who have left the organization, or they could be accounts installed by an application that has never been used. Of course you have a well-documented process for deleting accounts when someone leaves the company or moves departments, but processes are never perfect, so you should have a backup method. In a script (like the script to check ongoing compliance), check for dormant accounts that have never been used, or haven't been used for a few months. Be careful -- no one's going to thank you if the guy on call can't log onto a server at 3 a.m. because his account was automatically deleted the previous month. You can automatically delete accounts, but if you do, you have to be very confident that all your administrators and users log onto every server they need access to on a regular basis, and you need to exclude application accounts. One way of doing this is to have administrators manually run a script once a month that automatically logs onto every server they manage, just as a handshake to update the last login time (though you can use it to do other things as well, such as change passwords).
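The dormant-account check itself is simple once the last login times have been collected. The sketch below assumes that step has already been done, since how you gather the times differs between Unix flavours. The function name, the 90-day threshold, and the exclusion list are all assumptions for illustration. It only reports accounts; it does not delete anything.

```python
from datetime import datetime, timedelta


def find_dormant_accounts(last_logins, now, max_idle_days=90, exclude=()):
    """Return sorted names of accounts never used or idle for too long.

    last_logins maps account name -> datetime of the last login,
    or None if the account has never been logged into.
    """
    cutoff = now - timedelta(days=max_idle_days)
    dormant = []
    for account, last in last_logins.items():
        if account in exclude:
            continue  # application accounts are reviewed separately
        if last is None or last < cutoff:
            dormant.append(account)
    return sorted(dormant)


# Example data, invented for this sketch.
logins = {
    "alice": datetime(2008, 3, 1),
    "bob": datetime(2007, 11, 1),
    "carol": None,
    "oracle": None,
}
report = find_dormant_accounts(logins, datetime(2008, 3, 31), exclude={"oracle"})
print(report)  # ['bob', 'carol']
```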
Social engineering is an obvious way for an attacker to gain command-line access to a server. Many IT departments are large and geographically spread out, maybe even with teams in different countries. Some of your administrators may never have met each other. If a new person joins the team, some people may not find out for some time. These sorts of departments are easy targets for a social engineering attack. An attacker with some minimal knowledge of how things work in the organisation has a good chance of getting an account created or a password reset with a well-placed, convincing phone call.
It may sound odd, but make sure your administrators know who is on the team. It's not uncommon these days for companies to have administration teams spread across different locations; maybe even across countries or continents. When someone gets a phone call, he needs to know who is a legitimate administrator and who isn't. It's amazing how often new staff are introduced to the people in their own office, but no one else is even told they exist. (Even introductions within the office can be missed, until a few days later someone timidly asks who the guy in the corner is and what he's doing.) Following the same principle, the security team should be familiar to the administrators (and vice versa) with easy communication routes in both directions, whether it is just being contactable on the telephone or sending out a regular security newsletter. This all seems simple, but in modern companies it can be hard to figure out which department you're in yourself after the latest reorganisation, without worrying about who is in other departments.
Recognise that keeping the servers secure is an achievement to be proud of, and the people who manage it should be shown appreciation. Keep up the security training and awareness with boosters for existing staff as well as education for new people. Keep involving the administrators and listening to them. Consider having non-security staff going on external security training courses as part of their personal development.
Be aware of security fatigue. Time goes by and there are no (known) security breaches. People naturally start dropping their guard. What was all that security business about anyway? We don't have a security problem. Before you know it, you're back to square one. There are a few strategies to counter this. Easiest of all, keep an eye on security break-ins reported in the media and pass on details to staff. This at least keeps awareness high that the criminals are out there, even if they aren't in here right now. Repeat audits are important, but do the audit correctly. There is very little point in auditing the security documentation: most attackers won't check your security rules before breaking in, they'll just see what is actually implemented. Better to do spot checks on real servers, even if only a small number. Also, run some penetration tests. Give some external consultant access inside your perimeter and have him try to break into the computers. Involving external people is useful as they might spot problems you've missed, and they should be able to give you some insight as to how you compare with other organisations.
All being well, you end up with security that is good enough to protect the business whilst being workable for the administrators and acceptable for the owners and users. All being really well, you'll still have that security in place a year or two down the line.
Monday, March 31, 2008
Managing Unix security in large organisations, part 1
Managing security in large organizations can be a challenge. Here are some practical tips for keeping your organization sealed tight.
In large heterogeneous Unix/Linux environments with several hundred servers, keeping up to date with security patches, which are the number one requirement for strong security, is next to impossible. Pushing out patches across hundreds of servers, with a mix of different operating systems, kernels, and versions, is complex, time-consuming, risky, and very expensive. For most large companies a six-month refresh of security patches is the best you can hope for, and even that is optimistic. This means that your servers will have exploitable security bugs most of the time. The challenge is not to eliminate those bugs but to mitigate the risk they pose.
The vast majority of exploitable bugs require local access for an attacker to be able to use the exploit; he has to be logged onto the server. We tend to think that the main aim of Unix security is to guard root, but for most servers in large companies this doesn't work. A competent attacker with any local login can probably find an exploit to get root; so the real challenge is to prevent attackers from getting a local command line login.
This is a big problem because all of your administrators (and maybe users too) have an account. Any one of them can allow an attacker in. No one can remember 200 passwords, so if an attacker gets a login to one box, he's probably got a login to tens or hundreds.
The first way to mitigate this risk is to compartmentalise the users and administrators. You may be able to prevent some people having command line access. Maybe they can be locked into a menu or launched into a single command (e.g. to change their password). In some cases, a chrooted environment may be suitable for users.
One method is to have each of your administrators responsible for certain groups of servers. If Bob only has access to 30 servers and his password is compromised, at least you've limited the damage that can be done. Unfortunately, this works less well in practice, because it clashes with a key aim of most IT departments to save money by consolidating support personnel. If you have your administrators compartmentalised into five teams, that's five different on-call rotas. Worse than this, you risk splitting up the Unix expertise and limiting communication between teams, which reduces everyone's effectiveness. This is a problem we'll see again: the contention between security and efficiency.
There are ways that compartmentalising can have a chance of working, but they are not very satisfactory. For example, separate teams could perform day-to-day administration on groups of systems, but with only one on-call rota shared between the teams. You set up a special on-call user account which is maintained across the servers, store the passwords securely, and allow the on-call engineer to access them as needed. If a password is accessed, change it the next day. In this way, any administrator can access any server.
Notice that identification and authentication are separated from authorisation in this model. Identification and authentication occur not on the server but when the password is accessed, which means that there needs to be a reasonably strong authorisation method for password access. When people access the server with this protocol, they are merely confirming that they are someone who has gained access to the password, which in itself is not too impressive. Whatever method of holding and accessing passwords is used, it is important that only approved people can access the on-call password and that the identity of the person who does so is correctly recorded.
With this route there's still a basic problem on the efficiency side. The people on call are required to fix problems, at all hours of the night, on systems that they do not normally log onto and so have no familiarity with. In the unlikely event that the servers are either very well documented or have very standardised builds, this might work. More likely, every server has its quirks and having a problem worked on by someone unfamiliar with those could add substantially to the time it takes to fix problems and get the production system working again.
There are other methods of compartmentalising, but these mostly restrict who has access to root commands (e.g. with the wheel group, user roles, or sudo). Because of all those security holes, this is rather like locking the stable door after the horse has bolted. To be fair, it's not pointless - not every attacker will have root exploits up his sleeve - but it's certainly not sufficient.
One other compartmentalisation method which is worth implementing is to ensure that users do not have write access to others' home directories. By default they should not, but it's common for someone to allow world write access for convenience. This allows attackers who break into one account to take control of others too, even before they have root access.
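A check for this fits naturally into a regular compliance script. The following is a minimal sketch with a made-up function name. It flags world-writable home directories and can optionally flag group-writable ones too, which may or may not be a problem depending on how groups are used at your site.

```python
import os
import stat


def writable_homes(home_dirs, include_group=False):
    """Return the home directories that other users can write to."""
    mask = stat.S_IWOTH
    if include_group:
        mask |= stat.S_IWGRP
    flagged = []
    for path in home_dirs:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            continue  # the account has no home directory on this server
        if mode & mask:
            flagged.append(path)
    return flagged
```

On most Unix systems the list of home directories can be built with the standard `pwd` module, for example `[entry.pw_dir for entry in pwd.getpwall()]`, although system accounts will usually need filtering out first.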
Whether or not you can compartmentalise your administrators and users, there are other techniques to make it more difficult for attackers to gain command-line user access.
One option is to enforce access by secure shell with public/private key pairs and pass-phrases. This puts a significant barrier in the way of an attacker. Instead of just getting a password, he now has to get a private key and pass-phrase: something I have and something I know. Of course for this to work, password access must be blocked.
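Whether password access really is blocked can be checked automatically. As a rough sketch, assuming OpenSSH and its `PasswordAuthentication` keyword in `sshd_config`, the function below reports whether the global setting is `no`. It is deliberately simplistic: it stops at the first `Match` block, does not handle the `keyword=value` form, and ignores other settings (such as PAM or keyboard-interactive authentication) that can still allow passwords.

```python
def password_auth_disabled(sshd_config_text):
    """Return True if the global PasswordAuthentication setting is 'no'.

    sshd uses the first value it finds for a keyword, so the first
    occurrence decides. Keywords are case-insensitive.
    """
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # conditional blocks are outside the scope of this sketch
        if keyword == "passwordauthentication" and len(parts) == 2:
            return parts[1].strip() == "no"
    # No explicit setting: OpenSSH allows password authentication by default.
    return False
```

The text of the file would typically be read from `/etc/ssh/sshd_config`, though the location varies between systems.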
In large environments, public/private key access is tough to manage. First there's the problem of key distribution. If there are 20 administrators and 200 servers, that's 4,000 individual keys to push around onto servers. If someone changes his key, the new key has to be pushed out to all the servers again. On the client side, each administrator's private key must be on each client machine the administrator might use to access servers (or on a network directory mounted onto every client machine). This may include home PCs and laptops that are easily stealable.
There are other problems with using public key access. Enforcing good pass-phrases and key aging is difficult. Revoking keys is also difficult. If an administrator leaves, his account should be locked or deleted. But because he had root access, he could have hidden his public key in someone else's account to give himself a back door onto the system, or created a fake user account. To some extent this can be checked for. For example, scripts can flag up accounts with more than one key, but this isn't easy, and is certainly hard work.
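The "more than one key" check can be sketched as follows. It assumes the contents of each account's `authorized_keys` file have already been read into a dictionary. The function names are made up for this example, and finding every place a key might be hidden is the hard part that the sketch does not address.

```python
def count_keys(authorized_keys_text):
    """Count key entries, ignoring blank lines and comment lines."""
    return sum(
        1
        for line in authorized_keys_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def accounts_with_multiple_keys(key_files):
    """key_files maps account name -> contents of its authorized_keys file."""
    return sorted(
        name for name, text in key_files.items() if count_keys(text) > 1
    )


# Example data, invented for this sketch (the key material is truncated).
files = {
    "alice": "ssh-rsa AAAA alice@pc\n",
    "bob": "# keys\nssh-rsa AAAA bob@pc\n\nssh-rsa BBBB unknown\n",
    "carol": "",
}
print(accounts_with_multiple_keys(files))  # ['bob']
```

An account with several keys is not necessarily compromised, so the output is a list for someone to review, not a verdict.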
This is one situation where a public key infrastructure (PKI) is effective. Having a PKI allows keys to be revoked and can also overcome the key distribution problem. It doesn't make people choose strong pass-phrases though; and it is certainly non-trivial to implement.
One last thing to consider with the public key solution is the efficiency angle. A critical requirement for any solution is that administrators and users can get onto any server when they need to. As protocols get more complex, there are more things that can go wrong and prevent access, and a greater chance that the people using a product won't understand how it works and will break it, or compromise security by using it wrongly. For example, there are various file permissions which prevent secure shell working but do not cause telnet a problem. How confident are you that your solution will always allow the right people on while keeping the wrong people off? If you are not too confident, you may need an alternative method to access the servers.
Two possible server access methods that can be used in emergencies are TCP-wrappered telnet (so people can telnet to a server, but only from a specific other server within the same location, with the plain text password not leaving the computer room) and remote console access, again within a secure computer room.
All of these techniques are helpful in maintaining security.
Comparing Linux and AIX
Linux can learn valuable lessons from its elder cousins in the enterprise, the proprietary Unixes from the likes of IBM, Sun, and HP. Those operating systems, in turn, can learn some lessons from Linux. Comparing the features of the more enterprise-ready Linux distros with AIX, one of the leading proprietary Unixes, helps identify some of those lessons.
AIX was developed primarily for administrators, whereas Linux has been developed for and by hackers. Right from the start, a key goal of commercial Unixes is to make things easy for the people running them (though they don't always succeed). Only recently has this been a major factor in the Linux world. Some deficiencies can be fixed with improved tools, while others are more fundamental to the operating systems.
The benefit of proprietary hardware
AIX runs only on IBM's own hardware, based around the POWER family of processors, of which the POWER5 is the latest. (Apple's G5 chip is the baby brother of the POWER4.) Pretty much all the adapters and components that run in those servers are either made or rebadged by IBM. In the past IBM has almost given AIX away, making money from the hardware and services instead of the operating system software.
Using a single hardware architecture removes a big headache for AIX developers. There is no struggling to write device drivers for thousands of obscure devices, for a start. By controlling the hardware platform IBM can offer high-end hardware features such as hot-swap adapters and logical partitioning, not to mention servers where the firmware (equivalent of the BIOS) can be accessed through a Web browser when the server is powered off.
There is a significant price premium for this hardware, but there are great benefits too. CPU and memory are not all that matters (though IBM's latest model comes with up to 512GB of RAM, which should be enough for most people). Many companies are happy to pay more, or sacrifice speed, to improve reliability, availability, and serviceability. If an hour of downtime costs your business tens of thousands of dollars, this is a big deal.
Luckily, Linux is coming to have the best of both worlds. Those who want to take advantage of IBM's fancy hardware features can now run SUSE or Red Hat Linux on just about any server that IBM makes and, with logical partitioning, can even run Linux and AIX on the same server at the same time.
Device management
Linux has always been somewhat clumsy at device management. I often find myself trawling through dmesg and playing "guess the device" to figure out if some device is there and how it has been configured. Whether a particular piece of information about a device is available often seems a matter of luck. A variety of other commands with different syntaxes and outputs help to cobble together an overall picture of the hardware on a system.
AIX is a breath of fresh air in comparison. Devices can be queried easily through a few commands. The syntax for amending device settings is clear and consistent across all devices, and the amount of information available on each device is huge.
If new devices are added to a running system, a single command configures them all and installs device drivers where needed.
On my home PC, with a handful of disks and adapters, maybe I don't need the device information to be so easy to access and update. On an enterprise server with 150 PCI adapters and a few hundred disks, however, it becomes a lot more important to have good accurate information about exactly what and where everything is and what it is all doing.
Systems management
For new and experienced AIX administrators alike, AIX's Systems Management Interface Tool (SMIT) is a useful (and often essential) tool. Think of it as YaST2 with fewer sexy graphics but more functionality. About 80% of administration tasks on an AIX system can be done using SMIT. It's simple, easy to understand, mature, and it works. One nice feature is that it always saves the command or script it has run to a file, so you can do something once in SMIT and then script it thereafter. You can even say "don't do this for real, but log the command you would have run."
AIX also has a Web administration tool which, while slow (accessing via the bundled Windows or Linux PC client speeds it up) and occasionally buggy, is still a long way ahead of anything Linux has to offer. Want to set up ipsec? AIX has a nice wizard that makes it easy.
Linux is improving quickly with systems management, but some developers still seem to feel that if it isn't obscure and complicated, there's something wrong. That's fine for hackers, but companies want to employ administrators to run their systems, not hackers, and administrators like things to be easy, especially when they've got a few hundred systems to manage.
Installation and upgrades
Major OS upgrades are still a weak point for Linux. I've tried upgrades on a number of different Linux distros. Sometimes they work, sometimes they don't, and more often than not, I end up installing from scratch.
In comparison, AIX very rarely has a problem with upgrades, even when jumping several versions. I go into an AIX upgrade confident that it will work, and I go into a Linux upgrade with a feeling that it's 50/50.
For new installations, the picture is more balanced. AIX has few problems with new installs. If Linux has a problem, it's normally with some odd hardware -- not a problem AIX has to deal with, of course. Where AIX falls down is the lack of installation options. Only in the latest version of AIX has it been possible to specify a graphics-free installation, and the ability to choose packages at installation time is very limited.
AIX includes the Network Installation Manager (NIM), which can perform new installations, upgrades, software installation, and a number of other tasks across the network. It is easy to set up (via command line, menu, or wizard) and it works well. Similar tools exist for Linux, but right now they lack some of the functionality.
Security
The proprietary Unixes have traditionally fallen down a little on security, and AIX is no exception. From a commercial perspective it makes sense to not alienate your users, so usability has always taken precedence over security. The last thing IBM or Sun wants is businesses performing upgrades that stop their applications working correctly.
The result of this corporate caution is that a fresh install of AIX has gaping security holes. Services such as telnet, ftp, and rshd are enabled by default. Secure Shell (SSH) and TCP Wrappers aren't even installed (IBM ships both, but on a separate CD). AIX does come with some basic packet filtering, but there's no firewall on by default and it isn't easy to configure. Filesystem and swap space encryption aren't there either.
Compare this to Linux, where SSH is the default, most insecure services are disabled, a wealth of security software is shipped with almost every distro, and much effort has been put into helping users secure their systems.
AIX can be configured securely. IBM has a nice white paper that guides you through a lot of the tasks, but it isn't trivial to do, and the result is that a lot of companies don't, and tools like telnet are still a lot more common than they should be.
Managing disks and filesystems
Disk and filesystem management is an area where AIX is still well ahead of Linux. AIX doesn't have partitions or slices -- it has a logical volume manager instead. Logical volumes and volume groups are fundamentals on AIX, not add-ons.
To show how this can help, let's look at some of the things that can be done on AIX while the system is running normally, all using software.
Data can be mirrored and unmirrored online between any two disks. Want to mirror data between a local SCSI disk and a NAS-attached iSCSI disk of different sizes? No problem. A mirror copy can be broken off to create a "point-in-time" backup of how the system looked at that moment, then re-integrated later on.
Whole filesystems can be moved between disks, or spread out over different disks, all while users carry on oblivious. How about setting up a group of disks and making one a spare, so if another fails the spare automatically takes over, the data being copied over to it? That's simple too.
The OS can even be upgraded on a running system. You can create a copy of the OS disks, upgrade the copy, and then reboot from the upgraded disk; if it doesn't work, just switch back to the old one.
All of these, and more, come standard with the AIX operating system and can be done from simple command lines and menus. By contrast, even something like software mirroring on Linux is complicated in comparison to the one-line AIX commands.
Workload management
A perpetual problem with high-end computers is that they have too much computing capacity. Both Sun and IBM believe that their servers are often no more than 20% utilised. Luckily, all the major vendors have come up with solutions to help customers make effective use of the computing power they've spent so much money on.
Logical partitioning is the flavour of the day, with the ability to split servers up and have multiple instances running. Sun have extended this with their N1 Grid Containers, effectively an advanced chrooted environment with multiple instances running under the same OS environment.
IBM have achieved a similar result in a slightly different way. With IBM's latest hardware, what looks like a separate computer can run on as little as one-tenth of a CPU, meaning that twenty instances of AIX can run on a dual-processor server. Even better, they can share Ethernet adapters and disks, so there is no need to have hundreds of adapters (though you can if you want to). You can even have your partitions talk to each other over the network adapter, using a virtual switch (with VLAN functionality) held in the server firmware. These partitions do not sit on top of an underlying OS; they run directly on the server.
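The partition count quoted above follows directly from the minimum CPU share. A small sketch of the arithmetic, using only the figures from the paragraph (the variable names are illustrative):

```shell
#!/bin/sh
# Figures taken from the paragraph above; variable names are illustrative.
cpus=2                 # dual-processor server
min_share_tenths=1     # smallest partition: 1/10 of a CPU

# Work in tenths of a CPU so the shell's integer arithmetic stays exact.
max_partitions=$(( cpus * 10 / min_share_tenths ))
echo "At most $max_partitions partitions on $cpus CPUs"
```

This is an upper bound on partition count from CPU share alone; memory and I/O requirements would lower it in practice.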
Most of these functions are available for SUSE and Red Hat Linux on the POWER5 platform too, for those with generous hardware budgets.
Conclusion
Linux has come a long way in the last few years, but for high-end functionality and maturity, the likes of AIX and other high-end Unixes still have a significant edge. When it comes to security, though, Linux is ahead of the game, so the catching up is on the other side.
AIX was developed primarily for administrators, whereas Linux has been developed for and by hackers. Right from the start, a key goal of commercial Unixes has been to make things easy for the people running them (though they don't always succeed). Only recently has this been a major factor in the Linux world. Some deficiencies can be fixed with improved tools, while others are more fundamental to the operating systems.
The benefit of proprietary hardware
AIX runs only on IBM's own hardware, based around the POWER family of processors, of which the POWER5 is the latest. (Apple's G5 chip is the baby brother of the POWER4.) Pretty much all the adapters and components that run in those servers are either made or rebadged by IBM. In the past IBM has almost given AIX away, making money from the hardware and services instead of the operating system software.
Using a single hardware architecture removes a big headache for AIX developers. There is no struggling to write device drivers for thousands of obscure devices, for a start. By controlling the hardware platform IBM can offer high-end hardware features such as hot-swap adapters and logical partitioning, not to mention servers where the firmware (equivalent of the BIOS) can be accessed through a Web browser when the server is powered off.
There is a significant price premium for this hardware, but there are great benefits too. CPU and memory are not all that matters (though IBM's latest model comes with up to 512GB of RAM, which should be enough for most people). Many companies are happy to pay more, or sacrifice speed, to improve reliability, availability, and serviceability. If an hour of downtime costs your business tens of thousands of dollars, this is a big deal.
Luckily, Linux is coming to have the best of both worlds. Those who want to take advantage of IBM's fancy hardware features can now run SUSE or Red Hat Linux on just about any server that IBM makes and, with logical partitioning, can even run Linux and AIX on the same server at the same time.
Device management
Linux has always been somewhat clumsy at device management. I often find myself trawling through dmesg and playing "guess the device" to figure out if some device is there and how it has been configured. Whether a particular piece of information about a device is available often seems a matter of luck. A variety of other commands with different syntaxes and outputs help to cobble together an overall picture of the hardware on a system.
AIX is a breath of fresh air in comparison. Devices can be queried easily through a few commands. The syntax for amending device settings is clear and consistent across all devices, and the amount of information available on each device is huge.
If new devices are added to a running system, a single command configures them all and installs device drivers where needed.
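As a rough illustration of the kind of commands involved, the sketch below wraps a few standard AIX device commands (`lsdev`, `lsattr`, `cfgmgr`) in a function. The function name `aix_device_report` and the default device name `hdisk0` are assumptions made for the example, and the function deliberately refuses to run on anything other than AIX.

```shell
#!/bin/sh
# Hypothetical wrapper; the name and the hdisk0 default are assumptions.
aix_device_report() {
    dev="${1:-hdisk0}"
    if [ "$(uname -s)" != "AIX" ]; then
        echo "not an AIX system; nothing to query" >&2
        return 1
    fi
    lsdev -Cc disk      # list the configured devices in the "disk" class
    lsattr -El "$dev"   # show the effective attributes of one device
}

# After adding hardware to a running AIX system, running cfgmgr (as root)
# detects and configures the new devices:
#   cfgmgr
```

Attribute changes follow the same shape for every device type, in the form `chdev -l <device> -a <attribute>=<value>`, which is the consistency the text refers to.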
On my home PC, with a handful of disks and adapters, maybe I don't need the device information to be so easy to access and update. On an enterprise server with 150 PCI adapters and a few hundred disks, however, it becomes a lot more important to have good accurate information about exactly what and where everything is and what it is all doing.
AIX Affinity With Linux
Linux Background:
The Linux operating system has gained popularity through its close connection with Internet computing and e-business. The operating system has gained a large share of this business because of the high number of applications that have been developed.
Linux’s initial attraction was that it was a “free” operating system, meaning that the source code was made available without charge. While the lack of cost was an initial appeal, the real appeal is proving to be the applications that have been either developed for or ported to Linux. Examples include Sendmail, the Apache web server, and Samba (an NT file and print server emulator).
Every successful operating system has had a breakthrough application; for Linux it was (and still is) the Apache Web Server. Apache is by far the most widely used HTTP server on the Internet.
Further spurring the growth of Linux is the availability of the GNU tools. GNU is an open source project that has developed a series of tools, from compilers to text editors. These tools have been ported to Linux and are the tools of choice for many developers of Linux applications.
Why AIX Affinity with Linux?
The value is in the data and the applications. Developing applications is a costly and time-consuming process. If a company needed to move from a low-end Intel-based system to a high-performing IBM eServer pSeries or IBM RS/6000® system, it usually had to develop all new applications.
The first issue was how to help companies that currently use Linux-based applications and need a mission-critical system to move easily to AIX 5L. The answer is to offer a set of integrated APIs and header files that allow a Linux application to be recompiled to run on AIX 5L. AIX Version 4.3.3 and AIX 5L Version 5.0 already have many of the APIs needed to run Linux applications; with AIX 5L Version 5.1, there will be an even greater degree of compatibility between AIX and Linux.
The second issue is that applications are in a constant state of development, either through enhancements or through bug fixes. Thus it is important that these companies be able to work on their applications using familiar tools. The answer was to port key components of the GNU tool set, along with other open source tools, to AIX 5L.
GNU tools allow customers to work on existing applications, as well as develop new applications, using tools that they are familiar with. GNU tools are also the tools needed to recompile Linux applications to run on AIX 5L and AIX 4.3.3. This issue is addressed by the AIX Toolbox for Linux Applications, which provides GNU tools that have been recompiled for AIX as well as many other useful open source tools and utilities.
AIX 4.3.3 and AIX 5L Version 5.0 already have affinity with Linux. Thus, you can benefit today from AIX affinity with Linux, with additional source compatibility available in AIX 5L Version 5.1.
When to use:
When considering how best to use AIX Affinity with Linux, it is important to consider the impact on performance. AIX Affinity with Linux is designed to provide the best performance possible; however, a couple of issues outside its control can influence performance.
The Linux application being deployed on AIX will have full access to all AIX functionality, just like an application natively developed for AIX. AIX currently has a high level of compatibility with Linux, and with AIX 5L Version 5.1, IBM plans to provide an even greater affinity between AIX and Linux. Thus, a Linux application does not need to run through any additional layer or wrapper to take advantage of AIX.
The question of performance is therefore not whether the recompiled Linux application can take advantage of AIX and the IBM POWER architecture (and in the future the Intel Itanium architecture), but how well the compiler used to build the application performs. Most applications that have been developed natively for AIX use the IBM Visual Age compiler, while applications developed natively for Linux use the GNU compilers. Thus, you can expect to see a performance advantage for AIX applications that have been built using the IBM Visual Age compiler. At this time the IBM Visual Age compiler is not available for Linux applications.
The Application Programming Interface (API) method that AIX uses provides a higher degree of integration between the application and the operating system than can be achieved using a layered or wrapper approach, such as an Application Binary Interface (ABI) approach.
When considering where to use AIX Affinity with Linux, it is important to consider which applications you will be using for the front end and the back end. Many back-end applications, such as databases, are available on AIX. If the back-end application you are using is currently available natively on AIX, you should consider using that application rather than porting the Linux version to AIX. Another consideration is which applications in your portfolio are not performance sensitive and do not have heavy computational requirements, since these are the ones that would gain least from the IBM Visual Age compiler.
One example of how to use AIX Affinity with Linux technology is front-end applications, that is, applications that communicate with a back-end application. Front-end applications typically have few or no areas where the choice of compiler would make a significant performance difference.
Thus, a company that develops its front-end applications on Linux can deploy them across IBM's range of AIX- and Linux-enabled servers, whether on native Linux or on AIX.
For back-end applications where performance is key, it is best to deploy an application that was developed for AIX. Most of these applications will have been developed utilizing the high performance IBM Visual Age compilers. However, there is nothing to preclude a back-end application from being developed on Linux and deployed on AIX. The performance difference will depend upon the application, and may be negligible.
Sunday, March 30, 2008
perl - Practical Extraction and Report Language
Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal).
Perl combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.) Expression syntax corresponds closely to C expression syntax. Unlike most Unix utilities, Perl does not arbitrarily limit the size of your data--if you've got the memory, Perl can slurp in your whole file as a single string. Recursion is of unlimited depth. And the tables used by hashes (sometimes called "associative arrays") grow as necessary to prevent degraded performance. Perl can use sophisticated pattern matching techniques to scan large amounts of data quickly. Although optimized for scanning text, Perl can also deal with binary data, and can make dbm files look like hashes. Setuid Perl scripts are safer than C programs through a dataflow tracing mechanism that prevents many stupid security holes.
If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then Perl may be for you. There are also translators to turn your sed and awk scripts into Perl scripts.
SYNOPSIS
perl [ -sTtuUWX ] [ -hv ] [ -V[:configvar] ] [ -cw ] [ -d[t][:debugger] ] [ -D[number/list] ] [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal/hexadecimal] ] [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ] [ -f ] [ -C [number/list] ] [ -P ] [ -S ] [ -x[dir] ] [ -i[extension] ] [ -e 'command' ] [ -- ] [ programfile ] [ argument ]...
If you're new to Perl, you should start with perlintro, which is a general intro for beginners and provides some background to help you navigate the rest of Perl's extensive documentation.
For ease of access, the Perl manual has been split up into several sections.
Overview
perl Perl overview (this section)
perlintro Perl introduction for beginners
perltoc Perl documentation table of contents
Tutorials
perlreftut Perl references short introduction
perldsc Perl data structures intro
perllol Perl data structures: arrays of arrays
perlrequick Perl regular expressions quick start
perlretut Perl regular expressions tutorial
perlboot Perl OO tutorial for beginners
perltoot Perl OO tutorial, part 1
perltooc Perl OO tutorial, part 2
perlbot Perl OO tricks and examples
perlstyle Perl style guide
perlcheat Perl cheat sheet
perltrap Perl traps for the unwary
perldebtut Perl debugging tutorial
perlfaq Perl frequently asked questions
perlfaq1 General Questions About Perl
perlfaq2 Obtaining and Learning about Perl
perlfaq3 Programming Tools
perlfaq4 Data Manipulation
perlfaq5 Files and Formats
perlfaq6 Regexes
perlfaq7 Perl Language Issues
perlfaq8 System Interaction
perlfaq9 Networking
Reference Manual
perlsyn Perl syntax
perldata Perl data structures
perlop Perl operators and precedence
perlsub Perl subroutines
perlfunc Perl built-in functions
perlopentut Perl open() tutorial
perlpacktut Perl pack() and unpack() tutorial
perlpod Perl plain old documentation
perlpodspec Perl plain old documentation format specification
perlrun Perl execution and options
perldiag Perl diagnostic messages
perllexwarn Perl warnings and their control
perldebug Perl debugging
perlvar Perl predefined variables
perlre Perl regular expressions, the rest of the story
perlrebackslash Perl regular expression backslash sequences
perlrecharclass Perl regular expression character classes
perlreref Perl regular expressions quick reference
perlref Perl references, the rest of the story
perlform Perl formats
perlobj Perl objects
perltie Perl objects hidden behind simple variables
perldbmfilter Perl DBM filters
perlipc Perl interprocess communication
perlfork Perl fork() information
perlnumber Perl number semantics
perlthrtut Perl threads tutorial
perlothrtut Old Perl threads tutorial
perlport Perl portability guide
perllocale Perl locale support
perluniintro Perl Unicode introduction
perlunicode Perl Unicode support
perlunifaq Perl Unicode FAQ
perlunitut Perl Unicode tutorial
perlebcdic Considerations for running Perl on EBCDIC platforms
perlsec Perl security
perlmod Perl modules: how they work
perlmodlib Perl modules: how to write and use
perlmodstyle Perl modules: how to write modules with style
perlmodinstall Perl modules: how to install from CPAN
perlnewmod Perl modules: preparing a new module for distribution
perlpragma Perl modules: writing a user pragma
perlutil utilities packaged with the Perl distribution
perlcompile Perl compiler suite intro
perlfilter Perl source filters
perlglossary Perl Glossary
Internals and C Language Interface
perlembed Perl ways to embed perl in your C or C++ application
perldebguts Perl debugging guts and tips
perlxstut Perl XS tutorial
perlxs Perl XS application programming interface
perlclib Internal replacements for standard C library functions
perlguts Perl internal functions for those doing extensions
perlcall Perl calling conventions from C
perlreapi Perl regular expression plugin interface
perlreguts Perl regular expression engine internals
perlapi Perl API listing (autogenerated)
perlintern Perl internal functions (autogenerated)
perliol C API for Perl's implementation of IO in Layers
perlapio Perl internal IO abstraction interface
perlhack Perl hackers guide
Miscellaneous
perlbook Perl book information
perlcommunity Perl community information
perltodo Perl things to do
perldoc Look up Perl documentation in Pod format
perlhist Perl history records
perldelta Perl changes since previous version
perl595delta Perl changes in version 5.9.5
perl594delta Perl changes in version 5.9.4
perl593delta Perl changes in version 5.9.3
perl592delta Perl changes in version 5.9.2
perl591delta Perl changes in version 5.9.1
perl590delta Perl changes in version 5.9.0
perl588delta Perl changes in version 5.8.8
perl587delta Perl changes in version 5.8.7
perl586delta Perl changes in version 5.8.6
perl585delta Perl changes in version 5.8.5
perl584delta Perl changes in version 5.8.4
perl583delta Perl changes in version 5.8.3
perl582delta Perl changes in version 5.8.2
perl581delta Perl changes in version 5.8.1
perl58delta Perl changes in version 5.8.0
perl573delta Perl changes in version 5.7.3
perl572delta Perl changes in version 5.7.2
perl571delta Perl changes in version 5.7.1
perl570delta Perl changes in version 5.7.0
perl561delta Perl changes in version 5.6.1
perl56delta Perl changes in version 5.6
perl5005delta Perl changes in version 5.005
perl5004delta Perl changes in version 5.004
perlartistic Perl Artistic License
perlgpl GNU General Public License
Language-Specific
perlcn Perl for Simplified Chinese (in EUC-CN)
perljp Perl for Japanese (in EUC-JP)
perlko Perl for Korean (in EUC-KR)
perltw Perl for Traditional Chinese (in Big5)
Platform-Specific
perlaix Perl notes for AIX
perlamiga Perl notes for AmigaOS
perlapollo Perl notes for Apollo DomainOS
perlbeos Perl notes for BeOS
perlbs2000 Perl notes for POSIX-BC BS2000
perlce Perl notes for WinCE
perlcygwin Perl notes for Cygwin
perldgux Perl notes for DG/UX
perldos Perl notes for DOS
perlepoc Perl notes for EPOC
perlfreebsd Perl notes for FreeBSD
perlhpux Perl notes for HP-UX
perlhurd Perl notes for Hurd
perlirix Perl notes for Irix
perllinux Perl notes for Linux
perlmachten Perl notes for Power MachTen
perlmacos Perl notes for Mac OS (Classic)
perlmacosx Perl notes for Mac OS X
perlmint Perl notes for MiNT
perlmpeix Perl notes for MPE/iX
perlnetware Perl notes for NetWare
perlopenbsd Perl notes for OpenBSD
perlos2 Perl notes for OS/2
perlos390 Perl notes for OS/390
perlos400 Perl notes for OS/400
perlplan9 Perl notes for Plan 9
perlqnx Perl notes for QNX
perlriscos Perl notes for RISC OS
perlsolaris Perl notes for Solaris
perlsymbian Perl notes for Symbian
perltru64 Perl notes for Tru64
perluts Perl notes for UTS
perlvmesa Perl notes for VM/ESA
perlvms Perl notes for VMS
perlvos Perl notes for Stratus VOS
perlwin32 Perl notes for Windows
By default, the manpages listed above are installed in the /usr/local/man/ directory.
Extensive additional documentation for Perl modules is available. The default configuration for perl will place this additional documentation in the /usr/local/lib/perl5/man directory (or else in the man subdirectory of the Perl library directory). Some of this additional documentation is distributed standard with Perl, but you'll also find documentation for third-party modules there.
You should be able to view Perl's documentation with your man(1) program by including the proper directories in the appropriate start-up files, or in the MANPATH environment variable. To find out where the configuration has installed the manpages, type:
perl -V:man.dir
If the directories have a common stem, such as /usr/local/man/man1 and /usr/local/man/man3, you need only to add that stem (/usr/local/man) to your man(1) configuration files or your MANPATH environment variable. If they do not share a stem, you'll have to add both stems.
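The stem rule above can be sketched as a small shell function. This is only an illustration: the name man_stems is made up here, and it assumes that the stem is simply the parent directory of each manpage directory.

```shell
# man_stems MAN1DIR MAN3DIR
# Hypothetical helper: prints the stem(s) to add to MANPATH.
# Assumes the stem is the parent directory of each manpage directory.
man_stems() {
    stem1=$(dirname "$1")
    stem3=$(dirname "$2")
    if [ "$stem1" = "$stem3" ]; then
        # Common stem: one entry is enough.
        printf '%s\n' "$stem1"
    else
        # No common stem: both are needed, colon-separated as in MANPATH.
        printf '%s:%s\n' "$stem1" "$stem3"
    fi
}

man_stems /usr/local/man/man1 /usr/local/man/man3
# prints /usr/local/man
```

On a system with perl installed, eval "$(perl -V:man.dir)" should set the shell variables man1dir and man3dir, which can then be passed to the function before the result is prepended to MANPATH.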
If that doesn't work for some reason, you can still use the supplied perldoc script to view module information. You might also look into getting a replacement man program.
If something strange has gone wrong with your program and you're not sure where you should look for help, try the -w switch first. It will often point out exactly where the trouble is.
BUGS
Perl is at the mercy of your machine's definitions of various operations such as type casting, atof(), and floating-point output with sprintf().
If your stdio requires a seek or eof between reads and writes on a particular stream, so does Perl. (This doesn't apply to sysread() and syswrite().)
While none of the built-in data types have any arbitrary size limits (apart from memory size), there are still a few arbitrary limits: a given variable name may not be longer than 251 characters. Line numbers displayed by diagnostics are internally stored as short integers, so they are limited to a maximum of 65535 (higher numbers usually being affected by wraparound).
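To make the wraparound concrete, the sketch below computes what a 16-bit unsigned counter would hold for a given real line number. It assumes plain wraparound modulo 65536, which is an assumption for illustration and not a statement about what any particular diagnostic will print; the function name wrapped_line is hypothetical.

```shell
# wrapped_line N
# Hypothetical illustration: the value a 16-bit unsigned counter would
# hold for real line number N (assumes wraparound modulo 65536).
wrapped_line() {
    echo $(( $1 % 65536 ))
}

wrapped_line 65535   # prints 65535 (still representable)
wrapped_line 70000   # prints 4464  (70000 - 65536)
```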
You may mail your bug reports (be sure to include full configuration information as output by the myconfig program in the perl source tree, or by perl -V) to perlbug@perl.org. If you've succeeded in compiling perl, the perlbug script in the utils/ subdirectory can be used to help mail in a bug report.
Perl combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.) Expression syntax corresponds closely to C expression syntax. Unlike most Unix utilities, Perl does not arbitrarily limit the size of your data--if you've got the memory, Perl can slurp in your whole file as a single string. Recursion is of unlimited depth. And the tables used by hashes (sometimes called "associative arrays") grow as necessary to prevent degraded performance. Perl can use sophisticated pattern matching techniques to scan large amounts of data quickly. Although optimized for scanning text, Perl can also deal with binary data, and can make dbm files look like hashes. Setuid Perl scripts are safer than C programs through a dataflow tracing mechanism that prevents many stupid security holes.
If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then Perl may be for you. There are also translators to turn your sed and awk scripts into Perl scripts.
SYNOPSIS
perl [ -sTtuUWX ] [ -hv ] [ -V[:configvar] ] [ -cw ] [ -d[t][:debugger] ] [ -D[number/list] ] [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal/hexadecimal] ] [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ] [ -f ] [ -C [number/list] ] [ -P ] [ -S ] [ -x[dir] ] [ -i[extension] ] [ -e 'command' ] [ -- ] [ programfile ] [ argument ]...
If you're new to Perl, you should start with perlintro, which is a general intro for beginners and provides some background to help you navigate the rest of Perl's extensive documentation.
For ease of access, the Perl manual has been split up into several sections.
Overview
perl Perl overview (this section)
perlintro Perl introduction for beginners
perltoc Perl documentation table of contents
Tutorials
perlreftut Perl references short introduction
perldsc Perl data structures intro
perllol Perl data structures: arrays of arrays
perlrequick Perl regular expressions quick start
perlretut Perl regular expressions tutorial
perlboot Perl OO tutorial for beginners
perltoot Perl OO tutorial, part 1
perltooc Perl OO tutorial, part 2
perlbot Perl OO tricks and examples
perlstyle Perl style guide
perlcheat Perl cheat sheet
perltrap Perl traps for the unwary
perldebtut Perl debugging tutorial
perlfaq Perl frequently asked questions
perlfaq1 General Questions About Perl
perlfaq2 Obtaining and Learning about Perl
perlfaq3 Programming Tools
perlfaq4 Data Manipulation
perlfaq5 Files and Formats
perlfaq6 Regexes
perlfaq7 Perl Language Issues
perlfaq8 System Interaction
perlfaq9 Networking
Reference Manual
perlsyn Perl syntax
perldata Perl data structures
perlop Perl operators and precedence
perlsub Perl subroutines
perlfunc Perl built-in functions
perlopentut Perl open() tutorial
perlpacktut Perl pack() and unpack() tutorial
Installation of PHP on Unix systems
There are several ways to install PHP for the Unix platform, either with a compile and configure process, or through various pre-packaged methods. This documentation mainly focuses on the process of compiling and configuring PHP. Many Unix-like systems have some sort of package installation system. This can assist in setting up a standard configuration, but if you need a different set of features (such as a secure server, or a different database driver), you may need to build PHP and/or your web server. If you are unfamiliar with building and compiling your own software, it is worth checking whether somebody has already built a packaged version of PHP with the features you need.
Prerequisite knowledge and software for compiling:
Basic Unix skills (being able to operate "make" and a C compiler)
An ANSI C compiler
flex: Version 2.5.4
bison: Version 1.28 (preferred), 1.35, or 1.75
A web server
Any module specific components (such as gd, pdf libs, etc.)
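Before starting a build, the presence of the tools listed above can be checked with a short shell sketch like the one below. The function name missing_tools is made up for this illustration, and the command names passed to it are assumptions about how the prerequisites are installed on a given system. The sketch only checks that the commands exist; it does not check the flex or bison versions.

```shell
# missing_tools NAME...
# Hypothetical helper: prints each named command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed command names; the C compiler may be cc, gcc, or another name.
missing_tools make cc flex bison
```

If nothing is printed, all of the named commands were found.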
The initial PHP setup and configuration process is controlled by the command line options of the configure script. You can get a list of all available options, along with short explanations, by running ./configure --help. Our manual documents the different options separately. You will find the core options in the appendix, while the extension-specific options are described on the reference pages.
When PHP is configured, you are ready to build the module and/or executables. The command make should take care of this. If it fails and you can't figure out why, see the Problems section.
Apache 1.3.x on Unix systems
This section contains notes and hints specific to Apache installs of PHP on Unix platforms. We also have instructions and notes for Apache 2 on a separate page.
You can select arguments to add to the configure on line 10 below from the list of core configure options and from extension specific options described at the respective places in the manual. The version numbers have been omitted here, to ensure the instructions are not incorrect. You will need to replace the 'xxx' here with the correct values from your files.
Example#1 Installation Instructions (Apache Shared Module Version) for PHP
1. gunzip apache_xxx.tar.gz
2. tar -xvf apache_xxx.tar
3. gunzip php-xxx.tar.gz
4. tar -xvf php-xxx.tar
5. cd apache_xxx
6. ./configure --prefix=/www --enable-module=so
7. make
8. make install
9. cd ../php-xxx
10. Now, configure your PHP. This is where you customize your PHP
with various options, like which extensions will be enabled. Do a
./configure --help for a list of available options. In our example
we'll do a simple configure with Apache 1 and MySQL support. Your
path to apxs may differ from our example.
./configure --with-mysql --with-apxs=/www/bin/apxs
11. make
12. make install
If you decide to change your configure options after installation,
you only need to repeat the last three steps. You only need to
restart apache for the new module to take effect. A recompile of
Apache is not needed.
Note that unless told otherwise, 'make install' will also install PEAR,
various PHP tools such as phpize, install the PHP CLI, and more.
13. Set up your php.ini file:
cp php.ini-dist /usr/local/lib/php.ini
You may edit your .ini file to set PHP options. If you prefer your
php.ini in another location, use --with-config-file-path=/some/path in
step 10.
If you instead choose php.ini-recommended, be certain to read the list
of changes within, as they affect how PHP behaves.
14. Edit your httpd.conf to load the PHP module. The path on the right hand
side of the LoadModule statement must point to the path of the PHP
module on your system. The make install from above may have already
added this for you, but be sure to check.
For PHP 4:
LoadModule php4_module libexec/libphp4.so
For PHP 5:
LoadModule php5_module libexec/libphp5.so
15. And in the AddModule section of httpd.conf, somewhere under the
ClearModuleList, add this:
For PHP 4:
AddModule mod_php4.c
For PHP 5:
AddModule mod_php5.c
16. Tell Apache to parse certain extensions as PHP. For example,
let's have Apache parse the .php extension as PHP. You could
have any extension(s) parse as PHP by simply adding more, with
each separated by a space. We'll add .phtml to demonstrate.
AddType application/x-httpd-php .php .phtml
It's also common to set up the .phps extension to show highlighted PHP
source. This can be done with:
AddType application/x-httpd-php-source .phps
17. Use your normal procedure for starting the Apache server. (You must
stop and restart the server, not just cause the server to reload by
using a HUP or USR1 signal.)
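Since make install may or may not have edited httpd.conf for you, a quick check of steps 14 to 16 can save a restart cycle. The following is a minimal sketch for the PHP 5 directives only; the function name check_php_conf is hypothetical, and the patterns simply mirror the example lines above.

```shell
# check_php_conf FILE
# Hypothetical helper: prints the name of each PHP 5 directive from
# steps 14 to 16 that FILE lacks, and returns non-zero if any is missing.
# Lines commented out with # do not count as present.
check_php_conf() {
    conf=$1
    status=0
    for directive in \
        'LoadModule[[:space:]]+php5_module' \
        'AddModule[[:space:]]+mod_php5\.c' \
        'AddType[[:space:]]+application/x-httpd-php[[:space:]]'
    do
        if ! grep -Eq "^[[:space:]]*$directive" "$conf"; then
            # Print only the directive name (text before the first bracket).
            echo "missing: ${directive%%\[*}"
            status=1
        fi
    done
    return "$status"
}

# Example call; the path is an assumption based on --prefix=/www in step 6:
#   check_php_conf /www/conf/httpd.conf
```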
Alternatively, to install PHP as a static object:
Example#2 Installation Instructions (Static Module Installation for Apache) for PHP
1. gunzip -c apache_1.3.x.tar.gz | tar xf -
2. cd apache_1.3.x
3. ./configure
4. cd ..
5. gunzip -c php-5.x.y.tar.gz | tar xf -
6. cd php-5.x.y
7. ./configure --with-mysql --with-apache=../apache_1.3.x
8. make
9. make install
10. cd ../apache_1.3.x
11. ./configure --prefix=/www --activate-module=src/modules/php5/libphp5.a
(The above line is correct! Yes, we know libphp5.a does not exist at this
stage. It isn't supposed to. It will be created.)
12. make
(You should now have an httpd binary which you can copy to your Apache bin dir.
If this is your first install, you need to "make install" as well.)
13. cd ../php-5.x.y
14. cp php.ini-dist /usr/local/lib/php.ini
15. You can edit /usr/local/lib/php.ini file to set PHP options.
Edit your httpd.conf or srm.conf file and add:
AddType application/x-httpd-php .php
Note: Replace php-5 by php-4 and php5 by php4 in PHP 4.
Depending on your Apache install and Unix variant, there are many possible ways to stop and restart the server. Below are some typical lines used in restarting the server, for different apache/unix installations. You should replace /path/to/ with the path to these applications on your systems.
Example#3 Example commands for restarting Apache
1. Several Linux and SysV variants:
/etc/rc.d/init.d/httpd restart
2. Using apachectl scripts:
/path/to/apachectl stop
/path/to/apachectl start
3. httpdctl and httpsdctl (Using OpenSSL), similar to apachectl:
/path/to/httpsdctl stop
/path/to/httpsdctl start
4. Using mod_ssl, or another SSL server, you may want to manually
stop and start:
/path/to/apachectl stop
/path/to/apachectl startssl
The locations of the apachectl and http(s)dctl binaries often vary. If your system has locate or whereis or which commands, these can assist you in finding your server control programs.
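As a sketch of that search, the function below tries a list of candidate names and prints the first one found on PATH. It uses the POSIX command -v lookup in place of which; the function name first_on_path is made up for this illustration.

```shell
# first_on_path NAME...
# Hypothetical helper: prints the path of the first NAME found on PATH,
# or returns non-zero if none of them is found.
first_on_path() {
    for name in "$@"; do
        if path=$(command -v "$name" 2>/dev/null); then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

first_on_path apachectl httpdctl httpsdctl ||
    echo "no control script on PATH; try locate or whereis instead"
```

This only finds programs in directories listed in PATH. Server control scripts are often installed elsewhere, in which case locate or whereis is the better tool.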
Different examples of compiling PHP for apache are as follows:
./configure --with-apxs --with-pgsql
This will create a libphp5.so (or libphp4.so in PHP 4) shared library that is loaded into Apache using a LoadModule line in Apache's httpd.conf file. The PostgreSQL support is embedded into this library.
./configure --with-apxs --with-pgsql=shared
This will create a libphp5.so (or libphp4.so in PHP 4) shared library for Apache, but it will also create a pgsql.so shared library that is loaded into PHP either by using the extension directive in the php.ini file or by loading it explicitly in a script using the dl() function.
./configure --with-apache=/path/to/apache_source --with-pgsql
This will create a libmodphp5.a library, a mod_php5.c and some accompanying files, and copy them into the src/modules/php5 directory in the Apache source tree. Then you compile Apache using --activate-module=src/modules/php5/libphp5.a and the Apache build system will create libphp5.a and link it statically into the httpd binary (replace php5 by php4 in PHP 4). The PostgreSQL support is included directly into this httpd binary, so the final result here is a single httpd binary that includes all of Apache and all of PHP.
./configure --with-apache=/path/to/apache_source --with-pgsql=shared
Same as before, except instead of including PostgreSQL support directly into the final httpd you will get a pgsql.so shared library that you can load into PHP from either the php.ini file or directly using dl().
When choosing to build PHP in different ways, you should consider the advantages and drawbacks of each method. Building as a shared object will mean that you can compile apache separately, and don't have to recompile everything as you add to, or change, PHP. Building PHP into apache (static method) means that PHP will load and run faster.
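To see which of the two methods an existing server was built with, you can inspect the list of compiled-in modules that httpd -l prints. The sketch below reads that list on standard input; the function name php_link_mode is hypothetical, and it assumes a statically linked PHP appears as mod_php4.c or mod_php5.c, the names used in the AddModule step above.

```shell
# php_link_mode
# Hypothetical helper: reads the output of `httpd -l` on standard input.
# Prints "static" if mod_php4.c or mod_php5.c is listed, otherwise
# "not compiled in" (PHP may still be loaded as a shared module).
php_link_mode() {
    if grep -Eq 'mod_php[45]\.c'; then
        echo "static"
    else
        echo "not compiled in"
    fi
}

# Sample input; this module list is invented for illustration.
printf '%s\n' 'Compiled-in modules:' '  http_core.c' '  mod_so.c' '  mod_php5.c' |
    php_link_mode
# prints static
```

Against a real server this would be run as /path/to/httpd -l | php_link_mode, with /path/to/ replaced by the location of your httpd binary.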
PHP
What is PHP?
PHP is a widely-used general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.
Simple answer, but what does that mean? An example:
Example#1 An introductory example
<html><head><title>Example</title></head><body>
<?php
    echo "Hi, I'm a PHP script!";
?>
</body></html>
Notice how this is different from a script written in other languages like Perl or C -- instead of writing a program with lots of commands to output HTML, you write an HTML script with some embedded code to do something (in this case, output some text). The PHP code is enclosed in special start and end tags that allow you to jump into and out of "PHP mode".
What distinguishes PHP from something like client-side JavaScript is that the code is executed on the server. If you were to have a script similar to the above on your server, the client would receive the results of running that script, with no way of determining what the underlying code may be. You can even configure your web server to process all your HTML files with PHP, and then there's really no way that users can tell what you have up your sleeve.
The best thing about PHP is that it is extremely simple for a newcomer, yet offers many advanced features for a professional programmer. Don't be afraid of reading the long list of PHP's features. You can jump in quickly and start writing simple scripts within a few hours.
Installing SquirrelMail on Unix and Linux systems
This chapter covers the installation of SquirrelMail on a generic Unix or Linux system. It does not cover the installation of the operating system or of the tools required to install a web server or PHP.
Any version numbers used in the examples are specific to the time when this documentation was written. If current version numbers differ, make sure that you are not using old, obsolete or vulnerable software.
This guide uses the UW IMAP server as an example. This IMAP server can be used in a generic email setup where incoming mail is stored in the /var/spool/mail directory. If you are planning to use webmail with a large number of users or with large mailboxes, consider using a different IMAP server and redesigning the entire email system.
Download required software
You will need:
Apache - http://httpd.apache.org/download.cgi
PHP - http://php.net/downloads.php
UW IMAP - http://www.washington.edu/imap/
SquirrelMail - http://squirrelmail.org/download.php
# install -d /usr/local/src/downloads
# cd /usr/local/src/downloads
# wget http://some-apache-mirror-server/apache/httpd/httpd-2.0.54.tar.gz
# wget http://some-php-mirror-server/get/php-4.3.11.tar.bz2/from/this/mirror
# wget ftp://ftp.cac.washington.edu/mail/imap.tar.Z
# wget http://some-sourceforge-mirror/some-path/squirrelmail-1.4.5.tar.bz2
Unpack and install apache
# cd /usr/local/src
# tar -xzvf /usr/local/src/downloads/httpd-2.0.54.tar.gz
# cd httpd-2.0.54
# ./configure --prefix=/usr/local/apache --enable-so
# make
# make install
Unpack and install php
# cd /usr/local/src
# tar --bzip2 -xvf /usr/local/src/downloads/php-4.3.11.tar.bz2
# cd php-4.3.11
# ./configure --prefix=/usr/local/php \
> --with-apxs2=/usr/local/apache/bin/apxs
# make
# make install
If you configure the PHP build with the --disable-all option, you must also add the --enable-session and --with-pcre-regex options.
Add PHP support to Apache
AddType application/x-httpd-php .php
Restart Apache and check that PHP is working
# /usr/local/apache/bin/apachectl graceful
Unpack and install imap server
Unpack UW IMAP archive
# cd /usr/local/src
# tar -xzvf /usr/local/src/downloads/imap.tar.Z
Compile UW IMAP
# cd /usr/local/src/imap-
# make port-name EXTRADRIVERS='' SSLTYPE=unix
Replace port-name with the name that matches your system; check the Makefile for possible values. If you haven't installed the OpenSSL libraries and headers, use SSLTYPE=none instead of SSLTYPE=unix.
Install IMAP server binary
# strip imapd/imapd
# install -d /usr/local/libexec/
# cp imapd/imapd /usr/local/libexec/
Enable IMAP server in inetd.conf
imap2 stream tcp nowait root /usr/sbin/tcpd /usr/local/libexec/imapd
Restart inetd
Prepare SquirrelMail directories
# mkdir /usr/local/squirrelmail
# cd /usr/local/squirrelmail
# mkdir data temp
# chgrp nogroup data temp
# chmod 0730 data temp
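The effect of mode 0730 can be checked on scratch directories without touching the real installation. The sketch below is only an illustration: the mode_of helper and the scratch location are assumptions made for this example, not part of SquirrelMail. With this mode the owner has full access, members of the group (nogroup in the commands above, presumably the group the web server runs in) can create and open files by name but cannot list the directory, and everyone else has no access.

```shell
# Show the permission string (for example "drwx-wx---") of a path.
# mode_of is a hypothetical helper written for this illustration.
mode_of() {
    ls -ld "$1" | cut -c1-10
}

# Work in a scratch directory so that no root privileges are needed.
demo_root=$(mktemp -d)
mkdir "$demo_root/data" "$demo_root/temp"
chmod 0730 "$demo_root/data" "$demo_root/temp"

# 7 = rwx for the owner, 3 = -wx for the group, 0 = --- for everyone else.
echo "data: $(mode_of "$demo_root/data")"
echo "temp: $(mode_of "$demo_root/temp")"

# Remove the scratch directories again.
rm -rf "$demo_root"
```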
Unpack SquirrelMail
# cd /usr/local/squirrelmail
# tar --bzip2 -xvf /usr/local/src/downloads/squirrelmail-1.4.5.tar.bz2
# mv squirrelmail-1.4.5 www
Configure SquirrelMail
Start the SquirrelMail configuration utility. Configure SquirrelMail with the UW preset. Set the data and attachment directories.
Configure access to SquirrelMail in Apache
Modify httpd.conf
Alias /squirrelmail /usr/local/squirrelmail/www
<Directory /usr/local/squirrelmail/www>
Options Indexes
AllowOverride none
DirectoryIndex index.php
Order allow,deny
allow from all
</Directory>
Log into SquirrelMail
After you add the SquirrelMail alias to the Apache configuration and restart Apache, you should be able to access SquirrelMail by going to http://your-server/squirrelmail
SquirrelMail
SquirrelMail is a standards-based webmail package written in PHP. It includes built-in pure PHP support for the IMAP and SMTP protocols, and all pages render in pure HTML 4.0 (with no JavaScript required) for maximum compatibility across browsers. It has few requirements and is easy to configure and install. SquirrelMail has all the functionality you would want from an email client, including strong MIME support, address books, and folder manipulation.
This manual supports SquirrelMail 1.4.0 and up. The 1.2.x series has been obsoleted, and is only referenced in the upgrading notes of this manual.
There are only two requirements for SquirrelMail:
A web server with PHP installed. PHP needs to be at least 4.1.0.
Access to an IMAP server which supports IMAP 4 rev 1.
It doesn't really matter what OS or web server you use, as long as the combination thereof supports PHP in a stable way. Read the instructions and suggestions in the PHP documentation to see what they recommend.
If you're building your mail system from scratch, it is a good idea to install and test all components one by one. If you install everything at once and things don't work, troubleshooting will be more complex. If the web server doesn't work, there's not much point in trying to install PHP, for instance. Make sure that everything is working before trying to install SquirrelMail.
Choosing an IMAP server
You don't actually have to run an IMAP server yourself, but you need to be able to connect to one for SquirrelMail to work. Since IMAP is an open standard, all IMAP products should be able to communicate with each other. SquirrelMail requires that the server supports IMAP 4 rev 1, but that's the only requirement there is.
Some IMAP servers support various extensions, which are developed as a complement to IMAP. Those extensions aren't required by SquirrelMail, but many of them are supported. It's recommended to have an IMAP server that supports SORT and THREAD if possible. The SORT extension allows for server-side sorting, which is a lot more efficient than having to rely on PHP for sorting, and so improves SquirrelMail's performance. If the server doesn't support the THREAD extension, SquirrelMail can't show mail conversations as threads.
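A server announces its extensions in its reply to the IMAP CAPABILITY command, so one way to check for SORT and THREAD is to look for them in that reply. The following is a minimal sketch under stated assumptions: the has_capability function is a hypothetical helper, and the sample reply is made up for illustration; a real reply has to be obtained from your own server.

```shell
# Succeed if the CAPABILITY reply in $1 advertises the capability named in $2.
# THREAD is advertised as THREAD=<algorithm>, so a "NAME=" prefix also counts.
# has_capability is a hypothetical helper, not part of SquirrelMail.
has_capability() {
    # Drop a trailing carriage return and compare case-insensitively.
    response=$(printf '%s' "$1" | tr -d '\r' | tr '[:lower:]' '[:upper:]')
    wanted=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    case " $response " in
        *" $wanted "*|*" $wanted="*) return 0 ;;
    esac
    return 1
}

# Illustrative reply line; a real one comes from the server's answer
# to the CAPABILITY command.
sample='* CAPABILITY IMAP4rev1 SORT THREAD=REFERENCES THREAD=ORDEREDSUBJECT'

for cap in SORT THREAD; do
    if has_capability "$sample" "$cap"; then
        echo "$cap: advertised"
    else
        echo "$cap: not advertised"
    fi
done
```

Matching on whole space-separated tokens avoids false hits on capabilities that merely contain the name as a substring.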
If possible, the IMAP server should support Unicode. Without it some translations might be unable to use sorting and threading. Courier IMAP must be compiled with the --enable-unicode option to have Unicode support.
SquirrelMail doesn't care how the server stores mail, but it's generally a good idea not to have an IMAP server that stores mail in the mailbox (mbox) format. Mailbox performance is low when there are many mails in the same folder, and the format doesn't allow both mails and subfolders in the same folder.
Another good idea is to have an IMAP server that allows the use of virtual accounts. Virtual users don't have to be system users, which usually is a good thing. Again, this is not a SquirrelMail requirement, but something that you might want to consider when choosing an IMAP server.
Some systems are delivered with an IMAP server, but if it doesn't measure up to the suggestions above, you might want to replace it. There are plenty of IMAP servers on the market, so it might be difficult to decide which one to choose. It is also difficult to recommend something, since every organization has unique demands. The IMAP Connection has a searchable database of IMAP servers, as well as more information about IMAP, but that list may not cover the entire market. There are also several sites offering advice and opinions on this matter. Read them, but make your own decision, since the information at some of those sites might be outdated or biased. Remember that some of the open source alternatives are mature products that can compete with, and even surpass, the commercial servers.
Configuring PHP
SquirrelMail works without the PHP gettext extension, but performance suffers.
The PHP mbstring extension is required for translations that use multibyte character sets or any character set other than ISO-8859-1. Without the PHP mbstring extension the interface will remain usable, but some internationalization features and fixes won't be enabled. It's a must if you want to read and write Japanese emails, and users who wish to do that must also set their language option to Japanese.
The PHP XML extension is required if DIGEST-MD5 authentication is used.
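A quick way to see which of these three extensions are missing is to compare their names with the module list that `php -m` prints. The sketch below assumes the PHP command-line binary is installed; missing_extensions is a hypothetical helper and the demonstration input is made up. Keep in mind that the command-line binary may be built with different extensions than the PHP used by the web server.

```shell
# Print the extensions mentioned above that are absent from a module list
# read on standard input (one module name per line, as `php -m` prints).
# missing_extensions is a hypothetical helper, not part of SquirrelMail.
missing_extensions() {
    modules=$(cat)
    missing=''
    for ext in gettext mbstring xml; do
        # -x matches whole lines only, so "xml" does not match "libxml".
        if ! printf '%s\n' "$modules" | grep -qix "$ext"; then
            missing="$missing $ext"
        fi
    done
    printf '%s\n' "${missing# }"
}

# Usage on a live system (assumes the php command-line binary exists):
#   php -m | missing_extensions
# Demonstration with a made-up module list:
printf '%s\n' '[PHP Modules]' 'mbstring' 'pcre' 'session' | missing_extensions
```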
Optional server programs
Perl. SquirrelMail is shipped with some Perl scripts. One of the most useful is config/conf.pl, which will help you configure your SquirrelMail installation.
An SQL database supported by the PEAR DB library, and the PEAR DB library itself. See Using database backends for more information.
Aspell or Ispell to be able to use the SquirrelSpell plugin.
None of these is required, since SquirrelMail will function without them, but they add to the experience, so you might want to consider them.
Directory layout in SquirrelMail
SquirrelMail files are split into subdirectories according to file type and provided functions.
squirrelmail/
    class/
    config/
    contrib/
    data/
    doc/
    functions/
        decode/
        encode/
    help/
    images/
    locale/
    plugins/
    po/
    src/
    templates/
    themes/
        css/
The class directory stores various classes used with MIME messages, email delivery, localization and other interface functions.
The config directory stores the SquirrelMail configuration files and the configuration utility. The conf.pl script is a Perl-based utility used to manage the SquirrelMail configuration. The config_default.php file stores default configuration values. The config.php file stores the current configuration. The config_local.php file can store local site configuration overrides and configuration options that are not supported by the configuration utility. The default_pref file stores default user preferences that are used when a new user logs in for the first time; before SquirrelMail 1.5.1 it was stored in the data directory.
The contrib directory stores files that provide extra features to the SquirrelMail package, but are not used directly in the webmail interface.
The data directory is the default location for SquirrelMail users' preference files. You should move this directory outside the web tree or make sure that it can't be accessed by external users. Since SquirrelMail 1.5.1 this directory is no longer packaged.
The doc directory stores some documentation about SquirrelMail.
The functions directory stores SquirrelMail function files. The decode subdirectory stores charset decoding functions that are used to read emails encoded in different charsets. The encode subdirectory stores charset encoding functions that are used to convert emails to the charset used in the interface when a user replies to or forwards an email written in a different charset.
The help directory stores SquirrelMail help files. Information from these files is displayed when a user clicks the Help link in the SquirrelMail menu line. Help files use XML formatting and can be translated into different languages.
The images directory stores various image files that can be used in the interface.
The locale directory stores SquirrelMail translations. A user can select their preferred translation in SquirrelMail Display Options.
The plugins directory stores plugins that can be used to extend SquirrelMail functionality. Activation of plugins is controlled through the SquirrelMail configuration utility. Some plugins might also use their own configuration files or functions provided by other plugins. See the README and INSTALL files in each plugin's directory.
The po directory stores scripts that are used to work with SquirrelMail translation files. The xgetpo script extracts translatable strings from SquirrelMail scripts. The mergepo script combines default strings with a selected translation. The compilepo script compiles a selected translation. These scripts are usually used only by SquirrelMail translators.
The src directory stores scripts that are used when a user accesses the webmail interface.
The templates directory stores template files that can be used in SquirrelMail 1.5.1 and later versions.
The themes directory stores SquirrelMail colour themes, and the css subdirectory stores style sheet files available to the end user.
User data storage
SquirrelMail stores users' preferences and address books in simple text files. The location of these files is set with the data directory setting in the SquirrelMail configuration. SquirrelMail can also use a database or some other storage facility (if the required backend is provided by a plugin) for managing user preferences.
Users' preferences are stored in .pref files. Address books are stored in .abook files. .sig and .si files store users' signatures. Some plugins might use other files to store users' data.
When the number of files in the data directory becomes somewhat large, directory access time can be affected. In such cases, the administrator can split preference files into subdirectories by enabling directory hashing in the SquirrelMail configuration.
Configuration utility
SquirrelMail can be configured with conf.pl, a Perl script that is stored in the config/ directory. You can start it by running the configure script in the SquirrelMail base directory or by running the conf.pl script in the config directory.
# cd /path/to/squirrelmail
# cd config
# ./conf.pl
This configuration utility provides menu-based configuration options:
SquirrelMail Configuration : Read: config_default.php (1.4.0)
---------------------------------------------------------
Main Menu --
1. Organization Preferences
2. Server Settings
3. Folder Defaults
4. General Options
5. Themes
6. Address Books
7. Message of the Day (MOTD)
8. Plugins
9. Database
10. Languages
D. Set pre-defined settings for specific IMAP servers
C Turn color on
S Save data
Q Quit
Command >>
The menu is controlled by entering the numbers or letters that are listed on the left side.
The address book format
By default, SquirrelMail stores address books in files, one per address book, named [user account name].abook. These address book files are kept in the data directory. Address books can also be stored in a database or, if the required functions are provided by a plugin, another storage facility. SquirrelMail can also be configured to look up addresses in LDAP directories, if the PHP installation contains LDAP support.
An address book file contains five fields, which are delimited by the vertical line (|): the first field stores nicknames, short names that are used to identify address book entries; the second field stores names; the third field stores surnames; the fourth field stores mail addresses; and the fifth field stores additional information.
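Because the format is a plain |-delimited text file, it can be read with standard shell tools. The following is a minimal sketch, assuming that no field contains a literal vertical line; list_addresses is a hypothetical helper and the entries are made up for illustration.

```shell
# Print "nickname <address>" for every entry in an address book file.
# Field order: nickname|name|surname|mail address|additional information
# list_addresses is a hypothetical helper, not part of SquirrelMail.
list_addresses() {
    while IFS='|' read -r nickname name surname email info; do
        # Skip blank lines.
        [ -n "$nickname" ] || continue
        printf '%s <%s>\n' "$nickname" "$email"
    done < "$1"
}

# Demonstration with a made-up address book in a scratch file.
abook=$(mktemp)
cat > "$abook" <<'EOF'
jdoe|John|Doe|john.doe@example.org|Colleague
asmith|Anna|Smith|anna@example.net|
EOF
list_addresses "$abook"
rm -f "$abook"
```

Setting IFS only for the read command keeps the field splitting local to that one call, and the last variable absorbs whatever remains of the line.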
Additional address book fields and functions can be provided by the experimental vcard address book format and some address book plugins. You can find a list of address book plugins at the SquirrelMail site.
squint
squint is a Squid proxy log analyzer that generates a detailed report of who is spending the most time and resources browsing the Internet. The top offenders in terms of data transfer, number of files transferred, and on-line time are reported.
squint is useful for discovering problems with Internet usage patterns. To determine on-line time, it is guesstimated that after a "hit" the person is reading the page for the following two minutes. The measurement of on-line time is therefore unreliable, but the system does provide a warning that investigation may be warranted.
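The two-minute rule can be illustrated with a few lines of awk. This is only a sketch of one possible reading of the rule, not squint's actual code: it assumes that the input holds one user's hit timestamps in seconds, in ascending order, and that overlapping two-minute reading windows are counted once. The estimate_online_seconds function is a hypothetical name.

```shell
# Estimate on-line time from hit timestamps (seconds, one per line,
# ascending). Each hit is assumed to be followed by up to 120 seconds
# of reading; overlapping reading windows are counted only once.
# estimate_online_seconds is a hypothetical helper, not part of squint.
estimate_online_seconds() {
    awk -v window=120 '
        NR > 1 {
            # Time since the previous hit, capped at the reading window.
            gap = $1 - prev
            total += (gap < window) ? gap : window
        }
        { prev = $1 }
        END {
            # The last hit contributes one full reading window.
            if (NR > 0) total += window
            print total + 0
        }
    '
}

# Hypothetical hits at 0 s, 30 s, 60 s and 600 s.
printf '%s\n' 0 30 60 600 | estimate_online_seconds
```

For the sample input the first three hits merge into one stretch of 180 seconds and the last hit adds 120 seconds, so the function prints 300. A user who leaves a page open for an hour without clicking is still credited with only two minutes, which is why the figure is best treated as a hint rather than a measurement.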
What is 'Content Filtering'?
A normal web filter, such as Cyber Patrol, squidGuard or Net Nanny, has a very large list of bad sites. If you try to go to one of these sites, you are blocked; that is, your web access is filtered by web address.
The web is a fast-changing place, and even large web search engines such as Google, Altavista or Yahoo don't know of even half of it. This makes filtering by web address (URL) difficult, as sites change and new ones come up all the time. It is impossible to have comprehensive filtering using just URLs. What is needed is something that checks every page you (or your children) access for 'bad' subjects such as drugs, profanities, hate, pornography, etc., and disallows it if it is not suitable. This is called 'Content Filtering'.
This is why you need DansGuardian, as it makes the web a cleaner, safer place for you and your children.
As a side effect, DansGuardian also helps maintain freedom of speech by moving the censoring to the choice of the individual rather than imposing a specific ideal on the whole world.
DansGuardian
DansGuardian is an award-winning web content filtering proxy(1) for Linux, FreeBSD, OpenBSD, NetBSD, Mac OS X, HP-UX, and Solaris that uses Squid(2) to do all the fetching. It filters using multiple methods: URL and domain filtering, content phrase filtering, PICS filtering, MIME filtering, file extension filtering and POST limiting.
The content phrase filtering will check for pages that contain profanities and phrases often associated with pornography and other undesirable content. The POST filtering allows you to block or limit web upload. The URL and domain filtering is able to handle huge lists and is significantly faster than squidGuard.
The filtering has configurable domain, user and source IP exception lists. SSL tunneling is supported.
The configurable logging produces a log in an easy to read format which has the option to only log the text-based pages, thus significantly reducing redundant information such as every image on a page.
Pretty much all parts of DansGuardian are configurable, giving the administrator, rather than some third-party company, total control over what is filtered.
(1) Technically DansGuardian is more of a filtering pass-through than a true proxy - but don't let that worry you!
(2) DansGuardian should work with any proxy, not just Squid. For example, it is known to work with Oops.
The main features of DansGuardian are as follows:
*Significantly cheaper than IGear (one of the best commercial filters).
*Can block adverts by the use of an advert URL block list.
*Can filter text and HTML pages for obscene (sexual, racial, violent, etc) content.
*Uses an advanced phrase weighting system to reduce over or under blocking.
*Can filter sites using the PICS labeling system.
*Can filter according to MIME type and file extension.
*Can filter according to URLs including Regular Expression URLs.
*URL filtering is compatible with squidGuard black lists.
*The URL filtering is able to filter https requests.
*Can work in a 'whitelist' mode where all sites except those listed are blocked.
*Can block all IP based URLs.
*Is able to block sites when users try using the IP address of the site instead.
*Produces a log in a very human readable format.
*Optionally produces a log in CSV format for easy import into databases etc.
*Is able to log the username using either Ident or basic proxy authentication.
*It has the ability to switch off filtering for specified sites, parts of sites, browser IPs and usernames.
*Can block specified source IPs and usernames.
*Can block or limit web uploading (e.g. attachments in Hotmail).
*Has the ability to work in a stealth mode where it logs sites that would have been blocked, but does not block them. This allows you to monitor your users without them knowing.
*Uses a very intelligent algorithm to match phrases in web pages mixed in with HTML code and white space.
*Big5, Unicode and top-bit set characters can be used in search phrases.
*URL filtering is significantly faster than squidGuard.
*The configuration lists use the same incredibly fast code that allows them all to be hundreds of thousands of entries long.
*100% C++ and can compile on GCC 3.
*Can be made to re-read config files with a HUP signal.
*Works perfectly in conjunction with Squid and Oops.
*Has no 3rd party library requirements (no nb++ as was used in version 1), so it can be installed much more easily and is also provided as an RPM.
*Supports (adds) the squid X-Forwarded-For header line.
*Supports compressed (Content-Encoding gzip and deflate) HTML.
*Can be made to only listen on 1 IP.
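The phrase weighting idea from the feature list can be sketched like this. It is a simplified illustration, not DansGuardian's algorithm or configuration format: the phrases, the weights, the threshold and the function names are all assumptions.

```python
# Hypothetical phrases and weights; positive values count against a page,
# negative values count in its favour.
WEIGHTED_PHRASES = {
    "explicit pictures": 60,
    "adults only": 40,
    "medical advice": -50,
}
BLOCK_THRESHOLD = 50  # assumed limit above which a page is blocked


def page_score(text):
    """Sum the weights of every listed phrase that appears in the text."""
    # Collapse whitespace so phrases split across lines still match.
    normalised = " ".join(text.lower().split())
    return sum(w for phrase, w in WEIGHTED_PHRASES.items() if phrase in normalised)


def is_blocked(text):
    """Block the page when its total score exceeds the threshold."""
    return page_score(text) > BLOCK_THRESHOLD


print(is_blocked("Adults only!  Explicit\npictures inside"))           # True (score 100)
print(is_blocked("Explicit pictures used in medical advice leaflet"))  # False (score 10)
```

Negative weights are what reduce over-blocking: a page that mentions a listed phrase in a harmless context can score below the limit.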
Saturday, March 29, 2008
Linux Commands
UNIX Commands
Login and Exit
yppasswd Change password
rlogin machine Log into a remote machine
telnet machine Log into a remote machine
exit End a shell session
Ctrl-D End shell session
logout Log out of remote session
Help
man command Describes the command
man -k keyword Search for keyword
Display directory listing
ls Display directory listing
ls -l Display access permissions
ls -a Display hidden files
ls -d Display directory
ls -t Display files sorted by time
ls dir Display contents of directory
ls file Display file
ls ???* Display all files with 3 or more characters in name
Change, Create, Remove Directories
pwd Display current directory
cd Change to your home directory
cd .. Change to parent directory
cd dir Change to another directory
mkdir dir Create a new directory
rmdir dir Remove a directory (must be empty)
rm -r dir Remove a directory and everything within it
Create, Copy, Move, Delete Files
touch file Create an empty file
cp src-file dst-file Copy a file to another file
cp src-file dst-dir Copy a file to a directory
cp -r * dst-dir Copy all files and sub-directories to another directory
mv src-file dst-file Move a file to another file (renames file)
mv src-file dst-dir Move a file to another directory
mv src-dir dst-dir Move a directory to another directory
Note: filenames can be up to 255 characters long but should not contain special
characters, e.g., $ * [ ] & < >
Change file permissions
chmod o+r file Change file to allow read access to anyone
chown username file Change owner of file to username
chgrp new-grp file Change group owner of file to new-grp
Note: -rwxr-x--- shows permissions on a regular file to be:
owner=read,write,execute; group=read,execute; others=nothing
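The permission string in the note can be decoded programmatically. The sketch below uses Python's standard `stat` module; the helper `describe` is a made-up name for illustration.

```python
import stat

# S_IFREG marks a regular file; 0o750 is rwx for owner, r-x for group,
# nothing for others.
mode = stat.S_IFREG | 0o750
print(stat.filemode(mode))  # -rwxr-x---


def describe(mode):
    """Return the permissions granted to owner, group and others."""
    names = (("read", 4), ("write", 2), ("execute", 1))
    result = {}
    for who, shift in (("owner", 6), ("group", 3), ("others", 0)):
        bits = (mode >> shift) & 0o7
        result[who] = [name for name, bit in names if bits & bit]
    return result


print(describe(mode))
```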
Display Contents of a File
cat file Display contents of file
cat -v file Display non-printing characters
head file Display first 10 lines
tail file Display last 10 lines
more file Display file one screen at a time
wc file Count lines, words and bytes in file
Sort, Compare, Convert, Compress Files
sort file1 > file2 Sort lines in file1 and output to file2
sort -k 1,1 -n file1 > file2 Sort by column 1 numerically and output to file2
uniq input-file Remove or report adjacent duplicate lines
unix2dos unx-file dos-file Convert from UNIX to DOS format
dos2unix dos-file unx-file Convert from DOS to UNIX format
cmp file1 file2 Compare byte-by-byte 2 files
diff file1 file2 Compare line by line 2 files
sdiff file1 file2 Compare 2 files by displaying them side by side
compress file Compress a file
gunzip file Uncompress a file
tar cf file.tar . Create a tar archive of current directory
tar xf file.tar Restore tar archive to current directory
Search for Strings within Files
grep string file(s) Display lines containing string in file(s)
grep -v string file(s) Display lines that do not contain string
Printing
lp file Print file to default printer
lpr file Print file to default printer
lpstat Check status of printer queue
lpstat -p Check status of all printer queues
lpq Check contents of printer queue
lprm job# Remove job from printer queue
lprm - Remove all your jobs from printer queue
lpr -Pprinter file Print to a specific printer queue
ls -l | lp Print directory listing
head file | lp Print first 10 lines of file
Redirect Output
cmd | lp Direct cmd output to printer
cmd > file Direct cmd output to file
cmd | tee file Direct output to screen and file
cmd >> file Direct cmd output to end of file
cmd1 | cmd2 Direct cmd1 output to input of cmd2
Conditional Execution
cmd1 && cmd2 Execute cmd2 only if cmd1 is successful
cmd1 || cmd2 Execute cmd2 only if cmd1 fails
Background Processes
cmd& Run job in background
jobs List jobs running in background in current shell session
fg Bring most recent background job into foreground
fg job# Bring job# into foreground
bg Put job into background
bg job# Put job# into background
Ctrl-z Suspend current job (resume with fg or bg)
at timespec cmd Ctrl-d Execute commands at a specified time
batch cmd Ctrl-d Execute commands in the batch queue
atq Display jobs running in at and batch queues
atrm job# Remove job from at or batch queue
Checking Processes
ps Display current shell processes
ps -ef Display all processes
ps -ef | grep username Display all processes for a user
kill -9 PID Forcibly stop a process (try plain kill PID first)
Disk Space, User and Environment Information
du -sk Display disk space used by current directory in 1024-byte (1K) blocks
du -sk dir Display disk space used by dir in 1024-byte (1K) blocks
df -k Display data for mounted file systems
env Display environment variable settings
setenv ENV-VAR XXX Set an environment variable (csh/tcsh syntax)
alias Display all defined aliases
umask Display current default file protection mask
whoami Display your username
id Display your user and group ids
users Display users logged in
groups Display which groups you belong to
w Display who is logged in and what they are doing
RAID
1. What is RAID
* The basic idea behind RAID is to combine multiple small, inexpensive
disk drives into an array to accomplish performance or redundancy goals not
attainable with one large and expensive drive.
* This array of drives will appear to the computer as a single logical storage
unit or drive.
* RAID is a method in which information is spread across several disks, using
techniques such as :
* Disk Striping (RAID Level 0) [no redundancy,no FT]
* Disk Mirroring (RAID level 1) [redundancy,with FT]
* disk striping with parity on single disk (RAID Level 4) Not Supported
* disk striping with parity across disks (RAID Level 5)
* Linear RAID
to achieve redundancy, lower latency and/or increase bandwidth for reading or
writing to disks, and maximize the ability to recover from hard disk crashes.
* The underlying concept of RAID is that data may be distributed across each
drive in the array in a consistent manner.
* To do this, the data must first be broken into consistently-sized chunks
or strips (often 32K or 64K in size, although different sizes can be used).
* Each chunk is then written to a hard drive in RAID according to the RAID level
used.
* When the data is to be read, the process is reversed, giving the illusion
that multiple drives are actually one large drive.
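The chunking step can be sketched in a few lines. This illustrates the idea only and is not how any particular RAID driver is written; the function names are assumptions, and 64K is simply one of the common sizes mentioned above.

```python
CHUNK_SIZE = 64 * 1024  # 64K, one of the common sizes mentioned above


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Break data into consistently sized chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks):
    """Reverse the process: reassemble the original byte stream."""
    return b"".join(chunks)


data = bytes(200 * 1024)             # 200K of zero bytes as sample data
chunks = split_into_chunks(data)
print(len(chunks))                   # 4 (three full 64K chunks plus one of 8K)
print(join_chunks(chunks) == data)   # True
```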
2. Who Should Use RAID
* Anyone who needs to keep large quantities of data on hand
(such as a sysadmin) would benefit by using RAID technology.
* The Primary reasons to use RAID include:
* Enhanced speed
* Increased storage capacity using a single virtual disk
* Lessened impact of a disk failure
3. Hardware RAID versus Software RAID
* There are two possible RAID approaches: Hardware RAID and Software RAID.
Hardware RAID
* A hardware RAID system manages the RAID subsystem independently from
the host and presents to the host only a single disk per RAID array.
* An example of a Hardware RAID device would be one that connects to a SCSI
controller and presents the RAID arrays as a single SCSI drive.
* An external RAID system moves all RAID handling "intelligence" into a
controller located in the external disk subsystem.
* The whole subsystem is connected to the host via a normal SCSI controller and
appears to the host as a single disk.
* RAID controllers also come in the form of cards that act like a SCSI
controller to the operating system but handle all of the actual drive
communications themselves.
* In these cases, you plug the drives into the RAID controller just like you
would a SCSI controller, but then you add them to the RAID controller's
configuration, and the operating system never knows the difference.
* Many controllers have their own BIOS and can be configured independently
of the host computer to which they are attached, just like you use the CMOS
to configure your system.
Software RAID
* Software RAID implements the various RAID levels in the kernel disk
(block device) code. It offers the cheapest possible solution, as expensive
disk controller cards or hot-swap chassis are not required.
* Software RAID also works with cheaper IDE disks as well as SCSI disks.
With today's fast CPUs, Software RAID performance can match or exceed Hardware
RAID.
* The MD driver in the Linux kernel is an example of a RAID solution that is
completely hardware independent. The performance of a software-based array is
dependent on the server CPU performance and load.
Software RAID offers some important features:
* Threaded rebuild process
* Kernel-based configuration
* Portability of arrays between Linux machines without reconstruction
* Backgrounded array reconstruction using idle system resources
* Hot-swappable drive support
* Automatic CPU detection to take advantage of certain CPU
optimizations
Note: A hot-swap chassis allows you to remove a hard drive without having to
power-down your system.
4. RAID Level 0 [Striping]
* RAID-0 is also called Striping. In this level two or more Hard Disks are
combined to appear as one large one to the OS
Example :
=======
I have two HDDs of 8GB and 20GB. In RAID-0, both are combined and you get a
combined disk space of 28GB. There is no Data Redundancy or Fault Tolerance.
If one of the HDDs fails, you lose all your data.
ADVANTAGES
==========
* One advantage of this is speed, since a file is spread [striped] across the
two disks and can be read up to twice as fast
* Can accommodate very large files
* Can accommodate disks of unequal sizes. When RAID runs out of space on the
smaller [8GB] disk, it then continues the striping using the available space
on the remaining drives. When this occurs, the data access speed is lower for
this portion of data, because the total number of RAID drives available is
reduced. For this reason, RAID 0 is best used with drives of equal size.
DISADVANTAGES
=============
* There is no Data Redundancy or Fault Tolerance.
If one of the HDDs fails, you lose all your data.
* You could, however, use one HDD too, if you are stupid enough to do that
+---------+                      Physical Disk 1
| Block 1 |                        +---------+
+---------+                        | Block 1 |
| Block 2 |                        +---------+
+---------+                        | Block 3 |
| Block 3 |        RAID-0          +---------+
+---------+      ==========>       | Block 5 |
| Block 4 |                        +---------+
+---------+
| Block 5 |                      Physical Disk 2
+---------+                        +---------+
| Block 6 |                        | Block 2 |
+---------+                        +---------+
                                   | Block 4 |
                                   +---------+
                                   | Block 6 |
                                   +---------+
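The block placement in the diagram above can be reproduced with a short sketch. It assumes two disks of equal size and uses made-up function names; real striping works on raw disk chunks rather than Python lists.

```python
def stripe(blocks, disk_count=2):
    """Deal blocks out to disks in turn, as RAID 0 does."""
    disks = [[] for _ in range(disk_count)]
    for index, block in enumerate(blocks):
        disks[index % disk_count].append(block)
    return disks


def unstripe(disks):
    """Read the blocks back in their original order."""
    total = sum(len(d) for d in disks)
    return [disks[i % len(disks)][i // len(disks)] for i in range(total)]


blocks = ["Block %d" % n for n in range(1, 7)]
disk1, disk2 = stripe(blocks)
print(disk1)  # ['Block 1', 'Block 3', 'Block 5']
print(disk2)  # ['Block 2', 'Block 4', 'Block 6']
print(unstripe([disk1, disk2]) == blocks)  # True
```

Because consecutive blocks sit on different disks, both drives can work on one large read at the same time. Remove either list and `unstripe` has nothing to rebuild from, which is the lack of fault tolerance described above.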
5. RAID Level 1 [Mirroring]
* RAID 1, or "mirroring," has been used longer than any other form of RAID.
* Level 1 provides redundancy by writing identical data to each member disk of
the array, leaving a "mirrored" copy on each disk.
* Here you use two hard disks such that both of them contain exactly the same
information.
* In case of failure of one disk, the server will boot through the second disk.
When the failed disk is replaced, the data is automatically cloned to the new
disk from the surviving disk.
* Level 1 operates with two or more disks that may use parallel access for high
data-transfer rates when reading but more commonly operate independently to
provide high I/O transaction rates.
* RAID 1 also offers the possibility of using a hot standby spare disk that will
be automatically cloned in the event of a disk failure on any of the primary
RAID devices.
* Mirroring remains popular due to its simplicity and high level of data
availability.
ADVANTAGES
==========
* Offers redundancy and more Fault Tolerance.
* Provides very good data reliability and improves performance for
read-intensive applications
DISADVANTAGES
=============
* Total RAID size in GB is equal to that of the smallest disk in the RAID set.
Unlike RAID 0, the extra space on the larger device isn't used.
* RAID 1 offers data redundancy, without the speed advantages of RAID 0.
The server has to send data twice to be written to each of the mirrored disks.
This can saturate data busses and CPU use.
With a hardware-based solution, the server CPU sends the data to the RAID
disk controller once, and the disk controller then duplicates the data to
the mirrored disks.
This makes RAID-capable disk controllers the preferred solution when
implementing RAID 1.
* The storage capacity of the level 1 array is equal to the capacity of one
of the mirrored hard disks in a Hardware RAID or one of the mirrored
partitions in a Software RAID.
+---------+               +---------+               +---------+
| Block 1 |               | Block 1 |               | Block 1 |
+---------+    RAID-1     +---------+               +---------+
| Block 2 |               | Block 2 |               | Block 2 |
+---------+               +---------+               +---------+
| Block 3 |               | Block 3 |               | Block 3 |
+---------+   ========>   +---------+   ========>   +---------+
| Block 4 |               | Block 4 |               | Block 4 |
+---------+               +---------+               +---------+
| Block 5 |               | Block 5 |               | Block 5 |
+---------+               +---------+               +---------+
| Block 6 |               | Block 6 |               | Block 6 |
+---------+               +---------+               +---------+
                        Physical Disk 1           Physical Disk 2
6. RAID Level 4 [Striping with Dedicated Parity]
* Linux RAID 4 requires a minimum of three disks or partitions and can
survive the loss of one disk only
* RAID 4 combines the high speed of RAID 0 with the redundancy of
RAID 1.
* Level 4 uses parity concentrated on a single disk drive to protect data.
RAID 4 operates like RAID 0 but inserts a special error-correcting or parity
chunk on an additional disk dedicated to this purpose.
* Its major disadvantage is that the data is striped, but the parity info is
not. In other words, any data written to any section of the data portion
of the RAID set must be followed by an update of the parity disk. The parity
disk can therefore act as a bottleneck. For this reason, RAID 4 isn't used
very frequently.
Because the dedicated parity disk represents an inherent bottleneck,
level 4 is seldom used without accompanying technologies such as
write-back caching.
* RAID 4 requires at least three disks in the RAID set and can survive the loss
of a single drive only. When this occurs, the data in it can be recreated on
the fly with the aid of the information on the RAID set's parity disk. When
the failed disk is replaced, it is re-populated with the lost data with the
help of the parity disk's information.
* It is better suited to transaction I/O rather than large file transfers.
* Although RAID level 4 is an option in some RAID partitioning schemes, it is
not an option allowed in Red Hat Linux RAID installations.
* The storage capacity of Hardware RAID level 4 is equal to the capacity of
member disks, minus the capacity of one member disk.
* The storage capacity of Software RAID level 4 is equal to the capacity of
the member partitions, minus the size of one of the partitions if they
are of equal size.
* RAID 4 is not supported by Fedora Linux.
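How a parity chunk lets a lost chunk be recreated can be shown with XOR, the usual parity function. The sketch below handles a single stripe with two data disks and one parity disk; the sample data and the function name `xor_chunks` are made up for illustration.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# One stripe: two data chunks plus the parity chunk on the dedicated disk.
data_disks = [b"AAAA", b"BBBB"]
parity = xor_chunks(data_disks)

# Simulate losing the first data disk and rebuilding it from the survivors.
rebuilt = xor_chunks([data_disks[1], parity])
print(rebuilt == data_disks[0])  # True
```

Every write to either data chunk means recomputing `parity`, so with a single parity disk that disk is touched on every write. This is the bottleneck described above.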
7. RAID Level 5 [Striping with Distributed Parity]
* Linux RAID 5 requires a minimum of three disks or partitions and can
survive the loss of one disk only
* This is the most common type of RAID. By distributing parity across some
or all of an array's member disk drives, RAID level 5 eliminates the
write bottleneck inherent in level 4.
* The only performance bottleneck is the parity calculation process. With
modern CPUs and Software RAID, that usually is not a very big problem.
* As with level 4, the result is asymmetrical performance, with reads
substantially outperforming writes.
* Level 5 is often used with write-back caching to reduce the asymmetry.
* The storage capacity of Hardware RAID level 5 is equal to the capacity of
member disks, minus the capacity of one member disk.
* The storage capacity of Software RAID level 5 is equal to the capacity
of the member partitions, minus the size of one of the partitions if they
are of equal size.
* RAID 5 improves on RAID 4 by striping the parity data between all the
disks in the RAID set. This avoids the parity disk bottleneck, whilst
maintaining many of the speed features of RAID 0 and the redundancy of
RAID 1. Like RAID 4, RAID 5 can survive the loss of a single disk only.
* RAID 5 is supported by Fedora Linux. Figure below illustrates the data
allocation process in RAID 5.
+---------+              +-------------------+  +-------------------+  +-------------------+
| Block 1 |              | Block 1           |  | Block 2           |  | ErrChk Blocks 1+2 |
+---------+              +-------------------+  +-------------------+  +-------------------+
| Block 2 |              | ErrChk Blocks 3+4 |  | Block 4           |  | Block 3           |
+---------+    RAID-5    +-------------------+  +-------------------+  +-------------------+
| Block 3 |              | Block 6           |  | ErrChk Blocks 5+6 |  | Block 5           |
+---------+              +-------------------+  +-------------------+  +-------------------+
| Block 4 |              | Block 7           |  | Block 8           |  | ErrChk Blocks 7+8 |
+---------+              +-------------------+  +-------------------+  +-------------------+
| Block 5 |
+---------+
| Block 6 |
+---------+
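The rotation of the parity chunk from disk to disk can be sketched as below. This shows one possible rotation on three disks; real implementations offer several layouts, so the exact placement here is an assumption, as is the function name `raid5_layout`.

```python
def raid5_layout(stripes, disk_count=3):
    """Return a table showing which disk holds data or parity per stripe."""
    layout = []
    block = 1
    for stripe in range(stripes):
        # Move the parity chunk one disk to the left on each new stripe.
        parity_disk = (disk_count - 1 - stripe) % disk_count
        row = []
        for disk in range(disk_count):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append("B%d" % block)
                block += 1
        layout.append(row)
    return layout


for row in raid5_layout(4):
    print(row)
# ['B1', 'B2', 'P']
# ['B3', 'P', 'B4']
# ['P', 'B5', 'B6']
# ['B7', 'B8', 'P']
```

Each column is one disk. No single disk holds all the parity, so parity updates are spread across the whole set.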
Linear RAID
Linear RAID is a simple grouping of drives to create a larger virtual drive.
In linear RAID, the chunks are allocated sequentially from one member drive,
going to the next drive only when the first is completely filled.
This grouping provides no performance benefit, as it is unlikely
that any I/O operations will be split between member drives. Linear RAID also
offers no redundancy and, in fact, decreases reliability: if any one member
drive fails, the entire array cannot be used. The capacity is the total of all
member disks.
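Sequential allocation in linear RAID amounts to a simple lookup, sketched here. The drive sizes and the function name `locate_chunk` are hypothetical.

```python
def locate_chunk(chunk_index, disk_chunks):
    """Map a logical chunk number to (disk number, chunk offset on that disk).

    disk_chunks lists how many chunks each member drive can hold.
    """
    remaining = chunk_index
    for disk, capacity in enumerate(disk_chunks):
        if remaining < capacity:
            return disk, remaining
        remaining -= capacity  # this drive is full; move on to the next
    raise IndexError("chunk index beyond the end of the array")


sizes = [4, 10]                 # hypothetical drives holding 4 and 10 chunks
print(sum(sizes))               # 14: capacity is the total of all members
print(locate_chunk(3, sizes))   # (0, 3): last chunk of the first drive
print(locate_chunk(4, sizes))   # (1, 0): first chunk of the second drive
```

Neighbouring chunks almost always land on the same drive, which is why a single I/O operation rarely gets split between members.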
Note : RAID level 1 comes at a high cost because you write the same info to
all of the disks in the array, which wastes drive space.
For example, if you have RAID level 1 set up so that your
root (/) partition exists on two 40G drives, you have 80G total but
are only able to access 40G of that 80G. The other 40G acts like a
mirror of the first 40G.
Note : Parity information is calculated based on the contents of the rest of
the member disks in the array. This information can then be used to
reconstruct data when one disk in the array fails. The reconstructed
data can then be used to satisfy I/O requests to the failed disk before
it is replaced and to repopulate the failed disk after it has been
replaced.
Note : RAID level 4 takes up the same amount of space as RAID level 5, but
level 5 has more advantages. For this reason, level 4 is not supported.
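The capacity rules given for each level can be collected into one small calculator. This is a sketch under stated assumptions: the function name is made up, and for RAID 4 and 5 with members of unequal size it assumes that every member contributes only as much space as the smallest one.

```python
def usable_capacity(level, disk_sizes):
    """Usable capacity for a RAID level, in the units of disk_sizes."""
    smallest = min(disk_sizes)
    count = len(disk_sizes)
    if level in ("0", "linear"):
        return sum(disk_sizes)            # all space is used, no redundancy
    if level == "1":
        return smallest                   # every disk holds the same data
    if level in ("4", "5"):
        if count < 3:
            raise ValueError("RAID 4 and 5 need at least three disks")
        return smallest * (count - 1)     # one disk's worth goes to parity
    raise ValueError("unknown RAID level: %s" % level)


print(usable_capacity("0", [8, 20]))        # 28
print(usable_capacity("1", [40, 40]))       # 40
print(usable_capacity("5", [40, 40, 40]))   # 80
```

The three calls mirror the examples in the text: 8GB plus 20GB striped gives 28GB, two mirrored 40G drives give 40G, and three 40G drives in RAID 5 lose one drive's worth to parity.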
* The basic idea behind RAID is to combine multiple small, inexpensive
disk drives into an array to accomplish performance or redundancy goals not
attainable with one large and expensive drive.
* This array of drives will appear to the computer as a single logical storage
unit or drive.
* RAID is a method in which information is spread across several disks, using
techniques such as :
* Disk Striping (RAID Level 0) [no redundancy,no FT]
* Disk Mirroring (RAID level 1) [redundancy,with FT]
* disk striping with parity on single disk (RAID Level 4) Not Supported
* disk striping with parity across disks (RAID Level 5)
* Linear RAID
to achieve redundancy, lower latency and/or increase bandwidth for reading or
writing to disks, and maximize the ability to recover from hard disk crashes.
* The underlying concept of RAID is that data may be distributed across each
drive in the array in a consistent manner.
* To do this, the data must first be broken into consistently-sized chunks
or strips (often 32K or 64K in size, although different sizes can be used).
* Each chunk is then written to a hard drive in RAID according to the RAID level
used.
* When the data is to be read, the process is reversed, giving the illusion
that multiple drives are actually one large drive.
2. Who Should Use RAID
* Anyone who needs to keep large quantities of data on hand
(such as a sysadmin) would benefit by using RAID technology.
* The Primary reasons to use RAID include:
* Enhanced speed
* Increased storage capacity using a single virtual disk
* Lessened impact of a disk failure
3. Hardware RAID versus Software RAID
* There are two possible RAID approaches: Hardware RAID and Software RAID.
Hardware RAID
* H/W systems manages the RAID subsystem independently from
the host and presents to the host only a single disk per RAID array.
* An example of a Hardware RAID device would be one that connects to a SCSI
controller and presents the RAID arrays as a single SCSI drive.
* An external RAID system moves all RAID handling "intelligence" into a
controller located in the external disk subsystem.
* The whole subsystem is connected to the host via a normal SCSI controller and
appears to the host as a single disk.
* RAID controllers also come in the form of cards that act like a SCSI
controller to the operating system but handle all of the actual drive
communications themselves.
* In these cases, you plug the drives into the RAID controller just like you
would a SCSI controller, but then you add them to the RAID controller's
configuration, and the operating system never knows the difference.
* Many controllers have their own BIOS and can be configured independently
of the host computer to which they are attached, much as you use the CMOS
setup to configure your system.
Software RAID
* Software RAID implements the various RAID levels in the kernel disk
(block device) code. It offers the cheapest possible solution, as expensive
disk controller cards or hot-swap chassis are not required.
* Software RAID works with cheaper IDE disks as well as SCSI disks.
With today's fast CPUs, Software RAID performance can exceed that of Hardware
RAID.
* The MD driver in the Linux kernel is an example of a RAID solution that is
completely hardware independent. The performance of a software-based array is
dependent on the server CPU performance and load.
Software RAID offers some important features:
* Threaded rebuild process
* Kernel-based configuration
* Portability of arrays between Linux machines without reconstruction
* Backgrounded array reconstruction using idle system resources
* Hot-swappable drive support
* Automatic CPU detection to take advantage of certain CPU
optimizations
Note: A hot-swap chassis allows you to remove a hard drive without having to
power-down your system.
4. RAID Level 0 [Striping]
* RAID-0 is also called Striping. In this level, two or more hard disks are
combined to appear as one large disk to the OS.
Example :
=======
Suppose you have two HDDs of 8GB and 20GB. In RAID-0, both are combined and
you get a combined disk space of 28GB. There is no Data Redundancy or Fault
Tolerance. If one of the HDDs fails, you lose all your data.
ADVANTAGES
==========
* One advantage of this is speed: since a file is striped across the
two disks, it can be read up to twice as fast
* Can accommodate very large files
* Can accommodate disks of unequal sizes. When RAID runs out of space on the
smaller [8GB] disk, it then continues the striping using the available space
on the remaining drives. When this occurs, the data access speed is lower for
this portion of data, because the total number of RAID drives available is
reduced. For this reason, RAID 0 is best used with drives of equal size.
DISADVANTAGES
=============
* There is no Data Redundancy or Fault Tolerance.
If one of the HDDs fails, you lose all your data.
* You could, however, use just one HDD too, but there would be no point in
doing that.
+---------+                +---------+
| Block 1 |                | Block 1 |
+---------+                +---------+
+---------+    Physical    +---------+
| Block 2 |     Disk 1     | Block 3 |
+---------+                +---------+
+---------+                +---------+
| Block 3 |     RAID-0     | Block 5 |
+---------+                +---------+
             ==========>
+---------+                +---------+
| Block 4 |                | Block 2 |
+---------+    Physical    +---------+
                Disk 2
+---------+                +---------+
| Block 5 |                | Block 4 |
+---------+                +---------+
+---------+                +---------+
| Block 6 |                | Block 6 |
+---------+                +---------+
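The unequal-disk behaviour described under ADVANTAGES can be worked through in Python. This is a sketch under the assumption that striping proceeds in "zones": all disks are striped together until the smallest one is full, then striping continues over the disks that still have space. The function name `raid0_zones` is hypothetical and the sizes are the 8GB and 20GB disks from the example.

```python
def raid0_zones(disk_sizes_gb):
    """Return a list of (disks_in_zone, zone_capacity_gb) tuples.

    Each zone stripes across every disk that still has free space.
    Fewer disks in a zone means less parallelism for data stored there.
    """
    zones = []
    remaining = sorted(disk_sizes_gb)
    used = 0  # space already consumed on every remaining disk
    while remaining:
        smallest = remaining[0]
        depth = smallest - used
        if depth > 0:
            zones.append((len(remaining), depth * len(remaining)))
            used = smallest
        remaining.pop(0)  # the smallest disk is now full
    return zones


zones = raid0_zones([8, 20])
total_gb = sum(capacity for _, capacity in zones)
```

With the 8GB and 20GB disks, the first 16GB is striped over both disks and the last 12GB lives on the larger disk alone, which is why access to that portion is slower.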
5. RAID Level 1 [Mirroring]
* RAID 1, or "mirroring," has been used longer than any other form of RAID.
* Level 1 provides redundancy by writing identical data to each member disk of
the array, leaving a "mirrored" copy on each disk.
* Here you use two hard disks such that both of them contain exactly the same
information.
* In case of failure of one disk, the server will boot through the second disk.
When the failed disk is replaced, the data is automatically cloned to the new
disk from the surviving disk.
* Level 1 operates with two or more disks that may use parallel access for high
data-transfer rates when reading but more commonly operate independently to
provide high I/O transaction rates.
* RAID 1 also offers the possibility of using a hot standby spare disk that will
be automatically cloned in the event of a disk failure on any of the primary
RAID devices.
* Mirroring remains popular due to its simplicity and high level of data
availability.
ADVANTAGES
==========
* Offers redundancy and more Fault Tolerance.
* Provides very good data reliability and improves performance for
read-intensive applications
DISADVANTAGES
=============
* Total RAID size in GB is equal to that of the smallest disk in the RAID set.
Unlike RAID 0, the extra space on the larger device isn't used.
* RAID 1 offers data redundancy, without the speed advantages of RAID 0.
With Software RAID, the server has to send the data twice, once to each of
the mirrored disks. This can saturate data buses and increase CPU use.
With a hardware-based solution, the server CPU sends the data to the RAID
disk controller once, and the disk controller then duplicates the data to
the mirrored disks.
This makes RAID-capable disk controllers the preferred solution when
implementing RAID 1.
* The storage capacity of the level 1 array is equal to the capacity of one
of the mirrored hard disks in a Hardware RAID or one of the mirrored
partitions in a Software RAID.
+---------+              +---------+              +---------+
| Block 1 |              | Block 1 |              | Block 1 |
+---------+    RAID-1    +---------+              +---------+
| Block 2 |              | Block 2 |              | Block 2 |
+---------+              +---------+              +---------+
| Block 3 |              | Block 3 |              | Block 3 |
+---------+  ========>   +---------+  ========>   +---------+
| Block 4 |              | Block 4 |              | Block 4 |
+---------+              +---------+              +---------+
| Block 5 |              | Block 5 |              | Block 5 |
+---------+              +---------+              +---------+
| Block 6 |              | Block 6 |              | Block 6 |
+---------+              +---------+              +---------+
                       Physical Disk 1          Physical Disk 2
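The mirroring behaviour described in this section can be simulated with a toy Python class. This is only a sketch: the `Mirror` class and its methods are hypothetical names, each disk is modelled as a dictionary of blocks, and a failed disk is represented by `None`.

```python
class Mirror:
    """Toy model of a RAID 1 set: every write goes to every working disk."""

    def __init__(self, num_disks=2):
        self.disks = [dict() for _ in range(num_disks)]

    def write(self, block_no, data):
        # Identical data is written to each member disk.
        for disk in self.disks:
            if disk is not None:
                disk[block_no] = data

    def read(self, block_no):
        # Any surviving disk can satisfy the read.
        for disk in self.disks:
            if disk is not None:
                return disk[block_no]
        raise IOError("all mirrored disks have failed")

    def fail(self, index):
        self.disks[index] = None

    def replace(self, index):
        # The new disk is cloned from a surviving disk.
        survivor = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(survivor)


mirror = Mirror(num_disks=2)
for n in range(1, 7):
    mirror.write(n, "Block %d" % n)
mirror.fail(0)                  # first disk dies
still_readable = mirror.read(3)  # served from the second disk
mirror.replace(0)               # new disk is re-populated
```

After `replace(0)`, both simulated disks hold the same six blocks again.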
6. RAID Level 4 [Striping with Dedicated Parity]
* Linux RAID 4 requires a minimum of three disks or partitions and can
survive the loss of one disk only
* RAID 4 combines the high speed of RAID 0 with a form of the redundancy
offered by RAID 1, using parity instead of a full mirror.
* Level 4 uses parity concentrated on a single disk drive to protect data.
RAID 4 operates like RAID 0 but inserts a special error-correcting or parity
chunk on an additional disk dedicated to this purpose.
* Its major disadvantage is that the data is striped, but the parity info is
not. In other words, any data written to any section of the data portion
of the RAID set must be followed by an update of the parity disk. The parity
disk can therefore act as a bottleneck. For this reason, RAID 4 isn't used
very frequently, and is seldom used without accompanying technologies such as
write-back caching.
* RAID 4 requires at least three disks in the RAID set and can survive the loss
of a single drive only. When this occurs, the data in it can be recreated on
the fly with the aid of the information on the RAID set's parity disk. When
the failed disk is replaced, it is re-populated with the lost data with the
help of the parity disk's information.
* It is better suited to transaction I/O rather than large file transfers.
* Although RAID level 4 is an option in some RAID partitioning schemes, it is
not an option allowed in Red Hat Linux RAID installations.
* The storage capacity of Hardware RAID level 4 is equal to the capacity of
member disks, minus the capacity of one member disk.
* The storage capacity of Software RAID level 4 is equal to the capacity of
the member partitions, minus the size of one of the partitions if they
are of equal size.
* RAID 4 is not supported by Fedora Linux.
7. RAID Level 5 [Striping with Distributed Parity]
* Linux RAID 5 requires a minimum of three disks or partitions and can
survive the loss of one disk only
* This is the most common type of RAID. By distributing parity across some
or all of an array's member disk drives, RAID level 5 eliminates the
write bottleneck inherent in level 4.
* The only performance bottleneck is the parity calculation process. With
modern CPUs and Software RAID, that usually is not a very big problem.
* As with level 4, the result is asymmetrical performance, with reads
substantially outperforming writes.
* Level 5 is often used with write-back caching to reduce the asymmetry.
* The storage capacity of Hardware RAID level 5 is equal to the capacity of
member disks, minus the capacity of one member disk.
* The storage capacity of Software RAID level 5 is equal to the capacity
of the member partitions, minus the size of one of the partitions if they
are of equal size.
* RAID 5 improves on RAID 4 by striping the parity data between all the
disks in the RAID set. This avoids the parity disk bottleneck, whilst
maintaining many of the speed features of RAID 0 and the redundancy of
RAID 1. Like RAID 4, RAID 5 can survive the loss of a single disk only.
* RAID 5 is supported by Fedora Linux. The figure below illustrates the data
allocation process in RAID 5.
+---------+
| Block 1 |
+---------+
+---------+              +-------------------+ +-------------------+ +-------------------+
| Block 2 |              |      Block 1      | |      Block 2      | | ErrChk Blocks 1+2 |
+---------+              +-------------------+ +-------------------+ +-------------------+
+---------+              +-------------------+ +-------------------+ +-------------------+
| Block 3 |    RAID-5    | ErrChk Blocks 3+4 | |      Block 4      | |      Block 3      |
+---------+              +-------------------+ +-------------------+ +-------------------+
+---------+              +-------------------+ +-------------------+ +-------------------+
| Block 4 |              |      Block 6      | | ErrChk Blocks 6+5 | |      Block 5      |
+---------+              +-------------------+ +-------------------+ +-------------------+
+---------+              +-------------------+ +-------------------+ +-------------------+
| Block 5 |              |      Block 7      | |      Block 8      | | ErrChk Blocks 7+8 |
+---------+              +-------------------+ +-------------------+ +-------------------+
+---------+
| Block 6 |
+---------+
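The difference between the dedicated parity disk of RAID 4 and the distributed parity of RAID 5 can be shown by counting which disk receives the parity chunk for each stripe. This is a rough Python sketch. The function names are hypothetical, and the rotation rule used for RAID 5 is an assumption chosen to follow the order in the figure (third disk, first disk, second disk, and so on); real implementations may rotate parity differently.

```python
def parity_disk(stripe_no, num_disks, level):
    """Return the index of the disk holding parity for a given stripe."""
    if level == 4:
        # RAID 4: parity always lives on the same dedicated disk.
        return num_disks - 1
    if level == 5:
        # RAID 5: parity moves to a different disk on each stripe
        # (assumed rotation, see the note above).
        return (stripe_no + num_disks - 1) % num_disks
    raise ValueError("only RAID levels 4 and 5 use parity in this sketch")


def parity_writes_per_disk(num_stripes, num_disks, level):
    """Count how many parity updates each disk receives."""
    counts = [0] * num_disks
    for stripe_no in range(num_stripes):
        counts[parity_disk(stripe_no, num_disks, level)] += 1
    return counts


raid4_counts = parity_writes_per_disk(num_stripes=6, num_disks=3, level=4)
raid5_counts = parity_writes_per_disk(num_stripes=6, num_disks=3, level=5)
```

In RAID 4 every parity update lands on one disk, which is the bottleneck described in the previous section. In RAID 5 the same updates are shared evenly across all members.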
8. Linear RAID
Linear RAID is a simple grouping of drives to create a larger virtual drive.
In linear RAID, the chunks are allocated sequentially from one member drive,
going to the next drive only when the first is completely filled.
This grouping provides no performance benefit, as it is unlikely
that any I/O operations will be split between member drives. Linear RAID also
offers no redundancy and, in fact, decreases reliability: if any one member
drive fails, the entire array cannot be used. The capacity is the total of all
member disks.
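The sequential allocation used by Linear RAID can be sketched as a lookup from a position in the virtual drive to a member drive. This is a minimal Python illustration. The function name `locate` and the 8 and 20 unit member sizes are assumptions made for the example.

```python
def locate(offset, member_sizes):
    """Map an offset in the virtual drive to (member_index, offset_in_member).

    Members are filled one after another, so we walk the list and
    subtract each member's size until the offset fits.
    """
    if offset < 0:
        raise ValueError("offset must not be negative")
    for index, size in enumerate(member_sizes):
        if offset < size:
            return index, offset
        offset -= size
    raise ValueError("offset is beyond the end of the array")


members = [8, 20]                 # illustrative member sizes
capacity = sum(members)           # capacity is the total of all members
first = locate(7, members)        # still on the first member
second = locate(8, members)       # first position on the second member
```

Because neighbouring offsets almost always fall on the same member, a single I/O operation is rarely split across drives, which is why there is no performance gain.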
Note : RAID level 1 comes at a high cost because you write the same info to
all of the disks in the array, which wastes drive space.
For example, if you have RAID level 1 set up so that your
root (/) partition exists on two 40G drives, you have 80G total but
are only able to access 40G of that 80G. The other 40G acts like a
mirror of the first 40G.
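The capacity rules stated in the sections above can be collected into one small Python helper. This is a sketch, not a tool: the function name `usable_capacity` is hypothetical, and for levels 4 and 5 it only handles members of equal size, matching the condition given in the text.

```python
def usable_capacity(level, sizes):
    """Return usable capacity for a list of member sizes (same unit)."""
    if level in ("0", "linear"):
        # Striping and linear mode use all the space on every member.
        return sum(sizes)
    if level == "1":
        # A mirror is only as large as its smallest member.
        return min(sizes)
    if level in ("4", "5"):
        if len(sizes) < 3:
            raise ValueError("RAID 4 and 5 need at least three members")
        if len(set(sizes)) != 1:
            raise ValueError("this sketch assumes equal-size members")
        # One member's worth of space is used for parity.
        return sum(sizes) - sizes[0]
    raise ValueError("unknown RAID level: %r" % (level,))


mirror_gb = usable_capacity("1", [40, 40])      # the two 40G drives above
stripe_gb = usable_capacity("0", [8, 20])
raid5_gb = usable_capacity("5", [40, 40, 40])
```

For the two 40G drives in the note, the mirror exposes 40G of the 80G total.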
Note : Parity information is calculated based on the contents of the rest of
the member disks in the array. This information can then be used to
reconstruct data when one disk in the array fails. The reconstructed
data can then be used to satisfy I/O requests to the failed disk before
it is replaced and to repopulate the failed disk after it has been
replaced.
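The reconstruction described in this note can be demonstrated in Python, assuming the parity is a byte-wise XOR of the data chunks, which is the usual choice for RAID 4 and RAID 5. The helper name `xor_blocks` and the sample chunks are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data chunks, one per data disk (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# Parity is calculated from the contents of all the data disks.
parity = xor_blocks(data)

# Suppose the second disk fails: XOR the survivors with the parity
# to get the missing chunk back.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The same calculation serves both purposes mentioned in the note: answering I/O requests for the failed disk on the fly, and repopulating the replacement disk.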
Note : RAID level 4 takes up the same amount of space as RAID level 5, but
level 5 has more advantages. For this reason, level 4 is not supported.