Using your .htaccess file on your hosted site
If you have surfed the web for more than a couple of months, I am sure you have heard of this exotic and powerful file humbly known as .htaccess. Although the best-known uses for .htaccess are password-protected directories, custom error pages and directory indexes, the file is also practical for many other web site functions and features.
This article journeys beyond the commonalities of the file and explores other features that may be lesser known to webmasters. For those who are not familiar or experienced with .htaccess, we spring into action from the very basics of the file.
Our agenda for today is as follows:
• .htaccess naming conventions and file creation
• What is the .htaccess file?
• Uses of the .htaccess file
.htaccess naming conventions and file creation
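Firstly, do not let the period preceding the name intimidate you. Although .htaccess looks like a file extension with no file name in front of it, it is in fact the complete file name. On Unix-like servers, the leading period simply marks the file as hidden, so you may need to turn on the "show hidden files" option in your FTP client to see it.
Anyone whose site runs on the Apache web server can create and use the .htaccess file, provided the server administrator has enabled it. Creating the file is as effortless as opening a text editor, like Notepad or Homesite, and saving the file as .htaccess. If your editor insists on adding an extension such as .txt, rename the file after uploading it. The file is uploaded into a particular directory on the server in ASCII mode, using permissions of 644. We choose 644 so that the web server can read the file while only its owner can change it. If an intruder were able to modify the .htaccess file, the webmaster of that particular web site could be in big trouble.
Keep in mind that the .htaccess file applies not only to the directory you place it in, but also to all directories underneath it. For example, if the .htaccess file is placed in the root directory of your site, say thecoolsite.com, other directories like thecoolsite.com/subdir and thecoolsite.com/subdir/subdir will use that file too. To override a setting from the root directory’s .htaccess file, place another .htaccess file in the respective directory and restate the directive with the value you want there. Apache reads every .htaccess file on the path down to the requested directory, and where two files set the same directive, the file closest to the requested directory wins. A blank .htaccess file, by contrast, changes nothing, because the settings from the directories above still apply. So, what will this file allow us to do? Read on.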
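To make the inheritance concrete, here is a minimal sketch of two separate .htaccess files. The file locations and the /subdir/notfound.html page are assumptions made up for this illustration; only the directives themselves are standard Apache.

```apache
# File 1: /.htaccess (site root)
# These settings apply to the root directory and everything below it.
ErrorDocument 403 /errorpages/403.html
ErrorDocument 404 /errorpages/404.html

# File 2: /subdir/.htaccess
# Only the 404 page is restated, so only the 404 page changes for /subdir
# and the directories below it. The 403 page from File 1 still applies here.
ErrorDocument 404 /subdir/notfound.html
```

With both files in place, a missing page under thecoolsite.com/subdir would get the subdirectory's own 404 page, while a missing page anywhere else on the site would get the root one.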
What is the .htaccess file?
To fully comprehend what the .htaccess file is and appreciate its usefulness, we first must discuss the main Apache configuration file, called httpd.conf. The httpd.conf file, known simply as the config file, is where the Apache web server’s configuration is held, including modules, directives, port numbers and more. It resides on the server itself, where normally only the administrator can edit it.
Apache reads and interprets the httpd.conf file when the web server starts, not each time a page is loaded. Whenever the file is modified, the administrator restarts (or bounces) the web server, which reads the httpd.conf file again so that the modifications take effect.
However, when web servers are built to support outside clients, like a web hosting company’s web servers, administrators do not want clients to have access to the httpd.conf file. If they did, customers could inflict extreme damage on the server. At the same time, there are many features that customers should have the freedom to explore for their web site.
What is the solution? You guessed it, the .htaccess file. This file is merely an extension to the httpd.conf file; only this time, customers have access to it. Restrictions are placed on the .htaccess file to prevent customers from intentionally or inadvertently damaging default server configurations. Each time a page is requested, Apache looks for .htaccess files in the requested directory and in each directory above it, and interprets any it finds. Because the files are read on every request, the web server does not need to be restarted (or bounced) after each change of the .htaccess file. The file simply needs to be re-uploaded.
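The restrictions mentioned above are typically set by the administrator in httpd.conf with the AllowOverride directive, which lists the groups of directives that customers may use in their .htaccess files. The following is only a sketch of what such a setup might look like; the directory path is an assumption, and your host's actual settings may differ.

```apache
# In httpd.conf (edited by the server administrator, not the customer)
# "/var/www/thecoolsite" is a hypothetical path for a customer's site.
<Directory "/var/www/thecoolsite">
    # FileInfo permits ErrorDocument, Indexes permits DirectoryIndex and
    # IndexIgnore, and Options permits the Options directive in .htaccess.
    AllowOverride FileInfo Indexes Options
</Directory>
```

If a directive in your .htaccess file belongs to a group the administrator has not allowed, the server will usually answer with a 500 Internal Server Error, which is a good hint to check with your host.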
Since we now know a little about what the .htaccess file is and what it does, we can examine the magic of the file and look at some typical implementations.
Uses of the .htaccess file
Do not fret, the fun has arrived. Here is where we take a look at actually implementing the directives that go into the .htaccess file. We cover three areas in turn: error documents, modifying directory indexes, and making our own index pages.
ERROR DOCUMENTS
Common HTTP status codes:
200 – OK (don’t do this)
401 – Authorization Required
403 – Forbidden
404 – Page Not Found
405 – Method Not Allowed
500 – Internal Server Error
When an error occurs, Apache checks whether an ErrorDocument directive applies, for instance one from your .htaccess file, to determine the proper response. If none is found, or if no .htaccess file is present, it falls back to the server’s default error page.
Before you begin, consider the table above, which lists the most common HTTP status codes. Never, and I do stress never, implement 200 in the .htaccess file. If you do, you can create some extremely funky looped results.
Let us go ahead and create the error document coding for a 404 error, the most common error on the Internet, which occurs when a web page cannot be found.
ErrorDocument 404 /errorpages/404.html
Looking at the above code, we can see that:
• ErrorDocument: The directive that tells the server this line defines a custom error response; the error code follows it.
• 404: The particular error document code.
• /errorpages/404.html: The page that the server should display when the error code is encountered.
In the above example, when the server cannot find a page, the .htaccess file instructs the server to display the page located at /errorpages/404.html. Notice how I began the path from the site root (/). Since this .htaccess file will be interpreted in each directory below it, we should always begin our paths from the root, or the server may not find the error document page.
For other error document pages, simply use the same format with the respective error code. For example:
ErrorDocument 403 /errorpages/403.html
ErrorDocument 500 /errorpages/500.html
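Putting the pieces together, a small error-handling section of an .htaccess file might look like the sketch below. The 401.html page is an assumption that simply follows the naming pattern used above, and the quoted-message form reflects the Apache 2.x syntax, so check your server's version before relying on it.

```apache
# Custom pages for the most common errors (paths start at the site root)
ErrorDocument 401 /errorpages/401.html
ErrorDocument 403 /errorpages/403.html
ErrorDocument 404 /errorpages/404.html

# Instead of a page, a short message in quotes can be sent directly
ErrorDocument 500 "Sorry, something went wrong on our end."
```

The quoted form is handy when you do not want to maintain a separate page for a rarely seen error.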
MODIFYING DIRECTORY INDEXES
This is particularly useful if you have a directory crammed with images or files that you do not want visitors to browse. Since all web servers are set up differently, first check whether your server allows or denies directory listings. If your server allows directory listings but you wish to hide the contents, the respective line in your .htaccess file should look like this:
IndexIgnore *
The *, or asterisk, is a wildcard that matches every file name, so the server leaves every file out of the directory listing. Note that this only hides the files from the listing; anyone who knows a file’s exact address can still request it directly.
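If the goal is to switch the listing off entirely rather than show an empty one, the Options directive with a minus sign is another route. This is a minimal sketch and assumes your host permits changing Options from .htaccess, which not every host does.

```apache
# Turn off automatic directory listings for this directory and all below it.
# A visitor who requests a directory that has no index page receives
# a 403 Forbidden error instead of a list of files.
Options -Indexes
```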
But hey, what happens if your server has directory browsing turned off but you would like it turned on within a directory or two? The line within your .htaccess file looks like this:
Options +Indexes
This code simply adds an index (file) listing to the directory. Simple enough, right? Getting a tad more detailed, what if you want to only display particular file types? The .htaccess file gives you the ability to specify particular file extensions to ignore. For example, this code will ignore all files ending in .jpg, .gif, .png, and .txt:
IndexIgnore *.jpg *.gif *.png *.txt
Again, the * wildcard stands for any file name before that particular extension. So, the above line of code will display every file within the directory except those ending in the four extensions.
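The two directives can be combined in one .htaccess file. The sketch below turns browsing on for a directory while keeping certain files out of the listing; the .bak pattern is an assumption added for illustration.

```apache
# Turn directory browsing on for this directory
Options +Indexes

# Leave images and text files out of the listing
IndexIgnore *.jpg *.gif *.png *.txt

# Further IndexIgnore lines add to the list rather than replace it;
# here we also hide the .htaccess file itself and any backup copies
IndexIgnore .htaccess *.bak
```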
MAKING OUR OWN INDEX PAGES
Index pages are those pages that are automatically loaded whenever a file is not explicitly called for within a directory. Take the following as an example:
yourcoolsite.com/content1/
Because no web page was specifically called (there is no page listed after content1), the web server will load the default page, like index.html or default.html. But, being the quite demanding person that you are, what if you want another file to load as default?
DirectoryIndex yoursite.html
If that line of code is placed in the .htaccess file within the content1 directory, the web server will serve yourcoolsite.com/content1/yoursite.html, provided it exists. If it does not exist, the server will display the directory listing instead (unless, of course, directory browsing is turned off, in which case the visitor gets a 403 Forbidden error).
Still not satisfied? Okay, we can specify more than one index file, taking priority from left to right. Take the below code:
DirectoryIndex yoursite.html /cgi-bin/index.pl /index.html
Reading from left to right, if yoursite.html is not found, the server will look for the index.pl file within the cgi-bin directory (from root). If that is not found, then the root index.html file will be loaded. Again, if none of the files are found, a directory listing will be displayed, provided directory browsing is turned on (the listing will be empty if you specified IndexIgnore *).
Remember that all directories below your present working directory will take this .htaccess file and, accordingly, the DirectoryIndex code that you just wrote, which you may or may not want. If you do not want other directories to take those configurations, upload a separate .htaccess file within the respective directory and give it its own DirectoryIndex line. A blank .htaccess file will not do the job, because the settings from the directory above continue to apply until a directive is restated.
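As a sketch of that last point, here are two separate .htaccess files side by side. The archive subdirectory is hypothetical and exists only to show the idea.

```apache
# File 1: /content1/.htaccess
# Applies to content1 and every directory below it.
DirectoryIndex yoursite.html /cgi-bin/index.pl /index.html

# File 2: /content1/archive/.htaccess  ("archive" is a hypothetical subdirectory)
# Restating the directive replaces the inherited list for this branch only.
DirectoryIndex index.html
```

With both files uploaded, a request for yourcoolsite.com/content1/ would try yoursite.html first, while a request for yourcoolsite.com/content1/archive/ would look only for index.html.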