If you’re building or maintaining a dynamic website, you may have considered the problem of how to get rid of unfriendly URLs. You may also have read Bill Humphries’s ALA article on the topic, which presents one (very good) solution to this problem.
The main difference between Bill Humphries’s article and the solution I’ll present here is that I decided to do the actual URL transformations with a PHP script, whereas his solution uses regular expressions in an .htaccess file.
If you prefer working with PHP instead of using regular expressions, and if you want to integrate your solution with your dynamic PHP sites, this might be the right method for you.
Why worry about URLs?
Good URLs should have a form like /merchandise/vehicles/bmw/z8/ or /articles/january.htm and not something like index.php?id=12. But the latter is the kind of URL most publishing systems generate. Are we stuck with bad URLs? No.
The idea is to create “virtual” URLs that look good and can be indexed by bots (if you write your links this way too). In fact, the URLs for your dynamic content can have any form you like, while static content (which may also be on your server) can still be reached by its regular URL.
When I built my new site, I was looking for a way to keep my URLs friendly by following these steps:
1. A user enters a URL like www.mycars.com/vehicles/bmw/z8/
2. The code checks to see if the entered URL maps to an existing static HTML file.
3. If yes, the file is loaded; if no, step 4 is executed.
4. The URL string is used to check if there is dynamic content corresponding to the entered URL (e.g. in a database).
5. If yes, the article is displayed.
6. If no, an Error 404 or a custom error message is displayed.
A collection of tools
This article will provide you with all the information necessary to implement this solution, but it’s more a collection of tools than a complete step-by-step guide to a finished solution. Before you start, make sure you have the following:
- mod_rewrite and .htaccess files
- PHP (and a basic understanding of PHP programming)
- a database like MySQL (optional)
The index takes it all
After browsing the web and checking some forums, I found the following solution to be the most powerful: all requests to the server (with some important exceptions, see below) are redirected to a single PHP script, which handles the requested URL and decides which content to load, if any.
This redirection is done using a file named .htaccess that contains the following commands:
RewriteEngine on
RewriteRule !\.(gif|jpg|png|css)$ /your_web_root/index.php
The first line switches the rewrite engine (mod_rewrite) on. The second line redirects all requests to a file index.php EXCEPT for requests for image files or CSS files.
(You will need to enter the path to your web-root directory instead of “your_web_root”. Important: this is something like “/home/web/” rather than something like “http://www.mydomain.com.”)
You can put the .htaccess file either in your root directory or in a sub-directory, but if you put the file in a sub-directory, only requests for files and directories “below” this particular directory will be affected.
The magic inside index.php
Now that we’ve redirected all requests to index.php, we need to decide how to deal with them.
Take a look at the following PHP code; explanations follow below.
<?php
// register_globals was removed in PHP 5.4, so read the server variables explicitly.
$DOCUMENT_ROOT   = $_SERVER['DOCUMENT_ROOT'];
$REQUEST_URI     = $_SERVER['REQUEST_URI'];
$SCRIPT_FILENAME = $_SERVER['SCRIPT_FILENAME'];

// 1. Check to see if a "real" file exists.
if (file_exists($DOCUMENT_ROOT . $REQUEST_URI)
    && $SCRIPT_FILENAME != $DOCUMENT_ROOT . $REQUEST_URI
    && $REQUEST_URI != "/") {
    $url = $REQUEST_URI;
    include($DOCUMENT_ROOT . $url);
    exit();
}

// 2. If not, go ahead and check for dynamic content.
$url = strip_tags($REQUEST_URI);
$url_array = explode("/", $url);
array_shift($url_array); // the first one is empty anyway

// For a request of "/" the array now holds a single empty string.
if (implode("", $url_array) === "") {
    // We got a request for the index.
    include("includes/inc_index.php");
    exit();
}

// Look if anything in the database matches the request.
// This is an empty prototype. Insert your solution here.
if (check_db($url_array) == true) {
    do_some_stuff();
    output_some_content();
    exit();
}

// 3. Nothing in the database either: Error 404!
header("HTTP/1.1 404 Not Found");
exit();
Step 1: check to see if a “real” file exists
First we want to see if an existing file matches the request. (This might be a static HTML file, but also a PHP or CGI script.) If there is such a file, we just include it.
In the first condition, we check to see if a corresponding file is in the directory tree using $DOCUMENT_ROOT and $REQUEST_URI. If a request is something like www.mycars.com/bmw/z8/, then $REQUEST_URI contains /bmw/z8/. $DOCUMENT_ROOT is a variable that contains your document root, the directory where your web files are located.
The second condition is especially important: we check that the request was not for the file index.php itself. If it were, and we just went ahead, it would lead to an endless loop!
The third condition covers another special case: a REQUEST_URI that contains only a “/”, which would also be a request for the actual index file. If you don’t do this check, it will lead to a PHP error. (We’ll deal with this case later on.)
If a request passes all these checks, we load the file using include() and stop the execution of index.php using exit().
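On current PHP versions the bare $DOCUMENT_ROOT and $REQUEST_URI variables are no longer filled in automatically, because register_globals was removed in PHP 5.4. The three checks described above could be sketched roughly as follows, with the values passed in from $_SERVER. The function name find_static_file() and its parameters are assumptions made for illustration and are not part of the original script. The sketch also uses is_file() rather than file_exists(), so that a directory is never handed to include(); that is a deliberate deviation.

```php
<?php
/**
 * Hypothetical helper: returns the path of a static file that matches the
 * request, or null if index.php should handle the request itself.
 */
function find_static_file(string $documentRoot, string $requestUri, string $scriptFilename): ?string
{
    // Ignore a query string such as "?id=12"; only the path can name a file.
    $path = explode('?', $requestUri, 2)[0];
    $candidate = $documentRoot . $path;

    if ($path === '' || $path === '/') {
        return null;    // request for the index itself
    }
    if ($candidate === $scriptFilename) {
        return null;    // never include index.php from within index.php
    }
    return is_file($candidate) ? $candidate : null;
}

// Possible use inside index.php (shown as a comment, not executed here):
// $file = find_static_file($_SERVER['DOCUMENT_ROOT'], $_SERVER['REQUEST_URI'], $_SERVER['SCRIPT_FILENAME']);
// if ($file !== null) { include $file; exit(); }
```

Keeping the checks in one function that takes plain strings makes them easy to test without a running web server.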
Step 2: check for dynamic content
First, we transform the $REQUEST_URI into an array, which is easier to handle:
We use strip_tags() to remove HTML or JavaScript tags from the query string (basic hack protection), and then use explode() to split the $REQUEST_URI at the slashes (“/”). Finally, using array_shift(), we remove the first array entry because it’s always empty. (Why? Because $REQUEST_URI always starts with a “/”.)
All the elements of the request string are now stored in $url_array. If the request was for www.mycars.com/bmw/z8/, then $url_array[0] contains “bmw” and $url_array[1] contains “z8.”
There is also a third entry, $url_array[2], which is empty, provided the user did not forget the trailing slash.
How you deal with this third entry depends on what you want to do; just do whatever fits your needs.
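The splitting steps above can be tried in isolation. The following is a minimal sketch; the function name split_request_path() and the optional $dropEmpty flag are assumptions added for illustration, showing one possible way to handle the empty entry left by a trailing slash.

```php
<?php
/**
 * Hypothetical helper: turns a request path into its segments,
 * following the strip_tags() / explode() / array_shift() steps.
 */
function split_request_path(string $requestUri, bool $dropEmpty = false): array
{
    $url = strip_tags($requestUri);
    $segments = explode('/', $url);
    array_shift($segments);    // the part before the leading "/" is always empty

    if ($dropEmpty) {
        // Remove empty entries (e.g. from a trailing slash) and reindex from 0.
        $segments = array_values(array_filter($segments, function ($s) {
            return $s !== '';
        }));
    }
    return $segments;
}

var_dump(split_request_path('/bmw/z8/'));        // three entries: "bmw", "z8", ""
var_dump(split_request_path('/bmw/z8/', true));  // two entries: "bmw", "z8"
```

Whether to drop the empty entry is a design choice: keeping it lets you tell /bmw/z8 and /bmw/z8/ apart, while dropping it treats both URLs the same.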
What if $url_array contains no path segments? (Strictly speaking, it then holds a single empty string.) You may have realized that this corresponds to the case of the $REQUEST_URI containing only a slash (“/”), which I mentioned above.
This is the case when the request is for the index file (www.mycars.com or www.mycars.com/). My solution is to just include the content for the main page, but you could also load an entry from a database.
Any other request is now ready to use. At this point your creativity comes into play: you can now use the URL parts to load your dynamic content. You could, for example, check your database for content that matches the query string; this is sketched in pseudocode by the check_db() call in the listing.
Suppose you have a string like /articles/january.htm. In this case, $url_array[0] contains “articles” and $url_array[1] contains “january.htm.” If you store your articles in a table “articles” that includes a column “month,” your code could lead to a query like this:
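$url_array[1] = str_replace(".htm", "", $url_array[1]); // removes .htm from the URL
// Both values come straight from the URL: validate or escape them before running the query.
$query = "SELECT * FROM $url_array[0] WHERE month='$url_array[1]'";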
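Because both the table name and the month come straight from the URL, it is worth being careful before sending them to the database. The following is a minimal sketch of one cautious approach: the table name is checked against a whitelist and the month is passed as a bound parameter. The function name build_article_query(), the whitelist, and the returned array shape are all assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper: maps URL segments such as ["articles", "january.htm"]
 * to a parameterised query. Returns null when the table is not whitelisted.
 */
function build_article_query(array $urlArray, array $allowedTables): ?array
{
    if (count($urlArray) < 2 || !in_array($urlArray[0], $allowedTables, true)) {
        return null;    // unknown table name: the caller can answer with a 404
    }
    // Strip a trailing ".htm" only, rather than every occurrence in the string.
    $month = preg_replace('/\.htm$/', '', $urlArray[1]);

    return [
        'sql'    => "SELECT * FROM {$urlArray[0]} WHERE month = ?",
        'params' => [$month],
    ];
}

$q = build_article_query(['articles', 'january.htm'], ['articles']);
var_dump($q);
```

With PDO, the result could then be run with $pdo->prepare($q['sql']) followed by $stmt->execute($q['params']), so the month value is never concatenated into the SQL string.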
You could also transform the $url_array and call a script, much as Bill Humphries suggests in his article. (You need to call the script via the include() function.)
Step 3: nothing found
The last step deals with the case where we found neither a matching static file in step one nor dynamic content matching the request. That means we have to output an Error 404. In PHP this is done using the header() function. (You can see the syntax to output the 404 above.)
One part of this procedure creates several vulnerabilities. In step one, when you check for an existing file, you actually access the file system of your server.
Usually, requests from the web should have very restricted rights, but this depends on how carefully your server is set up. If someone entered ../../../ or something like /.a_dangerous_script, this might allow them to access directories outside your web root or execute scripts on your server. It’s usually not that easy, but be sure to check some of these potential vulnerabilities.
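One way to guard against requests like ../../../ is to resolve the requested path and confirm that it still lies inside the web root before touching it. This is a minimal sketch of that idea using realpath(); the function name is_inside_document_root() is an assumption for illustration, and this check alone should not be taken as complete protection.

```php
<?php
/**
 * Hypothetical helper: true only if $requestPath resolves to an existing
 * file or directory located inside $documentRoot.
 */
function is_inside_document_root(string $documentRoot, string $requestPath): bool
{
    $root = realpath($documentRoot);
    $target = realpath($documentRoot . $requestPath);   // resolves "..", "." and symlinks

    if ($root === false || $target === false) {
        return false;    // root or target does not exist
    }
    // The trailing separator prevents "/var/www-private" from matching "/var/www".
    return strpos($target . DIRECTORY_SEPARATOR, $root . DIRECTORY_SEPARATOR) === 0;
}

// Possible use before the file check in step one (comment only):
// if (!is_inside_document_root($_SERVER['DOCUMENT_ROOT'], $path)) { /* send a 404 */ }
```

Because realpath() returns false for paths that do not exist, the same call also rejects requests for missing files.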
It’s a good idea to strip HTML, JavaScript (and maybe SQL) tags from the query string; HTML and JavaScript tags can easily be removed using strip_tags(). Another wise thing to do is limit the length of the query string, which you could do with this code:
if (strlen($REQUEST_URI) > 100) {
    header("HTTP/1.1 404 Not Found");
    exit;
}
If somebody enters a query string of more than 100 characters, a 404 is returned and the script execution is stopped. You can simply add these (and other security-related functions) at the beginning of the script.
How to deal with password-protected directories and cgi-bin
After I had implemented the whole thing, I realized that there was another problem. I have some password-protected directories, e.g. for my access statistics. If you want to include a file in one of these directories, it won’t work because the PHP module runs as a different user, which cannot access this directory.
To solve this problem, you need to add some lines to your .htaccess file, one for each protected directory (in this example the directory /stats/):
RewriteEngine on
RewriteRule ^stats/.*$ - [L]
RewriteRule !\.(gif|jpg|png|css)$ /your_web_root/index.php
The new rule on the second line excludes all requests for /stats/ from our redirection rule. The “-” means that nothing is done with the request, and the [L] stops processing of the .htaccess file if the rule on this particular line was applied. The original rule on the third line is applied to all other requests.
I recommend the same solution for your cgi-bin directory or other directories where scripts that take GET queries reside.