Articles about .htaccess
This is a follow-on post to the 'Using Apache to block Spammers' post.
It shows how to use Include directives in your Apache configuration to re-use useful rules.
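As a minimal sketch of the idea (the paths and filename below are placeholders, not from the original post):

    # Pull a shared rules file into httpd.conf or a vhost
    Include /etc/apache2/includes/block-spammers.conf

    # Apache 2.4+: include every .conf file in a directory,
    # without failing if the directory is empty
    IncludeOptional /etc/apache2/includes/*.conf

Note that Include only works in the server configuration and vhosts, not in .htaccess files, so this approach assumes you manage your own server.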
Log files can get filled up with repeated calls for files such as favicon.ico, robots.txt, images, CSS, JS etc.
This can be a pain when you need to scan the logs for issues and they are full of unimportant requests.
This is especially so if you use Ultimate Cron in Drupal and run cron every minute - the logs get swamped with the cron calls.
Mostly you want to log the initial request for a page and not all of the resources subsequently requested.
Troubleshooting other issues may mean you want to log files such as favicon.ico and images - but generally they needlessly fill up your logs.
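One common way to do this - sketched here for the vhost, since CustomLog cannot be set from a .htaccess file, and with an arbitrary 'dontlog' variable name - is to tag the noisy requests with mod_setenvif and exclude them from the access log:

    # Tag requests for static assets so they can be skipped
    SetEnvIf Request_URI "\.(gif|jpe?g|png|css|js|ico)$" dontlog
    SetEnvIf Request_URI "^/robots\.txt$" dontlog
    # Cron calls (the path depends on your Drupal version and cron setup)
    SetEnvIf Request_URI "^/cron\.php" dontlog

    # Write everything except the tagged requests to the access log
    CustomLog /var/log/apache2/access.log combined env=!dontlog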
There are certain PHP files that you want access to but don't want to make public.
Common examples of these are:
You also don't really want to deploy these on every site on a server, nor have them in the git repositories for your sites.
A neat way of dealing with this is to use rewriting in your web server config files (e.g. Apache, NGINX, IIS) to do the following:
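As a rough sketch of the rewriting idea (the filename adminer.php and the IP address are placeholders): let your own address through and return a 404 to everyone else, so the file's existence is not even advertised.

    RewriteEngine On
    # Anyone who is not at your IP gets a 404 for this file
    RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
    RewriteRule ^adminer\.php$ - [R=404,L]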
Last month I wrote a post about blocking spammers from a Drupal (or any Linux / Apache) site by identifying the originating IP address from the watchdog table.
By way of an update I thought I'd share a way you can do this using the Apache configuration. Ideally this would be done in the vhost/httpd.conf files if you manage your own server, but it works equally well in the .htaccess files that most people have access to on shared hosting.
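The blocking itself can look something like this (Apache 2.4 syntax; the addresses are placeholders standing in for the IPs pulled from the watchdog table):

    <RequireAll>
        Require all granted
        # Spammer IPs identified from the watchdog table
        Require not ip 198.51.100.23
        Require not ip 192.0.2.0/24
    </RequireAll>

On older Apache 2.2 servers the equivalent uses the Order, Allow and Deny directives instead.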
If you are developing commerce sites and review your logs regularly, chances are you will come across 404 errors looking for crossdomain.xml. We get a lot from plugins that look for coupons on e-commerce sites (e.g. Drop Down Deals). In fact you are likely to get them on any site you develop - but we have seen them more frequently on e-commerce sites.
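One low-effort way to quieten these down - a sketch; you could equally serve a minimal, restrictive crossdomain.xml instead - is to answer the probes with 410 Gone so they stop showing up as 404s:

    RewriteEngine On
    # Tell clients crossdomain.xml is deliberately absent (410 Gone)
    RewriteRule ^crossdomain\.xml$ - [G,L]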
A general housekeeping task for CMSs such as WordPress and Drupal (and indeed any website), and good practice for keeping your SEO healthy, is making sure you handle missing pages (404 errors) gracefully.
One of the routine tasks to carry out is checking for crawl errors in Google Webmaster Tools. If you see any missing pages in the list, it is worth making sure you have measures in place to handle them, and ideally issue a 301 redirect so that Google and other search engines update their indexes.
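For individual moved pages, a one-line mod_alias rule in the .htaccess is usually enough (both paths below are made up):

    # Permanently redirect the old URL to its replacement
    Redirect 301 /old-page https://www.example.com/new-page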
Whether you are running Drupal, WordPress, ExpressionEngine, Joomla or in fact any website, one of the regular tasks you should carry out is a bit of log analysis. Protecting your website is too often left to modules, plug-ins or someone else - until it is too late.
We all rely on Google Analytics to tell us about visitors, and maybe use log analysis software (AWStats, Webalizer etc.) to report on log entries - but it is always worth using tools locally to dig deeper into your logs. These can range from simple reports on accesses to your site to more detailed forensic analysis of site activity.
By doing this we get to know better how visitors are accessing our site and can uncover some interesting answers to questions such as:
- How often is Google actually spidering my site?
- How many errors am I getting and what are they?
- Who is stealing my content?
- Is anyone trying to crack my site?
In this post I will briefly cover some useful techniques to analyse your logs and see if anyone is abusing your hospitality.
Issue: in a Drupal site, a logged-in user gets a 403 page (access denied) if they browse to the user/login page. Well, I hear you ask, why would you want to see the login page if you are already logged in? Good question - but it is not really an error or an access-denial issue, is it? It is more a 'user path/flow' issue. A good solution would be to show the user their own profile page instead.
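One rough way to approximate that from the .htaccess (a heuristic sketch, not necessarily the post's solution: Drupal session cookies start with SESS, or SSESS over HTTPS, so the presence of one suggests a logged-in user):

    RewriteEngine On
    # If a Drupal session cookie is present, skip the login page
    # and send the visitor to their account page instead
    RewriteCond %{HTTP_COOKIE} (^|;\s?)S?SESS[0-9a-f]+= [NC]
    RewriteRule ^user/login$ /user [R=302,L]

This needs to sit before Drupal's standard rewrite rules, and the cookie check only shows a session cookie exists, not that it is still valid, so treat it as a convenience rather than access control.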
It is easy to forget that the files in your website are visible to anyone, even if they are not linked to or are not files normally requested. In this post we look at how to use the .htaccess file to control access to your site.
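For example, to stop direct requests for files that should never be served (Apache 2.4 syntax; extend the extension list to suit your site):

    # Deny direct access to backups, SQL dumps and editor leftovers
    <FilesMatch "\.(bak|sql|inc|swp)$">
        Require all denied
    </FilesMatch>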
One of the trends these days is to lose the www. from your domain name.
The arguments for are mostly usability (e.g. there is less typing to do, and it is easier not to have to say dubyadubyadubya every time you give out your web address).
There are counter-arguments, of course, to do with cookie control and sites that have sub-domains.
What is important, though, is that you choose one and stick to it; decide on one and do not allow both.
Supporting both will lead to duplicate content in Google's index, and you may suffer in SEO terms.
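Enforcing the non-www version from the .htaccess looks something like this (example.com is a placeholder; swap the condition and target around if you standardise on www):

    RewriteEngine On
    # Permanently redirect www.example.com to example.com
    RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]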
I use the .htaccess file a lot on hosted servers. On our own servers I prefer to use httpd.conf, as it performs better and is not re-evaluated on every request. But if you are on a hosted server the .htaccess file is your first port of call for handling incoming traffic, and it can be more efficient than using modules for certain tasks. One common gotcha is how to discard the query string for a redirect.
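The trick is that a bare ? at the end of the substitution discards the original query string (the URLs below are illustrative):

    RewriteEngine On
    # /search?q=anything -> /find, with the query string discarded
    RewriteRule ^search$ /find? [R=301,L]

    # On Apache 2.4+ the QSD (query string discard) flag does the same:
    # RewriteRule ^search$ /find [QSD,R=301,L]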
Quick note: on a site I worked on recently I made a change to the Pathauto settings and needed to create a load of redirects for the previous URLs. Fortunately there is a quick way to do this using wildcards in the .htaccess.
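RedirectMatch takes a regular expression, so a single rule can cover a whole family of old URLs (the paths below are hypothetical):

    # Send everything under the old /blog/ path to /news/, keeping the slug
    RedirectMatch 301 ^/blog/(.*)$ /news/$1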