Harmonising Link Prefetching And Logging

Also relates to PHP and Firefox and Co

As part of my ongoing redevelopment of this website, and in particular the weblog itself, I have been building a set of classes to provide visitor statistics. The objective is a simple, lightweight solution that summarises referrers, page counts, countries and similar information per request, without the overhead of trawling the Combined Log Format. With PEAR_DB providing the database abstraction, storing the data is a simple case of compacting all the informative variables and running an auto-execution statement. Of course, on the first test drive in Firefox, doubling up of log entries reared its head. This is due to the <link rel="next" href="" title=""/> element, which provides accessible navigation to the next entry. All the Mozilla browsers (since the days of Netscape 7.x and Mozilla 1.2) will send an HTTP request for the URI in the href attribute to improve performance, thus adding an extra, and misleading, log entry.

Fortunately there is a simple solution to this, which can be implemented in the PHP routine. The browser sends a custom header to tell the server that this is a prefetch request:

X-Moz: prefetch

So in the PHP routine, prefetched pages can be excluded with the following conditional:


if (isset($_SERVER["HTTP_X_MOZ"]) &&
    $_SERVER["HTTP_X_MOZ"] == "prefetch") {
  // do not log the request
  return false;      
}
else {
  // log the request
}
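The same test can be wrapped in a small helper so that the logging classes only need to ask one question. This is a minimal sketch rather than the code used on this site: the function name is_prefetch_request() is hypothetical, and passing the server array in as a parameter is an assumption made purely so the check can be exercised outside a live request.

```php
<?php
/**
 * Returns true when the request carries the Mozilla prefetch header.
 * is_prefetch_request() is a hypothetical helper name, not part of PHP.
 */
function is_prefetch_request(array $server): bool
{
    // PHP exposes the X-Moz request header as HTTP_X_MOZ.
    // The comparison ignores case and surrounding whitespace.
    return isset($server['HTTP_X_MOZ'])
        && strcasecmp(trim($server['HTTP_X_MOZ']), 'prefetch') === 0;
}
```

In the logging routine this would be called as is_prefetch_request($_SERVER), with the request logged only when it returns false.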

This discovery, all thanks to Live HTTP Headers, has been a great salvation to me, since the double logging had caused me many headaches in the past. Accurate logs are essential for good SEO and for providing valuable and relevant future content to visitors. But with the current growth in Mozilla-based browsers (Firefox in particular) I have found logs becoming more and more distorted. Neither removing relative links nor resorting to Internet Exploder is a desirable solution.

Information on Link Prefetching seems to be sparse but here are the main pages from Mozilla:

Unfortunately this solution does not resolve the general quibble I had with overstated logs and Webalizer. The last release of Webalizer actually preceded the introduction of the prefetch HTTP header. This is a hugely popular log analyzer that seems to be present across a wide range of web servers and hosting packages, which makes me wonder how many people are optimistically interpreting overstated logs. I suppose, with accessibility still on a gradual advance to the mainstream, and the link element only appearing on progressive and forward-thinking sites, probably not that many.

I have recently been test driving an excellent and very detailed alternative log file analyzer called AWStats. Well worth a look.

Posted on Jul 12, 2004 at 01:33:14. [Comments for Harmonising Link Prefetching And Logging- 1]

Next Link Misleads Logs

Also relates to Web Standards and Accessibility, Firefox and Co

This is a very annoying quirk that took me far too long to track down! When calling pages on my test platform, using a new caching system and a package of builder classes for parsing XML-based metadata, a second call for a page was being made to the server. In short, the page in question was the one defined as next in the relative links:

<link rel="next"
           href="/uri_of_page/"
           title="Description of the page"/>

At first I thought this was down to a coding error in one of the PHP classes. Fortunately browsing the HTTP headers and a timely flick through the HTML 4.01 Specification abated my growing frustration. The specification states the following for the next link type:

Refers to the next document in a linear sequence of documents. User agents may choose to preload the "next" document, to reduce the perceived load time. http://www.w3.org/TR/html401/types.html#type-links

Well, it is the second sentence that is clearly significant here. The second call for a page to the server abides by the above recommendation. In fact, on further scrutiny I discovered this problem is specific to the Mozilla family of browsers, which utilise the link toolbar (I did not take further time out to test on Opera).

I see a major conflict in this. The relative links have been adopted as important elements for accessible navigation and are a recommended technique for achieving several guidelines in the WCAG at level 2 and 3. Yet, as long as user agents abide by the above statement regarding the next link type, there is a conflict of interest between attaining accessibility and maintaining reliable web server logs! This is especially so with Webalizer, the most popular open source software for producing tabular and graphical statistics from Apache's Common Logfile Format. Each pre-loading call to the web server will register an additional page hit for that page in the access log. If the user then actually visits the page via the next link, a second page impression will be registered!

Fortunately, for once, Exploder is beneficial here, since it has no idea whatsoever about relative links, and therefore the above over-logging will not occur. Sadly this may explain why on a number of my clients' sites (this site included) Mozilla and co are falsely striding into the lead in the browser stakes! One solution may be to flush out the pre-loading entries by looking for calls for two separate pages from the same IP address within a second or two of each other. If I can find a minute, this may call for some Awk and Sed.
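The filtering idea above can be sketched in PHP rather than Awk and Sed. This is only an illustration under stated assumptions: the log has already been parsed into an array of page hits sorted by time, each hit is an array with the hypothetical keys ip, time (a Unix timestamp) and uri, and requests for images and stylesheets have already been excluded. The function name drop_probable_prefetches() and the two second default window are likewise assumptions.

```php
<?php
/**
 * Drops hits that look like pre-loading calls: a request for a different
 * page from the same IP address within $window seconds of the last hit
 * that was kept for that address.
 * Hypothetical helper; expects $hits sorted by ascending 'time'.
 */
function drop_probable_prefetches(array $hits, int $window = 2): array
{
    $lastKept = []; // ip => last hit kept for that address
    $kept = [];

    foreach ($hits as $hit) {
        $ip = $hit['ip'];

        if (isset($lastKept[$ip])
            && $hit['uri'] !== $lastKept[$ip]['uri']
            && $hit['time'] - $lastKept[$ip]['time'] <= $window) {
            continue; // probably a pre-load, so leave it out of the count
        }

        $lastKept[$ip] = $hit;
        $kept[] = $hit;
    }

    return $kept;
}
```

The heuristic is crude: several visitors behind one proxy share an IP address, so some genuine hits would be discarded as well.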

Posted on May 21, 2004 at 23:42:26. [Comments for Next Link Misleads Logs- 0]

View PHP Source

Also relates to PHP and DOM Scripting

Here is a cute little bookmarklet to tie in with the source file for viewing underlying PHP as outlined in the PHP Manual. The source file should be placed in the server root directory for the main server or a virtual host, and the Location directive set accordingly in the Apache configuration file (See the PHP Manual). This code snippet simply performs a regular expression replace to change the location.href to point to the source file.


javascript:var re=/\/\/([-a-z\.]+)\//i; _
window.location.href= _
window.location.href.replace(re,"//$1/source/");

This code can be pasted into a new bookmark ( _ just represents line continuation), or drag this PHP Source link onto your bookmarks. I find this quite useful when walking through a site on my test platform, and it complements the many client side developer utilities in the Firefox Web Developer toolbar and these comprehensive Web Development Bookmarklets.

Posted on Mar 27, 2004 at 04:49:49. [Comments for View PHP Source- 1]

PHP 5 On Demand

Also relates to PHP and DOS

For all its anachronistic tendencies, my Windows 98 test platform is surprisingly robust when a new challenge arises. I have been itching to try out the PHP 5 Release Candidate 1, and have had the binary on my desktop for the past few days waiting to be unzipped. My sole concern was finding an install that would not interfere with the current PHP 4 configuration and all the in-progress work that relies on it. After picking up a few tips from the articles already available on running PHP 4 and 5 concurrently, the solution was actually quite simple and very quick to set up.

Since I only boot Apache server on demand, all that was needed was a separate httpd configuration file to load the PHP 5 Apache module, with updated LoadModule (http://httpd.apache.org/docs/windows.html#cmdline) and AddModule (http://httpd.apache.org/docs/mod/core.html#addmodule) directives:


LoadModule php5_module c:/php5/php5apache.dll
AddModule mod_php5.c

Then it was just a case of handling the php.ini file. When PHP is run as a module in Apache, the ini file must be either located in the Apache root folder or the System root folder. I did not want to interfere with the current ini file for the PHP 4 configuration (already located in the System root folder), and wanted to avoid performing a rename/copy/paste action every time Apache was booted with PHP 5.

So I placed the PHP 5 ini file in the Apache root folder (read before the System root folder) under a different name, and set up the following batch file to rename it before booting Apache with the alternate httpd configuration file:


@ECHO OFF
CLS
CD "C:\Program Files\Apache Group\Apache"
RENAME php.ini.v5 php.ini
"C:\Program Files\Apache Group\Apache\Apache.exe" _
  -f "C:\Program Files\Apache Group\Apache\conf\httpd.php5.conf"

The reverse is required when Apache is shut down, to ensure the version 5 ini file is not called next time Apache is booted with version 4:


@ECHO OFF
CLS
CD "C:\Program Files\Apache Group\Apache"
apache -k shutdown
RENAME php.ini php.ini.v5
EXIT

Finally, with a shortcut to each of these two batch files placed somewhere nice and accessible, I can now revel in the delights of the newly improved OOP and XML features and the built-in SQLite database, and revert to PHP 4 when work needs to be completed.

Posted on Mar 27, 2004 at 04:48:14. [Comments for PHP 5 On Demand- 0]

PHP Session Management

Also relates to PHP

I have rarely been content with session management in PHP. Since HTTP is stateless, a session must be maintained on a login site. The optimal way to achieve this is to place a session cookie, in the form of a session ID (a 32-character alphanumeric string), on the client. When the client has cookies disabled, PHP appends the session ID to the end of each URI on the page. This latter practice comes with inherent complications. Server redirects must have the session ID hard coded into the source, which I sometimes find complicated to manage in complex validation and redirection scripts over multiple pages. Security Focus also highlights the major vulnerability of passing the session ID in the URI for PHP versions prior to 4.3.2.

When testing with cookies disabled I have also found that browsing Back a number of pages can inadvertently destroy the session. If not coded carefully, the outcome can be a blank page, leaving the user confused and likely to just leave the site. My opinion is that the safest and most usable solution is to only allow the login process to proceed if cookies are enabled. If they are not, a message is presented to the user with recommendations and advice on enabling session cookies and the implications of doing so.

So, the challenge is creating a user-friendly process, where the user is informed immediately if they are unable to log in, and directions are given on enabling session cookies. The problem is that when a page first loads, the cookie is only sent to the client and is not returned until the following request, so there is no way of knowing if the cookie has actually been set, and hence whether cookies are enabled. The following code snippet resolves this.


if (! isset($_COOKIE["PHPSESSID"]) && ! isset($_GET["PHPSESSID"]))
{
  // first visit: redirect with the session ID in the URI
  $sess_id = substr(SID,(strrpos(SID,"=")+1));
  $redirect_url = CMS . "init.session_" . $sess_id;
  header("Location: " . $redirect_url);
  exit;
}
elseif (! isset($_COOKIE["PHPSESSID"]) && isset($_GET["PHPSESSID"]))
{
  // ID came back in the URI but not as a cookie
  $cookies_disabled = true;
}
else
{
  $cookies_disabled = false;
}

First the code tests for the cookie, and if it does not exist it looks for the PHPSESSID variable in the global $_GET array. If both tests fail, then the user has arrived at the page for the first time, so it is not possible to tell if cookies are enabled or not. So, a server redirect is done to the same page (here defined by CMS in the initialisation file) with a custom string appended to the end. A custom string is used to enhance security, and the Apache mod_rewrite module is used to rebuild the URI as follows:


RewriteEngine On
RewriteRule init\.session_([a-z0-9]{32})$ /cms/?PHPSESSID=$1
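Because the rule above only matches a 32-character lowercase alphanumeric ID, it may be worth checking the session ID against the same pattern before sending the redirect, since the length and character set of PHP session IDs depend on the PHP version and configuration. The following is a sketch only: init_session_url() is a hypothetical helper, and its base argument stands in for the CMS constant used earlier.

```php
<?php
/**
 * Builds the redirect target used for the first visit, or returns null
 * when the session ID would not be matched by the rewrite rule.
 * init_session_url() is a hypothetical helper name.
 */
function init_session_url(string $base, string $sessId): ?string
{
    // Same character class and length as the RewriteRule pattern.
    if (preg_match('/\A[a-z0-9]{32}\z/', $sessId) !== 1) {
        return null;
    }

    return $base . 'init.session_' . $sessId;
}
```

A null result signals that the redirect would not be rewritten, so the script can fall back to another approach instead of sending the visitor to a missing page.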

When the page reloads it runs the same test again. This time PHPSESSID is present in the query string, so the first branch is skipped, and the second branch is only reached if the cookie is still missing, that is, if cookies are disabled. At this point a notice can be displayed to the user advising them that cookies must be enabled to actually log in. To enhance the usability, if $cookies_disabled is true, a class is added to the login form, which overrides the normal CSS declarations, disabling the input fields and striking out the corresponding labels. To aid users of assistive technologies, a title is also appended to each label stating that the fields are disabled because cookies are required.

Perhaps this is common practice, but I recall in the past having trouble finding useful information for PHP session management best practices, and this seems to be an effective approach. The whole process is transparent to the user, and incorporating mod_rewrite to cloak the session id raises the level of security for the initial server redirect in which it is passed, as well as replacing the aesthetically displeasing URI string:

http://mysite/login/?PHPSESSID=sessionid

with the more comprehensible:

http://mysite/login/init.session_sessionid

If the initialisation variable session.name is changed to something other than PHPSESSID (or session_name(string) is called) then the URI becomes much harder to hack.
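If the session name is changed, the hard-coded PHPSESSID keys in the test would have to change with it. One way to avoid that, sketched here under assumptions, is to pass the name in. The function cookie_state() and its three return values are hypothetical, and the cookie and query arrays are parameters only so the logic can be tried without a live request.

```php
<?php
/**
 * Classifies the request using the same three-way test as the snippet
 * earlier in this entry, but for any session name.
 * Returns 'enabled', 'disabled' or 'unknown' (hypothetical labels).
 */
function cookie_state(array $cookie, array $get, string $name): string
{
    if (isset($cookie[$name])) {
        return 'enabled';   // the browser returned the session cookie
    }

    // No cookie: if the ID came back in the query string the cookie was
    // refused; otherwise this is the first visit and nothing is known.
    return isset($get[$name]) ? 'disabled' : 'unknown';
}
```

It would be called as cookie_state($_COOKIE, $_GET, session_name()), since session_name() called without arguments returns the current name. The query string key written by the rewrite rule would need to use the same name.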

Posted on Nov 10, 2003 at 02:19:13. [Comments for PHP Session Management- 6]
