A few sites I administer have recently had the misfortune of having spambots visit their enquiry and contact pages. These pages usually have a contact form, where an enquirer can leave their name, e-mail, and request or comment. When they submit the form, a copy of the message is e-mailed to the site owner. The spambots try to submit messages that usually contain gibberish along with multiple URLs to spam sites. Something had to be done to prevent site owners from receiving hundreds of spam messages a day.
I considered a few methods for preventing the bots from visiting the enquiry page. These included firewall configuration, user-agent detection, rudimentary parsing of the messages, captcha systems, and so forth. These methods were either too cumbersome to implement, could be circumvented, or spoiled the user experience for a genuine user. The latter was a critical concern.
Enter Akismet
The Akismet API is an open API used to assess the spam score of comments left or enquiries made on a site. It is in widespread use as a plugin for WordPress blogs, and its effectiveness has made it a must-have plugin for WordPress installations. The Akismet API, however, can be applied to any site or application capable of making HTTP requests.
First you need an API key. You can obtain one by registering for a WordPress.com user account (you do not need to have an active WordPress blog). Your API key will be e-mailed to you once you have activated your account.
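To give an idea of what "capable of making HTTP requests" means in practice, here is a minimal sketch of the data a spam check sends. It assumes Akismet's REST endpoint of the form https://your-api-key.rest.akismet.com/1.1/comment-check and its documented field names; the two helper functions and the sample enquiry values are hypothetical, and the libraries described further down handle all of this for you.

```php
<?php
// Hypothetical helper: builds the URL and POST body for a comment-check call.
// The endpoint format and field names are assumptions based on Akismet's REST API.
function akismet_build_check_request($api_key, $site_url, array $enquiry)
{
    $fields = array(
        'blog'                 => $site_url,
        'user_ip'              => $enquiry['ip'],
        'user_agent'           => $enquiry['user_agent'],
        'comment_author'       => $enquiry['name'],
        'comment_author_email' => $enquiry['email'],
        'comment_content'      => $enquiry['message'],
    );

    return array(
        'url'  => 'https://' . $api_key . '.rest.akismet.com/1.1/comment-check',
        'body' => http_build_query($fields), // URL-encoded form data
    );
}

// Hypothetical helper: comment-check is assumed to answer with the plain
// text "true" (spam) or "false" (not spam).
function akismet_response_is_spam($response_body)
{
    return trim($response_body) === 'true';
}

// Sample values only; in a real script these come from the submitted form.
$request = akismet_build_check_request('xxxxxxxxxxxx', 'http://www.mysite.com/', array(
    'ip'         => '203.0.113.7',
    'user_agent' => 'Mozilla/5.0',
    'name'       => 'Jane Doe',
    'email'      => 'jane@example.com',
    'message'    => 'Please send me a quote.',
));
```

The request itself is an ordinary HTTP POST of $request['body'] to $request['url'], which is why any language with an HTTP client can use the service.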
Akismet and PHP5
Download the PHP5 Akismet Library.
Extract the contents of the downloaded package and place them in a location that your application can access when required.
Here's how to use the Akismet API in your PHP5 code.
require_once('Akismet.class.php');

$API_key = 'xxxxxxxxxxxx';
$source_url = 'http://www.mysite.com/contact.php';

$akismet = new Akismet($source_url, $API_key);
$akismet->setCommentAuthor($enquirer_name);
$akismet->setCommentAuthorEmail($enquirer_email);
$akismet->setCommentContent($enquiry);

if ($akismet->isCommentSpam()) {
    // Enquiry is spammy - log it for later review by the site owner.
    // If it turns out to be a false positive, be sure to submit it to Akismet
    // so that it can learn from its mistake. Use Akismet::submitHam()
} else {
    // Enquiry is not spammy - e-mail it to the site owner.
    // If it turns out to be spam after all (a false negative), be sure to submit
    // it to Akismet so that it can train itself better. Use Akismet::submitSpam()
}

// Below are other Akismet methods that you could call before isCommentSpam()
$akismet->setCommentAuthorURL($enquirer_url);
$akismet->setCommentType($enquiry_type); // 'blank', 'comment', 'trackback', 'pingback', or a custom type
$akismet->setPermalink($url); // A permanent URL for the resource on which the comment is being left
Akismet and PHP4
Download the PHP4 Akismet library. Extract the contents of the downloaded package and place them in a location that your application can access when required.
Here's how to use the Akismet API in your PHP4 code.
require_once('Akismet.class.php');

$API_key = 'xxxxxxxxxxxx';
$source_url = 'http://www.mysite.com/contact.php';

$comment = array(
    'author' => $enquirer_name,
    'email' => $enquirer_email,
    'website' => $enquirer_uri,
    'body' => $enquiry,
    'permalink' => $this_page_uri,
    'user_ip' => $referrer_ip, // optional, defaults to $_SERVER['REMOTE_ADDR']
    'user_agent' => $client_ua, // optional, defaults to $_SERVER['HTTP_USER_AGENT']
);

$akismet = new Akismet($source_url, $API_key, $comment);

// Test for errors before submitting to Akismet
if ($akismet->errorsExist()) {
    if ($akismet->isError('AKISMET_INVALID_KEY')) {
        //...
    } elseif ($akismet->isError('AKISMET_RESPONSE_FAILED')) {
        //...
    } elseif ($akismet->isError('AKISMET_SERVER_NOT_FOUND')) {
        //...
    }
} elseif ($akismet->isSpam()) {
    // Enquiry is spammy - log it for later review by the site owner.
    // If it turns out to be a false positive, be sure to submit it to Akismet
    // so that it can learn from its mistake. Use Akismet::submitHam()
} else {
    // Enquiry is not spammy - e-mail it to the site owner.
    // If it turns out to be spam after all (a false negative), be sure to submit
    // it to Akismet so that it can train itself better. Use Akismet::submitSpam()
}
If you are dealing with multiple users in different timezones or simply want to display times in a timezone other than your server's settings, it is best to store timestamps as their UTC (~ GMT) equivalents. When you read those timestamps later, you can convert them to local time.
Local time to UTC time
date_default_timezone_set sets the default timezone for all date & time operations.
gmmktime is analogous to mktime, except that it treats the date and time values passed to it as GMT (UTC) rather than local time when creating the timestamp.
gmdate is similarly analogous to date, except that it formats a timestamp as a UTC (GMT) date & time rather than a local one.
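A minimal sketch of how these three functions fit together, assuming the local timezone is Australia/Sydney and using an arbitrary sample date (neither is prescribed by anything above):

```php
<?php
// Interpret local date & time values as Sydney time (sample timezone).
date_default_timezone_set('Australia/Sydney');

// mktime() reads its arguments as local time: 14 May 2007, 09:30:00 in Sydney.
// Argument order is hour, minute, second, month, day, year.
$ts_utc = mktime(9, 30, 0, 5, 14, 2007);

// gmmktime() reads its arguments as UTC, so passing the UTC equivalent of
// that moment (13 May 2007, 23:30:00) produces the same timestamp.
$same_instant = gmmktime(23, 30, 0, 5, 13, 2007);

// gmdate() formats the timestamp as a UTC date & time, e.g. for storage.
$utc_string = gmdate('Y-m-d H:i:s', $ts_utc);
```

Sydney is 10 hours ahead of UTC in May, so $utc_string holds the previous evening's date and time.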
UTC time to Local time
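$ts_utc = read_timestamp_from_db(); // Some custom function in your script
date_default_timezone_set('Australia/Sydney');
$offset = date('Z', $ts_utc); // Offset of the default timezone from UTC, in seconds
$ts_local = $ts_utc + $offset; // Format this with gmdate() so that the offset is not applied twice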
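As a rough check of the conversion, the sketch below compares the manual offset approach with simply letting date() apply the default timezone. The sample timestamp stands in for a value read from storage and is an assumption for illustration only.

```php
<?php
// Sample stored value: 13 May 2007, 23:30:00 UTC.
$ts_utc = gmmktime(23, 30, 0, 5, 13, 2007);

date_default_timezone_set('Australia/Sydney');

// date('Z') gives the default timezone's offset from UTC in seconds
// for the moment represented by the timestamp.
$offset = (int) date('Z', $ts_utc);

// Manual approach: shift the timestamp, then format it without any
// further timezone adjustment by using gmdate().
$ts_local = $ts_utc + $offset;
$local_manual = gmdate('Y-m-d H:i:s', $ts_local);

// Alternative: leave the timestamp alone and let date() apply the
// default timezone while formatting.
$local_direct = date('Y-m-d H:i:s', $ts_utc);
```

Both strings come out the same. Formatting the shifted $ts_local with date() instead of gmdate() would add the offset a second time.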
The following is a list of HTTP Status Codes returned by a web server and what they mean. The information is summarised from RFC2616 Section 10.
Informational 1xx
Informational responses that contain no message bodies.
100 Continue - The client should continue with the request. The server has received the initial part of the request and has not rejected it, so the client should proceed with transmitting the rest of the request.
101 Switching Protocols - sent if the client requests a switch of protocols (e.g. from HTTP/1.0 to HTTP/1.1) and the server can comply.
Successful 2xx
Client's request was successfully received, understood and accepted.
200 OK - The request was successful and here is the result of the request.
- GET -> entity corresponding to the requested resource
- POST -> entity describing or containing the result of the request
- HEAD -> entity-header fields for the requested resource (no message-body)
- TRACE -> an entity containing the request message as received by the end server
201 Created - A new resource has been created in response to the request. The URI of the new resource is contained in the Location header field. Additional URIs may be defined in the message-body.
202 Accepted - The request is in queue to be processed or is currently being processed. HTTP has no facility for sending a follow-up status once processing finishes, so the entity body should contain information such as: a URI to go to check status, estimated time of completion, number in queue, etc.
203 Non-Authoritative Information - "The returned metainformation in the entity-header is not the definitive set as available from the origin server, but is gathered from a local or a third-party copy."
204 No Content - The server has fulfilled the request, but does not need to return a message-body. Meta information may be included in the entity header. If the client is a user agent, it should not change the current document view.
205 Reset Content - The server has fulfilled the request, and the user agent should reset the document view (e.g. clear all form field input values).
206 Partial Content - If the Range or If-Range header fields were used, the request returns partial content corresponding to the requested range.
Redirection 3xx
User agent needs to take further action to fulfil the request.
300 Multiple Choices - includes an entity containing a list of resource characteristics and locations from which the user-agent makes the appropriate selection. The entity format is determined by the value of the Content-Type header field.
301 Moved Permanently - indicates that the requested resource has been permanently moved to a new location. The new location should be included in a Location field.
302 Found - The requested resource resides temporarily at another location. The temporary location should be indicated in the Location field.
303 See Other - The response to the request can be found at another location and should be retrieved using a GET method. Used for a POST-activated script to redirect the user-agent to a status (or similar) page. New location should be indicated in the Location field.
304 Not Modified - sent in response to a conditional GET request when the resource has not been modified. Must not contain a message-body.
305 Use Proxy - the requested resource must be accessed via the proxy indicated in the Location field.
307 Temporary Redirect - similar to the 302 code. Temporary location is indicated in the Location field.
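As an illustration of the 303 case, here is a minimal sketch of a POST-activated script redirecting the user agent to a status page. The helper function names and the thanks.php URL are hypothetical.

```php
<?php
// Hypothetical helper: the header lines that make up a 303 redirect.
function see_other_headers($location)
{
    return array(
        'HTTP/1.1 303 See Other',  // status line
        'Location: ' . $location,  // where the user agent should issue its GET
    );
}

// Hypothetical helper: sends the lines with header().
// Must be called before the script produces any output.
function send_headers(array $lines)
{
    foreach ($lines as $line) {
        header($line);
    }
}

// Typical use at the end of a form handler, once the POST has been processed:
//   send_headers(see_other_headers('http://www.mysite.com/thanks.php'));
//   exit;
```

Because the user agent follows the redirect with a GET, reloading the resulting page does not resubmit the form.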
Client Error 4xx
4xx errors indicate an error that occurred at the client's end (e.g. malformed URI, bad request, etc).
400 Bad Request - the request contained malformed syntax.
401 Unauthorized - request requires authorization. The client may repeat the request with a suitable Authorization header field. If an Authorization field was included, the 401 response indicates that the authorization failed.
403 Forbidden - the resource cannot be accessed even with authorization.
404 Not Found - the server could not find anything matching the request URI.
405 Method Not Allowed - the method specified in the Request-Line is not allowed for the requested resource.
406 Not Acceptable - the resource is only capable of generating response entities with a Content-Type that is different to what is indicated in the accept header fields.
407 Proxy Authentication Required - the client must authenticate itself with the proxy.
408 Request Timeout - client did not produce a request within the time that the server was prepared to wait.
409 Conflict - request could not be completed due to a conflict with the current state of the resource. For example, race conditions with a PUT request.
410 Gone - the resource is no longer located at the URI and a new location is not known.
411 Length Required - Content-Length field must be included.
412 Precondition Failed
413 Request Entity Too Large - request entity is larger than what the server is willing to handle.
414 Request-URI Too Long
415 Unsupported Media Type - entity of the request is in a format not supported by the requested resource for the resource method.
416 Requested Range Not Satisfiable
417 Expectation Failed - expectation in the Expect field could not be met.
Server Error 5xx
An error occurred on the server's side.
500 Internal Server Error
501 Not Implemented - server does not recognise the request method OR cannot support the request method for the requested resource.
502 Bad Gateway - the server while acting as a proxy received an invalid response from an upstream server.
503 Service Unavailable - server is currently unable to handle the request. Length of the delay can be indicated in the Retry-After header field.
504 Gateway Timeout - the server while acting as a proxy did not receive a timely response from an upstream server.
505 HTTP Version Not Supported
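Since the first digit of a status code identifies its class, a script that handles responses can often branch on the class alone. A minimal sketch, with a hypothetical function name:

```php
<?php
// Hypothetical helper: maps a status code to the name of its class,
// using the five classes listed above.
function http_status_class($code)
{
    $classes = array(
        1 => 'Informational',
        2 => 'Successful',
        3 => 'Redirection',
        4 => 'Client Error',
        5 => 'Server Error',
    );

    // The class is determined by the hundreds digit of the code.
    $first_digit = (int) floor($code / 100);

    return isset($classes[$first_digit]) ? $classes[$first_digit] : 'Unknown';
}

$label = http_status_class(503); // 'Server Error'
```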
To prevent the caching of a web page on your client's end, use the following snippet of PHP to ensure that the appropriate HTTP headers are sent.
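A minimal sketch of such a snippet, assuming the two headers in question are Cache-Control and Expires. The function names and the exact header values are illustrative choices.

```php
<?php
// Hypothetical helper: the two header lines, in the order they are sent.
function no_cache_headers()
{
    return array(
        'Cache-Control: no-cache, must-revalidate', // do not reuse a cached copy
        'Expires: Thu, 01 Jan 1970 00:00:00 GMT',   // a date well in the past
    );
}

// Hypothetical helper: sends the headers.
// Must be called before the script produces any output.
function send_no_cache_headers()
{
    foreach (no_cache_headers() as $line) {
        header($line);
    }
}

// Typical use, at the very top of the page script:
//   send_no_cache_headers();
```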
The first header tells the client that the page must not be cached.
The second header is a backup, and tells the client that the page expired a long time ago, so it should fetch a more recent version.
The same effect can be achieved by placing corresponding meta tags in your HTML document.
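A minimal sketch of generating such meta tags from PHP, assuming the http-equiv form of the Cache-Control and Expires directives. The function name and the directive values are illustrative.

```php
<?php
// Hypothetical helper: builds meta tags carrying the cache directives,
// for output inside the document's <head> element.
function no_cache_meta_tags()
{
    $directives = array(
        'Cache-Control' => 'no-cache, must-revalidate',
        'Expires'       => 'Thu, 01 Jan 1970 00:00:00 GMT',
    );

    $tags = array();
    foreach ($directives as $name => $value) {
        // Escape both parts so they are safe inside HTML attributes.
        $tags[] = '<meta http-equiv="' . htmlspecialchars($name)
                . '" content="' . htmlspecialchars($value) . '">';
    }

    return implode("\n", $tags);
}

$meta_html = no_cache_meta_tags();
```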
Where possible, place your cache control directives in the HTTP header, because some clients and proxies rely on your server's HTTP response to determine caching.