Category Archives: World Wide Web

Net neutrality should be about user experience

I just read this article about how Cisco believes that net neutrality rules need to allow for bandwidth shaping.

I believe they’re missing the point entirely.

Right now the issue is that infrastructure owners are playing games with the prioritization of bits, in order to provide leverage for charging tolls to content providers (I’m coining the word “trollboothing”, if it doesn’t exist already, to describe this). The result is a loss for consumers of content, because their internet experience is degraded (sometimes severely).

WordPress comments on home page

I recently built a modified version of WordPress’s Twenty Eleven theme for a graphic designer on behalf of their customer.

One of the things that the customer wanted was to put the “comments” form onto the bottom of each post on the home page of the blog.

That turned out to be pretty easy to do. In the theme’s index.php file, just add the following line of code under the part that outputs the post content:

 <?php comments_template( '', true ); ?>
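
For context, this is roughly where that call ends up sitting in Twenty Eleven’s index.php – a simplified sketch of the loop, not the full template markup:

<?php if ( have_posts() ) : ?>
    <?php while ( have_posts() ) : the_post(); ?>
        <?php get_template_part( 'content', get_post_format() ); ?>
        <?php comments_template( '', true ); // show the comments and form under each post ?>
    <?php endwhile; ?>
<?php endif; ?>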

The problem, as we discovered, is that when users replied to comments, the new comments were all appearing under the first post on the page.

My initial guess – which turned out to be wrong – was that there was some kind of bug in WordPress which had recently been introduced.

I dug a little deeper and looked at the stock comment-template.php file that comes with WordPress. When it builds the “reply-to” links, it creates a relative link based on the current page (normally the blog post page itself), and it also adds some JavaScript that keeps the user on the page and attaches the reply form to the correct comment.

The problem was that the necessary JavaScript wasn’t being included on the home page.

The fix consisted of adding the following line at the top of the index.php file:

<?php wp_enqueue_script( 'comment-reply' ); ?>
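
For what it’s worth, the more conventional way to load that script is from the theme’s functions.php, hooked onto wp_enqueue_scripts – something along these lines (the mytheme_ function name is just a placeholder):

<?php
// Because this theme shows comment forms on the home page as well as on
// single posts, enqueue comment-reply whenever threaded comments are enabled.
function mytheme_enqueue_comment_reply() {
    if ( get_option( 'thread_comments' ) ) {
        wp_enqueue_script( 'comment-reply' );
    }
}
add_action( 'wp_enqueue_scripts', 'mytheme_enqueue_comment_reply' );
?>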

That still leaves a side issue – what happens if a user has javascript disabled?

The more complete fix would be to add a custom comments template file to the theme, with a change that includes the post ID in the reply link URL when necessary. The side issue with that (not really a big deal, but anyhow) is that you would then need to modify any other code in the theme that calls the comments template, to make sure it calls the new file instead.

i.e. search for:

<?php comments_template( '', true ); ?>

and replace with:

<?php comments_template( '/newcomments-template.php', true ); ?>

Not a big deal, but annoying.

Ideally though, this is something that WordPress should fix in their default code and/or something that theme developers should include by default.

Yahoo should merge with RIM

I’ve been bouncing this idea off people at both companies for the past week, with mixed feedback. I think the idea could work, though. I’m interested in hearing more feedback.

The two companies are roughly the same size, so this would be a merger of equals.

It provides some temporary band-aid solutions for both companies’ executive teams and boards (I think there’s enough talent at the top, between the two companies, to address some of the gaps).

Yahoo! (correct me if I’m wrong) was part of the team that bought Nortel’s patents, so there’s already some kind of mobile intent. And RIM looks like it could use some bolstering.

The real rationale is fairly simple though – the combined company would have a number of options for strategic direction, and would be large enough to stand on its own if it so chose.

If it decided to sell out (hint: Microsoft), the combined patent portfolio (in addition to Y!’s advertising business) would ensure a far more equitable price.

When will virtual currencies be useful?

I currently have small amounts of money floating around in a variety of virtual currencies. In some cases, I can convert those currencies into other virtual currencies or into real-world money (e.g. there’s a slow process for moving PayPal money into my bank account).

It occurs to me though that it would be very useful if I could pay real world bills (think groceries or mortgage) directly using virtual currency.

Before that can happen, there would need to be a lot more transparency (no grocery store will accept magicbuxx if it doesn’t know what they’re worth, or whether it can in turn get value out of them), and a whole lot of big institutions like banks and payment portals would need to sign on too. There would also need to be physical mechanisms that can transfer the payments (the new mobile payment technology that is slowly being adopted by cellphone manufacturers would be helpful here).

I wonder how we can make that happen. It would be very nice to be able to go to a restaurant with a pile of Facebook credits, or Bitcoins.

Heavy Traffic – Lessons Learned

In the past 15 or 16 years, I’ve worked on a number of websites that had fairly significant traffic (mostly measured in unique daily visitors – there are many ways to measure traffic). In one specific case, the traffic on a well-known author’s website spiked significantly (several thousand unique visitors per day) after his appearance on a television talk show. The website, although database driven, primarily consisted of articles, along with a store – and even on shared hosting, this wasn’t a problem.

Recently, my company built an online “live auction” website for a customer, a project which posed a number of interesting challenges and learning experiences (the hard way, of course) regarding how to build a site that has heavy traffic. In this case, the nature of the website requires that all users see information that is current and accurate – resulting in a need for AJAX calls that run repeatedly on a per second basis per user. This project is the first one that I have worked on that required serious optimization work; typically even the heaviest custom development that my team works on is primarily focused on business use cases rather than things like speed or algorithm design; not so here.

The “coming soon” page, long before the site was launched, already received several hundred unique visitors per day (based on Google Analytics). The site launched with more than 500 registered users (pre-registration via the coming soon page), and traffic spiked heavily following launch. The initial traffic spike actually forced the site to close for several days, in order for our team to rework code. The re-launch was preceded by several Beta tests that involved registered users. Bear in mind that a registered user on most sites isn’t individually responsible for much server load. On this particular site, each user is receiving at least one update per second, each of which may involve multiple database calls.

The following is a description of some of the issues we encountered, and how they were addressed or mitigated. In some cases, work is ongoing, in order to adapt to continued growth. In many cases, the challenges that we encountered forced me to revise some assumptions I had held about how to approach traffic. Hopefully the following lessons will save a few people the weeks of sleep deprivation that I went through in order to learn them.

Project Description:

  • Penny Auction website
  • Technology: PHP (Zend Framework), Javascript
  • Server: Various VPS packages (so far)
  • Description of traffic: every user receives one data update per second, with additional data updates every 3 seconds and once per minute (a rough sketch of this polling pattern follows below)
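
To give a sense of what that looks like from the browser’s side, here is a stripped-down sketch of the once-per-second polling loop (the endpoint URL and handler body are placeholders, not the project’s actual code):

// Poll the server once per second for the current auction state.
setInterval(function () {
    jQuery.ajax({
        url: '/gateway/status.php', // a lightweight "gateway" script (see point 5 below)
        dataType: 'json',
        success: function (data) {
            // update prices, countdown timers, etc. on the page
        }
    });
}, 1000);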

1. Don’t Rely Too Much On Your Server

Many web developers write code that simply assumes the server will work properly. The problem is that under heavy load, it isn’t at all uncommon for servers to misbehave in unexpected ways. Examples include dropped file resources, dropped database calls (sometimes without intelligible error codes), and even unreliable system time. The following are a couple of specific examples we encountered:

a) PHP time() – When developing in PHP, it is very common to rely on function calls such as time() (which returns the system time as a UNIX timestamp) for algorithms to work properly. Our setup involved a VPS with multiple CPUs dedicated to our use, and the ability to “burst” to more CPUs as needed. As it turned out, whenever our server went into burst mode, the additional CPUs reported different system times than “our” CPUs did. This is probably an issue with the underlying VPS software, but we didn’t have the luxury of investigating fully. It meant that rows were frequently (as in: about one quarter of the time) saved to the database in the wrong order, which is a serious issue for an auction website! When possible, have the database assign the timestamp within the SQL itself (e.g. MySQL’s NOW() or a TIMESTAMP column default) instead. Fixing the system time on the other VPS partitions wasn’t feasible, since they “belonged” to a different customer.
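
To illustrate the workaround (the table and column names are invented for this example, and $db is assumed to be a PDO connection):

<?php
// Let MySQL assign the timestamp, rather than trusting PHP's time()
// on a VPS whose burst CPUs may report a different system time.
$stmt = $db->prepare(
    'INSERT INTO bids (auction_id, user_id, amount, placed_at)
     VALUES (?, ?, ?, NOW())'
);
$stmt->execute( array( $auctionId, $userId, $amount ) );
?>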

b) Not every database call will succeed. Under heavy load, it isn’t at all unusual for a SQL insert or update statement to be dropped. Unless your code checks for errors and handles retries properly, your site will not work reliably.
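
A rough sketch of what that checking can look like (assuming a mysqli connection in $db; the query and the three-retry limit are arbitrary examples):

<?php
$attempts = 0;
do {
    $ok = $db->query( "UPDATE auctions SET status = 'closed' WHERE id = 42" );
    if ( $ok ) {
        break;
    }
    $attempts++;
    usleep( 50000 ); // back off for 50ms before retrying
} while ( $attempts < 3 );

if ( ! $ok ) {
    // Silently losing a write is what breaks the site - at least log it.
    error_log( 'Update failed after retries: ' . $db->error );
}
?>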

2. Pick Your Hosting Company Wisely

We launched the project on one of our hosting company’s smaller VPS packages. We quickly went to one of the middle-range packages, discovered it was also insufficient, and then switched to the largest package that they offer.

In the process, we also entered a number of second tier or higher tickets into their system, including serious operating system level problems.

Luckily, we chose a hosting company that responds quickly to issues, and whose staff are familiar with the types of issues we encountered.

This isn’t something to take for granted. Not every hosting company has the ability to quickly and seamlessly transition a site through different packages on different servers, nor do they necessarily have tier 3 support staff who can address unusual support requests.

In this case, our conversations with the company seem to indicate that they have never seen a new site with this level of load in the past; they still worked valiantly to assist us in keeping things running.

3. Shared Hosting, VPS, Dedicated, Cloud Hosting?

In our previous experience, when a hosting company sells somebody a dedicated server, the assumption is that the customer knows what they are doing and can handle most issues themselves. This holds even where an SLA (service level agreement) is in place, and it can seriously affect response times for trouble tickets.

As a result, our first inclination was to use a VPS service. Our decision was further supported by the level of backup provided by default with VPS packages at our chosen vendor. A similar backup service on a dedicated server of equivalent specifications appeared to be much more expensive.

One of the larger competitors of our customer’s site currently runs under a cloud hosting system. We are continuing to look at a variety of “grid” and cloud hosting options; the main issue is that it is extremely hard to estimate the monthly costs involved in cloud hosting, without having a good handle on how much traffic a site will receive. It isn’t unusual for hosting costs to scale in such a way as to make an otherwise profitable site lose money. That said, we will likely have to transition over to a cloud hosting service of some kind at some point in time.

4. Database Keys Are Your Friend

At one point, we reduced server load from over 100% down to around 20% by adding three keys (indexes) to the database. This is easy for many web developers to overlook (yes, I know – serious “desktop” application developers are used to thinking about this stuff).
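
The change itself is trivial once you know which columns your hottest queries filter and sort on. Something like the following (table and column names made up for illustration):

ALTER TABLE bids ADD INDEX idx_auction_created (auction_id, created_at);

Running the slowest queries through MySQL’s EXPLAIN is a quick way to spot the ones doing full table scans, and therefore which columns are missing keys.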

5. Zend Framework is Good For Business Logic – But It Isn’t Fast

We initially built the entire code base using Zend Framework 1.10. Using Zend helped build the site in a lot less time than it would otherwise have taken, and it also allows for an extremely maintainable and robust code base. It isn’t particularly fast, however, since there’s significant overhead involved in everything it does.

After some experimentation, we removed any code that supported AJAX calls from Zend, and placed it into a set of “gateway” scripts that were optimized for speed. By building most of the application in Zend, and moving specific pieces of code that need to run quickly out of it, we found a compromise that appears to work – for now.
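
To make the idea concrete, here is a minimal sketch of what one of those gateway scripts looks like – the file name, connection details and query are placeholders rather than the project’s actual code:

<?php
// gateway/status.php - answers the once-per-second AJAX poll without
// loading the Zend Framework bootstrap.
header( 'Content-Type: application/json' );
header( 'Cache-Control: no-cache, no-store, must-revalidate' );

$db = new mysqli( 'localhost', 'db_user', 'db_pass', 'auction_db' );

$result = $db->query(
    "SELECT id, current_price, seconds_left FROM auctions WHERE status = 'running'"
);

$rows = array();
while ( $row = $result->fetch_assoc() ) {
    $rows[] = $row;
}

echo json_encode( $rows );
?>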

The next step appears to be to build some kind of compiled daemon to handle requests that need speed.

6. Javascript

Our mandate was to support several of the more common browsers currently in use (mid-2010), including Firefox, IE7-9, Opera, and – if feasible – Safari.

The site is extremely Javascript-intense in nature, although the scripting itself isn’t particularly complex.

We used jQuery as the basis for much of the coding, and then created custom code on top of it. Using a library – while not a magic solution in itself – makes cross-browser support much, much easier. We’re not particularly picky about specific libraries, but we have used jQuery on a number of projects over the past couple of years, with generally good results.

Specific issues encountered included IE’s tendency to cache AJAX responses, which had to be resolved by tacking a randomized variable onto the requested resources; this, unfortunately, doesn’t “play nice” with Google’s Page Speed tool (see below).
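
The workaround looks something like this in jQuery terms (the URL is again just an example):

// Tack a throwaway parameter onto the request so IE treats each one as unique.
jQuery.ajax({
    url: '/gateway/status.php',
    data: { nocache: new Date().getTime() },
    dataType: 'json',
    success: function (data) { /* ... */ }
});

jQuery can also do this for you: setting cache: false in the $.ajax options appends a similar timestamp parameter automatically.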

We also had a serious issue with scripts that perform animated transitions, which resulted in excessive client-side load (and thus poor perceived responsiveness), in addition to intermittently causing JavaScript errors in IE.

JavaScript debugging in IE isn’t easy at the best of times, and it is made more complex by our use of minification (see below) to compress script size. One tool that occasionally helped was Firebug Lite, which essentially simulates Firefox’s Firebug plugin in other browsers (but which can also sometimes change the behaviour of the scripts being debugged). The underlying issue is that IE does a poor job of pointing coders to exactly where a script crashed, and its error messages tend to be unhelpful. Debugging in IE basically boils down to a) downloading a copy of the minified resource in the form that the browser sees it, b) using an editor with good row/column reporting (I often use Notepad++) to track down roughly where the error occurs, and c) putting in debug statements more or less at random to try and isolate the problem. After working with Firebug for a while, this is an unpleasant chore.

7. Testing Server

Long before your site launches, set up a separate testing server with as close to a duplicate of the live environment as possible. Keep the code current (we usually try to use SVN along with some batch scripts to allow quick updating), and test EVERY change on the test site before pushing the code over to the live server. Simple, but frequently overlooked (I’m personally guilty on occasion).

8. CSS

Designers and web developers often think of CSS purely in terms of cross-browser compatibility. Building sites that actually work in major browsers goes without saying, and based on personal experience, CSS issues can lead to a lot of customer support calls (“help, the button is missing”) that could be easily avoided. In the case of this specific project, we actually had to remove or partially degrade some CSS-related features, in order to provide for a more uniform experience across browsers. Attempting to simulate CSS3 functionality using Javascript is not a solution for a heavy-traffic, speed-intensive site; we tried this, and in many cases had to remove the code due to poor performance.

An often overlooked CSS issue (which Google and Yahoo have started plugging – see below) has to do with render speed. Browsers have to match every CSS selector against the document tree, and specifying selectors inefficiently can have a noticeable effect on the apparent page load time for users. It is well worth your while to spend some time with Google’s Page Speed tool (or Yahoo’s competing product) in order to optimize the CSS on your site for speed.

9. Why Caching Doesn’t Always Work

Caching technology can be a very useful way of obtaining additional performance. Unfortunately, it isn’t a magic bullet, and in some cases (ours specifically), it can not only hurt performance but actually make a site unreliable.

High traffic websites tend to fall into one of two categories:

On the one hand, there are sites such as Facebook, whose business model is largely based on advertising; what this means is that if user data isn’t completely, totally current and accurate, it is at most an annoyance (“where’s that photo I just uploaded?”). Facebook famously uses a modified version of memcached to handle much of their data, and this kind of caching is probably the only way they can (profitably) serve half a billion customers.

On the other hand, financial websites (think of your bank’s online portal, or a stock trading site) have business models that pertain directly to the user’s pocketbook. This means that – no matter how many users there are, or how large the volume of data – the information shown on the screen has to be both accurate and timely. You would not want to log in to your bank’s site and see an inaccurate account balance, right? In many cases, sites of this nature use a very different type of architecture from “social media” sites. Some banks actually run their websites on supercomputers to accommodate this.

Underlying the dichotomy above is the fundamental notion of what caching is all about: “write infrequently, view often”. Caches work best in situations where there are far fewer updates to data than views.

The initial version of our code actually implemented memcached, in an attempt to reduce the number of (relatively expensive) database calls. The problem is that our underlying data changes so rapidly (many times per second, for a relatively small number of resources that are actively being viewed and changed by many users) that the cache was being rewritten almost constantly. The result in practice was that some users were seeing out-of-date cached data at least some of the time. Abandoning caching in our specific case helped resolve these issues.

10. Speed Optimization

We used Google’s Page Speed tool to optimize our project; Yahoo offers a similar competing product. These tools provide a wealth of information about how to make websites load faster – in many cases significantly faster.

Among the many changes that we made to the site, based on the information from the tester, were the following:

a) Use Minify to combine and compress JavaScript and CSS files. No kidding – this works. Not only that, but if you have a large number of CSS files loaded on each page, you can run into odd (and very hard to trace) problems in IE, which appears to handle only around 30 external CSS files per page. Compressing and combining these files with Minify and/or the YUI Compressor can save you more than bandwidth.

b) Use sprites to combine images into larger files. This does not work well in some cases (e.g. some kinds of buttons), but the technique can save precious seconds of load time. We used a Firefox plugin called SpriteMe to automate this task, although we didn’t follow all of its suggestions.

c) Validate your HTML. Another “no brainer”. The load time saved by having valid HTML will surprise many readers. The process of validation is a nuisance, particularly if your site serves up dynamic, user-contributed content. Set aside a few hours for it and just do it, though. It makes a difference.

11. Don’t Forget Algorithms 101

I took several courses on algorithm design at university, and then did nothing with that knowledge for more than a decade. Surprise, surprise – a complex, multi-user site actually needs proper thought in this regard.

One example from our experience: the data that tracks the status of an auction (i.e. whether it is currently running, paused, won, etc.) can be “touched” by 9 different pieces of code in the site, including the “gateway” code that responds to users, and background tasks.

It took significant effort to build a reliable algorithm that can determine when an auction has actually ended, and the task was complicated by the fact that some of the code runs relatively slowly, and it is quite possible for another operation to attempt to modify the underlying data while the first task is still operating. Furthermore, “locking” in this case may have negative ramifications for user experience, since we did not want to unduly reject or delay incoming “bids” from users.
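
One pattern that helps with this kind of race – shown here as a hedged sketch, not necessarily the exact algorithm we settled on – is to make the state change itself atomic: a single conditional UPDATE, with the affected-row count telling you whether your process was the one that actually ended the auction (table and column names are illustrative):

<?php
// Only one caller can flip the auction from 'running' to 'ended';
// everyone else sees zero affected rows and backs off.
$db->query(
    "UPDATE auctions
        SET status = 'ended'
      WHERE id = 42
        AND status = 'running'
        AND end_time <= NOW()"
);

if ( $db->affected_rows === 1 ) {
    // This process "won" the transition - safe to declare the winner,
    // send notifications, and so on.
}
?>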

Conclusions

  1. It is very hard to plan ahead of time for growth in a web environment. Sometimes steps taken specifically to try and address traffic (i.e. caching in our case) can actually be detrimental. The process of adapting to growth can actually involve a surprising amount of trial and error experimentation.
  2. Using frameworks can be very helpful for writing maintainable code. Unfortunately it’s sometimes necessary to work around them when specific optimization is needed. Proper documentation and comments can help – I try to write as if I’m explaining to somebody really dumb, years in the future, what is going on in my code – and then I’m often surprised when I need my own comments later on…
  3. Work with the right people. Not just your internal team, but also your hosting company etc. This can make a big difference when you are under pressure.
  4. Prepare yourself for periods of high stress. Not much you can do about this, unfortunately. In most cases, it is unlikely that you will actually have access to the resources you really need to get the job done. Make sure you schedule breaks too. It’s hard. Burnout is much harder though.

Website Launch Checklist

A few websites that my company has been working on have launched in the past few weeks. I’ve got a few “secret sauce” activities that I do whenever I launch a new website, such as:

  • Set up Google Analytics
  • Set up Google Webmaster Tools and a sitemap for the site
  • Bookmark it on some social clipping sites
  • Tweet it
  • Create a ping.fm account and some of the key accounts that it supports
  • Try to create a few incoming links using free directory sites
  • Put out a press release

I’d be interested to hear what you do when you launch a new website.


Microsoft – Twitter Deal

Nathan forwarded me this link from Mashable, with the subject line prefaced with the word “HUGE”.

From what I can tell, it looks like Microsoft is finally starting to put together the pieces of an overall web strategy: determine what Google would like to do and put roadblocks in their way. Hence the previous Yahoo deal.

It’s obviously far too early to see if this helps them out. I’m fairly sure, though, that it means search engines will be displaying a lot more “current” or trending data pulled from profiles and micro-blogging posts.


Google Up To Something Large?

We’ve been noticing a few odd things lately with Google:

  • New sites aren’t getting spidered – or at least not as quickly as earlier this year. Webmaster Tools gives a generic message about the website not being listed in Google’s index, along with a link to a video that seems to be mostly about websites that got themselves banned for violating Google’s terms of service. Existing websites that are growing aren’t always getting their new content indexed as quickly as before either – or rather, it happens inconsistently lately.
  • Google PageRank tools don’t seem to be working any more. I’ve tried a number of them lately.
  • While we’re on the topic of PageRank, it seems to be even less relevant than before. In one controlled scenario where we have many listings showing for a specific search term, a PR5 page is showing on the third page, while much lower PR websites are showing on the first page. Sorry, I can’t be more specific, but it is a fairly controlled scenario. All of the pages involved are similar in size with similar numbers of occurrences of this keyword.
  • Searches are often slow. As far as I can tell, this isn’t just my internet connection. It’s been years since I’ve had the Google homepage time out on me.
  • Search listings sometimes change dramatically in short time periods.

All of the above seems to indicate that Google is gearing their entire system up for something big. Speculation among my staff says they’re going to try to make everything realtime (or close to it) in order to compete with Twitter. That means that they’re going to try and reorder all of their indexes very quickly (rather than weekly or possibly daily) in order to try and provide something closer to the immediate zeitgeist that one can obtain through Twitter.

Having some idea of the size of Google’s indices, and a vague notion that the number of servers in their demesne is in the low millions, the scale of this boggles my mind.


How to Setup a WordPress Blog Properly

Over the past few months, we’ve averaged around one new blog setup per day.

Recently, Nathan (one of our SEO experts – see his article on Buddy Press) and I started putting together a list of the standard things that we do after we install a WordPress blog.

The following assumes some familiarity with WordPress. We’ve started playing around with the latest version (2.8), and I suggest that you use that unless there’s a pressing reason not to (i.e. incompatible plugins).

1. General Config

  • Make sure that you have configured clean URLs in Settings -> Permalinks.
  • Under Settings -> Writing, put in additional locations to ping whenever you update your blog. There is a decent list here.

2. Themes

  • We try to make small changes to all stock themes that we use. This means that search engines are less likely to group your site along with every other blog that is using the same theme.
  • Even better: use a premium theme, or make your own.

3. Plugins

Our objective with plugins is to automate the process of creating quality meta information for blog entries to the largest extent possible, and to make sure that our blogs talk nicely to search engines. We install the following set of plugins:

  • TagThePress
  • TagMeta
  • PingPressFM
  • Google Sitemap (there’s a few good options)
  • Ultimate Google Analytics

Make sure you configure all of the above. You may need to create some accounts in various places in order for some of the above to work.

If you’re running Firefox, we highly recommend installing the Zemanta plugin.

We used to put tag clouds into the sidebars of all new blogs, but if the sitemap plugin is working correctly that isn’t necessary (and tag clouds can take up a lot of valuable real estate).

4. SEO Stuff

  • Make sure you have accounts for Google Analytics and Google Webmaster Tools. Use them. Play around with them. Learn how to use them inside and out.
  • Make sure every new blog has one or two posts containing YouTube videos.
  • Getting the right number of tags per post is critical – we try to hit a sweet spot between 10 and 15 tags for each post. This may change depending on search engines.
  • Make sure that your blog is configured to use different page titles and meta tags for each page. Use HeadSpace if necessary to automate this process.

There’s probably a ton of important things I’m missing here (please let me know!), but this is a minimal list of things that you should be doing whenever you setup a new blog (if you want it to perform well).


How Much Can a Blogger Earn?

I saw an interesting article via Slashdot today on how much bloggers make. Couldn’t resist throwing in my two cents. The numbers below are based on a wide range of websites that I’ve either run myself, or helped in the creation thereof.

To reiterate something that Evan Carmichael frequently talks about, the amount earned from Google AdSense is equal to the number of click-throughs multiplied by the dollar value of a click-through. It sounds obvious enough, but there’s a huge divergence in the quality of ads, and that is somewhat dependent on the blogger themselves, since Google tries to place ads topically. You’ll see what I mean below.

Let’s talk about traffic quickly first. Building traffic to a website takes a lot of hard work and tremendous patience, which is why many website owners simply throw up their hands and accept whatever comes their way (or try to drive revenue by paying for traffic themselves – which is a tricky proposition for a blog). I’ve seen many websites build up to the low thousands of unique visitors per day, though, through a ton of sweat equity. Anything beyond that may be a black swan event, so let’s set that as the upper bar of what the average individual can achieve through hard labour.

The value of an ad on a website is largely driven by topic and industry. There are people making higher-than-average rates using other ad placement systems (or by selling ad space themselves), but Google AdSense is the most accessible system for the average blogger, so let’s use some examples from there. Most click-throughs that I get on this site (and others I’ve run in the past) pay between $0.10 and $2.00. In one extreme example, I think I once received $5 for a single click-through. I know of specific topics that pay significantly more (life insurance being one).

Click-through rates tend to depend a lot on where people place ads on a page. Having high quality ads can help as well, but since Google tries to tie ads into the contents of a page, bloggers have some control over the sorts of things that generally appear. Spending some time experimenting with placement can have a large payoff. Editor’s Note: I’m guilty here; I do have ads, but I really can’t be bothered where they show up, since ad revenue isn’t what I’m after.

Therefore, the expected average earnings for a statistically significant group of hard-working bloggers would fall somewhere in the following range:

Low End: Assume 1,000 visitors per day, a 3% click-through rate and $0.10 per click = $3/day or $90/month.

High End: Assume the same 1,000 visitors per day, a 5% click-through rate and $1 per click = $50/day or $1,500/month.
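
For anyone who wants to play with the numbers, the estimate boils down to visitors × click-through rate × value per click – a throwaway sketch:

<?php
// Rough monthly ad revenue estimate (assumes a 30-day month).
function estimated_monthly_revenue( $dailyVisitors, $clickRate, $valuePerClick ) {
    return $dailyVisitors * $clickRate * $valuePerClick * 30;
}

echo estimated_monthly_revenue( 1000, 0.03, 0.10 ); // low end: 90
echo "\n";
echo estimated_monthly_revenue( 1000, 0.05, 1.00 ); // high end: 1500
?>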

Bear in mind that the above figures are for somebody with average knowledge of how search engines work, a good work ethic, a willingness to experiment, and the patience to build things up over time. I don’t know how many people this covers.

Like I said before though, there’s a black swan or power law effect that’s at work here. What will typically happen is that the vast majority of bloggers will earn next to nothing through ad revenue, a small but well defined set will make enough to make it worthwhile to do full time, and a tiny (and exceptionally well-known) group will make a fortune. Similar to other kinds of creative efforts right? Think authors or musicians.

Disclaimers (I think they’re needed here):

a) I use Google AdSense on this site. I’ve made $10 in the past 6 months. I’m too busy with other things to care too much. I’ve run sites that made $50 to $100 per month in the past, with minimal effort on my part.

b) I know of several people who make a decent living blogging (by decent I mean more than I make!). There are some interesting differentiators between them and other bloggers. They all approach it as a business. Most of them seem to have found ways to make other people do the hard work for them. They also all find real-world outlets (i.e. seminars, consulting, selling product) that neatly tie in to their blogs, in such a way as to create a reinforcing upward spiral of activity. Believe it or not, only a few of the ones I know are “famous” or are active mainstream journalists. The people I know aren’t a big enough set to be statistically significant.
