Improving Web Site Performance

andargor

The boards have been very sluggish lately. I'd like to offer some tips on things that can be done to improve web site performance.

I don't want to sound condescending; that is not my purpose. But my work experience has been in networks and the Internet, particularly the architecture and implementation of high-end hosting data centers providing multi-gigabit connectivity and 5/9+ high availability (5/9: that's 99.999% and better).

Here are the steps I propose, with "quick hits" identified that require a minimum of effort and expense:

  • Caching: I've mentioned it in this thread. Use it whenever you can, especially on images. If you can afford it, use services like Akamai, which cache images and other "weighty" objects throughout the world so that they are accessed locally by users, saving you bandwidth and processing power (this is also especially useful with streaming content and downloads). Another alternative is a reverse proxy, which is basically a cache server in front of your web server that does the grunt work of transferring images, CSS, javascript, zips and static HTML.

    Quick Hit: Remove any "no-cache" pragmas from your web server configuration and HTML templates, as well as any "max-age: 0" or other expiration directives. This will quickly reap benefits in bandwidth, number of requests, and processing power by allowing the user's browser to cache frequently fetched but seldom changing elements. There may be some items that cannot be cached (I haven't seen those on EN World, but you never know), so you may have to have a few separate templates. Alternatively, you can specify an expiration ("max age") of several minutes, which should still provide significant performance improvements, since it will cut down on transfers during a user's session.
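
    For example, if the site runs on Apache with mod_expires compiled in (an assumption on my part about the setup), a snippet along these lines would let browsers cache the static elements; the types and lifetimes are only illustrative:

    Code:
    <IfModule mod_expires.c>
        ExpiresActive On
        ExpiresByType image/gif "access plus 7 days"
        ExpiresByType image/jpeg "access plus 7 days"
        ExpiresByType image/png "access plus 7 days"
        ExpiresByType text/css "access plus 1 day"
        ExpiresByType application/x-javascript "access plus 1 day"
    </IfModule>
    # ...and remove any directives that still send "Pragma: no-cache"
    # or "Expires: 0" for these types.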

  • Separation of Function: Most high-bandwidth, high-performance sites separate all functions into layers: typically presentation (HTTP), application logic (PHP or other), and database. This allows a "funnel" architecture where you may have several low-cost front-end web servers (e.g. 2-10 500 MHz Linux boxes) whose sole task is serving files (HTML, javascript, images, etc.). The smallest of Linux boxes can serve an impressive number of files per second. If a dynamic request is required, it is proxied back to the application logic layer, which returns just the dynamically generated HTML, while all other static elements are returned by the front-end. The ratio is typically 15:1 (i.e. for every dynamic request, there are on average 15 requests for the static elements in the page, such as images, CSS, javascript, etc.).

    This is the most significant performance boost. Although there is a 15:1 ratio, application logic is usually more CPU intensive and requires about one box for every 2-5 front-end servers (depending on the application architecture). These boxes are usually mid-size (say, 1 GHz compared to the 500 MHz front-ends). If a database query is required, then that is passed on to the database layer. How many database requests there are depends entirely on the application architecture. From what I can tell, vBulletin goes to the database on every request, so there would be a 1:1 relationship with EN World, which is high. However, a performance boost here is the use of a pool of persistent connections, so that the application layer doesn't have to bring up and tear down connections on every request. Having the database on a separate machine allows for better capacity planning and tuning, since you can see what is causing CPU drain.

    Quick Hit: The full separation of function is usually a non-negligible expense, and I don't know what your financial resources are. You can also go hybrid, by combining either the front-end servers with the application layer, or the application layer with the database (meaning two layers). That all depends on where the load is coming from. You can also do a "poor man's" separation of function, which is what I recommend as a quick hit. If you can only have one server, and use a web server like Apache, you can separate HTTP services and application logic by running two different versions of Apache on the same server. I'm not talking version numbers, but rather one version that is "light" and is compiled with only the most basic modules, and another that is "heavy" with all the PHP, Perl, et al. modules.

    What this does is allow you to keep the 15:1 ratio in Apache processes: static elements are served by the "light" version (typically 200-500K per process), while dynamic elements (such as the HTML resulting from PHP) are passed through an internal proxy (using Rewrite Rules) to the "heavy" version (typically a few megs per process). This gives you a good performance boost: you don't have to spawn "heavy" processes during high load periods, you conserve memory, you respond much more quickly since there are many "light" processes for all the static elements, and you keep a relatively constant number of pooled persistent connections to the database from the "heavy" processes (which is significant). I would still separate out the database, for the reasons mentioned above, but that's your call if you want to leave it all on the same machine.
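
    As a rough sketch of that internal proxy (assuming, purely for illustration, that the "heavy" Apache listens on port 8080 of the same machine and that the "light" Apache has mod_rewrite and mod_proxy available), the "light" configuration would contain something like:

    Code:
    # "light" Apache serves images, CSS, javascript and static HTML itself,
    # and hands anything ending in .php to the "heavy" Apache:
    RewriteEngine On
    RewriteRule ^/(.+\.php)$ http://127.0.0.1:8080/$1 [P,L]

    The [P] flag turns the rewrite into a proxy request, so the user only ever talks to the "light" processes.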

  • Minimize dynamic requests: This is a tough one because of vBulletin's and EN World's nature. As long as your home page is dynamically served, you will require tons of processing power. Our customers would complain that "the hosting architecture is slow" until we pointed this out to them and made corrections for them. The bottom line: serve as much static content as you can. We improved performance (simultaneous connections) by as much as 5000% in one case. This is not easy with a constantly changing site, but there is a method: static publication. Static publication is basically the periodic automated fetching of dynamic pages, which are saved as static HTML and served to users. How periodic depends on how often the information on the page changes.

    For example, can the home page be statically published every hour? Every day? A good example is the main forum page. Personally, I would not mind if it is updated every minute or so. It is probably one of the most requested pages other than the home page. How many page requests do you get in one minute during peak periods? 10? 100? 1000? 10000? If the page were static and regenerated once a minute, only one of those requests would actually hit the dynamic code, so you would go dynamic 10%, 1%, 0.1%, or 0.01% of the time, respectively, with the associated savings in processing power and database connections.

    Quick Hit: Implement a set of cron jobs for the home page and the main forum page, set at intervals you are comfortable with, and use a simple program like wget to fetch and save the result as static pages. Make sure all appropriate links point to those static files. The simplest way is a Rewrite Rule in the "light" Apache processes. Or, if you want to go more radical, save all "archived" content to static HTML and redirect any queries using Rewrite rules to those pages. That way they are still indexed, but you don't go to PHP/database for these unchanging threads.
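
    A minimal sketch of that cron/wget approach (the URL, paths and five-minute interval below are placeholders, not recommendations for your actual setup):

    Code:
    # crontab entry: re-publish the main forum page every 5 minutes
    */5 * * * * wget -q -O /home/httpd/static/forums.html.tmp http://www.example.com/forums/index.php && mv /home/httpd/static/forums.html.tmp /home/httpd/static/forums.html

    Writing to a temporary file and then renaming it means users never see a half-written page while wget is still fetching.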

  • Request throttling: I do not know if you have protections against DoS or other inappropriate use. This is a drain on any dynamic site: the harvesting of threads for either offline viewing or other purposes, or the wanton generation of requests for the sole purpose of making your life more difficult. Most web spiders play nice, but some play not so nice, and will spider your site as fast as you can serve it. What you can do is limit the number of requests per second from any one IP address, or any range of IP addresses (better, since it protects against distributed DoS bots in any one network). There is a module called mod_throttle that does exactly this, and probably others.

    Quick Hit: Installing a module like mod_throttle is relatively easy, and should be implemented as soon as possible.
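
    If building mod_throttle turns out to be a hassle, Netfilter itself can do a crude per-IP limit, provided your iptables build includes the connlimit match (the limit of 20 concurrent connections below is only an example):

    Code:
    # Drop new HTTP connections from any single address that already has 20 open
    iptables -A INPUT -p tcp --syn --dport 80 -m connlimit --connlimit-above 20 -j DROP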

  • Firewall optimization: I don't know your firewall setup, but in my experience the firewall can be a critical performance factor. I really don't know if this is an issue here, but I'll give you a few leads. Where I saw performance drain was when the firewall had stateful inspection rules (meaning it checks both sides of the connection or session, the request and the return). Stateful inspection is slow compared to simple single-sided "drop" rules. Also, unfortunately, it can be DoSed relatively easily by opening an inordinate number of connections until the firewall's state table runs out; all further connections are then dropped. This can also occur accidentally during high load periods.

    The solution: turn off stateful inspection (which is a higher risk, but it's a tradeoff), or put the front-end "light" servers outside the firewall. Linux servers can be well protected on their own using the built-in Netfilter firewall, and BSD servers are said to be the most secure. Then you refuse any connections from outside your firewall, except for HTTP connections from the front-end servers to the application servers. Because of the 15:1 ratio in requests, you increase your available connections by that much.

    Quick Hit: I can't say until I know what your hosting setup is.
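
    To illustrate the idea of refusing everything except HTTP from the front-ends: on the application server, the Netfilter rules boil down to something like this (the addresses are made up for the example):

    Code:
    # Accept HTTP only from the front-end boxes...
    iptables -A INPUT -p tcp -s 192.0.2.10 --dport 80 -j ACCEPT
    iptables -A INPUT -p tcp -s 192.0.2.11 --dport 80 -j ACCEPT
    # ...and drop HTTP from everyone else
    iptables -A INPUT -p tcp --dport 80 -j DROP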

The bottom line:

All of the above is well and good, but until you know what is causing your performance drain, you are shooting in the dark. Get some good (free) monitoring tools and take a look at your server activity: memory, HTTP requests, database connections, disk activity, etc. Analyze the web logs for the heaviest requesters or suspected DoS attacks (and firewall them out).
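
For a first look, the stock command-line tools will usually point a finger at the bottleneck. Something like the following, run during a busy period (the mysqladmin call assumes you have the MySQL root password handy):

Code:
vmstat 5                            # memory, swap, CPU and I/O wait, sampled every 5 seconds
netstat -an | grep -c ':80 '        # rough count of open HTTP connections
ps aux | grep httpd | wc -l         # rough count of Apache processes
mysqladmin -u root -p processlist   # what the database is actually doing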

Hopefully there are a few gems in this post that will enhance the community's experience of this great site.

Andargor
 


Thanks. I'll keep these in mind if I can figure out which one applies. I think it's a logging problem but frankly, I don't know. The admin we had to handle this part of the site has quit and these issues are outside my area of expertise.
 

Michael Morris said:
Thanks. I'll keep these in mind if I can figure out which one applies. I think it's a logging problem but frankly, I don't know. The admin we had to handle this part of the site has quit and these issues are outside my area of expertise.

What's the "logging problem"? Too many writes to disk?

Andargor
 


Michael Morris said:
Apache maintains an error log. It needs to be rotated periodically because if it gets too big it will slow the system down tremendously.

This is what I do for my site: I use logrotate. In the /etc/logrotate.d directory, create a file called apache. This is what I have in it:

Code:
/home/httpd/logs/www.andargor.com/access {
    missingok
    daily
    postrotate
        /home/apache/bin/apachectl graceful
        /home/apache_cgi/bin/apachectl graceful
        mv /home/httpd/logs/www.andargor.com/access.1 /home/httpd/logs/www.andargor.com/access.`date +%Y-%m-%d`
    endscript
}

/home/httpd/logs/www.andargor.com/error {
    missingok
    daily
    postrotate
        /home/apache/bin/apachectl graceful
        /home/apache_cgi/bin/apachectl graceful
        mv /home/httpd/logs/www.andargor.com/error.1 /home/httpd/logs/www.andargor.com/error.`date +%Y-%m-%d`
    endscript
}

And that's it... Notice I have the "light" and "heavy" versions of Apache running (/home/apache and /home/apache_cgi).

Andargor
 

Michael,

I don't know if you've changed the templates, but I see fewer meta http-equiv "no-cache" tags. However, the server still responds with:

Expires: 0
Cache-Control: private, post-check=0, pre-check=0, max-age=0
Pragma: no-cache

The only cache-related directive that the Apache configuration should be sending is Cache-Control: private.
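
Assuming those headers are coming from Apache rather than from vBulletin's own no-cache option, mod_headers makes that a one-liner (and the old Expires/Pragma lines would simply be removed rather than overridden):

Code:
Header set Cache-Control "private"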

Andargor
 

Sorry, the title of the thread makes me do this...

This is Bob. Bob used to be slow and sluggish, but Bob has now improved his performance with Webazime, the new natural way to have a bigger and more satisfying web performance. Bob has happier users at home...

Could not help myself...sorry. :\
 

VBulletin and MySQL config

I did a little research on the VBulletin boards and Usenet, and it seems that these settings are optimal for performance.

VBulletin
In the Admin CP:

Use Forum Jump Menu
If you have a lot of forums you may want to disable the Forum Jump menu; although it makes only a marginal performance hit, it can generate a large amount of HTML.
No

Enable Access Masks
Access masks are a simple way to manage forum permissions for individual users. If you don't use any access masks, turn this option off.
No

Add Template Name in HTML Comments
Add the template name at the beginning and end of every template rendered. This is useful for debugging and analyzing the HTML code, but turn it off to save bandwidth when running in a production environment.
No

GZIP HTML Output
Selecting yes will enable vBulletin to GZIP-compress the HTML output of pages, thus reducing bandwidth requirements. This will only be used on clients that support it and are HTTP 1.1 compliant. There will be a small performance overhead.
This feature requires the ZLIB library.
If you are already using mod_gzip on your server, do not enable this option.
Yes

Add Standard HTTP Headers
This option does not work with some web server configurations, so it is off by default. However, some IIS setups may need it turned on.
It will send the 200 OK HTTP headers if turned on.
No

Add No-Cache HTTP Headers
Selecting yes will cause vBulletin to add no-cache HTTP headers. These headers are very effective at preventing caching, so adding them may cause server load to increase due to the resulting increase in page requests.
No

Remove Redirection Message Pages
Enabling this option will remove the update pages that are displayed after a user makes a post, starts a search, etc. These pages provide assurance to the user that their information has been processed by the forum. Disabling these pages will save you bandwidth and may lessen the load of the forum on your server.
Note: Some pages will still use the redirection page when cookies are involved to prevent some potential problems.
Yes

Cached Posts Lifespan
Number of days to maintain a cached copy of a post. This makes threads faster to display, but means that each post takes approximately twice the storage space.
10

Update Thread Views Immediately
Option to update thread views immediately, or once an hour. If you have a large board, you will probably want to disable immediate updates, since they are quite server intensive.
No

Update Attachment Views Immediately
Option to update attachment views immediately, or once an hour. If you have a large number of inline graphic attachments, you will probably want to disable immediate updates, since they are quite server intensive.
No

Check Thread Rating
If enabled, this option will check if a user voted on a thread and show their vote if they have. Otherwise, they will see the voting options even if they aren't able to vote again. This can have an effect on performance.
No



MySQL

These are the recommended MySQL settings (my.cnf/my.ini):

Code:
[mysqld]
basedir = D:/mysql
datadir = D:/mysql/data
skip-innodb                        # vBulletin uses MyISAM tables, so InnoDB can be left disabled
max_connections = 650
key_buffer = 64M                   # MyISAM index cache
myisam_sort_buffer_size = 64M
join_buffer_size = 1M
read_buffer_size = 1M
sort_buffer_size = 2M
table_cache = 1500                 # number of open table handles kept around
thread_cache_size = 64             # reuse threads instead of creating one per connection
wait_timeout = 3600
connect_timeout = 10
tmp_table_size = 128M
read_rnd_buffer_size = 524288
bulk_insert_buffer_size = 8M
max_allowed_packet = 16M
max_connect_errors = 10
query_cache_limit = 2M             # largest single result the query cache will store
query_cache_size = 64M             # caches SELECT results until the underlying tables change
query_cache_type = 1

[mysqld_safe]
open_files_limit = 8192

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer = 64M
sort_buffer = 64M
read_buffer = 16M
write_buffer = 16M

HTH,

Andargor
 
