Month: September 2016

Speeding up Magento Server Response: a Journey into High Performance

I have been working on a Magento store that is highly visited by South African standards. The task is to improve the performance and response times, including the Magento server response and page load of the site, as these had never been a high priority. Furthermore, lots of custom development was done on the site, including Magento module overrides and, dare I say it, even core overrides.

So here is a pragmatic guide to analysing a Magento 1.9 site for performance and speeding it up, focusing specifically on the Magento server response.

Lots of Variables

One of the most difficult things is actually isolating problems. There are many factors to take into account and many reasons why a site can be slow, with many interconnected parts that could be the bottleneck. It could be server hardware, a non-optimised or poorly indexed database, bad modules, web server configuration, file system reads and writes, external scripts, render-blocking JS, the location of the server and many other things.

Page Load Time vs Server Response Time

A very important thing to understand is the difference between server response time and page load time, as the ways to fix the two are different in nature.

Page load time is the time it takes to download and display the entire content of a web page in the browser window (measured in seconds). (MaxCDN: Page load time)

Server response time is the amount of time it takes for a web server to respond to a request from a browser. (Varvy: server response time)

When you request a web page, the server builds it by querying the database and any external services for data and then rendering the HTML. That is your server response, so server response time is actually a constituent, or subset, of page load time.

Finding out if your Magento server response is irregular

The hardest part of making any findings is using a benchmarking or testing tool that is consistent and does not depend on your network speed or some other factor. The best thing I have found for the Magento server response test is the built-in Magento profiler used in conjunction with AoeProfiler, a tool that makes the profiler output much more readable. You also want to host the site locally. Fabrizio Branca describes how to set up the Aoe profiler.
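
For reference, the built-in profiler has to be switched on in two places: the commented-out call in index.php and the Profiler setting under System -> Configuration -> Developer -> Debug. A minimal sketch of the index.php side:

<?php
// index.php (Magento 1) ships with this call commented out after the
// require of app/Mage.php; uncommenting it, together with setting
// System -> Configuration -> Developer -> Debug -> Profiler to Yes,
// turns on the built-in profiler that AoeProfiler then renders.
Varien_Profiler::enable();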

Pitting your site against a standard Magento install with sample data at the same version

If your site is responding locally on any page with a response time of greater than 2000 ms, you can safely say that something is wrong.

magento-profiler-long-response-time

If your response is not that high, make sure the local site you are enhancing has its cache off and test it against your local standard Magento install with cache off. Also test with cache on and see if there is a dramatic difference (> 200 ms).

If so, you will need to profile your site to find out why it has a slower server response.

Profiling and Fixing your Magento Server Response

This is the most difficult part. Test your site against the standard Magento install with cache on for both or cache off for both, and test the same pages: the home page or a CMS page, a product view page and a category page.

Then expand the profiler and find any extra calls that are made on your site but not on the standard Magento site. These are usually your bottlenecks. Any big red blocks also need to be looked at in detail.

Now for the difficult part: you need to find out why these calls are made and, in the best case, remove the offending modules or inefficiencies. Second best is to fix them. Also, if cache is turned on but things are not caching, you need to find out why.

Your first goal is to have similar server response times to the standard Magento site.

The important thing to remember is that everything after the server response is frontend and can be optimised, such as eliminating render-blocking JS, minifying JS and CSS and improving image loading. The server response is the time the user is waiting for content; it is very important, and the recommendation is to aim for < 200 ms.

Leveraging Cache

If cache is not turned on in your production site, you are losing a lot of the speed gains you should be getting. Just take note that the cache needs to be warmed up, so after clearing your cache folder var/cache the first request will be slow and subsequent requests will be much faster.
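
As an aside, instead of deleting var/cache by hand you can flush the same storage from a small script. A minimal sketch, assuming a hypothetical helper saved as shell/flush_cache.php:

<?php
// shell/flush_cache.php (hypothetical helper script).
// Flushing the cache storage has the same effect as emptying var/cache:
// the next request has to rebuild everything, so it will be slow again
// until the cache has warmed up.
require __DIR__ . '/../app/Mage.php';

Mage::app()->getCacheInstance()->flush();
echo "Cache storage flushed\n";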

The Cache settings are found at: System -> Cache Management

magento-cache-settings

I have done a few tests on my local craptop… using the built-in PHP server. Remember to serve the site with php -S vanilla.dev:4444 router.php so that the links work. More info in this post.
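
For completeness, the router.php in that command is along these lines (a sketch, not necessarily the exact file from the linked post): serve real files directly and hand everything else to index.php so the rewritten URLs work.

<?php
// router.php - sketch for running Magento 1 under PHP's built-in server:
//   php -S vanilla.dev:4444 router.php
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

// Serve existing files (skin, js, media, ...) as-is.
if ($path !== '/' && file_exists(__DIR__ . $path) && !is_dir(__DIR__ . $path)) {
    return false;
}

// Everything else goes through the normal Magento entry point.
require __DIR__ . '/index.php';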

The cached result is the top one; the uncached one is below.

Home screen:

magento-fresh-product-screen-cache-off magento-fresh-product-screen-cache-on

Category Page:

magento-fresh-category-screen-cache-off magento-fresh-home-screen-cache-on

Product Page:

magento-fresh-home-screen-cache-off magento-fresh-category-screen-cache-on

So caching will, at worst, roughly double the speed of the site when running locally. You may find that when deploying to production environments, which usually have more resources than your local PC, the gain halves.

So a 600 ms gain locally may only be a 300 ms gain on a production environment.

Be wary of your custom templates not being cached

If you have a custom block, for example MyModule/Block/Page/Html/TopMenu.php, to override the standard top menu, but you are not using a template file called topmenu.phtml and have not made the correct provisions for this, it will not be cached. For example, you call your file megamenu.phtml and call $this->setTemplate('megamenu/page/html/megamenu.phtml').

That can really catch you out, especially if it is a template that is on every page.
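
The usual provision is to give the overriding block its own cache settings, because the stock top menu's cache keying no longer applies once you swap in your own template. A minimal sketch, with the class name and cache key purely illustrative:

<?php
// Illustrative example: an overriding menu block with explicit cache settings,
// so its output is cached server-side even with a custom template.
class MyModule_Block_Page_Html_TopMenu extends Mage_Page_Block_Html_Topmenu
{
    protected function _construct()
    {
        parent::_construct();
        $this->addData(array(
            'cache_lifetime' => 86400, // seconds
            'cache_tags'     => array(Mage_Catalog_Model_Category::CACHE_TAG),
        ));
    }

    public function getCacheKeyInfo()
    {
        // Vary the cache entry by store and theme so different views do not collide.
        return array(
            'MYMODULE_MEGAMENU',
            Mage::app()->getStore()->getId(),
            Mage::getDesign()->getPackageName(),
            Mage::getDesign()->getTheme('template'),
        );
    }
}

Remember that the Blocks HTML output cache also needs to be enabled under System -> Cache Management for this to have any effect.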

To check which of your blocks are being cached you can use another great tool by Aoe called AoeTemplateHints; it gives you a lot more information about your blocks and templates and is particularly good when looking at caching.

Results from fixing the Menu that was not being cached

We found that the megamenu at the top of the site was never cached server-side because the block never enabled caching on initialisation. We fixed this and deployed, and the results are quite astounding.

We implemented the fix on October the 5th in the morning.

On the category view the improvement varies because of another inefficient module on the page. The gain here was about 300 – 400 ms.

category-view-menu-cache-fix

Search results improved by about 300 ms.

search-results-view-menu-cache-fix

The CMS home page improved by about 350 ms (ignore the spike).

cms-home-page-view-menu-cache-fix

The product view page improved by 400 ms, just under half.

product-view-menu-cache-fix

The index page gained in excess of 400 ms.

cms-home-page-view-menu-cache-fix

So a huge and significant improvement in speed, homing in on the 200 ms server response time. But there is still more work to be done on certain pages.

Another slow module: Product badges and Panels

There was a piece of horrendously slow and badly architected code:

  1. Is passing the product ID to the template block, then retrieving the entire product by ID, a good way to do it?

<?php echo $this->getChild('badges')->setData("product_id", $_product->getId())->toHtml() ?>

 

  2. Using a LIKE query for each product in the catalog list, and not using cache in the block:
<?php
$productId = $this->getProductId();
if (!$productId) {
    if ($product = Mage::registry('product')) {
        $productId = $product->getId();
    }
}
?>
<?php if ($productId): ?>
    <?php
    $product = Mage::getModel('catalog/product')->load($productId);
    $attributes = Mage::getResourceModel('catalog/product_attribute_collection')
        ->addFieldToFilter('attribute_code', array('like' => 'badge_%'))
        ->getItems();
    foreach ($attributes as $attribute):
    ?>
    <?php if (!$product->getData($attribute->getAttributeCode())) continue; ?>
    <?php echo $this->getLayout()->createBlock('cms/block')->setBlockId($attribute->getAttributeCode())->toHtml(); ?>
    <?php endforeach; ?>
<?php endif; ?>
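
One way such a template could be tightened up (a sketch, not the exact fix that was deployed): reuse the product object the page has already loaded and drop the per-render LIKE query by listing the badge attribute codes up front. The codes and the getProduct() accessor below are assumptions, and it presumes the badge attributes are available on the listing collection:

<?php
// Sketch of a leaner version; badge attribute codes are illustrative.
$badgeCodes = array('badge_new', 'badge_sale', 'badge_clearance');

// Reuse the product the page has already loaded instead of load()-ing it again.
$product = $this->getProduct();
if (!$product) {
    $product = Mage::registry('product');
}
?>
<?php if ($product): ?>
    <?php foreach ($badgeCodes as $code): ?>
        <?php if (!$product->getData($code)) continue; ?>
        <?php echo $this->getLayout()->createBlock('cms/block')->setBlockId($code)->toHtml(); ?>
    <?php endforeach; ?>
<?php endif; ?>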

After the fix, the catalog page and search results are below:

catalog-page-improvement search-results-speed-enhancement

Reports: Should you turn it off?

The Magento reports observer does add some time to the product view page server response… about 50 – 150 ms in my local testing.

reports-profiler

Right at the bottom of the image is OBSERVER: reports

There is a gist that gives you the instructions, but rather add those changes in your module's config.xml. I have tested locally and it removes that time from the product view response.

Is AppTrain Minify HTML CSS JS Making Server Response Slow?

apptrain-minify-css-html-js-slowing-down-server-response

As you can see, the AppTrain Minify_HTML::process and JSMin::min methods are slowing down the server response by about 22%.

Strangely these methods do not show up in the Magento profiler, as a custom library is used.

After disabling AppTrain Minify HTML CSS JS, the HTML was larger, but only by a few measly kilobytes, while the server response time gains ranged from 150 ms to 450 ms. A staggering result. The verdict: do not use AppTrain Minify HTML CSS JS; it will slow down your site. Check the results below.

Minify enabled:
Product: 25.8 KB = 1.22 s
Category: 24.8 KB = 1.06 s
Home: 21.2 KB = 612 ms

Minify disabled:
Product: 28.4 KB = 717 ms
Category: 26.7 KB = 814 ms
Home: 22.6 KB = 479 ms

Take homes

So, looking at the last 3 months, I have significantly improved the average server response, from about 1200 ms down to just over 400 ms. That is a 67% improvement. The Apdex score has also increased significantly.

improving-server-response

Stoked with that, but the page load still needs a bit of enhancement, and using Varnish cache could probably drop it significantly. Optimising SSL and moving the server closer to the customers would also improve the time to first byte.

Common Coding Mistakes and Making Magento Fly

There is a good slideshow showing silly mistakes, how to correct them, and how to make standard Magento fly.

Enable PHP Zend OPCache

There is an excellent article from amsty that highlights that, in the PHP accelerator debate, PHP OPcache is the best thing to use. It also settles the Percona vs MySQL debate, showing that it is much of a muchness.

There is a good tutorial specifying how to implement OPcache. I will implement it and then test the results.

This is enabled by default on PHP 5.6 and greater, I think; it just requires some variable tweaking.
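
A quick way to confirm OPcache is actually active on a given box is a small check script (hypothetical file name, standard OPcache functions):

<?php
// opcache_check.php (hypothetical): report whether OPcache is loaded and enabled.
if (!function_exists('opcache_get_status')) {
    echo "OPcache extension is not loaded\n";
    exit(1);
}

$status = opcache_get_status(false); // false = skip per-script statistics
echo ($status && !empty($status['opcache_enabled']))
    ? "OPcache is enabled\n"
    : "OPcache is loaded but disabled\n";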

Server

Scaling Magento Slideshow

More info on the setup above

Good article on implementing HHVM

Optimised OpCache Variables

HHVM 4x Performance boost (Click bait?)

A decent magentoflow question

Magento and frontend optimisations

Resources

Here is a list of excellent resources to make sure your Magento is flying:

Testing and Load Testing Resources (Speed Test Toolkit)

Varnish

The pinnacle of performance, in terms of getting < 10 ms response times, is Varnish Cache.

Here is the repo, and a list of Magento sites that use Varnish can be checked. You will notice that the first load may take normal Magento times, but the next load can be less than 15 ms.

Increase the PHP memory limit to unlimited in .htaccess

Redis Cache

HHVM

Varnish

Nginx

OpCode Cache

Google Pagespeed Insights

Low Hanging Fruit

Caching

Flat catalog and product page

Kibana using lots of memory

Is Digital Ocean's 1-click-deploy ELK stack poorly configured on purpose?

Kibana is using too much memory

I wanted to see what juicy stats and info I could get out of the logs some of my servers were creating; eventually I ran into the Kibana-using-lots-of-memory problem. I went through the Install ELK on Ubuntu 14.04 tutorial and all was well, but the certificate creation didn't go too well, so when Filebeat and Topbeat were installed on the client they would not restart and gave an error:

ERR SSL client failed to connect with: dial tcp my.fqdn.co.za:5044: i/o timeout

It turned out to be a mission, so I resorted to using the one-click install image tutorial and it worked like a charm.

I started with a 1 GB droplet and that gave issues like Logstash, Elasticsearch and Kibana just stopping. So you had to restart them, and after a while it still wouldn't work.

Kibana node memory keeps increasing

So I moved to the 2 GB droplet and added it to New Relic so I could see what was causing these problems. And I think I have found the culprit:

node-using-memory-elk-stack

As you can see, node's memory usage gets out of hand right from the start, until it runs out of memory.

So is Digital Ocean deliberately leaving this unconfigured so that you go for a more powerful droplet? They are, after all, in the business of making you use bigger VPSs.

But I digress…

Node is the Kibana part of the ELK stack, and this issue has been highlighted, without a clear answer (as always), on GitHub and in another issue.

But the suggestions say to add this at the top of bin/kibana:

NODE_OPTIONS="--max-old-space-size=250"

On the one-click install that file is located at:


/opt/kibana/bin/kibana

and it should look like:


NODE_OPTIONS="${NODE_OPTIONS:=--max-old-space-size=500}"
exec "${NODE}" $NODE_OPTIONS "${DIR}/src/cli" ${@}

I will implement this and let you know the results.

Update (The following day)

kibana-rising-then-restarting

So I set the limit to 500 and, duly, it restarted at 440 MB used, according to this graphic:

kibana-usage-max-500mb

So, summing up, it may be possible to run this on a 1 GB RAM droplet.

The Java instance for Logstash runs at about 250 MB max memory usage. Java for Elasticsearch seems to max out at 600 MB, but maybe that is just because of the amount of data it has to search through or keep in memory; I'm also not sure if you can limit this amount. Filebeat uses about 15 MB max. Rsyslog uses about 10 MB.

So can you run the ELK stack on a 1 GB droplet?

Processes and Memory being used on the ELK stack

User Process Count CPU Memory
elasticsearch java 1 8.5% 564 MB
kibana node 1 0.4% 325 MB
logstash java 1 3.8% 239 MB
root java 1 81.1% 154 MB
root apt-get 1 7.5% 42.6 MB
root filebeat 1 0.0% 12.6 MB
root fail2ban-server 1 0.0% 9.37 MB
syslog rsyslogd 1 0.0% 7.96 MB
www-data nginx 4 0.0% 6.16 MB
newrelic nrsysmond 2 0.1% 5.43 MB
root vim 1 0.0% 5.21 MB
root bash 1 0.0% 3.72 MB
root sudo 1 0.0% 2.03 MB
root init 1 0.0% 1.71 MB
root sshd 1 0.0% 1.62 MB
sshd sshd 1 0.0% 1.4 MB
root nginx 1 0.0% 1.09 MB
root getty 6 0.0% 944 KB
messagebus dbus-daemon 1 0.0% 820 KB
root systemd-udevd 1 0.0% 739 KB

So the total excluding Kibana is: 1059.75 MB

Well, it looks like we are over budget as it stands. I will keep monitoring.

Making the ELK Stack run on a 1 GB Droplet (Update)

Well, we can see that we can stop Kibana going out of control on the memory side. Now let's try to save some money by limiting memory usage so it no longer needs the 2 GB droplet.

We are going to limit Kibana to 250 MB and Elasticsearch to 500 MB and hopefully everything goes smoothly. To change the limit on Elasticsearch memory, edit /etc/default/elasticsearch

Add the following:


ES_HEAP_SIZE=512m

That is what I had the best success with, although there are additional steps outlined on Stack Overflow.

So we will implement the above and wait until it safely runs under 1 GB of memory; otherwise we are just going to flip the switch, downgrade to 1 GB and reboot.

Update

We are still seeing the jagged edges created by the periodic climb and drop of Kibana's memory usage.

kibana-usage-still-dropping-and-climbing

And this is pushing us over our 1 GB max. Also, Elasticsearch is using 650 MB of memory or thereabouts when we have told it to use 512 MB max. Maybe it realises there is spare memory. I will be downgrading to a 1 GB droplet now.

So far so good… the retrieval of records and visualisation generation in Kibana is less speedy, but I can deal with it. Here is the current memory situation:

kibana-on-1-gb-droplet screen-shot-2016-09-17-at-12-10-04-pm

Well that didn’t work…

Elasticsearch is still using > 512 MB of memory and Logstash stops as soon as it starts. So if I set the heap size in sudo vim /etc/init.d/elasticsearch:

ES_HEAP_SIZE=256mb


the Elasticsearch service does not start…

Ah, you also need to edit sudo vim /etc/elasticsearch/elasticsearch.yml

and add the following:


bootstrap.memory_lock: true

I will continue monitoring stability.

Update

Oh no, it looks like Kibana has died. The stability of the system is coming into question now.

nginx-kibana-502-bad-gateway

Hmm, it seems to be a one-must-die situation, as Logstash also just dies after a while…

Somehow the server went down: I couldn't SSH in and New Relic was not showing the server, but on DO it still said it was up. After turning it off and on from the DO console I am now getting this error, and it is the last straw.

kibana-elasticsearch-request-timeout

Tis’ a shame but it is not in the price range at this stage. Will stop the server.