Saturday 4 March 2017

Ubuntu xenial64 on Vagrant

If you are getting errors similar to the ones below:

The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

hostname -f

Stdout from the command:

Stderr from the command:

sudo: unable to resolve host ubuntu-xenial
mesg: ttyname failed: Inappropriate ioctl for device
hostname: Name or service not known

or

Vagrant was unable to mount VirtualBox shared folders. This is usually
because the filesystem "vboxsf" is not available. This filesystem is
made available via the VirtualBox Guest Additions and kernel module.
Please verify that these guest additions are properly installed in the
guest. This is not a bug in Vagrant and is usually caused by a faulty
Vagrant box. For context, the command attempted was:

mount -t vboxsf -o uid=1000,gid=1000 v-csc-78dab358b /tmp/vagrant-chef/1d140fd50fa8a0774caff0f697e00977/cookbooks

The error output from the command was:

mount: unknown filesystem type 'vboxsf'

and keep finding Google results pointing at the same long, inconclusive threads, here's a TL;DR of what you probably need to do:

1. Upgrade Vagrant to at least 1.9.2.
2. Install vagrant-vbguest.

vagrant plugin install vagrant-vbguest

Thursday 17 October 2013

Can't flush bind cache due to rndc: connect failed

I couldn't find a solution for this in the googles:

$ sudo rndc flushname somedomain.com
rndc: connect failed: 127.0.0.1#953: connection refused

Another symptom:

$ sudo service named restart
Stopping named: .                                          [  OK  ]
mount: block device /etc/rndc.key is write-protected, mounting read-only
mount: cannot mount block device /etc/rndc.key read-only
Starting named:                                            [  OK  ]

The setup:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.4 (Santiago)
$ rpm -q bind
bind-9.8.2-0.17.rc1.el6_4.6.x86_64

The fix (it appears the chroot'ed named was using its own, out-of-sync copy of rndc.key):

# cd /var/named/chroot/
# cp /etc/rndc.key .
cp: overwrite `./rndc.key'? y
# /etc/init.d/named restart
# logout
$ sudo rndc flushname somedomain.com

Friday 23 November 2012

Linux driver file (PPD) for Fuji Xerox printers

The cheapskates at FX don't provide PPDs for Linux, even though they are the same files that ship with the macOS drivers.

Now, opening Apple's *.dmg files on Linux is surprisingly convoluted. Here's a step-by-step guide to, hopefully, save you some time.

1. Download the MacOS ApeosPort IV drivers from Fuji Xerox's site.
2. Download dmg2img (I downloaded the source code for dmg2img and ran `make' - you guys rock!).
3. Convert the dmg file into an image:

./dmg2img ~/Downloads/fxmacprnps1208am105iml.dmg /tmp/out.img

4. Mount the resulting image:

sudo mkdir -p /mnt/iso
sudo mount -o loop /tmp/out.img /mnt/iso/

5. Use xar (yum install xar) to extract the pkg file:

cp /mnt/iso/Fuji\ Xerox\ PS\ Plug-in\ Installer.pkg /tmp/
cd /tmp && mkdir fx && cd fx
xar -xf ../Fuji\ Xerox\ PS\ Plug-in\ Installer.pkg

6. Inside the extracted folders, locate and copy the Payload file with the PPDs:

cp ppd.pkg/Payload /tmp/Payload.cpio.gz

7. Gunzip the file and extract the cpio archive:

cd /tmp
gunzip Payload.cpio.gz
mkdir ppd && cd /tmp/ppd
cpio -id < ../Payload.cpio

8. Voilà! Your PPD files are in Library/Printers/PPDs/Contents/Resources/:

$ ls
Fuji Xerox 4112 PS.gz          FX ApeosPort-IV C4475 PS.gz     FX DocuCentre-IV 7080 PS.gz
Fuji Xerox 4127 PS.gz          FX ApeosPort-IV C5570 PS.gz     FX DocuCentre-IV C2260 PS.gz
Fuji Xerox D110 PS.gz          FX ApeosPort-IV C5575 PS.gz     FX DocuCentre-IV C2263 PS.gz
Fuji Xerox D125 PS.gz          FX ApeosPort-IV C5580 PS.gz     FX DocuCentre-IV C2265 PS.gz
Fuji Xerox D95 PS.gz           FX ApeosPort-IV C6680 PS.gz     FX DocuCentre-IV C2270 PS.gz
FX ApeosPort 350 I  PS B.gz    FX ApeosPort-IV C7780 PS.gz     FX DocuCentre-IV C2275 PS.gz
FX ApeosPort 350 I  PS.gz      FX DocuCentre 450 I  PS B.gz    FX DocuCentre-IV C3370 PS.gz
(...)

Enjoy.

Thursday 8 November 2012

Caching 301 redirects in Varnish while keeping the protocol

I have noticed a drop in our Varnish cache hit ratio and, upon investigation, found that the backend was not allowing 301 redirects to be cached:

Client request:
   42 RxRequest    c GET
   42 RxURL        c /img_resized/au/images/gifts/2012/hero/christmas-homepage-hero-alyce.jpg
   42 RxProtocol   c HTTP/1.1

Backend request: 
   21 TxRequest    b GET
   21 TxURL        b /img_resized/au/images/gifts/2012/hero/christmas-homepage-hero-alyce.jpg
   21 TxProtocol   b HTTP/1.1

Backend response: 
   21 RxHeader     b Cache-Control: no-cache
   21 RxHeader     b Location: http://static.example.com/img/au/images/gifts/2012/hero/christmas-homepage-hero-alyce.jpg
   21 RxHeader     b Status: 301

I logged a bug for the application to be fixed and, in the meantime, added a VCL snippet to override the backend headers:

        # WSF-xxxx: App is not allowing redirects to be cached
        if ((req.http.Host == "static.example.com") &&
            (beresp.status == 301) &&
            (beresp.http.Cache-Control ~ "no-cache")) {
                set beresp.http.Cache-Control = "public, max-age=604800";
        }

There's a catch though.  We do SSL offloading for static.example.com on our F5 BigIP LTM load balancer.  In other words, the load balancer receives an HTTPS request and, in turn, makes an HTTP request to Varnish.

As Varnish doesn't support SSL natively it is unaware of the protocol (http or https) being used; thus, the requests below get cached with the first answer it sees:

HTTP request
curl -I http://static.example.com/img/au/images/gifts/2012/hero/christmas-homepage-hero-alyce.jpg
(...)
HTTP/1.1 301 Moved Permanently
Location: http://static.example.com/img/au/images/gifts/2012/rectangle-75-58/christmas-promo-tile-alyce.jpg

HTTPS request
curl -I https://static.example.com/img/au/images/gifts/2012/hero/christmas-homepage-hero-alyce.jpg
(...)
HTTP/1.1 301 Moved Permanently
Location: http://static.example.com/img/au/images/gifts/2012/rectangle-75-58/christmas-promo-tile-alyce.jpg

Notice that the HTTPS request was redirected to an HTTP URL.  This will almost certainly cause issues, including mixed-content warnings in some browsers.

Our backend is a Ruby on Rails application and, to my surprise, it understands the X-Forwarded-Proto HTTP header.  This means that the application will use the value of X-Forwarded-Proto to build the URL in the Location header.

I improved the F5 iRule by instructing the load balancer to insert an X-Forwarded-Proto header into HTTPS requests being off-loaded to Varnish:

when HTTP_REQUEST {
      HTTP::header replace X-Forwarded-Proto "https"
}

In addition to this, I also had to instruct Varnish to use the X-Forwarded-Proto header in the cache hash:

sub vcl_hash {
    if (req.http.X-Forwarded-Proto) {
        set req.hash += req.http.X-Forwarded-Proto;
    }
}

With these tweaks Varnish now accounts for SSL offloading and serves the appropriate cached version of a page.  An improvement on this configuration would be to only include X-Forwarded-Proto in the hash when the response is a redirect; for now, I'm not bothered that Varnish caches each object twice.

HTTP request
$ curl -I http://static.example.com/img_resized/au/images/gifts/2012/rectangle-75-58/christmas-promo-tile-alyce.jpg
HTTP/1.1 301 Moved Permanently
Location: http://static.example.com/img/au/images/gifts/2012/rectangle-75-58/christmas-promo-tile-alyce.jpg

HTTPS request
$ curl -I https://static.example.com/img_resized/au/images/gifts/2012/rectangle-75-58/christmas-promo-tile-alyce.jpg
HTTP/1.1 301 Moved Permanently
Location: https://static.example.com/img/au/images/gifts/2012/rectangle-75-58/christmas-promo-tile-alyce.jpg

Saturday 10 March 2012

Using varnish to increase the cache time of slow pages

If you are having trouble getting your organisation to accept performance as a feature, one solution is to tie it to another feature.

A common discussion between content owners ("the business") and web operators is how long to cache pages for.  The content owners want to see their fresh content in minimal time but web operators want to cache expensive (to generate) content longer.

Usually content owners get their way, as they should, or you wouldn't have a business.  So you set the default cache time on all pages to something low, like 10 minutes.  If you are not quite ready to take the ESI leap, an attractive compromise is to penalise only expensive pages with a higher expiry time.

If your backend runs on Ruby on Rails, the job is already half-done, since responses include an X-Runtime HTTP header indicating, in milliseconds, how long RoR took to generate the page.  A similar header is available in most web frameworks, or would be quite easy to implement.

We use the VCL code below to override the internal Varnish TTL for slow-to-generate pages.

# VCL for Varnish 2.1.5: extend TTL on pages which take 1s or longer to generate
if (beresp.http.X-Runtime ~ "[0-9]{4,}") {
    if (beresp.ttl < 1d) {
        set beresp.ttl = 1d;
        set beresp.http.X-Cache-Override = "1d";
    }
}
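
Since X-Runtime is in milliseconds, the `[0-9]{4,}` regex simply means "four or more digits", i.e. 1000 ms or longer. A quick shell illustration of the same test:

```shell
# X-Runtime is in milliseconds: 4+ digits means the page took >= 1 second
for rt in 950 1000 12345; do
    if echo "$rt" | grep -Eq '[0-9]{4,}'; then
        echo "$rt ms: extend TTL"
    else
        echo "$rt ms: keep default TTL"
    fi
done
# Output:
# 950 ms: keep default TTL
# 1000 ms: extend TTL
# 12345 ms: extend TTL
```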

We chose not to modify the original headers, which are sent to the client, because this gives you the opportunity to keep or ban the content in Varnish while letting the browsers come back to ask for the "new" content.

The code also sets an "X-Cache-Override" header so you can tell which pages are taking 1 second or more to generate.

This is great because now, when someone complains that their content isn't refreshing, we can improve performance in the areas of the site that matter to the business, instead of the blanket "make the site faster" approach.

The results will come quickly.  Two days after putting this in production, Google showed that the average page load time on our (very large) site had dropped by half a second and, as you can see in the graph below, the trend is still going down.

[Graph: site speed improvement as perceived by the Google bot]

A further improvement to this would be to write an algorithm which sets the TTL based on the page generation time.  For example, add 1 day to the TTL for each second it takes the page to generate so if a page takes 5 seconds to generate, keep it for 5 days.
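
A minimal shell sketch of that idea (the mapping is an assumption for illustration, not something we run in production):

```shell
# Hypothetical rule: 1 day of TTL per second of generation time
runtime_ms=5000                       # value taken from X-Runtime
ttl_days=$(( runtime_ms / 1000 ))     # 5000 ms -> 5 days
[ "$ttl_days" -ge 1 ] || ttl_days=1   # keep at least a 1-day floor
echo "ttl = ${ttl_days}d"             # prints "ttl = 5d"
```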

Tuesday 27 July 2010

Varnish and the Linux IO bottleneck

There are 2 main ways Varnish caches your data:
  1. to memory (with the malloc storage config)
  2. to disk (with the file storage config)
As Kristian explains in a best practices post, you get better performance with #1 if your cache size fits in RAM. With either method the most accessed objects end up in RAM: while the malloc method puts everything in RAM and lets the kernel swap it out to disk, the file storage method puts everything on disk and lets the kernel cache it in memory.
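
The two methods map to varnishd's -s storage option; a sketch for reference (the listen address, backend and sizes are examples only):

```shell
# 1. malloc storage: the cache lives in RAM, swapped out by the kernel if needed
varnishd -a :80 -b localhost:8080 -s malloc,12G

# 2. file storage: the cache lives in an mmap'ed file, cached in RAM by the kernel
varnishd -a :80 -b localhost:8080 -s file,/var/lib/varnish/storage.bin,30G
```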

We have been using the file storage for a little more than 3 months on servers with 16GB of RAM and a file storage of 30GB. Recently, because of an application configuration error, the size of the cache grew from its usual maximum of 6GB to 20GB+.

Since the system only has 16GB of RAM, the kernel now needs to choose which pages to keep there. When a dirty mmap'ed page needs to be released, the kernel writes the changes to disk before doing so. More than that, the Linux kernel proactively writes dirty mmap'ed pages to disk.

On a Varnish server with this setup or any other application that uses mmap() for large files, the situation translates to constant disk activity.

At some point the Varnish worker process will be blocked by a kernel IO call. The manager process, with no way to tell what's happening to the child, believes it has stopped responding and kills it. The log entries below are typical:

00:30:10.395590 varnishd[22919]: Child (22920) not responding to ping, killing it.
00:30:10.395622 varnishd[22919]: Child (22920) not responding to ping, killing it.
00:30:10.417309 varnishd[22919]: Child (22920) died signal=3

At this point you lose your cached data and the system goes back to normal. After some time, the size of the cache will grow to be larger than your server's RAM again, repeating the kill-restart cycle.

You could choose to give the child more time to respond to the manager process' ping requests by increasing cli_timeout from its default of 10 seconds (e.g. varnishd -p cli_timeout=20), but this merely masks the issue. The real problem is that the Linux kernel is busy writing dirty pages to disk.

There are parameters you can use to control how much time the kernel spends writing dirty mmap'ed pages to disk. I have spent some time fine-tuning the parameters below, with little result:

/proc/sys/vm/dirty_writeback_centisecs
/proc/sys/vm/dirty_ratio
/proc/sys/vm/dirty_background_ratio

In RHEL 5.x kernels you can completely disable committing changes to mmap'ed files:

echo 0 > /proc/sys/vm/flush_mmap_pages

Short of reducing the file storage size to fit in RAM, this is the best solution I found so far. Be aware that using this with the upcoming persistent storage in Varnish is a really bad idea as you risk serving corrupt and/or stale data. This is only acceptable because the mmap'ed data, in this setup, is superfluous: varnish throws away the cache on restart and it can always fetch the object from the backend.

I have asked Red Hat if they plan to keep the flush_mmap_pages setting in RHEL 6 but haven't received a response yet.  They did, however, confirm that msync() calls are honoured and dirty pages being evicted from RAM are committed to disk.

Friday 30 April 2010

CruiseControl, ProxyPass

Dear blog, today I submitted a patch for CruiseControl. It feels great to give something back (even if it's half-baked).

I couldn't find a way to make CC give the correct URL to users when running in a configuration like below:

ProxyVia On
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
(...)

CC would insist on building URLs using localhost:8080 even though users can only access it via http://your.host.com/. With the patch, CruiseControl uses the X-Forwarded-Host HTTP header to build the correct address.
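
The logic of the patch can be sketched as a simple fallback: prefer X-Forwarded-Host when the proxy sets it, otherwise use the locally-known address. A trivial shell illustration (variable names are mine, not CruiseControl's):

```shell
# Illustrative only: pick the host used when building URLs for the user
forwarded_host="your.host.com"    # from the X-Forwarded-Host header, if present
local_host="localhost:8080"       # what CruiseControl sees on its own socket
host="${forwarded_host:-$local_host}"
echo "http://${host}/"            # prints "http://your.host.com/"
```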