Back in early 2012 we were looking at under 100K visitors per day and struggling with our infrastructure. Our databases were straining, our web servers were railing, and our all-important ad server was maxed out. We knew that caching was the only answer, so we embarked on a multifaceted attack to be able to support 10x growth.
We thought about three levels of caching and how to achieve each.
- Data Caching. Caching Data sets even if only for a minute.
- Dot.Net object caching.
- Web Page HTML caching.
Data Caching – The fastest database query is the one that doesn’t have to happen.
For data caching, we considered the holy grail: a caching layer where every request reads from and writes to the cache first, and the middleware goes to the DB only second. However, we didn’t have the resources or the time to go down that path yet. We wound up using three different data caching mechanisms, each suited to its own use case.
The first data caching layer that we used was within .NET itself: the built-in .NET data caching. We decided to use this for smaller data sets that rarely change, such as configuration data and lookup tables that required expiration only when the data changed. We wanted to limit this because with .NET caching, the same data is stored in memory for every process on every server. We run 3 processes in a web garden on each web server, so we would be caching the same data 30+ times. For data that changed rarely and was accessed often, this made sense.
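To make the trade-off concrete, here is a minimal sketch of an in-process cache with explicit invalidation, written in Python rather than .NET. The class name `LocalCache`, the helper `copies_in_memory`, and the figure of 10 servers (inferred from "3 processes" and "30+ times") are assumptions for illustration, not our production code.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LocalCache:
    """Per-process cache: every worker process holds its own private copy."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[Any, Optional[float]]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any],
                    ttl_seconds: Optional[float] = None) -> Any:
        entry = self._items.get(key)
        if entry is not None:
            value, expires_at = entry
            # expires_at of None means "keep until explicitly invalidated".
            if expires_at is None or self._clock() < expires_at:
                return value
        value = loader()  # cache miss: go to the database
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (value, expires_at)
        return value

    def invalidate(self, key: str) -> None:
        """Call this when the underlying data changes."""
        self._items.pop(key, None)


def copies_in_memory(servers: int, processes_per_server: int) -> int:
    """How many times one cached data set is duplicated across the farm."""
    return servers * processes_per_server


# Assumed farm size: 10 web servers, each running a 3-process web garden.
print(copies_in_memory(10, 3))
```

Every process has to be told about a change separately, which is one more reason to keep this kind of cache for small, rarely changing data.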
For more set-based data caching we turned to Redis as a caching data store and PostSharp as a mechanism to instantiate it. Initially we used ServiceStack as the client for Redis but were not happy with the performance. When we dug into it, we noticed that ServiceStack opened and closed a connection with every access, which wasn’t very efficient. We switched to BookSleeve, which worked well for a while but had trouble deserializing nested JSON objects. Finally, we turned to protobuf, which allowed not only for ASCII serialization but also binary, which was even more performant since it made the payload to and from Redis that much smaller.
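The payload-size point is easy to see with a toy example. The sketch below is not protobuf; it uses Python's standard `json` and `struct` modules as stand-ins for a text encoding and a fixed binary encoding, and the record fields are made up for illustration.

```python
import json
import struct

# Hypothetical cached record.
record = {"user_id": 123456, "score": 98.5, "premium": True}

# Text encoding: field names and digits are spelled out as characters.
text_payload = json.dumps(record).encode("utf-8")

# Binary encoding with a fixed layout, little-endian and unpadded:
# I = unsigned 32-bit int, d = 64-bit float, ? = one-byte bool.
LAYOUT = "<Id?"
binary_payload = struct.pack(
    LAYOUT, record["user_id"], record["score"], record["premium"]
)


def decode(payload: bytes) -> dict:
    """Rebuild the record from its binary form."""
    user_id, score, premium = struct.unpack(LAYOUT, payload)
    return {"user_id": user_id, "score": score, "premium": premium}


print(len(text_payload), "bytes as text")
print(len(binary_payload), "bytes as binary")  # 4 + 8 + 1 = 13
```

The saving compounds when every page view makes several round trips to the cache.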
We had our home-grown caching layer but soon found another method of data caching that was super simple to implement and met a need that our other mechanisms did not. We found a product called ScaleArc that sits between the application and the database as a proxy/cache layer. To use it, all you do is point your connection string at it and set it up to use your existing DBs as the origins. All requests are then reported in nice analytics, so you can see which queries are the most expensive and set up ScaleArc to cache the result set for that type of request. It gives great stats on DB usage and cache hit ratios. We first used it to help scale our third-party ad server, which has a horrible SQL Server DB bottleneck. We were railing that DB constantly, which cost us potential revenue. We upgraded the hardware and soon railed the new server too. Since it was a third-party application and we didn’t have access to the code, we could not make any changes to the app to improve it. ScaleArc allowed us to add a data caching layer without touching the application. This gave us a lot more room for growth and paid for itself many times over.
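The general idea of a caching proxy can be sketched in a few lines. This is not how ScaleArc is implemented; it is a simplified illustration of caching result sets per query pattern with a TTL, and `QueryCacheProxy`, the rule format, and the `ads` table are all hypothetical.

```python
import re
import time
from typing import Callable, Dict, List, Optional, Pattern, Tuple

Rows = list


def normalize(sql: str) -> str:
    """Collapse whitespace so trivially different spellings share a key."""
    return re.sub(r"\s+", " ", sql.strip())


class QueryCacheProxy:
    def __init__(self, execute: Callable[[str], Rows],
                 rules: List[Tuple[Pattern[str], float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._execute = execute  # runs SQL against the origin database
        self._rules = rules      # (pattern, ttl_seconds) pairs
        self._clock = clock
        self._cache: Dict[str, Tuple[Rows, float]] = {}
        self.hits = 0
        self.misses = 0

    def _ttl_for(self, key: str) -> Optional[float]:
        for pattern, ttl in self._rules:
            if pattern.search(key):
                return ttl
        return None  # no rule matched: never cache

    def query(self, sql: str) -> Rows:
        key = normalize(sql)
        ttl = self._ttl_for(key)
        if ttl is None:
            return self._execute(sql)  # pass straight through to the origin
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry[1]:
            self.hits += 1
            return entry[0]
        self.misses += 1
        rows = self._execute(sql)
        self._cache[key] = (rows, now + ttl)
        return rows

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Because the application only sees a connection endpoint, rules like these can be added or tuned without touching application code, which is what mattered for the third-party ad server.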
One of our big sources of revenue was advertising. We did a lot of keyword buying to drive traffic to the site, and in turn we made money on the ads on the pages that these users visited. These users were not the same type of engaged users as our normal registered users, so we wanted to segment this traffic so that these visitors did not affect our normal users. We leveraged Amazon EC2 instances to serve up that traffic. Since our DB servers sat in our colocation space, there was a lag in accessing the data from EC2. We put ScaleArc in place for the Amazon instances serving our purchased traffic and implemented heavy caching. Since most of the traffic driven by marketing lands initially on a set of landing pages, the DB caching had a high cache hit rate and has allowed us to scale that business tremendously.
Dot.net caching – Good for some applications.
Data caching can only take you so far. Better yet is to avoid needing any data access at all. To this end we started caching components of a page. We mainly cached some slowly changing user controls, the header, the footer and a handful of fairly static pages.
When caching user controls, we quickly learned that referencing them in subsequent page loads becomes problematic. Once a control is served from the cache, any reference to it returns null, so we could not access the control and had to put in some error trapping. The downside of Dot.net output caching is that there is still work being done on the server side, even though it is greatly reduced. It also forces you to develop with specific patterns that expect cached controls not to be available to your code.
Web Page HTML caching – Edge caching for the win!
When we went to the Velocity Conference for the first time a couple of years ago, we learned about a product called aiCache. It is a caching appliance with a lot of flexibility. It could have different caches based on cookie, browser, etc.
This worked well for us because most of our traffic was guests who were not logged in. So based on cookie we were able to direct traffic via the load balancer to aiCache if there was no authentication cookie, or directly to our origin servers if the user was logged in. This way we didn’t have to change the application to show personalized information.
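The routing rule itself is tiny. Here is a sketch of the decision in Python; in practice it lived in the load balancer configuration, and the cookie name `auth` and the backend labels are placeholders rather than our real values.

```python
from http.cookies import SimpleCookie

AUTH_COOKIE = "auth"  # hypothetical name of the authentication cookie


def choose_backend(cookie_header: str) -> str:
    """Send guests to the page cache and logged-in users to the origin."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    if AUTH_COOKIE in cookies and cookies[AUTH_COOKIE].value:
        return "origin"   # personalized pages are rendered by the app
    return "aicache"      # guests all get the same cached HTML


print(choose_backend("theme=dark"))
print(choose_backend("theme=dark; auth=abc123"))
```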
As we continued to grow we started seeing bottlenecks with logged in traffic and the aiCache infrastructure. So we wanted to add a layer of edge caching for our high traffic pages.
We had been using Limelight as our CDN for images but the performance wasn’t fantastic. After attending Velocity in California we learned about TCP/IP overloading and looked for a CDN vendor that did it. As it was explained to us, the overloading is much faster than standard TCP/IP because it keeps sending packets rather than waiting for an ACK before sending the next one. We saw times of 15ms from Cotendo vs 100ms from Limelight for the same cached content. We went with Cotendo, which was recently bought by Akamai. (Boo)
The key to successful CDN edge caching of our pages was twofold.
1) Proper configuration – Setting TTLs, Include Querystring, etc
2) Client Side Customizations.
Configuration required understanding the data and how often it changed. Understanding the structure of the URLs and which requests couldn’t be cached was crucial. Many Ajax requests and some pages needed to bypass the cache and go directly to the origin servers. We had to carefully set TTLs and identify when the querystring should be part of the cache key and when it should not.
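As a rough illustration of what those configuration decisions look like, the sketch below maps a URL to a cache key and TTL, or to a bypass. The paths, TTL values and parameter names are invented for the example; a real CDN expresses the same rules in its own configuration format.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit

# (path prefix, TTL in seconds, query parameters that belong in the cache key)
# A TTL of 0 means "never cache, always go to the origin". First match wins.
RULES = [
    ("/ajax/", 0, ()),
    ("/search", 60, ("q", "page")),
    ("/", 300, ()),
]


def cache_policy(url: str) -> Optional[Tuple[str, int]]:
    """Return (cache_key, ttl_seconds), or None when the request must bypass."""
    parts = urlsplit(url)
    for prefix, ttl, key_params in RULES:
        if parts.path.startswith(prefix):
            if ttl == 0:
                return None
            # Keep only the parameters that change the page, in a stable order,
            # so tracking parameters do not fragment the cache.
            kept = sorted(
                (k, v) for k, v in parse_qsl(parts.query) if k in key_params
            )
            key = parts.path + ("?" + urlencode(kept) if kept else "")
            return key, ttl
    return None


print(cache_policy("/search?page=2&q=cats&utm_source=ad"))
print(cache_policy("/ajax/notifications"))
```

Dropping irrelevant parameters from the key matters most for purchased traffic, where nearly every URL arrives with tracking parameters attached.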
Most importantly, our pages had customizations based on who was logged in. They needed to show user-driven menus that changed to show the user’s avatar, premium level, etc. In order to properly cache pages at the edge, we had to cache a generic page for a non-logged-in user and then apply the customizations client side. The complexity is that you don’t want the user to see the generic page and then see the customization snap in. We wanted the page to appear nicely for the user.
To do this we knew that it had to be done in the header with as little code as possible. If we waited for jQuery to load or made an Ajax call to get the information, it would slow down the page or cause that snap-in. So the lesser of the evils was to store the minimum needed for the header customization in a cookie and to put a small JS snippet that knows how to generate the header HTML customizations inline on the pages.
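The logic of that snippet is simple enough to sketch. The real thing ran as inline JavaScript in the browser; the version below expresses the same idea in Python purely for illustration, and the cookie layout (three percent-encoded fields joined by `|`), the function names and the markup are all assumptions.

```python
import html
from urllib.parse import quote, unquote

GENERIC_HEADER = '<a href="/login">Log in</a>'


def pack_header_cookie(name: str, avatar_url: str, premium_level: int) -> str:
    """Server side: store only what the header needs, as compactly as possible."""
    fields = (name, avatar_url, str(premium_level))
    # safe="" also encodes "|" and "/", so the separator is unambiguous.
    return "|".join(quote(field, safe="") for field in fields)


def render_header(cookie_value: str) -> str:
    """Client-side logic: build the personalized header from the cookie alone."""
    if not cookie_value:
        return GENERIC_HEADER
    fields = [unquote(field) for field in cookie_value.split("|")]
    if len(fields) != 3:
        return GENERIC_HEADER  # malformed cookie: fall back to the guest header
    name, avatar_url, level = fields
    # Escape everything: cookie contents are user-controlled input.
    return '<img class="avatar" src="%s" alt=""> <span class="level-%s">%s</span>' % (
        html.escape(avatar_url, quote=True),
        html.escape(level, quote=True),
        html.escape(name),
    )


cookie = pack_header_cookie("Ann", "/avatars/1.png", 2)
print(render_header(cookie))
```

Whatever language it is written in, the snippet has to escape the cookie values before writing them into the page, since a cookie can be edited by the user.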
With this in place we were free to launch the CDN edge caching of HTML, and our logged-in users saw what they expected to see. All was good.
Because of the increased cookie payload, we made sure to move most of our static file HTTP requests to cookieless domains (btrcdn.com) so the cookies wouldn’t add a lot more overhead to those requests.
Adding all of this caching enabled us to grow our traffic over 10x and still have plenty of room to grow another 10x.