
Open letter to Computer Science majors

I have been in the position to hire programmers at many of the companies I have worked for, and I have seen my share of developer resumes from people who are either still in school or recently graduated. The vast majority have no real-world experience; they put college or graduate school project work down as real experience. Hiring managers in technology know that school assignments and projects are nothing like the real world: they are well defined, usually have little business value, and are done in an overly controlled way. When I have interviewed these applicants, it is obvious that they are missing key knowledge that is mainly taught in the real world.

I enjoyed lifeguarding in the summers before my freshman and sophomore years of college. It was a ton of fun and I made some decent money. Then, during my sophomore year, I got some good advice: lifeguarding may be great at the time, but it was not going to help me get a job in technology when I graduated. In your senior year, when you start applying for jobs, your resume hopefully ends up on the desk of some hiring manager, if you can get past the HR screener. It will be submitted along with many other resumes from candidates showing off their real work experience.

So I applied for and secured a summer internship as an Informix programmer at AT&T for the next two summers. When I graduated, it was during an economic downturn, and only half of the graduates from my Tufts engineering class had jobs upon graduation. I was one of them, because I had interned at AT&T and had done well; they hired me full time when I graduated. Had I kept lifeguarding, getting a “real” job with no relevant experience would have been much, much harder.

In the last 20 years, things have changed a lot in technology. There is much more to learn, but there are also far more opportunities to get real experience coding on real projects. There is also much more competition for the best jobs.

If you take nothing else from this blog, take this point: a pet peeve of mine when I interview someone is when they tell me that they are interested in learning some technology in our stack but have never even read about it or attempted to play with it. It is too easy today to spin up a free virtual environment on your own computer or in the cloud to try out almost anything. If you want to impress a hiring manager as a newly minted graduate, show them you have the initiative and drive to learn and contribute with actions, not words.

In my humble opinion, universities are not doing enough to prepare their computer science candidates to function in a real development environment. They do not teach a lot of very important concepts.

Top 10 Concepts that students should know when they graduate.

  1. Inversion of Control – Dependency Injection, service locator, factory pattern.
  2. Unit testing, integration testing, and regression testing.
  3. Source control (Git, SVN)
  4. Development patterns (Singleton, Repository, Adapter)
  5. Agile/Kanban methodologies vs. waterfall
  6. Relational vs. NoSQL databases
  7. Secure Coding practices
  8. Scaling systems
  9. Service oriented architecture
  10. Performance tuning (caching, efficient code, etc)

What are the top 10 things you can do to improve your marketability?

  1. Summer internships where you actually code.
  2. Co-op programs where you actually code.
  3. Teach yourself the above concepts.
  4. Do your own projects and open-source them. Build something interesting.
  5. Contribute to open source projects.
  6. Learn and understand source control. Git (and the git-flow workflow) and SVN are important ones.
  7. Use the cloud resources out there to play with different technologies.
  8. Go to meetups on technologies that you are interested in and ask questions.
  9. Maintain a blog and a GitHub account; they show what you can do and demonstrate initiative.
  10. Above all, have a passion to learn, try new things and be an innovator.

Jobs in technology are out there. It is much easier to get a job right out of school if you show initiative, an ability to learn and contribute, and a body of real work to display. When a hiring manager makes a decision to bring somebody in, they are gambling that that person is going to contribute successfully to the company. If you do internships or co-ops, or show real initiative and a body of interesting work, it will help the hiring manager feel that the odds of you succeeding are in your favor.

ORM or not to ORM that is the question

ORMs (object-relational mappers) have come a long way over the last few years.

ORMs used to be very slow and hard to tune but that is not true anymore. However, the decision to use one or not still weighs on every technology leader.

  • If you decide to use one, which one?
  • When you pick one, how much of its functionality do you use?
  • How do you tune the queries it generates?
  • Is a full ORM right for you, or is a micro ORM a better fit?

This blog entry is not going to answer any of the above questions 😉 It is more about what we did at BlogTalkRadio, what our pain points were, and how we dealt with them.

When we first built BlogTalkRadio’s data access layer 7 years ago, we looked to see if we wanted to use an ORM. Since we were a .Net shop, we looked at Entity Framework and nHibernate. Both were in their infancy at the time, and both were slow and cumbersome.

So in 2006 we decided to build our own data access layer and mapper that called stored procedures exclusively and did all of the result set mapping to classes with private functions that referenced columns by index. Why by column index and not name? Because when we launched, we were on ASP.Net 2.0, where the reflection methods used under the covers to convert a column name to an index in ADO were VERY expensive. It wasn’t until 3.5 that they became more efficient. I think this was also a contributing reason early ORMs for .Net were so slow.

 const int C_ID = 0;
 const int C_WIDTH = 1;
 const int C_HEIGHT = 2;
 const int C_TYPE_ID = 3;

// Maps a result-set row to an entity strictly by column ordinal.
private static MyEntity GetMyEntityFromReader(IDataReader reader)
{
  var myEntity = new MyEntity
   {
     ID = reader.GetInt32(C_ID),
     Width = reader.GetInt32(C_WIDTH),
     Height = reader.GetInt32(C_HEIGHT),
     Type = (Type)reader.GetInt32(C_TYPE_ID)
   };
  return myEntity;
}

We grew to hundreds of stored procedures and dozens of entities. After a couple of years with this pattern, it was obvious that there were shortcomings with this approach. Let’s list the top few issues, shall we?

  1. We could not change the column order or insert a column into a result set. It forced us to be very careful about backward compatibility, since adding a field in the middle would shift the column indexes. (Note: when we moved to the .Net 4 framework I benchmarked index vs. name access and found no discernible difference anymore; a quick sketch of that kind of benchmark follows this list. After that, we stopped using the index and started using the name.)
  2. Having standard entities that are manually mapped forced us to return ALL fields on every query. Every stored procedure that returned show information returned all of the columns. So if we only needed a couple of fields from an entity that has 30 attributes, the procedure returned them all. Occasionally we would create a custom mapping for a particular use case, but the overhead of managing that with a team of 3 developers was too great.
  3. By extension, if we wanted to add one field, we sometimes had to change 20 or 30 stored procedures that handled specific CRUD operations and specialized queries for parts of the system.
  4. We had to add tons of overloads to the DALC to handle different parameter sets, since nothing was dynamic.
  5. It made it very cumbersome to add new entities: we had to create the entity, create the DALC class, create all of the methods calling the stored procedures, and create the mapping of fields to attributes.
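
The benchmark mentioned in point 1 does not need to be fancy; something along these lines is enough. This is a rough sketch only: the stored procedure name is hypothetical and an already-open SqlConnection is assumed.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;

public static class ReaderBenchmark
{
    // Times reading the ID column from every row of the result set
    // using the supplied access strategy (by ordinal or by name).
    public static long TimeReads(SqlConnection connection, Func<IDataReader, int> readId)
    {
        using (var command = new SqlCommand("dbo.GetMyEntities", connection)) // hypothetical proc
        {
            command.CommandType = CommandType.StoredProcedure;
            var watch = Stopwatch.StartNew();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    readId(reader);
                }
            }
            watch.Stop();
            return watch.ElapsedMilliseconds;
        }
    }
}

// Usage: compare ordinal access against name-based access on the same result set.
//   var byIndex = ReaderBenchmark.TimeReads(connection, r => r.GetInt32(0));
//   var byName  = ReaderBenchmark.TimeReads(connection, r => (int)r["ID"]);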

So for the first 4 years we worked fine this way. We had a small, static team, we understood the shortcomings, and we worked around them by always ensuring backward compatibility: columns had to be added to the end of result sets, and we did tedious updates to many stored procs.

Our justification was performance. ORMs were slow and we had a site with millions of page views per day. We did not want to slow down the site.

Then, a couple of years ago, we realized that the claim that we had better performance without an ORM no longer held water. Well, that was still true for some ORMs; some continued to be dogs. In addition, changes in the .Net 4.5 framework boosted LINQ query performance significantly.

So with the performance bottleneck no longer a concern, we set about implementing an ORM. We decided on nHibernate. I am embarrassed to say that it was not for the right reasons: mainly we chose it because some of my developers were very familiar with it and lobbied for it. My thinking was that I would prefer to use something my engineers knew, so implementation would be smoother since they already knew some of the gotchas and how to get around them. This reasoning did prove to be correct.

It is very common to choose technology based on familiarity rather than doing your due diligence.

If you have been reading carefully, you know that everything so far was stored procedure based. Obviously nHibernate can call stored procedures just fine and map the results, but that is like using a sledgehammer to hang a picture. If all you want to do is map results and call SPs, I would use a micro ORM like Dapper instead.
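
To make that concrete, here is roughly what calling a stored procedure and mapping the result looks like with Dapper. This is only a sketch: the procedure name and connection handling are assumptions, and MyEntity is the same illustrative entity used above.

using System.Data;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

public static class MyEntityDalc
{
    public static MyEntity GetById(string connectionString, int id)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            // Dapper maps result-set columns to MyEntity properties by name,
            // so there is no hand-written index-based mapping code to maintain.
            return connection.Query<MyEntity>(
                    "dbo.GetMyEntity",                      // hypothetical stored procedure
                    new { Id = id },
                    commandType: CommandType.StoredProcedure)
                .FirstOrDefault();
        }
    }
}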

So the plan was to start small: initially implement nHibernate on small, simple queries and CRUD operations in parallel with the existing DALC, then expand it to joining complex entities and even try to replace some complex stored procedures.

However, all was not peaches and cream. We hit, and had to solve, a few issues. We also had to come up with our going-forward patterns and decide how to handle the existing DALC.

Issues:

  1. In SQL Server I am adamant about using (nolock) or setting the isolation level to Read Uncommitted on queries, to keep reads from blocking inserts and updates. This was a hard requirement: whatever ORM we used had to support it for SQL Server. We solved it this way:
    private static ISession BeginSession(ISessionFactory sessionFactory,
                                         bool useTransaction)
    {
        var session = sessionFactory.OpenSession();
        session.FlushMode = FlushMode.Never;   // read-only session: never auto-flush changes
        if (useTransaction)
        {
            // The equivalent of (nolock): reads do not block writers.
            session.BeginTransaction(IsolationLevel.ReadUncommitted);
        }
        return session;
    }
  2. The biggest issue with using an ORM is the query that gets generated. We initially used nHibernate in a project for our host tools; we were building an API layer, so it was a good place to start without affecting the old stuff. We built it and then did a performance check. It was slow. Very slow. So we profiled all of the queries, and there were some doozies in there. These are a few of the issues we had to deal with in the generated queries:
    1. The ORM had created some queries that joined the same table 3 times.
    2. It used outer joins instead of inner joins.
    3. My favorite was the lower() function wrapped around the column in the where clause: WHERE lower(column) = lower(@value). It makes sense in C# to do this:
      queryable.Where(x => x.AspNetUser.Username.ToLower() == this.UserName.ToLower())

      But that keeps an index from being used when it is converted to SQL (one way around it is sketched just after this list).

    4. Turning off lazy loading. We had some entities with sub-entities and did not want nHibernate going off and loading everything; often it wasn’t necessary. We wanted to explicitly load the entities and attributes that we wanted, when we wanted them.
    5. We reverted some queries back to SPs because they were quite complex and we could not get nHibernate to do what we wanted it to do.
    6. It is easy to get into a situation where you are doing n+1 queries: you request one entity and then it goes and fetches additional data for each item loaded into the collection. We had to eradicate a couple of those.
  3. Maintenance of queries. In our old stored procedure pattern, if there was a performance issue or we needed to change a query, we just changed it in the DB, which is 100 times simpler than making the C# code changes, compiling, deploying, etc. A DB change can be hot deployed with zero downtime; not so with compiled .Net code. We had to adjust our expectations and our process to deal with that.
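
Coming back to the ToLower() query from issue 2.3 above, the workaround is conceptually simple. The sketch below assumes the column uses a case-insensitive collation (the SQL Server default), so the database does the case-insensitive match and no function wraps the column; the entity and property names are illustrative only, not our actual mappings.

using System.Linq;
using NHibernate;
using NHibernate.Linq;

public static class UserQueries
{
    public static AspNetUser FindByUserName(ISession session, string userName)
    {
        // Translates to WHERE Username = @p0, so an index on Username can be used.
        // The explicit ToLower() calls are dropped; the case-insensitive collation
        // on the column handles "JSmith" vs "jsmith".
        return session.Query<AspNetUser>()
                      .Where(x => x.Username == userName)
                      .FirstOrDefault();
    }
}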

This, of course, invites the debate over using SPs vs. not using any at all. I will reserve that for another post.

So where we are now is a mix of old DALC and new DALC. As we rewrite pages, we will be making the old DALC obsolete. I look forward to hitting the delete button on that project.

I expect that we will move toward all CRUD operations being done purely through nHibernate (or a different ORM if we decide to change; gotta love dependency injection (DI)).

Simple queries will move to nHibernate more and more, but the more complex ones may remain as stored procedures for quite a while.

Lately, though, I have been thinking about how using an ORM gives you the ability to become DB agnostic. If we want to use something like MemSQL, where the entire DB sits in memory but there are no stored procs, we can. If we want to keep everything in Mongo, there are ORMs for that. It gives a lot of flexibility. Just remember, with great flexibility can come performance hits, so it depends on your application.
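
A quick sketch of that flexibility point (hypothetical interface and entity names, not our actual code): because callers depend only on an abstraction that the DI container resolves, the NHibernate-backed implementation can later be swapped for a Mongo- or MemSQL-backed one without touching the consuming code.

// Illustrative entity.
public class Show
{
    public virtual int Id { get; set; }
    public virtual string Title { get; set; }
}

// The application code depends only on this abstraction.
public interface IShowRepository
{
    Show GetById(int id);
    void Save(Show show);
}

// Today the DI container hands out this NHibernate implementation...
public class NHibernateShowRepository : IShowRepository
{
    private readonly NHibernate.ISession _session;

    public NHibernateShowRepository(NHibernate.ISession session)
    {
        _session = session;
    }

    public Show GetById(int id)
    {
        return _session.Get<Show>(id);
    }

    public void Save(Show show)
    {
        _session.SaveOrUpdate(show);
    }
}

// ...tomorrow a Mongo- or MemSQL-backed implementation could be registered
// instead, and no page or service code would need to change.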

In summary, writing your own gives you control, but then you need to do everything yourself. Using a tool like an ORM gives the developer more power to not think about the DB and just work with objects.

Posted on December 7, 2013 in Web Development, Web Performance

Continuous Deployment via Git

Automated deployment is the holy grail of web development shops. Simple shops with only a small number of homogeneous servers have a much easier time than those of us with more complex environments; there are so many pitfalls to watch out for. In this post, we discuss the good, the bad and the ugly that we experienced in getting our continuous deployment process going.

Continuous integration via Jenkins had been running for months. We used it to build from our SVN instance and then off of our Git master branch for dev, QA and staging. But getting the code deployed to 10 data center servers and 6 Amazon servers, where different servers have different configurations, operating systems and, yes, 32/64-bit issues, was quite a headache. Couple that with a 24×7 SaaS website with 6 million daily visits and we had a challenge. We considered using an open source or commercial product such as Puppet, Capistrano or Maestro, but they really didn’t meet our needs; they seemed either overkill or not flexible enough.

So we did the following changes to our build and deployments processes:

1. Moving from SVN cherry-picking releases to version-based promotion

In our previous workflow, a release master would cherry-pick the commits associated with tickets that had been successfully tested in the QA and Staging environments and make one big merge to the next environment, all the way to production.

We moved to a version-based workflow in which a new version is tagged every time new features are merged to our master branch (using GitHub pull requests).

Each time a version is approved in an environment (QA exploratory testing, product team acceptance, staging regression tests) it moves to the next one, ending up in production.

2. Decoupling build process from deployments

Updating an environment used to consist of running a Jenkins job which would check out a specific (environment-specific) branch from SVN, build it, and copy the output to a remote shared folder (using tools like robocopy or unison). In order to move towards continuous deployment we needed a way to deploy faster.

With the new workflow established (step 1), we started promoting to each environment versions that exactly match a version already deployed to the dev environment, so we can now build versions once and deploy them anywhere.

The decoupled build process now consists of taking a tagged version, building it, and pushing the build output (a distributable) to a separate git repository. Git does an excellent job of pushing only the deltas from the previous version, even for binary files.
Now versions are built only once (unit tests and static code analysis tools run only once too). Using git as a store for our distributable versions allowed us to:

  • Gain ultimate traceability, now not only for source code but for deployed files.
  • Mapping between deployed files and the original source is a breeze; e.g. this lets us know exactly the differences between the code deployed in each environment, plus other cool stuff like generating automatic changelogs based on pull request info.
  • Reduce our build time from more than 20 minutes to under 2 minutes.

Once a version is built and pushed to the git build repo, it’s ready to be deployed super fast (a git pull of file deltas) to any environment.

Finally, we added a 2nd build process using the Release configuration that runs once before a version moves to the QA environments; this one uses msbuild release optimizations and performs all client-side resource optimizations (minification, bundling, hashing, etc.).

3. Git-based deployments

Once versions are published to a git repo, they can be easily deployed with a git clone and updated with a git pull.

To identify the versions we want to deploy to specific environments we use additional git tags (e.g. env-staging, env-production) that point at specific version tags.

In order to set up a new web server we now:

  1. Clone the build repo
  2. Set up the IIS website
  3. Run the PostDeploy powershell script (more details below)

And each update consists of:

Manually moving the env-* (e.g. env-production) tag to a newer (or older) version. Then, on each target server:

  1. Fetch and check whether the env-* tag moved; stop the process otherwise
  2. Run a predeploy powershell script, that will:
    a. Take the server out of the Load Balancers (if it’s in)
  3. Git checkout the env-* tag
  4. Run a postbuild powershell script, that will:
    a. Transform the config files based on the current environment name
    b. Recycle the app pool if needed
    c. Run automated smoke tests
    d. Put the server back in the Load Balancers
Note: steps 2.a, 4.b and 4.d are automatically omitted if the diff to the new version doesn’t require an app restart (no .dll or .config changes)

Of course, performing this process by hand for each web server doesn’t scale well, so in order to automate it we considered other popular deployment tools, but none matched our needs.

We built (and open-sourced) a simplistic “deploy hub” web app that runs on each target server and updates the deployed instance (running the process described above) when triggered by a REST call, and can report the result through different notification mechanisms.

This REST call can be made manually (curl), from a central deploy dashboard, from a GitHub post-receive webhook, or at the end of a build process.
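
To give a feel for how thin the deploy hub is, here is a hypothetical ASP.NET Web API sketch of that endpoint. This is not the actual open-sourced code; the route, script path and response handling are all assumptions. It simply shells out to the deployment PowerShell script and reports the outcome.

using System;
using System.Diagnostics;
using System.Web.Http;

public class DeployController : ApiController
{
    // POST api/deploy  (hypothetical route; the real hub's API may differ)
    [HttpPost]
    public IHttpActionResult Post()
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "powershell.exe",
            // Hypothetical script; it runs the fetch/checkout/transform/recycle steps.
            Arguments = "-NonInteractive -ExecutionPolicy Bypass -File C:\\deploy\\update-site.ps1",
            UseShellExecute = false,
            RedirectStandardOutput = true
        };

        using (var process = Process.Start(startInfo))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            return process.ExitCode == 0
                ? (IHttpActionResult)Ok(output)    // deployment script succeeded
                : InternalServerError(new Exception(
                      "Deployment script failed with exit code " + process.ExitCode));
        }
    }
}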

Using git here allows us to update environments in no time (especially because the more often you deploy, the smaller the deltas are). This is very important in achieving zero downtime, one of the pillars of continuous deployment.

Another consequence is that rolling back is pretty straightforward (move the env-* tag and trigger a new deploy), and it happens even faster because we are just applying the git deltas from an already fetched version.

The overall result is that TTR (Time to Release) has been reduced dramatically (from hours to a few minutes), and we got rid of the stressful release-day big merges and regression nightmares.

Feedback on features comes as fast as possible, and the number of bugs found in production (and the need for emergency hotfixes) has dropped drastically.

Since Git does such a great job of identifying what has changed from one tag to another, it was a natural way to get code deployed with only the differences applied. So our Git-based deployment goes like this:

1) We set up Jenkins to pull from the Git master branch of our main repository.

2) We use PowerShell to script out commands such as msbuild, git pull, git push, config transformations (CFT), etc.

3) With each build, we push the completely built website to a separate repository (Builds) which contains only the builds and a tag for each one.

4) To set up the Git deploy, we go to each server once and do a git clone and pull.

5) We also set an environment variable on each server which identifies the unique configuration for that server. The values driven by that variable include different connection strings, some 3rd party DLL versions and some domain configuration.

6) We built a web app that we install on each server that exposes a REST call which executes a PowerShell script. The script does a git pull and runs CFT, which does config file transformations based on the environment variable to configure that specific server environment. Next, the PowerShell script recycles the app pool if DLLs are being deployed, and then makes a few curl requests as simple smoke tests to ensure the site came back up correctly. Once the curl requests are successful, it makes a call to the load balancer to put the server back in.

7) We also built a web app that runs on a deployment server and is used to trigger the deployment. The deployment app takes a server out of the load balancer and then makes the REST call to that server to trigger the deployment.

So, once QA finishes testing and blesses the release, we manually set that version with the production tag. Then we go to our deployment website, click a button, and it rotates through the servers, taking them out of the load balancer, calling the deployment app on each server and putting them back in. Then it emails us to say that it is done and whether there were any issues.

Sounds clean, no? Well, to get there we had a bunch of pain points to work through.

1) Permissions. Since we used a deployment website that calls PowerShell, we had to run it under an account with the proper permissions.

2) Since the deployment websites are on each web server, we needed to make sure permissions were tight, so we locked the site down with IP restrictions and login credentials. We also only allow access to the deployment site if you are coming in through the internal IP and on a port other than port 80. We couldn’t use port 80 anyway, because the main website accepts all IP addresses into the server, so we had to go with another port to avoid a collision.

3) Because we use Git to deploy, we cannot manually deploy anything to the servers or the git pull would not succeed, since the server would no longer match the tag. In the past we would sometimes take a server out of the load balancer to test a prod issue locally on the box. We need to be aware that if we do that, we have to do a git reset to put the server back into a state where we can deploy again.

4) Since we run a post-deploy step to set the configuration files for the environment, we needed to have git ignore them.

5) Taking servers in and out of our load balancer was not simple. There is no API for the Coyote Point, so we had two options to take servers out: SSH into the balancer and edit config files, or send HTTP POST commands to the load balancer.

6) Git was always seeing DLLs as changed even when they were not, so it was always deploying the DLLs, which made simple deployments of files that do not reset the app pool impossible. We had to put in a compare to verify whether each DLL had actually changed.

In general it took us about 2 months of deployments to get to this point after we started; we evolved to it by adding different parts each release. It has greatly simplified our dev process and allows us to release more often with more confidence, since our releases are smaller. It also enabled us to stop having to get up early to release code, since we can automate the release to happen at quieter times. We love Git!

Bundling and async loading Javascript for fast page loading using require.js

With thanks to Benja Eidelman for help with this Post

Main Idea

Browsers delay a page’s first render until all resources mentioned in the <head> are fully loaded. That’s why one of the most basic tips to optimize perceived page speed is to move all scripts to the bottom of the body. The obvious next step is to remove from the body all scripts that can be loaded asynchronously using one of the many script loaders out there.

We know this is only the tip of the iceberg of optimizations and fine-tuning techniques, which can quickly introduce a lot of noise into your codebase.

We’ll share an approach that performs all the common page speed optimizations while keeping your JavaScript and templates clean and organized.

The tools involved are require.js, grunt.js and a small library we authored to achieve clean declarative integration between templates and javascript modules.

Optimizing page speed

In the first stage we used require.js as an async script loader.

We got async loading (not blocking the page render or DOM ready), callbacks, fallback URLs, error handling and cascading, all with the cross-browser inconsistencies paved over.

Besides more control over the page load waterfall, this gives you a consistent mechanism to deal with 3rd party script loading errors and a consistent mechanism to run JavaScript code only when a 3rd party library has loaded successfully.

Organizing the JavaScript Codebase

But require.js is much more than a script loader: require.js implements AMD in browsers. Asynchronous Module Definition is a standard way to organize JavaScript code into modules that supports dependency resolution and prevents global object pollution.

This is a must in order to write non-trivial JavaScript apps or libraries. It gives us the freedom to structure code semantically, as we do in other programming languages, saying goodbye to the typical 1-giant-file libraries.

Having automatic dependency loading means the end of hacky async initialization mechanisms (used to ensure scripts are correctly initialized when the load order is not guaranteed), like register functions, temporary polyfills or process queues.

Also, by setting up require.js aliases and shims we keep our JS files agnostic of infrastructure decisions (like chosen CDNs or URL fallbacks).

This also leaves us in a pretty good position to move forward when ES6 is ready, including JavaScript’s native module support.

Bundling

Bundling and compressing static resources, such as JavaScript files, is one of the basic tips we’ll get from any page speed analysis tool out there.

The purpose is to reduce the number of requests (avoiding HTTP handshaking costs) and bandwidth (by reducing size).

Compression is typically handled by algorithms like gzip or by image compression formats (JPEG, PNG); for text files (like JavaScript or CSS) we can go further using minifiers.

Using AMD to structure code gives us the ability to create bundles automatically based on the declared dependencies. We use the r.js optimizer tool (part of the require.js project), which gives enormous flexibility in configuring the creation of different bundles.

This way, using AMD, we keep excellent debuggability both in development environments (where bundling is off) and in production-like environments.

When bundling is off, require.js takes care of loading the necessary modules (dependencies) asynchronously, making the file tree in the browser dev tools show the same structure and content as the source project.
When bundling is on, we use the uglify2 minifier as part of the r.js optimization, which provides source maps. Source maps are supported in most modern browsers and allow us to debug JavaScript code in production-like environments by seeing the same file structure and code that was used to produce the bundle; breakpoints and step-by-step debugging work as expected.

The final step in our bundle process is versioning the bundle files. Versioning is a critical part of deploying applications: in order to use a CDN with long cache times you need to ensure the cache is purged when new versions are deployed, and the safest way to do this consistently is to use versioned filenames for cache busting.

Using content hashing ensures the cache is busted if (and only if) needed (i.e. when the content changes).

To keep all of this in place we use grunt.js (a node.js build tool), which provides a great ecosystem of 3rd party plugins for most of these tasks and is pretty easy to extend.

Declarative html+js integration

Once we moved to a fully AMD-based JavaScript structure and set up r.js automatic bundling, we realized there was an important piece missing from this architecture.

How do we make use of these modules in our html UI without introducing noise or polluting the beautiful declarative-ness of our templates?

Now that we had embraced the clean code organization that AMD allowed us, we decided to keep our templates equally clean, free of embedded JavaScript code.

In order to do so, we modeled the typical cases where JavaScript modules relate to the UI, and made them explicit in our model:

Widgets: modules that attach to a specific DOM element, modifying its appearance or behavior; many instances can exist on a page (a concept similar to jQuery UI widgets).

Behaviors: modules that provide custom behavior that can be enabled or disabled on a per-page basis; they are not associated with any specific visual element, e.g. keyboard shortcuts, auto-scrolling.

View modules: very small modules that wire up elements in a specific template (a concept similar to code-behind files in WebForms).

Now these are the only entry points from the html templates into JavaScript modules (all the other modules are dependencies of these).

And we established an html convention to declare them:

On pure html:

<h1>Using pure html</h1>
<section>
    <H2>Widgets</H2>

    <!-- simplest example, will load "user-profile-pic" AMD module, and initialize this DOM element with it -->
    <div data-widget="user-profile-thumb"></div>

    <!-- specifying parameters for this widget instance -->
    <div data-widget="user-profile-thumb" data-user-profile-thumb-parameters='{"size": "large"}'></div>

    <!-- multiple widgets per element -->
    <ul data-widget="accordion, dockable" data-dockable-parameters='{"position": "left"}'>
        <li>item 1</li>
        <li>item 2</li>
        <li>item 3</li>
    </ul>

</section>
<section>
    <h2>Behaviors and View Modules</h2>
    <!-- lists of behaviors or view modules are specified on json format in script[data-modules] tags -->
    <script data-modules type="application/json">
        {
            "view":{
                "views/home/index":""
                },
            "behavior":{
                "keyboardShortcuts":"",
                "iefixes":">=8"
                }
        }
    </script>
</section>

And using razor:

<h1>Using Razor</h1>
<section>
    <h2>Widgets</h2>
    <div data-user-profile-thumb-parameters='{"size": "small"}'></div>
    <div></div>
    <!-- Razor helper to register multiple widgets using a selector -->
    @{ Widget.RegisterByCssSelector("user-profile-thumb", "div.profile", new { size = "large" }); }
    <!-- an optional parameter (C# anonymous object) is used as a default if no parameters exist on the element -->
</section>
<section>
    <h2>Behaviors and View Modules</h2>
    <!-- Activate Behaviors -->
    @{ Behavior.Activate("iefixes", ">=8"); }
    <!-- Activate multiple Behaviors -->
    @{ Behavior.Activate("keyboardShortcuts, iefixes"); }
    <!-- Load the view module associated (by path) to the current template -->
    @{ ViewModule.Register(); }
    <!-- Finally, print the script tag with all the behavior and view modules declared so far -->
    @Module.LoadAll()
</section>

Finally, these data-widget attributes and script tags are parsed, and the modules get loaded on the client by a small JavaScript library that runs on page initialization and supports Ajax-loaded content too.

Moving to this declarative syntax allowed us to customize our bundling process: now we analyze templates looking for AMD modules (widgets, behaviors or view modules) and create bundles based on them (dependencies are added automatically by r.js).

Now each time a module is used on a template it’s automatically bundled without extra work.

Later, if desired, this default behavior can be overridden by moving specific modules to separate custom bundles (e.g. modules that are only used in less-visited areas of the site).

Conclusion:

As you can see, we found an architecture that allowed us to tune page load for perceived performance (the time to first render, and the time until the main elements are responsive, were reduced significantly), while still keeping the codebase cleanly organized and optimal in terms of maintainability, by keeping configuration as lean as possible.

Here are waterfall charts of one of the hottest pages in our website:

[Image: waterfall chart of the production page with bundling enabled]

Below is the same page in dev without bundling so you can really see how require has changed the waterfall. And of course see the benefit of bundling where we save over 24 http requests.

[Image: waterfall chart of the same page in dev without bundling]

The benefit is a much faster-feeling website. As you can see above on the production/live page, we are seeing start-render times under a second while things are still loading; start render happens before the main JS file has even loaded.

Leveraging Caching and Amazon EC2 to scale BlogTalkRadio.com from 100k to 6M visits per day

 

[Image: BTR growth chart]

Back in early 2012 we were looking at under 100K visitors per day and struggling with our infrastructure. Our databases were straining, our web servers were railing and our all-important ad server was maxed out. We knew that caching was the only answer, so we embarked on a multi-faceted attack to be able to support 10x growth.

We thought about three levels of caching and how to achieve it.

  1. Data Caching. Caching Data sets even if only for a minute.
  2. Dot.Net object caching.
  3. Web Page HTML caching.

Data Caching – The fastest database query is the one that doesn’t have to happen.

For data caching, we considered the holy grail of data caching layers, where every request reads and writes to the cache first and only hits the DB second via the middleware. However, we didn’t have the resources or the time to go down that path yet. We wound up using three different data caching mechanisms, each best suited to its own use case.

The first data caching layer that we used was within .Net itself: .Net data caching. We decided to use this for smaller data sets that rarely change, such as configuration data and lookup tables that require expiration only when the data changes. We wanted to limit this because, with .Net caching, the same data is stored in memory for every process on every server; we run 3 processes in a web garden on each web server, so we would be caching the same data 30+ times. For data that changed rarely and was accessed often, this made sense.
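
A minimal sketch of that first layer (the helper and loader names below are made up): small, rarely-changing lookup data goes into the in-process ASP.NET cache with an expiration, so each worker process only pays the DB cost on a cache miss.

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class LookupCache
{
    // loadFromDb stands in for the DALC call that hits the stored procedure.
    public static IList<T> GetOrLoad<T>(string cacheKey, Func<IList<T>> loadFromDb)
    {
        var cached = HttpRuntime.Cache[cacheKey] as IList<T>;
        if (cached != null)
        {
            return cached;                       // served from this worker process's memory
        }

        var data = loadFromDb();                 // cache miss: hit the database once
        HttpRuntime.Cache.Insert(
            cacheKey,
            data,
            null,                                // no cache dependency
            DateTime.UtcNow.AddHours(1),         // absolute expiration
            Cache.NoSlidingExpiration);
        return data;
    }
}

// Usage (hypothetical loader): var states = LookupCache.GetOrLoad("lookup:states", dalc.GetStates);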

For more set-based data caching we turned to Redis as a caching data store and PostSharp as a mechanism to instantiate it. Initially we used ServiceStack as the client for Redis but were not happy with the performance; when we dug into it, we noticed that ServiceStack opened and closed a connection with every access, which wasn’t very efficient. We switched to BookSleeve, and that worked well for a while, but we had trouble deserializing nested JSON objects. Finally, we turned to protobuf, which allowed not only ASCII serialization but also binary, which was even more performant since it made the payload to and from Redis that much smaller.
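
The shape of the PostSharp piece is roughly the sketch below. This is a simplified illustration rather than our production aspect: ICacheClient stands in for whichever Redis client is in use, the protobuf-net Serializer is assumed for the binary payload, the key building is naive and the expiration is hard-coded.

using System;
using System.IO;
using System.Reflection;
using PostSharp.Aspects;
using ProtoBuf;

// Stand-in abstraction over the Redis client (ServiceStack/BookSleeve details omitted).
public interface ICacheClient
{
    byte[] Get(string key);
    void Set(string key, byte[] value, TimeSpan expiry);
}

[Serializable]
public class RedisCacheAttribute : OnMethodBoundaryAspect
{
    public static ICacheClient Cache;   // assumed to be wired up at application start

    public override void OnEntry(MethodExecutionArgs args)
    {
        var hit = Cache.Get(BuildKey(args));
        if (hit == null)
        {
            return;                     // cache miss: let the decorated method run
        }

        // Cache hit: deserialize with protobuf-net and skip the method body entirely.
        using (var stream = new MemoryStream(hit))
        {
            var returnType = ((MethodInfo)args.Method).ReturnType;
            args.ReturnValue = Serializer.NonGeneric.Deserialize(returnType, stream);
        }
        args.FlowBehavior = FlowBehavior.Return;
    }

    public override void OnSuccess(MethodExecutionArgs args)
    {
        // The method ran against the DB: serialize the result (compact binary) and cache it.
        using (var stream = new MemoryStream())
        {
            Serializer.NonGeneric.Serialize(stream, args.ReturnValue);
            Cache.Set(BuildKey(args), stream.ToArray(), TimeSpan.FromMinutes(1));
        }
    }

    private static string BuildKey(MethodExecutionArgs args)
    {
        // Naive key: method name plus argument values.
        return args.Method.Name + ":" + string.Join(":", args.Arguments.ToArray());
    }
}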

We had our home-grown caching layer, but soon found another method of data caching that was super simple to implement and met a need that our other mechanisms did not. We found a product called ScaleArc that sits between the application and the database as a proxy/cache layer. To use it, all you do is point your connection string at it and set it up to use your existing DBs as the origins. All requests are then reported in nice analytics; you can see which queries are the most expensive and set up ScaleArc to cache the result set for that type of request. It gives great stats on DB usage and cache hit ratios. We first used it to help scale our third-party ad server, since we didn’t have access to the code. Our ad server had a horrible SQL Server DB bottleneck; we were railing it constantly, which cost us potential revenue. We upgraded the hardware and then soon railed that server too. Since it was a third-party application, we could not make any changes to the app to improve it. ScaleArc allowed us to add a data caching layer without touching the application. This gave us a lot more room for growth and paid for itself many times over.

One of our big sources of revenue was advertising. We bought a lot of keywords to drive traffic to the site, and in turn we made money on the ads on the pages these users visited. These users were not the same type of engaged users as our normal registered users, so we wanted to segment this traffic so that these visitors did not affect our normal users. We leveraged Amazon EC2 instances to serve that traffic. Since our DB servers sat in our colocation space, there was a lag to access the data in our datacenter from EC2. We put ScaleArc in place for the Amazon instances serving our purchased traffic and implemented heavy caching. Since most of the traffic driven by marketing initially lands on a set of landing pages, the DB caching had a high cache hit rate and has allowed us to scale that business tremendously.

Dot.net caching – Good for some applications.

Data caching can only take you so far; better yet, avoid needing any data access at all. To this end we started caching components of a page. We mainly cached some slowly changing user controls, the header, the footer and a handful of fairly static pages.

When we cached user controls, we quickly learned that referencing them on subsequent page loads becomes problematic: since a control is served from the output cache, any reference to it from code returns null. We could not access the control, so we had to put in some error trapping. The downside of .Net output caching is that there is still work being done on the server side, even though it is greatly reduced, and it forces you to develop with specific patterns that expect controls not to be referenceable in your code.
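
In practice that meant defensive code like this in the page code-behind (a simplified sketch; the control and property names are hypothetical):

protected void Page_Load(object sender, EventArgs e)
{
    // HeaderControl has OutputCache enabled in its .ascx. When the cached copy
    // is served, this field is null, so every reference has to be guarded.
    if (HeaderControl != null)
    {
        HeaderControl.ShowSearchBox = true;   // hypothetical property on the control
    }
}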

Web Page HTML caching – Edge caching for the win!

Things were getting better, but to get 10-20x growth we needed to take another caching step. The best way to scale your infrastructure is to not have traffic hit your infrastructure at all. CDN caching was the next important step for us. We had been using CDNs for our audio files, JavaScript, CSS and images for a while, but now we set our sights on caching full pages.

When we went to the Velocity conference for the first time a couple of years ago, we learned about a product called aiCache. It is a caching appliance with a lot of flexibility; it can keep different caches based on cookie, browser, etc.

This worked well for us because most of our traffic was guests who were not logged in. Based on a cookie, we were able to direct traffic via the load balancer to aiCache if there was no authentication cookie, or directly to our origin servers if the user was logged in. This way we didn’t have to change the application to show personalized information.

As we continued to grow, we started seeing bottlenecks with logged-in traffic and the aiCache infrastructure, so we wanted to add a layer of edge caching for our high-traffic pages.

We had been using Limelight as our CDN for images, but the performance wasn’t fantastic. After attending Velocity in California we learned about TCP/IP overloading and looked for a CDN vendor that did it. The overloading is much faster than standard TCP/IP because it sends packets without waiting for an ack before sending the next packet. We saw times of 15ms from Cotendo vs. 100ms from Limelight for the same cached content. We went with Cotendo, which was recently bought by Akamai. (Boo)

The key to successful CDN edge caching of our pages was twofold:

1) Proper configuration – setting TTLs, whether to include the query string, etc.

2) Client-side customizations.

Configuration required understanding the data and how often it changed. The structure of the URLs and understanding which requests couldn’t be cached were crucial. Many Ajax requests and some pages needed to bypass the cache and go directly to the origin servers. We had to carefully set TTLs and identify when the query string should be part of the cache key and when it should not.

Most importantly, our pages had customizations based on who was logged in: they needed to show user-driven menus that changed to show the user’s avatar, premium level, etc. In order to properly cache pages at the edge, we had to cache a generic page for a non-logged-in user and then apply the customizations client side. The complexity is that you don’t want the user to see the generic page and then see the customization snap in; we want the page to appear finished for the user.

To do this, we knew it had to be done in the header with as little required code as possible. If we waited for jQuery to load, or made an Ajax call to get the information, it would slow down the page or cause that snap-in. So the lesser of the evils was to store the minimum needed for the header customization in a cookie and to inline, on the pages, the small JS snippet that knows how to generate the header HTML customizations.

An interesting management challenge with edge caching most pages was the paradigm shift needed by our developers. We had a lot of old page customization code that we had to pull out of the back-end C# code and move to front-end JavaScript. User-based analytics and displaying page partials based on the user’s history and level all had to be removed from the C# and moved to JavaScript. Developers needed to switch their thinking to pages being generic for all users. Even months after we implemented this pattern, we were still catching mistakes during code reviews where developers were doing server-side customizations rather than doing them on the client.

With this in place we were free to launch the CDN edge caching of HTML, and our logged-in users saw what they expected to see. All was good.

Because of the increased cookie payload, we made sure to move most of our static file HTTP requests to cookieless domains (btrcdn.com) so they wouldn’t add a lot more overhead to the HTTP requests.

Adding all of this caching enabled us to grow our traffic over 10x and still have plenty of room to grow another 10x.

 

Why tools that monitor and profile your website are so important

Over the last 12 months, we have been able to grow the unique visitor volume for BlogTalkRadio.com by 10X AND cut the average page load time from 10 seconds to 5 seconds.

The architecture of how we scaled the site is a topic for another blog post. This post is about some critical tools that we used to cut the average page load time in half even though volume increased dramatically.

If you are not measuring your site speed and your users’ experience, you are throwing darts at a wall in the dark. There are countless studies showing how page performance directly drives visitor engagement and revenue.

The image below shows some of the famous stories where page performance increased revenue.

Our story is very similar. We have definitely seen an increase in revenue and visitor activity with faster page performance.

So what tools do we use to understand how our site is performing?

The key ones that we use are NewRelic and Dynatrace. We have always used Firebug and webpagetest.org, but those only show you one-off loads, which often are not representative of your whole user base.

NewRelic shows us how each page performs, what each part of the page life cycle is doing, and even drills down into the back-end and DB calls.

Dynatrace has two parts to it.

  • There is a free client piece that can be installed on your machine that gives extremely detailed information about how your page loads and about JS execution. It will show where your JS is running poorly.
  • There is an expensive server-side piece that can show you, down to the line of code, how your pages are performing.

Using these tools, we worked release by release making changes to the hotspots that we found to increase performance.

These tools come in very handy when you release code, to ensure that your changes did not cause an unanticipated performance issue. We had such an issue where we released something that was supposed to improve performance but, under real conditions, created a bug which greatly slowed down the site. We knew within an hour that we had an issue and released a hotfix. Without NewRelic, we would not have known we had such an issue.

A HUGE win for us, which showed the value of the Dynatrace server tool, happened the first day we used it. We noticed that a helper class method call for reading config variables kept showing up, taking longer than expected: 10-50ms each time, and on some pages it would do this a couple of times.

We make a lot of use of config settings on almost every page of the site. The helper class that we were using was written years ago and is used everywhere; nobody had looked at that class since it was written.

Since Dynatrace was showing this class in the traces, I decided to look into it.

public static class myAppSettings
{
        public static string GetKeyAsString(string keyName)
        {
             try
             {
                 var config = WebConfigurationManager.OpenWebConfiguration("~");                 
                 var appSection = config.GetSection("appSettings") as AppSettingsSection;

                 try
                 {
                      return appSection.Settings[keyName].Value;
                 }
                 catch (NullReferenceException)
                 {
                     return keyName + " is missing";
                 }
             }
            catch
            {
                return WebConfigurationManager.AppSettings[keyName];
            }
       }
 }

Do you see the issue in the above code? This method is called every time we read a config value. This happens more than once per page load.

The issue was: var config = WebConfigurationManager.OpenWebConfiguration("~");
This line opens the config file from disk. So every time a config variable was read, rather than reading from an in-memory dictionary as expected, we were opening the config file. Not good.

Since this is a static class, we changed it to open the file once, when the class is initialized, as shown below:

public static class myAppSettings
{
        static Configuration config = WebConfigurationManager.OpenWebConfiguration("~");
        public static string GetKeyAsString(string keyName)
        {
             try
             {            
                 var appSection = config.GetSection("appSettings") as AppSettingsSection;

                 try
                 {
                      return appSection.Settings[keyName].Value;
                 }
                 catch (NullReferenceException)
                 {
                     return keyName + " is missing";
                 }
             }
            catch
            {
                return WebConfigurationManager.AppSettings[keyName];
            }
       }
 }

This seemingly small change yielded the following improvement in the NewRelic stats, which show the average time it takes the server to generate the HTML.

You can see the dramatic drop in time on the day we released the change. Finding silver bullets like this is very exciting.

This also explained a couple of odd behaviors that we had noticed over the last couple of years but were not able to understand until we started using these tools.
1) I always wondered why, on our site, when we pushed out a new config file, it immediately picked up any changes. I expected that it would only pick them up when the app pool was recycled. Now we know: it was opening the file constantly.

2) A change we put in the header caused the web servers to jump in CPU. This change included a couple of lookups in the config, which led us to think that reading the config was slow, even though all of the articles we read said it should just be reading from a dictionary in memory. Now we know why it was slow.

Some other hot spots that we found using these tools were around how we were loading third-party widgets:
1) Facebook Comments needed to be made asynchronous and moved to after DOM ready. We learned that you cannot move it all the way to after the load event, because then it won’t render.
2) The AddThis share bar loads Twitter, Google Plus, Facebook Like, etc. Even though the JS they provide claims to be non-blocking, it was only up to a point, and it loaded oddly on the page. By changing how it loads and moving it later in the page cycle, we were able to ensure the page was rendered before loading these third-party tools.
3) On some of our pages we pull in JSON via an Ajax request on DOM ready and merge it with a template on the page to display information. Merely putting JS at the bottom of the page still lets it execute before DOM ready; therefore, some third-party JS was blocking what we wanted to do on DOM ready.

The key learning is not to just load third-party code where the vendors recommend. If we had done that, our pages would still be 10+ seconds. Experiment with moving scripts around and controlling when they load via $(window).load and $(document).ready.

NOTE: Be very careful of 3rd party code that has document.write in it. If you move it to after the DOM has been created, it will break in most browsers. If Chrome sees it, it creates a new DOM and blanks the page.

So without these tools (as well as Firebug, Chrome Developer Tools and webpagetest.org) we were flying blind as to how our site was performing out in the wild. Very often it is different than in your lab setting, and definitely different than just looking at a one-off page load via Firebug or webpagetest.org. Don’t get me wrong, Firebug and webpagetest.org are important tools. However, when you look at actual performance en masse, in the wild, you see a ton of things that you normally wouldn’t. Understanding what your site is doing, why it is doing it, and how your users’ experience is affected is critical to the success of your web property.

Culture Issues on ASP.Net Websites

When building a website that can display pages in the language and culture of the user, there are some type conversion issues that developers need to be aware of. When building our BlogTalkRadio.com site we hit a few of them and learned some good lessons.

Over the years, our BlogTalkRadio.com web site has seen a number of strange errors littering our exception logs. Often we see errors that we cannot reproduce ourselves. When analyzing these errors we sometimes find a correlation whereby every user seeing the error has the same language culture, which is different from ours here in the US (en-US).

Places we have seen issues:

  1. Number Conversions
  2. DateTime Conversions
  3. String Comparisons

One way to combat these issues is to specify and control the culture being used. In ASP.Net, many developers do not consider that when somebody comes to your site and the .Net framework executes your code, it will use the culture of the visiting browser to decide how to process conversions and comparisons.

Additionally, designers often don’t take into account how another culture’s date and numeric formats will look on a page, to say nothing of how language and special characters cause issues. JSON is notoriously tied to US culture, and we have seen many issues in JSON and XML where special (non-ASCII) characters caused errors. Encoding these characters is a MUST.

CultureInfo.InvariantCulture is a good tool for converting strings to DateTimes or numbers, and for comparisons, so that a value stored as 1.2 is parsed the same way regardless of whether the visitor’s culture would write it as 1,2 (as Russian does). (http://stackoverflow.com/questions/2423377/what-is-the-invariant-culture)
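
A trivial example of the difference (nothing site-specific here):

using System.Globalization;

// Parses to 1.2 no matter what culture the current request/thread is running under.
double safeValue = double.Parse("1.2", CultureInfo.InvariantCulture);

// Without the explicit culture, this line depends on the visitor's culture: under a
// culture like ru-RU, where the decimal separator is a comma, it can throw a FormatException.
double riskyValue = double.Parse("1.2");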

Here are some common places where not thinking about culture can come back to bite you. Below, I default to the U.S. culture where the invariant culture won’t do.

  • Some other cultures, such as Saudi Arabia’s, use a different calendar. Therefore, creating a date from a year, month and day will have unexpected results: the year 2012 has not happened yet in that calendar, so it will throw an error.

    Error Prone:

    showDate = new DateTime(year, month, day)

    Better:

    var CI = System.Globalization.CultureInfo.GetCultureInfo("en-US");
    showDate = new DateTime(year, month, day, CI.Calendar)
  • The calendar is also an issue going the other way. We had an issue here because URLs were getting generated with completely incorrect dates for visitors whose culture uses a different calendar.

    Error Prone:

    var dt = ActualShowDate != DateTime.MinValue ? ActualShowDate : ShowDate;
    sb.Append("/");
    sb.Append(dt.ToString("yyyy"));
    sb.Append("/");
    sb.Append(dt.ToString("MM"));
    sb.Append("/");
    sb.Append(dt.ToString("dd"));
    sb.Append("/");

    Better:

    var CI = System.Globalization.CultureInfo.GetCultureInfo("en-US");
    sb.Append("/");
    sb.Append(dt.ToString("yyyy", CI.DateTimeFormat ));
    sb.Append("/");
    sb.Append(dt.ToString("MM", CI.DateTimeFormat));
    sb.Append("/");
    sb.Append(dt.ToString("dd", CI.DateTimeFormat));
    sb.Append("/");
  • Below is another example of how a different calendar can generate bad dates. This started to bite us the most when we started using page caching: if a Saudi Arabian visitor hit the page and caused it to be cached, then all visitors for the duration of the cache would get the strange dates.

    Error Prone:

    hlArchiveURL = _baseURL + archive.MonthStarting.ToString("yyyy/MM");

    Better:

    hlArchiveURL = _baseURL + archive.MonthStarting.ToString("yyyy/MM", new CultureInfo("en-us", false));
  • We have found that the decimal version numbers in the UserAgent are always in the U.S. format. Except, of course, when there is a letter appended, such as IE 7.0b. So the error-prone code below was failing for non-US users, since 7.0 is not a valid number in, for example, the Brazilian format.

    Error Prone:

    if (Request.Browser.Browser == "IE"){
        double browVer = Convert.ToDouble(Request.Browser.Version);
        if (browVer <= 7.0) { 
        	// put hacks for ie7 here.
        }
    }

    Better:

    var CI = System.Globalization.CultureInfo.GetCultureInfo("en-US");
    if (Request.Browser.Browser == "IE"){
        double browVer = Convert.ToDouble(Request.Browser.Version, CI);
        if (browVer <= 7.0) { 
        // put hacks for ie7 here.
        }
    }
  • We use Elasticsearch for our searches. When we tweaked the ranking algorithm to use decimal values, all of a sudden people outside the US were getting errors when trying to search. This is the call within our code to the Elasticsearch client:
    booleanQuery.Should(new TermQuery("HostDisplayName", term,  1.3, true));

    Within the Elasticsearch client code, the line below was giving us trouble because the JSON that hit the server wasn’t in a supported format: the server could not handle a number in the format 1,3, and without specifying a culture, many non-US users were generating 1,3 in the JSON instead of 1.3.

    Error Prone:

    writer.WriteRawValue(string.Format("{{\"filter\": {{\"term\": {{ \"{0}\" : \"{1}\" }} }}, \"boost\": \"{2}\" }}", term.Field, term.Value, term.Boost.ToString()));

    Better:

    writer.WriteRawValue(string.Format("{{\"filter\": {{\"term\": {{ \"{0}\" : \"{1}\" }} }}, \"boost\": \"{2}\" }}", term.Field, term.Value, term.Boost.ToString(Utils.JsonBuilder.UsCultureProvider)));
  • Different languages have different comparison rules. Some languages have two-character letters; others have an accent mark or tilde over a letter that makes it a different letter. We learned the hard way that in the Turkish culture, “I” != “i”. In fact, we discovered that ToLower(“I”) != “i”: in Turkish, capital I is not the uppercase form of the dotted lowercase i, so doing a ToLower didn’t help and we were not getting a string match when someone came from Turkey. We switched to ToLowerInvariant(), which did the trick.

    Error Prone:

    where _includeExtensions.Contains(Path.GetExtension(file)) && !file.ToLower().Contains("includefilesource\\")

    Better:

    where _includeExtensions.Contains(Path.GetExtension(file)) && !file.ToLowerInvariant().Contains("includefilesource\\")

    Please note that, for brevity, I did not include try/catch logic to account for bad data in some of the code.

HOWEVER,

There is one setting that can be made in the web.config that would have made all of these issues moot.

    <globalization culture="en-US" uiCulture="en-US" />

This setting sets the entire site to the US culture and does not take the client’s culture into account. It can be overridden by setting a particular culture within the code on a case-by-case basis if needed.
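
For the case-by-case override, the sketch is just a couple of lines run early in the request (for example in Page.InitializeCulture or an HttpModule); the French culture here is only an example, not something our site does:

using System.Globalization;
using System.Threading;

// Formatting (dates, numbers) and resource lookups for this request now use fr-FR,
// even though the site-wide default in web.config is en-US.
Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("fr-FR");
Thread.CurrentThread.CurrentUICulture = CultureInfo.GetCultureInfo("fr-FR");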

The downside of this setting is that you cannot use .Net’s functionality to display dates, numbers and times in the local user’s format. For our BlogTalkRadio.com site, we don’t want to use this functionality anyway.

For high-volume sites, page caching and output caching are common. If someone from an Arabic culture hits a page and causes it to be cached, and then someone in New York hits the cached page, the New Yorker will be a bit confused by the odd Arabic date they see. Therefore, any client-specific formatting should be done via JS on the page itself.

Summary: it is important to think about how your website will handle different cultures. Not thinking about it will cause errors and a bad experience for many users outside of your home country. You can handle it via global settings or detailed individual settings.