
Category Archives: Web Development

Open letter to Computer Science majors

I have been in the position to hire programmers at many of the companies I have worked for. I have seen my share of developer resumes from people who are either still in school or recently graduated. The vast majority have no real-world experience; they put college or graduate school project work down as real experience. Hiring managers in technology know that school assignments and projects are nothing like the real world: they are well defined, usually have little business value, and are done in an overly controlled way. When I have interviewed these applicants, it is obvious that they are missing key knowledge that is mainly taught in the real world.

I enjoyed lifeguarding in the summers before my freshman and sophomore years of college. It was a ton of fun and I made some decent money. Then, during my sophomore year, I got some good advice: lifeguarding may be great right now, but it is not going to help me get a job in technology when I graduate. In your senior year, when you start applying for jobs, your resume hopefully ends up on the desk of some hiring manager, if you can get past the HR screener. It will be sitting alongside many other resumes from applicants showing off their real work experience.

So I applied for and secured a summer internship as an Informix programmer at AT&T for the next two summers. When I graduated, it was during an economic downturn; only half of the graduates from my Tufts engineering class had jobs upon graduation. I was one of them because I had interned at AT&T and had done well, and they hired me full time when I graduated. Had I kept lifeguarding, getting a “real” job with no relevant experience would have been much, much harder.

In the last 20 years, things have changed a lot in technology. There is much more to learn, but there are also far more opportunities to get real experience coding on real projects. There is also much more competition for the best jobs.

If you take nothing else from this blog, take this point: A pet peeve of mine when I interview someone is when they tell me that they are interested in learning some technology in our stack but have never even read about it or attempted to play with it.  It is too easy today to spin up a free virtual environment on your own computer or in the cloud to try out almost anything.  If you want to impress a hiring manager when you are a newly minted graduate, show them you have initiative and drive to learn and contribute with actions, not words.

In my humble opinion, universities are not doing enough to prepare their computer science students to function in a real development environment. There are a lot of very important concepts that they do not teach.

Top 10 Concepts that students should know when they graduate.

  1. Inversion of Control – dependency injection, service locator, and factory patterns (see the sketch after this list).
  2. Unit Testing, Integration testing, and regression testing.
  3. Source control (Git, SVN)
  4. Design patterns (Singleton, Repository, Adapter)
  5. Agile/Kanban methodologies vs. Waterfall
  6. Relational vs. NoSQL databases
  7. Secure Coding practices
  8. Scaling systems
  9. Service oriented architecture
  10. Performance tuning (caching, efficient code, etc)
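
To make the first item on that list concrete, here is a minimal, hypothetical sketch of dependency injection in C# (the names are made up for illustration): the class depends on an abstraction, and the concrete implementation is handed in from outside, which also makes unit testing (item 2) with a fake easy.

public interface IEmailSender
{
    void Send(string to, string body);
}

public class SmtpEmailSender : IEmailSender
{
    public void Send(string to, string body) { /* real SMTP call here */ }
}

public class SignupService
{
    private readonly IEmailSender _emailSender;

    // The dependency is injected rather than created with "new" inside the class,
    // so a test can pass in a fake IEmailSender.
    public SignupService(IEmailSender emailSender)
    {
        _emailSender = emailSender;
    }

    public void Register(string email)
    {
        _emailSender.Send(email, "Welcome!");
    }
}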

What are the top 10 things you can do to improve your marketability?

  1. Summer internships where you actually code.
  2. Co-op programs where you actually code.
  3. Teach yourself the above concepts.
  4. Do your own projects and open source it. Build something interesting.
  5. Contribute to open source projects.
  6. Learn and understand source control. Git flow and SVN are important ones.
  7. Use the cloud resources out there to play with different technologies.
  8. Go to meetups on technology that you are interested in and ask questions.
  9. Having a blog and a GitHub account shows what you can do and shows initiative.
  10. Above all, have a passion to learn, try new things and be an innovator.

Jobs in technology are out there. It is much easier to get a job right out of school if you show initiative, the ability to learn and contribute, and a body of real work to display. When a hiring manager makes a decision to bring somebody in, they are gambling that the person is going to contribute successfully to the company. If you do internships or co-ops, or show real initiative and a body of interesting work, it will help the hiring manager feel that the odds of you succeeding are in your favor.

 

 

 

 


ORM or not to ORM that is the question

ORMs (object-relational mappers) have come a long way over the last few years.

ORMs used to be very slow and hard to tune, but that is not true anymore. However, the decision of whether to use one still weighs on every technology leader.

  • If you decide to use one, which one?
  • When you pick one how much of the functionality do you use?
  • How do you tune the queries it generates?
  • Is a full ORM right for you, or is a micro ORM more of a fit?

This blog entry is not going to answer any of the above questions 😉 It is more about what we did at BlogTalkRadio, what our pain points were, and how we dealt with them.

When we first built BlogTalkRadio’s data access layer 7 years ago, we looked to see if we wanted to use an ORM. Since we were a .Net shop, we looked at Entity Framework and nHibernate. Both were still in their infancy and were slow and cumbersome.

So in 2006, we decided to build our own data access layer and mapper that called stored procedures exclusively and did all of the result set mappings to classes with private functions that referenced columns by index. Why by column index and not by name? Because when we launched, we were on ASP.Net 2.0, where the reflection methods used under the covers by ADO to convert a column name to an index were VERY expensive at the time. It wasn’t until 3.5 that this became more efficient. I think this was also a contributing reason early ORMs for .Net were so slow.

const int C_ID = 0;
const int C_WIDTH = 1;
const int C_HEIGHT = 2;
const int C_TYPE_ID = 3;

// Map a result set row to an entity by column index rather than by column name.
private static MyEntity GetMyEntityFromReader(IDataReader reader)
{
    var myEntity = new MyEntity
    {
        ID = reader.GetInt32(C_ID),
        Width = reader.GetInt32(C_WIDTH),
        Height = reader.GetInt32(C_HEIGHT),
        Type = (Type)reader.GetInt32(C_TYPE_ID)
    };
    return myEntity;
}

We grew to hundreds of stored procedures and dozens of entities. After a couple of years with this pattern, it was obvious that there were shortcomings with this method. Let’s list the top few issues, shall we?

  1. We could not change the column order or insert a column in the middle of a result set. It forced us to be very careful about backward compatibility, since adding a field in the middle would throw off the column indexes. (Note: when we moved to the .Net 4 framework, I benchmarked reading by index vs. by name and found no discernible difference anymore. After that, we stopped using the index and started using the name.)
  2. Having standard entities that are manually mapped forced us to return ALL fields on every query. Every stored procedure that returned show information returned all columns. So if we only needed a couple of fields from an entity that has 30 attributes, the procedure still returned them all. Occasionally we would create a custom mapping for a particular use case, but the overhead of managing that with a team of 3 developers was too great.
  3. By extension, if we wanted to add one field, we sometimes had to change 20 or 30 stored procedures that handled specific CRUD operations and specialized queries for parts of the system.
  4. We had to add tons of overloads to the DALC to handle different parameter sets, since nothing was dynamic.
  5. It made it very cumbersome to add new entities. We had to create the entity, create the DALC class, create all of the methods calling the stored procedures, and create the mapping of fields to attributes.

So for the first 4 years we worked fine this way. We had a small, stable team, and we understood the shortcomings and worked around them by always ensuring backward compatibility: columns had to be added to the end of the result sets, and we did tedious updates to many stored procs.

Our justification was performance. ORMs were slow and we had a site with millions of page views per day. We did not want to slow down the site.

Then, a couple of years ago, we realized that the justification that we had better performance without an ORM no longer held water. Well, that was true for some ORMs; some continued to be a dog. In addition, changes in the .Net 4.5 framework boosted LINQ query performance significantly.

So with the performance bottleneck no longer a concern, we set upon implementing an ORM. We decided on nHibernate. I am embarrassed to say that it was not for the right reasons: mainly we chose it because some of my developers were very familiar with it and they lobbied for it. My thinking was that I would prefer to use something my engineers knew, so implementation would be smoother since they already knew some of the gotchas and how to get around them. This reasoning did prove to be correct.

It is very common to choose technology based on familiarity rather than doing your due diligence.

If you are reading carefully, this article has indicated that everything was stored procedure based. Obviously nHibernate can call stored procedures just fine and map the results, but that is like using a sledgehammer to hang a picture. If you just want to do mapping and call SPs, I would use the micro ORM Dapper instead.
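
As a rough illustration (the GetShowById procedure and Show class here are hypothetical, not our actual schema), calling a stored procedure and mapping the result with Dapper is about this much code:

using System.Data;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

public static Show GetShowById(string connectionString, int showId)
{
    using (var connection = new SqlConnection(connectionString))
    {
        // Dapper maps the columns returned by the stored procedure onto the
        // Show properties by name, so there is no hand-written mapping code.
        return connection.Query<Show>(
            "GetShowById",
            new { ShowId = showId },
            commandType: CommandType.StoredProcedure).FirstOrDefault();
    }
}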

So the plan was to start small and initially implement nHibernate on small, simple queries and CRUD operations in parallel to the existing DALC. Then we expanded it to joining complex entities and even tried to replace some complex stored procedures.
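
For a sense of what “small and simple” meant in practice, here is a hedged sketch (hypothetical Show entity and property names, with sessionFactory, showId, and newTitle assumed to be in scope) of the kind of work we pointed at nHibernate first, while the old DALC kept handling everything else:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // Simple read: primary-key lookup, no hand-written SQL or mapping code.
    var show = session.Get<Show>(showId);

    // Simple create: nHibernate generates the INSERT from the mapping.
    session.Save(new Show { Title = newTitle });

    tx.Commit();
}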

However, all was not peaches and cream. We hit, and had to solve, a few issues. We also had to come up with our going-forward patterns and decide how to handle the existing DALC.

Issues:

  1. In SQL Server I am adamant about using (nolock) or setting the isolation level to Read Uncommitted on queries, to keep queries from blocking inserts and updates. It was actually a requirement that whatever ORM we used had to support this for SQL Server. We solved it this way:
    private static ISession BeginSession(ISessionFactory sessionFactory,
                                         bool useTransaction)
    {
        var session = sessionFactory.OpenSession();
        session.FlushMode = FlushMode.Never;
        if (useTransaction)
        {
            session.BeginTransaction(IsolationLevel.ReadUncommitted);
        }
        return session;
    }
  2. The biggest issue with using an ORM is the query that gets generated. We initially used nHibernate in a project for our host tools. We built an API layer, so this was a good place to start without affecting the old stuff. We built it and then did the performance check. It was slow. Very slow. So we profiled all of the queries and there were some doozies in there. These are a few of the issues we had to deal with in the queries that it generated.
    1. The ORM had created queries that joined the same table 3 times.
    2. Used outer joins instead of inner joins
    3. My favorite was the lower() function around the column in the where clause: WHERE lower(column) = lower(@value). It makes sense in C# to do this:
      queryable.Where(x => x.AspNetUser.Username.ToLower() == this.UserName.ToLower())

      But that keeps an index from being used when the query is converted to SQL (there is a sketch of one way to avoid this after this list).

    4. Turning off lazy loading. We had some entities with sub-entities and did not want nHibernate going off and loading everything; often it wasn’t necessary. We wanted to explicitly load the entities and attributes that we wanted, when we wanted them.
    5. We reverted some queries back to stored procedures because they were quite complex and we could not get nHibernate to do what we wanted it to do.
    6. It is easy to get into a situation where you are doing n+1 queries: you request one entity and then the ORM goes and fetches additional data for each item loaded into a collection. We had to eradicate a couple of those.
  3. Maintenance of queries. In our old stored procedure pattern, if there was a performance issue or we needed to change a query, we just had to change it in the DB, which is 100 times simpler than making the C# code changes, compiling, deploying, etc. The DB we can hot deploy with zero downtime; not so with compiled .Net code. We had to adjust our expectations and process to deal with it.
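
For the lower() issue in 2.3 above, one way to avoid it is roughly this (a sketch, assuming the column uses SQL Server’s default case-insensitive collation; the property names are illustrative):

// Normalize the C# value once and compare the column directly, so the generated
// SQL has no function wrapped around the column and the index on Username can
// still be used.
var userName = this.UserName.ToLower();
var user = queryable.Where(x => x.AspNetUser.Username == userName);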

This, of course, raises the debate of using SPs vs. not using any. I will reserve that for another post.

So where we are now is a mix of old DALC and new DALC. As we rewrite pages we will be making the old DALC obsolete. I look forward to hitting the delete button on the project.

I expect that we will move towards all CRUD operations being done purely through nHibernate (or a different ORM if we decide to change; gotta love dependency injection (DI)).
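
A hedged sketch of what that flexibility looks like (IShowRepository and NHibernateShowRepository are hypothetical names, not our actual classes): callers depend on an interface, the nHibernate details live behind it, and the DI container decides which implementation gets used.

public interface IShowRepository
{
    Show GetById(int id);
    void Save(Show show);
}

// Swapping ORMs later means writing a new implementation of IShowRepository
// and changing one DI registration; callers never see nHibernate directly.
public class NHibernateShowRepository : IShowRepository
{
    private readonly ISession _session;

    public NHibernateShowRepository(ISession session)
    {
        _session = session;
    }

    public Show GetById(int id)
    {
        return _session.Get<Show>(id);
    }

    public void Save(Show show)
    {
        _session.SaveOrUpdate(show);
    }
}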

Simple queries will move to nHibernate more and more, but the more complex ones may remain as stored procedures for quite a while.

Lately, though, I have been thinking that using an ORM gives you the ability to become DB agnostic. If we want to use something like MemSQL, where the entire DB sits in memory but there are no stored procs, we can. If we want to keep everything in Mongo, there are ORMs for that. It gives a lot of flexibility. Just remember, with great flexibility can come performance hits, so it depends on your application.

In summary, writing your own gives you control, but then you need to do everything yourself. Using a tool like an ORM gives the developer more power to not think about the DB and just work with objects.

 

Posted on December 7, 2013 in Web Development, Web Performance

 


Why tools that monitor and profile your website are so important

Over the last 12 months, we have been able to grow the unique visitor volume for BlogTalkRadio.com by 10X AND cut the average page load time from 10 seconds to 5 seconds.

The architecture of how we scaled the site is a topic for another blog post. This post pertains to some critical tools that we used to cut the average page load time in half even though volume increased dramatically.

If you are not measuring your site speed and the experience for your users, you are throwing darts against a wall in the dark. There are countless studies showing how page performance directly leads to increased visitor engagement and revenue.

The image below shows some of the famous stories where page performance increased revenue.

Our story is very similar. We have definitely seen an increase in revenue and visitor activity with faster page performance.

So what tools do we use to understand how our site is performing?

The key ones that we use are NewRelic and Dynatrace. We have always used Firebug and webpagetest.org, but those only show you one-off page loads, which often are not representative of your whole user base.

NewRelic shows us how each page performs, what each part of the page life cycle is doing, and even drills down to the back end and DB calls.

Dynatrace has two parts to it.

  • There is a free client piece that can be installed on your machine that gives extremely detailed information about how your page loads and about JS execution. It will show where your JS is running poorly.
  • There is an expensive server-side piece that can show you, down to the line of code, how your pages are performing.

Using these tools, we worked release by release, making changes to the hot spots we found in order to increase performance.

These tools come in very handy when you release code to ensure that your changes did not cause an unanticipated performance issue. We had such an issue where we released something that was supposed to improve performance but under real conditions created a bug which greatly slowed down the site. We knew within an hour that we had an issue and released a hotfix. Without NewRelic, we would not have known that we had such an issue.

A HUGE win for us, which showed the value of the Dynatrace server tool, happened the first day we used it. We kept seeing a helper class method call, used for reading config variables, taking longer than expected. It was taking 10-50ms each time, and some pages would do this a couple of times.

We make a lot of use of config settings on almost every page of the site. The helper class that we were using was written years ago and is used everywhere. Nobody had looked at that class since it was written.

Since Dynatrace was showing this class in the traces, I decided to look into it.

public static class myAppSettings
{
    public static string GetKeyAsString(string keyName)
    {
        try
        {
            var config = WebConfigurationManager.OpenWebConfiguration("~");
            var appSection = config.GetSection("appSettings") as AppSettingsSection;

            try
            {
                return appSection.Settings[keyName].Value;
            }
            catch (NullReferenceException)
            {
                return keyName + " is missing";
            }
        }
        catch
        {
            return WebConfigurationManager.AppSettings[keyName];
        }
    }
}

Do you see the issue in the above code? This method is called every time we read a config value. This happens more than once per page load.

The issue was “var config = WebConfigurationManager.OpenWebConfiguration("~");”.
This opens the config file. So every time a config variable was read, rather than reading from an in-memory dictionary as expected, this code was opening the config file. Not good.

Since this is a static class, we just changed it to open the file once, when the class is first loaded, as such:

public static class myAppSettings
{
    // Open the config file once, when the class is first loaded.
    static Configuration config = WebConfigurationManager.OpenWebConfiguration("~");

    public static string GetKeyAsString(string keyName)
    {
        try
        {
            var appSection = config.GetSection("appSettings") as AppSettingsSection;

            try
            {
                return appSection.Settings[keyName].Value;
            }
            catch (NullReferenceException)
            {
                return keyName + " is missing";
            }
        }
        catch
        {
            return WebConfigurationManager.AppSettings[keyName];
        }
    }
}

This seemingly small change yielded the following change in the NewRelic stats, which show the average time it takes on the server to generate the HTML.

You can see the dramatic drop in time on the day that we released the change. Finding silver bullets like this is very exciting.

This also explained a couple of odd behaviors that we had noticed over the last couple of years and were not able to understand until we started using these tools.
1) I always wondered why, on our site, when we pushed out a new config file, it immediately picked up any changes. I expected that it would only pick them up when the app pool was recycled. Now we know: it was opening the file constantly.

2) A change we put in the header caused the web servers to jump in CPU. This change had included a couple of lookups in the config. This led us to think that reading the config was slow, even though all of the articles we read said it should just be reading from a dictionary in memory. Now we know why it was slow.

Some other hot spots that we found using these tools were around how we were loading third-party widgets.
1) Facebook Comments needed to be made asynchronous and moved to after DOM ready. We learned that you cannot move it all the way to after the load event, because then it won’t render.
2) The AddThis share bar loads Twitter, Google Plus, Facebook Like, etc. Even though the JS they provide is billed as non-blocking, it was only non-blocking up to a point and it loaded oddly on the page. By changing how it loads and moving it later in the page cycle, we were able to ensure the page was rendered before loading these third-party tools.
3) On some of our pages we pull in JSON via an AJAX request on DOM ready and merge it with a template on the page to display information. Merely putting JS at the bottom of the page doesn’t help; it still executes before DOM ready, so some third-party JS was blocking what we wanted to do on DOM ready.

The key learning is not to just load third-party code where the vendors recommend. If we had done that, our pages would still be taking 10+ seconds. Experiment with moving the scripts around and controlling when they load via $(window).load and $(document).ready.

NOTE: Be very careful of 3rd party code that has document.write in it. If you move that to after the DOM has been created, it will break most browsers. If Chrome sees it, it creates a new DOM and blanks the page.

So without these tools (as well as Firebug, Chrome Developer Tools, and webpagetest.org) we were flying blind as to how our site was performing out in the wild. Very often, it is different than in your lab setting and definitely different than just looking at a one-off page load via Firebug or webpagetest.org. Don’t get me wrong, Firebug and webpagetest.org are important tools. However, when you look at actual performance en masse, in the wild, you see a ton of things that you normally wouldn’t. Understanding what your site is doing, why it is doing it, and how your users’ experience is affected is critical to the success of your web property.

 


Culture Issues on ASP.Net Websites

When building a website that can display the pages in the language and culture of the user, there are some type conversion issues that developers need to be aware of. When building our BlogTalkRadio.com site we hit a few of them and have learned some good lessons.

Over the years, our BlogTalkRadio.com web site has seen a number of strange errors littering our exception logs. Often we see errors that we cannot reproduce ourselves. When analyzing these errors, we sometimes find a correlation whereby every user seeing the error has the same language culture, which is different from ours here in the US (en-US).

Places we have seen issues:

  1. Number Conversions
  2. DateTime Conversions
  3. String Comparisons

One way to combat these issues is to specify and control the culture being used. In ASP.Net, many developers do not consider that when somebody comes to your site and the .Net framework executes your code, it will use the culture of the visiting browser to decide how to parse and format values.

Additionally, designers often don’t take into account how another culture’s date and numeric formats will look on a page, to say nothing of how language and special characters cause issues. JSON is notoriously tied to US culture. We have seen many issues in JSON and XML with EBCDIC characters causing errors and issues. Encoding these characters is a MUST.

The invariant culture (CultureInfo.InvariantCulture) is a good tool for converting strings to DateTime or numeric values and for comparisons, so parsing and comparisons behave consistently whether the visitor’s own culture writes a number as 1.2 or, as in Russian, 1,2. (http://stackoverflow.com/questions/2423377/what-is-the-invariant-culture)
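
For example (the literal values are just for illustration), parsing with the invariant culture keeps the behavior the same no matter what culture the request is running under:

using System.Globalization;

// Culture-sensitive: under a culture such as ru-RU, where "," is the decimal
// separator, this line can throw or parse to something unexpected.
double risky = double.Parse("1.2");

// Culture-safe: "." is always treated as the decimal separator.
double safe = double.Parse("1.2", CultureInfo.InvariantCulture);
string roundTripped = safe.ToString(CultureInfo.InvariantCulture); // always "1.2"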

Here are some common places where not thinking about culture can come back to bite you. Below, I default to U.S. culture when the invariant culture won’t do.

  • Some other cultures, such as Saudi Arabia’s, use a different calendar. Therefore, creating a date from a year, month and day will have unexpected results: the year 2012 has not happened yet in their calendar, so this will throw an error.

    Error Prone:

    showDate = new DateTime(year, month, day)

    Better:

    var CI = System.Globalization.CultureInfo.GetCultureInfo("en-US");
    showDate = new DateTime(year, month, day, CI.Calendar)
  • The calendar is also an issue going the other way. We had an issue here because URLs were getting generated with completely incorrect dates in cultures with a different calendar.

    Error Prone:

    var dt = ActualShowDate != DateTime.MinValue ? ActualShowDate : ShowDate;
    sb.Append("/");
    sb.Append(dt.ToString("yyyy"));
    sb.Append("/");
    sb.Append(dt.ToString("MM"));
    sb.Append("/");
    sb.Append(dt.ToString("dd"));
    sb.Append("/");

    Better:

    var CI = System.Globalization.CultureInfo.GetCultureInfo("en-US");
    sb.Append("/");
    sb.Append(dt.ToString("yyyy", CI.DateTimeFormat ));
    sb.Append("/");
    sb.Append(dt.ToString("MM", CI.DateTimeFormat));
    sb.Append("/");
    sb.Append(dt.ToString("dd", CI.DateTimeFormat));
    sb.Append("/");
  • Below is another example of where a different calendar can generate bad dates. This started to bite us the most when we started using page caching: if a Saudi Arabian visitor hit the page and caused it to be cached, then all visitors for the duration of the cache would get the strange dates.

    Error Prone:

    hlArchiveURL = _baseURL + archive.MonthStarting.ToString("yyyy/MM");

    Better:

    hlArchiveURL = _baseURL + archive.MonthStarting.ToString("yyyy/MM", new CultureInfo("en-us", false));
  • We have found that the decimal version numbers in the UserAgent are always in the U.S. format. Except, of course, when there is a letter appended, such as IE 7.0b. So the error-prone code below was erroring for non-US users, since 7.0 is not a valid number in Brazil.

    Error Prone:

    if (Request.Browser.Browser == "IE"){
        double browVer = Convert.ToDouble(Request.Browser.Version);
        if (browVer <= 7.0) { 
        	// put hacks for ie7 here.
        }
    }

    Better:

    var CI = System.Globalization.CultureInfo.GetCultureInfo("en-US");
    if (Request.Browser.Browser == "IE"){
        double browVer = Convert.ToDouble(Request.Browser.Version, CI);
        if (browVer <= 7.0) { 
        // put hacks for ie7 here.
        }
    }
  • We use Elasticsearch for our searches. When we tweaked the ranking algorithm to use decimal values, all of a sudden people outside the US were getting errors when trying to search. This is the call within our code to the ElasticSearch client:
    booleanQuery.Should(new TermQuery("HostDisplayName", term,  1.3, true));

    Within the ElasticSearch client code, this line was giving us trouble because the JSON format wasn’t supported when it hit the server. The server could not handle a number in the format 1,3. Without specifying a culture, many non-US users were generating 1,3 in the JSON instead of 1.3.

    Error Prone:

    writer.WriteRawValue(string.Format("{{\"filter\": {{\"term\": {{ \"{0}\" : \"{1}\" }} }}, \"boost\": \"{2}\" }}", term.Field, term.Value, term.Boost.ToString()));

    Better:

    writer.WriteRawValue(string.Format("{{\"filter\": {{\"term\": {{ \"{0}\" : \"{1}\" }} }}, \"boost\": \"{2}\" }}", term.Field, term.Value, term.Boost.ToString(Utils.JsonBuilder.UsCultureProvider)));
  • Different languages have different comparison rules. Some languages have two-character letters; others have an accent mark or tilde over a letter that makes it a different letter. We learned the hard way that in the Turkish culture, I != i. In fact, we discovered that "I".ToLower() != "i": in Turkish, capital I lower-cases to the dotless ı, not to i, so doing a ToLower didn’t help. We were not getting a string match when someone came from Turkey. Therefore, we switched to ToLowerInvariant(), which did the trick.

    Error Prone:

    where _includeExtensions.Contains(Path.GetExtension(file)) && !file.ToLower().Contains("includefilesource\\")

    Better:

    where _includeExtensions.Contains(Path.GetExtension(file)) && !file.ToLowerInvariant().Contains("includefilesource\\")

    Please note that, for brevity, I did not include try/catch logic to account for bad data in some of the code.

HOWEVER,

There is one setting that can be made in the web.config that would have made all of these issues moot.

    <globalization culture="en-US" uiCulture="en-US" />

This setting sets the entire site to US culture and does not take the client’s culture into account. This can be overridden by setting a particular culture within the code on a case by case basis if needed.

The downside of this setting is that you cannot use .Net’s functionality to display dates, numbers and times in the local user’s format. For our BlogTalkRadio.com site, we don’t want to use this functionality anyway.

For high-volume sites, caching pages and output caching are common. If someone from an Arabic culture hits a page and causes it to be cached, and then someone in New York hits the cached page, the New York visitor will be a bit confused by the odd Arabic date that they see. Therefore, any client-specific formatting should be done via JS on the page itself.

Summary: it is important to think about how your website will handle different cultures. Not thinking about it will cause errors and a bad experience for most users outside of your home country. You can handle it via global settings or detailed individual settings.

 
