Rails Deployment Options

There are many ways to deploy a Rails application. Sorting them all out and finding the right solution for your situation can be confusing. In this post I list and briefly describe popular choices.

Ruby Virtual Machine

Most Ruby implementations can run Rails. Some, however, cannot. The ability to run Rails is a major achievement for alternative Ruby VMs. For detailed comparisons check out Antonio Cangiano’s blog. He does the Ruby shootout.

  • MRI
    • 1.8 standard
    • Ruby 1.8.6 is the recommended version to use with Rails
  • YARV
    • 1.9 standard
    • Significantly faster than MRI
    • Rails is not yet fully compatible with Ruby 1.9
    • Many gems are not compatible with Ruby 1.9
  • JRuby
    • Java implementation of Ruby
    • Runs Rails
  • Rubinius
    • “Ruby in Ruby”
    • Runs Rails
  • Ruby Enterprise Edition
    • From the creators of Passenger
    • Fork of MRI
    • 33% less memory consumption on average when used with Passenger
  • MagLev
    • Commercial
    • Pending release
    • Lots of promise in terms of performance and features, but won’t run Rails for some time

For critical production applications, there are really only two choices to consider. If you are using Passenger, use Ruby Enterprise Edition. Otherwise, use the standard Ruby VM. The other implementations are progressing rapidly and developers are working hard to make them production ready.

Server Configuration

This is an active area. Notable configurations include:

  • nginx + mongrel | thin | ebb | fuzed (yaws)
    • nginx is a powerful lightweight frontend server/reverse proxy/load balancer that can withstand a real pounding
    • mongrel is the veteran backend web server for Ruby on Rails
    • thin is an evented backend server that’s faster than mongrel and supports unix socket connections
    • ebb is an evented backend server written in C that’s faster than thin and also supports unix socket connections, but it uses more memory than thin while idling
    • fuzed allows Rails to be served up by yaws, a server written in Erlang that provides an unparalleled degree of concurrency
  • Apache + Passenger (mod_rails) + Ruby Enterprise Edition
    • New and exciting deployment option for Apache
    • Easy to setup
    • Deploying an app can be as simple is uploading your app
    • mod_rails and Ruby Enterprise Edition, both developed by Phusion, together provide a 33% lower memory footprint (for Rails) on average
    • Integrated monitoring and load balancing; monitors Rails processes and starts/kills them as necessary based on demand
  • LiteSpeed
    • Commercial
    • Relatively easy to setup
    • Better performance than most other solutions
    • Despite its qualities, not a popular choice

I only mention LiteSpeed because of its performance. Few people actually use it for serious Rails deployments. I omitted lighttpd from the list because nginx has stolen the show. Ancient solutions like fastcgi were also omitted.

I use nginx + thin. I have not transitioned to ebb because of higher memory consumption (at least at idle). I included the fuzed project in my list because I find yaws and Erlang fascinating. Yaws puts Apache to shame when it comes to concurrency. I’m not sure how polished the fuzed project is, but it could be a contender. It’s also good to see cooperation between Ruby and Erlang. Mongrel, thin, and ebb are all good options. It all depends on your needs and preferences.

I have not tried out Passenger. It is being touted as a breakthrough solution because of how simple it makes deployment. My first impression is that it is more for deployment novices and conventional shared hosting environments. With WebFaction, you have the freedom and ability to build your own app stack. I’ve made this a breeze with a shake and bake shell script. Nginx is a better frontend server than Apache in its ability to serve static pages and with regard to memory usage. Unfortunately, Passenger only works with Apache.

Load Balancing

Load balancing allows your applications to scale horizontally.

  • Hardware
    • For very large applications
    • Most advanced
    • Expensive
  • HAProxy
    • For large applications
    • Very advanced
    • Difficult to setup
  • nginx-upstream-fair
    • Third party module for nginx
    • Adds fair load balancing to nginx (replaces standard round-robin load balancing)
    • Very simple to setup
    • Small to large applications

I use the nginx-upstream-fair module for load balancing. Written by Grzegorz Nosek, the module works very well and is so easy to setup that there is no reason not to do so.

Monitoring

To make sure that your processes are behaving, you need a process monitor.

I use monit. I haven’t tried the god gem, but I’ve heard good things.

Multi-core Processors and Parallel Computing

The performance of modern computer processors can be likened to space in Manhattan. On the island of Manhattan, the fundamental problem of scaling outward is overcome by scaling upward. The opposite is true in today’s computer processors. Clock frequency is limited by physical and economic factors, such as power/cooling requirements. Computer performance continues to improve at a predictable rate, however, because an increasing number of processors are used to work in parallel. Methods for utilizing multiple processors include:

These technologies can be combined. Apple’s Mac Pro can be equipped with two quad-core processors. Sun manufactures multi-core processors with multiple hyper-threads per core. An SMP capable UltraSPARC T2 Plus ships with 8 cores and 4 hyper-threads/core. That’s virtually equivalent to 32 cores per processor. A computer cluster can be composed of just about any computer system that can be networked.

From the list above, the most recent technology to enter the market is multi-core. Multi-core technology represents a fundamental shift in processor design. Performance is driven by core quantity rather than clock frequency. Clock frequency is still important, but not as much as it used to be.

It is no coincidence that Intel dropped the venerable Pentium name. The Pentium name correlates computer performance directly with clock frequency. The switch to the Core name helps consumers unfamiliar with the concept of benchmarking to discern apples from oranges. It also serves to forge a strong association between multi-core technology and Intel.

Multi-core technology has also changed the landscape of software development. Performance is now concurrency based. It’s no longer a certainty that software will run faster if programmers leave it up to technology turnover alone. For best performance, software must be explicitly written to take advantage of multiple cores. Otherwise, performance is limited to that of a single core. All programs can benefit from multi-core technology at the operating system level through multitasking. Different processes can be handled concurrently by different cores. This means that a multi-core computer will not get bogged down while running a CPU intensive application. For the average user, only a few cores are sufficient to experience the full extent of this benefit.

Sequentially written programs can only utilize a single core. To utilize multiple cores, these programs must be parallelized. The degree to which a program can be parallelized determines how much faster it can run on a multi-core machine and how many cores are required to approach maximum performance. Parallel programming is subject to Amdahl’s Law.

Many problems are easy to parallelize. These problems are called “embarrassingly parallel”. Other problems require various degrees of cleverness. Some problems are fundamentally sequential. Generally speaking, the larger a problem, the more likely it can be broken down and parallelized.

Parallel programming is inherently more complex than sequential programming. It introduces a unique set of behaviors, which can result in errors that are difficult to debug. One such behavior is the race condition, where an outcome is sequence dependent. Even worse, nearly every programming language is fundamentally flawed in its support for parallel programming. Shared memory, locks, and mutexes are no good. Erlang gets it right. However, Erlang may be too strange to achieve critical mass.

The asymmetry between hardware and software development is well recognized. Unless something profound emerges, rapid expansion in processor cores per computer (“core sprawl”, to coin a phrase) will significantly widen the gap. Automatic or assisted parallelization would be tremendous. Unfortunately, there has been little to show for many decades of work on automatic parallelization.

Many people, companies, and institutions are hard at work trying to make parallel programming easier. Some encouraging news comes from Apple. Practically lost among the iPhone 3G hoopla at WWDC 2008, the basic plans for Mac OS X 10.6 (Snow Leopard) were publicly disclosed. The new operating system is supposed to be much leaner than its predecessor and multi-core optimized. Multi-core optimization comes from a set of technologies together called Grand Central. According to Apple:

Grand Central takes full advantage by making all of Mac OS X multicore aware and optimizing it for allocating tasks across multiple cores and processors. Grand Central also makes it much easier for developers to create programs that squeeze every last drop of power from multicore systems.

The most detailed account I’ve found about Grand Central comes from RoughlyDrafted (found via Mac Rumors). Other interesting articles on Grand Central come from AnandTech and Mac Rumors. Apple’s parallelization solution presumably works by “handling processes like network packets”. That would make it easier to delegate work across multiple cores.

Multi-core technology represents an exciting convergence. Personal computers have become very much like supercomputers in terms of performance scaling. Parallel programming techniques for supercomputers can be applied to modern personal computers. Clustering and distributed computing in general will benefit significantly from the rise in parallel programming competency. New and exciting applications will result and web application scaling will become easier.