Home > Scalability > System Scalability: Database and Storage “Sharding”

System Scalability: Database and Storage “Sharding”

I’ve been fortunate enough in my career to be involved and lead some very large projects.  Scalability and HA are two topics I deal with on a virtually endless basis.  As most of the services, systems and platforms I’ve been involved in and responsible for are deployed for service providers, scalability tends to exist in a realm unlike other industries.   An average enterprise can usually use off the shelf products and ideas for their systems, the scalability demands of an on-line service provider means “shrink wrapped” solutions seldom scale to the levels needed.  There aren’t many “regular” companies that have web applications that need to scale to hundreds of thousands, millions, or hundreds of millions of users.  As with anything there are exceptions.  Add the fact that the application itself IS the actual business of  “on-line service providers” and you’ve got a complex environment where scalability and high availability are paramount concerns and challenges.

Techniques to address some of these issues have pretty much become commodities, load balancing a web serving or application serving farm for example is pretty much a done deal.  Other aspects of the application, particularly databases and storage tend to offer more significant scalability challenges than their web serving or non-stateful brethren.  The key here is the stateful nature of database and storage.  

I tend to shy away from separating databases and file systems, at the end of the day a file system is a database, it might not be relational, that is coming, but it is a database nonetheless.  I believe scalability in the area of database and enterprise storage is where most applications make or break their scalability goals, where the rubber meets the road so to speak.  The current buzz in database scalability revolves around database sharding; the idea is pretty simple.  Take the database tables and split them into shards or peices ideally across multiple database servers and ideally have the database shards reside on different storage systems.   There are many ways to “shard” your database, at the application level, with a database proxy (Mysql Proxy/Hscale), or depending on database you are using, within the database itself.  

Interestingly database sharding is not unlike the techniques used to scale storage.  Typical enterprise storage system use controllers with disk systems connected to them, with throughput limited to the capability of the controller and number of disks connected.  However this design has become pretty much old hat.  New systems and projects have hit the market in the last few years allowing for multiple controllers to handle a portion of the overall filesystem.  In other words parts of the data is being served by different parts of the infrastructure, what I like to call storage sharding.  Companies like Isilon, Netapp, EMC’s Atmos, and HP’s PolyServe and others all have or are rolling out systems that distribute the “file system” across multiple controllers, effectively “sharding” the file system.  I won’t get into the specifics of how they’re doing this, as with everyting they all have their own methods, pros and cons.

There is more to come in this space, particularly at the file system level.  Many of the challenges I’ve faced in the past tend to involve the underlying file system itself, more specifically the file system metadata.  Typically a file system is written to provide something to everyone and isn’t necessarily developed for your specific application, but what if you could tailor and pull information from a file system for your specific application, a sort of “file system” API?  That would be game changing, and I assure you, it is coming.

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: