22 July 2009

The Google File System (GFS)


The Google File System (GFS) is a distributed file system developed at Google by Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung to handle the rapid growth of the company’s infrastructure and to suit its particular needs. The company was looking for a way to improve the stability and reliability of its services, and the only reasonable way to do that was to build its own file management system, since no existing system was able to handle such a massive number of requests spread over such a huge number of servers.

GFS supports a gigantic user population on top of very cheap computer equipment that tends to break down quite often. From the company’s beginnings, Google used cheap computers running Linux to keep up with the storage needed for the information gathered by its web crawlers (and by other services, like GMail). This caused problems with service reliability and efficiency, so they had to create reliable software to run over unreliable hardware. It was a radical change to the economics of the IT industry.

In 2003 Google published a paper on their Google File System (GFS) at SOSP, the Symposium on Operating Systems Principles. This is the same venue at which Amazon would publish their Dynamo work four years later. One of the lecturers in my group tells me that SOSP is a venue where "interesting" is rated highly as a criterion for acceptance, over other more staid conferences. So what, if anything, was interesting about GFS? Read on for some details...

Filesystems

Filesystems are an integral component of most operating systems, mediating access to persistently stored data. When you save a file to a magnetic hard disk, it gets recorded as a logical series of 1s and 0s, amongst a sea of other files similarly represented. The filesystem is the part of the operating system that makes sense of these 1s and 0s and is able to recover the structure in terms of files and folders that was present when the data were written.

Of course this, as any other one paragraph introduction to such a broad subject must be, is a simplification. Filesystems don't tend to deal in 1s and 0s, that would be far too messy. Instead they are supported by a device driver that typically speaks in terms of blocks - large chunks of data - which takes care of actually writing the blocks to disk. But the filesystem is responsible for determining what those blocks 'mean' in the context of files and folders.
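
To make that concrete, here is a toy sketch, in Python, of the bookkeeping involved. It is not any real filesystem's on-disk layout: a "file" is simply an ordered list of block numbers, and the filesystem's job is to translate a byte offset within the file into a particular block (and offset inside it) that the device driver can then fetch. The block size and names are purely illustrative.

```python
# Toy illustration (not any real filesystem): a "file" is just an ordered
# list of block numbers, and the filesystem translates a byte offset in the
# file into (disk block, offset within block) before asking the block-level
# device driver for that block.

BLOCK_SIZE = 4096  # bytes per block; a common choice, purely illustrative

class ToyFile:
    def __init__(self, name, block_numbers):
        self.name = name
        self.block_numbers = block_numbers  # which disk blocks hold this file

    def locate(self, byte_offset):
        """Translate a byte offset into a (disk block, offset in block) pair."""
        index = byte_offset // BLOCK_SIZE
        if index >= len(self.block_numbers):
            raise ValueError("offset past end of file")
        return self.block_numbers[index], byte_offset % BLOCK_SIZE

# Example: a file stored in three scattered disk blocks.
f = ToyFile("notes.txt", block_numbers=[17, 93, 4])
print(f.locate(5000))   # -> (93, 904): second block of the file, 904 bytes in
```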

Different filesystems have different characteristics. Some filesystems are extremely simple, lightweight and fast. Others are more complex, but can recover from some disk corruption. Some are more appropriate to workstation usage patterns, whereas some are tailored towards, say, the requirements of a disk-backed SQL database.

What is GFS?

GFS, as described above, is a storage system whose main functions are storing, processing and retrieving data for search, and which, in case of failure, should be able to correct itself so that the data can still be retrieved. The amounts of data involved are huge; Google’s paper talks in terms of multi-GB files as the common case.

How Does GFS Work?

According to Google’s Jeffrey Dean, the idea behind GFS is to store data reliably even in the presence of unreliable machines. GFS works on a master-slave model: one machine acts as the master, and several others are the slaves, or nodes (Google calls them chunkservers).

The master is responsible for keeping track of which data is stored on which machine, known as the metadata (data about data). GFS maintains three copies of any piece of data or file, including executables. The metadata resides in the master’s main memory, i.e. in RAM, allowing very fast access. GFS was designed to store huge amounts of data: to date, the largest Google cluster (a cluster being a group of computers networked together) provides hundreds of terabytes of storage across thousands of disks.
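
Concretely, the paper describes the master keeping two things in memory: which chunks make up each file, and which chunkservers currently hold a replica of each chunk. The sketch below is my own rough rendering of those two maps, with invented names and a deliberately naive placement policy; the real master’s placement decisions take racks, disk usage and load into account.

```python
# A rough sketch (names are mine, not Google's) of the two in-memory maps a
# GFS-style master keeps: which chunks make up each file, and which
# chunkservers currently hold a replica of each chunk.

import uuid

REPLICATION_FACTOR = 3  # GFS defaults to three copies of every chunk

class ToyMaster:
    def __init__(self, chunkservers):
        self.chunkservers = list(chunkservers)   # e.g. ["cs1", "cs2", ...]
        self.file_to_chunks = {}                 # filename -> [chunk handles]
        self.chunk_locations = {}                # chunk handle -> [servers]

    def create_chunk(self, filename):
        handle = uuid.uuid4().hex                # stand-in for a chunk handle
        # Pick three distinct chunkservers to hold replicas. The real
        # placement policy is far smarter (racks, disk usage, load, ...).
        replicas = self.chunkservers[:REPLICATION_FACTOR]
        self.file_to_chunks.setdefault(filename, []).append(handle)
        self.chunk_locations[handle] = replicas
        return handle, replicas

master = ToyMaster(["cs1", "cs2", "cs3", "cs4"])
print(master.create_chunk("/logs/crawl-00001"))
```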

How Does Fault Tolerance Work in GFS?

In GFS, the master server handles all requests, directing each one to the exact location of the data on one or more of the nodes. If a request takes longer than the allotted time, the system switches to a backup copy (the reason for maintaining three copies!).
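
The fallback behaviour amounts to a loop over the replica list: try one copy, and if it is slow or unreachable, move on to the next. The sketch below only illustrates that idea; read_from stands in for the real chunkserver RPC, and the one-second timeout is an arbitrary number of my choosing, not a GFS parameter.

```python
import socket

def read_from(server, chunk_handle, offset, length, timeout):
    """Placeholder for the real RPC to a chunkserver; here it just simulates
    one dead server so the fallback path gets exercised."""
    if server == "cs1":
        raise socket.timeout(f"{server} did not answer within {timeout}s")
    return f"<{length} bytes of chunk {chunk_handle} from {server}>"

def read_chunk(replicas, chunk_handle, offset, length, timeout_s=1.0):
    last_error = None
    for server in replicas:
        try:
            return read_from(server, chunk_handle, offset, length, timeout_s)
        except (socket.timeout, ConnectionError) as err:
            last_error = err   # this replica is slow or unreachable,
            continue           # so try the next copy
    raise RuntimeError(f"all replicas failed: {last_error}")

# cs1 times out, so the read is quietly served from cs2.
print(read_chunk(["cs1", "cs2", "cs3"], "chunk-42", offset=0, length=1024))
```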

If one of the slave nodes fails, it is the master’s responsibility to keep the replica count up, either by reallocating the data to some other machine or by creating a duplicate copy. Although there is only a single master “active” at any given point, the state of the master (a log of what the master has been doing) is kept on other machines too.
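
A rough sketch of that re-replication idea (structure and names are my own, not Google’s): when a chunkserver stops responding, the master drops it from every chunk’s replica list and tops any under-replicated chunk back up to three copies on the remaining machines.

```python
REPLICATION_FACTOR = 3  # GFS keeps three copies of every chunk by default

def handle_chunkserver_failure(dead_server, chunk_locations, live_servers):
    """chunk_locations: chunk handle -> list of servers holding a replica."""
    for handle, replicas in chunk_locations.items():
        if dead_server in replicas:
            replicas.remove(dead_server)
        missing = REPLICATION_FACTOR - len(replicas)
        # Re-replicate onto live servers that don't already hold this chunk.
        candidates = [s for s in live_servers if s not in replicas]
        for target in candidates[:missing]:
            replicas.append(target)  # the real system copies the data first
    return chunk_locations

locations = {"chunk-1": ["cs1", "cs2", "cs3"], "chunk-2": ["cs2", "cs3", "cs4"]}
print(handle_chunkserver_failure("cs2", locations, ["cs1", "cs3", "cs4", "cs5"]))
```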

Thus, if the master fails, another machine that knows what the failed master was doing can take its place and keep the work moving. This is an overview of how GFS works; for more technical details, look at the GFS paper.
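
The way a replacement master "knows what the failed master was doing" is the operation log: every metadata change is appended to a log that is also copied to other machines, and a new master rebuilds its state by replaying that log. The snippet below is only a toy illustration of the replay idea, not Google’s actual log format.

```python
# Toy illustration of log replay (my own structure, not the real log format):
# a replacement master rebuilds its file -> chunks map by re-applying every
# recorded metadata operation in order.

def replay(operation_log):
    file_to_chunks = {}
    for op in operation_log:
        if op["type"] == "create_file":
            file_to_chunks[op["file"]] = []
        elif op["type"] == "add_chunk":
            file_to_chunks[op["file"]].append(op["handle"])
        elif op["type"] == "delete_file":
            file_to_chunks.pop(op["file"], None)
    return file_to_chunks

log = [
    {"type": "create_file", "file": "/logs/crawl-00001"},
    {"type": "add_chunk", "file": "/logs/crawl-00001", "handle": "chunk-1"},
    {"type": "add_chunk", "file": "/logs/crawl-00001", "handle": "chunk-2"},
]
print(replay(log))  # the new master now knows the file's chunks again
```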

Conclusion

The GFS implementation we’ve looked at offers many winning attributes. These include:

  • Availability. Triple redundancy (or more if users choose), pipelined chunk replication, rapid master failovers, intelligent replica placement, automatic re-replication, and cheap snapshot copies. All of these features deliver what Google users see every day: datacenter-class availability in one of the world’s largest datacenters.
  • Performance. Most workloads, even databases, are about 90% reads. GFS performance on large sequential reads is exemplary. It was child’s play for Google to add video download to their product set, and I suspect their cost-per-byte is better than YouTube or any of the other video sharing services.
  • Management. The system offers much of what IBM calls “autonomic” management. It manages itself through multiple failure modes, offers automatic load balancing and storage pooling, and provides features, such as the snapshots and 3 day window for dead chunks to remain on the system, that give management an extra line of defense against failure and mistakes. I’d love to know how many sysadmins it takes to run a system like this.
  • Cost. Storage doesn’t get any cheaper than ATA drives in a system box.

Yet as a general purpose commercial product, it suffers some serious shortcomings.

  • Performance on small reads and writes, which it wasn’t designed for, isn’t good enough for general data center workloads.
  • The record append file operation and the “relaxed” consistency model, while excellent for Google, wouldn’t fit many enterprise workloads. It may be that email systems, where SOX requirements are pushing retention, could be redesigned to eliminate deletes. Since appending is key to GFS write performance in a multi-writer environment, GFS might give up much of its performance advantage even in large serial writes in the enterprise.
  • Lest we forget, GFS is NFS, not for sale. Google must see its infrastructure technology as a critical competitive advantage, so it is highly unlikely to open source GFS any time soon.
