Digg site admin Ron Gorodetzky discusses “how Digg has optimized its architecture to handle 26 million unique visitors a month.” Database management is the key, and those thumbnails are a bigger challenge than you might expect.
The database is a heavy focus of optimization at Digg. The site is based on a LAMP stack, with MySQL and PHP doing most of the heavy lifting, said Gorodetzky. Having been there from the start, Gorodetzky has worked through the typical growing pains, particularly those related to MySQL replication.
“The first pain point we hit was just database stuff. The first thing you'll notice is when you start to grow these queries, the database can't commit as much time to committing a certain query as it used to,” said Gorodetzky. “You'll find the normal things that work, suddenly don't. You'll find that, one day, you'll see a spike in your graphs telling you that something's going slower. Once you do that, you get to the point where the database part is as fast as it can be, you cache things. You scale out your Web server so you have more resources there, generally caching and doing less work per request.”
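To make the "cache things and do less work per request" idea concrete, here is a minimal Python sketch of a read-through cache with expiry. It is an illustration only, not Digg's code: the `TTLCache` class, the `load_front_page` function, and the 30-second lifetime are all hypothetical, and a production site would more likely put a shared cache tier in front of the database than an in-process dictionary.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache with per-entry expiry (a stand-in for a real cache tier)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, object]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], object]) -> object:
        """Return the cached value for key, calling compute() only on a miss or expiry."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no database work for this request
        value = compute()  # miss: do the expensive work once
        self._store[key] = (now + self.ttl, value)
        return value


# Hypothetical expensive query; the counter shows how often it really runs.
query_count = 0


def load_front_page():
    global query_count
    query_count += 1
    return ["story-1", "story-2"]


cache = TTLCache(ttl_seconds=30)
first = cache.get_or_compute("front_page", load_front_page)
second = cache.get_or_compute("front_page", load_front_page)
```

In this sketch the second request is served from memory, so `load_front_page` runs once for both calls. The trade-off is staleness: readers may see data up to 30 seconds old, which is usually acceptable for a page that many visitors request far more often than it changes.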
Another pain point along the road to success came when Digg decided to host images and video links accompanied by thumbnails of the linked visuals. Hosting all those thumbnails was actually a difficult problem to solve, said Gorodetzky.
“How do you deal with images? You can't just use NFS. We use MogileFS for that,” said Gorodetzky. MogileFS was originally created as the file storage system behind LiveJournal, and the Digg team has found that it scales horizontally, something they had trouble achieving with NFS.
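The horizontal scaling Gorodetzky describes rests on a simple idea: a tracker records which storage nodes hold each file, and every file is copied to more than one node. The Python sketch below is a toy model of that idea and does not use the MogileFS API. The `ToyTracker` class, the node names, and the hash-based placement rule are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, FrozenSet, List


class ToyTracker:
    """Toy model: maps each file key to several storage nodes (not the MogileFS API)."""

    def __init__(self, nodes: List[str], replicas: int = 2):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = list(nodes)
        self.replicas = replicas
        self.locations: Dict[str, List[str]] = {}

    def add_node(self, node: str) -> None:
        """Grow capacity by adding a machine; files already stored stay where they are."""
        self.nodes.append(node)

    def store(self, key: str) -> List[str]:
        """Pick `replicas` distinct nodes for a key and remember the placement."""
        digest = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)
        start = digest % len(self.nodes)
        chosen = [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]
        self.locations[key] = chosen
        return chosen

    def get_paths(self, key: str, down: FrozenSet[str] = frozenset()) -> List[str]:
        """Return the nodes that hold the key, skipping any known to be down."""
        return [node for node in self.locations.get(key, []) if node not in down]


# Hypothetical node names and thumbnail key.
tracker = ToyTracker(["store-a", "store-b", "store-c"], replicas=2)
placed = tracker.store("thumb/123.jpg")
still_readable = tracker.get_paths("thumb/123.jpg", down=frozenset({placed[0]}))
```

Because each thumbnail lives on two nodes in this model, losing one machine still leaves a readable copy, and adding a machine adds capacity without touching a central file server. That is the property a single NFS mount makes hard to achieve.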