Twitter Indexing Average of 2,200 TPS, Serving 1.6B Queries Per Day!


Today, Twitter launched a personalized search experience to help users find the most relevant Tweets, images, and videos. To get there, they completely rebuilt their search stack.

Twitter says "To build this product, our infrastructure needed to support two major features: relevance-filtering of search results and the identification of relevant images and photos. Both features leverage a ground-up rewrite of the search infrastructure, with Blender and Earlybird at the core."

Twitter detailed the project by going back through the history of Twitter Search, which grew out of the company's 2008 acquisition of Summize.

Twitter detailed some of this last October, but it wasn't until this past April that they were able to replace the old Ruby on Rails front end with the newly built Blender. At the time, Twitter said this made search 3x faster and gave them 10x the throughput. That matters because they now see 2,200 tweets per second on average and serve 18,000 queries per second, or 1.6 billion queries per day. That's up from 1 billion last October.
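As a quick sanity check on those averages, assuming the rates hold steady around the clock, the daily totals work out as below. The tweets-per-day figure is derived here from the 2,200 TPS average; Twitter did not quote it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_second = 18_000  # average QPS quoted by Twitter
tweets_per_second = 2_200    # average indexing rate quoted by Twitter

# Assumes the average rate is sustained for a full day.
queries_per_day = queries_per_second * SECONDS_PER_DAY
tweets_per_day = tweets_per_second * SECONDS_PER_DAY

print(f"{queries_per_day:,} queries/day")  # 1,555,200,000
print(f"{tweets_per_day:,} tweets/day")    # 190,080,000
```

Roughly 1.56 billion queries a day rounds to the "1.6 billion" Twitter reports, so the per-second and per-day numbers are consistent.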

But that's still mainly back-end talk. The key to today's search announcements is what now surfaces on the front end. "Blender completed the infrastructure necessary to make the most significant user-facing change to Twitter search since the acquisition of Summize," Twitter writes.

Twitter now has a "Most relevant" tab on the search results page. And while at first glance it may seem that this is simply searching your contacts' tweets (something that is long overdue) and displaying them in reverse chronological order, there's actually a lot more going on. At its most basic, here's how Twitter says to think about it: "Often, users are interested in only the most memorable Tweets or those that other users engage with. In our new search experience, we show search results that are most relevant to a particular user. So search results are personalized, and we filter out the Tweets that don't resonate with other users."

Twitter cites three key types of signals they're looking for:

  • Static signals, added at indexing time
  • Resonance signals, dynamically updated over time
  • Information about the searcher, provided at search time
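Twitter has not published its ranking formula, so the following is only a minimal sketch of how the three signal families could feed one score. Every field name and weight (`text_quality`, `W_SOCIAL`, and so on) is a hypothetical stand-in, and the linear combination is an assumption chosen for readability, not Twitter's method.

```python
import math
from dataclasses import dataclass

# Hypothetical weights for illustration only; Twitter's real function is unpublished.
W_STATIC = 1.0
W_RESONANCE = 0.5
W_SOCIAL = 2.0
W_LANGUAGE = 1.0


@dataclass(frozen=True)
class StaticSignals:
    """Fixed once, when the Tweet is indexed (hypothetical field)."""
    text_quality: float  # e.g. 0.0-1.0 from some offline classifier


@dataclass(frozen=True)
class ResonanceSignals:
    """Updated over time as other users engage (hypothetical fields)."""
    retweets: int
    favorites: int


@dataclass(frozen=True)
class SearcherInfo:
    """Known only at query time (hypothetical fields)."""
    follows_author: bool
    same_language: bool


def personal_relevance(static: StaticSignals,
                       resonance: ResonanceSignals,
                       searcher: SearcherInfo) -> float:
    """Combine the three signal families into one per-searcher score."""
    score = W_STATIC * static.text_quality
    # log1p damps large engagement counts so one viral Tweet cannot swamp other signals.
    score += W_RESONANCE * math.log1p(resonance.retweets + resonance.favorites)
    if searcher.follows_author:
        score += W_SOCIAL
    if searcher.same_language:
        score += W_LANGUAGE
    return score
```

The split matters for infrastructure: static signals can be stored in the index, resonance signals must be updatable in place, and searcher information only exists once a query arrives, so the same Tweet can score differently for two users.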

Based on these, a "personal relevance score" is computed for each tweet. "The highest-ranking, most-recent Tweets are returned to the Blender, which merges and re-ranks the results before returning them to the user," Twitter notes.
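A minimal sketch of that scatter-gather step, assuming each Earlybird partition returns its own top-k hits and the Blender keeps the global top n. The `Hit` type, the `(score, timestamp)` sort key, and both function names are hypothetical; Twitter's real Blender re-ranking may apply additional signals that this sketch omits.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    score: float    # personal relevance score computed on the Earlybird
    timestamp: int  # e.g. seconds since epoch; newer breaks ties
    tweet_id: int


def _rank_key(hit: Hit) -> tuple[float, int]:
    # "Highest-ranking, most-recent": score first, recency as tie-breaker.
    return (hit.score, hit.timestamp)


def earlybird_top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Per-partition step (hypothetical): return only the k best local hits."""
    return heapq.nlargest(k, hits, key=_rank_key)


def blender_merge(partition_results: list[list[Hit]], n: int) -> list[Hit]:
    """Merge every partition's candidates and re-rank them into one global list."""
    all_hits = (hit for result in partition_results for hit in result)
    return heapq.nlargest(n, all_hits, key=_rank_key)
```

Because each partition ships only k candidates, the Blender's work scales with the number of partitions times k rather than with the size of the index.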

Duplicate tweets are now removed. This has been a huge issue with Twitter search in the past. "To remove duplicates we use a technique based on MinHashing, where several signatures are computed per Tweet and two Tweets sharing the same set of signatures are considered duplicates," Twitter said.
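Here is a minimal MinHash sketch of that idea. Twitter only says "several" signatures per Tweet; the choice of four signatures, word-bigram shingles, the lowercase normalization, and the MD5-based seeded hash are all assumptions made for this example.

```python
import hashlib
import re

NUM_SIGNATURES = 4  # hypothetical; Twitter only says "several"


def shingles(text: str, n: int = 2) -> set[str]:
    """Normalize the Tweet and split it into overlapping word n-grams."""
    words = re.findall(r"[a-z0-9@#]+", text.lower())
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def _seeded_hash(seed: int, shingle: str) -> int:
    # One independent-looking hash function per seed, stable across processes.
    digest = hashlib.md5(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signatures(text: str, k: int = NUM_SIGNATURES) -> tuple[int, ...]:
    """Signature i is the minimum of hash function i over all shingles."""
    sh = shingles(text)
    return tuple(min(_seeded_hash(seed, s) for s in sh) for seed in range(k))


def dedupe(tweets: list[str]) -> list[str]:
    """Keep the first Tweet for each distinct set of signatures."""
    seen: set[tuple[int, ...]] = set()
    kept = []
    for tweet in tweets:
        sig = minhash_signatures(tweet)
        if sig not in seen:
            seen.add(sig)
            kept.append(tweet)
    return kept
```

Each individual signature collides with probability equal to the Jaccard similarity J of the two shingle sets, so requiring all k to match makes a collision roughly J^k likely: identical text after normalization always matches, while loosely similar Tweets rarely do.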

On personalization, Twitter writes: "Twitter is most powerful when you personalize it by choosing interesting accounts to follow, so why shouldn't your search results be more personalized too? They are now! Our ranking function accesses the social graph and uses knowledge about the relationship between the searcher and the author of a Tweet during ranking. Although the social graph is very large, we compress the meaningful part for each user into a Bloom filter, which gives us space-efficient constant-time set membership operations. As Earlybird scans candidate search results, it uses the presence of the Tweet's author in the user's social graph as a relevance signal in its ranking function."
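A minimal Bloom filter sketch of that social-graph check follows. The bit-array size, the number of hash functions, double hashing over a SHA-256 digest, the `social_signal` helper, and the account IDs are all hypothetical choices for illustration; Twitter has not said how its filters are sized or hashed.

```python
import hashlib


class BloomFilter:
    """Bit array with k hash positions per item: no false negatives, tunable false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: int):
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so it never steps by 0 mod a power of two
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: int) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: int) -> bool:
        # Cost is k bit lookups regardless of how many accounts were added.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def social_signal(author_id: int, searcher_graph: BloomFilter) -> float:
    """Hypothetical ranking feature: 1.0 if the author is (probably) in the searcher's graph."""
    return 1.0 if searcher_graph.might_contain(author_id) else 0.0


# One filter per searcher, built from the accounts they follow (hypothetical IDs).
graph = BloomFilter(num_bits=1024, num_hashes=3)
for followed_id in (101, 202, 303):
    graph.add(followed_id)
```

The trade-off suits ranking well: an occasional false positive only gives an unrelated author a small, undeserved boost, while the filter's fixed size keeps each user's slice of a very large graph cheap to hold next to the index.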

"Even users that follow few or no accounts will benefit from other personalization mechanisms," Twitter adds. "For example, we now automatically detect the searcher's preferred language and location."

Finally, Twitter has begun surfacing images and videos for searches. Right now, these are shown in the right-side pane when a search is done on twitter.com. Because they're different from text-based tweets, these queries have to be handled differently.

Looking ahead, Twitter says they plan to improve quality, scale their infrastructure, expand their indexes, and bring relevance to mobile.

[Source: Twitter Engineering]