Blekko, the slashtag search engine, has now integrated up-to-date crawl information directly into its tools for web search and SEO data.
"Not only is the data more comprehensive, but there are major improvements in real-time updates to our SEO pages with the release of now updated crawler and indexer that also powers Blekko's search engine index," Blekko writes.
"While we were upgrading our site to handle more traffic, we decided to leverage our highly customizable NoSQL database to make real time access to our crawl publicly available," the company wrote. Adding, "Our "combinator" abstraction proved critical in quickly making the right tradeoffs between crawl throughput and user request latency."
"When it comes to pages crawled, the sweet spot for blekko is a little more than 4 billion pages. To keep our crawl fresh, we update at least 100 million pages each day. As soon as our crawler, Scoutjet, crawls a webpage, users have access to information about it through blekko's SEO product. We want to enable people to see the Internet the way a search engine sees it, especially what the rest of the internet is saying about an url," explains Blekko.
"Scoutjet updates the top ranked starting pages on the Internet around every hour, while other high quality pages are checked at least every week. The continuous updates to blekko's SEO data include page content, meta data, duplicate text, and inbound link counts. Staying up-to-date is as much about forgetting the old as finding the new. So, we eliminate inbound links that are no longer live and duplicate content that is no longer available."