In a Google Webmaster blog post, Google explained the algorithm behind Google Image Search. "The images you see in our search results come from publishers of all sizes -- bloggers, media outlets, stock photo sites -- who have embedded these images in their HTML pages. Google can index image types formatted as BMP, GIF, JPEG, PNG and WebP, as well as SVG," wrote Gary Illyes, Webmaster Trends Analyst.
But how does Google know that the images are about coffee and not about tea? "When our algorithms index images, they look at the textual content on the page the image was found on to learn more about the image. We also look at the page's title and its body; we might also learn more from the image's filename, anchor text that points to it, and its 'alt text'; we may use computer vision to learn more about the image and may also use the caption provided in the Image Sitemap if that text also exists on the page," explained Illyes.
Illyes notes that webmasters should check the following to help Google index images: "we can crawl both the HTML page the image is embedded in, and the image itself; the image format should be: BMP, GIF, JPEG, PNG, WebP or SVG." In addition to the above, the following is recommended:
- "image filename is related to the image's content;
- alt attribute of the image describes the image in a human-friendly way;
- and finally, the HTML page's textual contents as well as the text near the image are related to the image."
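A site owner could spot-check part of this list on their own pages. The following is a minimal Python sketch, not Google's method: the names `ImageAudit` and `audit_images` are hypothetical, and the mapping from formats to file extensions (for example `.jpg` for JPEG) is an assumption. It flags only the mechanical issues, namely an extension outside the listed formats or a missing alt attribute. Whether a filename or alt text actually describes the image still needs a human reader.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions assumed to correspond to the formats named in the post
# (BMP, GIF, JPEG, PNG, WebP, SVG). The mapping itself is an assumption.
SUPPORTED_EXTENSIONS = {".bmp", ".gif", ".jpg", ".jpeg", ".png", ".webp", ".svg"}


class ImageAudit(HTMLParser):
    """Hypothetical helper: records (src, problems) for every <img> tag."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        problems = []
        # Look only at the path part of the URL so query strings are ignored.
        extension = PurePosixPath(urlparse(src).path).suffix.lower()
        if extension not in SUPPORTED_EXTENSIONS:
            problems.append("unsupported or missing file extension")
        if not (attributes.get("alt") or "").strip():
            problems.append("missing or empty alt attribute")
        self.findings.append((src, problems))


def audit_images(html):
    """Return a list of (src, problems) tuples for the images in `html`."""
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.findings
```

Calling `audit_images` on a page's HTML returns one entry per image; an empty problem list means the image passed both mechanical checks.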
Explaining SafeSearch, Illyes writes, "Our algorithms use a great variety of signals to decide whether an image -- or a whole page, if we're talking about Web Search -- should be filtered from the results when the user's SafeSearch filter is turned on. SafeSearch algorithms also look at simpler things such as where the image was used previously and the context in which the image was used."
One of the strongest signals, however, is self-marked adult pages.
Illyes recommends that webmasters who publish adult content should mark up their pages with one of the following meta tags:
<meta name="rating" content="adult" />
<meta name="rating" content="rta-5042-1996-1400-1577-rta">
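To confirm that a page actually carries one of these tags, a publisher could run a check along the following lines. This is a minimal Python sketch under stated assumptions: `RatingFinder` and `is_self_marked_adult` are hypothetical names, and comparing the values case-insensitively is a choice made here, not something the post specifies. It says nothing about how SafeSearch itself reads the tag.

```python
from html.parser import HTMLParser

# Content values taken from the two meta tags quoted above.
# Lowercased so the comparison is case-insensitive (an assumption of this sketch).
ADULT_RATINGS = {"adult", "rta-5042-1996-1400-1577-rta"}


class RatingFinder(HTMLParser):
    """Hypothetical helper: notes whether a page has an adult rating meta tag."""

    def __init__(self):
        super().__init__()
        self.is_marked_adult = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").strip().lower()
        if name == "rating" and content in ADULT_RATINGS:
            self.is_marked_adult = True


def is_self_marked_adult(html):
    """Return True if `html` contains one of the rating meta tags."""
    finder = RatingFinder()
    finder.feed(html)
    finder.close()
    return finder.is_marked_adult
```

The function takes the page's HTML as a string, so it can be run against a saved copy of a page or against template output before deployment.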
Webmasters who think their images or pages are mistakenly being filtered by SafeSearch can report this to Google using a form linked from the blog post.