Why are you reading this? Go outside. Do something meaningful with your life.

Thursday, January 29, 2009

Suspicious and Really Suspicious

Google's Safebrowsing Diagnostic page lists "the last time suspicious content was found on this site". But what does "suspicious" mean?

Google's automated malware scanners have been highly accurate, with an astonishingly low false-positive rate. Part of that success comes from a strict definition: to the scanners, "suspicious" actually means "has nasty malware". If the scanners aren't really sure that a site has malware, they won't add it to the malware list. That is the definition of "suspicious" that Google's Safebrowsing Diagnostic page uses - content bad enough to get a site added to the malware list.

When the scanners review a site to check whether it should be removed from the malware list, they hold it to a more stringent standard. If there's any suspicious activity at all, the site will not be removed from the malware list. Sites have often been infected with malware in multiple ways, and the scanners need to be sure that the site has been thoroughly cleaned up.

Those different definitions of "suspicious" may cause confusion when looking at Google's Safebrowsing Diagnostic page for a site that has been reviewed. The review may have found "suspicious" content that was not "suspicious" enough to have added the site to the malware list - but it is "suspicious" enough to prevent the site from being removed from the list. Google's Safebrowsing Diagnostic page won't list the date of that review scan.
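To make the two definitions concrete, here is a minimal sketch of how a high bar for listing and a low bar for staying listed interact. The scores, threshold values and function name are all invented for illustration; Google hasn't published how its scanners actually decide.

```python
# Hypothetical thresholds; the real scanners' criteria are not public.
LISTING_THRESHOLD = 0.9  # "has nasty malware": enough to add a site to the list
REVIEW_THRESHOLD = 0.1   # any suspicious activity: enough to keep a site listed


def update_listing(is_listed: bool, suspicion: float) -> bool:
    """Return whether the site is on the malware list after a scan."""
    if is_listed:
        # Review scan: the site stays listed unless it looks thoroughly clean.
        return suspicion >= REVIEW_THRESHOLD
    # Ordinary scan: the site is listed only when the scanner is confident.
    return suspicion >= LISTING_THRESHOLD
```

With these made-up numbers, a score of 0.5 would never get a clean site listed, but it would keep an already-listed site on the list. That is the situation described above.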

If you're looking for the status of a malware review, log into Google's Webmaster Tools - the same place you requested the malware review. It will show whether the review succeeded and will list URLs that were still found to be "suspicious".

Friday, January 23, 2009

Cross-site Warnings

Several browsers use data from Google's malware list to protect users. Firefox 3, Chrome and Safari all check sites that users are visiting against Google's list and warn users if they are about to visit a dangerous site. There are some small differences in implementation across browsers that can cause confusion.

All three browsers check the address of the top-level page a user is navigating to. That protects most users in most cases. But a web page can include content from another web page, and if the included content is malicious then users may be exposed. Firefox 3 checks only the top-level address, while Chrome (and Safari*) check every request against Google's malware list. This means Chrome and Safari will protect users even if malicious content from a flagged page is embedded on a non-flagged page.
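The difference between the two approaches can be sketched like this. It's a toy model, not any browser's real code: the hostname set stands in for Google's malware list (real lookups are more involved), and the function names and example domains are made up.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for Google's malware list; real lookups are more
# involved than a hostname set.
FLAGGED_HOSTS = {"malware.example"}


def is_flagged(url: str) -> bool:
    return urlparse(url).hostname in FLAGGED_HOSTS


def warns_top_level_only(page_url: str, embedded_urls: list) -> bool:
    # Only the address the user navigates to is checked.
    return is_flagged(page_url)


def warns_every_request(page_url: str, embedded_urls: list) -> bool:
    # The page and everything it pulls in are checked.
    return any(is_flagged(url) for url in [page_url, *embedded_urls])
```

For a non-flagged page that embeds a script from a flagged host, the first check stays quiet and the second one warns.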

Although that approach provides better protection for users, it may be confusing for webmasters if content on their site comes from another site. Some users (those with Chrome or Safari) will get warnings even though the webmaster's site is not blacklisted. Because the webmaster's site isn't blacklisted, they won't be able to request a malware review via Google's Webmaster Tools. Fortunately, this situation usually doesn't exist for very long. Google's scanners have already identified the embedded content as malicious but they haven't yet flagged the webmaster's site that includes the dangerous content. As they continue to crawl the internet, the scanners will quickly flag the webmaster's site.

If you're a webmaster in this situation, you'll need to examine all the content you're including from other websites. Look carefully at the warning page that browsers display since it usually includes the name of the domain that caused the problem.
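As a starting point for that examination, a rough sketch like the one below lists the other hosts a page loads content from. The class and function names are my own, and the tag list is only a guess at the usual suspects. It reads static HTML only, so anything a script injects at runtime won't show up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ExternalSourceFinder(HTMLParser):
    """Collect hostnames of content a page pulls in from other sites."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.external_hosts = set()

    def handle_starttag(self, tag, attrs):
        # Only tags that load content into the page; plain <a> links are skipped.
        if tag not in {"script", "iframe", "frame", "img", "embed"}:
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative addresses against the page's own address.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host and host != self.page_host:
                    self.external_hosts.add(host)


def external_hosts(page_url: str, html: str) -> set:
    finder = ExternalSourceFinder(page_url)
    finder.feed(html)
    return finder.external_hosts
```

Compare the hosts it reports with the domain named on the browser's warning page.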

* I can't say for certain exactly how Safari behaves because I haven't seen the code. But based on observation, Safari seems to have adopted the approach of checking every request.

Updated: Firefox 3.5 now checks every request against the blacklist, which better protects users. Firefox 3.5, Chrome and Safari all behave the same way now.