Why are you reading this? Go outside. Do something meaningful with your life.

Monday, November 3, 2008

Kick Them All Out!

On the Google Webmaster Central blog, Ben asked why sites with malware aren't removed from search results.

People often use Google to search for sites they already know about. URLs are hard to remember, but users can remember a couple of search terms and use those instead - like "barak obama" or "john mccain". If a user typed either of those queries, they'd expect the candidate's official site to rank near the top. But malware can strike any website: big or small; Democrat or Republican; corporation, government or private citizen. If Google removed the candidate's website from search results, there would be a lot of confused users. (1)

Leaving the website in the search results and placing a warning on it lets users find what they are looking for while knowing that visiting the website may be dangerous.

(1) I am not suggesting that either candidate's website has ever had malware. It's just an example.

Update: Yes, yes, I misspelled "Barack". Fortunately, Google still finds the right site: http://www.google.com/search?q=barak+obama

Sunday, November 2, 2008

Information

Webmasters often ask why Google can't give more information about malware on websites. A couple of factors come into play.

Google's automated scanners see only the end result of an infection. They have no way of knowing how or exactly when the malicious content was added to a website. There are many ways it could have happened, from SQL injection to stolen passwords. The scanners just know the bad content is there now.
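To make the "end result" point concrete, here's a deliberately naive sketch (in Python) of checking a page for one common symptom of a compromise: a hidden iframe injected into the HTML. It has nothing to do with our real scanners; the pattern and the placeholder URL are invented purely for illustration. Note what it can and can't tell you: that the iframe is there, but not how or when it got there.

    # Toy illustration only -- not Google's scanner. It flags one common
    # symptom of a compromise (a hidden iframe injected into a page) but
    # cannot tell how or when the attacker added it.
    import re
    import urllib.request

    # Invented pattern: iframes sized to zero or styled display:none.
    HIDDEN_IFRAME = re.compile(
        r'<iframe[^>]*(width="0"|height="0"|display:\s*none)[^>]*>',
        re.IGNORECASE)

    def looks_compromised(url):
        """Fetch a page and report whether it contains a hidden iframe."""
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        return bool(HIDDEN_IFRAME.search(html))

    if __name__ == "__main__":
        # example.com is a placeholder; point this at a site you control.
        print(looks_compromised("http://example.com/"))

Real malware rarely announces itself this plainly, which is part of why the scanning methods themselves stay private.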

To some extent, the success of Google's automated scanners relies on keeping the methods secret. The team has published malware papers (like Ghost in the Browser [pdf]) that include overviews of the system, but we're uncomfortable publishing much more detail because malware authors can read that information too. We've observed malware trying to hide from our automated systems, and a large part of my job is tweaking the system to adapt to new types of malware. Staying on top of what malware is doing isn't easy, and providing exact descriptions of the scanners and what they detect would make it impossible.

Earlier this year, Google released the Safe Browsing diagnostic page, which lists general information about what the automated scanners found on a website. Some webmasters have found that information helpful in cleaning up their sites. Hopefully we can find other ways to share information with webmasters without compromising the success of the scanning system. If you've got specific ideas, please let me know!
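(For reference, the diagnostic page for a particular site can be reached directly at a URL of the form http://www.google.com/safebrowsing/diagnostic?site=example.com, with example.com replaced by the domain you're curious about.)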