Thursday, August 5, 2010

Index Removal from Major Search Engines like Google, Bing and Yahoo Search

Indices…

First, to understand what an index is, consider what a search would cost without one: every query would have to scan every web document exposed on the internet to find the relevant results, which would take considerable time and computing power. To speed up retrieval and improve the relevance of results, the major search engines collect, parse and store data ahead of time, building an index of web documents that allows faster and more accurate information retrieval.
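To make the idea concrete, here is a minimal sketch of an inverted index in Python. The sample documents and the function names build_index and search are made up for illustration; real search engines use far more elaborate structures.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample documents, keyed by URL.
documents = {
    "http://example.com/a": "remove a url from the search index",
    "http://example.com/b": "robots txt controls web robots",
    "http://example.com/c": "search engines build an index of web pages",
}

index = build_index(documents)
print(sorted(search(index, "search index")))
```

The index is built once, and each query then only looks up the words it contains instead of rescanning every document.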

Search Bots…

Secondly, to understand how the search engines collect this information: every search engine uses internet “bots” or “web robots”, which are simply software applications that run automated tasks over the internet to gather data. A web site can place a file called “robots.txt” in its root folder, and well-behaved bots consult it to find out whether particular content may be crawled and indexed or not.
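As a rough illustration of how a bot might consult robots.txt before fetching a page, the sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the bot name ExampleBot and the helper is_allowed are assumptions made for this example; a real bot would download the file from the site root first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/a.html"))
```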

Control Bots for Indices…

To have specific pages or folders indexed, we can use the ‘Allow’ directive in robots.txt, which is supported by most major search engines such as Google, Bing and Yahoo. Note that the order in which the rules are evaluated can differ for search providers other than Google, so the order of the directives in the file can matter.

To allow sub-folders or pages inside a disallowed folder, place the ‘Allow’ directive before the ‘Disallow’ directive.
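The effect of directive order can be seen with a parser that applies the first rule that matches. The sketch below uses Python's standard urllib.robotparser, which to my knowledge behaves this way, as a stand-in for such a crawler; the /docs/ paths and the helper can_fetch are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt, url, user_agent="*"):
    """Parse robots_txt and report whether url may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Allow listed before Disallow: the sub-folder stays reachable.
ALLOW_FIRST = """\
User-agent: *
Allow: /docs/public/
Disallow: /docs/
"""

# Same rules in the opposite order: the broader Disallow matches first.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /docs/
Allow: /docs/public/
"""

url = "http://example.com/docs/public/page.html"
print(can_fetch(ALLOW_FIRST, url))
print(can_fetch(DISALLOW_FIRST, url))
```

A crawler that picks the most specific rule instead would allow the page in both cases, which is why placing ‘Allow’ first is the safer choice.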

Most search engines also support pattern matching in these directives, which gives finer control.

Example: Robots.txt

User-Agent: *
Allow: /allowedcontext/
Disallow: /contentNotAllowed/

Example (with pattern matching): Robots.txt

User-Agent: *
Allow: /allowedcontext*/
Disallow: /contentNotAllowed*/
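To get a feel for what such patterns match, here is a small sketch that translates a robots.txt path pattern into a regular expression. It assumes the common convention that ‘*’ matches any run of characters and a trailing ‘$’ anchors the end of the path; the helper names pattern_to_regex and matches are made up, and individual search engines may interpret patterns differently.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the URL path; everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    parts = [re.escape(part) for part in pattern.split("*")]
    regex = ".*".join(parts)
    if anchored:
        regex += "$"
    return re.compile(regex)

def matches(pattern, path):
    """Return True if the URL path matches the pattern from its start."""
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/allowedcontext*/", "/allowedcontext-v2/page.html"))
print(matches("/contentNotAllowed*/", "/contentNotAllowed2010/a.html"))
print(matches("/contentNotAllowed*/", "/other/a.html"))
```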

We need to identify the allowed and disallowed contexts up front in order to write a robots.txt file that correctly allows or blocks the indexing of our pages.

URL Index Removal…

Even once we have decided to prevent the bots from storing our content, how do we remove content that was already indexed before we implemented robots.txt? Every search engine follows its own process for removing content from its index. This post describes how to place removal requests with the major search engines: Google, Bing and Yahoo.

In order to remove indexed URLs from Bing, Yahoo or Google, we need to:

  • Authenticate oneself as the site owner
  • Submit a removal request with the Search Engine Provider

The following are the steps to place a URL removal request with Google Search, Yahoo Search and Bing Search.

Google - URL Index Removal

  • Log in to Webmaster Tools - www.google.com/webmasters/tools using a Google account
  • Add the site for which we request the Index Removal
  • Google would verify if we are the site owners
  • In order to authenticate ourselves as the Site owner, we have to perform one of the following
    • Uploading a verification file to our site
      • Download the verification HTML
      • Place it in the root folder of the site
      • Click on “Verify” and allow the site to be authenticated. This may take 24 hours
    • Adding a Meta Tag to the home page
      • Copy the Meta Tag into the <head> section of the home page
      • Click on “Verify” and allow the site to be authenticated. This may take 24 hours
  • Go to web page removals section - www.google.com/webmasters/tools/removals
  • Click on “New Removal Request” button
  • Choose “Information or image that appears in the Google search results.” and click next
  • Choose “The site owner has removed this page/image or blocked it from being indexed by using robots.txt or meta tags” and click next
  • Type the indexed URL that appears in the Google search results
  • Click on “Submit Request >>” button
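Before choosing the option stating that the page was blocked with meta tags, it can help to confirm that the page really carries a noindex robots meta tag. The following is a minimal sketch using Python's standard html.parser; the class RobotsMetaParser and the function has_noindex are names invented for this example, and the HTML is passed in as a string rather than fetched from a live site.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            self.directives.add(directive.strip().lower())

def has_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))
```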

Yahoo - URL Index Removal

  • Log in to Yahoo! Site Explorer - https://siteexplorer.search.yahoo.com/mysites using a Yahoo account
  • Add the site for which we request the Index Removal
  • Yahoo would verify if we are the site owners
  • In order to authenticate ourselves as the Site owner, we have to perform one of the following
    • Uploading a verification file to our site
      • Download the verification HTML
      • Place it in the root folder of the site
      • Click on “Ready To Authenticate” and allow the site to be authenticated. This may take 24 hours
    • Adding a Meta Tag to the home page
      • Copy the Meta Tag into the <head> section of the home page
      • Click on “Ready To Authenticate” and allow the site to be authenticated. This may take 24 hours
  • Locate the site in the Explorer results and notice the Delete URL/Path button next to each URL. Note: when you use Site Explorer to delete a URL/Path from the Yahoo! index, it deletes that URL as well as all the sub-paths listed under that URL
  • Click Delete URL/Path to go to the confirmation page. The confirmation page shows the total number of sub-path URLs that will be affected by the Delete URL/Path action and also displays a list of those URLs
  • Use the input text box to edit the URL and limit the delete action to a specific subdirectory. To remove multiple URLs at once, truncate the URL string back to the trailing ? mark. For example, to remove URLs that look like http://example.com/test/index.php?test=foo&user=john, truncate the URL back to the ? mark: http://example.com/test/index.php?
  • Click the Update button to regenerate the list of URLs that will be affected by the delete action
  • Click Yes to delete the URL and any sub-paths listed
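The truncation described in the steps above can be sketched in a few lines of Python. The function names truncate_to_query_mark and affected_urls are hypothetical, and the list of indexed URLs is invented sample data used to preview which entries share the truncated prefix.

```python
def truncate_to_query_mark(url):
    """Cut a URL back to its first '?', keeping the '?' itself."""
    base, separator, _query = url.partition("?")
    return base + separator

def affected_urls(indexed_urls, prefix):
    """List the indexed URLs that start with the given prefix."""
    return [url for url in indexed_urls if url.startswith(prefix)]

# Invented sample of indexed URLs.
indexed = [
    "http://example.com/test/index.php?test=foo&user=john",
    "http://example.com/test/index.php?test=bar&user=jane",
    "http://example.com/test/about.html",
]

prefix = truncate_to_query_mark(indexed[0])
print(prefix)  # http://example.com/test/index.php?
print(affected_urls(indexed, prefix))
```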

Bing - URL Index Removal

  • Go to http://www.bing.com/webmaster/WebmasterManageSitesPage.aspx and log in using an MSN account
  • Add the site for which we request the Index Removal
  • Bing would verify if we are the site owners
  • In order to authenticate ourselves as the Site owner, we have to perform one of the following
    • Uploading a verification file to our site
      • Download the verification XML
      • Place it in the root folder of the site
      • Click on site name from the list and the site will be authenticated.
    • Adding a Meta Tag to the home page
      • Copy the Meta Tag into the <head> section of the home page
      • Click on site name from the list and the site will be authenticated
  • Launch the Live Search Support form. Go to https://support.discoverbing.com/default.aspx?mkt=en-us&productkey=bingcontentremoval&brand=&&ct=eformts and begin filling it out
  • Identify from where in Live Search you want the URL removed. To quickly remove a URL, select Content Removal Request from the form’s drop-down list, then select one of the resulting options for removal:
    • Remove my content. If you want the URL removed from the SERP, select this option. This is a permanent removal. Should you want this URL indexed again in the future, you will need to fill out a Content Inclusion Request from the same support form
    • Cache removal. If you just want the cached page removed, use this option. Note that this will not remove the URL from the Bing index
  • Complete the rest of the form: enter the URL or URLs to be removed and the query used to find them, and then click Submit