Today’s Korea Herald has an interesting article about Korean websites. Believing that it’s safer to keep foreign search engines from indexing all the content on their sites, many Korean websites use robots.txt files to block search engine bots from indexing some (or all) of their pages. A search engine bot reads a site’s robots.txt file before it crawls the site, and the file lists the paths that the bot should not fetch.
The first problem is that many websites, such as tourism, university, and government sites, end up blocking potentially good traffic. The second is that some of these sites believe robots.txt secures sensitive information, when in fact it only asks well-behaved bots to stay away; the file itself is publicly readable, so sensitive information should be password protected instead. And finally, if the robots.txt file is not written correctly, a site may not be indexed at all and could lose potential traffic.
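To illustrate how small the difference is between blocking one directory and blocking an entire site, here is a minimal sketch using Python's standard `urllib.robotparser` module. The two robots.txt bodies, the `allowed` helper, and the `example.com` URLs are made up for illustration; they are not taken from the article or from any real site.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical file: the single slash blocks every path on the site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# Hypothetical file: only paths under /private/ are blocked.
BLOCK_ONE_DIR = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    page = "http://example.com/tour/index.html"
    print(allowed(BLOCK_EVERYTHING, page))   # False
    print(allowed(BLOCK_ONE_DIR, page))      # True
    print(allowed(BLOCK_ONE_DIR, "http://example.com/private/a.html"))  # False
```

A check like this only tells you what a compliant bot would do. It does nothing to stop a bot or a person who ignores the file.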
Here’s an example, the robots.txt file at http://cafe.naver.com/robots.txt:
User-Agent: *
Disallow: /CafeRankingSectionList.nhn
Disallow: /SectionTagList.nhn
Disallow: /*ArticleList.nhn
Disallow: /*BestArticleList.nhn
Disallow: /CafeHistoryView.nhn
Disallow: /*ArticleRead.nhn
Disallow: /SectionSearch
Disallow: /CafeScrapContent.nhn
Disallow: /*CafeMemberNetworkView.nhn
Disallow: /*MemoList.nhn
Disallow: /ArticleSearchList.nhn
Disallow: /CafeRankingList.nhn
Disallow: /RandomPowerCafeList.nhn
Disallow: /GroupPowerCafeList.nhn
Disallow: /SectionHome.nhn
Disallow: /CombinationSearch.nhn
Disallow: /CafeSearch.nhn
Disallow: /ArticleSearch.nhn
Disallow: /ManagerSearch.nhn
Disallow: /SocialAppList.nhn
Disallow: /SocialAppMain.nhn
Disallow: /CafeMemberNetworkSocialAppActivityList.nhn
Disallow: /MyCafeIntro.nhn?clubid=11038280
Disallow: /PageRead.nhn
Disallow: /PageDiff.nhn
Disallow: /OpenGraphArticle.nhn
Disallow: /MailContent.nhn
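Several of the rules above contain a `*` wildcard, for example `/*ArticleList.nhn`. The sketch below shows one way such rules can be matched against a URL path, assuming the wildcard convention used by the major search engines: `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a prefix match. The helper names and the `/somecafe/` path are hypothetical, and the sketch ignores `Allow` lines and per-bot sections, so it is not a full robots.txt parser.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate one Disallow path into a regex (assumed wildcard semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal pieces and join them with '.*' where the rule had '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rules: Iterable[str]) -> bool:
    """True if any rule matches from the start of `path` (prefix match)."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# Three rules copied from the listing above.
RULES = [
    "/*ArticleList.nhn",
    "/SectionSearch",
    "/MyCafeIntro.nhn?clubid=11038280",
]

if __name__ == "__main__":
    for path in [
        "/ArticleList.nhn",
        "/somecafe/ArticleList.nhn",      # hypothetical path
        "/MyCafeIntro.nhn?clubid=1",
        "/index.html",
    ]:
        print(path, is_disallowed(path, RULES))
```

Note that Python's built-in `urllib.robotparser` treats `*` in a path literally, which is why this sketch does its own matching.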