{"id":231,"date":"2018-06-14T19:05:00","date_gmt":"2018-06-14T19:05:00","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/bus3060\/chapter\/ch14-2\/"},"modified":"2026-02-17T19:23:57","modified_gmt":"2026-02-17T19:23:57","slug":"ch14-2","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/bus3060\/chapter\/ch14-2\/","title":{"raw":"14.2 Understanding Search","rendered":"14.2 Understanding Search"},"content":{"raw":"<div id=\"slug-14-2-understanding-search\" class=\"chapter standard\">\r\n<div class=\"ugc chapter-ugc\">\r\n<div id=\"fwk-38086-ch08_s02_n01\" class=\"bcc-box bcc-highlight\">\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><span style=\"font-family: 'Cormorant Garamond', serif; font-size: 1em; font-style: normal; font-weight: bold;\">Learning Objectives<\/span><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"fwk-38086-ch08_s02_p01\" class=\"nonindent para\">After studying this section you should be able to do the following:<\/p>\r\n\r\n<ol id=\"fwk-38086-ch08_s02_l01\" class=\"orderedlist\">\r\n \t<li>Understand the mechanics of search, including how Google indexes the Web and ranks its organic search results.<\/li>\r\n \t<li>Examine the infrastructure that powers Google and how its scale and complexity offer key competitive advantages.<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\n<\/div>\r\n<p id=\"fwk-38086-ch08_s02_p02\" class=\"nonindent para editable block\">Before diving into how the firm makes money, let\u2019s first understand how Google\u2019s core service, search, works.<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p03\" class=\"indent para editable block\">Perform a search (or <span class=\"margin_term\"><a class=\"glossterm\">query<\/a><\/span>) on Google or another search engine, and the results you\u2019ll see are referred to by industry professionals as <span class=\"margin_term\"><a class=\"glossterm\">organic 
or natural search<\/a><\/span>. Search engines use different algorithms for determining the order of organic search results, but at Google the method is called <span class=\"margin_term\"><a class=\"glossterm\">PageRank<\/a><\/span> (a bit of a play on words, it ranks Web pages, and was initially developed by Google cofounder Larry Page). Google does not accept money for placement of links in organic search results. Instead, PageRank results are a kind of popularity contest. Web pages that have more pages <em class=\"emphasis\">linking to them<\/em> are ranked higher.<\/p>\r\n\r\n<div id=\"fwk-38086-ch08_s02_f01\" style=\"text-align: center; font-size: .8em; max-width: 497px;\">\r\n<p class=\"nonindent title\"><span class=\"title-prefix\">Figure 14.4<\/span><\/p>\r\n<p class=\"indent\"><a>\r\n<img style=\"max-width: 497px;\" src=\"https:\/\/pressbooks.ccconline.org\/wp-content\/uploads\/sites\/324\/2018\/06\/096d08b8103c60faaf7291f79c8ca725.jpg\" alt=\"The query for \u201cToyota Prius\u201d triggers organic search results, flanked top and right by sponsored link advertisements. Screen shot of said search.\" \/>\r\n<\/a><\/p>\r\n<p class=\"indent para\">The query for \u201cToyota Prius\u201d triggers organic search results, flanked top and right by sponsored link advertisements.<\/p>\r\n\r\n<\/div>\r\n<p id=\"fwk-38086-ch08_s02_p04\" class=\"indent para editable block\">The process of improving a page\u2019s organic search results is often referred to as <span class=\"margin_term\"><a class=\"glossterm\">search engine optimization (SEO)<\/a><\/span>. SEO has become a critical function for many marketing organizations since if a firm\u2019s pages aren\u2019t near the top of search results, customers may never discover its site.<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p05\" class=\"indent para editable block\">Google is a bit vague about the specifics of precisely how PageRank has been refined, in part because many have tried to game the system. 
In addition to in-bound links, Google\u2019s organic search results also consider some two hundred other signals, and the firm\u2019s search quality team is relentlessly analyzing user behavior for clues on how to tweak the system to improve accuracy (Levy, 2010). The less scrupulous have tried creating a series of bogus Web sites, all linking back to the pages they\u2019re trying to promote (this is called <span class=\"margin_term\"><a class=\"glossterm\">link fraud<\/a><\/span>, and Google actively works to uncover and shut down such efforts). We do know that links from some Web sites carry more weight than others. For example, links from Web sites that Google deems as \u201cinfluential,\u201d and links from most \u201c.edu\u201d Web sites, have greater weight in PageRank calculations than links from run-of-the-mill \u201c.com\u201d sites.<\/p>\r\n\r\n<div id=\"fwk-38086-ch08_s02_n02\" class=\"bcc-box bcc-highlight\">\r\n<div class=\"textbox shaded\">\r\n<h4 class=\"title\">Spiders and Bots and Crawlers\u2014Oh My!<\/h4>\r\n<p id=\"fwk-38086-ch08_s02_p06\" class=\"nonindent para\">When performing a search via Google or another search engine, you\u2019re not actually searching the Web. What really happens is that the major search engines make what amounts to a <em class=\"emphasis\">copy<\/em> of the Web, storing and indexing the text of online documents on their own computers. Google\u2019s index considers over one trillion URLs (Wright, 2009). The upper right-hand corner of a Google query shows you just how fast a search can take place (in the example above, rankings from over eight million results containing the term \u201cToyota Prius\u201d were delivered in less than two tenths of a second).<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p07\" class=\"indent para\">To create these massive indexes, search firms use software to crawl the Web and uncover as much information as they can find. 
This software is referred to by several different names\u2014<span class=\"margin_term\"><a class=\"glossterm\">software robots, spiders, Web crawlers<\/a><\/span>\u2014but they all pretty much work the same way. In order to make its Web sites visible, every online firm provides a list of all of the public, named servers on its network, known as <span class=\"margin_term\"><a class=\"glossterm\">domain name service (DNS)<\/a><\/span> listings. For example, Yahoo! has different servers that can be found at http:\/\/www.yahoo.com, sports.yahoo.com, weather.yahoo.com, finance.yahoo.com, and so on. Spiders start at the first page on every public server and follow every available link, traversing a Web site until all pages are uncovered.<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p08\" class=\"indent para\">Google will crawl frequently updated sites, like those run by news organizations, as often as several times an hour. Rarely updated, less popular sites might only be reindexed every few days. The method used to crawl the Web also means that if a Web page isn\u2019t the first page on a public server, or isn\u2019t linked to from another public page, then it\u2019ll never be found<sup>1<\/sup>. Note that each search engine also offers a page where you can submit your Web site for indexing.<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p09\" class=\"indent para\">While search engines show you what they\u2019ve found on their <em class=\"emphasis\">copy<\/em> of the Web\u2019s contents, clicking a search result will direct you to the actual Web site, not the copy. But sometimes you\u2019ll click a result only to find that the Web site doesn\u2019t match what the search engine found. This happens if a Web site was updated before your search engine had a chance to reindex the changes. In most cases you can still pull up the search engine\u2019s copy of the page. 
Just click the \u201cCached\u201d link below the result (the term <span class=\"margin_term\"><a class=\"glossterm\">cache<\/a><\/span> refers to a temporary storage space used to speed computing tasks).<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p10\" class=\"indent para\">But what if you want the content on your Web site to remain off limits to search engine indexing and caching? Organizations have created a set of standards to stop the spider crawl, and all commercial search engines have agreed to respect these standards. One way is to embed an invisible line of <em class=\"emphasis\">HTML code<\/em> in a Web page that tells software robots to stop indexing the page, stop following links on the page, or stop offering old page archives in a cache. Users don\u2019t see this code, but commercial Web crawlers do. For those familiar with HTML code (the language used to describe a Web site), the command to stop Web crawlers from indexing a page, following links, and listing archives of cached pages looks like this:<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p11\" class=\"indent para\">&lt;META NAME=\"ROBOTS\" CONTENT=\"NOINDEX, NOFOLLOW, NOARCHIVE\"&gt;<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p12\" class=\"indent para\">There are other techniques to keep the spiders out, too. Web site administrators can add a special file (called robots.txt) that provides similar instructions on how indexing software should treat the Web site. 
And a lot of content lies inside the \u201c<span class=\"margin_term\"><a class=\"glossterm\">dark Web<\/a><\/span>,\u201d either behind corporate firewalls or inaccessible to those without a user account\u2014think of private Facebook updates no one can see unless they\u2019re your friend\u2014all of that is out of Google\u2019s reach.<\/p>\r\n\r\n<\/div>\r\n&nbsp;\r\n\r\n<\/div>\r\n<div id=\"fwk-38086-ch08_s02_n03\" class=\"bcc-box bcc-highlight\">\r\n<div class=\"textbox shaded\">\r\n<div id=\"slug-14-2-understanding-search\" class=\"chapter standard\">\r\n<div class=\"ugc chapter-ugc\">\r\n<div id=\"fwk-38086-ch08_s02_n03\" class=\"bcc-box bcc-highlight\">\r\n<h4 class=\"title\">What\u2019s It Take to Run This Thing?<\/h4>\r\n<p id=\"fwk-38086-ch08_s02_p13\" class=\"nonindent para\">Sergey Brin and Larry Page started Google with just four scavenged computers (Liedtke, 2008). But in a decade, the infrastructure used to power the search sovereign has ballooned to the point where it is now the largest of its kind in the world (Carr, 2006). Google doesn\u2019t disclose the number of servers it uses, but by some estimates, it runs over 1.4 million servers in over a dozen so-called <span class=\"margin_term\"><a class=\"glossterm\">server farms<\/a><\/span> worldwide (Katz, 2009). In 2008, the firm spent $2.18 billion on capital expenditures, with data centers, servers, and networking equipment eating up the bulk of this cost<sup>2<\/sup>. Building massive server farms to index the ever-growing Web is now the cost of admission for any firm wanting to compete in the search market. 
This is clearly no longer a game for two graduate students working out of a garage.<\/p>\r\n<p class=\"indent simpara\">Google\u2019s Container Data Center<\/p>\r\n\r\n<\/div>\r\n<p class=\"nonindent para\">Take a virtual tour of one of Google\u2019s data centers.<\/p>\r\n\r\n<\/div>\r\n<p id=\"fwk-38086-ch08_s02_p14\" class=\"indent para\">The size of this investment not only creates a barrier to entry, it influences industry profitability, with market-leader Google enjoying huge economies of scale. Firms may spend the same amount to build server farms, but if Google has nearly 70 percent of this market (and growing) while Microsoft\u2019s search draws less than one-seventh the traffic, which do you think enjoys the better return on investment?<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p15\" class=\"indent para\">The hardware components that power Google aren\u2019t particularly special. In most cases the firm uses the kind of Intel or AMD processors, low-end hard drives, and RAM chips that you\u2019d find in a desktop PC. These components are housed in rack-mounted servers about 3.5 inches thick, with each server containing two processors, eight memory slots, and two hard drives.<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p16\" class=\"indent para\">In some cases, Google mounts racks of these servers inside standard-sized shipping containers, each with as many as 1,160 servers per box (Shankland, 2009). A given data center may have dozens of these server-filled containers all linked together. Redundancy is the name of the game. Google assumes individual components will regularly fail, but no single failure should interrupt the firm\u2019s operations (making the setup what geeks call <span class=\"margin_term\"><a class=\"glossterm\">fault-tolerant<\/a><\/span>). 
If something breaks, a technician can easily swap it out with a replacement.<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p17\" class=\"indent para\">Each server farm layout has also been carefully designed with an emphasis on lowering power consumption and cooling requirements. And the firm\u2019s custom software (much of it built upon open source products) allows all this equipment to operate as the world\u2019s largest grid computer.<\/p>\r\n<p id=\"fwk-38086-ch08_s02_p18\" class=\"indent para\">Web search is a task particularly well suited for the massively parallel architecture used by Google and its rivals. For an analogy of how this works, imagine that, working alone, you need to find a particular phrase in a hundred-page document (that\u2019s a one-server effort). Next, imagine that you can distribute the task across five thousand people, giving each of them a separate sentence to scan (that\u2019s the multi-server grid). This difference gives you a sense of how search firms use massive numbers of servers and the divide-and-conquer approach of grid computing to quickly find the needles you\u2019re searching for within the Web\u2019s haystack. 
(For more on grid computing, see <strong>Chapter 5 \u201cMoore\u2019s Law: Fast, Cheap Computing and What It Means for the Manager<\/strong>\u201d, and for more on server farms, see <strong>Chapter 10 \u201cSoftware in Flux: Partly Cloudy and Sometimes Free<\/strong>\u201d.)<\/p>\r\n\r\n<div style=\"text-align: center; font-size: .8em; max-width: 640px;\">\r\n<div id=\"fwk-38086-ch08_s02_f02\" class=\"figure large medium-height\">\r\n<p class=\"nonindent title\"><span class=\"title-prefix\">Figure 14.5<\/span><\/p>\r\n<p class=\"indent\"><a>\r\n<img class=\"aligncenter size-full wp-image-229\" src=\"https:\/\/pressbooks.ccconline.org\/wp-content\/uploads\/sites\/324\/2026\/01\/14.2.0.jpg\" alt=\"The Google Search Appliance\" width=\"640\" height=\"480\" \/>\r\n<\/a><\/p>\r\n<p class=\"indent para\">The Google Search Appliance is a hardware product that firms can purchase in order to run Google search technology within the privacy and security of an organization\u2019s firewall.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"fwk-38086-ch08_s02_p19\" class=\"indent para\">Google will even sell you a bit of its technology so that you can run your own little Google in-house without sharing documents with the rest of the world. Google\u2019s line of search appliances are rack-mounted servers that can index documents within a corporation\u2019s Web site, even specifying password and security access on a per-document basis. 
Selling hardware isn\u2019t a large business for Google, and other vendors offer similar solutions, but search appliances can be vital tools for law firms, investment banks, and other document-rich organizations.<\/p>\r\n\r\n<\/div>\r\n<div id=\"fwk-38086-ch08_s02_n04\" class=\"bcc-box bcc-highlight\">\r\n<h4 class=\"title\"><\/h4>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n<div class=\"textbox shaded\">\r\n<div id=\"slug-14-2-understanding-search\" class=\"chapter standard\">\r\n<div class=\"ugc chapter-ugc\">\r\n<div id=\"fwk-38086-ch08_s02_n03\" class=\"bcc-box bcc-highlight\">\r\n<h4 class=\"title\">Trendspotting with Google<\/h4>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"fwk-38086-ch08_s02_n04\" class=\"bcc-box bcc-highlight\">\r\n<p id=\"fwk-38086-ch08_s02_p20\" class=\"nonindent para\">Google not only gives you search results, it lets you see aggregate trends in what its users are searching for, and this can yield powerful insights. For example, by tracking search trends for flu symptoms, Google\u2019s Flu Trends Web site can pinpoint outbreaks one to two weeks faster than the Centers for Disease Control and Prevention (Bruce, 2009). Want to go beyond the flu? Google\u2019s Trends and Insights for Search services allow anyone to explore search trends, breaking out the analysis by region, category (image, news, product), date, and other criteria. 
Savvy managers can leverage these and similar tools for competitive analysis, comparing a firm, its brands, and its rivals.<\/p>\r\n\r\n<div style=\"text-align: center; font-size: .8em; max-width: 497px;\">\r\n<div id=\"fwk-38086-ch08_s02_f03\" class=\"figure large\">\r\n<p class=\"nonindent title\"><span class=\"title-prefix\">Figure 14.6<\/span><\/p>\r\n<p class=\"indent\"><a>\r\n<img style=\"max-width: 497px;\" src=\"https:\/\/pressbooks.ccconline.org\/wp-content\/uploads\/sites\/324\/2026\/01\/c7af9e9668195c53e66071e17fbdf87e.jpg\" alt=\"Google Insights for Search can be a useful tool for competitive analysis and trend discovery. The chart above shows a comparison (over a twelve-month period, and geographically) of search interest in the terms Wii, Playstation, and Xbox.\" \/>\r\n<\/a><\/p>\r\n<p class=\"indent para\">Google Insights for Search can be a useful tool for competitive analysis and trend discovery. The chart above shows a comparison (over a twelve-month period, and geographically) of search interest in the terms Wii, Playstation, and Xbox.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"fwk-38086-ch08_s02_n05\" class=\"bcc-box bcc-success\">\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><span style=\"font-family: 'Cormorant Garamond', serif; font-size: 1em; font-style: normal; font-weight: bold;\">Key Takeaways<\/span><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul id=\"fwk-38086-ch08_s02_l02\" class=\"itemizedlist\">\r\n \t<li>Ranked search results are often referred to as organic or natural search. PageRank is Google\u2019s algorithm for ranking search results. 
PageRank orders organic search results based largely on the number of Web sites linking to them, and the \u201cweight\u201d of each page as measured by its \u201cinfluence.\u201d<\/li>\r\n \t<li>Search engine optimization (SEO) is the process of using natural or organic search to increase a Web site\u2019s traffic volume and visitor quality. The scope and influence of search have made SEO an increasingly vital marketing function.<\/li>\r\n \t<li>Users don\u2019t really search the Web; they search an archived copy built by crawling and indexing discoverable documents.<\/li>\r\n \t<li>Google operates from a massive network of server farms containing hundreds of thousands of servers built from standard, off-the-shelf items. The cost of the operation is a significant barrier to entry for competitors. Google\u2019s share of search suggests the firm can realize economies of scale over rivals required to make similar investments while delivering fewer results (and hence ads).<\/li>\r\n \t<li>Web site owners can hide pages from popular search engine Web crawlers using a number of methods, including HTML tags, a robots.txt file, or ensuring that pages aren\u2019t linked to from other pages and haven\u2019t been submitted to search engines for indexing.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\n<\/div>\r\n<div id=\"fwk-38086-ch08_s02_n06\" class=\"bcc-box bcc-info\">\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><span style=\"font-family: 'Cormorant Garamond', serif; font-size: 1em; font-style: normal; font-weight: bold;\">Questions and Exercises<\/span><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ol id=\"fwk-38086-ch08_s02_l03\" class=\"orderedlist\">\r\n \t<li>How do search engines discover pages on the Internet? What kind of capital commitment is necessary to go about doing this? 
How does this impact competitive dynamics in the industry?<\/li>\r\n \t<li>How does Google rank search results? Investigate and list some methods that an organization might use to improve its rank in Google\u2019s organic search results. Are there techniques Google might not approve of? What risk does a firm run if Google or another search firm determines that it has used unscrupulous SEO techniques to try to unfairly influence ranking algorithms?<\/li>\r\n \t<li>Sometimes Web sites returned by major search engines don\u2019t contain the words or phrases that initially brought you to the site. Why might this happen?<\/li>\r\n \t<li>What\u2019s a cache? What other products or services have a cache?<\/li>\r\n \t<li>What can be done if you want the content on your Web site to remain off limits to search engine indexing and caching?<\/li>\r\n \t<li>What is a \u201csearch appliance?\u201d Why might an organization choose such a product?<\/li>\r\n \t<li>Become a better searcher: Look at the advanced options for your favorite search engine. Are there options you hadn\u2019t used previously? Be prepared to share what you learn during class discussion.<\/li>\r\n \t<li>Visit Google Trends and Google Insights for Search. Explore the tool as if you were comparing a firm with its competitors. What sorts of useful insights can you uncover? 
How might businesses use these tools?<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\n<\/div>\r\n<p class=\"indent\"><sup>1<\/sup>Most Web sites do have a link where you can submit a Web site for indexing, and doing so can help promote the discovery of your content.<\/p>\r\n<p class=\"indent\"><sup>2<\/sup>Google, \u201cGoogle Announces Fourth Quarter and Fiscal Year 2008 Results,\u201d press release, January 22, 2009.<\/p>","rendered":"<div id=\"slug-14-2-understanding-search\" class=\"chapter standard\">\n<div class=\"ugc chapter-ugc\">\n<div id=\"fwk-38086-ch08_s02_n01\" class=\"bcc-box bcc-highlight\">\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><span style=\"font-family: 'Cormorant Garamond', serif; font-size: 1em; font-style: normal; font-weight: bold;\">Learning Objectives<\/span><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"fwk-38086-ch08_s02_p01\" class=\"nonindent para\">After studying this section you should be able to do the following:<\/p>\n<ol id=\"fwk-38086-ch08_s02_l01\" class=\"orderedlist\">\n<li>Understand the mechanics of search, including how Google indexes the Web and ranks its organic search results.<\/li>\n<li>Examine the infrastructure that powers Google and how its scale and complexity offer key competitive advantages.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n<p id=\"fwk-38086-ch08_s02_p02\" class=\"nonindent para editable block\">Before diving into how the firm makes money, let\u2019s first understand how Google\u2019s core service, search, works.<\/p>\n<p id=\"fwk-38086-ch08_s02_p03\" class=\"indent para editable block\">Perform a search (or <span class=\"margin_term\"><a class=\"glossterm\">query<\/a><\/span>) on Google or another search engine, and the results you\u2019ll see are referred to by industry professionals as <span class=\"margin_term\"><a class=\"glossterm\">organic or natural search<\/a><\/span>. 
Search engines use different algorithms for determining the order of organic search results, but at Google the method is called <span class=\"margin_term\"><a class=\"glossterm\">PageRank<\/a><\/span> (a bit of a play on words, it ranks Web pages, and was initially developed by Google cofounder Larry Page). Google does not accept money for placement of links in organic search results. Instead, PageRank results are a kind of popularity contest. Web pages that have more pages <em class=\"emphasis\">linking to them<\/em> are ranked higher.<\/p>\n<div id=\"fwk-38086-ch08_s02_f01\" style=\"text-align: center; font-size: .8em; max-width: 497px;\">\n<p class=\"nonindent title\"><span class=\"title-prefix\">Figure 14.4<\/span><\/p>\n<p class=\"indent\"><a><br \/>\n<img decoding=\"async\" style=\"max-width: 497px;\" src=\"https:\/\/pressbooks.ccconline.org\/wp-content\/uploads\/sites\/324\/2018\/06\/096d08b8103c60faaf7291f79c8ca725.jpg\" alt=\"The query for \u201cToyota Prius\u201d triggers organic search results, flanked top and right by sponsored link advertisements. Screen shot of said search.\" \/><br \/>\n<\/a><\/p>\n<p class=\"indent para\">The query for \u201cToyota Prius\u201d triggers organic search results, flanked top and right by sponsored link advertisements.<\/p>\n<\/div>\n<p id=\"fwk-38086-ch08_s02_p04\" class=\"indent para editable block\">The process of improving a page\u2019s organic search results is often referred to as <span class=\"margin_term\"><a class=\"glossterm\">search engine optimization (SEO)<\/a><\/span>. SEO has become a critical function for many marketing organizations since if a firm\u2019s pages aren\u2019t near the top of search results, customers may never discover its site.<\/p>\n<p id=\"fwk-38086-ch08_s02_p05\" class=\"indent para editable block\">Google is a bit vague about the specifics of precisely how PageRank has been refined, in part because many have tried to game the system. 
In addition to in-bound links, Google\u2019s organic search results also consider some two hundred other signals, and the firm\u2019s search quality team is relentlessly analyzing user behavior for clues on how to tweak the system to improve accuracy (Levy, 2010). The less scrupulous have tried creating a series of bogus Web sites, all linking back to the pages they\u2019re trying to promote (this is called <span class=\"margin_term\"><a class=\"glossterm\">link fraud<\/a><\/span>, and Google actively works to uncover and shut down such efforts). We do know that links from some Web sites carry more weight than others. For example, links from Web sites that Google deems as \u201cinfluential,\u201d and links from most \u201c.edu\u201d Web sites, have greater weight in PageRank calculations than links from run-of-the-mill \u201c.com\u201d sites.<\/p>\n<div id=\"fwk-38086-ch08_s02_n02\" class=\"bcc-box bcc-highlight\">\n<div class=\"textbox shaded\">\n<h4 class=\"title\">Spiders and Bots and Crawlers\u2014Oh My!<\/h4>\n<p id=\"fwk-38086-ch08_s02_p06\" class=\"nonindent para\">When performing a search via Google or another search engine, you\u2019re not actually searching the Web. What really happens is that the major search engines make what amounts to a <em class=\"emphasis\">copy<\/em> of the Web, storing and indexing the text of online documents on their own computers. Google\u2019s index considers over one trillion URLs (Wright, 2009). The upper right-hand corner of a Google query shows you just how fast a search can take place (in the example above, rankings from over eight million results containing the term \u201cToyota Prius\u201d were delivered in less than two tenths of a second).<\/p>\n<p id=\"fwk-38086-ch08_s02_p07\" class=\"indent para\">To create these massive indexes, search firms use software to crawl the Web and uncover as much information as they can find. 
This software is referred to by several different names\u2014<span class=\"margin_term\"><a class=\"glossterm\">software robots, spiders, Web crawlers<\/a><\/span>\u2014but they all pretty much work the same way. In order to make its Web sites visible, every online firm provides a list of all of the public, named servers on its network, known as <span class=\"margin_term\"><a class=\"glossterm\">domain name service (DNS)<\/a><\/span> listings. For example, Yahoo! has different servers that can be found at http:\/\/www.yahoo.com, sports.yahoo.com, weather.yahoo.com, finance.yahoo.com, and so on. Spiders start at the first page on every public server and follow every available link, traversing a Web site until all pages are uncovered.<\/p>\n<p id=\"fwk-38086-ch08_s02_p08\" class=\"indent para\">Google will crawl frequently updated sites, like those run by news organizations, as often as several times an hour. Rarely updated, less popular sites might only be reindexed every few days. The method used to crawl the Web also means that if a Web page isn\u2019t the first page on a public server, or isn\u2019t linked to from another public page, then it\u2019ll never be found<sup>1<\/sup>. Note that each search engine also offers a page where you can submit your Web site for indexing.<\/p>\n<p id=\"fwk-38086-ch08_s02_p09\" class=\"indent para\">While search engines show you what they\u2019ve found on their <em class=\"emphasis\">copy<\/em> of the Web\u2019s contents, clicking a search result will direct you to the actual Web site, not the copy. But sometimes you\u2019ll click a result only to find that the Web site doesn\u2019t match what the search engine found. This happens if a Web site was updated before your search engine had a chance to reindex the changes. In most cases you can still pull up the search engine\u2019s copy of the page. 
Just click the \u201cCached\u201d link below the result (the term <span class=\"margin_term\"><a class=\"glossterm\">cache<\/a><\/span> refers to a temporary storage space used to speed computing tasks).<\/p>\n<p id=\"fwk-38086-ch08_s02_p10\" class=\"indent para\">But what if you want the content on your Web site to remain off limits to search engine indexing and caching? Organizations have created a set of standards to stop the spider crawl, and all commercial search engines have agreed to respect these standards. One way is to embed an invisible line of <em class=\"emphasis\">HTML code<\/em> in a Web page that tells software robots to stop indexing the page, stop following links on the page, or stop offering old page archives in a cache. Users don\u2019t see this code, but commercial Web crawlers do. For those familiar with HTML code (the language used to describe a Web site), the command to stop Web crawlers from indexing a page, following links, and listing archives of cached pages looks like this:<\/p>\n<p id=\"fwk-38086-ch08_s02_p11\" class=\"indent para\">&lt;META NAME=\"ROBOTS\" CONTENT=\"NOINDEX, NOFOLLOW, NOARCHIVE\"&gt;<\/p>\n<p id=\"fwk-38086-ch08_s02_p12\" class=\"indent para\">There are other techniques to keep the spiders out, too. Web site administrators can add a special file (called robots.txt) that provides similar instructions on how indexing software should treat the Web site. 
And a lot of content lies inside the \u201c<span class=\"margin_term\"><a class=\"glossterm\">dark Web<\/a><\/span>,\u201d either behind corporate firewalls or inaccessible to those without a user account\u2014think of private Facebook updates no one can see unless they\u2019re your friend\u2014all of that is out of Google\u2019s reach.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n<div id=\"fwk-38086-ch08_s02_n03\" class=\"bcc-box bcc-highlight\">\n<div class=\"textbox shaded\">\n<div class=\"chapter standard\">\n<div class=\"ugc chapter-ugc\">\n<div class=\"bcc-box bcc-highlight\">\n<h4 class=\"title\">What\u2019s It Take to Run This Thing?<\/h4>\n<p id=\"fwk-38086-ch08_s02_p13\" class=\"nonindent para\">Sergey Brin and Larry Page started Google with just four scavenged computers (Liedtke, 2008). But in a decade, the infrastructure used to power the search sovereign has ballooned to the point where it is now the largest of its kind in the world (Carr, 2006). Google doesn\u2019t disclose the number of servers it uses, but by some estimates, it runs over 1.4 million servers in over a dozen so-called <span class=\"margin_term\"><a class=\"glossterm\">server farms<\/a><\/span> worldwide (Katz, 2009). In 2008, the firm spent $2.18 billion on capital expenditures, with data centers, servers, and networking equipment eating up the bulk of this cost<sup>2<\/sup>. Building massive server farms to index the ever-growing Web is now the cost of admission for any firm wanting to compete in the search market. 
This is clearly no longer a game for two graduate students working out of a garage.<\/p>\n<p class=\"indent simpara\">Google\u2019s Container Data Center<\/p>\n<\/div>\n<p class=\"nonindent para\">Take a virtual tour of one of Google\u2019s data centers.<\/p>\n<\/div>\n<p id=\"fwk-38086-ch08_s02_p14\" class=\"indent para\">The size of this investment not only creates a barrier to entry, it influences industry profitability, with market-leader Google enjoying huge economies of scale. Firms may spend the same amount to build server farms, but if Google has nearly 70 percent of this market (and growing) while Microsoft\u2019s search draws less than one-seventh the traffic, which do you think enjoys the better return on investment?<\/p>\n<p id=\"fwk-38086-ch08_s02_p15\" class=\"indent para\">The hardware components that power Google aren\u2019t particularly special. In most cases the firm uses the kind of Intel or AMD processors, low-end hard drives, and RAM chips that you\u2019d find in a desktop PC. These components are housed in rack-mounted servers about 3.5 inches thick, with each server containing two processors, eight memory slots, and two hard drives.<\/p>\n<p id=\"fwk-38086-ch08_s02_p16\" class=\"indent para\">In some cases, Google mounts racks of these servers inside standard-sized shipping containers, each with as many as 1,160 servers per box (Shankland, 2009). A given data center may have dozens of these server-filled containers all linked together. Redundancy is the name of the game. Google assumes individual components will regularly fail, but no single failure should interrupt the firm\u2019s operations (making the setup what geeks call <span class=\"margin_term\"><a class=\"glossterm\">fault-tolerant<\/a><\/span>). 
If something breaks, a technician can easily swap it out with a replacement.<\/p>\n<p id=\"fwk-38086-ch08_s02_p17\" class=\"indent para\">Each server farm layout has also been carefully designed with an emphasis on lowering power consumption and cooling requirements. And the firm\u2019s custom software (much of it built upon open source products) allows all this equipment to operate as the world\u2019s largest grid computer.<\/p>\n<p id=\"fwk-38086-ch08_s02_p18\" class=\"indent para\">Web search is a task particularly well suited for the massively parallel architecture used by Google and its rivals. For an analogy of how this works, imagine that, working alone, you need to find a particular phrase in a hundred-page document (that\u2019s a one-server effort). Next, imagine that you can distribute the task across five thousand people, giving each of them a separate sentence to scan (that\u2019s the multi-server grid). This difference gives you a sense of how search firms use massive numbers of servers and the divide-and-conquer approach of grid computing to quickly find the needles you\u2019re searching for within the Web\u2019s haystack. 
(For more on grid computing, see <strong>Chapter 5 \u201cMoore\u2019s Law: Fast, Cheap Computing and What It Means for the Manager<\/strong>\u201d, and for more on server farms, see <strong>Chapter 10 \u201cSoftware in Flux: Partly Cloudy and Sometimes Free<\/strong>\u201d.)<\/p>\n<div style=\"text-align: center; font-size: .8em; max-width: 640px;\">\n<div id=\"fwk-38086-ch08_s02_f02\" class=\"figure large medium-height\">\n<p class=\"nonindent title\"><span class=\"title-prefix\">Figure 14.5<\/span><\/p>\n<p class=\"indent\"><a><br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-229\" src=\"https:\/\/pressbooks.ccconline.org\/wp-content\/uploads\/sites\/324\/2026\/01\/14.2.0.jpg\" alt=\"The Google Search Appliance\" width=\"640\" height=\"480\" srcset=\"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-content\/uploads\/sites\/324\/2026\/01\/14.2.0.jpg 640w, https:\/\/pressbooks.ccconline.org\/bus3060\/wp-content\/uploads\/sites\/324\/2026\/01\/14.2.0-300x225.jpg 300w, https:\/\/pressbooks.ccconline.org\/bus3060\/wp-content\/uploads\/sites\/324\/2026\/01\/14.2.0-65x49.jpg 65w, https:\/\/pressbooks.ccconline.org\/bus3060\/wp-content\/uploads\/sites\/324\/2026\/01\/14.2.0-225x169.jpg 225w, https:\/\/pressbooks.ccconline.org\/bus3060\/wp-content\/uploads\/sites\/324\/2026\/01\/14.2.0-350x263.jpg 350w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><br \/>\n<\/a><\/p>\n<p class=\"indent para\">The Google Search Appliance is a hardware product that firms can purchase in order to run Google search technology within the privacy and security of an organization\u2019s firewall.<\/p>\n<\/div>\n<\/div>\n<p id=\"fwk-38086-ch08_s02_p19\" class=\"indent para\">Google will even sell you a bit of its technology so that you can run your own little Google in-house without sharing documents with the rest of the world. 
Google\u2019s search appliances are rack-mounted servers that can index documents within a corporation\u2019s Web site, even specifying password and security access on a per-document basis. Selling hardware isn\u2019t a large business for Google, and other vendors offer similar solutions, but search appliances can be vital tools for law firms, investment banks, and other document-rich organizations.<\/p>\n<\/div>\n<div id=\"fwk-38086-ch08_s02_n04\" class=\"bcc-box bcc-highlight\">\n<h4 class=\"title\"><\/h4>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<div class=\"textbox shaded\">\n<div class=\"chapter standard\">\n<div class=\"ugc chapter-ugc\">\n<div class=\"bcc-box bcc-highlight\">\n<h4 class=\"title\">Trendspotting with Google<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"bcc-box bcc-highlight\">\n<p id=\"fwk-38086-ch08_s02_p20\" class=\"nonindent para\">Google not only gives you search results, it lets you see aggregate trends in what its users are searching for, and this can yield powerful insights. For example, by tracking search trends for flu symptoms, Google\u2019s Flu Trends Web site can pinpoint outbreaks one to two weeks faster than the Centers for Disease Control and Prevention (Bruce, 2009). Want to go beyond the flu? Google\u2019s Trends and Insights for Search services allow anyone to explore search trends, breaking out the analysis by region, category (image, news, product), date, and other criteria. 
Savvy managers can leverage these and similar tools for competitive analysis, comparing a firm, its brands, and its rivals.<\/p>\n<div style=\"text-align: center; font-size: .8em; max-width: 497px;\">\n<div id=\"fwk-38086-ch08_s02_f03\" class=\"figure large\">\n<p class=\"nonindent title\"><span class=\"title-prefix\">Figure 14.6<\/span><\/p>\n<p class=\"indent\"><a><br \/>\n<img decoding=\"async\" style=\"max-width: 497px;\" src=\"https:\/\/pressbooks.ccconline.org\/wp-content\/uploads\/sites\/324\/2026\/01\/c7af9e9668195c53e66071e17fbdf87e.jpg\" alt=\"Google Insights for Search can be a useful tool for competitive analysis and trend discovery. The chart above shows a comparison (over a twelve-month period, and geographically) of search interest in the terms Wii, Playstation, and Xbox.\" \/><br \/>\n<\/a><\/p>\n<p class=\"indent para\">Google Insights for Search can be a useful tool for competitive analysis and trend discovery. The chart above shows a comparison (over a twelve-month period, and geographically) of search interest in the terms Wii, Playstation, and Xbox.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"fwk-38086-ch08_s02_n05\" class=\"bcc-box bcc-success\">\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><span style=\"font-family: 'Cormorant Garamond', serif; font-size: 1em; font-style: normal; font-weight: bold;\">Key Takeaways<\/span><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ul id=\"fwk-38086-ch08_s02_l02\" class=\"itemizedlist\">\n<li>Ranked search results are often referred to as organic or natural search. PageRank is Google\u2019s algorithm for ranking search results. 
PageRank orders organic search results based largely on the number of Web sites linking to them and the \u201cweight\u201d of each page as measured by its \u201cinfluence.\u201d<\/li>\n<li>Search engine optimization (SEO) is the process of using natural or organic search to increase a Web site\u2019s traffic volume and visitor quality. The scope and influence of search have made SEO an increasingly vital marketing function.<\/li>\n<li>Users don\u2019t really search the Web; they search an archived copy built by crawling and indexing discoverable documents.<\/li>\n<li>Google operates from a massive network of server farms containing hundreds of thousands of servers built from standard, off-the-shelf items. The cost of the operation is a significant barrier to entry for competitors. Google\u2019s share of search suggests the firm can realize economies of scale over rivals required to make similar investments while delivering fewer results (and hence ads).<\/li>\n<li>Web site owners can hide pages from popular search engine Web crawlers using a number of methods, including HTML tags, a robots.txt file, or ensuring that Web sites aren\u2019t linked from other pages and haven\u2019t been submitted to search engines for indexing.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n<div id=\"fwk-38086-ch08_s02_n06\" class=\"bcc-box bcc-info\">\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><span style=\"font-family: 'Cormorant Garamond', serif; font-size: 1em; font-style: normal; font-weight: bold;\">Questions and Exercises<\/span><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<ol id=\"fwk-38086-ch08_s02_l03\" class=\"orderedlist\">\n<li>How do search engines discover pages on the Internet? What kind of capital commitment is necessary to go about doing this? How does this impact competitive dynamics in the industry?<\/li>\n<li>How does Google rank search results? 
Investigate and list some methods that an organization might use to improve its rank in Google\u2019s organic search results. Are there techniques Google might not approve of? What risk does a firm run if Google or another search firm determines that it has used unscrupulous SEO techniques to try to unfairly influence ranking algorithms?<\/li>\n<li>Sometimes Web sites returned by major search engines don\u2019t contain the words or phrases that initially brought you to the site. Why might this happen?<\/li>\n<li>What\u2019s a cache? What other products or services have a cache?<\/li>\n<li>What can be done if you want the content on your Web site to remain off limits to search engine indexing and caching?<\/li>\n<li>What is a \u201csearch appliance\u201d? Why might an organization choose such a product?<\/li>\n<li>Become a better searcher: Look at the advanced options for your favorite search engine. Are there options you hadn\u2019t used previously? Be prepared to share what you learn during class discussion.<\/li>\n<li>Visit Google Trends and Google Insights for Search. Explore the tools as if you were comparing a firm with its competitors. What sorts of useful insights can you uncover? 
How might businesses use these tools?<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n<p class=\"indent\"><sup>1<\/sup>Most Web sites do have a link where you can submit a Web site for indexing, and doing so can help promote the discovery of your content.<\/p>\n<p class=\"indent\"><sup>2<\/sup>Google, \u201cGoogle Announces Fourth Quarter and Fiscal Year 2008 Results,\u201d press release, January 22, 2009.<\/p>\n","protected":false},"author":217,"menu_order":2,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[49],"contributor":[],"license":[],"class_list":["post-231","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":222,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/pressbooks\/v2\/chapters\/231","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/wp\/v2\/users\/217"}],"version-history":[{"count":5,"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/pressbooks\/v2\/chapters\/231\/revisions"}],"predecessor-version":[{"id":789,"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/pressbooks\/v2\/chapters\/231\/revisions\/789"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/pressbooks\/v2\/parts\/222"}],"metadata":[{"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/pressbooks\/v2\/chapters\/231\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/wp\/v2\/media?parent=231"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/pressbooks\/v2\/chapter-type?post=231"},{"taxonomy"
:"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/wp\/v2\/contributor?post=231"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/bus3060\/wp-json\/wp\/v2\/license?post=231"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}