Holy Spiderman! What are Web Crawlers?

With the pervasiveness of the internet, everyone is becoming more and more of an expert, at least to some degree, in things they never even knew existed. Search engine optimization is a phrase that didn’t even exist 15 years ago, and now most people at least have a vague idea of what it is. Even the most non-technical folks now know about servers and bandwidth, pixels, and refresh rates.

What about web crawlers or search engine bots?

Ah, an area that is still a mystery to most. But you don’t want it to be a mystery to your practice website. You definitely want web crawlers crawling all over your practice website because that’s important for ranking higher in search returns.

In this springy blog where much of the country is finally crawling out from another long winter, let’s get into just what web crawlers are and why they’re important for the visibility of your practice website.

What is a web crawler?

These programs go by several names: web crawlers, spiders, and bots. Their job is to learn what (almost) every webpage on the web is about, which tells the search engines where to find content that answers a searcher’s request. The name comes from “crawling,” the technical term for automatically accessing a website and reading its data under the direction of a software program.

Web crawlers are operated, for the most part, by the search engines. The search engines then run the data the bots collect through intricate algorithms to decide what appears in search returns.

You can compare these bots to someone trying to make sense of a roomful of haphazardly piled books. To understand what each book is about, the person would read the title, look at the summary on the inside flap, and read a few pages. Then the person could organize the room’s books by subject. Now, if someone comes along and asks a question about gardening, the right book can be pulled and the answer delivered.

Indexing is key

The web crawlers go about their endless task of discerning the content of the web in order to index all of this information. Search indexing creates an index that allows the search engine to know where on the internet it needs to go to retrieve the information that matches the search query. For a comparison in the non-digital world, you could compare indexes to a library’s card catalog or to the actual index in the back of a book that tells you what page to go to find the information you’re seeking.

Indexing uses both the text that appears on a website’s pages and the background metadata that visitors don’t normally see on the page. Metadata includes the titles and descriptions written into a page’s code for web crawlers to read. These titles and descriptions help the crawler understand what each page is about. As for the page’s content, search engines index the words on the page, generally skipping very common words such as “a,” “an,” and “the.” When someone searches, the search engine scans those indexed words and the metadata to find the pages most relevant to the query.
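If you’re curious what an index like this might look like under the hood, here is a minimal Python sketch. It is only an illustration, not how any real search engine works: the page addresses, the sample text, and the names `build_index` and `search` are all made up for this example, and the skipped-word list is just the three words mentioned above.

```python
import re
from collections import defaultdict

# Common words the index skips (just the three named in the text).
STOP_WORDS = {"a", "an", "the"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages):
    """Map each indexed word to the set of page addresses containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every indexed word in the query."""
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, invented for illustration only.
pages = {
    "example.com/gardening": "A guide to the garden: planting tomatoes",
    "example.com/surgery": "An overview of the endoscopic brow lift",
}
index = build_index(pages)
print(search(index, "the brow lift"))
```

The point of the sketch is the lookup direction: instead of reading every page for every search, the engine looks up each search word and goes straight to the pages listed under it, much like the index in the back of a book.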

Where to start?

With an estimated 1.7 billion-plus websites on the web, that seems like an impossible task, even though fewer than 200 million of those sites are estimated to be active. How could anything, even a monster algorithm, do such a job?

Web crawlers start from a list of known URLs, and they crawl those sites first. As they crawl those sites, they come across hyperlinks to other sites, and they move to those sites next.
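Here is a rough Python sketch of that “start with known URLs, then follow the links” process. To keep it safe to run, it doesn’t touch the real internet: the `LINKS` table is a made-up stand-in for the hyperlinks a crawler would find on each page, and the function name `crawl` and the `max_pages` limit are assumptions for this example.

```python
from collections import deque

# Hypothetical link map: each site and the sites it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(seeds, links, max_pages=100):
    """Visit the seed sites first, then the sites they link to, and so on."""
    queue = deque(seeds)   # sites waiting to be crawled
    seen = set(seeds)      # sites already queued, so none is visited twice
    order = []             # the order in which sites were crawled
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


print(crawl(["a.example"], LINKS))
```

Notice that `d.example` is never in the starting list, yet the crawler still reaches it because `b.example` links to it. A site that nothing links to would never be found this way.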

They also follow rules set by the governing algorithm about which sites to crawl, in what order, and how often to go back looking for new content.

They don’t crawl every site publicly available on the internet. But the algorithms usually send them first to sites based on the number of other sites that link to those sites, the number of visitors to those sites, and other factors that make certain sites more relevant for being good sources of information for the search engine.
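As a simplified illustration of that prioritizing, the Python sketch below ranks sites by just one of those factors: how many other sites link to them. Real search engines weigh many more signals (visitor counts among them) in ways they don’t publish, so treat this as a toy model. The link map and the names `inbound_counts` and `crawl_order` are invented for the example.

```python
from collections import Counter

# Hypothetical link map: each site and the sites it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
    "d.example": ["c.example", "b.example"],
}


def inbound_counts(links):
    """Count how many different sites link to each site."""
    counts = Counter({url: 0 for url in links})
    for source, targets in links.items():
        for target in set(targets):   # count each linking site only once
            if target != source:      # ignore a site linking to itself
                counts[target] += 1
    return counts


def crawl_order(links):
    """Most-linked-to sites first; ties are broken alphabetically."""
    counts = inbound_counts(links)
    return sorted(counts, key=lambda url: (-counts[url], url))


print(crawl_order(LINKS))
```

In this made-up example, `c.example` goes to the front of the line because three other sites link to it, while `a.example` and `d.example`, which nobody links to, wait at the back.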

That’s why having other sites link to your practice website is important: the bots then view you as an authority, a resource for high-quality information. The search engine algorithm will want to make sure your site is indexed, as it may want to use it in search returns more often. It’s a bit like a library stocking more copies of its most popular books.

Come back for more

Google and the other search engines want to deliver the latest information in their search returns. To encourage sites to keep updating and adding information, they reward new content in search rankings. You can think of this as self-serving, but it also ensures the best search returns. For instance, the endoscope has dramatically changed various surgical procedures, everything from knee surgery to back surgery to a brow lift. If Google only returned pages detailing older “open” surgical methods, rather than the newer minimally invasive endoscopic methods, those returns would disappoint the searcher.

That’s why we are constantly touting the value of new content for our practice sites. This can be done by updating content on older pages, adding new pages, or simply having an active blog on your site. All of this is viewed by the bots as new content, and they are continually re-indexing sites based on whether or not they have fresh content.
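How might a bot tell whether a page has fresh content? One simple possibility, sketched below in Python, is to keep a “fingerprint” of each page from the last visit and compare it with the page as it looks now. This is an assumption made for illustration, not a description of what Google or any other search engine actually does, and the names `fingerprint` and `pages_to_reindex` are made up.

```python
import hashlib


def fingerprint(text):
    """Return a short, fixed-length signature of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pages_to_reindex(previous, current):
    """List the pages that are new or whose text changed since the last visit.

    previous maps page address -> fingerprint saved on the last crawl.
    current maps page address -> the page text found on this crawl.
    """
    changed = []
    for url, text in current.items():
        if previous.get(url) != fingerprint(text):
            changed.append(url)
    return sorted(changed)


# Hypothetical pages, invented for illustration only.
last_visit = {"example.com/brow-lift": fingerprint("Open brow lift surgery")}
this_visit = {
    "example.com/brow-lift": "Endoscopic brow lift surgery",  # updated page
    "example.com/blog/spring": "Our new spring blog post",    # brand-new page
}
print(pages_to_reindex(last_visit, this_visit))
```

Both the updated page and the brand-new blog post show up as needing a fresh look, which mirrors the three kinds of new content mentioned above.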

Now you know what web crawlers are and how important they are when it comes to your practice website ranking in organic search. If you have any questions about how we can help ensure the bots are finding what they like on your practice website, just give us a call at Advice or fill out a contact form and we’ll talk.
