How Google Indexing Works
Before a search engine can recommend a site for reading, that is to say list it in its results pages, it must
first become acquainted with the site's content and work out how to make it quickly available to the
user who submitted the query.
The site's crawl: to index a site, a search engine runs a program often called a "robot" or "spider"
that browses the site and indexes its content. This exploration phase is called the crawl (from the
English verb to crawl, meaning to creep along). During this phase, the robot reads the content and
follows the links on each page in order to discover the linked content (a minimal sketch of the
mechanism follows these definitions).
The copy of the content: while browsing the site, the robot copies its content and stores it on the
search engine's servers, where it can be analyzed and its evolution tracked over time.
Indexing: indexing is the operation that encodes the content stored on the servers so that it can be
offered to users according to the keywords contained in their queries. The way an engine records that
a given piece of content may be relevant to a query on a given keyword differs from one search engine
to another, although the approaches remain broadly similar. In the rest of this book, we will focus
almost exclusively on Google's methodology, since Google handles approximately 90% of internet
searches carried out in France.
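To make the first two phases concrete, here is a minimal sketch of a crawler in Python, using only the
standard library. The starting URL, the page limit, and the in-memory store are illustrative
assumptions; this is in no way how Googlebot is actually implemented.

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    """Breadth-first crawl: fetch a page, copy it, queue its links."""
    queue, seen, store = [start_url], set(), {}
    while queue and len(store) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        store[url] = html  # the "copy of the content" phase
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links against the current page and queue them.
        queue.extend(urljoin(url, link) for link in parser.links)
    return store

pages = crawl("https://www.example.com")  # hypothetical starting point

The essential ideas are all there: a queue of URLs to visit, a stored copy of each fetched page, and link
extraction to discover new content.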
Note:
Not all pages are indexed by Google. It is possible to prevent the indexing of certain pages by telling
Google that the content should not be made available in search results.
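As an illustration, here is a minimal sketch, using a plain Python WSGI server, of the two standard
ways of signaling this: the robots meta tag inside the HTML page and the X-Robots-Tag HTTP
response header. Both directives are real; the page and the server setup are hypothetical.

from wsgiref.simple_server import make_server

PAGE = b"""<!doctype html>
<html>
  <head>
    <!-- The robots meta tag asks engines not to index this page. -->
    <meta name="robots" content="noindex">
    <title>Private page</title>
  </head>
  <body>This page should not appear in search results.</body>
</html>"""

def app(environ, start_response):
    # X-Robots-Tag is the HTTP equivalent of the meta tag; it also
    # works for non-HTML resources such as PDF files or images.
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("X-Robots-Tag", "noindex"),
    ])
    return [PAGE]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()

Blocking a page in robots.txt, by contrast, only prevents crawling; the noindex directive is what
actually keeps a page out of the index.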
Content indexing is one of the strategic functions of a search engine, and engines regularly rework
their indexing process to modernize it. For example, in 2010 Google modified its infrastructure to
speed up the construction of the index and make it easier to update. This made it possible to:
make newly crawled and indexed pages accessible in the SERPs quickly;
speed up the indexing of news pages so that they become visible in the SERPs as soon as possible.
When creating a site, you can provide a document that tells Google which pages should or should not
be referenced. Sitemaps are a protocol that lets a site notify search engines of the URLs of a website
available for automatic indexing. The sitemap file lists the site's URLs and can attach additional
information to each one, such as the last modification date, the update frequency, and the page's
relative importance. This allows search engines to crawl the site more intelligently.
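As an example, here is a small Python sketch that generates such a file. The URLs, dates, frequencies,
and priorities are made-up values; the XML structure follows the sitemaps.org protocol described
here.

import xml.etree.ElementTree as ET

# Hypothetical pages of the site, each with the optional metadata the
# sitemap protocol allows: last modification date, update frequency,
# and relative importance (a priority between 0.0 and 1.0).
PAGES = [
    ("https://www.example.com/",         "2023-01-15", "weekly", "1.0"),
    ("https://www.example.com/products", "2023-01-10", "daily",  "0.8"),
    ("https://www.example.com/about",    "2022-11-02", "yearly", "0.3"),
]

urlset = ET.Element("urlset",
                    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod, changefreq, priority in PAGES:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod
    ET.SubElement(url, "changefreq").text = changefreq
    ET.SubElement(url, "priority").text = priority

# Write sitemap.xml; it is usually placed at the root of the site.
ET.ElementTree(urlset).write("sitemap.xml",
                             encoding="utf-8", xml_declaration=True)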
Ask Google to index your site. All you have to do is use the submission form offered by Google and
enter the URL of the site's domain name. Once the URL is added, Google receives your request for
inclusion in its index.
Ask another site on the same theme to link to your site (this is called getting a backlink). When
Googlebot crawls the external site, it finds the link, follows it, and discovers your site, which it then
crawls and indexes in turn.
Register your site in one of the main web directories (such as the DMOZ directory). This method is
slower than relying on links but works well.
Use automatic submission services, which are software programs that submit the site to many
directories at once. This approach is not recommended because it carries many risks that can harm
the site's ranking (over-optimized anchors, low-quality directories, directories unrelated to the site's
theme, poor directory metrics, etc.).
The most effective solution is clearly to obtain a backlink: it shortens the indexing time, especially if
the external site is updated often, since Googlebot then crawls that site regularly to keep its index up
to date. As noted above, registering your site in an SEO directory is slower but also works.
It is possible to ask Google for rapid indexing. In Search Console (formerly Google Webmaster Tools)
you can request that a page be indexed quickly; Google indicates (without providing any guarantee)
that it will do so within 24 hours.
Note: there is an indexing quota, so be careful not to make too many indexing requests. It is best to
reserve this approach for strategic pages.
It is also possible to deindex a site, which consists of removing it from the search engine's database.
All you have to do is remove the pages from the site; on its next visit, the robot can no longer reach
them, they become invisible to it, and it consequently drops them from its index. Two solutions are
possible:
In an emergency, add the URLs to be removed to a sitemap file and use the "expires" tag.
If there is no urgency, go to the Google Webmaster Tools index section and use the URL removal
feature.
Having your site indexed by Google amounts to asking the crawler to visit it and then store it on
Google's servers. Without this step, there is no hope of appearing in Google's results.
Should you ask Yahoo, Bing, Exalead, or Qwant to index your site?
Even if Google represents 90% of searches made in France, it is not the only engine that indexes sites.
It can therefore be worthwhile to submit your site to engines such as Yahoo, Bing, Exalead, or Qwant.
The process is the same for each engine: go to the engine's submission page and submit your site.
Each request only takes a few minutes.
The sitemap, or "site map," is a file that gathers all the information useful for referencing the site
(page URLs, additional metadata, etc.). Written in XML format, it allows Google to read the entire plan
of your site and index it as efficiently as possible. However, even with a sitemap, Google remains free
to index only what it wants; nothing obliges it to index everything. Each search engine decides what
to index or not.
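As a complement, here is a minimal sketch of how a sitemap is commonly advertised to every engine
at once, through the standard Sitemap directive of the robots.txt file. The domain name is a
placeholder.

from pathlib import Path

# A robots.txt placed at the root of the domain. The Sitemap directive
# is understood by Google, Bing, and the other major engines.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

Path("robots.txt").write_text(ROBOTS_TXT, encoding="utf-8")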
Indexing a site is a necessary first step, but it is not sufficient: an indexed site is not necessarily
visible, that is, well positioned in the search engine results pages (SERPs).