
Semalt: A Guide To Using And Analyzing The Log File For SEO


Log files offer useful data about the technical side of a domain: they give us the tools to check whether a search engine reads the site correctly and crawls all of its pages. This alone should make the SEO value of log file analysis clear. 

But these operations reveal other important aspects as well. All of this, coupled with sound SEO tactics and tools such as the DSD, will make your site more visible. 

Follow this guide to understand how log files work and how to use them for SEO. 

What is a log file?


Log files are simply files in which the web server keeps track of every request made by robots or users on your site. 

In effect, log files are records of who accessed the site and which content they accessed. They also contain information about the party requesting access (also called the "client"), making it possible to distinguish human visitors from search engine bots. 

In addition, log file records that are collected from the site's web servers are usually kept for a certain period and are only made available to the webmaster.

How are log files created?

Each server records events in the logs differently. But the information provided is always similar and organized in fields. 

Indeed, when a user or a bot visits a web page of the site, the server writes an entry in the log file for the downloaded resource. That is, the log file contains all the data about this request and shows exactly how users, search engines and other crawlers interact with your online resources. 
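
For reference, a single entry in a typical Apache or Nginx access log (the widely used "combined" format) looks roughly like this; the IP address, URL and timestamp below are invented placeholders:

```
66.249.66.1 - - [12/Mar/2023:06:25:14 +0000] "GET /blog/sample-page HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```

Reading from left to right: the client IP address, two identity fields that are usually empty, the timestamp, the request method and resource, the HTTP status code, the response size in bytes, the referrer and the user agent string that identifies the visitor or bot.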

Meaning and value of the log file

The log file tells the whole story of the operations recorded during the daily use of the site (or, more generally, of a piece of software, an application or a computer), keeping all the information in chronological order, both when everything works well and when errors and problems occur. 

Indeed, this record contains useful data for a precise picture of the state of the site. For example, it lets you identify whether pages are being crawled by harmful or useless bots (whose access can then be blocked to lighten the load on the server), whether the site responds quickly or some pages are too slow, and whether there are broken links or pages returning a problematic status code. 

More generally, through log files, you can find out which pages are visited the most and how often, identify possible bugs in the online software code, identify security flaws and collect data on the site's users to improve the user experience. 

Moreover, all this information, combined with a better SEO tool such as the SEO Personal Dashboard, will improve your site's ranking in no time.


Where to find and how to read the log files?

Obviously, to analyze the site's log file you first need to get a copy of it. The method of accessing it depends on the hosting solution (and on your level of authorization). 

Indeed, in some cases it is possible to get the log files from a CDN or export them from the command line and download them locally to your computer. 

Usually, to access the log file, you use the file manager in the server's control panel, the command line, or an FTP client (such as FileZilla, which is free and widely recommended). 

The FTP client is the most common option. In this case, you connect to the server and navigate to the location of the log file, which in common server configurations is found in the server's default log directory. 
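
For illustration, here is a minimal Python sketch of that step using the standard ftplib module; the host, credentials and remote path are hypothetical placeholders, since the actual location of the log depends on your hosting provider:

```python
from ftplib import FTP

# Hypothetical connection details -- replace with the values from your hosting panel
HOST = "ftp.example.com"
USER = "your-username"
PASSWORD = "your-password"
REMOTE_LOG = "logs/access.log"  # assumed path; the real one depends on the server setup

ftp = FTP(HOST)
ftp.login(USER, PASSWORD)

# Download the remote log file to the local working directory
with open("access.log", "wb") as local_file:
    ftp.retrbinary(f"RETR {REMOTE_LOG}", local_file.write)

ftp.quit()
```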

On the other hand, sometimes it is not easy to retrieve the log file, because errors or problems may occur. For example, the files may not be available if they have been disabled by a server administrator; or they may be large or configured to store only recent data. 

In other circumstances, there may be problems caused by the CDN, or the export may only be allowed in a custom format that is unreadable on the local computer. However, none of these situations is unsolvable; you just need to work with a developer or server administrator to overcome the obstacles. 

However, Semalt is available to provide you with better support services for all your SEO concerns.

What is log file analysis and what does it do?



You already have some idea of why log file analysis can be a strategic activity for improving site performance. It reveals information about how search engines analyze a domain and its web pages. 

In particular, when performing this operation, you should focus on studying certain aspects, such as:
  • Check how often Googlebot crawls your site, list the most important pages (and whether they are crawled) and identify pages that are not crawled often
  • Identify the pages and folders that are crawled most frequently
  • Determine the crawl budget and check for any waste on irrelevant pages
  • Find the URLs whose parameters are crawled unnecessarily
  • Validate the switch to Google's mobile-first indexing
  • Check the status code served for each page of the site and look for areas of concern
  • Search for unnecessarily large or slow pages
  • Search for static resources that are crawled too frequently
  • Search for frequently crawled redirect chains
  • Detect sudden increases or decreases in bot activity
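
As a concrete illustration of the last point in the list above, here is a minimal Python sketch that counts Googlebot requests per day in a combined-format access log; the file name and the simple user-agent match are assumptions you would adapt to your own setup:

```python
import re
from collections import Counter
from datetime import datetime

daily_hits = Counter()

# Assumes a combined-format log named access.log in the working directory
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # Keep only lines whose user agent mentions Googlebot
        if "Googlebot" not in line:
            continue
        # The timestamp sits inside square brackets, e.g. [12/Mar/2023:06:25:14 +0000];
        # we keep only the date part
        match = re.search(r"\[(\d{2}/\w{3}/\d{4})", line)
        if match:
            daily_hits[match.group(1)] += 1

# Print hits per day in chronological order: a sudden jump or drop stands out here
for day in sorted(daily_hits, key=lambda d: datetime.strptime(d, "%d/%b/%Y")):
    print(day, daily_hits[day])
```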

How to use log file analysis for SEO?

Looking at a log file for the first time can be a bit confusing, but with a little practice you will understand the value of this document for the optimization of your site. 

Indeed, analyzing the log file can provide useful information on how search engine robots perceive your site, helping you define your SEO strategy and the necessary optimization work. We know, in fact, that each page has three basic SEO states: crawlable, indexable and rankable. 

Obviously, to be indexed, a page must first be read by a bot, and the analysis of the log file allows us to know if this step is correct. 

In fact, this study allows system administrators and SEO professionals to understand exactly what a bot is reading, how many times it reads each resource and what that costs in terms of time spent and crawl requests. 

Therefore, the recommended first step in the analysis, according to Ruth Everett, is to filter the site's access data so that only search engine bot traffic is shown, limiting it to the user agents you are interested in (a minimal filtering sketch appears after the list below). The same expert also suggests some sample questions that can guide you in analyzing the log file for SEO:
  • What part of the site is actually crawled by search engines?
  • What sections of the site are crawled or not crawled?
  • How deep is the site crawled?
  • How often are certain sections of the site crawled?
  • How often are regularly updated pages scanned?
  • How long does it take for new pages to be discovered and crawled by search engines?
  • How will the change in site structure / architecture affect search engine crawl?
  • How fast does the website crawl and download resources?
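
As a starting point for that first filtering step, here is a minimal Python sketch that keeps only the requests made by a few common search engine crawlers; the user-agent substrings and file names are assumptions you should adapt to your own log:

```python
# Substrings that identify some common search engine crawlers in the user agent
BOT_SIGNATURES = ("Googlebot", "Bingbot", "DuckDuckBot", "YandexBot")

# Assumes a combined-format log named access.log in the working directory
with open("access.log", encoding="utf-8", errors="replace") as source, \
        open("bot_requests.log", "w", encoding="utf-8") as output:
    for line in source:
        # Keep the line only if one of the bot signatures appears in it
        if any(signature in line for signature in BOT_SIGNATURES):
            output.write(line)
```

Keep in mind that user agent strings can be spoofed; for strict verification, Google recommends confirming that a Googlebot request really comes from Google with a reverse DNS lookup.
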
In addition, using a better SEO tool such as the SEO Personal Dashboard ensures your site's success on search engines.

Log files and SEO: useful information to look for

The log file gives you an idea of how much your site is being crawled and how much crawl budget Googlebot is spending on it. 

Even if we know that "most sites don't have to worry too much about the crawl budget", as John Mueller from Google often says, it is still useful to know which pages Google crawls and how often, so that you can intervene, if necessary, to optimize the crawl budget by allocating it to the resources that matter most for your business. 

Indeed, on a broader level, you need to ensure that the site is crawled effectively and efficiently. Information like this can also be found in Google's crawl stats report, which lets you view Googlebot crawl requests for the past 90 days, with a breakdown by status code and file type, as well as by Googlebot type (desktop, mobile, ads, image, etc.). 

However, this report only presents a sample of pages and therefore does not give the complete picture you get from the site's own log files.

What data to extrapolate in the analysis?

In addition to what has already been covered, log file analysis offers other useful insights that deepen your understanding. 

For example, you can aggregate status code data to see how many requests end with a result other than code 200, and thus how much crawl budget you are wasting on broken or redirected pages. At the same time, you can examine how search engine spiders crawl the indexable pages of the site compared to the non-indexable ones. 
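
To make that status code check concrete, here is a small Python sketch that tallies the HTTP status codes returned to Googlebot, showing how many crawl requests end in redirects (3xx) or errors (4xx/5xx) rather than code 200; as before, the file name and the simple parsing are assumptions to adapt:

```python
import re
from collections import Counter

status_counts = Counter()

# Assumes a combined-format log named access.log in the working directory
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        # In the combined format the status code follows the quoted request line,
        # e.g. "GET /page HTTP/1.1" 404 1234
        match = re.search(r'"\s(\d{3})\s', line)
        if match:
            status_counts[match.group(1)] += 1

total = sum(status_counts.values())
if total:
    for code, count in status_counts.most_common():
        print(f"{code}: {count} requests ({count / total:.1%})")
```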

In addition, by combining log file data with site crawl information, we can discover how deeply search engines crawl the site architecture. As Everett puts it, "When log files show that Googlebot is not crawling our key product pages often, we need to make optimizations that increase the visibility of those pages." 

One possible intervention to improve this is internal linking, another important data point you can examine through this combined use of log files and crawl data. Generally, the more internal links a page has, the easier it is to discover. 

Again, log file data is useful for examining how a search engine's behaviour changes over time, especially when a content migration or site structure change is underway, to understand how this has affected site crawling.

Finally, the log file data also shows the user agent used to access each page and can therefore tell you whether the request came from a mobile or a desktop bot. This means you can see how many pages of the site are crawled by the mobile crawler versus the desktop one. 
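
As a final illustration, the same log lines can be split by Googlebot variant: the smartphone crawler announces itself with "Android" in its user agent string, while the desktop crawler does not. Again a minimal sketch, with the file name as an assumption:

```python
from collections import Counter

variant_counts = Counter()

# Assumes a combined-format log named access.log in the working directory
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        # Googlebot Smartphone includes "Android" in its user agent; desktop does not
        variant = "mobile" if "Android" in line else "desktop"
        variant_counts[variant] += 1

print(dict(variant_counts))
```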

Conclusion

Thanks to this article, it should now be easy to see why analyzing log files matters for better understanding how your website is working. 

Thus, all this data, combined with a better SEO tool such as the SEO Personal Dashboard, will allow you to position your website at the top of the search engine results.

Do you have any questions or concerns about this article? Feel free to write to us in the comments or contact us directly.