Here you will find the transcript and a link to the YouTube recording of the SEOnerdSwitzerland meetup ‘The Mystery Behind Indexing Problems’ with Aleksandra Zarzycka.
In the talk, Aleksandra looks at indexing trends and partial-indexing problems that are becoming more common every day.
Aleksandra knows one fact for sure: getting content rendered and indexed by Google is not a given anymore, and it’s getting even harder to earn a place in Google’s index.
To support SEOnerd Switzerland, invite your friends to the next event and share this article.
Meet Aleksandra Zarzycka
Aleks ventured into the world of SEO from sales with a mission to help organizations put relevant content on their websites. For her, SEO is all about the user, and search engines are just users with particular needs. At Onely, Aleks helps organizations with their crawling and indexing strategies, improving information architecture, and all things technical SEO.
Follow Aleks on Twitter @al_zarz
Follow Aleks on LinkedIn Aleksandra Zarzycka
Full webinar recording with Aleksandra Zarzycka
Thank you Aleksandra Zarzycka for your presentation
Thank you to our speaker, Aleksandra Zarzycka. We were so happy to welcome you! Preparing a presentation and attending the meetup take a lot of time.
SEOnerdSwitzerland is nothing without speakers willing to share their knowledge. I am happy we got to welcome Aleks!
About #SEOnerdSwitzerland
SEOnerdSwitzerland is a non-profit association that aims to promote and share knowledge about SEO (Search Engine Optimization). SEOnerd Switzerland organizes in-person events and webinars.
Join the community of SEO enthusiasts.
Follow us on Twitter @SEOnerdSwitz, where we share slides, upcoming events, and articles we enjoyed.
Full transcript of the webinar with Aleks
Transcript created with the help of Ross John dela Rosa.
Isaline: Hello, everybody. This is us, Sara and I, Isaline. And we are happy to be back for another SEOnerdSwitzerland. We kind of missed you during the summer. I mean, it was good to have holidays, obviously, and sun but we still kind of missed our little meetup. So we’re really happy to be back now. And so I see that people are joining us. There is a little number at the bottom of my screen saying 9, 10, 11, and more. So that’s great.
Welcome to SEOnerdSwitzerland. This is our first meetup after the summer, our September back-to-school meetup. And for those who don’t know yet, if it’s your first time, SEOnerdSwitzerland is a non-profit association, which means that everything Sara and I do is a passion project. It’s our free time. We’re not paid to do it, but we have lots of fun doing it and working together. We do have sponsors, which I’ll mention later, who help us cover the operating costs, the software, and these kinds of things. So anything you do to share the events is great for us. It helps us a lot.
And a little bit of history here: Sara and I met because we worked together at the same agency. Then I went out on my own and started my own consultancy. And we were like, “Oh, but we need to keep doing something together.” So we started SEOnerdSwitzerland. The second reason we started the meetup is that we didn’t find an SEO meetup where we lived. And we were like, “Oh, if there is no meetup, then we’ll start one.” Then COVID happened and the meetup moved online. And I don’t really know how it happened, but now you are here, we have online meetups, we have an international audience, and we love it. So we will never go back to in-person meetups, I think. [chuckles]
Sara: Yes, exactly. Correct, girl. We might see you at some conferences when we travel again and have a drink there, but for now, it’s just the meetup, and the apéro, you know, drinks together, will have to wait for another time. So let’s go to our sponsors. Okay, time to thank our sponsors: Liip, a web agency, one of the biggest in Switzerland, where Sara works, and PILEA.ch, which is my consultancy. They cover things like Zoom, Meetup, and the gifts. So thanks so much for their support. This is really, really helpful.
Sara: And this is the next event. So let me introduce Emilia, Emma to her friends. She is a computer science engineer who works in SEO, and we particularly love her approach and her critical thinking. We asked her to take part in our meetup because of this background. Sometimes people are a little bit scared of people in computer science, so her profile is the perfect mix. She will explain how she works, how she interacts with both sides, and how the two worlds should work together. Then she will give us some great examples, and she will talk about Google Cloud, too. So yes, I would say it’s a meetup for everybody who wants to better understand these two worlds, sharpen their critical thinking, and be ready to work better together.
So I hope that I was clear. If not, you can read Emilia’s presentation. Okay. So now, I should introduce Alex. But before that, I would just like to do something. Oh, no, but you can’t see me. So I will do it at the end. So let me introduce Alex, as I said. She didn’t follow a linear path. Alex, correct me if I’m wrong. She started in sales for a travel agency, and then, like most of us who started in another field, she switched to SEO. And now she works for Onely, which is one of the main technical SEO agencies. One thing that we really like about Alex is that when you speak to her, she has a human approach to everything. While we were organizing the meetup, she had this lovely human approach, which is not something you find all the time. And there was this sentence she sent us that we found lovely: “SEO is all about the user.” I know what you’re thinking, because I thought the same thing: yes, this is what everybody says. And yes, we repeat it too, but then there is Google. And there are the other search engines. And then she completed the sentence with: “And search engines are just users with particular needs.” It’s funny, and it’s exactly right. I don’t know what else we should say. Should we say something else about you, Alex?
Alex: I think that you’ve covered a lot of it. Yeah, I ended up in SEO because I was dealing with a lot of people in sales. And then I figured out that there’s a lot of information missing on websites for those people, you know? And yeah, that has stayed my focus, and Google is just kind of a filter. So yeah, I’m super happy to be here. And yeah, it’s been absolutely wonderful working with you guys so far. I think we can just get started. [laughs]
Sara: Yes. Good. So we’ll let you share your presentation.
Alex: Yes. Can you see my screen? Wonderful. Oh, I went a little too far. You’ve seen too much. Stop it. Okay, cool. All right. So let’s just get started. Thank you so much for the introductions. We have gathered here to talk about indexing problems. So let’s move on to the indexing challenges. As Sara said, and I’m going to stay true to that throughout the whole presentation, I’m all about people. So before I started finalizing the presentation, I checked who signed up for the meetup today. And I realized that, at least judging from the sign-ups, we had quite a diverse bunch with us. There were SEOs with a lot of experience in the field, but also more generally focused digital marketers. So I thought it would be a good idea to start the presentation by getting on the same page about how search works, because getting on Google is actually a process. And it’s important to understand that each step of this process is equally important to get us to the finish line, which is ranking.
So Google puts a lot of effort into discovering URLs. They discover URLs because they already have some URLs in their databases. They discover URLs by moving from one URL to another through links. They discover URLs through sitemaps submitted by website owners. And once they have a backlog, the list of discovered URLs, they start crawling them to understand what’s on them. They render the page so that they can understand its textual content, non-textual content, and layout, to see whether some of it can later be presented in Google Search.
Once they figure out that there are portions of that content that can be presented to users, indexing happens: the content is cataloged so that later, when a user types something into Google, they can pull the information out of the index and give the user what they want. That is based on, let’s say, the general quality of the result, but also on more user-focused factors like the user’s location, the language they’re using, or the device they’re using.
Now, once again, when we talk about digital marketing, we very often focus on ranking itself, but ranking will never magically happen if the pages are not indexed. That’s why at Onely, we focus a lot on indexing. Through a lot of research, mostly done by my colleagues, we noticed that indexing is quite an underestimated problem, and a lot of big brands are struggling with it. One of the bigger brands that is really struggling with it is Walmart, which has about 50 percent of its content outside of Google’s index.
And they’re not alone. There are more brands that the majority of you are probably familiar with that also struggle to get all of their valuable pages into Google’s index. For example, Sony, Zoom, and Slack have about one-fifth of their content outside of Google’s index, which is pretty decent compared to an e-commerce giant like Myntra, which has the majority of its URLs outside of the index, or Menards and Amway.
So why is indexing actually such a big challenge? Google simply can’t crawl and index every valuable page. I always think about Google as this super-rich, fantastic company that has data centers all over the world and a lot of super-smart people. So it seems like they have a lot of money and a lot of resources to do what they do best, which is Google Search. The problem is that the web is an almost infinite space. Every single day, 250,000 websites are added to the web. That’s roughly 175 per minute.
So when you compare the vastness of these numbers with even the gigantic resources that Google has, it simply doesn’t add up. It just cannot match. And let’s face it: not all of the websites out there are actually valuable for everybody, or even for a portion of users. So even if Google is aware of a lot of valuable pages, they’re going to make Googlebot visit only a portion of them, and then they’re going to index only a portion of that portion. So even with these gigantic resources, the index, meaning the URLs that actually get indexed, shrinks in comparison to the overall number of pages that Google visits.
This leads Google to limit the number of URLs that they crawl, and then index, from the very beginning. And that starts even at the crawling stage: Google skips some URLs without visiting them, so before they even understand what’s on those pages. How does that happen? When Google discovers pages, they create a sort of backlog, and every URL in that backlog is assigned a certain priority. If the priority is high, great: URLs with higher priority are going to get crawled. Once they get crawled, they have a chance of being indexed, and then a chance of ranking. But if a URL is assigned a lower priority, it may never get crawled, and without being crawled, there’s absolutely no way it will get indexed, so no one will ever see it rank.
And this problem can very easily scale up, because imagine that one of the main pages doesn’t get crawled: then the subpages of that main page are not going to get discovered and crawled either. And it just goes on and on. This way, whole sections of a website may not get discovered or crawled, and then indexed and ranked by Google. But the next question is: should you even care? There are so many different ways you can make use of your URLs, so why care about indexing all of your valuable content? Well, I’m in SEO, so I do think you should care, because it’s organic traffic and I like it, and basta. [chuckles]
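The backlog-and-priority idea, and the way an uncrawled page hides its subpages, can be illustrated with a toy model. This is a minimal Python sketch under made-up assumptions, not Google’s actual scheduler. The site structure in `LINKS`, the scores in `PRIORITY`, and the `crawl_budget` parameter are all hypothetical. The crawler always picks the highest-priority URL next and stops when the budget runs out. It only discovers links on pages it actually crawled, so a low-priority category page takes its subpages down with it.

```python
import heapq

# Hypothetical site: each page maps to the links found on it.
LINKS = {
    "/": ["/category/posters", "/category/watches"],
    "/category/posters": ["/p/poster-1", "/p/poster-2"],
    "/category/watches": ["/p/watch-1"],
    "/p/poster-1": [],
    "/p/poster-2": [],
    "/p/watch-1": [],
}

# Hypothetical priorities (higher = more important); not Google's real signals.
PRIORITY = {
    "/": 1.0,
    "/category/posters": 0.2,
    "/category/watches": 0.8,
    "/p/poster-1": 0.5,
    "/p/poster-2": 0.5,
    "/p/watch-1": 0.6,
}


def crawl(start: str, crawl_budget: int) -> tuple[list[str], set[str]]:
    """Return (URLs crawled in order, URLs discovered but never crawled)."""
    # heapq is a min-heap, so negative priorities make the highest pop first.
    frontier = [(-PRIORITY[start], start)]
    discovered = {start}
    crawled: list[str] = []
    while frontier and len(crawled) < crawl_budget:
        _, url = heapq.heappop(frontier)
        crawled.append(url)
        # Links are only discovered on pages that were actually crawled.
        for link in LINKS.get(url, []):
            if link not in discovered:
                discovered.add(link)
                heapq.heappush(frontier, (-PRIORITY[link], link))
    return crawled, discovered - set(crawled)


if __name__ == "__main__":
    crawled, skipped = crawl("/", crawl_budget=3)
    print(crawled)  # ['/', '/category/watches', '/p/watch-1']
    print(skipped)  # {'/category/posters'}
```

With a budget of three fetches, the posters category is discovered but never crawled. As a result, `/p/poster-1` and `/p/poster-2` are never even discovered, however valuable they are.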
But jokes aside, just think: you’ve already spent a lot of money creating those pages, right? You’ve paid your developers. You’ve paid your content creators. You’ve put a lot of people on the job so that the URL is out there somewhere on the web. So why not get a little more out of the money you’ve already invested and allow that content to be indexed by Google? If organic traffic just doesn’t speak to you, maybe that will. And if not, there is a third point, closely related to your branding efforts.
And I stole this example from one of the girls on our marketing team. She said, “Listen. Imagine it this way. You’re looking for a book, right?” And I love books, so that example hit me straight away. She went on: “You’re looking for a book. There’s this wonderful book that you want to buy, and you’re a fan of Barnes & Noble. Okay. So what do you do? You go to Google. You type the title of the book, and you type ‘Barnes & Noble,’ and voilà! You can see that on the first page of Google, Barnes & Noble is not even there. Well, that’s all right. You’re maybe a little sad for a millisecond, but then you just buy your book elsewhere and go home, waiting for your book to arrive.”
But the thought kind of stays with you: “Okay, the last time I was searching for a book, I didn’t find it on Barnes & Noble.” Now, you didn’t look for it on Barnes & Noble. You looked for it on Google. But a lot of people assume that if it’s not on Google, it does not exist. That might be wrong, but it still happens. So if you’re not investing in and caring about the indexation of your valuable pages, this kind of branding loss can happen to you as well. So I do encourage everybody watching to care about indexing.
All right. So we’re moving to the most fun part of the presentation, at least, the most fun for me. What are the most common indexing issues? So we have four of them. And I’m going to walk you through all four problems, and we’re going to focus mostly on diagnosing them.
So the first problem is URL indexing. It’s basically a situation in which the URL is not in the index at all. The second problem is mobile-first-related indexing. It’s when the content that you would like indexed is not visible on the mobile version of your page. The third problem is related to JavaScript. It’s basically when the valuable content that you would like indexed is, in a way, hidden behind JavaScript. And the fourth one is the layout-based indexing problem, which is the most funky and the least clear to anybody, even though we have already researched it pretty thoroughly. But I’m going to try to make it as simple as possible.
So each of these indexing problems has different solutions. They have different origins but as I said in the beginning, we’re going to focus on trying to diagnose them first. So moving on to URL indexing…
It loads a little bit. Okay, first… So we have a Walmart product page. You can buy a poster, and you can find this page when you’re looking for products within the Walmart website. But then you take the URL to Google, you use the site: command, and you realize, “Nope, this particular product is not on Google.” And this is how you find out whether the URL is actually indexed or not. It’s a very simple method that I personally use pretty much every day. But this method comes with a little bit of a challenge, let’s call it that.
The site: command sometimes shows false negatives. What are false negatives? It’s a situation in which, when you type the URL after the site: command, Google tells you, “No, we do not have it in our index.” But then you copy a portion of the page’s content and look for it using the site: command, and all of a sudden, Google tells you, “Oh yeah, we found the document. The page is totally in our index.” So when diagnosing indexing issues, it’s actually worth taking your time and going through the process quite meticulously. You know, get your favorite beverage in your favorite cup and just go with it. Check URLs one by one, make it fun, or take a break if you’ve been doing it for a long time and it’s getting a little boring.
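The two checks described here, the URL check and the content check that catches false negatives, can be written down as search queries. The Python sketch below only builds the query strings and Google search URLs to open by hand in a browser. It does not fetch or parse any results. The example URL and text snippet are placeholders. Putting the copied snippet in double quotes (Google’s exact-phrase syntax) and restricting it to the page’s host is one reasonable way to run the content check, not necessarily the speaker’s exact procedure.

```python
from urllib.parse import quote_plus, urlsplit


def url_check_query(url: str) -> str:
    """site: query for one specific URL (scheme stripped, path and query kept)."""
    return "site:" + url.split("://", 1)[-1]


def content_check_query(url: str, snippet: str) -> str:
    """Exact-phrase query for text copied from the page, limited to its host.

    If this finds the page while url_check_query() does not, the site:
    URL check was probably a false negative.
    """
    host = urlsplit(url).netloc
    return f'"{snippet}" site:{host}'


def google_search_link(query: str) -> str:
    """Google search URL for a query, to open manually in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    page = "https://www.example.com/ip/vintage-poster-123"  # placeholder URL
    snippet = "Printed on heavyweight matte paper"  # placeholder page text
    for query in (url_check_query(page), content_check_query(page, snippet)):
        print(query)
        print(google_search_link(query))
```

Keeping the two queries side by side makes the false-negative pattern easy to spot: the first comes back empty while the second finds the page.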
So what are the main causes of URL indexing issues? The first is content quality. If you have a lot of thin pages or duplicate content, you might start facing URL indexing issues as a result. Another is index bloat, which is, in short, a situation in which you’re indexing a lot of not-so-important pages, so there’s not enough room for your valuable pages. And the third one is crawl budget issues, when, once again, Google is crawling pages that maybe are not that important and there is no time or resources left for the ones that you find valuable. These are all valid causes. If you’re seeing that URLs are not getting indexed, you might go through your Google Search Console and double-check whether you have some of these issues.
Okay. So we’ve covered URL indexing. We’re moving on to partial indexing. We call it partial indexing because the URL itself is indexed, but portions of the content may not be, for different reasons. The first reason portions of the content are not indexed is that the content you want indexed does not appear on the mobile version of the page. For quite a while now, Google has been judging pages based on how they look on mobile. This shouldn’t be a problem these days, since a lot of websites already use responsive web design, but sometimes we all make mistakes, or implementations are done a little too quickly for a lot of different reasons.
So sometimes it’s worth checking that the mobile and desktop versions match exactly, because if they don’t, and you find content that exists on the desktop version but not on mobile, then when you check it for indexation on Google, it may turn out that it’s not indexed. Now, diagnosing mobile-first-related indexing problems is really fun, because in its simplest form, it’s basically like playing spot the difference, right? You open the mobile version of your page in one window. In another window, you open the desktop version. Then you slowly scroll down both and compare the content here with the content there. If you spot a difference, then voilà, there is a potential problem, and you double-check it with Google.
The other, slightly geekier, way of diagnosing mobile-first-related indexing issues is with a tool like Diffchecker, which basically lets you compare code. It requires a little bit of code reading, but it’s still pretty handy. And if you want to pass the information on to your dev team, it might be useful to give them the code snippet that needs fixing.
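Diffchecker is a web tool. As a rough local stand-in, Python’s standard `difflib` module can list text that appears on the desktop version but not on the mobile one. This sketch assumes you have already saved the visible text of each version, one block of text per line. The sample strings are made up.

```python
import difflib


def missing_on_mobile(desktop_text: str, mobile_text: str) -> list[str]:
    """Return lines present in the desktop text but absent from the mobile text."""
    desktop = [line.strip() for line in desktop_text.splitlines() if line.strip()]
    mobile = [line.strip() for line in mobile_text.splitlines() if line.strip()]
    # ndiff marks lines that exist only in the first sequence with "- ".
    return [line[2:] for line in difflib.ndiff(desktop, mobile) if line.startswith("- ")]


if __name__ == "__main__":
    desktop = """Men's watches
    Classic steel watch
    Sport watch
    Our guide to choosing a men's watch"""
    mobile = """Men's watches
    Classic steel watch
    Sport watch
    Leather strap watch"""
    print(missing_on_mobile(desktop, mobile))
```

Comparing extracted text keeps the result focused on content. Comparing the raw HTML instead, as you would in Diffchecker, gives the dev team the exact markup that needs fixing.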
On to the third indexing issue, which is related to JavaScript. Diagnosing a JavaScript-related indexing problem is a little bit like the previous one: you’re trying to find how two versions of a page differ. It’s just that you’re comparing the page as it normally loads with the page loaded without JavaScript. To disable JavaScript, there are plenty of Chrome extensions you can use, and Chrome DevTools works as well. Or you can use a free Onely tool called What Would JavaScript Do, which I’m going to show you a little later. It’s basically all about spotting what appears on the page with JavaScript enabled but may be missing when JavaScript is disabled.
Now, once again, the process starts with checking whether the URL is there. So the URL that we’re analyzing in this particular example is on Google. But then we spotted a portion of the content that is loaded with JavaScript. So basically, it does not appear when the JavaScript is disabled. And we check it with Google and no, this portion of the content is basically not on Google. So if it’s important for the overall quality of your page, then you might want to change that.
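The JavaScript check can be sketched the same way. First, extract the visible text from the raw HTML, which is what you get with JavaScript disabled (for example from view-source). Then extract it from the rendered DOM (for example copied from the Elements panel in Chrome DevTools). Finally, list the text that only exists after rendering. This is a simplified Python illustration using the standard `html.parser` module, not how What Would JavaScript Do works internally. The sample markup is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def js_only_content(raw_html: str, rendered_html: str) -> list[str]:
    """Text present in the rendered DOM but missing from the raw HTML."""
    raw = set(visible_text(raw_html))
    return [chunk for chunk in visible_text(rendered_html) if chunk not in raw]


if __name__ == "__main__":
    raw = "<body><h1>Pumpkin lantern</h1><div id='reviews'></div><script>loadReviews()</script></body>"
    rendered = "<body><h1>Pumpkin lantern</h1><div id='reviews'><p>Great decoration!</p></div><script>loadReviews()</script></body>"
    print(js_only_content(raw, rendered))  # ['Great decoration!']
```

Anything this returns is a candidate to search for on Google with the content check, since it depends on rendering to be seen at all.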
And that brings us to the funkiest and most interesting part of indexing issues, which we call layout-based indexing issues. For this part, we’re going to have to jump a little bit into Google patents, because this is, I would say, the least described or least talked about issue, at least from my perspective. The first extract from Google patents tells us that content location matters, and that text appearing above the fold may be considered more important than text below the fold.
Now, once again, just so that we’re all on the same page: content above the fold is what you can see before you start scrolling. And it kind of makes sense, right? The first impression of the page is what you see before you even move your finger. So Google might judge the portion that is higher on the page as more important, and be more likely to index it.
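As a simplified illustration of the above-the-fold idea, a section can be treated as above the fold if it starts within the first screenful. The Python sketch below assumes you have measured each section’s distance from the top of the page in CSS pixels, for example in the browser. The section names, positions, and the 800-pixel viewport height are made-up values. The real fold depends on the device, and the patents do not define a fixed cutoff.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    top: int  # distance from the top of the page, in CSS pixels


def split_by_fold(sections: list[Section], viewport_height: int) -> tuple[list[str], list[str]]:
    """Split section names into (above the fold, below the fold)."""
    above = [s.name for s in sections if s.top < viewport_height]
    below = [s.name for s in sections if s.top >= viewport_height]
    return above, below


if __name__ == "__main__":
    page = [
        Section("Main product description", 120),
        Section("Reviews", 950),
        Section("Customers also bought", 1900),
    ]
    above, below = split_by_fold(page, viewport_height=800)  # hypothetical viewport
    print("Above the fold:", above)
    print("Below the fold:", below)
```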
Another extract says that the importance of a section is based on the prominence of that section within the rendered layout. If you take an example page like this one, we can see that the main article is the most prominent element on the whole page and the most likely to get indexed by Google. This is interesting in light of what I’ve just said about above-the-fold content, because above the fold we have the menu and the ads, which Google probably doesn’t care about as much, because it understands that the main content is the main article.
The third extract from the patents tells us that there are simply some portions of the page that Google is going to see as more important. They give the example of a “More Top Stories” section from CNN, which Google would judge as more important and give a higher probability of being selected.
So now that we’ve covered the important portions of the content, let’s go through the sections that are usually ignored or dismissed by Google. These are parts of the page that are very common on e-commerce websites, for example:
“Customers also bought blah, blah, blah,”
or “Similar products blah, blah, blah.”
“You might also be interested in blah, blah blah.”
Google really struggles with indexing these parts of the page. If we take a look at a sample product page from Target, we can see that the URL is getting indexed, so that’s fantastic. But when we play around with different portions of the page, we can see that the sponsored part did not get Google’s attention or a place in the index. We checked different pieces of content and realized that they were not in the index.
So now, we’ve covered all four types of indexing issues. The funny thing is that, in these examples, you could easily identify each one: “Okay, this particular page has this particular reason for not being indexed. This other page has this other reason for not being indexed.” But in real life, in the work we do on a daily basis, it very often happens that all of these problems are combined on one page.
And that is exactly what’s happening in the example I used for the layout-based indexing issue. We have the top of the page, the main content, which is indexed. Then another part of the product description is not indexed due to some JS- and mobile-related issues. And some portions of the content even lower down the page are not getting indexed, probably because of layout issues. So that’s how it looks in real life. It’s much more detective work, but it’s still quite fun.
So now, I’m going to move to live examples, and I’m going to tell you a little bit of the story behind preparing this presentation. Even though it’s online, I really like interaction. So I thought I would gather about 15 to 20 examples of pages where I know there might be some indexing issues, and just click here, click there, and show you what’s happening. But the problem, or the good thing, about indexing is that it’s very volatile. It changes quite a lot.
So I’m going to show you a couple of examples of pages that I prepared for the presentation and that changed while I was preparing them, which, for me, is pretty interesting to watch. And you have to believe me because, well, you weren’t there when I was looking for these examples. Okay. But let’s get straight to it. We have the Myntra page. Myntra is an Indian e-commerce site that has quite a lot of its pages outside of Google’s index. This is just a general category page. It looks pretty decent, normal. Most of the page is product listings, and at the bottom, you have a beautiful description. Some content writer has poured their heart and soul into creating it, so fantastic.
Then you go to the mobile version of the page. It looks a little bit adjusted for mobile. You have the products. You have the faceted navigation here. But it already looks a little bit different, because it has many more watches displayed on the mobile version than on the desktop version. It has been like this since I started preparing the presentation, so that has not changed. But when I scrolled all the way down a week ago, the description was not there. So I could easily see, “Okay, this is just a typical example of a mobile-first-related indexing issue.”
I know this because I’ve been watching that page for a very long time. But I’m thinking about participants of this meetup who would watch this and scroll down the page right now. It would be really hard for them to pinpoint why a portion of that description (I’ve already checked it for indexation) is not in Google’s index, because this category description is actually visible with JavaScript disabled. So it would be kind of hard to figure out. And this is usually what happens when you’re trying to diagnose indexing issues: it’s not that obvious.
Another example that I found pretty funky is a page that has a bazillion images and quite a lot of reviews, but when you disable JavaScript, basically the only thing that stays is the product description here. And when I typed in a portion of a review, you can see that it’s not on Google. So that’s a very straightforward JS-related indexing issue that I can diagnose without looking at Google Search Console. It’s never completely obvious, but it still gives me a pretty good understanding of why this content is not indexed.
Our What Would JavaScript Do tool is very useful for diagnosing JavaScript-related issues. It basically shows how a page looks with JavaScript enabled and disabled. Once again, I’ve chosen a product page, this time from an e-commerce site called Amway. When you look at the differences, you can see that when JavaScript is enabled, there’s quite a lot of content here. But when JavaScript is disabled, it doesn’t even scroll, right? It’s just empty. Still, this page is indexed by Google. So once again, diagnosing indexing issues is not that straightforward. And since Halloween is coming, I thought I would show you one more example.
This is a beautiful Halloween decoration, and with JavaScript disabled, you absolutely cannot see the product. Only when you load the JavaScript do you actually see what the page is about. Even though, to me personally, these two pages, the Amway page and the Menards page, both seem kind of empty, the Amway page is indexed, while the Menards product page is not. So to me, diagnosing indexing issues is a bit of a game. And I think that if you have some time, it’s really worth investigating and putting an indexing checkup somewhere on your digital marketing roadmap.
To sum up, since I know that the slides will be shared with you later on, I just included the links to some of the resources that I’ve mentioned during the presentation. And since at Onely, we’re super passionate about the indexing issues in general, I just wanted to let you know that we’ve also just launched ZipTie, which is a tool that helps you analyze your index coverage issues without access to Google Search Console. So this is pretty much the exercise that we’ve been doing so far. So thank you very much. I think I’ve spoken a little bit faster than I thought I would [chuckles] but yeah, that’s it for me.
Sara: So first of all, it was cool. It was a great presentation. Thank you so much. I just want to say something before we start the Q&A, because I’ve been dying to say it since the beginning of the meetup: I’m proud of my partner in crime, Isaline, because she made an SEOnerdSwitzerland T-shirt and sent it to me. And maybe she will make more. I’m still in the process of convincing her, since this one was a gift for me. So I hope that soon we will have super cool T-shirts to share with everybody in the community. So yeah, you might say, “Sara, why are you talking about that now?” But it doesn’t matter. I thought I had to say it because it was important to me. So that is the advantage of being in Austria.
Okay. So, seriously speaking, let’s get back to business and check if there are questions. As Isaline said, you can add your questions in the chat or in the Q&A. If you have any kind of question, please go for it. I think the presentation was quite clear but brief.