Seotistics - Analytics & SEO

💡 How To Generate SEO Data For Testing [Part 1]

Published 12 months ago • 5 min read

Use Data Or Be Used By Data!


The May 29 issue of Seotistics is here for you!

Today we talk about practicing with data, something many of you ask me about.

"But Marco, what if I don't have any data?" Or even worse: "My client won't give me access."

This is quite normal in practice and can pose a threat to your SEO work.

After all, we can't really do SEO without data. And that's why we are going to create fake data.

P.S. This is only Part 1 of this fascinating topic; next week you'll get to read Part 2!


🔑 Key Concepts

Let's see some basic concepts:

  • Synthetic data: data that don't exist in reality. You generate them to emulate actual data (but they are still fake).
  • API: what allows you to access the features, functions or data of a piece of software. This isn't the complete definition but let's stick with it.
  • Float number: decimal number, e.g. 6.7895
  • Integer number: whole number (positive or negative), e.g. 2 or -2

In SEO we don't have much literature or public datasets to work with.

Once again, you don't need to test on actual datasets; you can generate them yourself!

Of course, you also want to have real datasets if possible but reality isn't really a fairy tale.

🧮 Actionable SEO Tip - Do It Your Way

OK, you don't have any website or access to data...

Well, you still need real data to do the actual work, but if you only need to test something, I have a solution.

It's possible to generate synthetic data with code.

But first, let's see what you can get without any 1st party access:

  • Keyword data/Competitor data (Semrush/Ahrefs)
  • Crawl data (Screaming Frog/Sitebulb)

Not much, and I don't recommend relying on Semrush/Ahrefs if you have to make important decisions. ❌

As said before, you may want to test your ideas on Google Search Console/Analytics data instead.

What if you are practicing in your free time and can't use any client data?

Well, you generate them! Let's see a quick example.

I want a dataset with the following columns:

  • query: a string containing one or more words
  • page: a string with a specific format
  • date: date format, any is fine
  • clicks: integer number (you can't have 1.5 clicks, either 1 or 2)
  • impressions: integer number (as above)
  • ctr: clicks/impressions, so you get a float number
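As a rough illustration of the column list above, here is a minimal sketch in Python (the code for this issue is in R; Python is swapped in here). The function name `make_row`, the example URL, the date and the value ranges are all assumptions made for illustration. Drawing clicks uniformly is not realistic; it only guarantees the right types and that clicks never exceed impressions.

```python
import random
from datetime import date


def make_row(rng, query, page, day):
    """Build one fake Search Console-style row (every value is made up)."""
    impressions = rng.randint(1, 500)      # integer, at least 1 to avoid dividing by zero
    clicks = rng.randint(0, impressions)   # integer, never above impressions
    return {
        "query": query,
        "page": page,
        "date": day.isoformat(),           # ISO format; any date format would do
        "clicks": clicks,
        "impressions": impressions,
        "ctr": clicks / impressions,       # float between 0 and 1
    }


rng = random.Random(42)  # fixed seed so the fake data is reproducible
row = make_row(
    rng,
    "seo data testing",                    # hypothetical query
    "https://example.com/blog/seo-data/",  # hypothetical page
    date(2024, 1, 1),                      # arbitrary date
)
print(row)
```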

We have some requirements in place but that's not all!

We know that a page can be tied to multiple queries and dates... with different clicks and impressions.

Page A can rank for query x on a given day and get 50 clicks. The next day (for the same query) you may get 30 clicks.

This fact must be taken into account when generating data.
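One way to respect this, sketched in Python under stated assumptions, is to loop over every page, query and date combination and draw fresh numbers for each one. The URLs, queries, start date and value ranges below are hypothetical. A full cross product is also a simplification, since a real page does not rank for every query every day.

```python
import itertools
import random
from datetime import date, timedelta

pages = ["https://example.com/a/", "https://example.com/b/"]  # hypothetical URLs
queries = ["query x", "query y", "query z"]                    # hypothetical queries
start = date(2024, 1, 1)                                       # arbitrary start date
days = [start + timedelta(days=i) for i in range(3)]

rng = random.Random(0)  # fixed seed for reproducibility
rows = []
for page, query, day in itertools.product(pages, queries, days):
    impressions = rng.randint(10, 1000)
    clicks = rng.randint(0, impressions // 10)  # keep clicks well below impressions
    rows.append((page, query, day.isoformat(), clicks, impressions))

# The same page/query pair shows up once per day, each time with its own numbers.
same_pair = [r for r in rows if r[0] == pages[0] and r[1] == queries[0]]
for r in same_pair:
    print(r)
```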

⚠️ I don't use position except when I need to create pivot tables to count queries over time (check issue #1).

Generating such a metric is a little bit harder but we will see how in the next issues ;)

💡 Using Coding To Solve The Issue

Coding comes to our rescue once more, just as many of you were about to give up.

We have some idea of how we want our data to look, so now we should generate it.

What I've just said can be translated into code you can reuse when needed.

For this specific use case, I prefer R over Python because it makes more sense to me but anything goes.

The hardest part is generating queries that actually make sense when read, so for this you need to do some hard work.

You can create a list of words you want to combine to create different fictional combinations.
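A minimal Python sketch of that idea follows (not the R code from the Colab notebook). The word lists and the `build_queries` helper are hypothetical; the point is only that a few short lists multiply into many readable combinations.

```python
import itertools

# Hypothetical word lists; swap in terms from your own niche.
modifiers = ["best", "cheap", "how to choose"]
topics = ["running shoes", "trail shoes"]
suffixes = ["", "for beginners", "for women"]


def build_queries(modifiers, topics, suffixes):
    """Combine one entry from each list into a single query string."""
    queries = []
    for parts in itertools.product(modifiers, topics, suffixes):
        # Skip empty parts so an empty suffix doesn't leave a trailing space.
        queries.append(" ".join(p for p in parts if p))
    return queries


queries = build_queries(modifiers, topics, suffixes)
print(len(queries))
print(queries[:3])
```

Not every combination will read naturally, so it is still worth skimming the output and pruning the odd ones by hand.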

🔗 Google Colab Link With R Code

If you have doubts, reply to this email. In any case, don't worry, Part 2 will make a lot of things clearer.

💡 The SEO Insights

I have just shown you that there's no excuse to delay learning or testing with data.

You can prepare some scripts or analyses in advance to persuade clients to give you data access.

I think you shouldn't even accept the work in such cases, but you know, sometimes you just need persuasion.

This practical exercise is also good to understand how data may be generated in practice.

Understanding how metrics work and what could be realistic values for them is a must for troubleshooting.

❓ Are there other methods?

Yes, I have only shown you the tip of the iceberg.

Marketing literature has many examples of generating synthetic data and it's not as straightforward as you imagine.

✅ Probability distributions are the best way to think about metrics... and this is consistent with how many professionals generate data.
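To make the idea concrete, here is a small Python sketch using only the standard library. The specific choices are assumptions for illustration, not the method from this newsletter: impressions come from a log-normal distribution (many small values, a few large ones), each row gets an underlying CTR from a beta distribution, and clicks are simulated as one yes/no trial per impression. The function name `sample_metrics` and every parameter value are hypothetical.

```python
import random


def sample_metrics(rng, mu=4.0, sigma=1.0, alpha=2.0, beta=30.0):
    """Draw one (clicks, impressions, ctr) triple from simple distributions."""
    # Log-normal impressions: skewed to the right, always at least 1.
    impressions = int(rng.lognormvariate(mu, sigma)) + 1
    # Underlying click probability for this row; Beta(2, 30) averages about 6%.
    true_ctr = rng.betavariate(alpha, beta)
    # Binomial clicks: each impression independently turns into a click or not.
    clicks = sum(rng.random() < true_ctr for _ in range(impressions))
    return clicks, impressions, clicks / impressions


rng = random.Random(7)  # fixed seed for reproducibility
samples = [sample_metrics(rng) for _ in range(1000)]
mean_ctr = sum(ctr for _, _, ctr in samples) / len(samples)
print(samples[:3])
print(round(mean_ctr, 3))
```

Changing `mu`, `sigma`, `alpha` and `beta` is a quick way to see how different assumptions shift what counts as a realistic value for each metric.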

But again, this topic will be covered in the next issues!

P.S. Thanks for reading! I recommend you check the resources because I mention one great library to generate fake data!


👥 Launching a Community (Join The Waitlist)

Some friends and I have decided to launch our own SEO community. It won't be about Analytics only, as we will cover everything about SEO.

For sure, we will preserve the focus on data skills because that's the future of SEO!

🔎 Analytics For SEO Ebook (v2)

This ebook is aimed at SEOs or Business Owners who want to explore the combination of SEO and Analytics.

It will teach you or your employees to:

👉 Avoid common pitfalls that cost you money 💸

👉 Create meaningful analyses that add value 💯

👉 Shorten the learning time of Analytics ⏳

This comes with monthly updates because I want it to be the ultimate guide out there.

The April update includes the following new information:

✅ Categorize Pages

✅ More on Content Audits

✅ Handling Large Files

v3 (coming out in a few days) will feature:

  • Quick And Simple Way Of Detecting Keyword Cannibalization
  • Statistical Inference And Statistics (Update)
  • Update For Use Cases 2 and 5
  • Going Deeper With Analysis (Google Analytics, Screaming Frog, etc.)
  • R Approach To Some Problems

📚 Recommended Reads

This week there are some peak recommendations you don't want to sleep on:

The first 2 reads were recommended by Benjamin Crane and honestly... it's peak quality!

❗️ Feedback and Recommendations

If you have ideas/recommendations for the next issues of Seotistics, you can simply reply to this email.

Marco Giordano
SEO Specialist & Data Analyst

by Marco Giordano

The Seotistics newsletter is written by Marco Giordano, an SEO Specialist focused on content and a Data Analyst. Tired of the usual SEO content? Seotistics teaches you how to use Analytics and data in your workflow while helping you with Content Management & Strategy.
