
What SEOs need to know about testing AI-generated content

Posted on August 29, 2024 by Demetria Spinrad

Start here: how our SEO split tests work

If you aren't familiar with the fundamentals of how we run controlled SEO experiments that form the basis of all our case studies, then you might find it useful to start by reading the explanation at the end of this article before digesting the details of the case study below. If you'd like to get a new case study by email every two weeks, just enter your email address here.

For this week's #SPQuiz, we asked our followers about one of the hottest topics in content strategy. Some SEOs have said they've seen huge boosts to organic traffic by adding AI-generated content to their sites, but others believe the limitations of generative AI outweigh the benefits. We wanted to know how our followers, who are interested in a data-driven approach to search engine optimization, feel about using AI-generated content on their sites.

Here’s what they thought:

Poll results (screenshot of responses from our X/Twitter and LinkedIn polls)

The majority of voters on X/Twitter and LinkedIn thought that AI-generated content could have some benefit for SEO, but that it should be used with caution. As a data-driven testing platform, we definitely have an audience of SEOs who want to see any kind of new content tested thoroughly before deployment!

If you’re reading this, you’ve probably already heard plenty of hype about content generated by large language models, the latest in a long line of tools sold to the public as “artificial intelligence.” Some hyperbolic advisors in the world of search engine optimization have advocated for filling your site with huge amounts of AI-generated content, while some commentators are sure that all this machine-made content flooding onto the internet is going to make search engines functionally useless.

At SearchPilot, we don’t think AI is likely to save or destroy the search industry as we know it, but we do think that strategically incorporating AI-generated content into your testing program can make a huge difference to your bottom line. 

The trick is making sure that you understand what large language models can and can’t do, so you can use AI-generated content in a way that benefits your traffic and your users’ on-page experiences without risking the long-term stability of your site’s rankings.

Where is the content created by AI actually coming from?

To make good judgement calls about which types of pages can benefit from AI-generated content, it’s important to start with a solid understanding of what large language models actually do. Large language models, or LLMs, use machine learning to perform natural language processing tasks. 

When you enter a prompt into ChatGPT or any other LLM, the response you get back is generated by drawing on patterns learned from a huge amount of training data to produce a natural-sounding output. This means you can use a large language model to create amazingly lifelike text that reads just like something a human might write. But it doesn’t mean the LLM has a genuine, human-like understanding of the prompt you entered or the output it created.

The same is true of generative AI models that can produce images. Just check out the featured image for this blog post: the model produced a very cute picture of a robot in our brand colors with a rocket ship in the background, just as we asked it to. But if you look closely at the image, you can see that the model has no inherent understanding of how rock formations work, how many stabilizer fins a rocket should have, or what shape a planet should be. Those discrepancies are acceptable for our purposes, since we only needed an image for a blog post about SEO testing, but we'd be in trouble if we were an aerospace engineering company using this image to show off our rocket designs.

This makes AI a great tool for refreshing or enhancing human-written content, turning boilerplate templates into unique content, or generating meta content for thousands of pages on a large site. With carefully considered prompts and the right training materials, an LLM can take a simple prompt and expand on it in your brand’s voice. 
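
To make that concrete, here's a minimal Python sketch of the kind of workflow we're describing, using the OpenAI Python client as one example of an LLM API. The brand name, the style-guide prompt, and the model choice are all placeholder assumptions, and the output is only a draft for a human editor:

    from openai import OpenAI  # assumes the `openai` package and an OPENAI_API_KEY environment variable

    client = OpenAI()

    # Hypothetical style-guide prompt; in practice this would come from your brand guidelines.
    BRAND_VOICE = (
        "You write product copy for ExampleBrand: warm, concise, no superlatives, "
        "and never promise features the product does not have."
    )

    def expand_boilerplate(facts: str) -> str:
        """Turn a terse, factual template into on-brand copy. The result is a draft
        that still goes through human editorial review before publication."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any capable chat model would work here
            messages=[
                {"role": "system", "content": BRAND_VOICE},
                {"role": "user", "content": f"Rewrite these product facts as a short paragraph:\n{facts}"},
            ],
        )
        return response.choices[0].message.content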

But before any of that AI-generated content goes live on your website, it still needs to be reviewed by a team of human editors. Even an LLM trained specifically in your brand’s style and subject matter doesn’t understand your products or your company’s mission the way a human can. You also need editorial oversight to make sure that content produced by an AI isn’t making false promises about your product, describing a competing product instead of your own, or otherwise “hallucinating” (a term for the tendency of these models to introduce falsehoods into their output because some of their training data included errors, jokes not intended to be taken literally, or content the model failed to place in its proper context).

Successful SEO testing with AI-generated content

We’ve already seen a variety of positive results from clients testing AI-generated content. We’ve even had the opportunity to write one of our successful tests up as a case study. One of our customers in the travel industry saw a predicted 12.6% uplift in organic traffic by adding AI-generated content to their pages. That’s a convincing signal that Google can consider content created by large language models to be valuable and relevant to your pages.

There’s more than one way to test content created by generative AI. We recommend incorporating a variety of tests using AI-generated content into your SEO testing program. Here are a few examples of the kinds of tests you could run using content created by LLMs and other forms of generative AI.

Title tag tests

When we start looking at a roadmap for testing with a new SearchPilot client, we always recommend running at least one title tag test early on in the process. Google considers the content of title tags to be an important signal about the purpose of pages, helping the algorithm determine which search queries pages should rank well for. When Google respects a page’s title tags in the search results, enticing title tag content can also have a big impact on how many users click through to a page from search results. Testing AI-generated content in title tags is a low-effort way to make a big difference in your organic traffic.
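
To show the mechanics, here's a simplified Python sketch of how a templated variant title might be served to only half of a page group. The hash-based 50/50 split and the travel-style template are illustrative assumptions, not a description of how SearchPilot assigns buckets:

    import hashlib

    # Hypothetical variant template for a travel site; control pages keep their existing titles.
    VARIANT_TITLE = "{hotel} in {city} | Free Cancellation | ExampleBrand"

    def bucket(url: str) -> str:
        """Deterministically assign each page to control or variant with a simple 50/50 hash split."""
        digest = int(hashlib.md5(url.encode("utf-8")).hexdigest(), 16)
        return "variant" if digest % 2 == 0 else "control"

    def title_for(page: dict) -> str:
        """Serve the templated title only to variant pages so the two groups can be compared."""
        if bucket(page["url"]) == "variant":
            return VARIANT_TITLE.format(hotel=page["hotel"], city=page["city"])
        return page["current_title"]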

Meta description tests

While meta descriptions don’t have a direct impact on rankings, they’re an important piece of information Google can use when deciding which description to show for a page in the search results. Writing unique meta descriptions for thousands of pages can eat up a large amount of copywriting resources, so this is a great candidate for outsourcing to AI with editorial oversight.

To test whether variant AI-generated meta descriptions can increase your click-through rates, we sometimes recommend using the data-nosnippet attribute to encourage Google to show the meta description in search results instead of pulling from other page content. For other tests, changing meta descriptions without using data-nosnippet can give you a more holistic picture of whether Google is actually respecting your meta descriptions.
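
As a rough illustration, here's what a variant of that kind might involve in a Python sketch: setting the new meta description (capped at a typical snippet length) and marking a boilerplate block with data-nosnippet. The template marker, the CSS class, and the 155-character cap are assumptions for the example:

    MAX_DESCRIPTION_LENGTH = 155  # rough length before Google tends to truncate snippets

    def apply_variant(html: str, meta_description: str) -> str:
        """Illustrative string handling only; a real test would edit templates or use a testing platform."""
        description = meta_description[:MAX_DESCRIPTION_LENGTH].rstrip()
        meta_tag = f'<meta name="description" content="{description}">'
        html = html.replace("<!-- META_DESCRIPTION -->", meta_tag)  # hypothetical template marker

        # Mark a block we don't want Google to pull snippet text from,
        # nudging it to show the meta description instead.
        html = html.replace(
            '<div class="related-links">',
            '<div class="related-links" data-nosnippet>',
        )
        return html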

Page content tests

When you have thousands or even tens of thousands of pages on a large site, it can be a big project to create enough useful content for all of them. AI can help you generate unique content and turn boilerplate templates into more appealing descriptions. It can also help you refresh old, stale content.

While AI can help you with content generation, it doesn’t know your products better than you do. Any tests of AI-generated content that will be seen on your website’s pages need close attention from human editors to make sure that you’re not putting inaccurate or incomplete content where users can see it.
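
One lightweight way to enforce that is to make editorial review a hard gate in your content pipeline. This is just a sketch of the idea in Python, not a specific tool recommendation:

    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class Draft:
        page_url: str
        body: str
        status: str = "pending_review"      # AI output never ships while it's in this state
        reviewed_by: Optional[str] = None
        reviewed_at: Optional[datetime] = None

    def approve(draft: Draft, editor: str) -> Draft:
        """Record the human sign-off required before the content joins a live test."""
        draft.status = "approved"
        draft.reviewed_by = editor
        draft.reviewed_at = datetime.now(timezone.utc)
        return draft

    def publishable(draft: Draft) -> bool:
        return draft.status == "approved"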

Image alt text tests

Google values alt text on images, both as an accessibility feature and as a way for its algorithm to help determine the content of an image and its relationship to other content on the page. Writing sufficiently descriptive alt text for thousands of images can be a heavy load on a copywriting team, so an AI trained in generating image descriptions can help you speed up this process. 

Be aware, however, that AI image recognition is an emerging field and the output of these programs needs to be checked by editors. If your pictures are specific and important to your brand, you may need more than what an image recognition AI can provide; for example, the AI may successfully describe a picture of a woman wearing a red coat, but fail to recognize that the coat has a specific style and brand name associated with it.
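
As a sketch of how that workflow might look, here's a hedged Python example that asks a vision-capable model for draft alt text via the OpenAI client. The model choice and prompt are assumptions, and the output is only a starting point for your editors:

    from openai import OpenAI  # same assumptions as the earlier sketches

    client = OpenAI()

    def draft_alt_text(image_url: str, product_name: str) -> str:
        """Ask a vision-capable model for first-draft alt text; an editor still checks
        brand-specific details (style names, model numbers) the model can't know."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any vision-capable chat model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Write one concise sentence of alt text for this image of {product_name}. "
                             "Describe only what is visible."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        )
        return response.choices[0].message.content.strip()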

Text translation tests

Expanding into new markets can be a challenge when you need to translate millions of words into a new language. Translation is a slightly different field from generative AI, but modern translation programs use machine learning to translate words in the context of phrases rather than attempting to find a match for each word in a sentence individually. Over the last decade, this has allowed for stunning breakthroughs in the speed and accuracy of translations, as well as emerging capabilities like recognizing and translating text within images.

Adding translated content to pages in non-English speaking markets can result in big wins for organic traffic and conversion rates. As with all content tests, we recommend having a team of human editors check the content you’re adding before it goes live, since you want to make sure that all of your content is accurate and matches your brand voice in every language.
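
For illustration, here's a minimal Python sketch of a translation draft that keeps protected brand terms in English. The terms, the model, and the use of an LLM for translation (rather than a dedicated machine translation service) are all assumptions; a native-speaking editor still reviews the output:

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical brand terms that must stay in English in every market.
    PROTECTED_TERMS = ["ExampleBrand", "Price Match Promise"]

    def draft_translation(text: str, target_language: str) -> str:
        """Produce a first-pass translation for editorial review, leaving protected terms untouched."""
        keep = ", ".join(PROTECTED_TERMS)
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": f"Translate the user's text into {target_language}. "
                            f"Do not translate these terms: {keep}."},
                {"role": "user", "content": text},
            ],
        )
        return response.choices[0].message.content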

Image tests

Image generation is a rapidly evolving field in the wider world of artificial intelligence. While early models could only produce surreal, dreamlike pictures, powerhouses like DALL-E, Midjourney, and Stable Diffusion can now create crisp and engaging images that look just like real photos. If you’re looking closely at your conversion rates, trying out AI-generated pictures as featured images or header images may produce a better on-page experience than generic stock photos. Your graphic design team may also be able to test new image templates quickly using AI-assisted workflow tools like generative fill, spot healing, and image composites.
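
If you want to experiment with this, here's a minimal Python sketch that requests a candidate header image from the OpenAI Images API, one example of a generator. The model, prompt, and size are placeholders, and anything it produces should be reviewed by your design team before it goes anywhere near a live test:

    from openai import OpenAI

    client = OpenAI()

    def generate_header_image(prompt: str) -> str:
        """Generate one candidate header image and return its URL for review."""
        result = client.images.generate(
            model="dall-e-3",
            prompt=prompt,
            size="1792x1024",  # a wide format suited to header images
            n=1,
        )
        return result.data[0].url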

We’d recommend caution around using AI in product photos. Touching up photos with AI tools can produce cleaner images, but scammers’ use of AI to trick people into buying low-quality products based on high-quality images has made users suspicious of certain kinds of ecommerce product listings. Try testing a variety of images, including pictures of models actively using your products in lifestyle photographs.

How to set up a robust testing program for your AI-generated content

Do more than just copy the competition

When you’re competing for a high-value keyword, it’s tempting to look at what your competitors are doing so you can create content with the same structure. But just because Google has chosen a particular page as the best available result for a user’s search query doesn’t mean it’s the best possible result anyone could create.

Instead of relying on AI to copy what the competition is doing, think about the kind of content you believe Google is looking for, and be prepared to put those beliefs to the test with a data-driven approach. Would an article that gets straight to the point be better for users than one with a meandering lede? Is there a way to make your headings clearer and more informative? Would users like to see an overview of the subject or a detailed step-by-step breakdown? Thinking clearly about how you want to prompt your LLM instead of asking it to rehash existing content can help your testing program start off on the right track.

Go beyond chasing keywords

Yes, everyone wants to rank better for high-value keywords. But if the content on your pages isn’t actually what users want to see, they won’t stick around on your site. Even if you temporarily win a coveted ranking, you’re at risk of losing it again when someone produces the kind of content that users actually want to see.

Think about your SEO testing program as only one part of a holistic approach to your site’s health. We recommend full funnel testing to see how users’ experience on your site is impacted by the AI-generated content you’re adding to your pages. Do they stick around longer when there’s more to read, or do they bounce when they hit a wall of text? Is all that extra content encouraging them to convert, or is it so overwhelming that they leave without completing any goals?

Plan on more than one type of test

When you have a really great test idea for fresh content that your whole team loves, it’s tempting to try it over and over until you get winning results. It’s true that in SEO, finding just the right format or phrasing for something as simple as a title tag can sometimes result in a huge increase in organic traffic.

In our experience, testing the same element over and over until it’s “perfect” leads to diminishing returns. Instead, look holistically at all the elements on your site that you can test, and incorporate your content tests into a testing program that also includes schema, metadata, internal linking, canonicals, and other types of tests.

Make sure you cool down between tests

Now that you have access to these fantastic text generation tools, it’s tempting to try to test every possible iteration of new content as quickly as possible. Although Google hasn’t released exact guidelines on what it considers to be a spammy use of AI-generated content, recent core updates have penalized sites that added large amounts of content quickly in a way that could be perceived as an attempt to manipulate rankings.

To get the most out of your testing program, we recommend at least a one-week cooldown between tests in the same site section, so that you’re working from accurate data about how your pages perform without interventions. As we said above, it’s also a good idea to rotate between test types instead of testing different versions of the same content element back to back.
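
A simple way to keep yourself honest about that cooldown is to encode it in whatever scheduling you use for your testing roadmap. Here's a tiny illustrative Python check; the one-week constant just mirrors the recommendation above:

    from datetime import datetime, timedelta, timezone
    from typing import Dict, Optional

    COOLDOWN = timedelta(weeks=1)  # minimum gap between tests in the same site section

    def can_start_test(section: str,
                       last_test_end: Dict[str, datetime],
                       now: Optional[datetime] = None) -> bool:
        """True once the previous test in this section ended at least a week ago."""
        now = now or datetime.now(timezone.utc)
        previous = last_test_end.get(section)
        return previous is None or now - previous >= COOLDOWN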

Be willing to consider more than content when testing page elements

Sometimes, a negative content test isn’t about the content at all. Adding new content to a page can mean you’ve built an entirely new element, pushed other elements further down the page, or replaced content that users found helpful.

If you see unexpected results from your content tests, consider testing that content again in a different format or a different location on the page after a sufficient cool-down period. You may find out that the content itself has a positive impact on your rankings, and the first round of results was actually a signal that you’d changed the page experience or displaced other content on the page that Google considered more relevant.

Make sure you have editorial oversight of your AI-generated content

Large language models are a great way to improve the workflows of your content team, but they’re not a replacement for the whole department. Google, and your users, have to take your word for it that the content you put on your pages about your business is accurate and up-to-date. We recommend having your team run quality checks on any content that’s going to be live on your site; don’t assume that a large language model understands your business better than you do.

Be prepared to retest

As generative AI evolves and black hat SEOs find ways to exploit cheap tools to temporarily win rankings, Google is constantly updating its algorithm to make sure that users can get to the pages they intended to search for instead of landing on unhelpful, spammy sites. At the same time, companies like OpenAI are refining the models they use to generate text and incorporating more content from all over the internet into their training data. And while all this is happening, your competitors will also be testing AI-generated content on their own sites to see if they can pull ahead in rankings for valuable keywords.

This means that the way Google evaluates content created by large language models is in flux, the content those models can create is changing rapidly, and your competitors can add content to their sites faster than ever before. Don’t assume that the test that was a winner six months ago is still guaranteed to boost your rankings today. Be willing to go back and retest winning results to make sure you’re still sending the right signals to Google about which keywords you should rank for.

What are the potential pitfalls of using AI to generate content for your testing program?

If you need to generate a lot of text quickly, a large language model can help your copywriting team get work done fast. But just because you can get content finished quickly doesn’t mean that immediately testing all that content at once is a great idea.

1. Don’t assume that “more is better” when it comes to site content. 

Avoid tests that will cause a sudden massive increase in content across many pages. While Google does not automatically penalize all AI-generated content, recent updates have discouraged sites from suddenly putting up huge amounts of new content at once in a way that may appear spammy. 

Sometimes, you can even improve your rankings by removing content if that content is unhelpful, keyword-stuffed, stale, or irrelevant. Make sure that you test removing content as well as adding it to get a full picture of what’s actually valuable on your pages.

2. Don’t use AI-generated content for every part of your site.

Be cautious when using AI-generated content in any part of your site that requires legal language or makes a legally binding promise to your customers. Large language models aren’t experts in your product or in the law. The writing they produce might look convincing to users, but it’s on your team to check to make sure that it’s actually correct. 

We’ve seen some recent high-profile legal cases about AI chatbots making promises about deals and discounts to customers that the company didn’t actually intend to offer, so be extra careful about any feature of your site that might generate text without human oversight.

You should also avoid tests that rely on being able to get topical content from large language models immediately after news breaks. Even the most cutting-edge generative AI companies will not be able to report on breaking news stories, because the data they’re working with stops before the present day’s news was published. While LLMs can produce convincing articles that may look like news stories, they’re not actually reporting on current events.

3. Avoid tests that add content to multiple elements at the same time.

Remember that the best A/B testing methodology involves changing just one element at a time so you can measure its impact. Instead of changing all the content on a page at once, test content elements individually to make sure the changes you’re making are actually producing meaningful results.

4. Don’t look at your SEO test results in isolation.

Even if you see some positive SEO test results, don’t just assume that your AI-generated content is increasing revenue for your business. We recommend incorporating your content tests into a holistic testing program that includes conversion rate optimization as well as SEO testing. You don’t want to put yourself in a situation where you’re seeing SEO wins from adding large content blocks to your pages without considering the impact of the page experience on your users. We recommend using our full funnel testing methodology to see your test’s impact on organic traffic and user behavior.

5. Don’t get too attached to your new content before you test it.

Don’t get too attached to new content before you see positive test results, even if your team loves it. The purpose of SEO testing is to get objective data on whether making a change to a set of pages will impact organic traffic. That means that we always have to be prepared for unexpected results, including negative results for tests that we thought would be winners.

6. Don’t assume that a negative result means the new content is worse.

Consider that the full impact of adding content to a page may go beyond the actual text. Adding a new content block that slows down the page, results in layout shifts on page load, interferes with users’ conversions, or is hidden from view on page load may not produce the results that you’re expecting.

If you’re not sure what produced an unexpected result in your SEO testing, consider retesting the same content in a different page element or a different location. You may find that the content itself is great, but something else about its presentation on the page caused a rankings drop.

By strategically testing AI-generated content and avoiding common pitfalls, you can enhance your site’s SEO performance. Always remember to prioritize user experience and maintain rigorous editorial oversight to safeguard your brand’s reputation.

How our SEO split tests work

The most important thing to know is that our case studies are based on controlled experiments with control and variant pages:

  • By detecting changes in performance of the variant pages compared to the control, we know that the measured effect was not caused by seasonality, sitewide changes, Google algorithm updates, competitor changes, or any other external impact.
  • The statistical analysis compares the actual outcome to a forecast, and comes with a confidence interval so we know how certain we are that the effect is real (a simplified sketch of this idea follows this list).
  • We measure the impact on organic traffic in order to capture changes to rankings and/or changes to clickthrough rate (more here).
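
To make the forecast-versus-actual comparison above concrete, here's a deliberately simplified Python sketch. It is not SearchPilot's statistical model; it just fits the pre-test relationship between control and variant traffic, forecasts what the variant would have done during the test, and compares that forecast to what actually happened:

    import numpy as np

    def estimate_uplift(control_pre, variant_pre, control_post, variant_post):
        """Toy counterfactual analysis on arrays of daily organic sessions (illustration only)."""
        control_pre, variant_pre = np.asarray(control_pre, float), np.asarray(variant_pre, float)
        control_post, variant_post = np.asarray(control_post, float), np.asarray(variant_post, float)

        # Pre-period relationship: variant ~ slope * control + intercept
        slope, intercept = np.polyfit(control_pre, variant_pre, deg=1)

        forecast = slope * control_post + intercept   # what the variant "should" have done
        effect = variant_post - forecast              # daily difference attributable to the change
        mean_effect = effect.mean()

        # Rough 95% interval based on how well the pre-period fit predicted the variant
        residuals = variant_pre - (slope * control_pre + intercept)
        se = residuals.std(ddof=1) / np.sqrt(len(effect))
        return mean_effect, (mean_effect - 1.96 * se, mean_effect + 1.96 * se)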

Read more about how SEO A/B testing works or get a demo of the SearchPilot platform.

Sign up to receive the results of two of our most surprising SEO experiments every month