Prompt-based experimentation is already redundant, so where's the real AI value?

In my humble opinion, prompt based experimentation is redundant already.

This was an uncomfortable one to write, so it probably won't be fun to read. But it's an important topic, so let's explore it in more detail.

On Linkedin across a series of 6 posts, I've been exploring Claude Code's Chrome Extension as an AI-driver for your web browser. And, specifically it's ability to:

- Write experiment code, based on minimal instructions.

- Interpret experiment results.

- Operate our UI.

Why did I explore this topic?

It costs around $20/mo per seat. And compared to what I see other platforms charging for their AI services, this is peanuts.

It's also an agnostic platform - one of the best on the market today, but it's not "sandeep's random side project that only works on Tuesdays". It's built by an established company, and something most of you can get access to.

A lot of you can't write code, but aren't afraid to "handle" it. Many of you can't interpret difficult experiment results, but can verify what you're being told. And most of you, I daresay, can't be bothered to spend an hour doing the same checkin every morning for your live tests. 

That's who I see a lot of these AI features being aimed at. Whether it's PBX, Opal, MCPs - we're helping the same category of user to "punch-up" or move faster. You’re not helping a Lead Dev make something that would normally take them 30 mins. It’s for the people who would manage to figure things out if they spent a day looking into it, plus help from ChatGPT.

What did I find?

My experiments were controlled, purposeful and I was looking for specific things. 

When it came to writing code, I wanted to know if it’d handle abstract problems, vague instructions, rough mockups - things that would form the base level of Project Plans from people who’ve used the platform over the years.

On every account, it was great. At times, showing me up.

One early challenge surprised me in particular - I got it to change a PLP layout from 4 products across to 5. I’d asked Claude to do this 6 months ago, and it managed with around 20 lines of CSS which I thought was reasonable. This time round - it was 3.

Its ability to interact with the page as part of the transformations shows it’s not just there for basic changes either. “When I click this link, go and open up the filters, find the relevant one and make sure it’s selected” isn’t a trivial change. One that’s beyond the capability of most if not all Visual Editors.

A feature which I think is where it’s biggest value comes from, was its ability to take screenshots, reason over them, and then continue to poke and probe, which I saw repeatedly and was excellent.

UI Homepage

But actually, the bit of its functionality that really surprised me was its ability to use the console. Apply some JS, get console logs back out, reason over them and do it over and over again to success. “It should be blue, let me go inspect the site and figure out what a good shade of blue will be” - these are things that developers are rarely told, and we spend time figuring out. And here - it did it all on autopilot.

To summarise - every test, whether it was writing code for XS to M sized tests, interpreting challenging results, or operating our UI, was an overwhelming success.

This puts all platforms in a weird place

Everyone thinks their “AI Moat” is to build these proprietary features. Trained with data only they hold (except there’s a lot of us platforms in the market). And then charge people significant money to use them. I don’t just mean “those guys” - I’m talking about our platform too.

But when something (Claude Code) with no historical data of yours can still achieve the same result - where is their (and my) value? Claude is only going to get better, and probably at a rate faster than we ever will. 

Smaller platforms have the advantage of being able to iterate quickly for you, and handle really advanced use-cases. Claude requires you to build your own "Skills" of internal knowledge and if it doesn't work - too bad.

There is perhaps a niche for some agency team to become known for writing you bespoke Claude Skills - if anyone wants to take that idea and gets paid, you owe me a pizza. 

But for bigger platforms, or those that can’t iterate at the drop of a hat, where’s the unique selling point? I’m really not sure right now.

So where can we provide value?

I don’t think it’s reasonable to say “my prompt engine is better than yours” anymore. They’re all redundant, and Claude Code is just as good as anything else I’ve seen. Even with intentionally vague instructions and zero “Skills”, it performs wonderfully.

The approach of Webtrends Optimize was, and still is, to provide you with access to tooling, models and pipelines that you can't handle through a chatbot. And from everything I've seen, this is still valuable.

Predictive heatmaps, trained on eye-tracking studies, is not something you can operate from a chat interface. It's an actual service we run.

Visually Similar Product Recommendations gives you access to AI embedding models and databases that most teams aren't capable of running themselves.

AI Survey analysis - same deal. We glue together a bunch of models, and create a solution that scales to millions of responses at low cost. You'd definitely hit API limits in Claude trying to do even 1/100th of that.

Peer reviews, and understanding what proactive support actually means.

Presenting experiment results back to stakeholders, and teaching you what good looks like from a team that's been doing this for 15+ years.

In a nutshell, AI with purpose.

There is no value in building the best chatbot, when the alternative is cheap and great. It’s a bubble that only exists from a currently new market, but as people get smarter that won’t last. There is, however, value in infrastructure, pipelines, and getting tooling into your hands. This is something I don’t yet see any other Experimentation platform orient themselves to provide.

This is good for (almost) everyone

You want to get more experiments live, analysed and concluded each year. Velocity is a perfectly reasonable success metric for your programmes. Having an AI click about, figure things out and work on autopilot undoubtedly means you're churning through more experiments each year.

Specialists want to do valuable work. But are often stuck babysitting the next campaign, tiny UX tweaks etc. - things that don't require much skill, but are still important enough that it needs to be done. If the AI can do 70% of this basic work for you, you're actually able to demonstrate experience and prove your worth with valuable, transformative work.

It's scary, but I also don't ever want to build button tests for anyone after 14+ years and 10,000s of hours of doing this. That’s your job now, or the AI’s job for you.

If you launch twice as many tests per year, find twice as many winners, and get twice as much value from our platform - this feels like a win.

The real question - Sovereign AI?

Throughout all of this, I was acutely aware of just how much data was being fed into Claude. Every experiment run, the results, my test ideas - to get it to work for me, it needs to know all of these things. And that’s very uncomfortable for me.

Be it tooling you'll find from Optimizely which uses Gemini (and soon, Claude) or Kameleoon and Statsig who use OpenAI for their PBX, or VWO/AB Tasty who likely use Gemini and OpenAI - everyone’s use raises massive security concerns for me.

You're running programmes to patch over vulnerable parts of your site; you're getting survey responses which could be hugely negative, and this is all feeding a model you have no control over. ChatGPT had data leaks in 2023, and Deepseek did in 2025. “It’ll never happen to me” is only accurate until it happens to you.

It might not be obvious either, but when your competitor fires up a chat and asks "what do customers raise as the biggest concerns with [your company]?" it might have answers in the future.

"And what are they trying to do to address it?" - they might have answers in the future.

Our answer to this is straightforward. Use models that we host and run, and so your data doesn't leave our infrastructure. And it should go without saying, but don’t use it to train anything.

We’ve been doing this since 2022 with our first batch of Product Recommendations. We do this again today with AI Predictive Heatmaps and AI Survey Analysis. And it’s something we take seriously for every other feature we build - if self-hosted models do the job well, we don’t want to send data outwards.

Two routes from here

If you’re going to use tech from a vendor that wraps Claude/Gemini/OpenAI, you might as well use Claude Code. It’s faster, flexible, and considerably cheaper to run especially the more you do. 

If you care about your data and don't want to leak all your problems to the world (bit dramatic - sorry), use a who takes Sovereign AI seriously. And at the minute, Webtrends Optimize appears to be the only one.

UI Homepage