#65: Risks of Consumer-Facing AI
When AI fails the consumer, brand trust is eroded
Ever asked ChatGPT a question and gotten an answer you know is wrong? Maybe the reasoning is off, or it cites an event that’s clearly made up. It happens occasionally. Thankfully, the frequency of wrong answers has gone down over time, and I can now depend more reliably on the responses I get. Still, I operate under the mindset of “trust, but verify.”
Foundational model companies like OpenAI, Google, and Anthropic are investing billions of dollars in developing large language models (LLMs) that simulate human reasoning, and it will take billions more to fine-tune them for greater accuracy. Now, why do people give ChatGPT a pass when it’s wrong? My guess is that the technology has evolved so quickly that users are willing to give it a grace period.
The fact that AI can still arrive at the wrong answer creates opportunities for improvement. Many companies are addressing this by adding more accurate and relevant data on top of foundational models. This happens within the vertical application layer, which sits between the foundational model and the end user. Vertical-layer companies leverage a model like OpenAI’s GPT-5 or Anthropic’s Claude Opus 4.1 as the core knowledge base, then add proprietary data through retrieval-augmented generation (RAG): at query time, the system retrieves the most relevant proprietary documents and feeds them into the model’s prompt, so the answer is grounded in that data rather than the model’s general knowledge alone.
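To make the pattern concrete, here is a minimal sketch of RAG in Python. Everything in it is a simplified stand-in rather than any vendor’s actual implementation: the knowledge base is two hard-coded sentences, the retriever is a naive keyword-overlap scorer (a real system would use vector embeddings), and `call_llm` is a hypothetical placeholder for whichever foundational-model API a vertical-layer company actually calls.

```python
# Minimal RAG sketch: retrieve proprietary context, then ground the model's answer in it.
from typing import List

# Proprietary documents the vertical-layer company adds on top of the foundational model.
KNOWLEDGE_BASE: List[str] = [
    "Our niacinamide serum should not be combined with vitamin C in the same routine.",
    "Customers with sensitive skin should patch-test retinol products for 48 hours.",
]

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by naive keyword overlap with the query and return the top k."""
    query_terms = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(query_terms & set(d.lower().split())), reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real foundational-model API call."""
    return f"[model response to: {prompt[:60]}...]"

def answer(question: str) -> str:
    """Build a prompt that constrains the model to the retrieved proprietary context."""
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer("Can I combine the niacinamide serum with vitamin C?"))
```

The design point is simply that the base model stays generic while the vertical layer decides what context it sees, which is where the accuracy gains (and the accountability) live.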
Even with RAG, fine-tuning, and other accuracy-boosting techniques, implementing AI into company workflows or customer journeys carries risk. The level of risk depends on whether the company serves businesses (B2B) or consumers (direct-to-consumer, or DTC). Accuracy always matters, but it matters less when experts can interpret the results and much more when consumers can’t.
Let’s start with B2B companies. Say you’re developing AI for finance or law, like Hebbia or Harvey. Your users are bankers and lawyers, and for them, relying on ChatGPT’s or Gemini’s output is risky when they can’t be sure the answer is correct. Their clients aren’t paying for work that’s only 80% right.
Hebbia and Harvey are B2B software companies that have invested hundreds of millions to help finance and law professionals close the gap between 80% accuracy and 100%. They can’t yet guarantee perfect accuracy, but that’s acceptable because the end user is a skilled professional who can validate the AI’s output.
But what happens when the end user isn’t a professional? If there’s a mistake in the output, or even the prompt itself, what are the consequences of an incorrect answer?
DTC brands have lower tolerance for error when implementing AI into their customer journey because consumers usually don’t have the expertise to vet the credibility of an AI response. Their customers trust the brand to provide accurate and relevant information that guides their buying decision. This applies whether AI is part of the buying process or not. For instance, I filled out a questionnaire from Gainful and, based on their recommendations, purchased a specific protein powder. I relied on the brand’s guidance rather than deep personal knowledge. In case you are wondering, I’m happy with my purchase.
It gets riskier when DTC brands integrate AI into recommendation engines. Take La Roche-Posay’s personal skin analysis product, MyRoutine AI. You upload a photo, answer a few questions about your age and skin type, and receive a personalized skincare routine made up of their products.
If you read the fine print, you’ll see that La Roche-Posay uses an algorithm to analyze facial features and additional algorithms to estimate age, ethnicity, and signs of skin aging. At a high level, the company combines a foundational model with specialized skincare data to deliver a routine it claims is 95% accurate. That’s impressive, but not perfect, and users should still sense-check the results. This level of accuracy is possible because La Roche-Posay is backed by L’Oréal’s massive engineering and capital resources.
Most customers won’t read the Terms of Service. They simply trust that the brand’s AI recommendation engine works. For smaller DTC brands, this is risky. Without the technical expertise or infrastructure to build a reliable application layer, accuracy often suffers. These quizzes or analyses aren’t foolproof.
Clients trust bankers and lawyers for guidance; consumers trust brands for the same. In B2B, a skilled professional sits between the AI and the customer, an extra line of defense. In DTC, that buffer doesn’t exist.
Whether brands realize it or not, AI doesn’t just support the brand: it becomes part of it. Most brands are not technically qualified to develop or represent AI tools. When customers receive poor recommendations, trust erodes, and they may switch to competitors. It’s not natural for a consumer to verify a brand’s guidance, nor should it be their responsibility.
If ChatGPT recommends a serum that doesn’t work, you might shrug it off, thinking, “It’s ChatGPT, it sometimes gets it wrong.” Tools get more slack than brands when it comes to wrong answers because it’s their business to move quickly and build innovative technology. Some screws are bound to fall off in the process. When AI tools fail, we shrug. When brands fail, we are likely to switch.

