Beyond the hype: building LLM applications for production

It’s been nearly four months since we launched the first LLM-based feature in Talkdesk: call summarization. It was an evident first candidate, given the relatively simple nature of the use case and the expected impact on agent productivity. While beta testing this new feature with a few selected customers, we moved on to other use cases: topic extraction, to provide the customer with a list of topics discussed in their calls or chats, and question answering, to support agents handling customer queries.

Many other use cases are already lined up, from automated agent evaluation to message writing helpers. But as a product manager’s mind navigates the sea of possibilities, it’s also important to take a step back and think about the lessons we’re learning along the way.

The ambiguity trap – how users write instructions

I’ve seen many posts arguing that LLMs will revolutionize user experience for the better. With users able to express their intent in their own language, we can eliminate the learning curve of interacting with an application. I understand this claim, but I think it is somewhat naive to assume that people can always express themselves clearly and concisely. Just think of how many times you, as a human, have received instructions from other humans that were unclear or incomplete. As I read in another post about the challenges of LLMs, just doing what someone asks for isn’t always the right thing. Designing interfaces where user input is in natural language may increase accessibility, but it will likely reduce the quality of the output, thereby increasing frustration.

We’ve seen this happen with one particular tool that we made available, where users can instruct the system to generate model training phrases for them by describing a specific user intent in text. This used to be a daunting manual task, so automation is welcome. However, past the initial excitement, some users started asking the hard questions: what are the best practices to instruct the model for phrase generation? What sort of information needs to be included in the description? Ambiguity generates anxiety, and we don’t want to turn users into prompt engineers. Interacting with a clean UI with clear affordances can carry a much lower cognitive load than writing a thorough description of the outcome you’re looking for, as the sketch below illustrates. This article by Davis Treybig is great and describes all of these design challenges in detail.
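One way to lower that cognitive load is to collect a few structured fields in the UI and compose the generation prompt behind the scenes, so the user never faces a blank text box. A minimal sketch in Python; the field names and prompt template are illustrative assumptions, not the actual tool:

```python
# Instead of a blank text box, the UI collects a few structured
# fields and composes the generation prompt behind the scenes.
# Field names and template are illustrative, not the actual tool.

def build_phrase_generation_prompt(intent_name: str, goal: str,
                                   example_phrase: str, count: int = 10) -> str:
    return (
        f"Generate {count} short training phrases for the intent "
        f"'{intent_name}'. The user's goal: {goal}. "
        f"Match the tone and length of this example: '{example_phrase}'. "
        "Return one phrase per line, with no numbering."
    )

print(build_phrase_generation_prompt(
    intent_name="cancel_subscription",
    goal="the customer wants to cancel their paid plan",
    example_phrase="I'd like to cancel my subscription",
))
```

The user answers three concrete questions instead of writing one open-ended description, and the ambiguity is absorbed by the template rather than pushed onto them.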

The ambiguity trap II – how LLMs respond to instructions

Downstream applications relying on LLMs expect outputs that conform to a particular structure for effective parsing. While we can tailor our prompts to explicitly outline the desired output format, compliance is never guaranteed. We are seeing this with question answering. We prompt the model not to return an answer when it isn’t sure of one, but, prolific chatterbox that it is, it will sometimes still respond with an “I don’t know” type of answer. How do we deal with this in the UI? The application’s frontend cannot tell the difference between this answer and a “valid” one with meaningful content.
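One mitigation is to ask the model for a machine-readable envelope instead of free text, and to parse it defensively on the backend so the frontend receives an explicit flag. A minimal sketch in Python, assuming the model was prompted to reply with JSON like {"answered": true, "answer": "..."}; the schema and the refusal markers are illustrative, not our production logic:

```python
import json

# Heuristic phrases that signal a refusal dressed up as an answer.
REFUSAL_MARKERS = ("i don't know", "i do not know", "not sure", "cannot answer")

def parse_answer(raw: str) -> dict:
    """Normalize an LLM reply into {"answered": bool, "text": str}.

    We prompt the model to reply with JSON, but we must not trust it:
    it may still return plain prose, so we fall back to a heuristic.
    """
    try:
        payload = json.loads(raw)
        return {"answered": bool(payload.get("answered")),
                "text": payload.get("answer", "")}
    except (json.JSONDecodeError, TypeError, AttributeError):
        pass
    # Fallback: refusal detection on free text. Imperfect by design;
    # a genuine answer containing "not sure" would be misflagged.
    lowered = raw.lower()
    answered = not any(marker in lowered for marker in REFUSAL_MARKERS)
    return {"answered": answered, "text": raw.strip()}
```

With an explicit flag, the frontend can decide to hide the reply or show a “no answer found” state instead of surfacing the model’s chatter.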

As much of a nuisance as this is, for someone using a helper tool there is one thing worse than not getting an answer: getting a wrong one. This is not a new problem, as poor search engines can also produce noisy, irrelevant results with high confidence. But dealing with hallucination is a whole different challenge. Conservative prompt engineering can be effective at tackling this problem, but in the end we can never be 100% sure that the model will comply with instructions.

It’s still unclear to me how we can deal with this lack of predictability. For now, the only feasible option seems to be prompt engineering. We need to make it a systematic task, with version control – essentially yet another function that needs to be integrated into the product development lifecycle.
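What that systematization could look like, as a minimal sketch: prompts stored in the repository as versioned, code-reviewed artifacts rather than strings scattered through the codebase. The template text and version scheme here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptTemplate:
    name: str
    version: str  # bumped on every change, just like code
    template: str

    def render(self, **kwargs) -> str:
        return self.template.format(**kwargs)

# Lives in the repo, reviewed and diffed like any other change.
QA_PROMPT = PromptTemplate(
    name="question_answering",
    version="1.3.0",
    template=(
        "Answer strictly from the context below. "
        "If the answer is not in the context, reply with "
        '{{"answered": false, "answer": ""}}.\n\n'
        "Context:\n{context}\n\nQuestion: {question}"
    ),
)

prompt = QA_PROMPT.render(context="(retrieved snippets)",
                          question="What are your support hours?")
print(prompt)
```

Versioned prompts make it possible to correlate a regression in output quality with the exact prompt change that caused it, and to roll back.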

Working with context is hard

Building a question-answering solution for business requires informing the LLM of each customer’s knowledge context. Answers need to be strictly based on company-vetted data. However, LLMs have context windows, that is, they limit the number of tokens the model can access when generating responses. Some businesses can have hundreds of thousands of documents with relevant information. We use embeddings to measure content relatedness and select the right data snippets – ultimately, the success of this operation will dictate the overall success of the feature. It’s just good old search: if that doesn’t work, LLMs won’t save you when it comes to knowledge retrieval.
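The retrieval step boils down to something like the sketch below: embed the query and the snippets, rank by cosine similarity, and pass only the top few into the prompt. The vectors are assumed to come from some embedding model (e.g. an embeddings API); here they are just precomputed numpy arrays:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_snippets(query_vec, snippets, snippet_vecs, k=3):
    """Rank pre-embedded snippets by relatedness to the query vector."""
    scores = [cosine_similarity(query_vec, v) for v in snippet_vecs]
    ranked = sorted(zip(scores, snippets), key=lambda p: p[0], reverse=True)
    return [text for _, text in ranked[:k]]

# Toy demo with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
snippets = ["refund policy ...", "shipping times ...", "warranty terms ..."]
vecs = [rng.normal(size=8) for _ in snippets]
query_vec = rng.normal(size=8)
print(top_k_snippets(query_vec, snippets, vecs, k=2))
```

Everything downstream depends on this ranking: if the right snippet isn’t in the top k, no amount of prompt engineering will produce a grounded answer.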

Cost estimation: mission impossible

The more context you give the model, the better the performance – or at least that’s what we hope for. However, more context also increases latency… and cost – OpenAI charges for both input and output tokens. If we factor in the “natural” unpredictability of output length, the variation in context from customer to customer and the constant prompt improvements, making a sound prediction is a difficult task. Also, the LLM world is moving so fast that any prediction is bound to become outdated quickly – hopefully the trend is for costs to keep falling as competition grows.
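Even so, a back-of-the-envelope model helps frame the conversation. In the sketch below, both the per-token prices and the token counts are made-up assumptions; real prices vary by model and change often:

```python
# Illustrative numbers only: prices and volumes are assumptions.
PRICE_PER_1K_INPUT = 0.003   # USD per 1k input tokens, hypothetical
PRICE_PER_1K_OUTPUT = 0.006  # USD per 1k output tokens, hypothetical

def monthly_cost(calls_per_month: int,
                 avg_input_tokens: int,
                 avg_output_tokens: int) -> float:
    per_call = ((avg_input_tokens / 1000) * PRICE_PER_1K_INPUT
                + (avg_output_tokens / 1000) * PRICE_PER_1K_OUTPUT)
    return calls_per_month * per_call

# A summarization feature: 100k calls a month, ~3k tokens of
# transcript in, ~200 tokens of summary out.
print(f"${monthly_cost(100_000, 3_000, 200):,.2f} / month")
```

The fragile parts are exactly the inputs: average context size shifts with every customer onboarded and every prompt revision, which is why any single estimate has a short shelf life.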

Conclusion

“It was the best of times, it was the worst of times” – said Dickens on LLMs. 

It’s impossible not to be excited by this quantum leap in machines’ conversational abilities. You can have so much fun experimenting with them, and demos can be mind-blowing. However, if an incredible demo isn’t followed by actual usage – and, more than that, by a productivity gain (or at least a super pleasant user experience) – the result will sooner or later be churn. And the market seems to be moving fast from the initial excitement phase to converge on a handful of low-risk use cases, particularly in the B2B space – as with any other new tech. Some people will argue that we cannot afford to be conservative; I argue that we cannot afford to ship products that will not get the customer’s job done. Be bold, but learn fast!

Internet business: is privacy our biggest problem?

Big Brother is watching you, and he's bored

The good news about privacy is that eighty-four percent of us are concerned about privacy. The bad news is that we do not know what we mean.

Anne Wells Branscomb

The Great Migration

Last week we witnessed a modern-age boycott movement, with millions of users replacing WhatsApp with Signal or Telegram in protest against WhatsApp’s privacy policy changes, which people believed would allow data sharing with its parent company, Facebook. A few days later, the media clarified that not only was the update related to something else entirely, but that data sharing with Facebook had already been happening since 2016 for a majority of users.

This episode is the perfect illustration of Branscomb’s quote. The public is confused. People feel someone’s taking something from them and using it for profit. Not only is this true, but it’s pretty much the business model of the internet as we know it, and it has been happening for years.

The web: a broken business model?

If we hold this as unacceptable, we need to acknowledge that the whole thing is broken, and we need to start paying to use search engines, maps, social networks, email services, browsers, video streaming platforms, productivity apps and, last but not least, to access media websites, both news and entertainment. I can hear you in the background: what about open source? There are wonderful projects out there, but can we really imagine what the internet would look like today if it wasn’t for the money poured into tech companies by investors? Where would we be if the entire web was donation-based?

Almost 90% of Google’s revenue comes from advertising

Can advertising survive without our data?

So let’s assume for now that the advertising industry will keep financing most of our beloved services. Can’t they survive without our data?

They can – but not as efficiently. Segmenting the population using socio-demographic data and choosing target audiences is not something that was born with digital advertising – it’s what people in marketing and communication have been doing for ages. The difference now is that not only can they reach that target much more effectively, but they can also measure the difference in return on their investment as they fine-tune their audiences. So, in theory, everyone who owns a business can benefit from this.

What are we giving away, and to whom?

Going back to privacy: where is the line crossed? Think of your browsing history from this morning. You read the news on ABC, went to Amazon, and checked your Gmail inbox. On the first two, you may or may not log into a user account. Even if you don’t, you will see a message asking for your consent to collect browsing information – although this will not include any Personally Identifiable Information (PII), such as your name, phone number or email address. The same isn’t true for your inbox, which necessarily requires login. However, although Google owns that data, its promise is that it does not sell your PII. Note that if you do all of this browsing using Chrome, then Google automatically knows all about your browsing history.

This, by the way, makes Chrome’s third-party cookie ban all the more ironic. All major browsers are banning third-party cookies: bits of information that are created by domains other than the one you are visiting (hence the “third-party” designation) and that are placed on your browser to collect information about your behaviour across websites. This information is usually stored in platforms called DMPs – Data Management Platforms – marketplaces where companies can buy audiences for their ads. The third-party cookie phase-out reflects society’s growing concern about privacy: when you enter a website and accept cookies, this should mean that the company behind that particular website may use your data for the specific purposes stated in the consent declaration, rather than putting it up for sale in undisclosed ad platforms. However, what this means in practice is that segmentation and profiling power will be even more concentrated in the hands of the two giants: Google and Facebook.

And here’s where I come to my conclusion. Yes, it’s annoying to see companies profiting from their knowledge of what we do online, particularly when their ways of informing us and obtaining our consent are anything but transparent. However, none of us seems predisposed to start paying for their services instead. Furthermore, as far as we know, our PII is still protected. What frightens me is not the business model, but the size of these data monopolies, how much power they hold, and how they can use it to influence policy-making to their own advantage – and that may include privacy regulation.

It’s scary that, as citizens, rather than demanding more regulation and control from our governments – over competition, taxation and privacy – we just blame company executives or company policy, as if companies were created to safeguard social ethics rather than to maximise profit. I accept that there is no such thing as a free lunch, but I don’t accept a world ruled by a handful of CEOs.