Transcript
Aman Singh [00:00:00]:In Contextual, what we want to give is a specialized model for everyone, for every use case.
Daniel Darling [00:00:16]:Welcome to the Five Year Frontier Podcast, a preview of the future through the eyes of the innovators shaping our world. Through short, impactful discussions, I seek to bring you a glimpse of what a key industry could look like five years out. I'm your host Daniel Darling, a venture capitalist at Focal, where I spend my days with founders at the very start of their journey to transform an industry. The best have a distinct vision of what's to come, a guiding North Star they're building towards, and that's what I'm here to share with you. Today's episode is about the future of AI in the enterprise. We cover building specialized agents, eliminating hallucinations, creating AI that remembers, cloning your company experts, the emergence of thin AI corporations, and the future of work. Guiding us will be Aman Singh, co-founder and CTO of Contextual AI, a company helping enterprises deploy AI and agents with accuracy, control, and safety. Founded by the team that pioneered Retrieval-augmented generation, or RAG, Contextual is building next-generation AI that is deeply specialized to each business, answering with the context of your organization, not just the Internet corpus.
Daniel Darling [00:01:21]:With customers like Qualcomm and over $100 million raised from Greycroft, Lightspeed, Bain Capital Ventures, and Nvidia, Contextual is fast becoming one of the most important startups in the enterprise AI landscape. Aman leads the company's technology development and brings a deep background in research engineering. He previously worked at Hugging Face, a leading force in open-source AI, and at Meta's FAIR lab, where he contributed to the development of RAG and cutting-edge multimodal models. He holds a Master's in Computer Science from New York University, where he focused on the intersection of language and vision.
Daniel Darling [00:01:56]:Aman, welcome to the show. Nice to see you.
Aman Singh [00:01:58]:Nice to see you as well. Thank you for having me.
Daniel Darling [00:02:01]:Today's large language models are incredibly powerful, right? But they're sometimes a little too brittle when deployed into the enterprise setting. And often that's because the AI lacks sufficient context around its deployment to answer reliably. Can you explain where AI is struggling when deployed in the enterprise today?
Aman Singh [00:02:22]:So this is exactly the observation we had when we started Contextual AI: today's AI is really good at consumer use cases where there is no context required, and people can just ask any question they would generally ask Google Search. But once you go into enterprises, they have very specific domain knowledge, a specific culture, a specific way of working. There are a lot of things the model otherwise wouldn't know when it's deployed. Think of joining a new company without knowing anything about the company. You learn all of this as you go: you go through Notion, you go through the wikis, you go through Slack, you learn, okay, this is how this company works, and you build knowledge and adjust accordingly. This is where I think language models deployed in enterprises without any context will struggle today. It's very hard for the model to understand unless you give it access to the data in a retrieved form. So in enterprises, these models will hallucinate without the right context, or the right data in the context itself.
Daniel Darling [00:03:23]:Understood. And maybe can you just share a little bit about how you then solve for that problem?
Aman Singh [00:03:27]:A lot of people are fixing this by adding a human in the loop, by prompting it and hard-coding it to specific scenarios. But when you're talking about millions of documents, millions of use cases, you can't hard-code all of this stuff into the context. What you want is an agent that learns with you as you use it more and more. In our case, we rely on Retrieval-augmented generation technology, RAG, which is what my co-founder pioneered when he was at Meta. We put the context into the generation, but then further we allow users to tune this whole system as they use it more and more and give feedback on what they like versus what they don't like. Once the model understands, okay, the user didn't like this answer, what is the reasoning, it can figure that out and learn tribal knowledge in the process as well. A better way to think about it is that this system, or the agent, grows with you as you use it more and more and learns more about you.
Aman Singh [00:04:21]:And it keeps on adjusting according to your preferences, as your user behavior changes and as your documents change. You retrieve the content which is most relevant to the query at hand using state-of-the-art technology, and we released one of those yesterday as a product launch, the reranker. You retrieve, you find the precise content that is relevant to the current query, you pass it to the language model, and then you learn through the preferences that you receive from the users as you go through the journey.
Daniel Darling [00:04:51]:Can you share a little bit more about this Retrieval-augmented generation, RAG, given that it was a key breakthrough a couple of years ago and, as you said, your co-founder was core to it? How has it enabled this more contextualized learning?
Aman Singh [00:05:06]:Yeah, so at Contextual we actually use something called RAG 2.0, which takes the original RAG paradigm to the next level. So how does the RAG paradigm work? RAG, or any agent in general, has two parts: one is the LLM, or the foundation model, and one is the environment that it is working in. The environment has all of the data of the enterprise, and the LLM doesn't know anything about this environment. Think of you joining a new company: you are the foundation model and the company is the environment. The LLM, or foundation model, now needs to explore this environment to understand it better. This is where the retrieval part of RAG comes in.
Aman Singh [00:05:45]:Given a query, you can retrieve or search for the most relevant information from the environment, and that's what our tools enable it to do. Think of a search engine that gives you the most relevant information; then you rerank it based on the preferences you have, and then you give the most relevant information back to the language model. The language model, along with the context of the company, whatever the use case is, and the retrieved content, can now make a reasoned, informed choice that this is the answer relevant to the query. This is how RAG works. This is how we optimize the LLM and the environment together, and they grow with each other as you continue using it more and more.
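To make that retrieve, rerank, generate loop concrete, here is a minimal Python sketch. It is not Contextual's pipeline: the keyword-overlap retriever and length-based reranker are toy stand-ins for trained embedding and reranker models, and every name in it is illustrative.

```python
def retrieve(query: str, corpus: list[str], k: int = 5) -> list[str]:
    """Score each document by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Reorder candidates; a trained reranker would score against the query,
    here we naively prefer shorter, more focused passages."""
    return sorted(candidates, key=len)

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the grounded prompt that gets passed to the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

corpus = [
    "Expense reports must be filed within 30 days of travel.",
    "Travel above $5,000 requires VP approval.",
    "The on-call rotation changes every Monday at 9am.",
]
query = "How do I file an expense report?"
docs = rerank(query, retrieve(query, corpus))
print(build_prompt(query, docs))
```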
Daniel Darling [00:06:25]:And is this RAG innovation kind of a stopgap measure along the way to some other core future of how you see it all working? Or do you think it will be a central part of how LLMs operate in the enterprise over the next couple of years?
Aman Singh [00:06:40]:RAG is a crucial and very important part of deploying any AI agent into production or the enterprise. Why? Because one way is to throw all of the enterprise's data at the LLM and train on it, but then you lose all of the role-based access controls, which means everybody will see everybody's data. So you have to give this data in context at a certain point, and this is where RAG helps you. Regardless, RAG is taking various forms, right? Agents are now using web search to get the relevant data back, or calling certain tools, or writing Python scripts, but they're all giving context back to the model. So RAG as a paradigm still stays; it just keeps taking different shapes.
Aman Singh [00:07:27]:So I think RAG becomes a tool in the LLM's toolbox which can help the agent explore the environment much better. When the LLM is thrown a query, it can search for something, and if it doesn't find the answer in the retrievals, it can search for something else. It's like when you search on Google and don't find the right results: you change the query, add some specific keyword to it, and start again. That's an iterative process we do. Then we go through each of the links one by one, reading it and seeing whether it is relevant to us or not, then continue to the next sections. This is what an agent can do automatically, and learn.
Aman Singh [00:08:07]:You don't ever have to do a Google web search; the agent can do it for you. And these are getting more and more powerful, right? They can browse your local file system, they can browse APIs, they can browse your Google Drive, they can do a bunch of stuff. All of this at the core is powered by RAG, Retrieval-augmented generation: you're augmenting the generation using the retriever. This paradigm is very general in nature, and it keeps evolving over time.
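Here is a toy sketch of that search, judge, reformulate loop. The `good_enough` and `reformulate` functions are hypothetical stand-ins for LLM calls that judge relevance and rewrite the query; a real agent would delegate both decisions to the model.

```python
corpus = [
    "Build failed on main: missing dependency libfoo.",
    "Deploy guide: run make release from the repo root.",
    "Oncall handbook: page infra if the build stays red for an hour.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever over the corpus above."""
    terms = set(query.lower().split())
    hits = [d for d in corpus if terms & set(d.lower().split())]
    return hits[:k]

def good_enough(results: list[str]) -> bool:
    # Placeholder judgment; a real agent would ask the LLM whether the
    # retrieved passages actually answer the question.
    return len(results) >= 1

def reformulate(query: str, attempt: int) -> str:
    # Stand-in for an LLM rewriting the query with more specific keywords.
    return query + " build failed"

def agentic_search(query: str, max_tries: int = 3) -> list[str]:
    results: list[str] = []
    for attempt in range(max_tries):
        results = retrieve(query)
        if good_enough(results):
            break
        query = reformulate(query, attempt)  # search again with a new query
    return results

print(agentic_search("why is CI red?"))
```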
Daniel Darling [00:08:30]:One of the other innovations that you've introduced as a company is the concept of grounded language models, or GLMs, which obviously have RAG at their center as well. Maybe you can give us a sense of what that innovation is unlocking for you and your customers?
Aman Singh [00:08:45]:Yeah, great question. Groundedness basically means: given this information and this query, can you rely only on the information to answer the question? So you're not using your parametric or world knowledge to answer, because enterprises don't want the model to say something about some entity that is not in the context. They want it to rely on the documentation or the materials they have created over time. For example, I don't know how much you use ChatGPT in detail, but it has its own viewpoints of the world, right? Its own worldview: this is good, this is bad. And it's trying to be very helpful in answering your query. Enterprises don't necessarily want that. They want the answer to be precise, concise, and factual. By that we mean: if the answer is in the information, answer it.
Aman Singh [00:09:40]:Otherwise, don't answer it at all, because if it can't answer, they will involve a human in the loop. Then there's no risk of giving wrong information, because in enterprises the stakes are higher and the error tolerance is low. For example, for a coding agent, if it gives you wrong code, you can just ignore it, right? But in the financial world, if you make a wrong decision on a transaction, money is lost; the stakes are higher. In that kind of scenario you don't want to rely on ambiguous knowledge from the language model that might also be outdated. You want reliable information grounded in the facts from today, and that's what the GLM does for you. The GLM, the grounded language model, takes the knowledge that you have provided and only answers the query if it can answer based on the facts in that information. Otherwise it says: I can't answer your query.
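A minimal sketch of that grounded-answering contract: the prompt restricts the model to the supplied passages and specifies an exact refusal string. `call_llm` is a hypothetical placeholder for whatever model endpoint you use, not the GLM's actual API.

```python
REFUSAL = "I can't answer your query from the provided information."

GROUNDED_INSTRUCTIONS = (
    "Answer strictly from the context below. Do not use outside knowledge. "
    "If the context does not contain the answer, reply exactly: " + REFUSAL
)

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: wire this to your model of choice.
    return REFUSAL

def grounded_answer(query: str, passages: list[str]) -> str:
    """Build a context-only prompt and return the model's (possibly
    refusing) answer."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    prompt = f"{GROUNDED_INSTRUCTIONS}\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(grounded_answer("What is our refund policy?", ["Refunds take 5 business days."]))
```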
Daniel Darling [00:10:31]:And that's something that I guess we don't really appreciate as consumers with LLMs, where they will answer your question regardless of whether they actually have the knowledge to answer it or not. And you're saying they should just come back and say, okay, sorry, I do not know based on the data set that we have here, and essentially eliminate the hallucinations altogether.
Aman Singh [00:10:50]:I think it depends on the use case. In consumer use cases they're trying to be helpful, so they're trying to give you some answer regardless, so that you don't have to ask the query again. Right. This is just how they're trained: they're trained to be very helpful to human beings. But in enterprises, where you're using these models to make critical decisions, you can't have ambiguity there. You need to ground in the context that the enterprise has provided.
Daniel Darling [00:11:14]:And that release you mentioned, the reranker, is a really fascinating concept too. Do you anticipate essentially a value tree of responses coming back, where the experts in your organization are ranked highest, the documents within your organization are ranked after them, and the more generalist knowledge that the LLMs have as part of their worldview comes in at the end? How does it all start to map out going forward, if you had to rank the quality of the data and the quality of the expertise that's coming back in the results?
Aman Singh [00:11:50]:That's exactly what the reranker has been trained for, actually. Retrieval, or a search engine, for example Google, will give you back a hundred results, right? But now you have to find the needle in the haystack: what is the right information? The reranker helps you sort these results based on what is most relevant to the current query. The general problem with enterprises is that you don't know beforehand what is relevant. But with our reranker, a user can specify what is more relevant than other things. Like you said, these internal docs rank higher than those. The CEO is always right.
Aman Singh [00:12:26]:Like, Jensen Huang's notes are more important than anybody else's at Nvidia. Those kinds of specifics are tribal knowledge that you can instill in the model. The longer-term vision there, since you asked, is that ultimately the LLM, or the foundation model, becomes the planner. It will plan what it needs to do to execute a task. If it can't control the reranker with natural-language instructions, telling it this is how you should rank because this is what my plan is, then by the nature of it the ceiling on what it can do is much lower. But now the language model, or the person using it, gets control over specifying how to do the analysis.
Aman Singh [00:13:04]:Like, how do you actually do that analysis? And because context windows are still limited, and you're talking about millions of pages, you want to be very specific about what gets passed into the context. So the longer-term vision is that the language model has these tools it can use to do its job much better, by specifying how to rank or how to do certain tasks. That wasn't possible with rerankers before, but now it is possible, both for the LLM and for users building RAG or agent applications in their production systems, where you can specify, okay, rank GitHub logs or PR comments higher than Slack messages, that kind of thing. When you're debugging something, all those preferences naturally occur, and we just want to codify them into the reranker, and eventually the system will learn them automatically. But we want to give the LLM that power, that control over these tools. Same for the humans using it.
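As a toy illustration of the kind of preference such a reranker encodes, the sketch below reduces a natural-language instruction like "rank GitHub logs higher than Slack messages" to a source-priority table, so it stays runnable without a trained model; the real reranker takes the instruction as free-form text.

```python
SOURCE_PRIORITY = {          # tribal knowledge: which sources outrank which
    "github_log": 3,         # e.g. "rank GitHub logs higher than Slack"
    "pr_comment": 2,
    "slack": 1,
}

def rerank(results: list[dict]) -> list[dict]:
    """Sort by source priority first, then by the base retrieval score."""
    return sorted(
        results,
        key=lambda r: (SOURCE_PRIORITY.get(r["source"], 0), r["score"]),
        reverse=True,
    )

hits = [
    {"source": "slack", "score": 0.9, "text": "I think the build is broken?"},
    {"source": "github_log", "score": 0.7, "text": "Build failed: missing dep."},
]
print(rerank(hits)[0]["text"])  # the GitHub log wins despite a lower score
```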
Daniel Darling [00:14:00]:And speaking of the humans using it, how are you enabling them, or how do you foresee enabling them, to prompt the system in a very precise way, so it understands that tribal knowledge and those internal ranking systems, and to structure their queries so they get the most out of these systems? Where do you see that being injected into the system?
Aman Singh [00:14:23]:Generally, I think a better way to think about it is that the LLM is the planner and everything else is the environment. The environment will not change; the LLM has to adjust to it. And because the environment is generally fixed, we as humans can learn how to use the LLM better. That's the current state of LLM technology, or AI foundation models: if you know how to use ChatGPT, you are very productive, but if you don't know how to use it, it will not be able to figure out what you want.
Aman Singh [00:14:52]:So I think the next generation of AI models will be able to understand the context you are asking the question in, and understand what the question underneath the question is. How will they do it? There will be a clarification mechanism. If you use Deep Research, it does this very actively: before it even starts the query, it has a knowledge-gathering phase where it asks you a lot of questions to clarify exactly what you are asking for. So I think if models see ambiguous queries, they should be able to ask clarification questions to do better. The current generation does this somewhat, but in reality there's a lot of ambiguity when you ask, because you have some context in your mind that the model doesn't have. The point is, once it clarifies and understands what you mean, it should be able to learn that context and reuse it the next time you ask. That's how you end up with more personalized, better models that know what you want. And this is what we call specialization at Contextual AI: you can specialize the model to the context that matters to you.
Daniel Darling [00:15:56]:Yeah, and I think there's a lot of demand for this specialization: learn my working style, learn how I like to interrogate a question, what matters, what priorities I have. And that starts to link into this notion of memory for AI, which is a big area of research. Most AI feels like it has amnesia today: it doesn't remember what you're doing, it doesn't remember how you like to ask questions, and each prompt is really independent. So how do we build AI models that have this more persistent memory and this longer recall?
Aman Singh [00:16:34]:Yeah, so the current iteration of models does this through long-context modeling, where you keep managing a knowledge module, which is the knowledge you have about the user. But I would be surprised if we can fit your and my whole life into a context window, because there's so much going on there. Right? So what you ideally want is to instill this knowledge into the parameters of the language model, or have a separate agent that just knows about you and keeps learning about you. There's an agnostic agent that doesn't know anything about you, and there's an agent that keeps learning more and more about you. Over time, as it learns more, how would it learn? It can fit it into context, or it can fine-tune itself on the information it knows about you. So in the short term we will see a lot of these long-context implementations, but eventually that becomes very costly, because every human will have their own context window.
Aman Singh [00:17:30]:So you can't even cache it, right? You have to build a new context window for everybody. The other way to think about it is that you create a knowledge corpus for each human, for everyone, and then you retrieve from that knowledge corpus based on the query you're asking. But maintaining this knowledge corpus is also that agent's job. So either you tune the model, you put it in long context, or you create a knowledge corpus. Each of them has its own limitations.
Aman Singh [00:17:58]:Ultimately you want a way for the models to learn with you and actually instill that information in the parameters themselves. And for the things that are not common across users, you offload them to some knowledge base and retrieve them on demand. A good way to think about it: for all the books we read, we don't remember every single word, but we remember the main summary of the book. If I need to go back, I can search the book, because I know the summary in my mind and I know where the book is. So think of it like a mind map, where you keep the most important aspects in your memory, but if you need more detailed information, you can retrieve it on demand. As humans, we don't remember everything. It's like bookmarks: we remember the most important things and then retrieve that information on demand as needed.
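A small sketch of that bookmark-style memory, assuming a hypothetical `BookmarkMemory` class: short summaries live in fast memory, and the full text is fetched from the corpus only when a query matches a summary.

```python
class BookmarkMemory:
    def __init__(self):
        self.summaries: dict[str, str] = {}   # doc_id -> short summary
        self.corpus: dict[str, str] = {}      # doc_id -> full text

    def remember(self, doc_id: str, summary: str, full_text: str) -> None:
        """Keep the summary as the 'bookmark'; park the detail in the corpus."""
        self.summaries[doc_id] = summary
        self.corpus[doc_id] = full_text

    def recall(self, query: str) -> str | None:
        """Match the query against summaries, then fetch detail on demand."""
        terms = set(query.lower().split())
        for doc_id, summary in self.summaries.items():
            if terms & set(summary.lower().split()):
                return self.corpus[doc_id]    # retrieve the full detail
        return None

memory = BookmarkMemory()
memory.remember("q3", "third quarter revenue report", "Q3 revenue grew 12%...")
print(memory.recall("What was revenue in the third quarter?"))
```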
Daniel Darling [00:18:48]:And I think that's a really fascinating departure from where we are today with the current state of the art. When you talk about these agents that start to learn exactly your context, like who would be doing that? Is that the foundation models that provide an agent? Is it more the layers on top of that from Contextual within the enterprise? Is it every single individual consumer? How do you start to see that map out?
Aman Singh [00:19:11]:Yeah, so generally at Contextual, what we want to give is a specialized model for everyone, for every use case. We achieve this through a bunch of our proprietary technology as well as RAG 2.0. There's a general planner that can plan for you, but there are aspects of the environment that you can model, that the agent can learn to interact with. Then there is something else that is learning about the user and personalizing things for specific users; it can be a smaller-weight model, while a bigger pre-trained model does the whole thing. Think of it like a multi-agent system where the agents can interact with each other, but the planner is a stronger planner that can generalize across tasks, and then there is a personalized module which gets you the personal information.
Aman Singh [00:19:55]:There is a knowledge module which gets you the knowledge information from the enterprise. You can think of it like various people: people are experts in different things. But the difference between agents and humans is that humans have a limited lifespan and can only specialize in so many things. With agents, you can deploy them on GPUs and scale them 100x, so you can have 100 agents doing 100 different tasks and specializing in each of them. The inherent scalability limitations of humans don't apply to agents, and they can specialize in every aspect they care about as they keep scaling more and more. So how I see it is, it's going to be a mix of things.
Aman Singh [00:20:34]:It's going to be a mix of tools that represent the environment really well, and agents that learn really well about the particular use case and can handle the specifics, like who has access to what. And there will be wrappers on top of it. I think Contextual as a platform abstracts all of that and gives you APIs to build your own agents on top of it.
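A highly simplified sketch of that planner-plus-specialists shape: one router sends a query to a personalization module or a knowledge module and composes the answer. The keyword routing rule and both module names are illustrative; in practice the planner itself would be an LLM.

```python
from typing import Callable

def personal_module(query: str) -> str:
    """Illustrative specialist that would hold per-user preferences."""
    return f"(personal context for: {query})"

def knowledge_module(query: str) -> str:
    """Illustrative specialist that would search enterprise documents."""
    return f"(enterprise knowledge for: {query})"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "personal": personal_module,
    "knowledge": knowledge_module,
}

def planner(query: str) -> str:
    """Route the query to a specialist, then compose the final answer."""
    route = "personal" if "my" in query.lower().split() else "knowledge"
    context = SPECIALISTS[route](query)
    return f"answer composed using {context}"

print(planner("What are my open action items?"))
print(planner("What is the refund policy?"))
```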
Daniel Darling [00:20:57]:Let's put that in the context of your job as the CTO of an organization. How would you see that sort of agent for you, or multiple agents for you, specializing in different domains? What does that look like in the next couple of years?
Aman Singh [00:21:11]:If you think about my day job, right, it's about meeting with customers and having context about each customer that I'm talking to. I need to understand what their use case is, what we have talked about with them in the past, what their pain point is. I have this summary in my mind that I have to capture myself. So what if I had one agent who could just summarize this for me, also fetch their LinkedIn information, and give me contextualized information on what the right context for the meeting would be? Then I have my reports, right, who are doing specific things. So what about an agent that can keep me updated on anything I should be aware of from any of my reports? One of the jobs as a founder is to unblock the company at every step, right? Anything that is happening, you want to unblock.
Aman Singh [00:21:56]:You want to make sure that the ship is sailing, the engine is churning, the cogs in the machine are moving, right? So the primary goal is to unblock whatever is blocked. Instead of going into Slack and reading every single message, I want those things highlighted to me automatically, so I can pull in some other agent who can give me context about it and then focus on that specific issue. It will save us time to do more impactful things, so that you can have even higher impact. Humanity will start growing even faster as we get more leveraged tools. That's why we can make more progress than before: because we're saving more time than before.
Daniel Darling [00:22:36]:It's a really exciting and optimistic outlook. How do we get over the trust barrier of trusting these agents and the quality of their work? And how do we even know what they're missing if we start to delegate so much of our day to them?
Aman Singh [00:22:52]:Yeah, this is where I think the enterprise aspect of these things comes into the picture. Deploying an agent doesn't mean you deploy it and leave it alone. You want audit controls, you want measures where you can see when the agent is not doing great. Right. One of the things we do at Contextual is the GLM, the grounded language model, which says "I don't know" if it doesn't know. So you can involve humans when needed. This is a function we focus on in a very, very precise way: if the information is not in the context, don't answer the question.
Aman Singh [00:23:25]:Second, if the model generates something and is uncertain about its own output. Why would it be uncertain? Because it generated something that wasn't in the information. So you can do a post check, and if it's uncertain, it will flag that it generated something that wasn't grounded. Now the human comes into the picture and sees, okay, this is the content that was generated. We provide attributions, or citations, showing which text it was generated from. They can click on those and see highlighted boxes showing exactly what was used to generate this content, so they can verify it. That's how you build trust: the system, or the process, is auditable.
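A rough sketch of that post-generation audit: each sentence of the answer is matched against the source passages, given a citation if supported, and flagged for human review if not. The word-overlap scorer is a stand-in; production systems use trained attribution or entailment models.

```python
def support_score(sentence: str, passage: str) -> float:
    """Fraction of the sentence's words that appear in the passage."""
    words = set(sentence.lower().split())
    return len(words & set(passage.lower().split())) / max(len(words), 1)

def audit(answer_sentences: list[str], passages: list[str],
          threshold: float = 0.5) -> list[dict]:
    report = []
    for sentence in answer_sentences:
        scores = [(support_score(sentence, p), i) for i, p in enumerate(passages)]
        best_score, best_idx = max(scores)
        report.append({
            "sentence": sentence,
            "citation": best_idx if best_score >= threshold else None,
            "flagged": best_score < threshold,   # needs a human in the loop
        })
    return report

passages = ["Refunds are processed within 5 business days."]
answer = ["Refunds are processed within 5 business days.", "Shipping is free."]
for row in audit(answer, passages):
    print(row)   # the unsupported "Shipping is free." sentence gets flagged
```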
Daniel Darling [00:24:04]:And if you look at some of the enterprises that you're working with, or that you've started to get exposure to, how do you think the org structure starts to change and evolve in the coming years, given that there's so much agentic work being done underneath each person to make them more productive?
Aman Singh [00:24:22]:Yes, I have a mental model for this which I can share. How I think about it is: algorithmic tasks, which have a defined algorithm, where you do A, B, C, D, E and then the task is complete, those will get automated over time. This can be replying to a question from a customer that has been answered before, where you already know the answer and there is nothing new to be done, or tasks like writing code for something that is not very unique or a new problem. All the tasks with defined algorithms will be automated. That leaves heuristic tasks, like the one you and I are doing: a podcast is very unstructured and can go in various directions.
Aman Singh [00:25:12]:Right? There is no algorithm for this. This is what a CEO or founder does: every day looks different, so there's no fixed pattern to learn. For those tasks, you can automate small parts, but the decision-making process can't be automated unless you have full context and can reason very deeply about things. You still won't have the anecdotal experience that comes to humans as they go through those things; there's a reason why someone is specialized in something, right? An agent will reach that stage at some point, but the gap is very large in capturing everything that is happening day to day. So we will be able to do these heuristic tasks faster, but I think they will be the harder ones to automate.
If you look at the org chart, the people at the top are generally the ones focused on heuristic tasks, and as you keep going down, things start becoming more algorithmic. We call the people at the top knowledge workers, right? We want to automate for, or help, these knowledge workers so they can do their tasks faster and be 10x more productive than they are. That's how I see the org chart evolving. And a lot of the offshoring that happens is also of algorithmic tasks, so in those cases you can offshore them to an agent.
Daniel Darling [00:26:30]:How do you view Contextual as an organization, let's say, five years out? What does that org chart start to look like? Is it just those senior-level people applying their judgment and expertise on top of a whole bunch of agents?
Aman Singh [00:26:44]:How I would see it is not necessarily a shrinking org chart, but we will be able to do even more than we do now. We will be shipping even faster, shipping more impactful features with better reliability. Software engineering as a discipline, I think, we underestimated; it's a very heuristics-based discipline. Every day, even though you are writing code, every challenge looks different. So you will be able to solve more challenges per day, which means you will be able to ship more and achieve things faster than you could before. People will have even more impact in the future, shipping more, building more. Things that took five years before will take one year, and that time will keep shrinking, because the collective knowledge of humanity is at your fingertips now.
Daniel Darling [00:27:30]:I'd love to get your opinion on the foundation models. Do you anticipate just a cohort of foundation models, like OpenAI, Anthropic, Google, advancing the best of breed? Do you think there's consolidation going on in that industry? I'd love to hear what your outlook would be.
Aman Singh [00:27:47]:Yeah, I think there's definitely consolidation going on, because if you look at the past, when the first Llama was released, everybody was building their own fine-tunes on top of it, like Alpaca, Vicuna, all of those. As the field keeps maturing, you stop seeing these fine-tunes, because it doesn't make sense. It's an investment that doesn't pay off, because the next generation of models will be better than the previous generation and will likely capture, in the base model itself, the capability that was in those fine-tunes. Generally, for a foundation model player, you need a lot of money, and it's a very difficult game to play because the margins keep shrinking over time. Consolidation happens not because people can't do it; it happens because the margins keep getting squeezed. You need a lot of capital investment to make sure you are sustainable. Companies that have a lot of money can play this game longer than those who have less. That's why consolidation happens, and that's why acquisitions happen: ultimately, capital plays a big role in foundation model development.
Aman Singh [00:28:54]:And given the amount of expertise required to train a foundation model, you can put that expertise somewhere else instead of using it to build something somebody has already done. On the consumer side, I think foundation models will consolidate over time and become a commodity, because most of the incumbents have distribution, and through distribution they get a lot of data from usage. For example, Meta, through WhatsApp and Instagram distribution, can instantly get ChatGPT-level distribution for all of their models. And they already have it; in India, people are using it in villages on WhatsApp. That's the kind of distribution they have. And because data is the moat, if everybody starts getting this data through distribution, these models will become a commodity over time.
Aman Singh [00:29:40]:I think the difference is in enterprises, which are not accessible to everyone at the same level. Even if you look at OpenAI, Deep Research is a mix: it's not really a consumer tool, it's more like something you will use for knowledge work while doing your job. For example, if you wanted to research me before this podcast, you could use Deep Research for that, right? It would give you a really good summary of the kinds of questions you could ask me. So if you look from that angle, enterprise has a lot of domain knowledge that can't be commoditized out of the box. You have to do some sort of customization for each of them, because every enterprise is unique at the end of the day. And this is where I see that layer growing.
Aman Singh [00:30:22]:In the enterprise space there will be a lot of specialization, whereas in the consumer space there will be commoditized models that anybody can use to serve a basic chat application. Most consumer use cases are also a little different in that they have a lot of error tolerance. In enterprises you don't; you need really high accuracy in a particular use case. So I see that divergence happening: enterprises are moving toward more specialization, which means more and more verticalized companies will start popping up, whereas consumer, I think, will consolidate in the longer term. These enterprise solutions can build on those foundation models, but that will not be the end of the line; a lot more needs to be done to push things toward the finish line.
Daniel Darling [00:31:06]:That maps a lot to what we're seeing as well. And if you look a little bit beyond LLMs, are there other kinds of innovations that you expect to come out of the AI landscape in the coming years, that maybe aren't getting a lot of airtime, or that you're paying attention to?
Aman Singh [00:31:20]:I feel like the reason everything converges to the LLM is that language is the medium humans communicate in, and that's why everything ends up being converted into LLMs. For example, text-to-image wasn't in LLMs; it ended up getting merged into LLMs, because for humans to explain what they want, language is the medium. So if you think from the medium perspective, even if some other technology comes out, ultimately the way to expose it, to give users an interface to that particular thing, is the LLM, or natural language. Think of the LLM as a UX for all of these tools; that is the crucial part. There are a few things I'm very excited about, especially world models, which model the world and help you generate. If you think about the future of the Metaverse and all those things, this is where world models become super relevant. Language models are only touching our world.
Aman Singh [00:32:15]:But can you create other sorts of imaginary worlds, where you can generate games out of the box, in one shot? I think there is a lot of entertainment potential, artistic potential, creative potential in these models that hasn't been explored yet, and I'm very excited to see where all of that goes.
Daniel Darling [00:32:37]:We're hearing a lot about world models as well, in the context of things like AI being unleashed in the physical world, whether it's robotics or general understanding of what is happening in the physical environment. Do you think they would live side by side: you'd have LLMs that focus on knowledge work, and then these world models that focus on everything else?
Aman Singh [00:32:57]:Like I said, the LLM is an interface. So I would think the world model becomes a tool for the LLM. It's part of the environment; it's just another tool. Through the LLM you can specify, in natural language, what you want this tool to do, and it will do it for you. This is true for text-to-image, it's true for text-to-audio, it's true for audio-to-text.
Aman Singh [00:33:17]:All of the modalities you can think of can converge through the medium we communicate in, which is natural language. So I expect the LLM to be able to use world models in the longer term.
Daniel Darling [00:33:29]:Look, Aman, you know, we're coming up on time, but I just wanted to end on one question. What does Contextual look like to you five years out? What would that company be doing and what do you hope for that future?
Aman Singh [00:33:40]:Yes, our mission is to change the way the world works, and what that means is we want to make every enterprise employee more productive than they are. Five years from now, I would see corporations becoming thinner and much more impactful, with knowledge workers having much more impact in a shorter period of time. A thinner, more impactful organization that can achieve the impact that takes a thousand people today: that's what I would want to see, what I want to achieve, five years from now through Contextual. A world where everybody is doing meaningful work in the fastest way possible while delivering it at high quality.
Daniel Darling [00:34:24]:What an exciting future. Well, thank you so much for sharing that with us today, and congratulations on all the success you've had to date.
Aman Singh [00:34:29]:Thank you for having me. This was a great conversation.
Daniel Darling [00:34:32]:What an awesome conversation that was with Aman. As we look ahead, the future he describes isn't just transformative, it's surprisingly tangible. We're not talking about a distant AGI, but practical breakthroughs happening now: grounded language models that know to stay silent when they don't have the answer, retrieval systems that learn your internal hierarchy of trust, and agents that grow alongside you and your team, learning tribal knowledge, adjusting to your workflows, and scaling expert judgment across the organization. What surprised me was how these innovations don't just automate tasks; they actually change how work is structured entirely. In Aman's vision, a thin corporation, as he puts it, powered by a network of specialized agents, can achieve what once took a thousand employees. This is the new operating system for the modern enterprise, and it's here today. To follow Aman, head over to his account on X @APSDEHAL.
I hope you enjoyed today's episode and please subscribe to the podcast to listen to more coming down the pipe. Until next time, thanks for listening and have a great rest of your day.
