February 5, 2025

5YF Episode #30: Fireworks AI CEO Lin Qiao

DeepSeek Deployed, How Open Source Wins, The Small Model Revolution, AI Infrastructure Wars, and the Future of AI Infrastructure w/ Fireworks AI CEO, Lin Qiao

Transcript

Lin Qiao: The open community, the open science community, the collective effort we pull together can really beat proprietary. That's my prediction - the open-source model is going to be better than proprietary. DeepSeek is just one instance. I'm very sure that there will be a lot more coming in 2025 this year.

Daniel Darling: Welcome to the 5 Year Frontier podcast, a preview of the future through the eyes of the innovators shaping our world. Through short, insight-packed discussions, I seek to bring you a glimpse of what a key industry could look like five years out.

I'm your host Daniel Darling, a Venture Capitalist at Focal, where I spend my days with founders at the very start of their journey to transform an industry. The best have a distinct vision of what's to come, a guiding north star they're building towards and that's what I'm here to share with you. Today's episode is about the future of AI infrastructure. We cover open-source supremacy, DeepSeek disruption, building small expert models and staying ahead of the blazing pace of AI innovation. Our guide will be Lin Qiao, the CEO and co-founder of Fireworks AI, a company pioneering scalable AI infrastructure to help businesses build, customize and deploy AI models and applications with both speed and efficiency. Fireworks provides seamless access to over 100 AI models, including those from OpenAI, DeepSeek and other leading providers, making enterprise AI adoption faster and more flexible. Founded in 2022, Fireworks AI is already gaining high-profile customers such as DoorDash, Verizon and Upwork. Backed by Sequoia, the company has raised $77 million to date and is currently valued at $550 million. Lin brings a wealth of experience in AI infrastructure and engineering leadership. She has held key technical roles at IBM and LinkedIn, but is best known for her tenure at Meta, where she led a team of 300 world-class engineers and AI to build frameworks and novel platforms. She played a pivotal role in scaling PyTorch, deploying across Facebook's global data centers and billions of devices, cementing it as one of the most widely used open-source AI frameworks today. Lin holds a PhD in Computer Science from UC Santa Barbara and a Master's in Computer Science from Fudan University in China. Lin, so nice to see you. Thanks for coming on to talk with me today.

Lin Qiao: Very nice to meet you too. Thanks for having me.

Daniel Darling: What inspired you to leave Meta in 2022? A company that has really led the open-source movement in AI and where you're managing a team of over 300 people in that space and you wanted to go off at that point in time to start Fireworks. What was the inspiration around that?

Lin Qiao: You're right. Meta is on the cutting edge of building AI and also bringing AI into production. We built PyTorch. It's a fantastic framework, now dominating the industry when it comes to deep learning models - the majority of models, especially in GenAI, are written and deployed in PyTorch. In that journey, as I engaged with the open-source community, I started to realize they were where Meta was about five years earlier, when Meta started this AI journey: there's no software, there's no hardware, there's no people. We felt starting this company to support the industry's transition to building AI-first products would have a lot of impact, because the need there is very strong. So that's our mission.

Daniel Darling: Starting Fireworks, there's a big commentary that enterprises are just in this testing phase with AI - that they're not really deploying it into production. I guess helping them do that more efficiently and at scale is the role you want to play. Does that notion of just testing and tinkering, not deploying, map onto your own experience in the field?

Lin Qiao: I would say quite the contrary: I've seen very active development within enterprises. This actually broke my own assumption. When I just started the company, I always felt the startups, the AI-native applications, would be on the cutting edge of adoption, then the digital natives, right? And then traditional enterprise. So I always had this order in my sequencing, maybe gapped two to three years apart, but it's happening right now across the board. We have customers from native startups fast-growing in an explosive phase, we have very large digital-native companies, and we have traditional enterprises - insurance companies, healthcare, banks and so on - across the board. This current AI wave is unprecedented. It's a shock wave across the entire industry. Essentially, it's a tech-led wave, mostly driven by developers - engineers who have been taking the chance to build, push the boundaries and redefine what this new user experience looks like, whether they sit within the enterprise or outside of it. I would say it's a very surprising observation.

Daniel Darling: When we look at all of this AI innovation it comes with a huge amount of resource-intensive demands and infrastructure and Fireworks really addresses that and speeds it up and makes it more efficient. So maybe what are some of the innovations that you've developed that allow for that deployment of GenAI to happen more efficiently in the enterprise? And how does that start to evolve?

Lin Qiao: Right. GenAI models are on the highest end of model size and complexity. That just means bringing them to production is challenging for application developers. They are building highly interactive apps, right? Humans are on the receiving end; they need to talk to this app and do things, so it needs to be very responsive. Latency is the key. Second, they need to scale to a lot of users. When they scale, they need to scale quickly, and they also need to scale with unit economics. But GenAI is breaking that, because the model is so big and so complex that it's slow, and second, it's very, very expensive. It's expensive because it's big and has to run on GPUs - sometimes multiple GPU nodes for one model. And we know GPUs are expensive, but it's not just the GPUs. The entire infrastructure around GPUs is expensive, because GPUs are very power-hungry. Power is expensive. Power produces heat. Heat requires cooling. Cooling is expensive. Everything adds up. The unit economics of running AI-powered applications are drastically different from applications before, which ran on top of commoditized CPUs. So although the content generated by GenAI models is awesome, application developers struggle. They struggle in that they can no longer produce this highly interactive experience: their latency regresses, or they cannot scale. Even if they find the market, they can scale into bankruptcy.

Lin Qiao: So that's a very interesting paradox: it's a perfect product-market fit, but you cannot scale. That's where we come in. All these models are PyTorch models. We know how to productionize PyTorch models at Meta's scale. We know how to optimize to the extreme for both latency and cost efficiency.

Daniel Darling: Fantastic. And you know, cost efficiency and some of those efficiency gains are really top of mind at the moment. It sits at this intersection between using proprietary models like OpenAI's and open-source models like Llama, and we're in a news cycle that's blowing up around DeepSeek and its innovations in driving down the cost of model training and deployment. So maybe you can give us your take on the battle between open source and proprietary. How do you see it today? How do you see it evolving?

Lin Qiao: So it's very interesting. About two years ago the first open-source model, LLaMA, emerged. Before then it was all proprietary, and there's a chart someone showed where open-source model capabilities are approaching, converging with closed source - until recently, when DeepSeek got released. I think it's the first time it's not just converging or approaching; it's surpassing.

Of course, when we compare models, there's no one model that beats all the others. That's basically impossible, because of the way models are trained: a model has to focus on certain loss functions, which means from the day it was designed it's meant to do certain kinds of things really well. But DeepSeek has demonstrated that it can beat state-of-the-art proprietary models in logical reasoning. For example, there's an expert-level logical reasoning benchmark called Humanity's Last Exam. It's a very, very difficult benchmark, and DeepSeek is number one. And in the Chatbot Arena, where users pose questions across multiple models and compare the results, DeepSeek is also at the top. It gives us a trend, and I believe in this trend. The trend points toward the open community, the open science community: the collective effort we pull together can really beat proprietary. That's my prediction - the open-source model is going to be better than proprietary. DeepSeek is just one instance. I'm very sure there will be a lot more coming in 2025. This year is going to be a very, very exciting year.

Daniel Darling: Absolutely, I totally agree there. And do you think by the nature of it being open-sourced though that the proprietary models can adopt the best practices from the open-source community and stay one step ahead and capture some of those economics, or do you think there'll be just eventually a tipping point where the open-source community will propel ahead?

Lin Qiao: Yeah, we have numerous examples of open-source software becoming the state of the art. Kubernetes is a great example: it became the de facto standard, and there's no point in having anything proprietary there because you get the best. Open source is constantly being tested, maintained, updated, enhanced. What matters about open source is not just being open; it's the community around it. The community keeps fixing small problems, and those fixes compound. Just as one example, I've seen discussion that multiple institutions are already trying to reproduce DeepSeek's training. I'm sure that's good practice, and it's also a good test. I'm very keen to see what comes out of those efforts, and I'm pretty sure it will also give Meta information about what other model-training best practices they can leverage and build on top of their existing training process.

So I'm also super excited about LLaMA 4.

Daniel Darling: So are we just destined for a dramatic drop in the cost of intelligence and the cost of running these models for the benefit of those building on top of it?

Lin Qiao: Absolutely. Again, I think the significance of the DeepSeek models, V3 and R1, is not just the models themselves; it's actually a proof point that training can be extremely affordable. And here we're talking about pre-training, not post-training. That just means model quality evolution will go much faster. I believe the future will be hundreds of small expert models.

Daniel Darling: Got it.

Daniel Darling: From the Fireworks perspective, you've got over 100 models on your platform, including DeepSeek, which is an amazing turnaround given how recently it was released.

Lin Qiao: And this will just significantly accelerate that journey of many emerging small expert models in their own domains. They're going to beat general-purpose proprietary models. It's a given. It has already happened, and in 2025 it's going to happen in a more pronounced way.

Daniel Darling: What kinds of signals are you already seeing that these specialized expert models are being adopted?

Lin Qiao: Right. We have seen a variety of use cases. People are building all kinds of assistants. They're building assistants in the legal domain, and even legal has so many different subdomains - I'm actually learning as I talk with our customers: there's plaintiff-side, there's defense-side, there's legal research, and they're all different. There are education assistants for teachers, for students, for people who want to learn foreign languages. There are medical assistants; we actually have a shortage of a million nurses. We have also seen applications streamlining business processes: reporting KPIs, doing analytics and so on. And then there's a whole slew of applications building all kinds of search experiences. I think search is also going to be disrupted, because there are so many places we put our content, right? The companies holding that content are going to enable very powerful search capabilities, and then we can pick and pull intelligence from the content in a much easier way. So search is huge - a huge explosion in terms of applications.

Daniel Darling: Fascinating. And when you look at this idea of smaller specialized models, how does it map to the other trend, which is plowing huge amounts of dollars into the build-out of AI infrastructure and data centers - most notably the Stargate program, at a cost of $500 billion? Does that run true to the narrative that we will still need all of that compute power, or will hyper-efficiency mean we don't need these giant data centers to achieve our AI goals?

Lin Qiao: What's gonna happen is that on the training side, as DeepSeek proved, it can be extremely cost-efficient, and that just means model quality iteration is gonna accelerate, especially in the domain-specific experts area. The velocity is gonna speed up for new applications to adopt those models and bring them into production, right? Because the quality gap is diminishing. Here I'm not talking about some benchmark; I'm talking about the quality gap to build a high-standard product that can deliver the results needed to drive user engagement and so on. It's going to diminish very quickly. That just means a lot of demand is moving into inference. We saw that from the inception of Fireworks, almost two and a half years ago. Everyone was working on training - everyone - and we decided, because we believed, that the future demand is inference. And the demand on inference is going to be way, way bigger than training.

Daniel Darling: And so that underlines the need for that infrastructure spend for the inference side.

Lin Qiao: Absolutely, absolutely. So we spend all our effort building up our highly scalable GenAI inference infrastructure and bringing it to the next level. Right now I think we're leading the industry in terms of our capability, our inference speed and our cost efficiency.

Daniel Darling: What does the whole shift towards inference or the refocusing around inference mean for models in 2025? Where do you see it kind of evolving and is it around making them far more specialized and fine-tuned and expert from that general-purpose foundation model or how does it play out?

Lin Qiao: Yeah. What we see, especially with enterprises, is that they have a lot of data assets and they need to convert those data assets into AI assets. The biggest gap we've seen in doing that is the lack of a proper, easy-to-use customization process to infuse their datasets into those models. The emergence of the DeepSeek models makes that infusion and transformation of data assets much easier. What we see is that customization is key to production, especially for enterprises; they all need to adapt and adjust their models based on their internal data. That's where we are heavily invested. It's not just an inference platform but an inference platform that is specialized toward each individual enterprise, and that specialization process is fully automated, so it's very easy to get started. Every individual company will have its own inference engine.
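The first step of the customization process Lin describes - infusing internal data into a model - typically starts with converting enterprise records into a supervised fine-tuning dataset. Below is a minimal sketch of that data-preparation step; the record contents, field names and output filename are hypothetical, and the chat-style JSONL layout is just one common format that many fine-tuning services accept, not Fireworks' specific API.

```python
# Sketch: turn internal Q&A records into a chat-format JSONL
# fine-tuning dataset. All records and filenames are made up.
import json

def to_training_example(question: str, answer: str) -> dict:
    """One supervised fine-tuning example in chat-message format."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# Hypothetical internal knowledge-base entries:
records = [
    ("What is our refund window?",
     "Refunds are accepted within 30 days of purchase."),
    ("Which tier includes SSO?",
     "Single sign-on is included in the Enterprise tier."),
]

# One JSON object per line, the usual JSONL convention:
with open("train.jsonl", "w") as f:
    for question, answer in records:
        f.write(json.dumps(to_training_example(question, answer)) + "\n")
```

A dataset like this would then be uploaded to whatever tuning pipeline the platform provides; the point is that the enterprise's proprietary knowledge enters the model through examples like these rather than through pre-training.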

Daniel Darling: Is this notion of running out of high-quality data to train these models a facade? Will the enterprise have enough proprietary data or access to proprietary data to be able to train its own models?

Lin Qiao: So I think it's always a race. There's a lot of domain-specific data, and you can argue about whether different enterprises in the same domain have completely different domain-specific data or pretty much the same form and shape. Regardless, right now the gap between a general-purpose model and a company-specific use case is very big. Some of those gaps will be bridged by verticalized model providers or verticalized platforms. But some gaps may not be fully covered that way, and then companies need to tune further using company-specific data. It's an actively developing space, and enterprises entering GenAI right now do need a pretty deep customization process. That's where we jump in to help.

Daniel Darling: Hearing you talk about enterprise customization of models does give rise to another topic which is around AI agents and their deployment within the enterprise. What are you seeing today and how do you see that whole agentic space playing out?

Lin Qiao: Starting this year we've seen a lot of interest in agent workloads. We have a lot of customer engagement; customers are directly building demos and proofs of concept. We're still quite early if we look forward, but I do feel this will happen pretty fast. On the infrastructure level, latency is going to be an even bigger issue: to solve a problem you'll have multiple round trips across multiple models, where before it was just one model. So those models have to be much faster, much smaller, more optimized. I think the development of agent workflows is going to push in the direction I just mentioned: we will see hundreds of small expert models emerge, and getting there faster will really help the maturity of agent workloads.
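The round-trip problem Lin raises can be sketched with simple arithmetic: in a sequential agent chain, per-call latency multiplies by the number of model calls. The call counts and latencies below are illustrative assumptions, not measurements.

```python
# Why agent workflows magnify latency: per-call latency compounds
# across every model round trip in the chain. Numbers are illustrative.

def pipeline_latency(per_call_seconds: float, calls: int) -> float:
    """Total latency for a sequential agent chain of `calls` model calls."""
    return per_call_seconds * calls

# One chat completion at 2s per call feels responsive:
single = pipeline_latency(2.0, calls=1)        # 2.0 s

# A hypothetical agent that plans, invokes two tools (each wrapped in a
# model call before and after), and summarizes makes ~6 calls; the same
# 2s model now takes 12s end to end:
agent_slow = pipeline_latency(2.0, calls=6)    # 12.0 s

# Shrinking each step to a fast small expert model at 0.3s per call
# keeps the whole 6-call chain interactive:
agent_fast = pipeline_latency(0.3, calls=6)    # ~1.8 s

print(single, agent_slow, round(agent_fast, 1))
```

This is why the shift to agents pushes toward smaller, faster expert models: the latency budget that one model used to consume alone must now be split across the whole chain.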

Daniel Darling: Are there other parts of the AI industry on the frontier that you're keeping a watchful eye on, maybe this year or next?

Lin Qiao: Yeah. Because we are at the AI infrastructure level, we keep a very close eye on hardware development as well. We run across multiple hardware SKUs from both NVIDIA and AMD. The hardware landscape is also getting extremely interesting. Before I started this company, there was usually a new hardware SKU every three years; now there's a new SKU from each hardware vendor every year. So it's interesting. Right now it's great news for developers, because they have a lot more variety to pick and choose from, but it's also challenging, because they have no idea which one to use. Even picking and choosing models becomes exhausting: every week or every few weeks there's a new model coming out, and they're confused about whether to use it - let alone different hardware SKUs with different design points, where some have higher FLOPS, some higher memory bandwidth, some better cross-GPU interconnect and so on. We want to harvest all this dynamism across the industry, from the best state-of-the-art models to the state-of-the-art hardware, and give that power to application developers to innovate and change our lives. I'm really looking forward to what new applications are going to emerge from this AI wave.

Daniel Darling: How do you organize your company at Fireworks to remain dynamic with all of this change? You're acting at such an important interface between all the model and hardware innovation and the end developers, applications and enterprises, and they're looking to you almost for guidance in staying on top of all of this.

Lin Qiao: We are very customer-obsessed. We really believe our success indexes on our customers' success, and we want to give our best to them. If a new model comes in that will significantly advance a customer's product development, we will just jump in and make that happen in no time. We are also extremely flat as a company; there's no hierarchy. We don't want internal communication to slow anything down. So we focus on top-priority goals, and the whole company dashes toward them. The lack of hierarchy helps us get there really fast. It's very easy to mobilize our people across different functions - from engineering to sales to marketing - to laser-focus on one goal.

Daniel Darling: What advice would you have for founders looking to build an AI-first or AI-native company?

Lin Qiao: In a startup today there's just not enough data. When you bootstrap a business, you don't have enough data to make extremely high-precision decisions. It's much more important to start a product with your hypothesis, go validate it as soon as possible and create the feedback loop. It's okay to make wrong assumptions, but it's not okay to leave them unvalidated for a long time in a vacuum. Time is of the essence here; all we have is a race with time to get to what people need as soon as possible. So that's my advice: don't over-analyze, because it's guaranteed there's no perfect data.

Daniel Darling: I couldn't agree more. I really love that advice. What about for venture capitalists and investors? What advice do you have for them?

Lin Qiao: I have excellent board members and investors, and they are very dynamic. They don't box me in, as in "oh, you are early stage," "oh, you are growth stage," "oh, that's the time you should do that." They never give me a box to fit into, and we always brainstorm how to build the company this industry needs. It's clear in my mind there's a lot to disrupt. Of course we as a company, Fireworks, are disrupting the GenAI platform space as a SaaS platform. But I also shared with my board that even the go-to-market needs to be disruptive, because we have never seen this market before. The way people adopt GenAI products has never existed before, so how we structure the go-to-market team and what the process looks like are going to be different from before. I think they're all very much on the same page with me.

Daniel Darling: And that's an area that we're really passionate about at our firm. How do you see a modern go-to-market work in this environment?

Lin Qiao: The one trend we're seeing right now is that everything moves so fast. If there's a runbook that takes you two years to get to mindshare, it's just not going to work; it's too slow. I do not like the go-to-market strategy where marketing is one or two years ahead of product. I really do not like that. When we started, we were the opposite: we didn't market anything, because through the PyTorch experience we thought, hey, PyTorch will speak for itself. Of course that's not the right way to think about it either. Marketing is important; getting the word out is important. But I just don't believe in marketing a year ahead, because no one knows where we'll be in the AI space after one year.

Daniel Darling: So just rapid release cycles of product, letting it speak for itself, letting people come to the platform and experience it in a lightweight manner and then having marketing support. That is a big driver. Fascinating.

Lin Qiao: At least that's how we operate. And we are also happily experimenting. And we'll see.

Daniel Darling: That aligns nicely with what we're seeing too.

Well, look, Lin, we've come up on time unfortunately, but thank you so much for spending time with me today. I know there's so much going on in your world and in the whole AI industry that even taking a moment out of your day to come and chat, you could be missing the next big release. I'll let you get back to it. Congratulations on all the success so far with Fireworks. No doubt we're going to hear a lot more about the company as it grows. And thank you again for coming to chat.

Lin Qiao: Yeah, it's awesome. Thanks.

Daniel Darling: It was a real privilege to speak to Lin at a time when AI infrastructure is becoming the key battleground, with enterprises moving rapidly from experimentation to full-scale deployment. Lin makes a compelling case for a future dominated by open-source, cost-efficient foundation models, enabling a surge of specialized expert AI models fine-tuned for real-world applications. This shift will dramatically lower the barrier to AI adoption, unlocking new opportunities for developers, founders and enterprises alike. Our conversation really couldn't be more timely, just as the industry grapples with the rise of DeepSeek, the acceleration of inference at scale and the reshaping of enterprise AI strategy. If Lin is right, we're on the cusp of an explosion of AI-native applications built not by a privileged few, but by anyone with a vision to create and the tools to execute it. To follow Lin and the innovation she's developing at Fireworks, head over to her account on X, @LinQiao - that's spelled L-I-N Q-I-A-O. I hope you enjoyed today's episode. Please subscribe to the podcast and drop us a rating; there's more coming down the pipe. Until next time, thanks for listening and have a great rest of your day.