After starting a conversational AI company in 2018, I've experienced a lot of variation in language models. I've run quite a few A/B tests with various model combinations and APIs. Through this journey, I've learned some valuable insights for engineers and product builders who want to work with AI.
First and foremost, remember that end-users care far more about the experience you offer than about the specific model you employ. This is key to understand when building an AI-oriented product. The model is merely a tool to deliver that experience seamlessly.
Building your own models might seem tempting, but it won't necessarily secure your position in the market. If GPT-4 can produce the same results as your in-house model, your competitive advantage is merely an illusion (and maintaining that model creates a LOT of overhead).
So, for businesses contemplating the deployment of Large Language Models (LLMs), the key considerations are:
Can you achieve the latency/speed and quality of output that your users demand? (A quick way to measure this is sketched just after this list.)
What is the budget you are willing to allocate for achieving that level of quality?
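On the first question: latency is easy to measure empirically before you commit to a model. Here's a minimal sketch, assuming the official OpenAI Python SDK and an `OPENAI_API_KEY` in the environment; the model name and prompt are placeholders for your own:

```python
import time
from statistics import mean, quantiles

from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_latency(prompt: str, model: str, n_runs: int = 20) -> None:
    """Time n_runs identical requests and report mean and p95 latency."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        latencies.append(time.perf_counter() - start)
    p95 = quantiles(latencies, n=20)[-1]  # 19 cut points; the last is p95
    print(f"{model}: mean {mean(latencies):.2f}s, p95 {p95:.2f}s")


# Placeholder prompt; use a request representative of your product.
measure_latency("Summarize this support ticket: ...", model="gpt-4")
```

Run the same loop against each candidate model; users tend to feel the p95, not the mean.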
Of course, the decision-making process involves a complex interplay of cost and time factors. Yet, here are some practical and quick guidelines for determining when to wrap GPT-3.5/4 or fine-tune your own model:
You are still searching for Product-Market Fit (PMF):
For rapid iterations and quick feedback, almost always go for closed APIs or a pre-trained hosted open-source model.
If your budget allows and you have the data, consider fine-tuning. However, remember that it extends the time needed to discover PMF.
Congrats! You are Post-PMF:
Once you've identified your product's direction, your focus should shift towards optimising the combination of cost, latency, and quality.
If GPT-3.5/4 is cost-prohibitive or too slow, consider training your own model.
If GPT-3.5/4 aligns with your budget and latency requirements, continue using it.
If your task is beyond the capabilities of closed APIs, training your own model becomes the obvious choice.
Additional points to consider:
Assess your margins; using GPT-4 may make sense even if it's costlier than fine-tuning, as it allows for faster service improvements (a back-of-the-envelope comparison is sketched after this list).
Evaluate whether a fine-tuned model will truly outperform a competitor's use of GPT-4, particularly if you lack extensive data.
Keep an eye on future trends; the costs of models like GPT-4 will likely decrease over time, and new models will emerge.
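To make the margin point concrete, here's a back-of-the-envelope sketch. Every number in it is a made-up placeholder, not a real price; the point is the shape of the comparison, not the figures:

```python
# All figures below are hypothetical placeholders; plug in your own
# revenue, pricing, token counts, and traffic.
revenue_per_request = 0.05        # what one request earns you, in dollars

# Closed API: pay per token, no infrastructure to run.
api_cost_per_1k_tokens = 0.03     # placeholder rate, not a real price
tokens_per_request = 1_000
api_cost = api_cost_per_1k_tokens * tokens_per_request / 1_000

# Fine-tuned in-house model: cheaper per request, but the hosting bill
# is fixed and arrives regardless of traffic.
hosting_cost_per_month = 2_000    # placeholder GPU/hosting bill
requests_per_month = 500_000
inhouse_cost = hosting_cost_per_month / requests_per_month

for name, cost in [("closed API", api_cost), ("in-house", inhouse_cost)]:
    margin = (revenue_per_request - cost) / revenue_per_request
    print(f"{name}: ${cost:.4f}/request, {margin:.0%} gross margin")
```

Note how the conclusion flips at low traffic, where the fixed hosting bill dominates; run your own numbers rather than trusting these.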
It's important to note that many at-scale use-cases thrive on a combination of closed APIs and in-house models.
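One common shape for that combination is a simple router: high-volume, easy traffic goes to the cheaper in-house model, and the hard tail escalates to a closed API. Here's a toy sketch; the difficulty heuristic and both model calls are hypothetical stand-ins for your own components:

```python
def classify_difficulty(prompt: str) -> str:
    """Placeholder heuristic: treat long prompts as 'hard'."""
    return "hard" if len(prompt) > 500 else "easy"


def call_inhouse_model(prompt: str) -> str:
    """Stand-in for your fine-tuned, self-hosted model."""
    return f"[in-house answer to: {prompt[:30]}...]"


def call_closed_api(prompt: str) -> str:
    """Stand-in for a wrapper around GPT-3.5/4 or similar."""
    return f"[closed-API answer to: {prompt[:30]}...]"


def route_request(prompt: str) -> str:
    """Send easy, high-volume traffic in-house; escalate the hard tail."""
    if classify_difficulty(prompt) == "easy":
        return call_inhouse_model(prompt)
    return call_closed_api(prompt)


print(route_request("Reset my password"))
```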
In my experience, people often lean toward training and fine-tuning models unnecessarily. While it can be fascinating work, it may not always be the most efficient path. In the end, the product experience is the ultimate goal, and the model is just a tool to achieve it.