Evaluating Open Source vs Commercial AI/ML APIs
Our methodology: A comprehensive analysis (beyond performance metrics)
OCT 2023
BY MCKENZIE LLOYD-SMITH
Summary: We outline the methodology we employ to support our partners in choosing between open source ML models and commercial ML APIs.
Over the past few months, you've likely encountered numerous debates regarding whether to utilize open source or commercial APIs for Large Language Models (LLMs). However, this debate isn't unique to LLMs; it extends to the broader field of Machine Learning (ML). How to decide between commercial APIs and open source models is an important question, and has become increasingly common amongst our partners.
Many methodologies we've come across examine model performance as the only variable in the decision-making process. But model performance is only one of multiple dimensions that should be considered when deploying ML in a production environment. That's why we developed our own methodology, one that encompasses both direct model & hosting costs and indirect costs, including engineering time, ownership, and maintenance. Having worked with partner organizations ranging in size from startup through to global enterprise, we know these factors are equally, if not more, critical when deploying ML in a commercial context.
Comparing costs of open source and commercial models
Let's start by comparing the cost of an open source model and an equivalent commercial API-based model. In our example we'll look at ResNet50, an open source model for image classification, and Amazon's Rekognition API:
Open Source Model: Running a ResNet50 model for image classification on a g4dn.xlarge AWS instance costs $0.526 per hour. This setup can process approximately 75 predictions per second. Operating the instance 24/7, you can make about 200 million predictions per month, at a cost of around $400 per month.
Commercial API: Running a similar operation via Amazon's proprietary Rekognition API costs $0.00025 per prediction. Achieving 200 million predictions in a month would set you back a staggering $50,000 per month.
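To make the comparison concrete, here's a minimal Python sketch of the arithmetic above (the prices are the figures quoted; actual AWS and Rekognition pricing varies by region and over time):

```python
# Monthly cost comparison using the figures quoted above
# (AWS and Rekognition prices vary by region and change over time)
INSTANCE_COST_PER_HOUR = 0.526       # g4dn.xlarge, on-demand
PREDICTIONS_PER_SECOND = 75          # ResNet50 throughput on that instance
API_COST_PER_PREDICTION = 0.00025    # Rekognition, per image

hours_per_month = 24 * 30
self_hosted_cost = INSTANCE_COST_PER_HOUR * hours_per_month
monthly_capacity = PREDICTIONS_PER_SECOND * 3600 * hours_per_month

predictions = 200_000_000
api_cost = predictions * API_COST_PER_PREDICTION

print(f"Self-hosted: ${self_hosted_cost:,.0f}/month "
      f"(capacity ~{monthly_capacity:,} predictions)")
print(f"API:         ${api_cost:,.0f}/month for {predictions:,} predictions")
```

The per-prediction gap is stark, but remember this captures only the instance price, not the engineering effort around it.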
Comparing these options, running an open source model seems like the obvious choice. But a more appropriate model for costing utilizes predicted monthly volume as well as non-model costs. That's why we advocate for considering the Total Cost of Ownership (TCO).
Let's explore TCO by examining the features associated with open source and commercial ML models.
Comparing features of open source and commercial models
A common misconception is that an ML API is simply an ML model wrapped in an API. That's not quite the case. To understand what a commercial ML API offers in comparison to an equivalent open source model, let's examine the layers surrounding the models and their functions.
Open Source Model: An open source ML model is just the pre-trained ML model, without any extras (and often without fine-tuning, but more on this later).
Commercial API: The commercial API-based model contains a pre-trained and fine-tuned ML model, plus:
Model upgrades
Infrastructure hosting
ML serving framework
Engineering logic
API and infrastructure scaling
Customer support and SLA
When opting for a commercial API-based model, these six features are bundled into the call price. When deploying an open source model, these same features have to be considered and developed in-house. Let's briefly examine each:
Model Upgrades: Choosing a commercial ML API means you're likely to have access to new models as they arrive, with limited infrastructural changes. Newer models are likely to come with a higher API call price, but upgrading (or downgrading) can be as easy as a single-line code change. In contrast, when an open source model is superseded by a newer release, upgrading can be an extremely time-consuming process depending on how it's been deployed (especially when fine-tuning is required).
Infrastructure for Hosting: If choosing to deploy an open source ML model, the first decision after selecting a model is deciding where to run it. For small-scale experiments, a local machine may suffice (but this isn't recommended for models over 7B parameters without a dedicated GPU). For scalability, you'll likely opt for cloud providers like AWS, GCP, or Azure to rent servers. Commercial ML APIs take away infrastructure concerns entirely.
ML Serving Framework: A model isn't merely a Python function. For optimal prediction serving, you'll likely use a serving framework like TorchServe or TFServing. These frameworks optimize latency and handle high concurrency, but need to be set up and deployed appropriately. Maintaining effective models requires continuous MLOps, often benefitting from proprietary tools & services (which come at a price). When using commercial ML APIs, the serving framework is taken care of.
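For a sense of what "setting up a serving framework" involves, here's what packaging and serving a model with TorchServe's standard CLI looks like (file names and paths are placeholders; this assumes TorchServe is installed and you already have a serialized model):

```shell
# Package a trained model into a TorchServe model archive (.mar)
torch-model-archiver --model-name resnet50 --version 1.0 \
    --serialized-file resnet50.pt --handler image_classifier

# Register the archive in a model store and start the server
mkdir -p model_store && mv resnet50.mar model_store/
torchserve --start --model-store model_store --models resnet50=resnet50.mar
```

And this is only day one: version rollouts, autoscaling, and monitoring all come afterwards.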
Engineering Logic: The precise logic of how a model is deployed varies case-by-case. When done in-house, organizations can benefit from full control over data processing, but developing efficient instances requires skilled ML engineers and can be highly time-consuming. For instance, consider sentiment analysis. Deploying a model from HuggingFace may look as simple as this:
# Sentiment analysis via Hugging Face's pipeline API
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
text = "Replace this text as needed."
output = classifier(text)  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]
For small tasks, this simple code suffices. But for larger tasks, more comprehensive logic needs to be developed. For example, imagine needing to conduct sentiment analysis on 10k documents of varying formats. For this, you'll need to:
Use an OCR model for PDFs.
Handle edge cases like page orientation.
Employ different models based on document types.
Implement parallelization pipelines.
Commercial ML APIs abstract these complexities, allowing you to focus on core tasks.
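As a rough illustration of the routing and parallelization logic above, here's a minimal sketch. The handlers are placeholders: a real system would call an OCR engine for PDFs and dispatch to per-type models.

```python
# Sketch of document routing + parallel sentiment analysis.
# ocr_pdf and analyze_sentiment are placeholders for real models.
from concurrent.futures import ThreadPoolExecutor

def ocr_pdf(doc):
    # Placeholder: a real OCR call would also normalize page orientation
    return doc.get("text", "")

def extract_text(doc):
    # Route by format: PDFs go through OCR, plain text passes through
    return ocr_pdf(doc) if doc["format"] == "pdf" else doc["text"]

def analyze_sentiment(text):
    # Placeholder model: real code would pick a model per document type
    return "POSITIVE" if "good" in text.lower() else "NEGATIVE"

def process(doc):
    return analyze_sentiment(extract_text(doc))

docs = [
    {"format": "txt", "text": "good service, would recommend"},
    {"format": "pdf", "text": "a bad scan of a complaint letter"},
]

# Parallelize across a worker pool, as a production pipeline would
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, docs))

print(results)
```

Even this toy version hints at the engineering surface area: format detection, error handling, and concurrency all sit outside the model itself.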
API and Infrastructure Scaling: Commercial models wrap the above components into an API, ensuring they scale with increased demand. When self-deploying, we work with partners to carefully consider scalability within their infrastructure setup. Depending on the use case, it's also important to consider traffic spikes, infrastructure redundancy, and maintenance downtime, all of which are handled by commercial models.
Customer Support and SLA: When using open source models, we recommend having at least one ML engineer dedicated to support and bug fixing. Commercial ML APIs, on the other hand, include support under their SLAs, covering maintenance and guaranteeing fixes within specified timeframes.
In conclusion, commercial ML APIs offer far more built-in functionality and service than open source models. This doesn't mean you can't build these features and provide these services for yourself. What's crucial is understanding the difference in functionalities between open source models and commercial models. When planning to deploy a model, this comprehensive understanding helps decision-makers map & cost the elements that'll be required in-house, versus those offered via a commercial API.
Considering Costs
We started by exploring model costs, and now that we've looked at the comparable features of open source and commercial ML models, we can review the TCO of each.
We know that every deployment case will be unique. But based on our experience, as a rule-of-thumb, if your monthly prediction count is less than 400,000, a commercial API is typically more cost-effective. For higher volumes, consider hosting an open source model. Why? Because commercial APIs scale costs with usage, while GPU-based setups have more fixed costs.
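The rule of thumb follows from simple arithmetic: a self-hosted GPU is a (mostly) fixed monthly cost, while an API bills per call. With an assumed per-prediction API price, the break-even point falls out directly:

```python
# Break-even point between a fixed-cost GPU instance and a usage-priced API
# (api_price_per_pred is an assumed, illustrative figure)
fixed_monthly_gpu = 380.0        # ~g4dn.xlarge running 24/7
api_price_per_pred = 0.001       # assumed commercial API price per prediction

break_even = fixed_monthly_gpu / api_price_per_pred
print(f"API is cheaper below ~{break_even:,.0f} predictions per month")
```

Below the break-even volume the API wins; above it, the fixed cost of self-hosting is amortized over ever more predictions.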
Analysis from 'Towards Data Science' corroborates this, finding that—at lower usage levels—commercial APIs like those offered by OpenAI, Microsoft, and Amazon, are cheaper than running open source models on AWS. However, when usage reaches millions of daily requests, the latter is more economical. Let's use an example to explore this further:
Image: simplified model costs
Open Source Model: AWS offers a standard architecture involving AWS Sagemaker, Lambda, and API Gateway for deploying models. Lambda and API Gateway costs are relatively negligible, but LLMs often require more expensive computing instances. For example, deploying a 20-billion parameter model like Flan UL2 on AWS can cost around $150 per day for 1000 requests, rising by only $10 (to $160) per day for 1M requests, or roughly $58,500 annually.
Commercial API: ChatGPT charges $0.002 per 1,000 tokens, and a token roughly equates to 3/4 of a word. If you process 1,000 text chunks daily (each approx. 500 words), the cost is approximately $1.30 per day. For a million chunks daily, the cost skyrockets to around $1,300 per day, or roughly $500,000 annually.
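The token arithmetic above can be expressed as a small helper (the price and the words-per-token ratio are the rough approximations quoted, not exact figures):

```python
# Token-cost arithmetic for the ChatGPT example above
PRICE_PER_1K_TOKENS = 0.002   # quoted API price
WORDS_PER_TOKEN = 0.75        # rough approximation

def daily_cost(chunks_per_day, words_per_chunk=500):
    tokens = chunks_per_day * words_per_chunk / WORDS_PER_TOKEN
    return tokens / 1000 * PRICE_PER_1K_TOKENS

print(f"1k chunks/day:  ${daily_cost(1_000):.2f}")
print(f"1M chunks/day:  ${daily_cost(1_000_000):,.0f}")
```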
But let's not forget the additional features we considered previously. Infrastructure costs, engineering time, operational efforts, and maintenance costs are all included within a commercial model's price tag, but should be factored into the cost of operating an open source model. Since these costs are often paid in human capital, they can be challenging to quantify. However, as a baseline, 'Towards AI' recommends a minimum of two engineers when running open source ML models in commercial settings: one specializing in ML and the other in backend/cloud technologies. That's a minimum combined annual salary of $240,000.
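Putting these figures together gives a rough annual TCO comparison for the high-volume scenario (all numbers are the estimates quoted above, not firm quotes):

```python
# Rough annual TCO for the ~1M-requests/day scenario, using figures above
open_source_infra = 58_500        # Flan UL2 on AWS, ~$160/day
engineering_staff = 240_000       # two engineers ('Towards AI' baseline)
open_source_tco = open_source_infra + engineering_staff

commercial_api_tco = 500_000      # ChatGPT at ~1M chunks/day

print(f"Open source TCO: ${open_source_tco:,}/year")
print(f"Commercial API:  ${commercial_api_tco:,}/year")
```

Even with staffing included, self-hosting wins at this volume; at low volumes the staffing line dominates and the API wins.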
When evaluating ML options, it's imperative to look beyond just prediction costs, and delve into TCO for a comprehensive view.
Fine-tuning
Fine-tuning your models to fit your unique data and use-cases is possible with both open source and commercial ML APIs. However, the level of customization and the effort required vary between the two.
Open Source Model: Open source models come pre-trained but often lack any fine-tuning. Depending on the model, fine-tuning may be required in order to generate usable output. While open source models provide significantly more flexibility, as you have access to the underlying architecture (allowing users to make granular changes to both the model's architecture and parameters), such customization comes at a cost due to the need for advanced ML expertise.
Commercial API: In contrast, commercial APIs are pre-trained, fine-tuned, and may have had additional classifier fine-tuning. Commercial models often offer a streamlined fine-tuning process for customers. For instance, OpenAI's GPT-4, Amazon's Rekognition, and Mindee's OCR all offer API-based fine-tuning. With commercial APIs, you provide your data and labels via their interface, and they handle the rest—training, optimizing, and deploying a custom model for you, all built on their existing technology stack.
Returning to the features which accompany commercial ML models: if an API's performance is lacking, clients can request support from the provider, who can evaluate the use case to determine its viability. Using an open source model instead requires employing an ML specialist to scope the use case, select the appropriate model, and fine-tune it to the specific needs.
Security & Privacy
We often see open source, self-hosted models labeled as 'secure' simply because they are managed in-house. This is an unfortunate misconception. Security concerns the robustness of any ML system against potential threats. Unless an organization invests in security expertise, it is more vulnerable to risks than when using a commercial API, whose providers often boast dedicated security teams, adhere to industry standards, and undergo continuous system monitoring.
Privacy, however, is a distinct issue. When you use an API, your data leaves your controlled environment, potentially exposing it to third parties. Regulatory compliance adds another layer of complexity, and we work carefully with partners to ensure adherence to specific federal, provincial, and industry specific guidelines.
It's important to remember that API providers build their business on trust. We support our partners in researching providers in-depth and scrutinizing their privacy policies. Most established vendors are transparent about data handling protocols, offering data encryption both in transit and at rest, and sometimes even allowing you to opt-out of using your data for training their models. In sectors where data privacy is a top priority, we work with teams to deploy in-house models which fully comply with internal and external requirements.
Summary and Takeaways
It's crucial to understand that both open source ML models and commercial ML APIs have their own strengths and limitations; they serve different needs. The optimal choice is dependent on the various factors we've outlined. Below is a summary of key recommendations:
Initial Approach: Start by experimenting with commercial ML APIs. If they satisfy your requirements and are budget-friendly, continue using them. Even if the costs are marginally higher, weigh those against the expenses of hiring staff to develop and maintain a comparable service.
High Monthly Prediction Volume: If your operation demands a large volume of monthly predictions—exceeding roughly 400,000 to 500,000—you're likely better off leveraging open source models and hosting them yourself.
Fine-Tuning: Should the commercial APIs fall short in performance, explore their fine-tuning capabilities. Work on improving your data quality and consult with the service provider for optimization tips. If performance issues persist, it might be worthwhile to invest in an ML specialist. Conversely, if the model performs exceptionally well, consider hiring an ML engineer or DevOps to assist with production deployment.
Deciding between an open source ML model and a commercial API-based model is complex. Model implementation comes with a significant investment of time, cost, and data. Changing models and re-designing architecture due to a suboptimal initial choice can be hugely expensive. That's why we invest so much time in understanding our partners' unique needs and ambitions: to ensure they make the right decision, first time.
This is a broad overview of the methodology we employ when helping our partners make their ML decisions. If you're in the process of evaluating your options, feel free to get in touch to discuss how to make a decision that best suits your specific needs and circumstances.