The Vercel AI SDK abstracts over all LLMs, including locally running ones. It even handles file attachments well, which is something people are using more and more.
https://sdk.vercel.ai/docs/introduction
It uses zod for types and validation; I've loved using it to make my apps swap between models easily.
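Roughly what that looks like (a sketch, assuming the `ai`, `@ai-sdk/openai`, `@ai-sdk/anthropic`, and `zod` packages; the schema, prompt, and model IDs here are just examples):

```ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// The zod schema doubles as runtime validation and the TypeScript return type.
const Recipe = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
});

// Swapping providers is a one-line change: pass a different model instance.
const model = process.env.USE_CLAUDE
  ? anthropic("claude-3-5-sonnet-20241022")
  : openai("gpt-4o-mini");

const { object } = await generateObject({
  model,
  schema: Recipe,
  prompt: "Give me a simple pancake recipe.",
});

console.log(object.name, object.ingredients);
```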
There is also ai-fallback [0] to automatically switch to a fallback provider in case of downtime.
[0]: https://github.com/remorses/ai-fallback
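I won't vouch for ai-fallback's exact API here, but the idea it implements (try one provider, catch the error, retry with the next) is easy to hand-roll on top of the AI SDK. A sketch; `generateWithFallback` is my own name, not part of any library:

```ts
import { generateText, type LanguageModel } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// Try each model in order and move on to the next one if a call throws,
// e.g. on a provider outage or a rate limit.
async function generateWithFallback(models: LanguageModel[], prompt: string) {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await generateText({ model, prompt });
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const { text } = await generateWithFallback(
  [openai("gpt-4o-mini"), anthropic("claude-3-5-sonnet-20241022")],
  "Summarize the plot of Hamlet in one sentence.",
);
console.log(text);
```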
I found that [LangChain](https://www.langchain.com/langchain) has pretty good abstractions and better support for multiple LLMs. They also have a good ecosystem of supporting products - LangGraph and LangSmith. Currently supported languages are Python and JavaScript.
LangChain is nice, but it's psychotic in TypeScript. Everything is `any` wrapped in layers of complex parameterized types.
My problem with LangChain (aside from dubious API choices, some a legacy of when it first started) is that now it's a marketing tool for LangGraph Platform and LangSmith.
Their docs (incl. getting-started tutorials) are content marketing for the platform services, needlessly pushing new users in a more complex and possibly unnecessary direction in the name of user acquisition.
(I have the same beef with Next.js/Vercel and MongoDB.)
Some time ago I built a rather thin wrapper for LLMs (multi-provider incl. local, templates, tools, RAG, etc.) for myself. Sensible API, small so it's easy to maintain, and as a bonus, no platform marketing shoved in my face.
I keep an eye on what the LangChain ecosystem's doing, and so far the benefits ain't really there (for me, YMMV).
I agree that the LangChain docs and examples shouldn’t rely on their commercial platform products. The LangGraph and LangSmith documentation should then layer on top of the LangChain docs.
> npm install ai
How is this allowed/possible in npm? Don't they have mandatory namespaces?
> > npm install ai
> How is this allowed/possible in npm? Don't they have mandatory namespaces?
No, scopes (i.e., namespaces) aren’t mandatory for public packages on npm. If a name is available, it’s yours. Some folks have published packages that are empty or unused, so you can reach out and ask them to pass the name on to you. At least a few times, someone has reached out to me about a package I published years ago and did nothing with, and I passed it on to them.
That’s a good spot. Is it open source or is it paid software? I’ve been using Braintrust Proxy for this until now.
https://github.com/vercel/ai
It's Apache 2.0 licensed.
Agree that TFA's advice is not really useful when open-source libraries like these exist.
Locally running, like, llama.cpp? Or Python?
Either way I guess.
I would have thought this was impossible; I contribute to llama.cpp, and there's an awful lot of per-model ugliness to make things work, even just in terms of getting the tool calls to it in the form it expects.
cries at the Phi-4 PR in the other window that I'm still working on, and discovering new things, 4 weeks later
> Locally running, like, llama.cpp? Or Python?
I’d guess it supports a small set of popular HTTP APIs (in particular, it's very common for self-hosting LLM toolkits—not the per-model reference implementations or low-level console frontends—to present an OpenAI-compatible API), so you could support a very wide range of local models, through a variety of toolkits, just by supporting the OpenAI API with configurable endpoint addresses.
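The AI SDK's OpenAI provider does let you override the base URL, so one way to reach a local server looks like this (a sketch; Ollama's default endpoint and the model name are assumptions):

```ts
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

// Point the OpenAI-style client at a local server instead of api.openai.com.
// Any toolkit that exposes an OpenAI-compatible endpoint works the same way,
// just with a different baseURL.
const local = createOpenAI({
  baseURL: "http://localhost:11434/v1", // Ollama's default OpenAI-compatible endpoint
  apiKey: "ollama", // local servers ignore the key, but the client wants one
});

const { text } = await generateText({
  model: local("llama3.1"),
  prompt: "Say hello in five words.",
});
console.log(text);
```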
I am not sure about llama.cpp, but it works with Ollama.
That's not to say you won't need to tweak things when you cut down to smaller models; there are always trade-offs when swapping models.
At some point someone tried a toy direct integration, but the only actually supported way is via a Python library that wraps llama.cpp in an OpenAI compatible API endpoint.
I've been using [BAML](https://github.com/boundaryml/baml) to do this, and it works really well. It lets you define multiple fallback and retry policies, and it returns strongly typed outputs from LLMs.
I just had very bad JSON mode behavior with the gemini-1.5-flash and 2.0-flash models using their own library 'google-generativeai'. It either can't follow the JSON formatting correctly or renders string fields with no end until max_tokens. Pretty bad for Gemini, when open models like Qwen do a better job at a basic extract-information-to-JSON task.
Things to note:
1) supply a JSON schema in `config.response_schema`
2) set `config.response_mime_type` to `application/json`
That works for me reliably. I've had some issues with running into max_token constraints, but that was usually on me because I let it process a large list in one inference call, which resulted in very large outputs.
We're using gemini JSON mode in production applications with both `google-generativeai` and `langchain` without issues.
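For reference, a sketch of the same two settings in the JavaScript SDK (`@google/generative-ai`); the schema and prompt here are just an illustration:

```ts
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-flash",
  generationConfig: {
    // Force JSON output and constrain it with a schema.
    responseMimeType: "application/json",
    responseSchema: {
      type: SchemaType.OBJECT,
      properties: {
        name: { type: SchemaType.STRING },
        tags: { type: SchemaType.ARRAY, items: { type: SchemaType.STRING } },
      },
      required: ["name", "tags"],
    },
  },
});

const result = await model.generateContent(
  "Extract the product name and tags from: 'Acme X200 wireless mouse, ergonomic, USB-C'.",
);
console.log(JSON.parse(result.response.text()));
```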
Did you provide a JSON schema? I've had good experience with that.
I would recommend looking at OpenRouter if anyone is interested in implementing fallbacks across model providers. I've been using it in several projects, and the ability to swap across models without changing any implementation code or managing multiple API keys has been incredibly nice:
https://openrouter.ai/docs/quickstart
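A sketch of a call against OpenRouter's OpenAI-style endpoint; the `models` fallback list follows their model-routing docs, and the model IDs are just examples:

```ts
// One API key, one endpoint, any upstream provider.
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "anthropic/claude-3.5-sonnet",
    // Optional fallback list: OpenRouter reroutes if the primary model fails.
    models: ["anthropic/claude-3.5-sonnet", "openai/gpt-4o-mini"],
    messages: [{ role: "user", content: "Summarize RFC 2616 in one sentence." }],
  }),
});

const data = await res.json();
console.log(data.choices[0].message.content);
```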
Have you measured how much latency it adds to each query? Naively I'd expect adding in an extra network hop to be a pretty big hit.
Anecdotally, there's been no obvious performance hit, but this is something I should test more thoroughly. I'm planning on running a benchmark across a couple of proxies this week--I'll post the results to HN if anyone is curious.
It's really pretty reasonable. I don't notice it at all. Nothing as bad as trying to relay 1min.ai or something.
Just use LiteLLM.
A prompt won’t just work when transplanted from one model into another.
TypeScript looks so ugly visually. It gives me PHP vibes. I think it's the large words at the first column of the eye line:
export const
function
type
return
etc
This makes scanning through the code really hard because your eye has to jump horizontally.
Silly comment, but I concede this bit is very ugly — they should have extracted the inner type to an alias and used that twice.
Which of those words are large?
Ah yes, such large words like const, function, or return, that only exist in TypeScript and PHP.
So, like, which programming language do you think is not ugly? J? K?