And the only way to get AI is to invest 100s of billions yourself? Doing all the investing for an insane datacenter-in-space thing?
Instead of just saying "anybody that wants to put data-centers in space, please pay us".
SpaceX has been incredibly successful without massive acquisitions for a long time. Every product they made was a banger. And now they bought fucking Twitter.
I used it (well, a skill based on the same idea) to optimise a prompt that does data extraction from UGC.
However there isn't really a "correct" answer that's easy to define in code (I could have manually labelled a training set, but wanted to avoid that), so I had the LLM analyse the results itself and decide whether they were better or not. It wrote deterministic rules for a few things, but mostly it just reviewed each round's results and judged whether they had improved.
Reviewing the before and after results, I would say yes, it's a big improvement in quality. It also optimised the prompt size to reduce input tokens by 25% and switched to a smaller/cheaper model.
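The loop described above (an LLM proposes a prompt revision, then an LLM judge compares before/after results) can be sketched roughly like this. `propose` and `judge` are hypothetical stand-ins for real model calls, not any particular skill's API:

```python
# Rough sketch of LLM-judged prompt optimisation. The callables
# `propose` and `judge` are illustrative placeholders: in practice
# both would wrap calls to a model (the judge seeing extraction
# results from old vs new prompt on sample UGC).

def optimise_prompt(prompt, samples, propose, judge, rounds=3):
    """Iteratively revise a prompt, keeping a candidate only when
    the judge prefers its results.

    propose(prompt) -> candidate prompt string
    judge(old, new, samples) -> True if the candidate's results win
    """
    best = prompt
    for _ in range(rounds):
        candidate = propose(best)
        if judge(best, candidate, samples):
            best = candidate  # accept the revision
    return best
```

With real model calls plugged in, each round costs a judging pass over the sample set, so a small, representative sample keeps the loop cheap.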
Does anyone have tips for optimising prompt processing, as that's the slowest part? It takes a few minutes before OpenCode with ~20k initial context first responds, but subsequent responses are pretty fast due to caching.
Many of us tested 27B and 35B side by side, and the dense model is significantly smarter. It is indeed slower, but 35B makes a lot of mistakes that 27B doesn't.
I haven't honestly dug around to figure out if there's a hardware reason for it, but prompt processing has always been a lot slower for me on macs in general. I mostly use MLX on my 24GB M4 Pro though, so I will pull llama.cpp on it as well to see what the prefill is like.
I've gotten around 16 t/s generation with 4-bit and mxfp4 quants on that model. The 3090 I mentioned has a little over 900 GB/s of memory bandwidth, while those Macs are, I think, around 270 GB/s. If my understanding is correct, Macs do utilise the bandwidth better in this case, but it still doesn't make up the difference (on the 3090 it's around 30-35 t/s depending on context size).
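The numbers above line up with the usual back-of-envelope estimate: decode is memory-bound, so tokens/sec is roughly bandwidth divided by the bytes read per token (about the quantised model size). The 20 GB model size and 0.7 utilisation factor below are illustrative assumptions, not measurements:

```python
# Back-of-envelope decode speed for memory-bound generation.
# Assumptions (not measured): ~20 GB of quantised weights read per
# token, and a 0.7 fudge factor for real-world bandwidth utilisation.

def rough_tps(bandwidth_gb_s, model_size_gb, efficiency=0.7):
    """Approximate tokens/sec: effective bandwidth / bytes per token."""
    return bandwidth_gb_s * efficiency / model_size_gb

# 3090 (~900 GB/s) vs M4 Pro (~270 GB/s) on a ~20 GB model:
print(rough_tps(900, 20), rough_tps(270, 20))
```

This gives roughly 31 t/s for the 3090 and 9 t/s for the Mac at equal efficiency; the Mac's observed 16 t/s implies a higher effective utilisation there, consistent with the comment above.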
Also, if you want to tinker with it a bit more, do run a quick experiment with the cache quants removed; IIRC KV quantisation adds a small overhead during prefill.
I would be very interested to know your prefill and generation numbers.
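In llama.cpp the KV-cache quant experiment is just a flag change; `-ctk`/`-ctv` are the cache-type flags (check `--help` on your build, since flags shift between versions), the model path is a placeholder, and note that quantising the V cache generally requires flash attention to be enabled:

```shell
# Run with quantised KV cache (lower memory, possibly slower prefill).
# -fa enables flash attention, needed for a quantised V cache.
llama-server -m model.gguf -fa -ctk q8_0 -ctv q8_0

# Baseline with the default f16 KV cache, for comparing prefill speed:
llama-server -m model.gguf
```

Comparing the reported prompt-eval t/s between the two runs should show whether KV quantisation is part of the slow prefill.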
- If you pay for unlimited trips will you choose the Ferrari or the old VW? Both are waiting outside your door, ready to go.
- Providers that let you choose models don't really price lower-tier models much differently. On my grandfathered Cursor plan I pay 1x request to use Composer 2 or 2x to use Opus 4.6. Until pricing is differentiated enough that people can say "OK, Opus is smarter, but paying 10x more when Haiku would do the same isn't worth it", it won't happen.
Agreed on both points. We’re dealing with a cost/benefit analysis, and to this point, coders have been subsidized, coerced…maybe even mandated into using the most expensive option as if it was a limitless resource. Clearly not true, and so of course we’re going to see nerfing of the tools over time.
Obviously we’re a long way away from being able to rationally evaluate whether the value of X tokens in model Y is better than model Z, let alone better in terms of developer cost, but that’s kind of where we need to get to, otherwise the model providers are selling magic beans rated in ineffable units of magicalness. The only rational behavior in such a world is to gorge yourself.
It's much funnier now: by putting it behind a paywall, they're explicitly saying "it's okay for you to do this, you just have to purchase a license first".
Our company (~25 engineers) uses it across the entire engineering and product orgs, and yes, we are quite deep into agentic coding. We use their cloud agents for a lot of things, e.g. automated investigations of alarms, handling most customer support issues that end up hitting engineering, pre-processing Linear tickets before humans triage them, and Bugbot for PR reviews with learned knowledge. Recently, though, it has felt like they are pulling the rug out on our legacy plan, so we may end up switching.
Everyone assumes Apple is the target, but they are actually one of the better companies on this front. You can buy first-party replacement parts, and tools are available. If you look at Chinese or sometimes even Samsung phones, it's basically impossible to get replacement parts, and if you do, the repair may require replacing other parts too, like the glass back, since it's impossible to remove without breaking it.
Isn't this just because they will be refreshed soon? Rumours point to around June. I'd imagine Apple stops making the old hardware a few months before a refresh and then just sells off the old stock. Maybe they had shorter contracted orders and/or demand is higher than expected.
Last week I got my (customised) M5 MacBook Pro that was ordered during launch week, not really any longer than expected when ordering a new model.
The work going into local models seems to be targeting lower RAM/VRAM, which will definitely help.
For example Gemma 4 32B, which you can run on an off-the-shelf laptop, is around the same or even higher intelligence level as the SOTA models from 2 years ago (e.g. gpt-4o). Probably by the time memory prices come down we will have something as smart as Opus 4.7 that can be run locally.
Bigger models of course have more embedded knowledge, but just knowing that they should make a tool call to do a web search can bypass a lot of that.