More

fy20 · 2026-04-29T15:23:21 1777476201

Isn't this all just the long game for Mars? Humanoid AI robots would be beneficial there too.

panick21_ · 2026-04-29T15:26:05 1777476365

And the only way to get AI is to invest 100s of billions yourself? Doing all the investing for a insane datacenter in space thing?

Instead of just saying "anybody that wants to put data-centers in space, please pay us".

SpaceX has been incredibly successful without massive acquisition for a long time. Every product the made was a banger. And now they bought fucking Twitter.

fy20 · 2026-04-29T09:03:22 1777453402

I used it (well, a skill based on the same idea) to optimise a prompt that does data extraction from UGC.

However there isn't really a "correct" answer that's easy to define in code (I could manually label a training set, but wanted to avoid that) so I had the LLM just analyse the results itself and decide if they are better or not. It wrote deterministic rules for a few things, but overall it just reviewed the results of each round and decided if the are better or not.

Reviewing the before and after results, I would say yes, it's a big improvement in quality. It also optimised the prompt size to reduce input tokens by 25% and switched to a smaller/cheaper model.

fy20 · 2026-04-27T19:14:03 1777317243

Running it on a Macbook Pro M5 48GB:

        -hf unsloth/Qwen3.6-27B-GGUF:UD-Q6_K_XL \ 
        -c 128000 \
        --parallel 1 \
        --flash-attn on \
        --no-context-shift \
        --cache-type-k q8_0 \
        --cache-type-v q8_0 \
        --temp 0.6 \
        --top-p 0.95 \
        --top-k 20 \
        --min-p 0.0 \
        --presence_penalty 0.0 \
        --reasoning on \
        --jinja \
        --chat-template-kwargs "{\"preserve_thinking\": true}" \
        --spec-type ngram-simple \
        --draft-max 64 \
        --timeout 1800

Maybe someone knows any tips to optimise prompt processing as that's the slowest part? It takes a few minutes before OpenCode with ~20k initial context first responds, but subsequent responses are pretty fast due to caching.

jonaustin · 2026-04-27T19:45:16 1777319116

https://github.com/jundot/omlx

note: 27b is going to be slow; use the 35b MoE if you want decent token/sec speed.

dexterlagan · 2026-04-28T07:21:38 1777360898

Many of us tested 27B and 35B side by side, and the dense model is significantly smarter. It indeed is slower, but 35B makes a lot of mistakes 27B doesn't.

sleepyeldrazi · 2026-04-27T21:16:54 1777324614

I haven't honestly dug around to figure out if there's a hardware reason for it, but prompt processing has always been a lot slower for me on macs in general. I mostly use MLX on my 24GB M4 Pro though, so I will pull llama.cpp on it as well to see what the prefill is like.

I've gotten around 16 t/s gen with 4bit and mxfp4 on that model for generation. The 3090 I mentioned has a little over 900 gb/s, while those macs i think are around 270 GB/s. If my understanding is correct, macs do utilize the bandwidth better in this case, but it still doesn't make up the difference (on the 3090 it's around 30-35 t/s depending on size of ctx).

Also, do run a quick experiment removing the cache quants if you want to tinker with it a bit more, iirc KV quant does add a small overhead during prefill.

I would be very interested to know your prefill and generation numbers.

fy20 · 2026-04-22T13:54:06 1776866046

Interesting that the memory in the laptop is upgradeable or bring-your-own, where as in the Framework Desktop it is soldered. How does that work?

robotnikman · 2026-04-22T20:56:18 1776891378

The desktops use AMD Strix Halo chips which require soldered on memory for the required bandwidth.

fy20 · 2026-04-22T04:01:12 1776830472

I think the reason is two fold:

- If you pay for unlimited trips will you choose the Ferrari or the old VW? Both are waiting outside your door, ready to go.

- Providers that let you choose models don't really price much difference between lower class models. On my grandfathered Cursor plan I pay 1x request to use Composer 2 or 2x request to use Opus 4.6. Until the price is more differentiated so people can say "ok yes Opus is smarter, but paying 10x more when Haiku would do the same isn't worth it" it won't happen.

timr · 2026-04-22T05:53:52 1776837232

Agreed on both points. We’re dealing with a cost/benefit analysis, and to this point, coders have been subsidized, coerced…maybe even mandated into using the most expensive option as if it was a limitless resource. Clearly not true, and so of course we’re going to see nerfing of the tools over time.

Obviously we’re a long way away from being able to rationally evaluate whether the value of X tokens in model Y is better than model Z, let alone better in terms of developer cost, but that’s kind of where we need to get to, otherwise the model providers are selling magic beans rated in ineffable units of magicalness. The only rational behavior in such a world is to gorge yourself.

fy20 · 2026-04-22T03:47:54 1776829674

They changed that recently, you need to be paying €10/mo for that now. The free plan and/or access for the basic Twitter plan are gone.

pjc50 · 2026-04-22T06:21:02 1776838862

That doesn't make it better! It did somehow slow down the regulatory response because politicians are dumb, though.

IshKebab · 2026-04-22T06:37:18 1776839838

It means X can identify users at least, so they are probably quite a bit less likely to do that.

hsbauauvhabzb · 2026-04-22T10:11:42 1776852702

You’ve obviously never attempted to complete a purchase while working under a regulatory body, required to test the theory.

AlecSchueler · 2026-04-22T06:41:45 1776840105

What difference does that make?

deinonychus · 2026-04-22T15:40:47 1776872447

it's much funnier now, that by putting it behind a paywall, they're explicitly saying "it's okay for you to do this, you just have to purchase a license first"

Cytobit · 2026-04-22T10:48:23 1776854903

Security through enshittification. Nice.

fy20 · 2026-04-22T03:44:46 1776829486

Our company (~25 engineers) uses it across the entire engineering and product orgs, and yes we are quite deep into agentic coding. We use their cloud agents for a lot of things, e.g. automated investigations of alarms, handling most customer support issues that end up hitting engineering, pre-processsing linear tickets before humans triage them, bugbot for PR reviewed with learned knowledge. Although recently they have felt like they are pulling the rug out on our legacy plan, so we may end up switching.

fy20 · 2026-04-20T17:21:20 1776705680

Everyone is thinking Apple is the target, but they are actually one of the better companies with this. You can buy first-party replacement parts, tools are available. If you take a look at Chinese or sometimes even Samsung phones it's basically impossible to get replacement parts and if you do it may need other parts like the glass back to be replaced as it's impossible to remove it without breaking it.

fy20 · 2026-04-20T04:05:48 1776657948

Isn't this just because they will be refreshed soon, rumours are around June. I'd imagine Apple stops making the old hardware a few months before a refresh and then just sells the old stock. Maybe they had shorter contracted orders and/or demand is higher than expected.

Last week I got my (customised) M5 MacBook Pro that was ordered during launch week, not really any longer than expected when ordering a new model.

muro · 2026-04-20T07:36:51 1776670611

There was some article today claiming the updated versions won't launch before October.

fy20 · 2026-04-19T09:43:42 1776591822

The work going into local models seems to be targeting lower RAM/VRAM which will definately help.

For example Gemma 4 32B, which you can run on an off-the-shelf laptop, is around the same or even higher intelligence level as the SOTA models from 2 years ago (e.g. gpt-4o). Probably by the time memory prices come down we will have something as smart as Opus 4.7 that can be run locally.

Bigger models of course have more embedded knowledge, but just knowing that they should make a tool call to do a web search can bypass a lot of that.