@SuspciousCarrot78

SuspciousCarrot78@lemmy.world · edited 12 hours ago

Not sure how we’re quantifying intelligence here. Benchmarks?

Qwen3-4B 2507 Instruct (4B) outperforms GPT-4.1 nano (7B) on all stated benchmarks. It outperforms GPT-4.1 mini (~27B according to scuttlebutt) on mathematical and logical reasoning benchmarks, but loses (barely) on instruction-following and knowledge benchmarks. It outperforms GPT-4o on a few specific domains (math, creative writing), but loses overall (because of course it would). The abliterated cooks of it are stronger yet in a few specific areas too.

https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF

So, in that instance, 4B > 7B (globally), > 27B (significantly), and > 500B(?) situationally. I’m pretty sure there are other SLMs that achieve this too, now (IBM Granite series, Nanbeige, Nemotron etc).

It’s sort of wild to think that 2024 SOTA is ~ ‘strong’ 4-12B these days.

I think we’re sort of getting to the point where the next step forward is going to be “densification” and/or an architecture shift (maybe M$ can finally pull their finger out and release the promised 1.58-bit next-step architectures).

ICBW / IANAE

SuspciousCarrot78@lemmy.world · edited 2 days ago

Qwen3-4B HIVEMIND (abliterated) got it in 2, though it scores a lot higher on PIQA, HellaSwag and Winogrande benchmarks than normal Qwen3-30B. I think the new abliteration methods actually strengthen real world understanding.

https://imgur.com/a/7YZme4i

https://imgur.com/a/25ApzDN

I wonder if an abliterated VL model could do even better? They tend to have the best real world model benchmarks. Perhaps a Qwen3-VL-30B ablit (if such a thing exists) could one shot this.

I’d like to think a lot of these gotcha prompts rely on verbal misunderstanding, rather than failure in world models, but I can’t say that for certain.

PS: Saw a pearler of a response to this: ChatGPT recommended “yeah, lift the car and carry it on your back. Make sure to bend your knees” (though I’m guessing someone edited that for the lulz).

SuspciousCarrot78@lemmy.world · 1 month ago

Additionally, on Windows (Linux too?) one could use Moonlight / Sunshine to compute on the GPU and stream to a secondary device (either directly, like say to a Chromecast, or via the iGPU to their monitor). Latency is quite small in most circumstances, and it allows for some interesting tricks (eg: server GPUs allow you to split the GPU into multiple “mini-GPUs” - essentially, with the right card, you could host two+ entirely different, concurrent instances of GTA V on one machine, via one physical GPU).

A bit hacky, but it works.

Source: I bought a Tesla P4 for $100 and stuck it in a 1L case.

GPU goes brrr