• 0 Posts
  • 3 Comments
Joined 6 months ago
Cake day: August 27th, 2025

  • Not sure how we’re quantifying intelligence here. Benchmarks?

    Qwen3-4B 2507 Instruct (4B) outperforms GPT-4.1 nano (7B) on all stated benchmarks. It outperforms GPT-4.1 mini (~27B according to scuttlebutt) on mathematical and logical reasoning benchmarks, but loses (barely) on instruction-following and knowledge benchmarks. It outperforms GPT-4o in a few specific domains (math, creative writing), but loses overall (because of course it would). The abliterated versions of it are stronger still in a few specific areas, too.

    https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF

    So, in that instance, a 4B beats a 7B (globally), a 27B (significantly), and a ~500B(?) situationally. I'm pretty sure there are other SLMs that achieve this now, too (IBM Granite series, Nanbeige, Nemotron, etc.)

    It's sort of wild to think that 2024 SOTA is roughly equivalent to a 'strong' 4-12B these days.

    I think we're getting to the point where the next step forward is going to be "densification" and/or an architecture shift (maybe M$ can finally pull their finger out and release the promised 1.58-bit next-step architectures).
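    The "1.58 bit" idea refers to ternary weights (each weight is one of {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits of information). A minimal sketch of the absmean-style ternary quantization associated with that line of work, written from my own understanding (the function name and epsilon are mine, not from any released implementation):

    ```python
    import numpy as np

    def ternary_quantize(w, eps=1e-8):
        """Quantize a weight tensor to {-1, 0, +1} using a single
        per-tensor absmean scale (a sketch of the 1.58-bit recipe)."""
        scale = np.abs(w).mean() + eps            # absmean scaling factor
        q = np.clip(np.round(w / scale), -1, 1)   # snap to ternary values
        return q, scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)).astype(np.float32)
    q, s = ternary_quantize(w)
    approx = q * s   # dequantized approximation of the original weights
    ```

    The payoff is that a ternary matrix-multiply needs no real multiplications (only additions, subtractions, and skips), which is why people expect big efficiency wins if the architecture ships.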

    ICBW / IANAE



  • Additionally, on Windows (Linux too?) one could use Moonlight / Sunshine to compute on the GPU and stream to a secondary device (either directly, say to a Chromecast, or via the iGPU to their monitor). Latency is quite small in most circumstances, and the setup allows for some interesting tricks (e.g. server GPUs let you split a GPU into multiple "mini-GPUs" - essentially, with the right card, you could host two+ entirely different, concurrent instances of GTA V on one machine, via one physical GPU).

    A bit hacky, but it works.
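    To give a feel for why the added latency is tolerable, here's a back-of-the-envelope budget. All three numbers are illustrative assumptions on my part (typical ballpark figures for hardware encode/decode on a LAN), not measurements:

    ```python
    # Rough added-latency budget for local game streaming (assumed values).
    encode_ms  = 5.0   # hardware H.264/HEVC encode on the host GPU
    network_ms = 2.0   # one-way transit on a wired LAN
    decode_ms  = 5.0   # hardware decode + present on the client

    total_ms = encode_ms + network_ms + decode_ms
    frame_ms = 1000 / 60  # one frame at 60 Hz ≈ 16.7 ms
    print(f"~{total_ms:.0f} ms added, vs {frame_ms:.1f} ms per 60 Hz frame")
    ```

    Under those assumptions the whole pipeline adds less than one 60 Hz frame of latency, which matches the "quite small in most circumstances" experience.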

    Source: I bought a Tesla P4 for $100 and stuck it in a 1L case.

    GPU goes brrr