May 22, 2026

12 stories cleared the bar, led by "GitHub confirms breach of 3,800 repos via malicious VSCode extension", "OpenAI to confidentially file for IPO as soon as Friday", and "An OpenAI model has disproved a central conjecture in discrete geometry".

Worth mentioning

GitHub confirms breach of 3,800 repos via malicious VSCode extension

GitHub has confirmed that a malicious VSCode extension was used to steal developer credentials and access over 3,800 repositories. This is a supply chain attack that targets developer workstations directly. Immediate action: audit all installed VSCode extensions, remove anything unfamiliar or low-reputation, and check your repositories for unauthorized access or committed secrets.

bleepingcomputer.com
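
For the extension audit suggested in the item above, a minimal sketch in Python follows. It assumes the `code` command-line launcher is on your PATH and compares installed extensions against an allowlist. The allowlist contents and the function names are hypothetical, chosen only for illustration; this is one possible first pass, not a complete incident response.

```python
import subprocess

# Hypothetical allowlist: replace with extension IDs your team has reviewed.
ALLOWLIST = {"ms-python.python", "rust-lang.rust-analyzer"}


def parse_extensions(listing: str) -> dict[str, str]:
    """Parse `code --list-extensions --show-versions` output into {id: version}."""
    extensions = {}
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "publisher.name@1.2.3".
        ext_id, _, version = line.partition("@")
        extensions[ext_id.lower()] = version
    return extensions


def unreviewed(extensions: dict[str, str], allowlist: set[str]) -> list[str]:
    """Return installed extension IDs that are not on the allowlist, sorted."""
    allowed = {item.lower() for item in allowlist}
    return sorted(ext for ext in extensions if ext not in allowed)


def installed_extensions() -> dict[str, str]:
    """Ask the VS Code CLI for installed extensions (requires `code` on PATH)."""
    result = subprocess.run(
        ["code", "--list-extensions", "--show-versions"],
        capture_output=True, text=True, check=True,
    )
    return parse_extensions(result.stdout)


if __name__ == "__main__":
    for ext_id in unreviewed(installed_extensions(), ALLOWLIST):
        print("review:", ext_id)
```

Anything the script prints is only a candidate for manual review. An extension missing from the allowlist is not necessarily malicious, and an allowlisted one is not necessarily safe.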

OpenAI to confidentially file for IPO as soon as Friday

OpenAI plans to file confidentially with the SEC for an IPO, potentially as soon as Friday, May 22. Going public would change OpenAI's corporate incentives significantly: quarterly earnings pressure, shareholder priorities, and regulatory scrutiny all increase. Builders relying on OpenAI APIs should watch for any pricing or rate limit changes that could follow increased investor visibility.

cnbc.com

An OpenAI model has disproved a central conjecture in discrete geometry

OpenAI's model found a valid counterexample to a longstanding conjecture in discrete geometry, verified by Fields medalist Tim Gowers. This is the first credible instance of a frontier AI model making a genuinely original mathematical contribution: it did not solve a known problem but disproved a conjecture that was believed to be true. It is a landmark capability milestone, with implications for how we think about frontier model reasoning.

openai.com

Cohere launches Command-A Plus as open weights

Cohere cofounder Nick Frosst confirmed on Reddit that Command-A Plus is being released as open weights (BF16 on HuggingFace). This is a significant open-weight model release from a credible enterprise AI company targeting RAG and agentic use cases. Worth evaluating against current open-weight alternatives if you need an enterprise-focused open model.

reddit.com

Comparing coding agents: GitHub Copilot, Pi, Claude Code, and opencode with Qwen3.6 27B

A builder created a reusable test harness to run identical coding tasks across multiple AI agent environments (Copilot, Pi, Claude Code, opencode) with both cloud and local models. The key finding is that harness design contributes significantly to coding agent performance, independent of the underlying model. Relevant if you are choosing or designing a coding agent workflow.

reddit.com
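
To make the idea of running identical tasks across several agents concrete, here is a minimal sketch of such a harness in Python. It is not the builder's harness. The `Task`, `run_matrix`, and `pass_rate` names are hypothetical, and each agent is modeled as a plain callable, so real tools such as Copilot or Claude Code would need their own wrappers, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable

# An "agent" here is any callable that takes a task prompt and returns output.
# Wrapping a real agent CLI or API in this shape is left as an assumption.
Agent = Callable[[str], str]


@dataclass
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_matrix(agents: dict[str, Agent], tasks: list[Task]) -> dict[str, dict[str, bool]]:
    """Run every task against every agent and record pass/fail per pair."""
    results: dict[str, dict[str, bool]] = {}
    for agent_name, agent in agents.items():
        results[agent_name] = {}
        for task in tasks:
            try:
                output = agent(task.prompt)
                results[agent_name][task.name] = task.check(output)
            except Exception:
                # A crashing agent counts as a failed task, not a failed run.
                results[agent_name][task.name] = False
    return results


def pass_rate(task_results: dict[str, bool]) -> float:
    """Fraction of tasks passed; 0.0 when there are no tasks."""
    if not task_results:
        return 0.0
    return sum(task_results.values()) / len(task_results)
```

Holding the task list and the checks fixed while swapping only the agent wrapper is what lets differences in the harness show up separately from differences in the model.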

Qwen 3.6 35B GGUF: NTP vs MTP quantization guide across GPUs and CPUs

ByteShape released Qwen 3.6 35B GGUF quantizations in both NTP (next-token prediction) and MTP (multi-token prediction) families, with benchmark-backed guidance on which to pick for different hardware configurations. If you are running Qwen 3.6 35B locally, this is the go-to reference for selecting the right GGUF, and it can save hours of your own benchmarking.

reddit.com

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

Detailed benchmarks of Qwen3.6 35B MoE on RTX 5080 16GB at real coding-agent context lengths. Key finding: MTP provides no benefit for this model at 128k context because memory bandwidth is the bottleneck. 56 tok/s is the practical ceiling. Useful data point for planning local inference for agentic code workloads.

reddit.com
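
The general reason a memory-bandwidth bottleneck caps decode speed can be sketched with simple arithmetic: if every generated token requires streaming a fixed number of bytes from memory, throughput cannot exceed bandwidth divided by bytes per token. The function below is an illustrative estimate only. Its name and parameters are hypothetical, and the numbers in the usage line are round placeholders, not measurements of the RTX 5080 or of Qwen3.6.

```python
def decode_ceiling_tok_s(
    bandwidth_gb_s: float,
    weights_gb_per_token: float,
    kv_cache_gb_per_token: float = 0.0,
) -> float:
    """Upper bound on tokens/s when each token must stream this much data.

    bandwidth_gb_s: sustained memory bandwidth in GB/s.
    weights_gb_per_token: weight data read per generated token, in GB
        (for a MoE model, roughly the active experts, not all parameters).
    kv_cache_gb_per_token: KV cache data read per token, in GB; this grows
        with context length.
    """
    gb_per_token = weights_gb_per_token + kv_cache_gb_per_token
    if gb_per_token <= 0:
        raise ValueError("data read per token must be positive")
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # Placeholder inputs: 100 GB/s bandwidth, 1.5 GB weights, 0.5 GB KV cache.
    print(decode_ceiling_tok_s(100.0, 1.5, 0.5))
```

Because the KV cache term grows with context length, the same hardware has a lower ceiling at 128k context than at short context, even before any other overhead is counted.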

AMD Ryzen AI Halo PC: $3,999 with 128GB unified memory

AMD announced pricing for its Ryzen AI Halo PC: $3,999 with 128GB of unified memory. This establishes a new consumer-grade tier for local AI inference with large memory capacity, competing with Apple's Mac Studio for local LLM use cases.

reddit.com

AMD BC-250 (salvaged PS5 APU board): $50-150 for 16GB GDDR6 local inference

Salvaged PS5 APU boards (AMD BC-250) are available on eBay at $50-150 each, featuring Zen 2, 16GB unified GDDR6, and RDNA 2. ROCm has been confirmed working on them. At this price point they may be the cheapest viable GPU inference nodes available, significantly cheaper than any current discrete GPU with similar VRAM.

reddit.com

How fast is N tokens per second really? (interactive visualizer)

An interactive tool that converts raw tokens-per-second into human-comprehensible equivalents (reading speed, typing speed). Useful for calibrating expectations about LLM inference performance and explaining token generation speed to non-technical stakeholders or clients.

mikeveerman.github.io
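
The kind of conversion such a visualizer performs can be approximated in a few lines of Python. This is a rough sketch, not the tool's actual method. The words-per-token ratio, the reading speed, and the typing speed are all assumed rule-of-thumb values, and the function names are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English text
READING_WPM = 240       # assumption: a typical silent reading speed
TYPING_WPM = 40         # assumption: a typical typing speed


def tokens_to_wpm(tok_per_s: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert tokens per second into approximate words per minute."""
    return tok_per_s * words_per_token * 60


def describe(tok_per_s: float) -> str:
    """Express a generation speed relative to human reading and typing."""
    wpm = tokens_to_wpm(tok_per_s)
    return (
        f"{tok_per_s:g} tok/s is about {wpm:.0f} words/min, "
        f"{wpm / READING_WPM:.1f}x reading speed, "
        f"{wpm / TYPING_WPM:.1f}x typing speed"
    )


if __name__ == "__main__":
    print(describe(56))
```

Under these assumptions, 56 tok/s works out to 2,520 words per minute, which is 10.5 times the assumed reading speed. The ratio shifts with the tokenizer and the language, so treat the output as an order-of-magnitude guide.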

Reviving old scanners with an in-browser Linux VM bridged to WebUSB over USB/IP

A builder created a system that runs a Linux VM inside the browser (via WebAssembly) and bridges physical USB scanners into it over WebUSB using the USB/IP protocol. This enables legacy SANE-compatible scanners to work in-browser without native drivers. The architectural pattern (a browser WASM VM plus a WebUSB bridge) is novel and potentially applicable to other USB hardware classes.

yes-we-scan.app

HuggingFace benchmark datasets now let you filter by model size

HuggingFace added a model size filter to its benchmark dataset view, enabling comparison of models within a given parameter budget. Minor but genuinely useful for model selection.

reddit.com