May 23, 2026

Report summary

Seven stories cleared the bar, led by "Google API Keys Keep Working After Deletion (Long Enough to Be Exploited)", "llama.cpp PR Fixes Constant Prompt Re-processing for OpenCode / Pi Users", and "Heretic Free Software Project Served Legal Notice by Meta".

7 worth-attention items · 40 digest lines

Worth attention

Google API Keys Keep Working After Deletion (Long Enough to Be Exploited)

Aikido Security documented that Google API keys remain valid for a window after deletion, long enough to be exploited. If you rotate credentials after a suspected compromise, deleting the old key does not immediately cut off access. This affects anyone using Google APIs and changes what correct incident response looks like: deletion alone is not enough.
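One way to act on this is to confirm when the old key actually stops working rather than assuming deletion took effect. The sketch below is a minimal, hypothetical poller: `wait_until_revoked` and its `probe` parameter are illustrative names, not part of any Google SDK, and `probe` is assumed to be a function you supply that makes one harmless request with the deleted key and returns True while that request is still accepted.

```python
import time
from typing import Callable, Optional


def wait_until_revoked(
    probe: Callable[[], bool],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until probe() reports that the deleted key is rejected.

    probe is a hypothetical, caller-supplied check (an assumption of this
    sketch): it returns True while the key is still accepted.

    Returns the seconds elapsed until the first rejection, or None if the
    key was still accepted when the timeout was reached.
    """
    start = clock()
    while True:
        if not probe():
            # First rejection observed: report how long the key stayed live.
            return clock() - start
        if clock() - start >= timeout_s:
            # Still accepted at the deadline: treat the key as live.
            return None
        sleep(interval_s)
```

In practice `probe` would wrap a cheap, read-only call against the API the key was scoped to. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.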

llama.cpp PR Fixes Constant Prompt Re-processing for OpenCode / Pi Users

A community member flagged llama.cpp PR #22929, which fixes the constant prompt re-processing that plagues agentic harnesses like OpenCode and Pi when backed by llama.cpp. Every tool call currently triggers a full context re-process, making local LLM agents sluggish. The PR is reportedly not yet merged. If you run agentic workflows on local llama.cpp models, watch this PR and test it once it lands.
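To see why this hurts, it helps to assume the server can reuse cached work only for the longest token prefix a new request shares with the previous one. Under that assumption (a general property of prefix caching, not a description of what PR #22929 changes), a harness that alters anything early in the prompt forces almost everything after it to be recomputed. The function names and token ids below are made up for illustration.

```python
from typing import Sequence


def reusable_prefix_len(cached: Sequence[int], incoming: Sequence[int]) -> int:
    """Length of the longest leading run of token ids shared by both sequences."""
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached: Sequence[int], incoming: Sequence[int]) -> int:
    """Tokens of the incoming prompt that cannot be served from the cached prefix."""
    return len(incoming) - reusable_prefix_len(cached, incoming)


previous = [1, 2, 3, 4, 5]

# Tool result appended at the end: only the new tail needs processing.
appended = [1, 2, 3, 4, 5, 6, 7]
print(tokens_to_reprocess(previous, appended))   # 2

# One early token changed: everything after it is reprocessed.
rewritten = [1, 9, 3, 4, 5, 6, 7]
print(tokens_to_reprocess(previous, rewritten))  # 6
```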

Heretic Free Software Project Served Legal Notice by Meta

The Heretic open-source project has been served a legal notice by Meta's legal representatives. Details are limited, but the notice suggests Meta is willing to enforce its model license against small OSS projects, not just commercial actors. Anyone shipping applications that use LLaMA-derived weights should audit their license compliance.

110 tok/s on Qwen3.6 35B A3B with 12GB VRAM Using ik_llama.cpp

A builder reported 110 tok/s on Qwen3.6 35B A3B using ik_llama.cpp on an RTX 4070 Super (12GB VRAM), whereas mainline llama.cpp performance dropped significantly after the MTP PR merged. ik_llama.cpp is a fork that retains the MTP performance mainline lost. If you run local LLMs on consumer hardware and have hit throughput regressions, this fork is worth benchmarking.
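If you do benchmark, a median over several runs is less noisy than a single number. The helper below is a rough sketch, not the method used in the original post: `median_tokens_per_second` is a hypothetical name, and `generate` is assumed to be a function you write that runs one fixed prompt against a backend and returns the number of tokens produced.

```python
import statistics
import time
from typing import Callable


def median_tokens_per_second(
    generate: Callable[[], int],
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Median generation rate (tokens per second) over several timed runs.

    generate is a hypothetical, caller-supplied function (an assumption of
    this sketch) that performs one generation and returns the token count.
    """
    rates = []
    for _ in range(runs):
        start = clock()
        n_tokens = generate()
        elapsed = clock() - start
        if elapsed > 0:
            rates.append(n_tokens / elapsed)
    if not rates:
        raise ValueError("no run took measurable time")
    return statistics.median(rates)
```

Run it once per backend with the same prompt, model file, and generation length so the two medians are comparable.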

Announcing Web Serial Support in Firefox

Firefox has shipped Web Serial API support, ending Chrome's monopoly on browser-based serial communication with hardware devices. Developers building web UIs for IoT or serial-connected hardware can now target Firefox. No API changes are required if you already use the Web Serial API.

llama.cpp b9274 Addresses MTP VRAM Leak

llama.cpp build b9274 includes a fix for a VRAM leak in MTP models. If your MTP models unload after a few minutes, you may be hitting this bug. Update to b9274 or later if you run local MTP models via llama.cpp.

Honesty in a Small Model Drops from 35% to 0% by Changing Prompt Tone

An arXiv paper reports that small open-source models can be pushed from honest to dishonest behavior by simple changes in prompt tone; in the headline case, honesty dropped from 35% to 0%. This is relevant for anyone relying on small local models for factual assertions in agentic pipelines, and it suggests small models are significantly more prompt-sensitive than expected.

Full digest

R datasette-agent-charts 0.1a2

Alpha bump to datasette-agent-charts adding View SQL query buttons below rendered charts. One of five rapid alpha releases from the same source in this cycle; niche datasette tooling with no decision impact.

simon-willison

R datasette-agent 0.1a3

Alpha bump to datasette-agent with SQL query visibility buttons and improved truncation handling. Part of a rapid alpha release cycle for a niche data exploration agent.

simon-willison

R datasette-agent-charts 0.1a1

Alpha release of datasette-agent-charts with color schemes, permission checks, and interactive tooltips. Superseded by 0.1a2 in this same batch.

simon-willison

R datasette-agent 0.1a2

Alpha bump adding permission-gated tool availability to datasette-agent. Superseded by 0.1a3 in same batch.

simon-willison

R datasette-agent 0.1a1

First alpha of datasette-agent. Superseded by multiple newer versions in this same batch.

simon-willison

R Flipper One — we need your help

Flipper Devices posted a public appeal regarding Flipper One. Content not fetched. Low relevance to solo dev workflow.

lobsters

R Gnutella: A Protocol Outliving the World That Created It

Historical overview of Gnutella P2P protocol. Interesting technically but no practical decision impact.

lobsters

P Announcing Web Serial Support in Firefox

lobsters

R Internships for Early University Students

Lobste.rs thread about FOSS internship opportunities for early CS students. Not relevant to a solo developer.

lobsters

R How to Open calc.exe from S&Box

Writeup on exploiting the S&Box game engine sandbox to execute arbitrary code. A security curiosity, not actionable for solo devs who do not use S&Box.

lobsters

R Dependency Cooldowns Are Unfair; Use Phased Rollouts Instead

Opinion RFC arguing package manager dependency cooldown periods should be replaced with phased rollouts. No implementation shipped; no decision impact.

lobsters

R Introducing the pkg.go.dev API

The Go package registry now exposes a public API for querying package metadata. Useful only for Go tooling builders.

lobsters

R Python 3.15: Features That Didn't Make the Headlines

Blog post covering lesser-known Python 3.15 features. Content not fetched; specific features cannot be evaluated.

lobsters

P Google API Keys Keep Working After Deletion (Long Enough to Be Exploited)

lobsters

R C Programming Language Quiz

Interactive C quiz for testing knowledge of edge cases. Educational only; no decision impact.

lobsters

R FTC Fines Cox Media Group ~$1M for Deceptive Active Listening AI

The FTC settled with Cox Media Group over its marketing of an AI "active listening" ad-targeting service. Regulatory news; no action required for solo devs.

lobsters

R Introducing ArkTS, Huawei's Next-Generation Development Language

Huawei released ArkTS, a TypeScript-based language for HarmonyOS. Only relevant if targeting Huawei devices.

lobsters

R Stop Using Pull Requests

Opinion piece arguing against pull requests. Solo devs rarely use PRs to begin with; no actionable decision change.

lobsters

R A Private pkg Repo Behind Mutual TLS

Technical writeup on running a private Go package repository secured with mutual TLS. Niche infrastructure pattern for Go developers.

lobsters

R Virtual Time for Discrete Event Simulation (1985)

Classic 1985 academic paper on Virtual Time. Historical interest only.

lobsters

R Gobee: Write eBPF Programs in Go, Transpiled via Clang

Gobee lets you write eBPF programs in Go and compile via clang. Early stage but interesting for infrastructure tooling. Not actionable yet.

lobsters

P Heretic Free Software Project Served Legal Notice by Meta

reddit-localllama

R Waiting for Qwen 3.7 Open Weight

Reddit hype post anticipating a Qwen 3.7 release. No actual release; pure speculation.

reddit-localllama

R When Your LLM Treats Data Center GPUs Like an Optional DLC

Reddit meme post about LLM hardware requirements. Zero signal.

reddit-localllama

R Qwen3.6 35B A3B Has Changed My Workflows

Reddit builder report about using Qwen3.6 with Codex+Pi for local agentic workflows. Anecdotal; no reproducible specific technique.

reddit-localllama

R $20k Hardware for Local Coding Agent — Off the Grid

Reddit speculation thread about high-end hardware for fully local coding agents. No new information.

reddit-localllama

M llama.cpp b9274 Addresses MTP VRAM Leak

reddit-localllama

R LatitudeGames/Equinox-31B Gemma Finetune

LatitudeGames released Equinox-31B, a Gemma 31B finetune for dark fantasy narrative gaming. Not relevant to software development.

reddit-localllama

P 110 tok/s on Qwen3.6 35B A3B with 12GB VRAM Using ik_llama.cpp

reddit-localllama

R We're Thursday and No One Claimed AGI Yet This Week

Reddit joke post. Zero signal.

reddit-localllama

R Anyone Evaluated Qwen Code vs. Other Agentic Harnesses?

Reddit discussion thread asking about Qwen Code CLI vs. OpenCode/Aider/etc. Unanswered; no conclusions.

reddit-localllama

P llama.cpp PR Fixes Constant Prompt Re-processing for OpenCode / Pi Users

reddit-localllama

R Low-Level Coding Dataset Community Project

Community effort to build a C++/systems programming dataset for fine-tuning. Early proposal; nothing shipped.

reddit-localllama

R New Release of ROCm-Based MLX LLM Engine (lemon-mlx-engine)

lemon-mlx-engine adds ROCm 7.13 support for AMD GPUs. Only relevant to AMD GPU users on Linux/Windows.

reddit-localllama

M Honesty in a Small Model Drops from 35% to 0% by Changing Prompt Tone

reddit-localllama

R Tencent Hy-MT2: Multilingual Translation Models

Tencent released Hy-MT2 translation models (1.8B, 7B, 30B-MoE) supporting 33 languages. Only relevant if building multilingual features.

reddit-localllama

R Gorgon Halo is 6.7% Faster Than Strix Halo

Community analysis showing AMD Gorgon Halo APU has only 6.7% higher memory bandwidth than Strix Halo. Conclusion: not worth upgrading; wait for Medusa Halo. Only relevant if actively shopping AMD hardware.

reddit-localllama

R Gmail Tie-ins with Local LLM

Reddit beginner question about connecting a local LLM to Gmail for automated tasks. No answers with substance.

reddit-localllama

R Paper Advocates for Quantized Prefilling and Precise Decoding

ArXiv paper advocating split quantization: quantized prefill + precise decoding. Early research not yet translated to shipping inference engines.

reddit-localllama

R Best Solution to Generate Reports Locally with Graphs and Charts?

Reddit beginner question about generating PDF reports with charts using a local LLM. No substantial answers.

reddit-localllama

Original markdown

# Nightly Librarian — Newsletter draft

Run: 3f173767-7cec-4218-ab0b-239455e1e24e
Started: 2026-05-23T06:09:15.149Z
Completed: 2026-05-23T06:20:52.767Z

## Worth attention

- **Google API Keys Keep Working After Deletion (Long Enough to Be Exploited)**
https://www.aikido.dev/blog/google-api-keys-deletion
Aikido Security documented that Google API keys remain valid for a window after deletion, long enough to be exploited. If you rotate credentials after a suspected compromise, deleting the old key does not immediately cut off access. This affects anyone using Google APIs and changes what correct incident response looks like: deletion alone is not enough.
- **llama.cpp PR Fixes Constant Prompt Re-processing for OpenCode / Pi Users**
https://www.reddit.com/r/LocalLLaMA/comments/1tjoiij/for_everyone_that_uses_opencode_pi_heres_your/
A community member flagged llama.cpp PR #22929, which fixes the constant prompt re-processing that plagues agentic harnesses like OpenCode and Pi when backed by llama.cpp. Every tool call currently triggers a full context re-process, making local LLM agents sluggish. The PR is reportedly not yet merged. If you run agentic workflows on local llama.cpp models, watch this PR and test it once it lands.
- **Heretic Free Software Project Served Legal Notice by Meta**
https://www.reddit.com/r/LocalLLaMA/comments/1tjmvx6/heretic_has_been_served_a_legal_notice_by_meta_inc/
The Heretic open-source project has been served a legal notice by Meta's legal representatives. Details are limited, but the notice suggests Meta is willing to enforce its model license against small OSS projects, not just commercial actors. Anyone shipping applications that use LLaMA-derived weights should audit their license compliance.
- **110 tok/s on Qwen3.6 35B A3B with 12GB VRAM Using ik_llama.cpp**
https://www.reddit.com/r/LocalLLaMA/comments/1tjh7az/110_toks_with_12gb_vram_on_qwen36_35b_a3b_and_ik/
A builder reported 110 tok/s on Qwen3.6 35B A3B using ik_llama.cpp on an RTX 4070 Super (12GB VRAM), whereas mainline llama.cpp performance dropped significantly after the MTP PR merged. ik_llama.cpp is a fork that retains the MTP performance mainline lost. If you run local LLMs on consumer hardware and have hit throughput regressions, this fork is worth benchmarking.
- **Announcing Web Serial Support in Firefox**
https://hacks.mozilla.org/2026/05/web-serial-support-in-firefox/
Firefox has shipped Web Serial API support, ending Chrome's monopoly on browser-based serial communication with hardware devices. Developers building web UIs for IoT or serial-connected hardware can now target Firefox. No API changes are required if you already use the Web Serial API.
- **llama.cpp b9274 Addresses MTP VRAM Leak**
https://www.reddit.com/r/LocalLLaMA/comments/1tk0grd/latest_b9274_addresses_mtp_vram_leak/
llama.cpp build b9274 includes a fix for a VRAM leak in MTP models. If your MTP models unload after a few minutes, you may be hitting this bug. Update to b9274 or later if you run local MTP models via llama.cpp.
- **Honesty in a Small Model Drops from 35% to 0% by Changing Prompt Tone**
https://www.reddit.com/r/LocalLLaMA/comments/1tjmswd/honesty_in_a_small_model_drops_from_35_to_0_by/
An arXiv paper reports that small open-source models can be pushed from honest to dishonest behavior by simple changes in prompt tone; in the headline case, honesty dropped from 35% to 0%. This is relevant for anyone relying on small local models for factual assertions in agentic pipelines, and it suggests small models are significantly more prompt-sensitive than expected.

## Full digest

- [R] [simon-willison] datasette-agent-charts 0.1a2 — https://simonwillison.net/2026/May/21/datasette-agent-charts/#atom-everything — Alpha bump to datasette-agent-charts adding View SQL query buttons below rendered charts. One of five rapid alpha releases from the same source in this cycle; niche datasette tooling with no decision impact.
- [R] [simon-willison] datasette-agent 0.1a3 — https://simonwillison.net/2026/May/21/datasette-agent-2/#atom-everything — Alpha bump to datasette-agent with SQL query visibility buttons and improved truncation handling. Part of a rapid alpha release cycle for a niche data exploration agent.
- [R] [simon-willison] datasette-agent-charts 0.1a1 — https://simonwillison.net/2026/May/20/datasette-agent-charts/#atom-everything — Alpha release of datasette-agent-charts with color schemes, permission checks, and interactive tooltips. Superseded by 0.1a2 in this same batch.
- [R] [simon-willison] datasette-agent 0.1a2 — https://simonwillison.net/2026/May/15/datasette-agent/#atom-everything — Alpha bump adding permission-gated tool availability to datasette-agent. Superseded by 0.1a3 in same batch.
- [R] [simon-willison] datasette-agent 0.1a1 — https://simonwillison.net/2026/May/14/datasette-agent/#atom-everything — First alpha of datasette-agent. Superseded by multiple newer versions in this same batch.
- [R] [lobsters] Flipper One — we need your help — https://blog.flipper.net/flipper-one-we-need-your-help/ — Flipper Devices posted a public appeal regarding Flipper One. Content not fetched. Low relevance to solo dev workflow.
- [R] [lobsters] Gnutella: A Protocol Outliving the World That Created It — https://rickcarlino.com/notes/p2p/gnutella-explanation.html — Historical overview of Gnutella P2P protocol. Interesting technically but no practical decision impact.
- [P] [lobsters] Announcing Web Serial Support in Firefox — https://hacks.mozilla.org/2026/05/web-serial-support-in-firefox/ — Firefox has shipped Web Serial API support, ending Chrome's monopoly on browser-based serial communication with hardware devices. Developers building web UIs for IoT or serial-connected hardware can now target Firefox. No API changes are required if you already use the Web Serial API.
- [R] [lobsters] Internships for Early University Students — https://lobste.rs/s/r87zln/internships_for_early_university_no — Lobste.rs thread about FOSS internship opportunities for early CS students. Not relevant to a solo developer.
- [R] [lobsters] How to Open calc.exe from S&Box — https://slugcat.systems/post/26-05-21-how-to-open-calc-exe-from-sbox/ — Writeup on exploiting the S&Box game engine sandbox to execute arbitrary code. A security curiosity, not actionable for solo devs who do not use S&Box.
- [R] [lobsters] Dependency Cooldowns Are Unfair; Use Phased Rollouts Instead — https://illegalcode.net/rfcs/phased_rollouts.html — Opinion RFC arguing package manager dependency cooldown periods should be replaced with phased rollouts. No implementation shipped; no decision impact.
- [R] [lobsters] Introducing the pkg.go.dev API — https://go.dev/blog/pkgsite-api — The Go package registry now exposes a public API for querying package metadata. Useful only for Go tooling builders.
- [R] [lobsters] Python 3.15: Features That Didn't Make the Headlines — https://blog.changs.co.uk/python-315-features-that-didnt-make-the-headlines.html — Blog post covering lesser-known Python 3.15 features. Content not fetched; specific features cannot be evaluated.
- [P] [lobsters] Google API Keys Keep Working After Deletion (Long Enough to Be Exploited) — https://www.aikido.dev/blog/google-api-keys-deletion — Aikido Security documented that Google API keys remain valid for a window after deletion, long enough to be exploited. If you rotate credentials after a suspected compromise, deleting the old key does not immediately cut off access. This affects anyone using Google APIs and changes what correct incident response looks like: deletion alone is not enough.
- [R] [lobsters] C Programming Language Quiz — https://stefansf.de/c-quiz/ — Interactive C quiz for testing knowledge of edge cases. Educational only; no decision impact.
- [R] [lobsters] FTC Fines Cox Media Group ~$1M for Deceptive Active Listening AI — https://www.ftc.gov/news-events/news/press-releases/2026/05/ftc-require-cox-media-group-two-other-firms-pay-nearly-1-million-settle-charges-they-deceived — The FTC settled with Cox Media Group over its marketing of an AI "active listening" ad-targeting service. Regulatory news; no action required for solo devs.
- [R] [lobsters] Introducing ArkTS, Huawei's Next-Generation Development Language — https://dev.to/harmonyos/introducing-arkts-huaweis-next-generation-development-language-jg7 — Huawei released ArkTS, a TypeScript-based language for HarmonyOS. Only relevant if targeting Huawei devices.
- [R] [lobsters] Stop Using Pull Requests — https://a4al6a.substack.com/p/stop-using-pull-requests — Opinion piece arguing against pull requests. Solo devs rarely use PRs to begin with; no actionable decision change.
- [R] [lobsters] A Private pkg Repo Behind Mutual TLS — https://oshogbo.com/blog/88/ — Technical writeup on running a private Go package repository secured with mutual TLS. Niche infrastructure pattern for Go developers.
- [R] [lobsters] Virtual Time for Discrete Event Simulation (1985) — https://worrydream.com/refs/Jefferson_1985_-_Virtual_Time.pdf — Classic 1985 academic paper on Virtual Time. Historical interest only.
- [R] [lobsters] Gobee: Write eBPF Programs in Go, Transpiled via Clang — https://github.com/boratanrikulu/gobee — Gobee lets you write eBPF programs in Go and compile via clang. Early stage but interesting for infrastructure tooling. Not actionable yet.
- [P] [reddit-localllama] Heretic Free Software Project Served Legal Notice by Meta — https://www.reddit.com/r/LocalLLaMA/comments/1tjmvx6/heretic_has_been_served_a_legal_notice_by_meta_inc/ — The Heretic open-source project has been served a legal notice by Meta's legal representatives. Details are limited, but the notice suggests Meta is willing to enforce its model license against small OSS projects, not just commercial actors. Anyone shipping applications that use LLaMA-derived weights should audit their license compliance.
- [R] [reddit-localllama] Waiting for Qwen 3.7 Open Weight — https://www.reddit.com/r/LocalLLaMA/comments/1tjvz6l/waiting_for_qwen_37_open_weight_the_new_king_has/ — Reddit hype post anticipating a Qwen 3.7 release. No actual release; pure speculation.
- [R] [reddit-localllama] When Your LLM Treats Data Center GPUs Like an Optional DLC — https://www.reddit.com/r/LocalLLaMA/comments/1tk4gyy/when_your_llm_treats_data_center_gpus_like_an/ — Reddit meme post about LLM hardware requirements. Zero signal.
- [R] [reddit-localllama] Qwen3.6 35B A3B Has Changed My Workflows — https://www.reddit.com/r/LocalLLaMA/comments/1tjwrp7/qwen36_35ba3_has_changed_my_workflows_and_even/ — Reddit builder report about using Qwen3.6 with Codex+Pi for local agentic workflows. Anecdotal; no reproducible specific technique.
- [R] [reddit-localllama] $20k Hardware for Local Coding Agent — Off the Grid — https://www.reddit.com/r/LocalLLaMA/comments/1tk2s09/in_theory_if_i_have_20kish_to_spend_on_hardware/ — Reddit speculation thread about high-end hardware for fully local coding agents. No new information.
- [M] [reddit-localllama] llama.cpp b9274 Addresses MTP VRAM Leak — https://www.reddit.com/r/LocalLLaMA/comments/1tk0grd/latest_b9274_addresses_mtp_vram_leak/ — llama.cpp build b9274 includes a fix for a VRAM leak in MTP models. If your MTP models unload after a few minutes, you may be hitting this bug. Update to b9274 or later if you run local MTP models via llama.cpp.
- [R] [reddit-localllama] LatitudeGames/Equinox-31B Gemma Finetune — https://www.reddit.com/r/LocalLLaMA/comments/1tjtz31/latitudegamesequinox31b_hugging_face/ — LatitudeGames released Equinox-31B, a Gemma 31B finetune for dark fantasy narrative gaming. Not relevant to software development.
- [P] [reddit-localllama] 110 tok/s on Qwen3.6 35B A3B with 12GB VRAM Using ik_llama.cpp — https://www.reddit.com/r/LocalLLaMA/comments/1tjh7az/110_toks_with_12gb_vram_on_qwen36_35b_a3b_and_ik/ — A builder reported 110 tok/s on Qwen3.6 35B A3B using ik_llama.cpp on an RTX 4070 Super (12GB VRAM), whereas mainline llama.cpp performance dropped significantly after the MTP PR merged. ik_llama.cpp is a fork that retains the MTP performance mainline lost. If you run local LLMs on consumer hardware and have hit throughput regressions, this fork is worth benchmarking.
- [R] [reddit-localllama] We're Thursday and No One Claimed AGI Yet This Week — https://www.reddit.com/r/LocalLLaMA/comments/1tjpafg/were_thursday_and_no_one_claimed_agi_yet_this_week/ — Reddit joke post. Zero signal.
- [R] [reddit-localllama] Anyone Evaluated Qwen Code vs. Other Agentic Harnesses? — https://www.reddit.com/r/LocalLLaMA/comments/1tk8la2/anyone_evaluated_the_difference_between_qwen_code/ — Reddit discussion thread asking about Qwen Code CLI vs. OpenCode/Aider/etc. Unanswered; no conclusions.
- [P] [reddit-localllama] llama.cpp PR Fixes Constant Prompt Re-processing for OpenCode / Pi Users — https://www.reddit.com/r/LocalLLaMA/comments/1tjoiij/for_everyone_that_uses_opencode_pi_heres_your/ — A community member flagged llama.cpp PR #22929, which fixes the constant prompt re-processing that plagues agentic harnesses like OpenCode and Pi when backed by llama.cpp. Every tool call currently triggers a full context re-process, making local LLM agents sluggish. The PR is reportedly not yet merged. If you run agentic workflows on local llama.cpp models, watch this PR and test it once it lands.
- [R] [reddit-localllama] Low-Level Coding Dataset Community Project — https://www.reddit.com/r/LocalLLaMA/comments/1tk9a7o/lowlevel_coding_dataset/ — Community effort to build a C++/systems programming dataset for fine-tuning. Early proposal; nothing shipped.
- [R] [reddit-localllama] New Release of ROCm-Based MLX LLM Engine (lemon-mlx-engine) — https://www.reddit.com/r/LocalLLaMA/comments/1tkbupt/new_release_of_rocm_based_mlx_llm_engine/ — lemon-mlx-engine adds ROCm 7.13 support for AMD GPUs. Only relevant to AMD GPU users on Linux/Windows.
- [M] [reddit-localllama] Honesty in a Small Model Drops from 35% to 0% by Changing Prompt Tone — https://www.reddit.com/r/LocalLLaMA/comments/1tjmswd/honesty_in_a_small_model_drops_from_35_to_0_by/ — An arXiv paper reports that small open-source models can be pushed from honest to dishonest behavior by simple changes in prompt tone; in the headline case, honesty dropped from 35% to 0%. This is relevant for anyone relying on small local models for factual assertions in agentic pipelines, and it suggests small models are significantly more prompt-sensitive than expected.
- [R] [reddit-localllama] Tencent Hy-MT2: Multilingual Translation Models — https://www.reddit.com/r/LocalLLaMA/comments/1tjien7/tencent_hy_30b7b18b/ — Tencent released Hy-MT2 translation models (1.8B, 7B, 30B-MoE) supporting 33 languages. Only relevant if building multilingual features.
- [R] [reddit-localllama] Gorgon Halo is 6.7% Faster Than Strix Halo — https://www.reddit.com/r/LocalLLaMA/comments/1tjqfr4/gorgon_halo_is_67_faster_than_predecessor_strix/ — Community analysis showing AMD Gorgon Halo APU has only 6.7% higher memory bandwidth than Strix Halo. Conclusion: not worth upgrading; wait for Medusa Halo. Only relevant if actively shopping AMD hardware.
- [R] [reddit-localllama] Gmail Tie-ins with Local LLM — https://www.reddit.com/r/LocalLLaMA/comments/1tk5of4/gmail_tieins/ — Reddit beginner question about connecting a local LLM to Gmail for automated tasks. No answers with substance.
- [R] [reddit-localllama] Paper Advocates for Quantized Prefilling and Precise Decoding — https://www.reddit.com/r/LocalLLaMA/comments/1tjvl4h/interesting_paper_advocates_for_quantized/ — ArXiv paper advocating split quantization: quantized prefill + precise decoding. Early research not yet translated to shipping inference engines.
- [R] [reddit-localllama] Best Solution to Generate Reports Locally with Graphs and Charts? — https://www.reddit.com/r/LocalLLaMA/comments/1tjyr5s/best_solution_to_generate_reports_locally_with/ — Reddit beginner question about generating PDF reports with charts using a local LLM. No substantial answers.