High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction (jchandra.com)

16 points by jchandra 2 days ago

1 comment:

by vivahir215 2 days ago

Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?

Data from: Hacker News, provided by Hacker News (unofficial) API