| dc.contributor.advisor | Kim, Yoon | |
| dc.contributor.advisor | Ghassemi, Marzyeh | |
| dc.contributor.author | Puri, Isha | |
| dc.date.accessioned | 2025-11-17T19:08:08Z | |
| dc.date.available | 2025-11-17T19:08:08Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-08-14T19:33:10.349Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/163701 | |
| dc.description.abstract | Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating a pivot to scaling test-time compute. Existing deterministic inference-time scaling methods, usually with reward models, cast the task as a search problem, but suffer from a key limitation: early pruning. Due to inherently imperfect reward models, promising trajectories may be discarded prematurely, leading to suboptimal performance. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods. Our method maintains a diverse set of candidates and robustly balances exploration and exploitation. Our empirical evaluation demonstrates that our particle filtering methods have a 4–16x better scaling rate over deterministic search counterparts on both various challenging mathematical and more general reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct surpasses GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1 level accuracy in only 32 rollouts. Our work not only presents an effective method to inference-time scaling, but also connects rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work. Code, videos, and further information available at probabilistic-inference-scaling.github.io/ | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Probabilistic Inference for Inference Time Scaling of Language Models | |
| dc.type | Thesis | |
| dc.description.degree | S.M. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |