A GitHub project called Ollama Model Tester is giving local AI users a straightforward way to compare model outputs from Ollama without installing extra Python packages.
The repository, published by developer ulyssestenn, centers on a command-line tool designed to run the same prompt across local Ollama models and save each result for later review. The goal is to make it easier to decide which model works best for a given task, or to compare repeated runs from the same model.
The project stands out for its minimal setup. According to the repository README, the tool relies only on the Python standard library, so nothing has to be installed with pip before trying it. It requires Python 3.7 or newer, a local Ollama instance running on its default localhost port (11434), and at least one model already pulled into the local environment.
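The requirements above can be checked with the standard library alone. The sketch below is not taken from the repository. It assumes Ollama's `/api/tags` endpoint on `http://localhost:11434` and uses hypothetical helper names (`parse_model_names`, `list_local_models`) to list the models that have already been pulled.

```python
import json
import urllib.request
from typing import List

# Assumption: Ollama listens on its default local address.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(tags_payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body (hypothetical helper)."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> List[str]:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

If Ollama is not running, `list_local_models()` raises `urllib.error.URLError` (a subclass of `OSError`). A script can catch that error and tell the user to start Ollama. An empty list means the server is up but no model has been pulled yet.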
Once Ollama is running, users can start the script with a single command. The tool then walks them through a short series of interactive questions. It asks which installed model to use, accepts a multi-line prompt that ends with a line containing /done, and then asks for the number of runs, the temperature setting, and whether responses should stream live in the terminal.
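The multi-line prompt entry described above can be approximated in a few lines. This sketches the idea and is not the project's code. The helper name `read_prompt` is hypothetical. Two behaviors are also assumptions: surrounding whitespace on the `/done` line is ignored, and end of input also closes the prompt.

```python
def read_prompt(input_fn=input, terminator="/done"):
    """Collect prompt lines until a line equal to the terminator (hypothetical helper).

    Surrounding whitespace on the terminator line is ignored, and end of
    input (EOFError from input_fn) also ends the prompt.
    """
    lines = []
    while True:
        try:
            line = input_fn()
        except EOFError:
            break
        if line.strip() == terminator:
            break
        lines.append(line)
    # Interior lines keep their original indentation.
    return "\n".join(lines)
```

Passing `input_fn` as a parameter keeps the function easy to test. The interactive tool would call it with the built-in `input`, and a test can supply a canned sequence of lines instead.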
The project also supports optional command-line flags for users who want to script the process. These cover the model name, the number of runs, the temperature, a file to read the prompt from, and whether to stream output. Any setting not supplied on the command line is still requested interactively, so the tool works for both hands-on testing and automated runs.
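The flag-then-prompt fallback can be sketched with `argparse`. The exact flag spellings are not quoted, so `--model`, `--runs`, `--temperature`, `--prompt-file`, `--stream` and `--no-stream` below are hypothetical. Every option defaults to `None`, which lets the script tell "not supplied" apart from a real value and ask interactively only for the gaps.

```python
import argparse


def build_parser():
    """Hypothetical flag set; every default is None so missing values can be detected."""
    parser = argparse.ArgumentParser(description="Run one prompt against local Ollama models.")
    parser.add_argument("--model", help="installed model to test")
    parser.add_argument("--runs", type=int, help="number of times to run the prompt")
    parser.add_argument("--temperature", type=float, help="sampling temperature")
    parser.add_argument("--prompt-file", help="read the prompt from this file")
    # argparse.BooleanOptionalAction needs Python 3.9, so use a paired flag for 3.7+.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--stream", dest="stream", action="store_true")
    group.add_argument("--no-stream", dest="stream", action="store_false")
    parser.set_defaults(stream=None)
    return parser


def resolve(value, ask):
    """Return the flag value if it was given, otherwise fall back to asking."""
    # Compare with None rather than truthiness so --temperature 0 is respected.
    return value if value is not None else ask()
```

For example, `runs = resolve(args.runs, lambda: int(input("How many runs? ")))` asks the question only when `--runs` was omitted. The explicit `is not None` check matters for temperature, where `0.0` is a legitimate setting that a plain truthiness test would discard.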
The tool saves results under an ollama-runs directory, grouped by prompt. Each prompt gets its own folder whose name includes a short hash. That folder holds the original prompt, metadata for the runs, and a response file for each model tested against that prompt.
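One way to derive a stable per-prompt folder is to hash the prompt text. The text only states that the folder name contains a short hash. The following details are assumptions for illustration: the SHA-256 choice, the eight-character length, the `prompt-<hash>` pattern, and the file-name sanitizing. The helper names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

RUNS_ROOT = "ollama-runs"  # output directory described by the project


def short_hash(prompt, length=8):
    """First `length` hex digits of the prompt's SHA-256 digest (assumed scheme)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:length]


def prompt_dir(prompt, root=RUNS_ROOT):
    """Folder shared by every model run against this exact prompt text."""
    return Path(root) / f"prompt-{short_hash(prompt)}"


def model_filename(model):
    """Turn a tag such as 'llama3:8b' into a safe file name ('llama3_8b.json')."""
    # Colons and slashes are not valid in file names on every platform.
    return re.sub(r"[^A-Za-z0-9._-]", "_", model) + ".json"
```

The path depends only on the prompt text, so running the same prompt against a second model reuses the folder, and `prompt_dir(p) / model_filename(m)` yields one file per model. Under this scheme, even a whitespace change in the prompt produces a different hash and therefore a new folder.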
That structure is meant to support side-by-side comparison. Because the folder is tied to the prompt rather than the model, users can run the same prompt against multiple models and keep everything in one place. The repository says each model file records every response alongside Ollama metadata such as token counts, timing information, and other run details.
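The token counts and timings come from Ollama's `/api/generate` response. Its final object includes fields such as `eval_count` and `eval_duration`, with durations in nanoseconds. The sketch below is not the repository's implementation. It sends one request, optionally echoes streamed chunks as they arrive, and returns the full text together with the final metadata object. The function names and the default address are assumptions.

```python
import json
import urllib.request

# Assumption: default local Ollama address.
GENERATE_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, temperature, stream):
    """Request body for /api/generate; temperature goes in the options map."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "options": {"temperature": temperature},
    }


def collect_stream(lines, echo=None):
    """Join streamed NDJSON chunks and return (full_text, final_chunk).

    When streaming, Ollama sends one JSON object per line; the last one has
    "done": true and carries the token counts and timings.
    """
    parts, final = [], {}
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines defensively
        chunk = json.loads(raw)
        piece = chunk.get("response", "")
        parts.append(piece)
        if echo is not None:
            echo(piece)
        if chunk.get("done"):
            final = chunk
            break
    return "".join(parts), final


def tokens_per_second(stats):
    """Generation speed derived from eval_count and eval_duration (ns)."""
    duration_ns = stats.get("eval_duration")
    if not duration_ns:
        return None
    return stats.get("eval_count", 0) / (duration_ns / 1e9)


def generate(model, prompt, temperature=0.7, stream=True, echo=None):
    """Run one prompt once and return (text, metadata)."""
    body = json.dumps(build_payload(model, prompt, temperature, stream)).encode("utf-8")
    request = urllib.request.Request(
        GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        if stream:
            # Iterating the response yields one line (one JSON chunk) at a time.
            return collect_stream(resp, echo)
        result = json.load(resp)
        return result.get("response", ""), result
```

For live output, a caller might use `generate("llama3", prompt, echo=lambda s: print(s, end="", flush=True))`. Here "llama3" is only an example model name. Calling `generate` in a loop for the requested number of runs produces one `(text, metadata)` pair per run, ready to be saved next to the others.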
Recent commits show the project is still being refined. One update trimmed the saved metadata to the most useful fields, dropping larger pieces of data considered unnecessary for comparison. Another commit added an MIT license, and the repository's ignore rules keep run output and Python cache files out of version control.
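The specific fields kept after the trimming commit are not listed. As a hedged illustration, the sketch below keeps an assumed allow-list of scalar timing and token fields and drops everything else before appending the run to a per-model JSON file. One dropped field would be the `context` token array that `/api/generate` can return. `KEEP_FIELDS`, `trim_metadata` and `append_run` are hypothetical names.

```python
import json
from pathlib import Path

# Assumed allow-list; the project's real field selection may differ.
KEEP_FIELDS = (
    "model",
    "created_at",
    "total_duration",
    "load_duration",
    "prompt_eval_count",
    "prompt_eval_duration",
    "eval_count",
    "eval_duration",
)


def trim_metadata(raw):
    """Keep only allow-listed fields, dropping bulky ones such as 'context'."""
    return {key: raw[key] for key in KEEP_FIELDS if key in raw}


def append_run(model_file, response_text, raw_metadata):
    """Append one run to a JSON list stored in model_file; return the run count."""
    path = Path(model_file)
    runs = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    runs.append({
        "run": len(runs) + 1,
        "response": response_text,
        "metadata": trim_metadata(raw_metadata),
    })
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(runs, indent=2), encoding="utf-8")
    return len(runs)
```

An allow-list, rather than a list of fields to delete, has an advantage: fields added by later Ollama versions do not quietly grow the saved files. Storing all runs for a model in one list keeps repeated runs side by side for review.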
The project is aimed at users who already work with local large language models and want a simple way to evaluate them without a larger framework. Its topic tags on GitHub include Python, CLI, model evaluation, AI tools, prompt testing, local LLM, and Ollama.
The repository had 48 stars at the time of writing and no published releases. While small in scope, the tool addresses a common workflow in local AI experimentation, where users often need to compare models on the same set of prompts and inspect the outputs closely.
By focusing on prompt reuse, saved outputs, and a dependency-free setup, Ollama Model Tester offers a practical option for developers and hobbyists who want to benchmark local models from the terminal.