Graphsignal launches profiler for production-scale AI inference visibility

Graphsignal has released Graphsignal Profiler, a new observability tool aimed at helping teams inspect and optimize AI inference workloads running at production scale. The open-source project is designed to give engineers a clearer view into how models, engines, GPUs and other accelerators behave during live inference.

The company says the profiler is built for continuous visibility across the inference stack rather than isolated snapshots. According to the project documentation, it provides high-resolution timelines that track operation durations and resource use, along with generation tracing for large language model workloads. The tracing layer includes per-step timing, token throughput and latency breakdowns for supported inference frameworks.
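To make the tracing terms concrete, the sketch below shows one common way that per-step timing, time to first token and token throughput can be derived from token timestamps. It is an illustration only and not Graphsignal's implementation: the `GenerationTrace` class, its fields and the `summarize` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class GenerationTrace:
    """Hypothetical record of one LLM generation (not a Graphsignal type)."""
    request_start: float       # seconds, when the request arrived
    token_times: list[float]   # seconds, when each output token was emitted


def summarize(trace: GenerationTrace) -> dict:
    """Derive a simple latency breakdown from token timestamps."""
    if not trace.token_times:
        raise ValueError("trace contains no generated tokens")

    first = trace.token_times[0]
    last = trace.token_times[-1]

    # Per-step timing: gap between consecutive output tokens.
    step_durations = [
        later - earlier
        for earlier, later in zip(trace.token_times, trace.token_times[1:])
    ]

    # Decode throughput counts tokens produced after the first one.
    decode_time = last - first
    decoded_tokens = len(trace.token_times) - 1
    throughput = decoded_tokens / decode_time if decode_time > 0 else 0.0

    return {
        "time_to_first_token_s": first - trace.request_start,
        "total_latency_s": last - trace.request_start,
        "step_durations_s": step_durations,
        "decode_tokens_per_s": throughput,
    }


if __name__ == "__main__":
    example = GenerationTrace(
        request_start=0.0,
        token_times=[0.5, 0.75, 1.0, 1.25, 1.5],
    )
    print(summarize(example))
```

Separating time to first token from decode throughput is a deliberate choice here: the first figure mostly reflects queueing and prompt processing, while the second reflects steady-state generation speed.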

Graphsignal also says the profiler collects system-level metrics from inference engines and the underlying hardware, including CPUs, GPUs and other accelerators. In addition, it monitors device-level failures and other inference errors, which can help teams identify issues that affect reliability or performance in production environments.

Focus on production deployments

The tool is intended for teams that need to measure behavior in deployed systems rather than in benchmarks alone. Graphsignal positions the profiler as a way to uncover bottlenecks across the full inference path and to support targeted optimization work. The company says it can also surface telemetry for AI agents, a category that is increasingly important as more applications add agentic workflows on top of model inference.

Installation options are tailored to common GPU setups. Users can install the package with support for CUDA 12.x or CUDA 13.x, either through uv tool install or directly into a workload environment with pip install. For applications that bootstrap themselves, Graphsignal also offers a Python API called graphsignal.watch().

To run the profiler, users wrap their launch command with graphsignal-run and provide an account API key through the GRAPHSIGNAL_API_KEY environment variable. The documentation also notes support for custom tags using GRAPHSIGNAL_TAG_<KEY> variables.
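As a rough sketch of that setup, the helper below assembles the environment variables and the wrapped command for a profiled launch. Several details are assumptions rather than documented behavior: that graphsignal-run takes the launch command as trailing arguments, that tag keys are upper-cased, and the `python serve.py` workload, which is a placeholder. The function name `build_profiled_launch` is hypothetical.

```python
import os


def build_profiled_launch(command, api_key, tags=None, base_env=None):
    """Return (argv, env) for running `command` under graphsignal-run.

    Assumption: graphsignal-run is used as a prefix to the normal
    launch command. Check the Graphsignal CLI reference for the
    exact syntax before relying on this.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Account API key, as named in the documentation.
    env["GRAPHSIGNAL_API_KEY"] = api_key

    # Custom tags follow the GRAPHSIGNAL_TAG_<KEY> pattern.
    # Upper-casing the key is an assumption made for this sketch.
    for key, value in (tags or {}).items():
        env[f"GRAPHSIGNAL_TAG_{key.upper()}"] = str(value)

    argv = ["graphsignal-run", *command]
    return argv, env


if __name__ == "__main__":
    argv, env = build_profiled_launch(
        ["python", "serve.py"],           # placeholder workload
        api_key="<your-api-key>",
        tags={"deployment": "staging"},
    )
    print(argv)
    # To actually launch the workload:
    #   import subprocess
    #   subprocess.run(argv, env=env, check=True)
```

Building the environment as a copy, instead of mutating `os.environ`, keeps the API key scoped to the child process.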

Graphsignal says the profiler integrates with several inference and machine learning stacks, including PyTorch, vLLM and SGLang. The company directs users to its documentation for CLI and API references as well as integration guides.

Minimal overhead claims

The company emphasizes that the profiler is intended to have limited impact on production performance. Graphsignal says CUDA kernel activity is gathered through CUPTI, NVIDIA's CUDA Profiling Tools Interface, using low-overhead APIs, while analysis and upload are handled in a separate sidecar process.

Security and privacy are also highlighted in the documentation. Graphsignal says the profiler makes outbound connections only to api.graphsignal.com and does not accept inbound connections or commands. It also says that sensitive content such as prompts and completions is not recorded.

Alongside the profiler release, Graphsignal points users to its broader observability platform for monitoring and analysis. The company says users can log in to review signals, and that an AI optimization workflow is available through a Graphsignal skill for agents such as Claude Code, Codex and Gemini.

The repository for Graphsignal Profiler is publicly available on GitHub and is licensed under Apache-2.0. At the time of publication, the project page showed more than 200 stars and one primary contributor.