NVIDIA and AWS Expand AI Deployment Tools With New GPU Instances and Vector Search Support

NVIDIA and AWS deepen their AI infrastructure partnership

NVIDIA and Amazon Web Services are expanding their collaboration with a set of updates aimed at helping companies run AI systems in production more efficiently. The announcements focus on three areas that are often bottlenecks for enterprise AI: inference performance, vector search, and large-scale training infrastructure.

The companies said the changes are designed to reduce operational complexity while improving speed and scalability across workloads that include AI, graphics, data analytics, and retrieval systems. The updates span Amazon EC2, Amazon OpenSearch, and NVIDIA validation for AWS training infrastructure.

New EC2 G7 instances add RTX PRO 4500 GPUs

At the center of the announcement are new Amazon EC2 G7 instances powered by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. AWS says the new instances are intended for production environments that need strong performance without the burden of managing a customer-owned GPU stack.

According to NVIDIA, G7 instances can deliver up to 4.6 times the AI inference performance of G6 instances. The company also says graphics performance can be up to 2.1 times higher, while data analytics workloads can run faster through NVIDIA cuDF support for Apache Spark on Amazon EMR.

The instances support as many as eight GPUs, 256 GB of total GPU memory, up to 700 Gbps of EFA-enabled networking, and as much as 7.6 TB of local NVMe SSD storage. AWS plans one-, two-, four-, and eight-GPU configurations, plus a bare-metal option that is listed as coming soon.

NVIDIA and AWS said the new instance family is meant to help customers size infrastructure more precisely instead of overbuying capacity. The companies positioned G7 as a single platform for a range of users, including AI teams that need lower-latency inference, media customers working with rendering and video, and data teams handling analytics and vector database pipelines.
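
As a rough illustration of the sizing idea, the sketch below picks the smallest planned GPU count that fits a given memory requirement. It uses only the figures quoted above (up to eight GPUs, 256GB of total GPU memory, and one-, two-, four-, and eight-GPU configurations). The even split of 32GB per GPU, the linear scaling of memory with GPU count, and the function name `smallest_config` are assumptions made for illustration, not details confirmed by AWS or NVIDIA.

```python
from typing import Optional

# Figures quoted in the article: up to 8 GPUs and 256 GB of total GPU memory.
MAX_GPUS = 8
TOTAL_GPU_MEMORY_GB = 256

# Assumption: memory is split evenly across GPUs (256 / 8 = 32 GB each)
# and smaller configurations scale linearly with GPU count.
MEMORY_PER_GPU_GB = TOTAL_GPU_MEMORY_GB / MAX_GPUS

# GPU counts AWS plans to offer, per the article.
GPU_CONFIGS = (1, 2, 4, 8)


def smallest_config(required_memory_gb: float) -> Optional[int]:
    """Return the smallest planned GPU count whose combined memory fits the need.

    Returns None when the requirement exceeds the largest configuration.
    """
    if required_memory_gb <= 0:
        raise ValueError("required_memory_gb must be positive")
    for gpus in GPU_CONFIGS:
        if gpus * MEMORY_PER_GPU_GB >= required_memory_gb:
            return gpus
    return None


if __name__ == "__main__":
    # Hypothetical memory requirements in GB, chosen only to exercise each branch.
    for need_gb in (20, 60, 100, 300):
        print(need_gb, smallest_config(need_gb))
```

A real sizing exercise would also weigh throughput, networking, and price, none of which this sketch models.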

The new instances are available through several AWS services and tools, including Deep Learning AMIs, Deep Learning Containers, Amazon EMR, Amazon EKS, Amazon ECS, and graphics AMIs. Support in Amazon SageMaker AI is expected later.

OpenSearch Serverless gets GPU-accelerated vector indexing

The companies also announced that the next version of Amazon OpenSearch Serverless will use GPU-accelerated vector indexing powered by NVIDIA cuVS as the default option for vector collections.

That matters for developers building retrieval-augmented generation systems, semantic search, recommendation engines, and agentic AI applications, all of which depend on fast access to large embedding databases. NVIDIA said the change makes GPU-based vector search a built-in AWS capability rather than a specialist optimization.

NVIDIA said the approach can make vector indexing up to 10 times faster at one-quarter of the cost compared with CPU-only implementations. The companies also said it could make billion-scale vector databases practical to build in less than an hour.
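
To make the arithmetic behind those figures concrete, the sketch below applies the quoted "up to 10 times faster" and "one-quarter of the cost" numbers to a CPU-only baseline. The baseline values in the usage example are hypothetical, the function name is illustrative, and because the speedup is an upper bound the result is a best case rather than an expected outcome.

```python
from typing import Tuple

# Figures quoted by NVIDIA in the article, treated here as best-case bounds.
SPEEDUP_UPPER_BOUND = 10.0  # "up to 10 times faster"
COST_FRACTION = 0.25        # "one-quarter of the cost"


def best_case_gpu_estimate(cpu_hours: float, cpu_cost: float) -> Tuple[float, float]:
    """Return (hours, cost) for GPU indexing if the quoted figures hold fully."""
    if cpu_hours < 0 or cpu_cost < 0:
        raise ValueError("baseline values must be non-negative")
    return cpu_hours / SPEEDUP_UPPER_BOUND, cpu_cost * COST_FRACTION


if __name__ == "__main__":
    # Hypothetical CPU-only baseline: an 8-hour index build costing 100 units.
    hours, cost = best_case_gpu_estimate(cpu_hours=8, cpu_cost=100)
    print(f"best case: {hours} hours, {cost} cost units")
```

Actual gains would depend on dataset size, vector dimensionality, and index settings, which the announcement does not break down.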

With indexing compute moved to GPUs by default, AWS customers can rely on serverless scaling to cut management overhead, including during periods when workloads are not active.

AWS earns NVIDIA Exemplar Cloud status for GB300

In a separate milestone, AWS has received NVIDIA Exemplar Cloud status for GB300 training workloads. NVIDIA said the designation means AWS met the performance thresholds used to benchmark AI infrastructure against NVIDIA's reference architecture.

The result comes from joint engineering work between the two companies. NVIDIA said the recognition should give developers and AI teams more confidence when comparing cloud options for large training jobs and when trying to move projects from planning into production.

Taken together, the announcements show how AWS and NVIDIA are trying to cover more of the AI stack with integrated infrastructure, from model training to inference and retrieval. The emphasis throughout is on making large-scale AI deployments faster to launch and easier to operate.