Oracle Cloud Infrastructure (OCI) GPU Scanner is a dedicated solution that provides observability, health checks, and performance monitoring for GPU workloads.
Don’t miss our exclusive live demo on October 30 where we’ll showcase the deployment of Llama on OCI. See the latest generative AI technologies in action, explore real-world use cases, and learn how to build smarter, more automated workflows step by step.
OCI's top architects reveal how cluster networks power scalable GenAI—from a few GPUs to a zettascale OCI Supercluster with 131,072 NVIDIA Blackwell GPUs.
Oracle AI infrastructure is scalable, performant, and deployable anywhere. See why we stand out with industry-leading scalability, bare metal GPU instances, and more.
Discover the analyst’s perspective on OCI AI infrastructure with AMD GPUs and how this combination can improve productivity, accelerate time to value, and reduce energy costs.
OCI GPU Scanner is a centralized, managed solution for GPU performance monitoring that helps eliminate manual research and scripting, simplifying the benchmarking process.
OCI GPU Scanner minimizes downtime and false positives through comprehensive health checks, baseline comparisons, and automated diagnostics.
OCI GPU Scanner offers customizable, tenant-wide visibility and team-specific insights that can help optimize resource sharing and cost management for enterprise-scale GPU clusters.
A managed, centralized solution that eliminates manual script running and compatibility research across all regions in a tenancy. Enables sharded visibility, so teams sharing large clusters can each view their own portion of the resources.
Detailed health checks for day zero (baseline), day one (active monitoring), and day two+ (ongoing diagnostics), including node, multinode, and advanced diagnostics with historical comparisons to pinpoint issues.
Supports NVIDIA and AMD GPUs, with plans to extend support to future chipmakers and next-gen architectures.
Monitors GPU resources across all regions without needing per-region installations, supporting Oracle Cloud Infrastructure Kubernetes Engine clusters, high performance computing clusters, bare metal, and virtual machines.
Compatible with popular open source tools, including Grafana and Prometheus, allowing for customizable dashboards and seamless data storage/export for customer use cases.
Provides recommended remediation actions (for example, reboot for GPU off-bus errors) and automates health checks via API or portal, reducing customer downtime and false positives.
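As an illustration of the baseline comparison and recommended remediation ideas described above, here is a minimal Python sketch. It is not the OCI GPU Scanner API or its implementation. The metric names, the 10% tolerance, the `compare_to_baseline` and `recommend` helpers, and the fallback action are all hypothetical; only the pairing of a GPU off-bus error with a reboot comes from the text.

```python
from dataclasses import dataclass

# Hypothetical remediation table. Only the off-bus -> reboot pairing is taken
# from the feature description; everything else here is an assumption.
REMEDIATIONS = {
    "gpu_off_bus": "reboot node",
}


@dataclass
class Finding:
    metric: str
    baseline: float
    current: float
    drop: float  # relative drop below baseline, e.g. 0.25 means 25% lower


def compare_to_baseline(baseline, current, tolerance=0.10):
    """Return a Finding for each metric that fell more than `tolerance` below its baseline."""
    findings = []
    for metric, base in baseline.items():
        # Skip metrics that were not measured or have no usable baseline.
        if metric not in current or base <= 0:
            continue
        drop = (base - current[metric]) / base
        if drop > tolerance:
            findings.append(Finding(metric, base, current[metric], drop))
    return findings


def recommend(error_code):
    """Look up a recommended action, falling back to manual diagnosis."""
    return REMEDIATIONS.get(error_code, "escalate for manual diagnosis")


# Day-zero baseline versus a later measurement (made-up numbers).
day_zero = {"hbm_bandwidth_gbps": 3000.0, "allreduce_gbps": 400.0}
today = {"hbm_bandwidth_gbps": 2950.0, "allreduce_gbps": 300.0}

for f in compare_to_baseline(day_zero, today):
    print(f"{f.metric}: {f.drop:.0%} below baseline")
print(recommend("gpu_off_bus"))
```

Comparing against a stored day-zero baseline, rather than a fixed threshold, is what lets a check like this flag a node that is slower than it used to be while ignoring small run-to-run variation, which is one way to keep false positives down.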
Get help with building your next AI solution or deploying your workload on OCI GPU Scanner.
Enter a new era of productivity with generative AI solutions for your business. Learn how Oracle helps customers leverage AI embedded across the full technology stack.
Learn more about RDMA cluster networking, GPU instances, bare metal servers, and more.
Oracle Cloud pricing is simple, with consistent low pricing worldwide, supporting a wide range of use cases. To estimate your low rate, check out the cost estimator and configure the services to suit your needs.
Live Demo Day: Oracle, Meta, and NVIDIA Experts Deploy Llama on OCI
First Principles: Zettascale OCI Superclusters
Accelerating AI workloads with OCI (PDF)
Enterprise Strategy Group on AMD Instinct MI300X