GPU Scanner

Oracle Cloud Infrastructure (OCI) GPU Scanner is a dedicated solution that provides observability, health checks, and performance monitoring for GPU workloads.

Why OCI GPU Scanner?

  • Centralized GPU monitoring

    OCI GPU Scanner is a centralized, managed solution for GPU performance monitoring that helps eliminate manual research and scripting, simplifying the benchmarking process.

  • Actionable insights

    OCI GPU Scanner minimizes downtime and false positives through comprehensive health checks, baseline comparisons, and automated diagnostics.

  • Cloud native flexibility

    OCI GPU Scanner offers customizable, tenant-wide visibility and team-specific insights that can help optimize resource sharing and cost management for enterprise-scale GPU clusters.

GPU Scanner features

  • Centralized GPU monitoring

    A managed, centralized solution that eliminates manual script-running and compatibility research across all regions in a tenancy. Enables shared visibility for teams sharing large clusters.

  • Comprehensive health checks

    Detailed health checks for day zero (baseline), day one (active monitoring), and day two+ (ongoing diagnostics), including node, multinode, and advanced diagnostics with historical comparisons to pinpoint issues.

  • Vendor-neutral compatibility

    Supports NVIDIA and AMD GPUs, with plans to extend support to future chipmakers and next-gen architectures.

  • Tenant-level monitoring

    Monitors GPU resources across all regions without needing per-region installations, supporting Oracle Cloud Infrastructure Kubernetes Engine clusters, high-performance computing clusters, bare metal instances, and virtual machines.

  • Cloud native integration

    Compatible with popular open source tools, including Grafana and Prometheus, allowing for customizable dashboards and seamless data storage/export for customer use cases.

  • Actionable insights and automation

    Provides recommended remediation actions (for example, reboot for GPU off-bus errors) and automates health checks via API or portal, reducing customer downtime and false positives.
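To make the baseline comparison and remediation ideas above more concrete, here is a minimal Python sketch. It is not the OCI GPU Scanner API: the `Reading` type, the `check_node` function, the `throughput` score, the 10% tolerance, and the "run advanced diagnostics" suggestion are all hypothetical names and values chosen for illustration. Only the pairing of a GPU off-bus error with a reboot comes from the feature description.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from a detected condition to a suggested action.
# Only the "GPU off-bus -> reboot" pairing comes from the feature text;
# the key name and the wording of the action are assumptions.
REMEDIATIONS = {
    "gpu_off_bus": "reboot the node",
}


@dataclass
class Reading:
    """One health snapshot for a node (all fields are illustrative)."""
    node: str
    gpu_count: int      # GPUs currently visible on the node
    throughput: float   # hypothetical benchmark score, higher is better


def check_node(baseline: Reading, current: Reading,
               tolerance: float = 0.10) -> list[str]:
    """Compare a current reading with its day-zero baseline.

    Returns a list of findings, each with a suggested next step.
    """
    findings = []
    # Fewer visible GPUs than at baseline is treated here as an off-bus error.
    if current.gpu_count < baseline.gpu_count:
        findings.append(f"gpu_off_bus: {REMEDIATIONS['gpu_off_bus']}")
    # A drop of more than `tolerance` below baseline is flagged as a regression.
    if current.throughput < baseline.throughput * (1 - tolerance):
        findings.append("throughput_regression: run advanced diagnostics")
    return findings


if __name__ == "__main__":
    day_zero = Reading("node-1", gpu_count=8, throughput=100.0)
    today = Reading("node-1", gpu_count=7, throughput=85.0)
    for finding in check_node(day_zero, today):
        print(finding)
```

Comparing against a stored baseline rather than a fixed threshold is one way to reduce false positives, because each node is judged against its own known-good state.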

Get started with OCI GPU Scanner

Access AI subject matter experts

Get help with building your next AI solution or deploying your workload on OCI GPU Scanner.

  • They can answer questions such as

    • How do I get started with Oracle Cloud?
    • What kinds of AI workloads can I run on OCI?
    • What types of AI services does OCI offer?

See how to apply AI today

Enter a new era of productivity with generative AI solutions for your business. Learn how Oracle helps customers leverage AI embedded across the full technology stack.

  • What can you achieve with Oracle AI?

    • Fine-tune LLMs in OCI
    • Automate invoice processing
    • Build a chatbot with RAG
    • Summarize web content with generative AI
    • And so much more!

Additional resources

Learn more about RDMA cluster networking, GPU instances, bare metal servers, and more.

See how much you can save with OCI

Oracle Cloud pricing is simple, with consistent low pricing worldwide, supporting a wide range of use cases. To estimate your low rate, check out the cost estimator and configure the services to suit your needs.

Experience the difference

  • 1/4 the outbound bandwidth costs
  • 3X the compute price-performance
  • Same low price in every region
  • Low pricing without long-term commitments

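Note: For the avoidance of doubt, the following terms used on this web page have the specific meanings given below:

  1. Except in the Oracle Privacy Policy, references to "Oracle" on this website refer specifically to Oracle's companies outside China and not to Oracle China.
  2. The terms "Cloud" and "云" (cloud) refer to the cloud technologies or solutions provided by Oracle's companies outside China.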