Instaclustr, Author at Instaclustr

Apache Kafka® security: The 5 non-negotiables for secure data streaming

Instaclustr — Mon, 12 May 2025 15:11:45 +0000

Securing real-time data streaming

Imagine you’re flying 30,000 feet in the air, trying to doze off, but you’re also tempted to catch up on that show you’ve been meaning to finish for the last several years. In the meantime, you’re being jolted awake by turbulence along the way—maybe causing a bit of panic here and there. But, hey, you still have several hours left until landing, so you manage to doze off after all. And all the while, that airplane–and every other airplane currently in the sky–is streaming real-time data back to different stakeholders:

The turbine company monitoring for any abnormalities
The airline keeping track of its geographical location
Weather monitoring for turbulence from other planes on the same route

This data streaming keeps your flight safe, smooth, and on time. Now, imagine how critical security becomes when sensitive data is involved. Enterprises relying on data streaming must prioritize robust security measures to protect their operations.

Real-time fraud detection, hyper-personalized shopping experiences, data from IoT devices, AI insights…the list of use cases for data streaming is extensive and growing.

And along with a growing need for data streaming is the need to make sure that data is secure.

But with so many changes going on (here’s looking at you AI) how can enterprises secure their data streaming? For organizations leveraging Apache Kafka® to power their data infrastructure, here are five essential best practices to ensure Kafka security and long-term scalability.

Secure your Kafka streams with confidence

Download our white paper on Kafka compliance strategies and learn how to build scalable and secure streaming architectures

Download the white paper

1. Stay on top of regular updates and patching notices

This one’s a no-brainer—or rather, it should be a no brainer, but unfortunately, it’s not. The sheer number of enterprises that don’t stay on top of regular updates and patching for their software is uncomfortably larger than it should be.

In 2024 alone, there were a record-breaking 40,009 CVEs announced, and so far in 2025, we’re well on our way to surpassing that number by a good margin—some are already estimating that we’ll have upwards of 50,000 CVEs this year. With so many vulnerabilities out there (and growing), maintaining regular patching and staying on top of software updates is imperative. Thankfully, the solution is straightforward.

Don’t kick the can down the road. When a CVE is announced, action it. When a software update is rolled out, prioritize updating it. This will help save you from far more severe problems later.

2. Create strong authentication and access controls

How many people have access to your database? How easy it is to access it? Does everyone really need to have that access to it? Ask yourself these questions and you could very well be surprised by the answer.

Use role-based access control (RBAC) or attribute-based access control (ABAC) to limit access to data streams based on user roles or attributes.

Control access to resources by setting up fine-grained access control lists (ACLs). This ensures that only authorized users or applications can read from or write to specific topics/channels.

By creating strong authentication and access controls, you’re helping to prevent unauthorized access, protect sensitive data, and ultimately mitigating security risks.

3. Keep it private: Use VPC peering and private networks

What’s the best way to enhance the confidentiality, integrity, and availability as you scale your data streams? Utilize privacy measures like VPC peering and private networks.

Why? Using VPC peering and private network connections ensures that your data streams are transmitted securely within a controlled environment, reducing exposure to external threats and improving performance.

This is beneficial for a few reasons:

It reduces the risk of attacks like man-in-the-middle (MITM), DDoS, or unauthorized access that can occur when data is exposed to the public internet.
By keeping data within a private network, you eliminate the need for public IP addresses, which are more vulnerable to exploitation.
Publicly accessible endpoints increase the attack surface for your data streaming infrastructure. With private connections, only internal resources (e.g., producers, consumers, and brokers) can communicate, significantly reducing the risk of external threats.

By taking active privacy measures like VPC peering and private networks, you’re helping to build a robust and secure data streaming architecture.

4. Stay compliant with data privacy rules and regulations

Admittedly, this one is easier said than done—but that doesn’t mean it’s not worth investing in.

It’s no surprise that many industries that rely heavily on streaming data (like finance and healthcare) have strict compliance requirements: GDPR, HIPAA, PCI-DSS, etc. and these requirements aren’t going away anytime soon—if anything, they’ll become more stringent.

While using VPC and private network connections can help meet these regulatory requirements, you can—and should—take it further. If you’re at the early stages of architecture design, then you can start by implementing privacy principles from the get-go. Incorporate privacy controls and data anonymization techniques to minimize the exposure of sensitive data.

Beyond the early-stage position? Not to worry, there’s still a lot you can do:

Implement monitoring mechanisms to detect and report any potential data breaches or non-compliance incidents.
Consistently audit logs, access attempts, and permissions.
Define data retention policies to ensure sensitive data is securely deleted or anonymized.
Train employees on data privacy rules, best practices, and regulations to build a culture of security and awareness across your organization.

5. Stick with open source

And finally, what’s the best way to secure data streaming and help you scale for whatever your enterprise may face?

Use open source software.

Sure, that may seem counterintuitive at first; after all, the actual source code for open source (as the name suggests) is literally out in the open for all to see. But open source provides security benefits that its proprietary counterparts simply cannot match.

Apache Kafka® has proved itself to become a leading streaming data technology, and for good reason. Its distributed architecture is designed to scale seamlessly, making it capable of handling massive volumes of real-time data without performance degradation.

Plus, Kafka’s flexibility enables it to easily integrate with AI-driven systems—a key driver of increased workloads both now and in the future. Being open source, you get numerous security benefits that propriety code simply cannot match: a strong community bringing transparency and quick updates to any security vulnerabilities that can (and will) arise.

Final thoughts on Kafka security

AI is changing everything, and enterprises are already living it.

According to the 2024 Data complexity report from NetApp, 69% of enterprises are already noting an increase in security threats because of AI. Most C-level executives list global security challenges as their main stressor, which will only increase in 2025 and beyond.

But the demand for streaming data is not slowing down anytime soon. Financial institutions still need to stay ahead of the curve with fraud detection. Healthcare providers want to deliver the best possible outcome. Retailers are coming up with new ways to make the shopping experience as personalized as possible.

And airplanes—with all the real-time data monitoring keeping them in the sky—still need to fly; odds are, that data is streaming with Apache Kafka, too.

Want to know exactly how to implement these best practices?

The post Apache Kafka® security: The 5 non-negotiables for secure data streaming appeared first on Instaclustr.

Product update: April 2025

Instaclustr — Tue, 08 Apr 2025 00:01:26 +0000

Major Announcements

ClickHouse®

Instaclustr for ClickHouse® available in Preview on Azure

Instaclustr for ClickHouse® is now available in Preview on Azure, following our recent ClickHouse GA release on AWS in November 2024.

ClickHouse, an open source, column-oriented DBMS, is ideal for real-time analytics, log analysis, and machine learning. The Preview release includes features such as provisioning and monitoring support accessible through our console, API, and Terraform Provider, with continuous enhancements being added towards GA.

Instaclustr Managed Platform

Instaclustr’s commitment to data protection: Achieving PCI DSS 4.0 compliance

Instaclustr has recently achieved compliance with PCI DSS 4.0, the latest Payment Card Industry Data Security Standard. Our Apache Cassandra®, Apache Kafka®, OpenSearch®, Valkey™, and Cadence® offerings on AWS and GCP, are all PCI DSS compliant.

PCI DSS 4.0 represents the most comprehensive update to the standard since version 3.2.1 was released in 2018. This update addresses emerging threats and technologies by enhancing security requirements to tackle modern security challenges and promoting security as a continuous process. Since obtaining our initial PCI certification in 2020, the implemented security controls have consistently improved security levels for all our Managed Service customers, regardless of the product or platform.

OpenSearch

Observability plugin is now available for OpenSearch on the Instaclustr Managed Platform

The Observability Plugin for OpenSearch is now accessible on the Instaclustr Managed Platform. It provides comprehensive insights into application and infrastructure performance and operations through OpenSearch Dashboards.

Available to all Instaclustr OpenSearch customers, the Observability Plugin supports various monitoring, analysis, and data visualization use cases.

Other Significant Changes

Apache Cassandra

Apache Cassandra 3.11.19.1, 4.0.17, 4.1.8, and 5.0.3 are released into General Availability in response to CVEs reported by the Apache Cassandra project community.
Shotover Proxy 0.6.0 for Cassandra-AWS PrivateLink has been released.

Apache Kafka

Apache Kafka and Kafka® Connect 3.9.0 are GA with support for Tiered Storage.
Apache Kafka and Kafka Connect 3.8.1 are also now GA.

OpenSearch

The OpenSearch Mapper Size plugin has been released, allowing configuring indices to automatically store document sizes.

ClickHouse

ClickHouse Domain integration allows clusters to use URL table functions to read from and write data to the specified domain.
Overriding certain ClickHouse configuration parameters is now supported via the console, API and Terraform Provider

Valkey

On-premises provisioning is now available for Valkey

Instaclustr Managed Platform

Added support for AWS Malaysia (ap-southeast-5) region for all managed offerings.
To comply with PCI 4.0 requirements, the ‘Don’t challenge me on this device for 7 days’ option for MFA code entry on the Console has been disabled for PCI-enabled accounts. As a result, users with PCI-enabled accounts will now be prompted to enter the MFA code for each login, regardless of the device used.

Upcoming Releases

ClickHouse

ClickHouse is advancing towards GA support for Azure, enabling customers to benefit from both the power of ClickHouse for real-time analytics and the wide array of services available through Azure.

OpenSearch

OpenSearch version 3.0 is launching soon on the Instaclustr Managed Platform. This exciting new release includes search performance improvements, increased stability and enhancements to AI, ML and vector search capabilities. Keep an eye on our blog for more information in the coming months.

PostgreSQL

Support for PCI compliance in PostgreSQL will be available soon, enabling customers with PCI requirements to use PostgreSQL on the Instaclustr Managed platform.

Cadence

We’re excited to announce that the latest version of Cadence, including Cadence Web 4.0, is coming soon to the NetApp Instaclustr managed platform. This update, currently in beta for the Cadence project, features Multi-Cluster support, enhanced performance, and a sleek new user interface for an improved experience.

Did You Know?

Potential customers often ask why they need a managed service instead of a Kubernetes operator. Read our blog Why pay for a managed service when I could just use a Kubernetes operator? to discover the hidden benefits of managed services over Kubernetes operators and learn why relying on experts can save you from unexpected pitfalls and ensure seamless operation.
The User Actions feature, available since May 2024, streamlines audits by offering detailed logging and tracking of all user actions across organizations, accounts, and clusters. Explore more about how the feature works in action with our two blogs: A Detailed Walkthrough of the Feature in the Console: Who Did What and When and Automating the Data Audit Process with the Instaclustr User Actions API.

The post Product update: April 2025 appeared first on Instaclustr.

Harnessing managed open source: The future of data infrastructure

Instaclustr — Tue, 04 Mar 2025 09:00:26 +0000

Data isn’t just a byproduct of business—it’s the driving force behind smarter decisions, innovative strategies, and future growth. As businesses put more focus on their data strategies, intelligent data infrastructure is becoming essential. This year, open source technologies are set to transform how companies store, process, and analyze their data.

What does this mean for you? It’s an opportunity to stay ahead by leveraging the latest trends in data infrastructure. Whether you’re exploring managed open source solutions or scaling your existing systems, these advancements can help your organization unlock the full potential of your data.

Let’s dive into the key trends shaping the future and how they can empower your business.

1. The rise of unified data storage

Managing massive amounts of data is one thing—managing it efficiently across various environments is another. In 2025, organizations will pursue unified and more intelligent data storage solutions that integrate seamlessly across on-prem, hybrid, and cloud systems. Open source tools will play a pivotal role in this transformation, offering innovation in flexibility and interoperability.

Impact: Unified storage solutions will increase demand for open source communities that actively innovate and improve capabilities for advanced data management. Open source’s flexibility means businesses can customize solutions to meet their exact needs, but with this comes the responsibility of scaling and managing the complexity of these ecosystems to ensure projects remain efficient and sustainable.

2. The resurgence of real-time data processing

The demand for real-time insights is accelerating. Whether it’s detecting fraud, powering recommendation engines, or managing IoT systems, businesses need data processing infrastructure that operates with minimal latency. Open source technologies like Apache Kafka® are already leading the charge, but in 2025, the focus will shift to expanding use cases, improving accessibility, and integrating more cost-effective solutions like tiered storage.

Tiered storage is proving to be a game-changing component, enabling organizations to optimize costs and scale their real-time systems by intelligently managing data. Frequently accessed, high-priority information is stored on faster, premium tiers for immediate access, while less critical data is shifted to more economical, lower-performance tiers. This ensures that real-time systems remain agile and cost-efficient, even as data volumes grow.

Impact: Enterprises will benefit immensely from leveraging open source technologies like Apache Kafka, supported by tiered storage, to build robust, scalable, and efficient data pipelines that meet the demands of modern workloads. By balancing performance and cost, tiered storage not only simplifies maintaining low-latency systems but also makes real-time actionable intelligence more accessible for businesses of all sizes.

3. Privacy-first and distributed data architectures

With stricter data privacy regulations and rising concerns over centralization, open source projects will increasingly focus on privacy-first and distributed data solutions. Technologies like federated learning and decentralized storage (think IPFS and Pachyderm) will allow organizations to process data without moving or exposing it entirely.

In 2025, we’ll see more open source projects designed to enable secure, collaborative data sharing—allowing businesses to analyze data across borders and internal silos while staying compliant with privacy laws. Expect new frameworks for maintaining data sovereignty and preventing breaches while enabling the analysis of sensitive datasets.

Impact: Healthcare and telecommunications industries will likely lead adoption, balancing stringent regulation with the need for data insights. Implementing distributed architectures, however, may require a significant up-front investment.

4. AI-driven search capabilities

AI-driven search capabilities will redefine how businesses extract value from their data. By incorporating advanced natural language processing (NLP) and machine learning models OpenSearch® is enabling highly intuitive, real-time search experiences that adapt to user behavior and uncover deep insights across vast datasets. This evolution aligns perfectly with the demands of modern workloads such as real-time analytics, customer personalization, and operational intelligence, empowering businesses to optimize data infrastructure for speed and seamless access.

Impact: Enhanced search functionality improves operational efficiency by reducing the time spent navigating and retrieving critical information. Simultaneously, the ability to analyze data in real time enriches customer experiences through faster, more relevant service delivery. For organizations leveraging data-driven decision-making, OpenSearch provides a powerful foundation for surfacing actionable intelligence, driving more informed strategies, and unlocking new opportunities for innovation.

5. The evolution of open source data warehousing

Open source data warehouses like ClickHouse® will continue to evolve, offering lighter, faster, and more scalable solutions. In 2025, they’ll compete head-on with proprietary systems, integrating deeper analytics capabilities and machine learning extensions. The trend will shift from businesses simply adopting open source solutions for cost-saving purposes to leveraging them as central components of modern data strategies. We’ll also see enterprise-focused, managed versions of these technologies gaining traction, allowing even smaller businesses to unlock advanced analytics without high overhead.

Impact: Enterprises, which rely on granular operational data, will leverage open source warehouses to drive efficiency and innovation. The challenge will lie in handling large-scale migrations and workforce training to adopt new tools effectively.

6. The re-prioritization of infrastructure modernization

Open source technologies are transforming how critical workloads like VMware, Oracle, and other enterprise systems operate. A key trend for 2025 is enhanced interoperability, with open source technologies—like Kubernetes and Apache Kafka, enabling seamless integration across hybrid environments. This allows legacy systems to connect easily with modern cloud infrastructures, helping enterprises modernize without replacing existing investments.

Another major development is improved scalability. Open source technologies now support dynamic scaling to handle demanding workloads—from large databases to resource-heavy applications. This flexibility helps businesses manage unpredictable demands while maintaining strong performance and efficiency.

Cost-efficiency is also a key advantage, with open source reducing licensing costs and optimizing resources. Additionally, it offers enhanced customization, allowing businesses to create solutions tailored to their specific needs.

Impact: These advancements allow businesses to unlock greater flexibility and performance while continuing to rely on critical enterprise systems like VMware and Oracle. By modernizing legacy workloads and aligning them with cutting-edge open source solutions, organizations can innovate faster, maintain cost control, and remain adaptable in a rapidly evolving data landscape.

Final thoughts

Open source is transforming the way organizations plan their data strategies, delivering cost efficiency, scalability, and innovation all at once. But as data infrastructure grows in prominence, businesses must be prepared to adopt and adapt. Each of these trends—unified platforms, real-time processing, privacy-first systems, AI-driven search, open warehouses, and infrastructure modernization—represents an opportunity to turn data into a powerful asset.

The question isn’t whether open source will drive enterprise innovation in 2025, but rather how fast your organization can leverage these technologies. Those who scale quickly will find themselves better equipped to innovate, deliver value, and stay ahead of the curve.

What open source data infrastructure trends are you excited to explore? The future of data-driven business is taking shape. Are you ready to seize it with a managed open source solution?

The post Harnessing managed open source: The future of data infrastructure appeared first on Instaclustr.

Benchmarking ClickHouse® performance: Insights from Instaclustr’s testing methodology

Instaclustr — Mon, 10 Feb 2025 09:00:38 +0000

In the evolving landscape of data analytics and management, performance benchmarking is pivotal to understanding how different systems handle real-world workloads. At NetApp, we performance test each of the ClickHouse nodes we offer to provide insightful results and showcase the capabilities of our managed platform. This blog presents an overview of our testing methodology, the ClickBench method, and the results.

Testing methodology

1. Provisioning: We provisioned ClickHouse clusters using the Instaclustr Console.

2. Node selection: For each cluster we ran performance tests with, we selected a different production node size from the available options.

3. Cluster initialization: Then we waited for the cluster to reach the running state, which took no more than 5 minutes.

4. Security configuration: By default, Instaclustr for ClickHouse clusters have security firewalls configured to protect against data infiltration or exfiltration. So, we configured an integration with the http://datasets.clickhouse.com domain, enabling access, allowing us to download the ClickBench “hits” data. See the support page for detailed information on how to do this.

5. Create table: Next we created a table to store the ClickBench data using the query found on GitHub.

6. Data loading: Then inserted the hits data into the table, using the following query:

insert into hits SELECT * FROM  url('https://datasets.clickhouse.com/hits_compatible/hits.parquet', 'Parquet') SETTINGS enable_url_encoding = 0;

7. Execution: Finally, we ran the ClickBench queries, which can be found on GitHub, using a script. The query time for each was recorded, then processed to calculate the relative query time for each node size.

Limitations

The ClickBench benchmark, developed by ClickHouse, is designed to simulate typical workloads. By using a dataset derived from the traffic recordings of one of the largest web analytics platforms worldwide, the benchmark maintains realism and relevance. The dataset, while anonymized, retains essential data distributions, enabling accurate performance testing across various queries.

However, the following limitations should be noted:

The dataset consists of a single flat table, rather than a set of related tables.
The table contains exactly 99,997,497 records, which is relatively small for a production dataset and does not fully utilize the resources available for larger instances.
The benchmark only tests single-node performance and does not utilize the entire ClickHouse cluster. This level of detail is more helpful for making cluster sizing decisions.
The benchmark runs queries sequentially and does not test workloads with concurrent requests, nor does it test for system capacity. Each query is run only a few times, which can introduce some variability in the results.

Results

Our performance testing, using the ClickBench methodology, on the Instaclustr Managed Platform yielded the following results. Our findings demonstrate the efficiency of Instaclustr for ClickHouse, showing significant performance improvements as additional vCPUs are leveraged on larger nodes to reduce relative query times. However, as node size increases, the test becomes less challenging, and the additional resources were not fully utilized, resulting in diminishing returns.

Sizing clusters is always workload dependent. Our results are useful for comparing the performance of these nodes handling the ClickBench workload, which is only about 15 GB when compressed. By understanding comparative performance for this workload, decisions can be made regarding optimal sizing. To better model real-world performance, a similar test could be repeated with a workload which more accurately reflects your production data.

Conclusion

In conclusion, our comprehensive performance testing has demonstrated how effectively scaling ClickHouse clusters on the Instaclustr platform can significantly enhance performance. By methodically provisioning clusters and leveraging the ClickBench benchmark, we have shown that larger node sizes directly contribute to improved query times. However, it is important to note that as size increases, the additional resources may lead to diminishing returns.

NetApp is committed to providing our customers with the insights and support needed to make informed decisions about their data infrastructure. Understanding performance characteristics for specific workloads is crucial for optimal cluster sizing, and our team is ready to assist you in tailoring your ClickHouse clusters to meet your production demands effectively.

Ready to harness the power of ClickHouse? Sign up here and elevate your open source data infrastructure strategy today.

The post Benchmarking ClickHouse® performance: Insights from Instaclustr’s testing methodology appeared first on Instaclustr.

Instaclustr product update: January 2025

Instaclustr — Fri, 17 Jan 2025 00:59:31 +0000

Here’s a roundup of the latest features and updates we’ve recently released.

If you have any particular feature requests or enhancement ideas that you would like to see, please get in touch with us.

Major Announcements:

Apache Cassandra®

Apache Cassandra 5.0 is now GA

Cassandra 5.0 on the Instaclustr Managed Platform marked a significant milestone, affirming our commitment to providing fully managed service across major cloud providers and on-premises infrastructures.

This release is a game-changer, featuring a suite of new features designed to supercharge AI/ML-based applications including storage-attached indexes and vector search, new and enhanced indexing and processing capabilities, and improved security.

Apache Kafka®

GCP Private Service Connect on Instaclustr for Apache Kafka

This enhancement ensures a secure and private connection to your Kafka clusters, leveraging the robust capabilities of Google Cloud Platform. Available as an Enterprise Feature and activated with a single click, this offers an extra layer of security over traditional VPC peering or internet gateways.

ClickHouse®

Instaclustr for ClickHouse is now GA

ClickHouse is an open source, column-oriented DBMS commonly used for real-time analytics, log and event analysis, and supporting machine learning and AI use cases. By offering ClickHouse, we enable businesses to leverage these capabilities without the operational complexities of running ClickHouse in their production environment. ClickHouse complements existing services we offer, like Kafka and PostgreSQL®, which together can be used to build robust and scalable data processing pipelines.

Valkey®

Valkey now available on the Instaclustr Managed Platform

Born earlier this year after Redis™ transitioned to a non-open source model, Valkey is championed by the Linux Foundation and major sponsors, including NetApp, AWS, Google Cloud and Alibaba.

Valkey’s initial release has been designed to maximize compatibility and ease of migration from Redis clusters. All command line and API calls will remain the same, and a Valkey node can even join a Redis cluster to make migration as simple as possible.

Instaclustr Managed Platform

Run In Your Own Account (RIYOA) subscriptions are now available via the Azure Marketplace, streamlining billing by consolidating Instaclustr services with Azure cloud expenses for more efficient financial management. The integration combines Azure’s infrastructure with Instaclustr’s managed services in a transparent, monthly per-node pricing that matches our direct offerings.

Other Significant Changes:

Apache Cassandra

Support for custom Subject Alternate Names (SAN), allowing for a structured way to specify multiple host names to be validated by a single certificate and simplifying Certificate Management
Apache Cassandra 5.0.1, 4.1.7 and 4.0.14 are GA
Shotover Proxy 0.5.2 for Cassandra PrivateLink

Apache Kafka

Support for Custom Subject Alternate Name (SAN) for Instaclustr for Apache Kafka clusters running on GCP and Azure
A bundled Kafka® Connect OpenSearch® Sink Connector is available on the Instaclustr Managed Platform, connecting the gap between Kafka’s robust data streaming capabilities and OpenSearch’s flexible and scalable search engine functionality
New metrics for Tiered Storage clusters: Remote Copy Lag Bytes, Remote Delete Lag Bytes, Remote Log Size Bytes, Expires Per Sec

Cadence®

Benchmarking results released for AWS Graviton3 node types demonstrating the noticeable performance gains observed during our testing

ClickHouse

ClickHouse 24.8.2 and 24.8.7 released in General Availability
Support for AWS S3 integrations using S3 and S3Queue table engines

Apache Spark™

A new version of NetApp Supported Spark on Red Hat OpenShift was released patching important security CVEs
Ocean for Apache Spark™ was enhanced so that Kubernetes Events are persisted for our TechOps engineers. Kubernetes only streams Events for a running application, but we now also make these Events available for completed applications. With access to historical information, our Support engineers can now understand failure reasons.

OpenSearch

Cross Cluster Replication feature is enabled, enhancing data availability and disaster recovery for OpenSearch clusters
Support for OpenSearch in Azure RIYOA Marketplace
New Standard_E16s_v4 node sizes on the Azure provider for OpenSearch
OpenSearch 2.18.0 is GA

PostgreSQL

Support for Azure Private Link, providing our Azure customers with a more secure and simplified option for network cross-VNet connectivity
PostgreSQL versions 16.6, 15.10 and 14.15 released in General Availability

Upcoming Releases:

ClickHouse

Instaclustr for ClickHouse has made its general availability debut on AWS and will soon enter the Public Review stage on Azure as part of its rollout across major hyperscalers

Ocean for Apache Spark

ISO 27001 and ISO 27018 certification is coming soon
We will soon have a new UI screen for clear identification of optimized cloud instances used in a customer’s Spark cluster

OpenSearch

Upcoming enhancements to NetApp Instaclustr for OpenSearch will include Observability and the Mapper-size plugin, which will improve the ability to monitor the Open Search clusters and analyse document sizes for optimal performance
Support for the ML Commons Plugin, which allows customers to enable, configure and operate machine learning algorithms in their OpenSearch cluster
Support for the Neural Search plugin which introduces the ability to use neural networks to make search results highly relevant

PostgreSQL

PCI For PostgreSQL for the gold standard of security of your cluster
AWS PrivateLink for Managed PostgreSQL will be rolled out on the Instaclustr Managed Platform, providing a simpler alternative to using VPC peering and transit gateways for the security conscious

Did You Know?

The OpenSearch project has moved to an open governance model under the Linux Foundation, cementing OpenSearch as the true open source option in the search space. This gives customers a guarantee of ongoing, user-centered innovation without the cost and risks of being locked in with a single vendor. Instaclustr has a long history with the OpenSearch project and will continue participating as a General Member of the new OpenSearch Foundation.
Why would organizations want to adopt open source technology into their data infrastructure? Shouldn’t proprietary software, with its code closely guarded, be the more secure option? Discover the Top 5 Reasons open source solutions provide distinct and numerous security advantages that proprietary software simply cannot offer.
Instaclustr recently announced the availability of a public preview of Apache Kafka® Tiered Storage. If you’re eager to explore this feature, check out Part 1 of our blog series for an initial overview of how it operates within Instaclustr Managed Kafka, Part 2 for delving deeper into performance aspects, the concept of “dam the river,” and our concluding thoughts, and Part 3 for exploring Kafka time and space.

The post Instaclustr product update: January 2025 appeared first on Instaclustr.

Automating Data Audit Process with Instaclustr User Actions API

Instaclustr — Sun, 24 Nov 2024 05:48:07 +0000

In today’s fast-paced digital world, managing and auditing data efficiently is critical for maintaining security, compliance, and operational excellence. Automating these processes can save time, reduce errors, and provide timely insights into system activities.

Since the release of the Users Actions feature on the Instaclustr platform, we have added API support for programmatic access to the logs. This blog will demonstrate how to use this API to streamline your data audit workflows.

The User Actions feature provides detailed logging and tracking of user actions across organizations, accounts, and clusters. It is designed for Instaclustr users, especially administrators and auditors, to gain insights into the actions taken on their clusters.

User Actions are recorded regardless of who triggered them, whether it was account users or the Instaclustr Support team. Utilizing this feature enhances security, audit trails, and operational transparency. With customized filters such as action types, user, and account, the User Actions API allows comprehensive monitoring and analysis. This capability is particularly valuable for maintaining compliance and diagnosing issues.

What is the User Actions API?

The User Action API provides a way to track and audit user actions within the Instaclustr platform. This API allows you to retrieve detailed information about user activities, such as login attempts, cluster modifications, and other critical operations. By utilizing this API, you can automate the collection and analysis of audit logs, ensuring comprehensive monitoring of user activities.

Key Features of the User Actions API

Comprehensive Tracking: The API captures a comprehensive set of user actions, ensuring you have a complete audit trail.
Detailed Logs: More than 100+ recorded user actions are logged with detailed information, including timestamps, usernames, and specific actions performed.
Integration Capabilities: The API can be integrated with various data analysis and monitoring tools, enhancing your ability to analyze and respond to user activities.

Getting Started with the User Actions API

To begin automating your data audit process, you’ll need to access the User Actions API. The Instaclustr API documentation provides information on how to access the user actions endpoints and which query parameters to use for better filtering. Below are some examples of such parameters:

Query Parameter	Description
crudOperations: array of unique strings	Used in both account and organization level endpoints. The response returns the operation type in the CRUD (create, read, update, delete) form. Please refer to the Instaclustr API documentation for more information on the accepted values for this query parameter.
getActionCounts: boolean	Default: true Used in both account and organization level endpoints When set to true, the response includes counts of each action type returned.
organisationLogsOnly: boolean	Default: false Used only in the organization level endpoint When set to true, the response will include only the logs related to the organization
accountLogsOnly: boolean	Default: false Used only in the organization level endpoint When set to true, the response will include only the logs related to the organization’s accounts

Below are listed the key endpoints you will use:

1. List Account User Actions

Endpoint	GET /cluster-management/v2/operations/user-actions/v2
Description	This endpoint retrieves a list of user actions, providing details about each action performed within the Instaclustr account.
Documentation	User Actions API – Account level
API Key	Provisioning API Key

2. List Organization User Actions

Endpoint	GET /v1/organisations/{organisationId}/user-actions
Description	This endpoint lists user actions within the context of organization management (including all the actions from the child accounts of the organization), allowing you to track organizational activities.
Documentation	Organization User Actions API
API Key	Organization Management API Key

Use Cases

Here are some scenarios in which Instaclustr User Actions API would be used:

1. Monitoring User Activity

Security teams can investigate unauthorized changes performed by a suspected user by filtering the user actions for that user. This request will generate a detailed report of all activities performed up until that point. The report would include cluster or API key modifications, resource deletions, and many more. For more user action filters, refer to the API docs.

2. Auditing Actions within a Specific Organization

Compliance teams can audit actions within My Org (member of) to ensure adherence to regulatory requirements. By filtering events by organization name, they can verify that all activities comply with internal and external policies and regulations.

3. Identifying Anomalous Behavior

The security team can identify potential security threats by monitoring for unusual user behaviour. By filtering for actions like multiple API key creations/deletions (ACCOUNT_GENERATE_API_KEY, ACCOUNT_REVOKE_API_KEY), or sudden mass deletions (DELETE_CLUSTER, REQUEST_CLUSTER_DELETE), they can alert and investigate users or organizations exhibiting malicious activity, potentially preventing security breaches. To further mitigate such threats, we have implemented measures like two-factor authentication, ensuring an additional layer of security. This helps verify the legitimacy of actions and reduces the risk of unauthorized or accidental operations.

4. Managing Resource Utilization and Efficiency

The operations team can monitor resource utilization with My Org to optimize efficiency. By filtering user actions related to resource (i.e. clusters, data centres, firewall rules etc.) creation and deletion, they can track which resources are being created or removed, ensuring efficient use and minimizing unnecessary expenses.

Automating Data Audits via User Action API

Let’s consider a scenario where you want to automate the collection of audit logs for user actions in your organization. You can set up a Python script to call the User Action Events API at regular intervals (Note: In line with Instaclustr monitoring metrics, user action data is refreshed every 20 seconds), which would retrieve the latest user actions and store them in a database or a log management system.

Benefits of Automating Data Audits:

Improved Security: Automated audits help identify unauthorized or suspicious activities promptly.
Regulatory Compliance: Automated logging ensures you meet compliance requirements by maintaining detailed records of user actions.
Operational Efficiency: API helps reduce the manual effort required for auditing, allowing your team to focus on more strategic tasks. Manual effort implies that the users access the Instaclustr console and filter it via the provided UI (user interface). By utilizing the API, users can “set and forget” – they can build detailed reports, set alerts and monitor their resources without any additional manual steps.

How-to Steps

The Python script below is a simple example of fetching the latest user actions from the API.

1. Prepare your script

The script needs to send a GET request to the Instaclustr API to retrieve the latest user actions and store the response in a JSON file. For this, it is required that the requests and json libraries are imported.

import requests 
import json

Sending a GET request to the Instaclustr API requires an endpoint and an API Key. The endpoint for listing an organisation’s user actions is found on the Instaclustr API documentation and it requires Basic Authentication via Organization Management API key.

# Replace with your Instaclustr API key and endpoint 
API_KEY = 'your_api_key' 
ENDPOINT = 'https://api.instaclustr.com/v1/organisations/{organisationId}/user-actions'

2. Get user actions

The function defined below will send a GET request to the Instaclustr API and store the response in a variable. The function returns the response variable in JSON format. If the request is not successful, it will raise an HTTPError containing the appropriate Instaclustr API’s error message.

def get_user_actions(api_key, endpoint):
headers = {
'Authorization': f'Bearer {api_key}',
'Content-Type': 'application/json'
}
response = requests.get(endpoint, headers=headers)
if response.status_code == 200:
return response.json()
else:
response.raise_for_status()

3. Store the user actions

Another function is defined below. This function will dump the content of the data argument into a JSON file called audit_log.json.

Note: You can modify the save_audit_log() function to store the data in a database or any other preferred storage.

def save_audit_log(data): 
# Save the audit log data to a file or a database 
with open('audit_log.json', 'a') as f: 
json.dump(data, f, indent=4) 
f.write('\n')

4. Run the script

Using the two functions defined above, the script will get the user actions, store them in a variable and save the audit logs in a JSON file:

def main(): 
# Fetch user actions 
user_actions = get_user_actions(API_KEY, ENDPOINT) 

# Save the user actions to audit log 
save_audit_log(user_actions) 

if __name__ == '__main__': 
main()

save_audit_log function to store the data in a database or any other preferred storage.

Conclusion

Automating your data audit process with Instaclustr’s User Action API can significantly enhance your organization’s ability to monitor and manage user activities. By leveraging this API, you can ensure comprehensive tracking, detailed logging, and seamless integration with your existing tools and workflows. Start using the User Action API today to streamline your data audit processes and boost your operational efficiency.

For more information, check out our blog  about the User Actions feature and explore the possibilities of automating your audit workflows.

If you have any questions about this feature, please contact Instaclustr Support at any time. If you are not a current Instaclustr customer and you’re interested in learning more, register for a free trial and spin up your first cluster for free!

The post Automating Data Audit Process with Instaclustr User Actions API appeared first on Instaclustr.

NetApp® Instaclustr now available on the Azure Marketplace

Instaclustr — Mon, 28 Oct 2024 00:58:04 +0000

NetApp® Instaclustr RIYOA (Run In Your Own Account) subscriptions are now seamlessly offered as a managed service subscription through the Azure Marketplace.

With Instaclustr RIYOA through Azure, you can now easily streamline your billing process by consolidating it with your Azure account, simplifying financial planning and management. This integration ensures that all Instaclustr services are billed alongside your Azure cloud expenses.

This release introduces our RIYOA model to the Azure Marketplace, where your subscription includes the robust Azure infrastructure and the top-tier management services provided by Instaclustr, all in a transparent, monthly per-node pricing structure that aligns with our direct offerings.

Utilizing the Azure Marketplace for RIYOA provisioning allows you to use the Microsoft Defender security suite across your deployments and enables effortless management of our comprehensive open source services for Apache Cassandra®, Apache Kafka®, Apache Kafka® Connect, PostgreSQL®, OpenSearch®, Redis™, Cadence®, and Apache ZooKeeper™.

Instaclustr offerings provisioned via the Azure Marketplace have all the familiar features you expect, including:

Service Level Agreements (SLAs) with up to 99.999% uptime guarantees
An intuitive management console for hassle-free provisioning, configuration, and integrated monitoring
DevOps-optimized API access: manage provisioning with REST API or Terraform, monitoring through Prometheus API, or integrate with your preferred monitoring tools via REST API
24x7x365 expert support and continuous system monitoring
Regular updates and maintenance
Compliance with SOC 2, ISO27001, and ISO27018 for uncompromising security

Ready to harness the power of Instaclustr via the Azure Marketplace? Sign up here and elevate your cloud strategy today.

For guidance on account setup and provisioning, check out our support documentation.

Existing Instaclustr clients considering a billing transition to Azure, please contact your sales or customer support representative, or email us directly at sales@instaclustr.com.

The post NetApp® Instaclustr now available on the Azure Marketplace appeared first on Instaclustr.

Top 10 reasons why enterprises choose NetApp® Instaclustr

Instaclustr — Fri, 20 Sep 2024 11:00:23 +0000

Enterprises of all shapes, sizes, and industries are increasingly turning to managed open source platforms to simplify their operations, enhance their capabilities, and build reliable applications at scale.

According to an IDC analyst brief, with the data growth of AI and cloud-native applications, teams must simplify their underlying data infrastructure to support business objectives.

Instaclustr delivers a reliable and scalable platform for deploying open source technologies – but what makes it the go-to choice for so many enterprises?

Explore the top 10 reasons why businesses are choosing Instaclustr and how it can transform your data infrastructure, too.

1. Comprehensive multi-tech data platform

Enterprises often choose multiple open source technologies to leverage best-of-breed solutions, enhance innovation, and meet diverse business needs.

However, this approach introduces challenges such as increased complexity, integration difficulties, and the need for specialized skills. Managing these technologies can lead to interoperability and maintenance issues, along with unpredictable support.

To address these challenges, enterprises can engage in strategic planning and utilize managed services to streamline operations and maintain a robust infrastructure.

Instaclustr offers an all-in-one platform that supports leading open–source technologies including Apache Cassandra®, Apache Kafka®, PostgreSQL®, Valkey™, Apache Spark™, Cadence®, and ClickHouse®. This comprehensive suite makes it easy for enterprises to manage, deploy, and scale their data needs within a single platform, streamlining operations and reducing complexity.

2. Master AI challenges

Enterprises venturing into AI face significant hurdles in data management, starting with the massive volumes and diverse types of data they must handle. Instaclustr is a great choice for enterprises looking to scale AI initiatives, offering robust support for vector search on PostgreSQL, Cassandra, and OpenSearch.

The Instaclustr Managed Platform handles the complexities of delivering a reliable, high performance open–source data set at scale, ensuring accurate and fast AI responses needed to drive your business success.

3. True open source commitment

Open source is what we do every day.

Instaclustr is dedicated to maintaining true open source principles, ensuring that businesses have the freedom to innovate without the risk of vendor lock-in. This commitment also includes active contributions to the open –source community, reinforcing their position as a trusted partner in the industry.

Don’t just take our word for it: check out our project GitHub repository and our collaborations with the Linux Foundation.

4. Flexible deployment options

Flexible deployment options are not just about managing data—they’re about unlocking potential, driving growth, and positioning enterprises to thrive in an ever-evolving technological landscape.

Whether you’re operating in the cloud (AWS, Azure, Google Cloud), on-prem, or in a hybrid environment, Instaclustr offers flexible deployment options to suit your specific needs. This adaptability ensures that you can leverage the best of both worlds without compromising on performance or reliability.

5. Exceptional performance and scalability

Instaclustr’s platform is designed to handle large-scale workloads with ease, offering automation to scale infrastructure based on business demands, such as seasonal traffic spikes. This ensures that your operations run smoothly, no matter the circumstances.

6. Robust security and compliance

Safeguarding your data is crucial. And while companies debate between open source and proprietary code, with Instaclustr, you get a powerhouse of benefits, from expert management to regular updates and compliance support, all while reducing your operational load. This means you can focus on what you do best, knowing your security and compliance are in good hands.

Instaclustr excels by providing comprehensive security measures and adheres to various industry standards and regulations, including GDPR, SOC 2, and HIPAA. This not only helps you meet regulatory requirements but also ensures your data upholds the highest levels of data protection.

7. Enhanced operational efficiency

By automating routine tasks such as patching, maintenance, and backups, Instaclustr frees up your IT operations team to focus on more value-added activities. This efficiency boost can lead to significant time and cost savings for your enterprise.

As Ken Izumi, VP of Engineering from Paidy, says, “Technologies like Apache Cassandra and Apache Kafka can be complex to manage efficiently and effectively. Instaclustr abstracts that complexity and helps us to stay focused on building our applications and services.”

8. Superior customer support

Instaclustr provides 24×7 monitoring and support to ensure the ongoing health and performance of your data systems. This proactive approach helps in identifying and resolving issues before they impact your business, ensuring continuous uptime and reliability.

Having an open source emergency? Our expert team collaborates closely with your engineers to resolve operational issues to minimize impact to the business.

9. Cost efficiency

Leveraging open source technologies through Instaclustr helps enterprises achieve substantial cost savings. The managed service model also allows existing staff to support growing business needs without the requirement for significant capital investments, making it an economically sound choice.

10. Seamless integration with NetApp storage

For organizations using NetApp storage solutions, Instaclustr fully integrates to deliver up to 3x performance and increased resiliency for open source databases and AI workloads. This synergy enhances the overall efficiency and reliability of your data infrastructure.

Discover why enterprises are using PostgreSQL on Azure NetApp Files and how OpenSearch supports the use of NetApp StorageGRID®, making it a win-win for enterprises aiming to optimize their operations.

Conclusion: The incredible power open source

Data-driven decisions are the heartbeat of modern enterprises, keeping them competitive and innovative, and success hinges on a robust, reliable, and scalable data infrastructure.

Instaclustr takes the headache out of managing open source technologies, offering the expertise and support to ensure your operations run like a well-oiled machine. With our team enhancing your data capabilities, you can focus on what truly matters—growing your business.

Ready to elevate your data strategy? Dive into the Instaclustr Managed Platform and discover how it can empower your enterprise today, equipping you with the tools and confidence to conquer any data challenge.

The post Top 10 reasons why enterprises choose NetApp® Instaclustr appeared first on Instaclustr.

Apache Kafka® Use Cases and Real-Life Examples

Instaclustr — Wed, 21 Aug 2024 06:26:47 +0000

What Is Apache Kafka®?

Apache Kafka is an open source event streaming platform used for building real-time data pipelines and streaming applications. Developed by LinkedIn and later donated to the Apache Software Foundation, Kafka is designed to handle high-throughput, fault-tolerant data streaming. It’s built on a distributed architecture, allowing it to scale horizontally across multiple servers.

Kafka works on the principle of publish-subscribe, where data producers send messages to Kafka topics, and consumers retrieve them. Each topic is partitioned and replicated across multiple nodes for reliability. This design allows Kafka to process massive volumes of data with low latency, making it ideal for real-time data processing and analytics.

Kafka’s ecosystem includes Kafka Streams for stream processing and Kafka® Connect for integrating with various data sources and sinks. Its wide adoption is due to its high performance, scalability, and durability, making it a popular choice for many large-scale, data-driven companies.

The Need for an Event Streaming Platform Like Kafka

The ability to process and analyze data in real-time is crucial. Traditional data processing systems often struggle with the volume, velocity, and variety of data generated by modern applications. This is where an event streaming platform like Kafka comes into play.

Event streaming platforms enable continuous, real-time data processing. Kafka, specifically, allows organizations to ingest, store, process, and analyze data as it’s generated. This is crucial for use cases like real-time analytics, monitoring, and messaging, where immediate data processing is necessary.

Another critical aspect of Kafka is its ability to decouple data producers from consumers. This means systems can produce data at their own pace, and consumers can process it when ready. Kafka’s durability and fault tolerance ensure that data is not lost, even if a consumer is temporarily down. This reliability and flexibility make Kafka an essential tool for modern data architectures.

How Kafka Supports Common Use Case

Apache Kafka is great in many areas, and these are some of the most popular use cases:

1) Real-Time Data Processing

Apache Kafka supports real-time data processing by providing high throughput, low-latency data handling. In this use case, Kafka functions as a central hub for data streams. Its ability to handle large volumes of data in real-time is attributed to its distributed architecture, where data is partitioned and processed in parallel across multiple nodes. Kafka ensures data integrity through replication, preventing data loss during node failures.

Kafka’s publish-subscribe model allows real-time data ingestion from various sources. Data producers publish to Kafka topics, from where data consumers can subscribe and process data as it arrives. Kafka’s performance in real-time processing is enhanced by its efficient storage mechanism and the ability to maintain high throughput, even with high data volumes. This capability is critical for applications that require immediate data analysis, such as fraud detection or live monitoring systems.

Moreover, Kafka Streams API facilitates real-time data processing by allowing developers to build applications that can process and analyze data directly within Kafka. This feature supports complex operations like windowing, aggregations, and joins on streaming data, enabling sophisticated real-time analytics directly on the stream.

2) Messaging

Kafka serves as a robust messaging system, supporting high-throughput, distributed messaging. It handles messaging use cases by allowing systems and applications to exchange data in real-time and at scale. Kafka’s durability and fault tolerance are key features here, ensuring messages are not lost in case of system failures, and can be replayed if needed.

In Kafka, messages are organized into topics, making it easier to categorize and manage different types of messages. Producers send messages to topics, and consumers subscribe to these topics to receive messages. Kafka’s ability to handle a large number of simultaneous producers and consumers makes it an ideal choice for complex messaging ecosystems.

Kafka also supports different messaging patterns, such as point-to-point, publish-subscribe, and request-reply. Its scalability allows organizations to start with a small setup and scale horizontally as their messaging needs grow. The system’s built-in partitioning, replication, and fault-tolerant design ensure messages are processed efficiently and reliably, making Kafka a popular choice for messaging in distributed systems.

3) Metrics Collection and Monitoring

Apache Kafka is highly effective for collecting and processing operational metrics. It captures metrics from various parts of an application or system and makes them available for monitoring, analysis, and alerting. Kafka’s distributed nature allows it to handle large volumes of metric data generated by multiple sources without performance degradation.

In this use case, Kafka acts as a central repository for operational metrics. Producers send metrics data to Kafka topics, where they are stored and made available to consumers. This setup enables real-time monitoring of operational metrics, allowing for timely insights into system performance, usage patterns, and potential issues.

Kafka’s scalability is particularly beneficial for operational metrics as it can accommodate the growing amount of data over time. The system ensures data retention for a configurable period, allowing for historical analysis of metrics. Moreover, Kafka’s compatibility with various data processing frameworks and monitoring tools enables comprehensive analysis and visualization of operational metrics.

4) Log Aggregation

Kafka is highly effective for log aggregation, a process critical for monitoring, debugging, and security analysis in distributed systems. It collects log data from various sources, such as servers, applications, and network devices, providing a centralized platform for log data management.

Kafka’s ability to handle high volumes of data makes it suitable for log aggregation, where large amounts of log data are generated continuously. Its distributed nature allows logs to be collected and processed in parallel, enhancing the efficiency of log data management. Kafka topics serve as log data repositories, where logs can be categorized based on their source or type, simplifying data organization and retrieval.

The durability and fault tolerance of Kafka ensure that log data is not lost, maintaining data integrity. This aspect is crucial for log analysis, especially in scenarios involving debugging or security incident investigations. Kafka’s scalable architecture supports the increasing volume of log data as systems expand, maintaining performance without compromising data processing speed.

Real World Examples and Uses of Apache Kafka

Kafka is used across multiple different industries and in real-world use cases, but these are some of the more interesting instances we’ve come across.

1) Modernized Security Information and Event Management (SIEM)

SIEM is a foundational tool in a security operations center (SOC), which collects event data from across the IT environment and generates alerts for security teams. Traditional SIEM systems often struggled with scalability and performance issues. However, Kafka’s distributed architecture allows it to handle the large-scale, high-speed data ingestion required by modern SIEM systems.

Kafka’s rapid processing capability brings a new level of responsiveness to SIEM systems. It enables organizations to detect and respond to potential security threats as they happen, rather than after the fact. This proactive approach can significantly reduce the potential damage caused by security breaches.

Real life example: Goldman Sachs, a leading global investment banking firm, leveraged Apache Kafka for its SIEM system. Kafka enabled them to efficiently process large volumes of log data, significantly enhancing their ability to detect and respond to potential security threats in real-time.

2) Website Activity Tracking

Many organizations use Kafka to gather and process user activity data on large scale websites and applications. This data can include everything from page views and clicks, to searches and transactions.

Kafka enables businesses to collect data from millions of users simultaneously, process it quickly, and use it to gain insights into user behavior. These insights can help businesses optimize their websites, provide personalized user experiences, and make data-driven decisions.

Kafka’s durability is another advantage for website activity tracking. It stores data reliably for a configurable amount of time, ensuring no data is lost even if a system failure occurs. This reliability is important for businesses that need accurate data to drive their decision-making processes.

Real life example: Netflix, a major player in the streaming service industry, uses Apache Kafka for real-time monitoring and analysis of user activity on its platform. Kafka helps Netflix in handling millions of user activity events per day, allowing them to personalize recommendations and optimize user experience.

3) Stateful Stream Processing

Kafka’s stream processing capabilities make it possible to process and analyze data as it comes in, instead of batch processing it at intervals. Stateful stream processing refers to the ability to maintain state information across multiple data records. This is crucial for use cases where the value of a data record depends on previous records. Kafka’s Streams API supports this functionality.

Real life example: Pinterest utilizes Kafka for stateful stream processing, using the Kafka Streams API to deliver inflight spend data to thousands of ad servers. A primary feature of the system is a predictive system based on Kafka Streams which reduces overdelivery in Pinterest ad systems.

Ready to Get Started with Apache Kafka?

With hundreds of millions of node hours of operational experience under our belts, we’ve seen it all with Apache Kafka–and know how to solve the most challenging problems. Reach out to our team of experts and let’s have a chat about your use case!

The post Apache Kafka® Use Cases and Real-Life Examples appeared first on Instaclustr.

Important Update: CPU Usage Metrics Now Reflect True Utilization

Instaclustr — Wed, 31 Jul 2024 03:32:12 +0000

Instaclustr by NetApp is committed to providing you with the most accurate and actionable data on your clusters’ performance. We have recently discovered and fixed a bug that resulted in lower than actual CPU utilization being reported by Instaclustr managed application nodes.

In August of last year, Instaclustr deployed changes that introduced significantly more in-depth CPU metrics for our support staff. This allows them to investigate bottlenecks in single threaded applications. Unfortunately, this introduced a regression that resulted in the underreporting of system CPU time as we no longer included System in our total calculation. The discrepancy between actual and reported total CPU usage is particularly noticeable on instances with high system CPU usage.

Upon the discovery of the regression, we immediately identified the root cause and began deploying a fix to managed instances on July 15, 2024. This correction means CPU utilization metrics now include system CPU time in the total CPU calculations, providing accurate representation of your CPU usage.

What This Means for You

There are no actions required by customers at this time. You may have noticed an apparent increase in CPU usage reported for your clusters around the time the fix was released. We would like to assure our customers that this is not a real increase in usage but rather a correction to reflect the true values. The metrics shown will now present a comprehensive overview of CPU utilization, ensuring you have the correct information for capacity planning and performance analysis.

As a result of this fix, historical total CPU usage data after August 2023 and before mid-July 2024 may appear lower than they should as they did not include System CPU usage. All historical data relating to the detailed CPU utilization types is correct (including user, system, steal, irq, and nice). Instaclustr is confident that this error has not caused any impact to application performance or alerting. However, we understand that total CPU can be a useful indicator for evaluating cluster performance and available overhead.

Please don’t hesitate to reach out to our Support team if you are concerned about the impact that this change has on your cluster, or any future scaling plans, and we can assist you in working out the best path forward. We apologize for any inconvenience caused and are taking steps to prevent similar issues from occurring in the future. Thank you for your understanding and continued support.

Please contact our Support team if you have any further queries or concerns related to this.

The post Important Update: CPU Usage Metrics Now Reflect True Utilization appeared first on Instaclustr.