CelerData, formerly known as StarRocks Inc., today announced the latest version of its unified analytics platform, CelerData V3. The release introduces multiple new capabilities for handling batch and real-time data, including the option to perform analytics without first ingesting information into a data lake or data lakehouse.
Enterprises have long relied on data ingestion for analytics. They import large, assorted data files from multiple sources into a single, cloud-based storage medium — like a data lake — and then run analysis on it. The process usually involves roping in integration tools like Matillion and Airbyte.
CelerData V3 for direct analytics
With the 3.0 update, set to reach general availability in April 2023, CelerData’s analytics platform will let enterprise users integrate with open table formats such as Hudi, Iceberg and Delta Lake and run the CelerData query engine directly on data in those formats, without first ingesting it.
This way, the company said, users can query streaming data and historical data together in real time, without having to wait for streaming data to be combined into batches for analysis. The approach also simplifies the data architecture and improves the timeliness of analytics.
“The data lakehouse has added critical capabilities to the data lake architecture by introducing ACID control, table formats and data governance,” James Li, CEO at CelerData, said. “However, analytics capabilities on the lakehouse are still limited and cost prohibitive. Most query engines struggle to support interactive ad-hoc queries, are not able to support real-time analytics, and fall apart when facing a large number of concurrent users.”
CelerData, on the other hand, has been increasing its focus on supporting unified analytics for data lakes and lakehouses. The platform is built on top of the open-source StarRocks project, which started in 2020 as a fork of the open-source Apache Doris analytics database. Since then, however, it has diverged from Doris and developed into an MPP (massively parallel processing) OLAP database that supports fast, real-time queries for analytics workloads.
The company claims the platform can currently support thousands of concurrent users at 10,000 QPS (queries per second), delivering at least three times the performance of other common query engines.
What else is in the new update?
Along with integration with open table formats, CelerData’s latest version gives users the option to bring data into its own storage format on the lake, as well as create multitable materialized views. This, it says, will also help speed up query performance.
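For readers unfamiliar with the term, a multitable materialized view stores the precomputed result of a query that joins several tables, so later reads can skip the join. The Python sketch below illustrates only that general idea. It uses SQLite as a stand-in (SQLite has no materialized views, so a plain table is created from the join), it is not CelerData or StarRocks syntax, and the table names and figures are hypothetical.

```python
import sqlite3


def build_demo():
    """Create two hypothetical base tables plus a precomputed join result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
        """
    )
    # Stand-in for a multitable materialized view: the join and the
    # aggregation run once, and the result is stored as its own table.
    conn.execute(
        """
        CREATE TABLE revenue_by_region_mv AS
        SELECT c.region AS region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.customer_id
        GROUP BY c.region
        """
    )
    return conn


def revenue_from_base_tables(conn):
    """Answer the question the slow way: join and aggregate on every call."""
    rows = conn.execute(
        """
        SELECT c.region, SUM(o.amount)
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.customer_id
        GROUP BY c.region
        """
    )
    return dict(rows.fetchall())


def revenue_from_view(conn):
    """Answer the same question by reading the precomputed result."""
    rows = conn.execute("SELECT region, revenue FROM revenue_by_region_mv")
    return dict(rows.fetchall())


if __name__ == "__main__":
    demo = build_demo()
    print(revenue_from_base_tables(demo))
    print(revenue_from_view(demo))
```

Both functions return the same totals. The second reads a stored result instead of repeating the join, which is the source of the speedup. A real materialized view would also be kept up to date as the base tables change, which this sketch does not attempt.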
Further, the cloud-native architecture of the update, which leverages cloud object storage, will improve reliability and reduce storage costs for enterprises. It will also enable better workload and resource isolation.
The developments will help CelerData take on the competition in the market for query engines for data analytics. This includes the Imply-backed Apache Druid project, which is also an open-source, real-time analytics database, as well as the Apache Pinot analytics database project, backed by commercial vendor StarTree.
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact.