The conventional approach to storage and compute is to allocate resources in fixed amounts, set in advance. This inflexibility is especially pronounced when committing to on-premises data warehouse hardware: workloads are either bottlenecked by resource limits or leave much of the provisioned capacity sitting idle. Either way the approach is inefficient, paying for capacity you do not use or starving workloads of the capacity they need.
Modern data warehouses scale both storage and compute dynamically, offering “resources on tap” whenever they are needed. This is paired with a pricing model that charges only for what your organisation uses, so you do not pay to maintain resource capacity that sits idle.
Storage and Compute Resources
Modern data warehouses typically separate storage and compute, managing each as an independent resource.
This separation matters because it adds even more pricing flexibility. Cloud costs have not fallen evenly: the price of storage has dropped far more rapidly than the price of compute. Organisations can therefore persistently store all of their historical data while paying compute costs only for the data they actually query and process at a particular point in time.
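The effect of decoupled billing can be sketched with some simple arithmetic. The prices below are hypothetical placeholders, not any vendor's actual rates; the point is only that the storage bill tracks how much history you keep, while the compute bill tracks how much of it you actually query.

```python
# Illustrative sketch of decoupled storage and compute billing.
# Both prices are hypothetical, not real vendor rates.
STORAGE_PER_TB_MONTH = 20.0   # flat fee to keep a terabyte stored for a month
COMPUTE_PER_TB_SCANNED = 5.0  # charged only when queries actually run

def monthly_cost(stored_tb, scanned_tb):
    """Total spend: all history stored cheaply, compute billed per query."""
    return stored_tb * STORAGE_PER_TB_MONTH + scanned_tb * COMPUTE_PER_TB_SCANNED

# Storing 100 TB of history but querying only 4 TB of it this month:
print(monthly_cost(100, 4))  # → 2020.0
```

Doubling the history you retain raises only the (cheap) storage term; quiet months with few queries barely register on the compute term.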
Variety of Consumption
Another important element of a modern data warehouse is that it offers much more than conventional self-service business intelligence reports and dashboards. An older, simpler data warehouse is set up to ingest data, transform it into a structured, relational schema, and then serve queries for specific BI outputs.
A modern data warehouse offers more options for utilising the data it contains, making it a useful tool for more roles within the organisation. For example, data exploration becomes far more viable for data scientists and advanced business users.
Because querying is no longer limited to SQL, users who are comfortable with Python, for instance, can use it for access and exploration. A single robust platform with a discovery layer and a single security model means there is no need to reshape the data into a particular format, or silo it, before it can serve different purposes.
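The kind of ad-hoc exploration this enables might look like the following. The rows are hypothetical (the shape a warehouse client library could hand back, such as a list of dicts); the names and figures are purely illustrative, and no specific vendor API is shown.

```python
# Hypothetical rows, as a warehouse client library might return them.
rows = [
    {"region": "EU", "product": "A", "revenue": 120.0},
    {"region": "EU", "product": "B", "revenue": 80.0},
    {"region": "US", "product": "A", "revenue": 200.0},
]

# Ad-hoc exploration in plain Python: no pre-built report, no SQL required.
from collections import defaultdict

revenue_by_region = defaultdict(float)
for row in rows:
    revenue_by_region[row["region"]] += row["revenue"]

print(dict(revenue_by_region))  # → {'EU': 200.0, 'US': 200.0}
```

The same result set can be sliced a different way moments later, which is the essence of exploration as opposed to a fixed dashboard.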
This can overcome the traditional challenges of multiple copies of siloed data to service different analytical needs across a business.
Velocity and Volume of Data
A modern data warehouse handles complex pipelines of high-volume unstructured data while simultaneously maintaining low levels of latency.
For structured data, Massively Parallel Processing (MPP) enables multiple users from multiple departments to query the same data simultaneously. This is, again, without creating specialised data silos for each department, and without the delays and processing queues associated with conventional, non-parallel data warehouses.
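The idea can be sketched in miniature with threads standing in for compute nodes. Real MPP engines shard data across many machines, so this is only an analogy, but it shows two "departments" running different queries over one shared dataset at the same time, with no per-department copy.

```python
# Toy analogy for MPP-style access: many consumers, one shared dataset,
# queries running concurrently. Department names and queries are made up.
from concurrent.futures import ThreadPoolExecutor

sales = list(range(1, 1001))  # the single shared dataset, no silos

def department_query(name, predicate):
    # Each "department" applies its own filter over the same data.
    return name, sum(x for x in sales if predicate(x))

queries = [
    ("finance", lambda x: x % 2 == 0),   # even-numbered sales
    ("marketing", lambda x: x > 900),    # top-of-range sales
]

with ThreadPoolExecutor() as pool:
    results = dict(pool.map(lambda q: department_query(*q), queries))

print(results)  # → {'finance': 250500, 'marketing': 95050}
```

Neither query waits in a queue behind the other, and neither needed its own extract of the data.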
Modern data warehouses can also handle real-time and streaming data. High-volume streaming pipelines containing unstructured data can be processed for useful analytics and passed directly through to a downstream system in-flight. This is particularly useful for modern IoT applications, which typically generate unprecedented levels of real-time data.
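A minimal sketch of in-flight processing, assuming a hypothetical IoT temperature feed: each event is inspected as it streams past and interesting ones are forwarded downstream immediately, rather than first being landed in a structured table.

```python
# Minimal sketch of in-flight stream processing. The event shape and
# the readings are hypothetical (a made-up IoT temperature feed).
def sensor_stream():
    # Stand-in for a real streaming source such as a message queue.
    for reading in [21.0, 22.5, 23.0, 40.0, 22.0]:
        yield {"sensor": "s1", "temp_c": reading}

def alerts(stream, threshold=30.0):
    # Process each event as it arrives; forward only out-of-range readings.
    for event in stream:
        if event["temp_c"] > threshold:
            yield event

for alert in alerts(sensor_stream()):
    print(alert)  # → {'sensor': 's1', 'temp_c': 40.0}
```

Because both stages are generators, nothing is buffered: the alert is emitted the moment the offending reading arrives, which is the latency property the paragraph describes.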
How Loome can Help
Loome Integrate makes building and maintaining a modern data warehouse frictionless. With over a hundred pre-built connectors, you can set up your data pipeline and have full visibility over it without writing a single line of SQL. When consolidating from multiple sources, task orchestration and real-time alerts give you full transparency over your entire process, getting you from source to target as easily as possible.
Google BigQuery