SAP HANA is a high-performance analytic appliance that provides SAP software components optimized on hardware from SAP’s leading hardware partners. HANA’s in-memory computing engine enables organizations to effectively analyze business operations based on very large volumes of detailed real-time data, without affecting backend enterprise applications or databases.
SAP HANA can be added to existing landscape ,which may include Datasources and BI software.
SAP HANA can be added to existing landscape ,which may include Datasources and BI software.
SAP HANA Database is an appliance that hosts the In-memory Computing Engine,which leverages technologies such as columnar storage,massively parallel processing and data compression
Let see what is In-Memory Columnar Storage:
Relational databases typically use row-based data storage.
However Column-based storage is more suitable for many business applications.
SAP HANA supports both row-based and column-based storage, and is particularly
optimized for column-based storage
Row Storage - It stores table records in a sequence of rows.
Column Storage - It stores table records in a sequence of
columns i.e. the entries of a column is stored in contiguous memory locations
let’s review what Columnar Storage actually looks like. In
the image below, you will see that Row Based Storage looks a lot like Excel –
Columns up top, and Values down below. To read a row store, we would have to go
through each record one by one.
Columnar Storage will instead store each column separately,
acting as its own, separate index, which also means that you can easily exclude
entire columns not included in the query (ie. Queries for Country and Sales
will ignore Order and Product), thereby increasing the speed and efficiency of
the query
1. HANA stores data in-memory for fast access
In-memory means that the data in the hard disk is loaded
into RAM. In doing so, data is retrieved much more quickly because RAM is
attached directly to the system’s motherboard.
RAM is also cheaper than ever, meaning a company can load
terabytes of data into memory and report directly from it.
The fact that data is stored in a columnar
fashion in-memory is what enables SAP HANA.
Only affected columns have to be read during the selection
process of a query. Any of the columns can serve as an index.
2. This data is then compressed
Because each column is stored separately, this means we can
achieve greater rates of compression.
For example, let’s look at how HANA would compress “Italy”:
As indicated above, the records indicating “Italy” have been
compressed four to one. Imagine if we had a million records – the compression
ratio would be that much higher, resulting in an increase in performance.
With this high rate of compression in HANA, we can achieve a
lower overall data footprint, which means we require less RAM and less
resources, lowering the overall cost.
3. Faster retrieval with HANA’s Dictionary
There is no need to decompress compressed data when reading
from the column store.
The reason for this is “The Dictionary”, which assigns a
value for each record (ie. Italy = 1, Spain =2), and uses this logic every time
the data needs to be accessed. This is how HANA “memorizes the book”.
If we wanted to search “Italy”, HANA will refer to the
dictionary and retrieve only values from the Country column related to Italy,
ignoring everything else not in Section 1.
4. Aggregating transactional data
Aggregations are the bread and butter of transactional
systems, allowing companies to aggregate their transaction data by sum,
average, etc.
For example, if I were doing a SUM on Italy Sales, HANA will
do the following:
In traditional transactional systems, these aggregates have
to be materialized, meaning they are created and take up space in the system.
HANA does the opposite - performing aggregations on the fly without taking
space, which translates to less resources and lower costs.
The speed comes from the fact that the numerical values in
“Sales” are stored in-memory, in its own, separate column, meaning HANA can
scan it very quickly.
This real time aggregation of data in HANA is enabled by
hardware, in-memory columnar storage, and various engines/operators built into
the HANA system.
As per the above example, the combination of dictionary
compression on the Italy value and on-the-fly aggregation of transaction data,
demonstrates the efficiency in which HANA processes data.
5. Parallel Execution
Now imagine all of the above examples, but happening
multiple times at once. HANA partitions, or splits, the column being processed,
and distributes the workload (aggregation, search, etc.) across multiple CPU
cores – if a HANA deployment had 80 cores, then these would be happening 80 at
a time, taking mere milliseconds each.
In HANA, you have the option to go with a traditional
row-store. While that would still be fairly quick due to the hardware,
in-memory Columnar storage is the great enabler - the foundation that enables
HANA to, in one second, process data equivalent to every repair and service
visit for every car on earth in the last 12 months, or every address that
everyone currently alive has ever lived in.
Executing queries with speed and efficiency, and calculating
and aggregating on the fly, HANA is more than just a database. It is a real
time data platform - a perfect symphony of hardware and software innovation
that allows entire companies to run on a single platform – SAP HANA.
Advantages and disadvantages of row-based tables:
Row based tables have advantages in the following
circumstances:
The application needs to only process a single record at one
time (many selects and/or updates of single records).
The application typically needs to access a complete record
(or row).
Neither aggregations nor fast searching are required.
The table has a small number of rows (e. g. configuration
tables, system tables).
Row based tables have dis-advantages in the following
circumstances:
In case of analytic applications where aggregation are used
and fast search and processing is required. In row based tables all data in a
row has to be read even though the requirement may be to access data from a few
columns.
When to use Row-based or Column Based Tables?
In
case of analytic applications where aggregations are used and fast search and
processing is required row-based storage are not good. In row based tables all
data in a row has to be read even though the requirement may be to access data
from a few columns. Hence these queries on huge amounts of data take a lot of
time.
In columnar tables, this information is stored physically next to each other, significantly increasing the speed of certain data queries.
The following example shows the different usage of column and row storage, and positions them relative to row and column queries. Column storage is most useful for OLAP queries (queries using any SQL aggregate functions) because these queries get just a few attributes from every data entry. But for traditional OLTP queries (queries not using any SQL aggregate functions), it is more advantageous to store all attributes side-by-side in row tables. HANA combines the benefits of both row- and column-storage tables.
In columnar tables, this information is stored physically next to each other, significantly increasing the speed of certain data queries.
The following example shows the different usage of column and row storage, and positions them relative to row and column queries. Column storage is most useful for OLAP queries (queries using any SQL aggregate functions) because these queries get just a few attributes from every data entry. But for traditional OLTP queries (queries not using any SQL aggregate functions), it is more advantageous to store all attributes side-by-side in row tables. HANA combines the benefits of both row- and column-storage tables.
To enable fast on-the-fly aggregations, ad-hoc reporting,
and to benefit from compression mechanisms it is recommended that transaction
data is stored in a column-based table.
The SAP HANA data-base allows joining row-based tables with
column-based tables. However, it is more efficient to join tables that are
located in the same row or column store. For example, master data that is
frequently joined with transaction data should also be stored in column-based
tables.
No comments:
Post a Comment