Saturday, September 10, 2016

Overview of SAP HANA with BI

SAP HANA is a high-performance analytic appliance that provides SAP software components optimized on hardware from SAP’s leading hardware partners. HANA’s in-memory computing engine enables organizations to effectively analyze business operations based on very large volumes of detailed real-time data, without affecting backend enterprise applications or databases.

SAP HANA can be added to existing landscape ,which may include Datasources and BI software.





SAP HANA Database is an appliance that hosts the In-memory Computing Engine,which leverages technologies such as columnar storage,massively parallel processing and data compression

Let see what is In-Memory Columnar Storage:

Relational databases typically use row-based data storage. However Column-based storage is more suitable for many business applications. SAP HANA supports both row-based and column-based storage, and is particularly optimized for column-based storage

Row Storage - It stores table records in a sequence of rows.
Column Storage - It stores table records in a sequence of columns i.e. the entries of a column is stored in contiguous memory locations

let’s review what Columnar Storage actually looks like. In the image below, you will see that Row Based Storage looks a lot like Excel – Columns up top, and Values down below. To read a row store, we would have to go through each record one by one.
Columnar Storage will instead store each column separately, acting as its own, separate index, which also means that you can easily exclude entire columns not included in the query (ie. Queries for Country and Sales will ignore Order and Product), thereby increasing the speed and efficiency of the query

1. HANA stores data in-memory for fast access

In-memory means that the data in the hard disk is loaded into RAM. In doing so, data is retrieved much more quickly because RAM is attached directly to the system’s motherboard.

RAM is also cheaper than ever, meaning a company can load terabytes of data into memory and report directly from it.

The fact that data is stored in a columnar fashion in-memory is what enables SAP HANA.

Only affected columns have to be read during the selection process of a query. Any of the columns can serve as an index.

2. This data is then compressed

Because each column is stored separately, this means we can achieve greater rates of compression.

For example, let’s look at how HANA would compress “Italy”:

As indicated above, the records indicating “Italy” have been compressed four to one. Imagine if we had a million records – the compression ratio would be that much higher, resulting in an increase in performance.

With this high rate of compression in HANA, we can achieve a lower overall data footprint, which means we require less RAM and less resources, lowering the overall cost.

3. Faster retrieval with HANA’s Dictionary

There is no need to decompress compressed data when reading from the column store.

The reason for this is “The Dictionary”, which assigns a value for each record (ie. Italy = 1, Spain =2), and uses this logic every time the data needs to be accessed. This is how HANA “memorizes the book”.

If we wanted to search “Italy”, HANA will refer to the dictionary and retrieve only values from the Country column related to Italy, ignoring everything else not in Section 1.

4. Aggregating transactional data

Aggregations are the bread and butter of transactional systems, allowing companies to aggregate their transaction data by sum, average, etc.
For example, if I were doing a SUM on Italy Sales, HANA will do the following:


In traditional transactional systems, these aggregates have to be materialized, meaning they are created and take up space in the system. HANA does the opposite - performing aggregations on the fly without taking space, which translates to less resources and lower costs.

The speed comes from the fact that the numerical values in “Sales” are stored in-memory, in its own, separate column, meaning HANA can scan it very quickly.

This real time aggregation of data in HANA is enabled by hardware, in-memory columnar storage, and various engines/operators built into the HANA system.

As per the above example, the combination of dictionary compression on the Italy value and on-the-fly aggregation of transaction data, demonstrates the efficiency in which HANA processes data.

5. Parallel Execution

Now imagine all of the above examples, but happening multiple times at once. HANA partitions, or splits, the column being processed, and distributes the workload (aggregation, search, etc.) across multiple CPU cores – if a HANA deployment had 80 cores, then these would be happening 80 at a time, taking mere milliseconds each.

In HANA, you have the option to go with a traditional row-store. While that would still be fairly quick due to the hardware, in-memory Columnar storage is the great enabler - the foundation that enables HANA to, in one second, process data equivalent to every repair and service visit for every car on earth in the last 12 months, or every address that everyone currently alive has ever lived in.

Executing queries with speed and efficiency, and calculating and aggregating on the fly, HANA is more than just a database. It is a real time data platform - a perfect symphony of hardware and software innovation that allows entire companies to run on a single platform – SAP HANA.

Advantages and disadvantages of row-based tables:

Row based tables have advantages in the following circumstances:

The application needs to only process a single record at one time (many selects and/or updates of single records).
The application typically needs to access a complete record (or row).
Neither aggregations nor fast searching are required.
The table has a small number of rows (e. g. configuration tables, system tables).

Row based tables have dis-advantages in the following circumstances:

In case of analytic applications where aggregation are used and fast search and processing is required. In row based tables all data in a row has to be read even though the requirement may be to access data from a few columns.

When to use Row-based or Column Based Tables?

In case of analytic applications where aggregations are used and fast search and processing is required row-based storage are not good. In row based tables all data in a row has to be read even though the requirement may be to access data from a few columns. Hence these queries on huge amounts of data take a lot of time.

In columnar tables, this information is stored physically next to each other, significantly increasing the speed of certain data queries.

The following example shows the different usage of column and row storage, and positions them relative to row and column queries. Column storage is most useful for OLAP queries (queries using any SQL aggregate functions) because these queries get just a few attributes from every data entry. But for traditional OLTP queries (queries not using any SQL aggregate functions), it is more advantageous to store all attributes side-by-side in row tables. HANA combines the benefits of both row- and column-storage tables.



To enable fast on-the-fly aggregations, ad-hoc reporting, and to benefit from compression mechanisms it is recommended that transaction data is stored in a column-based table.

The SAP HANA data-base allows joining row-based tables with column-based tables. However, it is more efficient to join tables that are located in the same row or column store. For example, master data that is frequently joined with transaction data should also be stored in column-based tables.

No comments:

Post a Comment