
What is a Data Lake and Why You Need One 

March 19, 2020

Author
Anthony Karge
Anthony Karge is a Senior Solutions and Product Marketing Specialist in the SAP Digital Strategy Team.

A data lake is a repository for all types of data. From this repository, data can be examined, accessed, and used to make data-backed business decisions. However, those are just the basics. Modern data lakes are now so much more than just a repository.

It's a good thing modern data lakes are so versatile, because organizations face so many data challenges today. Data volumes are growing enormously, with most organizations having to think on a petabyte scale. Meanwhile, data is getting more complex thanks to the growing number of sources and formats, such as IoT and social media. The data lakes of today are built to handle both the volume and the complexity.

Deploying a modern data lake helps your organization lower costs, improve performance, and gain easier access to data that generates insights.

Where the Data Lake Fits in the Data Tier

Now that we know what a data lake is, let's dive a little deeper into the role of a data lake in SAP HANA Cloud. Think of your data architecture as a pyramid. At the top of the pyramid is the most frequently needed data. It's often the most valuable data, and it's what you need to access immediately and often. This top-tier data, also known as hot data, is stored in-memory, which is the fastest way to access data but also more expensive than the other tiers.

At the bottom is the raw data, which needs to be accessed only infrequently. This layer can still be accessed quickly, though not as quickly as in-memory storage. In exchange for that lower speed, this tier can hold large amounts of data at a low price.

In the middle, you have the important data lake. We'll get into that shortly, but before we do, here's what the tiers of data look like in SAP HANA Cloud: in-memory hot data at the top, the data lake in the middle, and raw data at the bottom.

For data lakes of the past, this tier would typically be cold storage. That's not the case here. The relational database structure simplifies and accelerates data analysis, so even massive data volumes can be accessed rapidly. A single data layer makes it easy to access the data, no matter where it is stored.

In short, SAP HANA Cloud's data lake helps manage data through a reasonable life cycle. Critical data is available in real time, urgent data is available in near real time, and important but older data is available as quickly as possible. This tiering helps keep costs down, because you can choose where to store your data depending on how often and how quickly you need it.
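
To make the tiering idea concrete, here is a minimal Python sketch of how data might be assigned to a tier based on how often it is read. This is only an illustration of the concept, not SAP HANA Cloud's actual API or placement logic: the `Dataset` class, the `choose_tier` function, and the two thresholds are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    IN_MEMORY = "in-memory (hot data)"
    DATA_LAKE = "data lake"
    RAW_STORAGE = "raw data storage"


@dataclass
class Dataset:
    name: str
    reads_per_day: float  # how often the data is accessed


# Hypothetical thresholds, picked only for illustration.
HOT_READS_PER_DAY = 1000
LAKE_READS_PER_DAY = 10


def choose_tier(dataset: Dataset) -> Tier:
    """Pick the cheapest tier that still matches how often the data is read."""
    if dataset.reads_per_day >= HOT_READS_PER_DAY:
        return Tier.IN_MEMORY      # fastest access, highest cost
    if dataset.reads_per_day >= LAKE_READS_PER_DAY:
        return Tier.DATA_LAKE      # fast relational access at lower cost
    return Tier.RAW_STORAGE        # infrequent access, lowest cost


if __name__ == "__main__":
    for ds in (
        Dataset("todays_orders", 5000),
        Dataset("last_quarter_sales", 50),
        Dataset("archived_sensor_logs", 0.1),
    ):
        print(f"{ds.name}: {choose_tier(ds).value}")
```

In practice, the right thresholds depend on your workload and budget; the point is simply that access frequency and required speed drive where each dataset lives.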

What to Look for in a Data Lake

If a data lake sounds like something that could help your data strategy, you should know that not all data lakes are created equal. Before you decide which data lake solution to dive into, there are a few factors to consider to choose the solution that's best for you.

Ease of Deployment: A data lake that isn't easy to set up or use generates no value. Make sure the solution you choose offers automated deployment and operations to prevent headaches.

Scalability: No company remains static when it comes to data. As an organization grows, so too does the amount of data generated. You want to make sure that the solution you choose can scale with you no matter the data volume, user count, or complexity of the workload.

High Speed: The faster data can be accessed, the more efficient everyone using data becomes. That's why speed should never be overlooked.

There are, of course, other factors to consider when choosing a data lake. To see what else is available, read more about SAP HANA Cloud and its built-in data lake.