AWS Lake Formation Takes Pain Out of Data Lakes

AWS Lake Formation makes it easier for users to set up and manage data lakes. But organizations will face challenges in determining how to derive value from their data lakes.

AWS Lake Formation does a lot of the heavy lifting in setting up data lakes for AWS users.

A data lake is a single repository of an organization’s data, including both the raw data in its original form and restructured and transformed data prepared for analysis. The purpose of a data lake is to break down data silos and make it easier for organizations to derive insights.

Establishing data lakes has traditionally been fraught with technical challenges. IT professionals have to identify the appropriate data repositories, bring together the various sources, categorize and deduplicate data, and cross-link records, all while providing appropriate security and access permissions. Sometimes they also have to transform or restructure data to make it usable. Best practice also requires regularly auditing access to ensure that policies are being followed.

As a result, building and managing a data lake from scratch can be expensive, time-consuming, and difficult.

AWS Lake Formation is designed to take on much of the heavy lifting, making it easier to set up, configure, and manage data lakes. Lake Formation reduces the user’s setup work to “defining data sources and what data access and security policies [they] want to apply.” Then Lake Formation “helps [them] collect and catalog data from databases and object storage, move the data into ... [an] S3 data lake, clean and classify [the] data using machine learning algorithms, and secure access to [their] sensitive data.” Once that is done, “users can access a centralized data catalog which describes available data sets and their appropriate usage,” and can then work with the data using various AWS analytics services.

Importantly, Lake Formation simplifies the establishment of security and access policies, because administrators can define the policies within Lake Formation itself, rather than having to set up the policies for each service using identity and access management (IAM) roles through the AWS console or AWS CloudFormation. The user simply has to define policies in one place, and AWS will manage the enforcement of those policies across the entire platform, greatly simplifying auditing and compliance concerns.
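To make the idea of defining a policy in one place more concrete, here is a minimal sketch in Python of granting one role read access to one cataloged table. It assumes the boto3 SDK and its Lake Formation `grant_permissions` call. The role ARN, database name, and table name are hypothetical placeholders, and the helper functions are illustrative, not part of any AWS tooling.

```python
from typing import Any, Dict, List


def build_grant_request(principal_arn: str, database: str, table: str,
                        permissions: List[str]) -> Dict[str, Any]:
    """Build the keyword arguments for a Lake Formation permission grant."""
    if not permissions:
        raise ValueError("at least one permission is required")
    return {
        # Who receives access (an IAM user or role ARN).
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # What they receive access to (a table in the data catalog).
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        # Which actions are allowed, e.g. SELECT for read-only queries.
        "Permissions": list(permissions),
    }


def apply_grant(request: Dict[str, Any]) -> None:
    """Send the grant to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("lakeformation").grant_permissions(**request)


# Hypothetical principal, database, and table names (assumptions).
request = build_grant_request(
    "arn:aws:iam::123456789012:role/AnalystRole",
    "sales_db",
    "orders",
    ["SELECT"],
)
```

Calling `apply_grant(request)` would submit the grant. Keeping the request builder separate from the AWS call lets the policy definition be reviewed or tested without touching a live account.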

Image: AWS Lake Formation’s process. Source: AWS.


Our Take

AWS Lake Formation assists users in solving many significant technical and operational challenges in setting up and managing data lakes.

However, organizations will still have their work cut out for them to get business value from those data lakes. AWS Lake Formation gives users a data lake, but the platform doesn’t ensure that they know how to use it to derive the most value.

While the data lake is a powerful tool, users will need a working knowledge of data and analytics, along with a firm grasp of their organization’s business context, to ask the right questions and perform the insightful analyses that lead to breakthroughs and better business decisions.

In this sense, AWS Lake Formation is one more example of a broader trend in cloud services and in technology: back-end and non-business-facing IT functions are becoming more and more commoditized as point-and-click service offerings.

This trend is transforming demand for skill sets across the industry, but it isn’t making IT any easier overall. Technology professionals who can master the new tools and combine technical skills with a deep understanding of their organizational context will thrive in the years to come.


Want to Know More?

The End of Hadoop and Cloudera?