
AWS Lake Formation Takes Pain Out of Data Lakes

AWS Lake Formation makes it easier for users to set up and manage data lakes. But organizations will face challenges in determining how to derive value from their data lakes.

AWS Lake Formation does a lot of the heavy lifting in setting up data lakes for AWS users.

A data lake is a single repository of an organization’s data, including both the raw data in its original form and restructured and transformed data prepared for analysis. The purpose of a data lake is to break down data silos and make it easier for organizations to derive insights.

Establishing data lakes has traditionally been fraught with technical challenges. IT professionals have to identify the appropriate data repositories, bring the various sources together, categorize and deduplicate data, and cross-link records. They must do all of this while providing appropriate security and access permissions, and sometimes while transforming or restructuring data to make it usable. Best practice also calls for regularly auditing access to ensure that policies are being followed.

As a result, building and managing a data lake from scratch can be expensive, time-consuming, and difficult.

AWS Lake Formation is designed to take on much of this heavy lifting, making it easier to set up, configure, and manage data lakes. With Lake Formation, the user's share of the setup work is reduced to “defining data sources and what data access and security policies [they] want to apply.” Lake Formation then “helps [them] collect and catalog data from databases and object storage, move the data into ... [an] S3 data lake, clean and classify [the] data using machine learning algorithms, and secure access to [their] sensitive data.” The end result is that “users can access a centralized data catalog which describes available data sets and their appropriate usage,” and can then work with the data using various AWS analytics services.
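
As a rough illustration of the “defining data sources” step, the following is a minimal sketch assuming the AWS SDK for Python (boto3), whose Lake Formation client offers a `register_resource` call for registering an Amazon S3 location with the data lake. The helper functions, bucket name, prefix, and role ARN are hypothetical placeholders of our own, and registration is only the first step; cataloging, ingestion, and permissions are handled separately.

```python
from typing import Any, Dict, Optional


def build_register_request(bucket: str, prefix: str = "",
                           role_arn: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for Lake Formation's RegisterResource call.

    This helper is hypothetical and illustrative; it is not part of any AWS SDK.
    """
    prefix = prefix.strip("/")
    # S3 locations are identified by ARNs of the form arn:aws:s3:::bucket/prefix.
    arn = f"arn:aws:s3:::{bucket}/{prefix}" if prefix else f"arn:aws:s3:::{bucket}"
    request: Dict[str, Any] = {"ResourceArn": arn}
    if role_arn:
        # Access the location through a caller-supplied IAM role.
        request["UseServiceLinkedRole"] = False
        request["RoleArn"] = role_arn
    else:
        # Let Lake Formation use its service-linked role instead.
        request["UseServiceLinkedRole"] = True
    return request


def register_location(client: Any, bucket: str, prefix: str = "",
                      role_arn: Optional[str] = None) -> Dict[str, Any]:
    """Register an S3 location; `client` is assumed to be a boto3 Lake Formation client."""
    request = build_register_request(bucket, prefix, role_arn)
    client.register_resource(**request)
    return request
```

With credentials configured, a call might look like `register_location(boto3.client("lakeformation"), "example-data-lake", "raw/sales")`. Keeping the request builder separate from the API call makes it easy to inspect what would be sent before touching a real account.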

Importantly, Lake Formation simplifies the establishment of security and access policies: administrators can define policies within Lake Formation itself, rather than setting them up service by service using AWS Identity and Access Management (IAM) roles through the AWS console or AWS CloudFormation. Policies are defined in one place, and AWS enforces them across the analytics services integrated with Lake Formation, which greatly simplifies auditing and compliance.
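
To make the centralized model concrete, here is a minimal sketch, assuming the AWS SDK for Python (boto3), of granting an analyst role read-only access to a single Data Catalog table through Lake Formation's `grant_permissions` call. The helper functions, the role ARN, the database `sales`, the table `orders`, and the column names are hypothetical placeholders. The point is that the grant is made once, in Lake Formation, and integrated services such as Amazon Athena are designed to enforce it when that role queries the table.

```python
from typing import Any, Dict, List, Optional


def build_select_grant(principal_arn: str, database: str, table: str,
                       columns: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for Lake Formation's GrantPermissions call.

    Grants read-only (SELECT) access to one catalog table, optionally limited
    to specific columns. This helper is hypothetical and illustrative.
    """
    if columns:
        # Column-level grant: only the listed columns become readable.
        resource: Dict[str, Any] = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        # Table-level grant: all columns of the table are readable.
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
        # Empty list: the grantee cannot pass this access on to others.
        "PermissionsWithGrantOption": [],
    }


def grant_select(client: Any, principal_arn: str, database: str, table: str,
                 columns: Optional[List[str]] = None) -> Dict[str, Any]:
    """Apply the grant; `client` is assumed to be a boto3 Lake Formation client."""
    request = build_select_grant(principal_arn, database, table, columns)
    client.grant_permissions(**request)
    return request
```

Passing the client in, rather than creating it inside the helper, lets the same code be pointed at `boto3.client("lakeformation")` in production or at a stub when reviewing which grants a script would issue.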

Image: AWS Lake Formation’s process. Source: AWS.

Our Take

AWS Lake Formation assists users in solving many significant technical and operational challenges in setting up and managing data lakes.

However, organizations will still have their work cut out for them to get business value from those data lakes. AWS Lake Formation gives users a data lake, but the platform doesn’t ensure that they know how to use it to derive the most value.

While the data lake is a powerful tool, users will need a working knowledge of data and analytics, as well as a firm grasp of their organization's business context, to ask the right questions and perform the truly insightful analyses that lead to breakthroughs and better business decisions.

In this sense, AWS Lake Formation is one more example of a broader trend in cloud services and in technology generally: back-end, non-business-facing IT functions are increasingly being commoditized as point-and-click service offerings.

Ultimately, this trend is transforming demand for skill sets across the industry, but it isn't making IT any easier overall. Technology professionals who can master the new tools and combine technical skills with a deep understanding of their organizational context will thrive in the years to come.

