AWS Lake Formation makes it easier for users to set up and manage data lakes. But organizations will face challenges in determining how to derive value from their data lakes.
AWS Lake Formation does a lot of the heavy lifting in setting up data lakes for AWS users.
A data lake is a single repository of an organization’s data, including both the raw data in its original form and restructured and transformed data prepared for analysis. The purpose of a data lake is to break down data silos and make it easier for organizations to derive insights.
Establishing data lakes has traditionally been fraught with technical challenges. IT professionals have to identify the appropriate data repositories, bring together the various sources, categorize and deduplicate data, and cross-link records, all while providing appropriate security and access permissions. Sometimes they also have to transform or restructure data to make it usable. Best practice also requires regularly auditing access to ensure that policies are being followed.
As such, building and managing a data lake from scratch can be expensive and time-consuming, not to mention difficult.
AWS Lake Formation is designed to take on much of the heavy lifting, making it easier to set up, configure, and manage data lakes. Lake Formation reduces the user’s work in setting up the data lake to “defining data sources and what data access and security policies [they] want to apply.” Then Lake Formation “helps [them] collect and catalog data from databases and object storage, move the data into ... [an] S3 data lake, clean and classify [the] data using machine learning algorithms, and secure access to [their] sensitive data.” At the end of the day, “users can access a centralized data catalog which describes available data sets and their appropriate usage,” and can then work with the data using various AWS analytics services.
Importantly, Lake Formation simplifies the establishment of security and access policies. Administrators can define the policies within Lake Formation itself, rather than having to set them up for each service using identity and access management (IAM) roles through the AWS console or AWS CloudFormation. The user defines policies in one place, and AWS manages the enforcement of those policies across the entire platform, which greatly simplifies auditing and compliance.
Image: AWS Lake Formation’s process. Source: AWS.
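To make the "define policies in one place" idea concrete, here is a minimal sketch in Python of what a single table-level grant might look like. It assumes the boto3 SDK and its Lake Formation `grant_permissions` call. The role ARN, database name, and table name are hypothetical placeholders, and the helper functions are illustrative rather than part of any AWS library.

```python
from typing import Any, Dict, List


def build_table_grant(principal_arn: str, database: str, table: str,
                      permissions: List[str]) -> Dict[str, Any]:
    """Build the keyword arguments for one Lake Formation table grant.

    The keys mirror the parameters of boto3's Lake Formation
    grant_permissions call; the values are supplied by the caller.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def grant_table_access(client: Any, grant: Dict[str, Any]) -> None:
    """Apply a grant through a Lake Formation client.

    `client` is assumed to be boto3.client("lakeformation") or any
    object exposing a compatible grant_permissions method.
    """
    client.grant_permissions(**grant)


# Hypothetical example: let an analyst role read one table.
analyst_grant = build_table_grant(
    principal_arn="arn:aws:iam::111122223333:role/AnalystRole",  # placeholder ARN
    database="sales_db",                                         # placeholder name
    table="orders",                                              # placeholder name
    permissions=["SELECT"],
)
```

With AWS credentials configured, passing `analyst_grant` and a `boto3.client("lakeformation")` client to `grant_table_access` would submit the request. The grant is recorded once in Lake Formation rather than being repeated in a separate policy for each analytics service.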
AWS Lake Formation assists users in solving many significant technical and operational challenges in setting up and managing data lakes.
However, organizations will still have their work cut out for them to get business value from those data lakes. AWS Lake Formation gives users a data lake, but the platform doesn’t ensure that they know how to use it to derive the most value.
While the data lake is a powerful tool, users will need a working knowledge of data and analytics, along with a firm grasp of their organization’s business context, to ask the right questions and perform the truly insightful analyses that lead to breakthroughs and better business decisions.
In this sense, AWS Lake Formation is one more example of a broader trend in cloud services and in technology: back-end and non-business-facing IT functions are becoming more and more commoditized as point-and-click service offerings.
At the end of the day, this trend is transforming demand for skill sets across the industry, but it isn’t making IT any easier overall. Technology professionals who can master the new tools and combine technical skills with a deep understanding of their organizational context will thrive in the years to come.