Amazon just announced the availability of the highly anticipated Lake Formation service for its AWS platform. Lake Formation enables cloud administrators to easily set up and manage data lakes. Creating a data lake with Lake Formation is as simple as defining your data sources and the data access and security policies you want to apply. Lake Formation then helps you collect and catalog data from databases and object storage, move the data into your new Amazon S3 data lake, clean and classify your data using machine learning algorithms, and secure access to your sensitive data. At Data Glue, our clients are constantly asking us for better tools for managing their data lakes. Let us pause for a moment and clear the air, since there is a lot of confusion about what a data lake actually is.
Our friend James Dixon from Pentaho famously coined the term with the following analogy: “If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” I always love to use analogies to explain software and cloud services, since they let us approach technical concepts from a non-technical point of view. In more technical language, a data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.
Fully Managed - No Need to Administer Servers
The data lake then serves as an organization's source of information in its ‘raw’ form. By raw we mean before any manipulation of the information has been made. Developers enhance and clean the data to make it useful for other business groups and departments in the organization. The lake should therefore be large and ever growing as more data is brought in. That growth can become a challenge to manage for even the best administrators and can lead to underutilized resources and unnecessary charges. In addition, since the data in the lake has not been classified, it becomes difficult to create a catalog that helps users find the ‘needle in the haystack.’ AWS Lake Formation is a fully managed service, meaning we worry far less about underutilized resources and unnecessary charges. With a few clicks, Lake Formation facilitates building, securing, and managing a data lake of any size. Our clients range from a few events per day to billions of events per day, all handled by the same platform.
Maximize Your Data
At Data Glue, we use Lake Formation to accelerate the integration of data into an organization. Typically, a client that does not yet have a data lake would like to integrate data from a legacy system or other data source into our managed solutions. To accelerate the process, we can quickly build and secure a data lake and integrate that new data source into our pre-existing solution.
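For readers who want to see what "build and secure" can look like in practice, here is a minimal Python sketch, not our production tooling. It only assembles the request parameters for two calls on the boto3 Lake Formation client, register_resource and grant_permissions, and leaves the actual AWS call to a small helper. The bucket ARN, role ARN, database name, and table name are hypothetical placeholders, and the helper function names are our own.

```python
"""Sketch: request payloads for registering an S3 location and granting
table access with AWS Lake Formation. All ARNs and names are hypothetical."""


def register_request(bucket_arn: str) -> dict:
    # Parameters for lakeformation.register_resource: place the S3 location
    # under Lake Formation management using the service-linked role.
    return {"ResourceArn": bucket_arn, "UseServiceLinkedRole": True}


def grant_select_request(principal_arn: str, database: str, table: str) -> dict:
    # Parameters for lakeformation.grant_permissions: allow one principal
    # to run SELECT on a single catalog table.
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def apply_to_lake(client, bucket_arn, principal_arn, database, table):
    # `client` is assumed to be boto3.client("lakeformation"); any object
    # exposing the same two methods works, which keeps this easy to dry-run.
    client.register_resource(**register_request(bucket_arn))
    client.grant_permissions(
        **grant_select_request(principal_arn, database, table)
    )
```

With AWS credentials configured, this would be invoked as apply_to_lake(boto3.client("lakeformation"), ...) using your own ARNs. Keeping the payloads in plain functions makes it possible to review the access policy before anything is sent to AWS.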
Contact us today for a quick demo of how we can use AWS Lake Formation to accelerate your journey to the cloud.
Nolan Davidson
We had such a big data problem: all sorts of sources with no organized ingestion protocol, duplicate data, and very messy results. Not anymore. Thanks to Charlie, our data is now one of the most important assets within our organization. Thank you.