AWS re:Invent 2021: Data-related launches

re:Invent AWS data

Saku Vaittinen 06.12.2021

AWS re:Invent 2021 is over again, and hopefully next year it will be possible to meet people live again, though I didn't mind avoiding the jet lag and being able to watch all the interesting keynotes and other presentations from my comfortable home theatre! As always, there's tons of material waiting to be watched, and some of it is still due to be released this week. Following the keynotes and showcases, I noticed some interesting trends in data and analytics. This is not thorough coverage but rather a cherry-picked selection for you to enjoy.

SageMaker

No-code is now becoming mainstream in AWS data services. This is part of data democratization and shortening time to market, but also a natural consequence of the shortage of skilled people in IT, and especially in the data field. The no-code tools in AWS are still in their first generation and may not work miracles yet, but they help shift the excessive load from data development and data science towards the business.

The SageMaker family got two new members that both take the coding out of data science work. SageMaker Canvas provides a user interface for connecting to a number of different data sources and preparing data for machine learning models. SageMaker Ground Truth Plus allows creating high-quality training datasets at scale for ML by removing the heavy lifting of data labelling, which is a crucial part of ML training.

SageMaker can now also recommend inference instances and train models in half the time. To help people learn, you can use the no-configuration SageMaker Studio Lab for free.

Lake Formation

Another trend now common in many vendors' data solutions is the capability to control access to data at row and column level. In part this links to the same problems that no-code has come to solve: in organisations, these data governance decisions must be possible to make without technical knowledge of the underlying platform. Previously this has usually been done by physically separating the data, e.g., into two data sets or even into separate data stores, as those are typically the layers where access control has been available with sensible effort. The problem with this approach is that it requires more technical competence and monitoring, and it adds complexity and steps to the process, i.e., possible points of failure.

To tackle this problem, AWS introduced Governed Tables with row and cell-level security and storage optimization for Lake Formation. Governed Tables support multi-table transactions and manage conflicts, so this has full potential to simplify data pipeline implementations. Row and cell-level security now enables flexible use of data sets that contain sensitive data without specially created filtering and transformation routines. This level of granularity has already been available in Amazon Redshift, and now it can be extended to cover the underlying data lake layer, with the same access controlled by AWS Identity and Access Management. The Lake Formation access controls are supported by all AWS services that have access to the data lake, such as Glue and QuickSight. Automatic storage optimization sounds much like how Redshift manages its data on the storage layer to speed up queries, so now a similar capability is available in S3.
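To make the row and cell-level idea concrete, here is a minimal sketch in Python of how such a filter could be described. It only builds the request payload and does not call AWS. The account ID, database, table, column and filter names are all hypothetical, and the field names follow my understanding of the boto3 Lake Formation create_data_cells_filter request, so check them against the current documentation before relying on them.

```python
from typing import Any, Dict, List, Optional


def build_data_cells_filter(
    catalog_id: str,
    database: str,
    table: str,
    filter_name: str,
    row_expression: Optional[str] = None,
    excluded_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a TableData payload describing a row and column filter.

    Hypothetical helper: the key names are assumed to match the
    Lake Formation create_data_cells_filter request shape.
    """
    # Row-level part: a filter predicate, or every row if none is given.
    if row_expression:
        row_filter: Dict[str, Any] = {"FilterExpression": row_expression}
    else:
        row_filter = {"AllRowsWildcard": {}}

    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        "RowFilter": row_filter,
        # Column-level part: all columns except the excluded ones.
        "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns or [])},
    }


# Illustrative values only: expose Nordic customers without sensitive columns.
table_data = build_data_cells_filter(
    catalog_id="111122223333",
    database="sales_db",
    table="customers",
    filter_name="nordics_without_pii",
    row_expression="country IN ('FI', 'SE', 'NO', 'DK')",
    excluded_columns=["ssn", "email"],
)

# With AWS credentials and Lake Formation permissions in place, the payload
# could be submitted roughly like this (not executed here):
#   import boto3
#   boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)
```

The point of the sketch is that one declarative filter replaces the separate, physically split copies of the data described above: the same table serves different audiences, and the filter decides which rows and columns each of them can see.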

Database Migration Service

As only a fraction of global IT workloads have found their way from basements to the cloud, it is no surprise that migration has not yet even happened at large scale. The services in this category may be less visible to our clients than those mentioned before, but they are no less important, since migrating the data is often the most time-consuming part of an integration project.

For years AWS has had a reliable workhorse for data migrations, the AWS Database Migration Service (DMS). It feels even a bit legacy compared to all the modern serverless services, but it does its job well. For the large-scale data migrations mentioned above, DMS Fleet Advisor automates migration planning and helps to migrate database and analytics fleets to the cloud at scale while minimising the effort. DMS Studio then brings DMS Fleet Advisor, the existing DMS and the Schema Conversion Tool together to help orchestrate the whole data migration process from assessment to the actual migration.

Conclusion

The innovation speed in data services is already amazing. The whole field seems to progress much faster than other cloud services, and at the same time the makers are scarce. It doesn't require a crystal ball to forecast that, as the complexity of use is reduced, data innovation will accelerate even more in the near future.

Besides the services mentioned here, there were dozens and dozens of other launches, releases and updates. A particularly interesting announcement for Europe was the launch of Local Zones in 2022 in Denmark, Finland, Norway and many other European countries that do not have a local AWS Region. Local Zones will have some nice data-related use cases, and we'll get back to those later when the launch dates are closer.

Until then, Let's Build! Knowit will be happy to help you.

About the author

Saku Vaittinen