Fuzzy Matching and Deduplicating Data with ML Transforms for AWS Lake Formation

ML Transforms for AWS Lake Formation enable you to identify duplicate or linked records in your dataset, even when the records do not have a common unique identifier and no fields match exactly. They can help with problems such as linking customer records across different customer databases, even when many customer fields do not match exactly across the databases (e.g. different name spellings, address differences). They can help match external product lists against your product catalog, such as lists of hazardous goods or lists of goods that can’t be transported by air. They can also help deduplicate customer accounts when the same person registers multiple times. In this tech talk, you'll get an overview of ML Transforms for AWS Lake Formation and learn how to find matching records between two different lists of consumer electronic products.

Learning Objectives:
- Understand ML Transforms, a new capability of AWS Glue and AWS Lake Formation
- Learn how to match external product lists against your product catalog, such as lists of hazardous goods or lists of goods that can’t be transported by air
- Learn how to match and de-duplicate records with ML Transforms
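
To make the idea of matching records with no shared identifier and no exactly equal fields more concrete, here is a minimal sketch in Python. It is not how ML Transforms works internally and does not call any AWS API: it swaps in plain string similarity from the standard library's `difflib` in place of a trained model. The helper names (`normalize`, `similarity`, `find_matches`), the sample product names, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(catalog, external, threshold=0.8):
    """For each external item, keep the closest catalog item if it is similar enough.

    The threshold is a hypothetical cutoff chosen for this toy data.
    """
    matches = []
    for ext in external:
        best = max(catalog, key=lambda item: similarity(item, ext))
        score = similarity(best, ext)
        if score >= threshold:
            matches.append((ext, best, round(score, 2)))
    return matches


# Made-up product names standing in for two consumer electronics lists.
catalog = [
    "Sony WH-1000XM3 Wireless Headphones",
    "Apple iPad Pro 11 inch",
    "Canon EOS 80D DSLR Camera",
]
external = [
    "sony wh1000xm3 wireless headphone",
    "Canon EOS-80D DSLR camera",
    "Dell XPS 13 Laptop",
]

matches = find_matches(catalog, external)
for ext, cat, score in matches:
    print(f"{ext!r} -> {cat!r} (score {score})")
```

On this toy data the first two external entries are paired with their catalog counterparts despite spelling and punctuation differences, and the third is left unmatched. A single similarity cutoff like this breaks down quickly on real data with many fields, which is the kind of problem a learned matching transform is intended to address.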