Unlocking Efficiency: Pairing in Entity Resolution

Buckle up, data detectives, because today we’re diving into the world of pairing – the entity resolution trick that’ll save you time, headaches, and maybe even a few tears. Don’t worry if you’re new to this whole ER game; we’re keeping it simple and friendly here. By the end of this post, you’ll have a solid grasp of what pairing is, why it’s so awesome, and how it can make your data cleaning adventures a whole lot smoother....

October 9, 2024 · 7 min · Paul Kinsvater

A Beginner's Guide to BERT for Entity Resolution

BERT, or Bidirectional Encoder Representations from Transformers, quickly proved its worth and took the entity resolution community by storm. If BERT is new to you, picture it as ChatGPT’s older, brainy cousin. BERT is like the translator, converting text into numbers that computers understand. Meanwhile, ChatGPT is the storyteller, using those numbers to generate fresh text – that’s the magic behind generative AI.

What’s the Big Deal with BERT?

The original base version of BERT is a deep learning architecture consisting of 110 million parameters....

September 22, 2024 · 6 min · Paul Kinsvater

Deduplication vs. Linkage: Two Sides of the Same Data Quality Coin

In our data-drenched world, it’s easy to drown in duplicates and disconnected info. It’s like having a messy closet, but way worse for your business! When you’re dealing with a single dataset and need to eliminate duplicate records within it, it’s natural to call this process “deduplication.” On the other hand, when you have multiple, already deduplicated datasets and need to connect records that represent the same real-world entity across them, the term “linkage” is commonly used....

September 10, 2024 · 5 min · Paul Kinsvater