The most important job for us to get right as a price comparison platform for fashion is finding the same product in as many shops as possible. At first sight this sounds like an easy enough task, but structured metadata is scarce, while metadata embedded in running text is relatively abundant.

In this post I will explore our attempt to find IDs in product names and descriptions. We based our solution on a basic machine learning technique, the identification tree, which we also made dynamic so that it can be retrained at runtime.