Fuzzy SQL Update Sugar Contacts

Ever need to match up records for updates but there is no clear relationship to the existing data? There may be duplicates, multiple hits, or a new update may be needed. Here is some sample SQL I used when I needed to reassign contacts in SugarCRM based on a certain kind of relevance, no predictable fields or keys, the data comes in a bucket of excel. Yikes!

This applies to SugarCRM 6.x and is not tested on 7.x without retooling but the technique would help with similar data.

Targeting MySQL, this tactic should work fine in PostgreSQL. My approach is to take the weighted score of all fields that are similar, add that and if it is bigger than a certain value, go ahead and do work on it.  If there are multiple results, the highest score SHOULD be the right one to update.  If too low I know to insert.

The output of the generic statement looks like this:

SCORE CUSTOM ACCOUNT_ID ACCOUNT_NAME FIRST_NAME LAST_NAME CONTACT_ID CITY STATE POSTALCODE TITLE
-------------------------------------------------------------------------------------------------
5, 11111, 22222, "ABC CORP", FIRST, LAST, 123456, CITY, ST, 12345-1234, OWNER
2, 33333, 44444, "ANY CORP", FIRST, LAST, 789012, VILLAGE,WV, 67890-1111, OWNER
3, 55555, 66666, "A CORP", First, Last, 345678, VILLE, TX, 11111-2222, MANAGER
3, 77777, 88888, "INST INC", FIRST, LAST, 966866, TOWN, IN, 33333-4444, MANAGER
3, 99999, 33333, "A1 CORP", FIRST, LAST, 977977, DALE, IN, 50000-6666, MANAGER

In the first line you see the score 5.  In this case it tells me that record is probably the right one to update. The score of 2 in my dataset will almost certainly not be a match. A score of 3 and up are good if there is only one record returned. In programs that use my SQL, the highest value is the one to work on.

Look at the first line of SQL. That series of numbers comes from the CASE statements below. I actually tweak those values depending on my dataset in the THEN clause from 1 to 0 if it isn’t a strong differentiator(like State), or I might increase that number to something like 100 on the Email or Phone fields.

In a program I would replace the strings to parameterized values.  Your mileage may vary, but please let me know if you have seen something better or can make this SQL prettier.

Here’s the generic sample(scrubbed, of course):

select
 ( bar.res1 + bar.res2 + bar.res3 + bar.res4 + bar.res5 + bar.res6 + bar.res7 + bar.res8 ) as score,
 custom_field_c,
 account_id,
 account_name,
 first_name,
 last_name,
 contact_id,
 city,
 state,
 postalcode,
 title
FROM
 (SELECT
 custom_field_c,
 account_id,
 account_name,
 first_name,
 last_name,
 contact_id,
 account_assigned_user_id,
 account_assigned_rep,
 contact_assigned_user_id,
 contact_phone,
 city,
 state,
 postalcode,
 title,
 email_address,
 (CASE WHEN substr(postalcode, 1, 5) = '12345'
 THEN 1
 ELSE 0 END) AS res1,
 (CASE WHEN upper(title) = upper('title')
 THEN 1
 ELSE 0 END) AS res2,
 (CASE WHEN upper(email_address) = upper('user@corp.com')
 THEN 1
 ELSE 0 END) AS res3,
 (CASE WHEN upper(first_name) = upper('first')
 THEN 1
 ELSE 0 END) AS res4,
 (CASE WHEN upper(last_name) = upper('last')
 THEN 1
 ELSE 0 END) AS res5,
 (CASE WHEN upper(city) = upper('city')
 THEN 1
 ELSE 0 END) AS res6,
 (CASE WHEN upper(state) = upper('st')
 THEN 1
 ELSE 0 END) AS res7,
 (CASE WHEN upper(contact_phone) = upper('123-123-1234')
 THEN 1
 ELSE 0 END) AS res8
 FROM
 (SELECT
 b.custom_field_c,
 a.id AS account_id,
 a.name AS account_name,
 d.first_name AS first_name,
 d.last_name AS last_name,
 d.id AS contact_id,
 a.assigned_user_id AS account_assigned_user_id,
 b.custom_field_c AS custom_field,
 d.assigned_user_id AS contact_assigned_user_id,
 coalesce(d.phone_work, d.phone_mobile, d.phone_other) AS contact_phone,
 d.primary_address_city AS city,
 d.primary_address_state AS state ,
 d.primary_address_postalcode AS postalcode,
 d.title AS title,
 e_a.email_address AS email_address
 FROM sugarcrm.accounts a
 INNER JOIN sugarcrm.accounts_cstm b ON b.id_c = a.id AND a.deleted = 0
 INNER JOIN sugarcrm.accounts_contacts c ON c.account_id = a.id AND c.deleted = 0
 INNER JOIN sugarcrm.contacts d ON d.id = c.contact_id AND d.deleted = 0
 LEFT JOIN sugarcrm.email_addr_bean_rel bean
 ON d.id = bean.bean_id AND bean.bean_module = 'Contacts' AND bean.deleted = 0
 LEFT JOIN sugarcrm.email_addresses e_a
 ON e_a.id = bean.email_address_id AND e_a.invalid_email = 0 AND NOT e_a.email_address IS NULL AND
 e_a.deleted = 0
 WHERE upper(d.first_name) = UPPER('first')
 AND upper(d.last_name) = upper('last')
 ) foo
 )bar

From here we have a fighting chance of looking at the data and making tweaks to weights so that we can automate the data changes. The discussion here completely ignores impact to child records, BTW, and that must be considered before making sweeping changes.

Let me know what you can do with this!

 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.