We have been using Drupal 5 for a number of multilingual sites, and recently came across a problem of junk locale strings left over from modules we have stopped using and since deleted - namely CivicRM which inserted thousands of locales. Our client is just about to start translating these strings and they were messing up the view.
I begun with the locales_sources table and searched for everything with CivicRM in the path or in the source message text. This helped me to remove the strings so that they no longer appear in the manages strings page (/admin/settings/locale/string/search) for translators.
However now the locales_target table has a lot of orphans. It was reasonably easy to remove these with:
delete from locales_target
where lid not in (select lid from locales_source)
For good measure, I decided to add some unique constraints on the table. The column locales_source.lid was a primary key and already unique. I came across a strange situation in locales_target, however. The combination of lid and locale was not unique, even though I cannot see how it could be anything but.
My colleague, Peter, created a tricky bit of SQL to list these artifacts:
SELECT *
FROM locales_target
WHERE concat( lid, locale )
IN (
SELECT sum
FROM (
SELECT sum, count( sum ) AS numsum
FROM (
SELECT concat( lid, locale ) AS sum
FROM `locales_target`
)
SECOND
GROUP BY sum
HAVING numsum >1
)third
)
ORDER BY lid