In MySQL, you can delete duplicate rows using a DELETE query with a JOIN and a subquery. One way to delete duplicate data is using JOIN and subquery together like this:
DELETE t1
FROM table_name t1
JOIN (
SELECT MIN(id) as min_id, unique_column1, unique_column2
FROM table_name
GROUP BY unique_column1, unique_column2
HAVING COUNT(*) > 1
) t2 ON t1.unique_column1 = t2.unique_column1
AND t1.unique_column2 = t2.unique_column2
AND t1.id > t2.min_id
This query will first use a subquery to select the minimum id and unique columns of duplicate records. Then it JOINs the original table with the subquery on the unique columns and the id column. Finally, it deletes all the records from the original table that have a greater id than the minimum id of the duplicate records.
It’s important to note that before running this query, you should make a backup of your data and test the query on a small subset of data to ensure that the query returns the expected results. You need to replace the “table_name” with the real name of the table and “unique_column1, unique_column2” with the columns used to identify the duplicate records, Also make sure you have a primary key in the table.
Another approach would be using DELETE JOIN with GROUP BY, this query will delete all the duplicate rows except one and keep only the one that is unique.
DELETE t1.*
FROM table_name t1
INNER JOIN (
SELECT unique_column1, unique_column2
FROM table_name
GROUP BY unique_column1, unique_column2
HAVING COUNT(*) > 1
) t2 ON t1.unique_column1 = t2.unique_column1 AND t1.unique_column2 = t2.unique_column2
Another approach to deleting duplicate rows in MySQL is by using the “not exists” clause in a DELETE query. This approach can be useful if you want to keep the row with the maximum or minimum value of a specific column among the duplicate records. Here’s an example of how you can use the “not exists” clause to delete duplicate rows while keeping the one with the maximum value of a “timestamp” column:
DELETE FROM table_name
WHERE NOT EXISTS (
SELECT 1
FROM (
SELECT MAX(timestamp) as max_timestamp, unique_column1, unique_column2
FROM table_name
GROUP BY unique_column1, unique_column2
HAVING COUNT(*) > 1
) t
WHERE table_name.unique_column1 = t.unique_column1
AND table_name.unique_column2 = t.unique_column2
AND table_name.timestamp = t.max_timestamp
)
This query first uses a subquery to select the maximum timestamp and unique columns of the duplicate records.