How to do a multi-table update with a limit in MySQL

According to the MySQL documentation, you can’t do a multi-table UPDATE with a LIMIT.

What’s a multi-table update with a limit? Well, something like this:

UPDATE
    foo
    , bar
SET
    foo.baz=bar.baz
WHERE
    foo.baz IS NULL
    AND foo.id=bar.id
LIMIT
    1000
;

(Which doesn’t work. Of course, you can do single table UPDATEs with a LIMIT just fine)

Why would you even want to do this?

Well, anytime you have monster sized tables, and you don’t want to lock everybody else while you either read (from bar) or write (to foo). If you can put a limit on the update, you can call it repeatedly, in small chunks, and not choke everything for all other users.

For example, if bar happens to have, say, ohhh, 30 million rows in it, foo happens to have ooh, 2 million rows and they’re both used by everything, all the time.

So, here’s a sneaky way to get around this limitation. I did promise one, right there in the title, after all.

UPDATE
    foo
    , (SELECT
            bar.id
            , bar.baz
        FROM
            foo
            , bar
        WHERE
            foo.id=bar.id
            AND foo.baz IS NULL
        LIMIT
            1000
    ) tmp
SET
    foo.baz=tmp.baz
WHERE
    foo.id=tmp.id
;

Some important notes:

  • The update conditions (foo.baz IS NULL) go inside the subquery, along with the LIMIT.
  • We have to match ids twice – once for the subquery, and once against the created temporary table. That’s why we make sure we SELECT both the id and baz from bar in the subquery.
  • There’s no conditionals (other than id match) on the outside WHERE condition, since we’ve done them all in the subquery.
  • MySQL also has a limitation of not allowing you to UPDATE while SELECTing from the same table in a subquery. Notice that this sneakily avoids it by only SELECTing from the other table.

So, how about that? You can now do limited multi-table updates.

Oh, except for one. Minor. Problem.

This doesn’t work with temp tables (eg if foo was created with a CREATE TEMPORARY TABLE statement).

Bugger.

However, here’s a sneaky way around that limitation too.

First of all, give your temp table another column, in the example below “ref_no INT”.

Make sure you have an index on the id, otherwise it’ll be dog slow.

Then do this:

# do this in chunks of 1k
SET @counter = 1;

REPEAT

    UPDATE
        tmp_foo
    SET
        ref_no=@counter
    WHERE
        ref_no=0
        AND baz IS NULL
    LIMIT
        1000
    ;
    COMMIT;

    UPDATE
        tmp_foo
        , bar
    SET
        tmp_foo.baz=bar.baz
    WHERE
        tmp_foo.ref_no=@counter
        AND tmp_foo.id=bar.id
    ;
    COMMIT;

    SET @counter = @counter + 1;

UNTIL (SELECT COUNT(id) FROM tmp_foo WHERE ref_no=0 AND baz IS NULL) = 0
END REPEAT;

Some important notes:

  • We’re basically flagging a thousand rows at a time, then matching only against those rows – pretty simple concept really.
  • The commits are in there because MySQL can be a bit weird about not propagating changes to the database if you don’t commit inside your stored proc. This ensures that updates are passed out, which also means I can run multiple copies of this stored proc concurrently with moderate safety (if I replace @counter with a suitably large RAND() value) – well, as much as you can normally expect with MySQL anyway.
  • If you want to reuse the temp table (say, to update something else from – a reverse update to that shown above) you’ll need to reset all the ref_no’s to 0.
  • Whatever conditions are in the initial WHERE need to be mirrored in the final SELECT COUNT.
  • Obviously just drop the table when you’re finished.

As a bonus, I’ve found this is actually quicker than doing one single large scale update. Why? Less memory is used.

So look at that. TWO ways to get multi-table updates with a limit. Nifty.

related

  • No Related Posts

February 9th, 2012 | Code, Software-Engineering |
Mastodon