1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Postgres recursive query not using trivial index on massive table

Discussion in 'Programming/Internet' started by David Wolever, Oct 8, 2018.

  1. I've got a simple recursive query to find a tree of all posts:

    with recursive all_mentions(post_id, author_id, mentioned_id) as (
    select post_id, author_id, mention_id
    from mentions
    where author_id = 'abc123' -- <-- index gets used

    union all

    select next.post_id, next.author_id, next.mention_id
    from all_mentions as cur
    inner join mentions as next on
    next.author_id = cur.mention_id -- <-- index does not get used
    where next.post_id > cur.post_id
    select *
    from all_mentions

    There is an index on mentions.author_id, which gets used in the first portion of the query (ie, for … where author_id = 'abc123'), but it does not get used in the second part of the query (ie, … where next.author_id = cur.mention_id), and instead there is a table scan + sort of the 500M line table:

    Limit (cost=3818225300.80..3818225310.80 rows=500 width=136)
    CTE all_mentions
    -> Recursive Union (cost=779.32..3818225300.80 rows=13002675849 width=193)
    -> Bitmap Heap Scan on mentions (cost=779.32..75336.89 rows=18919 width=193)
    Recheck Cond: (("author_id")::citext = '0xc5918a927c4fb83fe99e30d6f66707f4b396900e'::citext)
    USES INDEX => -> Bitmap Index Scan on mentions_author_id (cost=0.00..774.59 rows=18919 width=0)
    Index Cond: (("author_id")::citext = '0xc5918a927c4fb83fe99e30d6f66707f4b396900e'::citext)
    -> Merge Join (cost=282994142.90..355809644.69 rows=1300265693 width=193)
    Merge Cond: ((all_mentions_1."author_id")::citext = (next."mention_id")::citext)
    Join Filter: (next.post_id > all_mentions_1.post_id)
    -> Sort (cost=25542.31..26015.28 rows=189190 width=40)
    Sort Key: all_mentions_1."author_id"
    -> WorkTable Scan on all_mentions all_mentions_1 (cost=0.00..3783.80 rows=189190 width=40)
    -> Materialize (cost=282968600.59..285568486.01 rows=519977084 width=166)
    -> Sort (cost=282968600.59..284268543.30 rows=519977084 width=166)
    DOES NOT USE INDEX => Sort Key: next."mention_id"
    -> Seq Scan on mentions next (cost=0.00..37074366.84 rows=519977084 width=166)
    -> CTE Scan on all_mentions (cost=0.00..260053516.98 rows=13002675849 width=136)

    Other relevant details:

    • There are many (millions) authors, and each author has a relatively small number of posts (on ~10-100; obviously there's a long tail)
    • I expect the query to return ~hundreds of rows

    Any pointers towards why this could be?

    Login To add answer/comment

Share This Page