3 things:
- You are computing the same thing about one and a half billion times (in fact, everything depends on only a few parameters that are the same for many rows).
- Aggregates are more efficient in big chunks (JOIN) than in small bits (subqueries).
- MySQL is very slow with subqueries.
So, when you compute "vote count by option_id" (which requires scanning the big table), and then you need to compute "vote count by poll_id", well, don't scan the big table again, just use the previous result!
You can do that with ROLLUP.
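The points above can be made concrete with a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL/Postgres here, and the tiny `options(option_id, poll_id)` / `responses(user_id, option_id)` tables and their values are hypothetical, with column names borrowed from the query further down. It compares one correlated subquery per row (each one rescanning `responses`) with a single aggregation pass whose result is reused for the per-poll totals.

```python
import sqlite3

# Hypothetical toy schema and data; column names follow the query further down.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE options (option_id INTEGER PRIMARY KEY, poll_id INTEGER);
CREATE TABLE responses (user_id INTEGER, option_id INTEGER);
INSERT INTO options VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO responses VALUES (100, 1), (101, 1), (102, 2), (100, 3), (103, 3), (104, 3);
""")

# Slow pattern: two correlated subqueries per option row, each rescanning responses.
per_row = conn.execute("""
SELECT o.option_id,
       (SELECT count(*) FROM responses r WHERE r.option_id = o.option_id) AS option_votes,
       (SELECT count(*) FROM responses r JOIN options o2 USING (option_id)
         WHERE o2.poll_id = o.poll_id) AS poll_votes
FROM options o
ORDER BY o.option_id
""").fetchall()

# Fast pattern: scan responses once, then derive poll totals from the small
# per-option result instead of going back to the big table.
reused = conn.execute("""
WITH option_votes AS (
    SELECT option_id, poll_id, count(*) AS votes
    FROM responses JOIN options USING (option_id)
    GROUP BY option_id, poll_id
),
poll_votes AS (
    SELECT poll_id, sum(votes) AS votes
    FROM option_votes
    GROUP BY poll_id
)
SELECT ov.option_id, ov.votes, pv.votes
FROM option_votes ov JOIN poll_votes pv USING (poll_id)
ORDER BY ov.option_id
""").fetchall()

print(reused)  # [(1, 2, 3), (2, 1, 3), (3, 3, 3)]
```

The two results only coincide because every option in this toy data has at least one vote: the JOIN version drops options with zero votes, whereas the correlated subquery would report 0 for them.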
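SQLite has no ROLLUP, so this sketch emulates what `GROUP BY poll_id, option_id WITH ROLLUP` (MySQL) or `GROUP BY ROLLUP (poll_id, option_id)` (PostgreSQL 9.5 and later) returns: one row per option, a subtotal per poll with `option_id` set to NULL, and a grand total with both keys NULL, all derived from a single aggregation pass. The tables and values are hypothetical.

```python
import sqlite3

# Hypothetical toy schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE options (option_id INTEGER PRIMARY KEY, poll_id INTEGER);
CREATE TABLE responses (user_id INTEGER, option_id INTEGER);
INSERT INTO options VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO responses VALUES (100, 1), (101, 1), (102, 2), (100, 3), (103, 3), (104, 3);
""")

rows = conn.execute("""
WITH option_votes AS (              -- the only pass over responses
    SELECT poll_id, option_id, count(*) AS votes
    FROM responses JOIN options USING (option_id)
    GROUP BY poll_id, option_id
)
SELECT * FROM (
    SELECT poll_id, option_id, votes FROM option_votes
    UNION ALL                       -- per-poll subtotals, option_id is NULL
    SELECT poll_id, NULL, sum(votes) FROM option_votes GROUP BY poll_id
    UNION ALL                       -- grand total, both keys NULL
    SELECT NULL, NULL, sum(votes) FROM option_votes
)
-- put each subtotal after its options and the grand total last, like ROLLUP output
ORDER BY poll_id IS NULL, poll_id, option_id IS NULL, option_id
""").fetchall()

for row in rows:
    print(row)
```

The outer `SELECT * FROM (...)` wrapper is needed because SQLite only accepts plain result-column references in the ORDER BY of a compound SELECT.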
Here is a query that does what you need, running on Postgres.
To make MySQL do this, you'll have to replace all the "WITH foo AS (SELECT...)" clauses with temporary tables. That's easy. MySQL in-memory temp tables are fast, so don't be afraid to use them: they let you reuse the results of previous steps and save a lot of computation.
I created random test data, and it seems to work. It executed in 0.3 seconds...
WITH
-- users of interest : target group
uids AS (
SELECT DISTINCT user_id
FROM options
JOIN responses USING (option_id)
WHERE poll_id=22
),
-- votes of everyone and target group
votes AS (
SELECT poll_id, option_id, sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
FROM (
SELECT option_id, count(*) AS all_votes, count(uids.user_id) AS target_votes
FROM responses
LEFT JOIN uids USING (user_id)
GROUP BY option_id
) v
JOIN options USING (option_id)
GROUP BY poll_id, option_id
),
-- totals for all polls (reuse previous result)
totals AS (
SELECT poll_id, sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
FROM votes
GROUP BY poll_id
),
-- number of options in each poll
poll_options AS (
SELECT poll_id, count(*) AS poll_option_count
FROM options
GROUP BY poll_id
)
-- reuse previous tables to get some stats
SELECT *, ABS(total_percent - subgroup_percent) AS deviation
FROM (
SELECT
poll_id,
option_id,
v.target_votes / v.all_votes AS subgroup_percent,
t.target_votes / t.all_votes AS total_percent,
poll_option_count
FROM votes v
JOIN totals t USING (poll_id)
JOIN poll_options po USING (poll_id)
) AS foo
ORDER BY deviation DESC, poll_option_count DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=14910.46..14910.56 rows=40 width=144) (actual time=299.844..299.862 rows=200 loops=1)
Sort Key: (abs(((t.target_votes / t.all_votes) - (v.target_votes / v.all_votes)))), po.poll_option_count
Sort Method: quicksort Memory: 52kB
CTE uids
-> HashAggregate (cost=1801.43..1850.52 rows=4909 width=4) (actual time=3.935..4.793 rows=4860 loops=1)
-> Nested Loop (cost=0.00..1789.16 rows=4909 width=4) (actual time=0.029..2.555 rows=4860 loops=1)
-> Seq Scan on options (cost=0.00..3.50 rows=5 width=4) (actual time=0.008..0.032 rows=5 loops=1)
Filter: (poll_id = 22)
-> Index Scan using responses_option_id_key on responses (cost=0.00..344.86 rows=982 width=8) (actual time=0.012..0.298 rows=972 loops=5)
Index Cond: (public.responses.option_id = public.options.option_id)
CTE votes
-> HashAggregate (cost=13029.43..13032.43 rows=200 width=24) (actual time=298.255..298.317 rows=200 loops=1)
-> Hash Join (cost=13019.68..13027.43 rows=200 width=24) (actual time=297.953..298.103 rows=200 loops=1)
Hash Cond: (public.responses.option_id = public.options.option_id)
-> HashAggregate (cost=13014.18..13017.18 rows=200 width=8) (actual time=297.839..297.879 rows=200 loops=1)
-> Merge Left Join (cost=399.13..11541.43 rows=196366 width=8) (actual time=9.301..230.467 rows=196366 loops=1)
Merge Cond: (public.responses.user_id = uids.user_id)
-> Index Scan using responses_pkey on responses (cost=0.00..8585.75 rows=196366 width=8) (actual time=0.015..121.971 rows=196366 loops=1)
-> Sort (cost=399.13..411.40 rows=4909 width=4) (actual time=9.281..22.044 rows=137645 loops=1)
Sort Key: uids.user_id
Sort Method: quicksort Memory: 420kB
-> CTE Scan on uids (cost=0.00..98.18 rows=4909 width=4) (actual time=3.937..6.549 rows=4860 loops=1)
-> Hash (cost=3.00..3.00 rows=200 width=8) (actual time=0.095..0.095 rows=200 loops=1)
-> Seq Scan on options (cost=0.00..3.00 rows=200 width=8) (actual time=0.007..0.043 rows=200 loops=1)
CTE totals
-> HashAggregate (cost=5.50..8.50 rows=200 width=68) (actual time=298.629..298.640 rows=40 loops=1)
-> CTE Scan on votes (cost=0.00..4.00 rows=200 width=68) (actual time=298.257..298.425 rows=200 loops=1)
CTE poll_options
-> HashAggregate (cost=4.00..4.50 rows=40 width=4) (actual time=0.091..0.101 rows=40 loops=1)
-> Seq Scan on options (cost=0.00..3.00 rows=200 width=4) (actual time=0.005..0.020 rows=200 loops=1)
-> Hash Join (cost=6.95..13.45 rows=40 width=144) (actual time=298.994..299.554 rows=200 loops=1)
Hash Cond: (t.poll_id = v.poll_id)
-> CTE Scan on totals t (cost=0.00..4.00 rows=200 width=68) (actual time=298.632..298.669 rows=40 loops=1)
-> Hash (cost=6.45..6.45 rows=40 width=84) (actual time=0.335..0.335 rows=200 loops=1)
-> Hash Join (cost=1.30..6.45 rows=40 width=84) (actual time=0.140..0.263 rows=200 loops=1)
Hash Cond: (v.poll_id = po.poll_id)
-> CTE Scan on votes v (cost=0.00..4.00 rows=200 width=72) (actual time=0.001..0.030 rows=200 loops=1)
-> Hash (cost=0.80..0.80 rows=40 width=12) (actual time=0.130..0.130 rows=40 loops=1)
-> CTE Scan on poll_options po (cost=0.00..0.80 rows=40 width=12) (actual time=0.093..0.119 rows=40 loops=1)
Total runtime: 300.132 ms
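Here is a sketch of the temporary-table rewrite described above, run on SQLite through Python's sqlite3 so it can be checked without a server. Each CTE becomes a `CREATE TEMPORARY TABLE` statement. The data set is a made-up miniature, not the random test data behind the timing above. The `1.0 *` factors are an added assumption for SQLite: there, integer divided by integer truncates, whereas in PostgreSQL `sum()` over the bigint counts yields numeric.

```python
import sqlite3

# Hypothetical miniature data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE options (option_id INTEGER PRIMARY KEY, poll_id INTEGER);
CREATE TABLE responses (user_id INTEGER, option_id INTEGER);
INSERT INTO options VALUES (1, 22), (2, 22), (3, 30), (4, 30), (5, 30);
INSERT INTO responses VALUES
    (1, 1), (2, 1), (3, 2),                  -- poll 22: target group is users 1, 2, 3
    (1, 3), (2, 3), (4, 3), (5, 4), (6, 5), (3, 5);

-- Each CTE becomes a temporary table; later steps reuse it instead of rescanning responses.
CREATE TEMPORARY TABLE uids AS
    SELECT DISTINCT user_id FROM options JOIN responses USING (option_id)
    WHERE poll_id = 22;

CREATE TEMPORARY TABLE votes AS
    SELECT poll_id, option_id,
           sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
    FROM (SELECT option_id, count(*) AS all_votes, count(uids.user_id) AS target_votes
          FROM responses LEFT JOIN uids USING (user_id)
          GROUP BY option_id) v
    JOIN options USING (option_id)
    GROUP BY poll_id, option_id;

CREATE TEMPORARY TABLE totals AS
    SELECT poll_id, sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
    FROM votes GROUP BY poll_id;

CREATE TEMPORARY TABLE poll_options AS
    SELECT poll_id, count(*) AS poll_option_count FROM options GROUP BY poll_id;
""")

rows = conn.execute("""
SELECT *, ABS(total_percent - subgroup_percent) AS deviation
FROM (SELECT poll_id, option_id,
             1.0 * v.target_votes / v.all_votes AS subgroup_percent,  -- 1.0 * forces real division
             1.0 * t.target_votes / t.all_votes AS total_percent,
             poll_option_count
      FROM votes v
      JOIN totals t USING (poll_id)
      JOIN poll_options po USING (poll_id)) AS foo
ORDER BY deviation DESC, poll_option_count DESC
""").fetchall()

for row in rows:
    print(row)  # (poll_id, option_id, subgroup_percent, total_percent, poll_option_count, deviation)
```

In MySQL the same statements should work as written, and the `1.0 *` is harmless there because `/` already returns a decimal. Building `totals` as its own step from `votes` also avoids MySQL's documented restriction that a TEMPORARY table cannot be referenced more than once in the same query.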