I have been working with the pg_stat_statements extension to PostgreSQL, and it gives us a way to see the actual SQL statements being executed by lemmy_server and the number of times they are being called.
This has less overhead than cranking up logging, and several cloud computing services enable it by default (example) - so I don’t believe it will significantly slow down the server.
A DATABASE RESTART WILL BE REQUIRED
It does require that PostgreSQL be restarted, which typically takes 10 or 15 seconds.
Debian / Ubuntu install steps
https://pganalyze.com/docs/install/self_managed/02_enable_pg_stat_statements_deb
Following the conventions of “Lemmy from Scratch” server install commands:
sudo -iu postgres psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';"
Followed by a restart of the PostgreSQL service.
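On Debian / Ubuntu that restart is one command, and the extension then needs to be created in the Lemmy database (the database name lemmy is an assumption here - adjust to match your install):

sudo systemctl restart postgresql
sudo -iu postgres psql lemmy -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"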
Update July 24, 2023
Some major stored procedure SQL problems were overlooked until 2 days ago. I submitted revised statements to fix a massive write operation on every single post and comment creation on a local site. In the site_aggregates table, every row was being modified instead of a single row. https://github.com/LemmyNet/lemmy/pull/3704
I was curious why pg_stat_statements didn’t draw more attention to the INSERT statements hitting so many rows, and I found out that by default it does not track statements executed inside stored procedures and functions! https://stackoverflow.com/questions/56741860/pg-stat-activity-how-to-see-current-activity-inside-a-running-stored-procedure
pg_stat_statements.track = all
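For anyone applying this: unlike shared_preload_libraries, this setting should not need a full restart - in my reading of the docs, setting it system-wide and reloading the configuration is enough:

sudo -iu postgres psql -c "ALTER SYSTEM SET pg_stat_statements.track = 'all';"
sudo -iu postgres psql -c "SELECT pg_reload_conf();"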
Now I’m seeing far more activity than I have been looking at for the past 2 months… the stored procedure statements are showing up!
Install steps: https://gist.github.com/rcanepa/535163dc249539912c25
@roundsparrow@lemmy.ml If you find queries that are slow, could you post the EXPLAIN ANALYZE output here or send them to me? You can do it manually, or maybe use https://www.postgresql.org/docs/current/auto-explain.html
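If it helps, a minimal sketch of enabling auto_explain alongside pg_stat_statements (untested on my end; note this replaces the shared_preload_libraries value set earlier and needs another PostgreSQL restart, and the 500ms threshold is just a starting guess to tune):

sudo -iu postgres psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements,auto_explain';"
sudo -iu postgres psql -c "ALTER SYSTEM SET auto_explain.log_min_duration = '500ms';"
sudo systemctl restart postgresql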
Once the extension is installed and enabled, you can reset the stats whenever you want; it will then start building a new list of queries and, I assume, reset the average execution time.
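The reset itself is a single function call that ships with the extension (same lemmy database-name assumption as above):

sudo -iu postgres psql lemmy -c "SELECT pg_stat_statements_reset();"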
I found a curious query that seems to take 10 or 11 seconds on my server:
SELECT "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "person_aggregates"."id", "person_aggregates"."person_id", "person_aggregates"."post_count", "person_aggregates"."post_score", "person_aggregates"."comment_count", "person_aggregates"."comment_score" FROM ("person" INNER JOIN "person_aggregates" ON ("person_aggregates"."person_id" = "person"."id")) WHERE (("person"."admin" = $1) AND ("person"."deleted" = $2)) ORDER BY "person"."published"
This query is being called over 1000 times an hour with my instance only really doing incoming Federation content. Should this be optimized by doing a subSELECT on the WHERE clause before doing the INNER JOIN?
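To make the question concrete, here is the shape of that sub-select rewrite (sketched with * instead of the full column list, and untested):

SELECT "person".*, "person_aggregates".*
FROM (SELECT * FROM "person" WHERE "admin" = $1 AND "deleted" = $2) AS "person"
INNER JOIN "person_aggregates" ON ("person_aggregates"."person_id" = "person"."id")
ORDER BY "person"."published"

That said, the planner often flattens a subquery like this back into the same join, so the real win may be an index such as one on "person" ("admin", "deleted") or a partial index on "admin".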
IMPORTANT NOTE: I am rusty on my PostgreSQL; it’s been over 10 years since I designed and ran a major time-critical website with it. That is not “SECONDS”, it is “MILLISECONDS” - but it is still a slow query being called frequently.
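For anyone else digging through pg_stat_statements, something along these lines lists the statements eating the most total time (column names assume PostgreSQL 13 or newer; older versions call them total_time and mean_time; same lemmy database-name assumption as above):

sudo -iu postgres psql lemmy -c "SELECT calls, round(total_exec_time::numeric, 2) AS total_ms, round(mean_exec_time::numeric, 2) AS avg_ms, left(query, 80) AS query FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"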