Front Office Football Central  

Old 10-16-2018, 06:57 PM   #1
Coffee Warlord
Head Coach
 
Join Date: Oct 2002
Location: Colorado Springs
Any postgres DBA type people here?

Got a sysadmin-type question for someone - curious if anyone here fits the bill.

Old 10-16-2018, 07:18 PM   #2
Ryche
College Benchwarmer
 
Join Date: Oct 2000
Location: Highlands Ranch, CO, USA
I might be able to help, depending on how technical it is.
__________________
Some knots are better left untied.
Old 10-16-2018, 07:51 PM   #3
bhlloy
Coordinator
 
Join Date: Nov 2003
Was never particularly strong in PG but have worn many DBA and Database Engineer hats, so you could try me too depending on how specific it is...
Old 10-16-2018, 09:11 PM   #4
Coffee Warlord
Head Coach
 
Join Date: Oct 2002
Location: Colorado Springs
There's a config setting in postgresql.conf called max_files_per_process. A DBA told me he never uses it and that it's unnecessary, but everything I've read on the intertubes recommends setting it if you're running out of file handles (which we are, even though the kernel-side limit is set to something absurdly high).

But if I do set it, I'm not entirely certain what happens when you run up against the max_files_per_process limit, and I can't find a straight answer on that.
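
For reference, these are the knobs I'm talking about (the values here are just the defaults/examples I keep seeing, not what we actually run):

    # postgresql.conf - cap on how many files each backend keeps open at once
    max_files_per_process = 1000      # PostgreSQL's documented default

    # OS side, on the box itself
    sysctl fs.file-max                # kernel-wide ceiling on open files
    ulimit -n                         # per-process limit, checked from a shell running as postgres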

Last edited by Coffee Warlord : 10-16-2018 at 09:13 PM.
Old 10-16-2018, 10:18 PM   #5
bhlloy
Coordinator
 
Join Date: Nov 2003
Hmmm... what Linux distro? Looking at the source code, it definitely won't start PG if you don't have enough file handles for that setting (it does some calculations but then takes the minimum of what's available and your max), but like you, I'm having a really hard time finding out what happens if a process hits that limit after startup. And I don't know what I'm looking for well enough to trawl through thousands of lines of code to figure it out.

I guess the better question is: what happens when the box runs out of file descriptors entirely? I'm guessing it takes the whole server down, and in that case whatever happens to the Postgres processes (my guess is they get killed) is probably not as bad as the alternative. I'm making some serious assumptions in all of that, though. Do you pay for support from anyone?

There are probably also some questions about what you're doing on this server to run out of file descriptors in the first place - that's not a normal situation to be in (obviously), and if PG is causing it, it's almost certainly being used in a very unusual way.
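
One thing that might help you narrow it down - a rough sketch, assuming Linux with /proc and backends running as the postgres user - is just counting how many descriptors each backend actually has open:

    # open file descriptors per postgres backend, biggest first (run as root or postgres)
    for pid in $(pgrep -u postgres); do
        echo "$pid $(ls /proc/$pid/fd 2>/dev/null | wc -l)"
    done | sort -k2 -rn | head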
Old 10-16-2018, 10:32 PM   #6
Coffee Warlord
Head Coach
 
Join Date: Oct 2002
Location: Colorado Springs
Debian. It's a pretty high-load database - probably somewhere in the realm of 50-100k queries per hour, with a lot of row locking/unlocking, and it serves up a good 300,000+ inserts daily.

From what we've been able to tell, we hit the descriptor limit during periods of super-increased traffic, and/or when certain locks aren't released in a timely fashion (which is sometimes unavoidable - gotta wait on other servers to answer). Usually those two things are, surprise surprise, related. Hitting the file descriptor limit is *bad*, but not Take The Entire Server Down bad - it does its damnedest to keep serving requests, and eventually it repairs itself, but a whole shit-ton of requests error out in the meantime, and this is a system that Should Not Die.

And like I said, everything I've read says "set that value." But yeah, I have zero clue whether the cure is worse than the symptom, and this is hardly a system I can just experiment on.

And because of corporate bullshit, we have no DBA, we have no approval to hire a DBA, I sure as hell am no DBA, and I'm finding the one aforementioned quasi-consultant we temporarily have exceedingly useless.
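
(For the curious, the kind of thing you can see when it backs up is sessions piling up on lock waits - this is just a generic pg_stat_activity query, nothing specific to our setup; assumes 9.6+ for the wait_event columns, and 'ourdb' is a placeholder.)

    psql -d ourdb -c "
      SELECT pid, state, wait_event_type, wait_event,
             now() - xact_start AS xact_age,
             left(query, 60) AS query
      FROM pg_stat_activity
      WHERE wait_event_type = 'Lock'
      ORDER BY xact_age DESC;"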
Old 10-16-2018, 11:31 PM   #7
bhlloy
Coordinator
 
Join Date: Nov 2003
Can you set up replication to offload that read traffic? That sounds like a potentially very problematic scenario. Having worked for a top-5 e-commerce website, I can tell you we wouldn't dream of putting any of our single DBs under that kind of mixed traffic load.

I guess I can think of two things I'd do next if you don't have any paid support or an expert DBA on hand: 1) post on the Postgres forums or Stack Overflow, or 2) set up some kind of test server where you set the limit really low, put it under load, and see what happens. Probably both.

Last edited by bhlloy : 10-16-2018 at 11:31 PM.
Old 10-16-2018, 11:45 PM   #8
rjolley
College Starter
 
Join Date: Dec 2000
Location: Roseville, CA
From what I've read briefly, leaving max_files_per_process unset means it defaults to 1000 - that's the point at which Postgres itself starts closing files it has open; otherwise it just relies on the OS limit. One person set it lower, to 200, and was able to reduce an issue where they were seeing "too many open files" errors. Maybe in your case Postgres doesn't actually need a lot of files open but isn't cleaning up after itself, in which case reducing max_files_per_process could help.
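
For what it's worth, trying a lower value is one setting plus a restart, since max_files_per_process only takes effect at server start (sketch; adjust for however Debian has your cluster set up):

    # set it low on a test instance, then bounce Postgres
    sudo -u postgres psql -c "ALTER SYSTEM SET max_files_per_process = 200;"
    sudo systemctl restart postgresql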

Do you have a test system for this? If not, can you get one? Personally, I hate not having a system where I can test out changes to see what happens, even if that system is one of my servers at home.

And by trade, I'm a DBA, mainly with Oracle, but have supported others, including Postgres, though I'm still picking up the intricacies there.
Old 10-17-2018, 09:27 AM   #9
Coffee Warlord
Head Coach
 
Join Date: Oct 2002
Location: Colorado Springs
We have a test server, but I'm not sure there's a way to replicate the sheer volume we see in production.
Old 10-17-2018, 12:32 PM   #10
rjolley
College Starter
 
Join Date: Dec 2000
Location: Roseville, CA
Yeah, a representative load can be hard to replicate. However, depending on what you're testing, if you can generate, say, 1/10th of the production load and lower the relevant parameters proportionally, you can get a reasonable picture of what would happen in some situations. You can also see whether the changes you make break the environment even at low volumes.

Has your team looked into any load-testing tools? LoadRunner or something like that may be too expensive, but tools in that class can be helpful. There are also a few open-source load-testing frameworks that may help you.
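
pgbench ships with Postgres and is free, so it might be the easiest place to start - rough sketch, where 'testdb' and the numbers are placeholders you'd scale to whatever fraction of production load you can manage:

    pgbench -i -s 50 testdb               # initialize the test tables at scale factor 50
    pgbench -c 100 -j 4 -T 600 testdb     # 100 client connections, 4 worker threads, 10 minutes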

I would also add that testing one change at a time is the way to go. It takes longer up front, but untangling which change caused an issue after you've made several at once can eat far more time.

Last edited by rjolley : 10-17-2018 at 12:33 PM.