502 Proxy Error after migrating from Fine to Gaprindashvili

gaprindashvili

#21

@Sven_Jansen I believe apache times out a request after 2 minutes so grep your UI and web service worker(UI needs to hit /api to do things) PIDs from the production.log. If all puma threads (5 per ui/web service worker) are tied up or a single request takes more than 2 minutes, you’ll get 502 errors.


#22

Okay i just loggged in and use your updated grep command, i can find several entries just right after logging in…

zgrep -E β€˜Started|(Completed|Successfully)’ production.log | grep -Eo β€œ#[0-9]+[:a-f0-9]+” | sort | uniq -c
16 #2106:184c140
10 #2106:184c30c
14 #2106:184c99c
20 #2106:184cd98
12 #2106:184d1a8
6 #2153:4514024
6 #2153:4514808
4 #2153:4514a9c
4 #2153:4517580
6 #2153:4517a44


#23

@Sven_Jansen try a similar grep but look for the start and end and see if you have any long requests for those pids:

zgrep -E '(2106|2153)Started|(Completed|Successfully)' production.log


#24
[----] I, [2018-06-29T18:02:33.862614 #2106:184cd98]  INFO -- : Completed 200 OK in 14ms (Views: 7.6ms | ActiveRecord: 0.8ms)

[----] I, [2018-06-29T18:02:51.604546 #2106:184d1a8] INFO – : Completed 200 OK in 11ms (Views: 5.6ms | ActiveRecord: 0.6ms)
[----] I, [2018-06-29T18:03:11.076083 #2106:184c30c] INFO – : Completed 200 OK in 54ms (Views: 0.5ms | ActiveRecord: 10.8ms)
[----] I, [2018-06-29T18:03:11.407593 #2106:184c30c] INFO – : Completed 200 OK in 194ms (Views: 142.1ms | ActiveRecord: 24.7ms)
[----] I, [2018-06-29T18:03:12.314553 #2153:4517a44] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:12.318670 #2106:184c140] INFO – : Completed 200 OK in 1ms (Views: 0.7ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.453171 #2106:184c99c] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.496238 #2106:184c140] INFO – : Completed 200 OK in 23ms (Views: 0.3ms | ActiveRecord: 3.0ms)
[----] I, [2018-06-29T18:03:12.510054 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.6ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.569911 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.589562 #2106:184c99c] INFO – : Completed 200 OK in 36ms (Views: 0.5ms | ActiveRecord: 19.7ms)
[----] I, [2018-06-29T18:03:12.593542 #2106:184c140] INFO – : Completed 200 OK in 40ms (Views: 0.4ms | ActiveRecord: 22.7ms)
[----] I, [2018-06-29T18:03:12.666187 #2106:184cd98] INFO – : Completed 200 OK in 28ms (Views: 0.4ms | ActiveRecord: 11.3ms)
[----] I, [2018-06-29T18:03:12.675919 #2106:184c99c] INFO – : Completed 200 OK in 17ms (Views: 16.8ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.689404 #2106:184c140] INFO – : Completed 200 OK in 31ms (Views: 0.3ms | ActiveRecord: 5.8ms)
[----] I, [2018-06-29T18:03:12.689625 #2106:184d1a8] INFO – : Completed 200 OK in 12ms (Views: 11.8ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.730478 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.750139 #2106:184d1a8] INFO – : Completed 200 OK in 3ms (Views: 2.6ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.772700 #2106:184c99c] INFO – : Completed 200 OK in 31ms (Views: 0.3ms | ActiveRecord: 11.4ms)
[----] I, [2018-06-29T18:03:12.794241 #2106:184c140] INFO – : Completed 200 OK in 35ms (Views: 0.4ms | ActiveRecord: 12.7ms)
[----] I, [2018-06-29T18:03:12.812284 #2106:184cd98] INFO – : Completed 200 OK in 20ms (Views: 0.8ms | ActiveRecord: 2.6ms)
[----] I, [2018-06-29T18:03:12.821941 #2106:184d1a8] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.852657 #2106:184c140] INFO – : Completed 200 OK in 1ms (Views: 0.8ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.861915 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.7ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:18.800490 #2106:184c30c] INFO – : Completed 200 OK in 221ms (Views: 132.6ms | ActiveRecord: 14.4ms)
[----] I, [2018-06-29T18:03:19.680920 #2106:184c99c] INFO – : Completed 200 OK in 1ms (Views: 0.6ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:19.701942 #2153:4514024] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:20.464955 #2106:184c140] INFO – : Completed 200 OK in 346ms (Views: 7.4ms | ActiveRecord: 29.8ms)
[----] I, [2018-06-29T18:03:20.467957 #2106:184cd98] INFO – : Completed 200 OK in 3ms (Views: 2.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:20.517662 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:29.179953 #2153:4514808] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:41.263502 #2106:184c30c] INFO – : Completed 200 OK in 90ms (Views: 9.6ms | ActiveRecord: 4.5ms)
[----] I, [2018-06-29T18:03:41.638065 #2153:4517580] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:41.935035 #2106:184c99c] INFO – : Completed 200 OK in 433ms (Views: 7.8ms | ActiveRecord: 37.6ms)
[----] I, [2018-06-29T18:03:44.372358 #2106:184d1a8] INFO – : Completed 200 OK in 95ms (Views: 1.3ms | ActiveRecord: 13.8ms)
[----] I, [2018-06-29T18:03:46.503227 #2106:184c140] INFO – : Completed 200 OK in 12ms (Views: 0.5ms | ActiveRecord: 1.0ms)
[----] I, [2018-06-29T18:03:49.700941 #2153:4514a9c] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:50.477289 #2106:184cd98] INFO – : Completed 200 OK in 1822ms (Views: 1.1ms | ActiveRecord: 82.9ms)
[----] I, [2018-06-29T18:03:53.164469 #2106:184c30c] INFO – : Completed 200 OK in 281ms (Views: 98.9ms | ActiveRecord: 19.0ms)
[----] I, [2018-06-29T18:03:54.166204 #2106:184c99c] INFO – : Completed 200 OK in 259ms (Views: 120.8ms | ActiveRecord: 17.4ms)
[----] I, [2018-06-29T18:03:55.604659 #2106:184d1a8] INFO – : Completed 200 OK in 622ms (Views: 371.9ms | ActiveRecord: 19.4ms)
[----] I, [2018-06-29T18:03:58.667404 #2153:4517a44] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:08.401851 #2153:4514024] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:18.815553 #2153:4514808] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:29.808237 #2153:4517580] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:41.321950 #2153:4514a9c] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:53.330078 #2153:4517a44] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:05:05.762953 #2153:4514024] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:05:18.591062 #2153:4514808] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)


#25

Keep in mind, a UI login requires:

  1. web service worker running so the UI can do requests on /api for authentication and other things
  2. obviously, ui worker running
  3. memcached running if using session_store: cache in advanced settings
  4. database running and properly configured
  5. apache/httpd running (we start it automatically, it could fail to start if configuration is invalid)
    …
    Maybe other things

#26

I didnt changed anything on the appliance except IP , FreeIPA Authentication, DNS Name, Changed SSL Certificates for the Web Interface and added my first Hypervisor which is VMware vSphere 6.0 with around 120 VMs. I put a little bit more memory in it because 8GB was not enough, currently running 12GB.


#27

@Sven_Jansen I don’t see anything in those log lines that are relevant. Maybe grep-i timeout in the ssl_error.log. This should live in /var/www/miq/vmdb/log/apache/ssl_error.log . From this log, get the ports and times that are seeing timeouts.

Then, grep the top_output.log for the failing port to find the process id that was using that port at that time. For example, if the timeout was on port 3000:

grep -E "3000|timesync|top" top_output.log

Once you know the process id from top, you can grep the production log for just that process id (12345 for example).

grep "12345" production.log


#28

I found that production.log ist quite small and easy to review because no one is working with it atm. the only strange thing i found is upgrade errors and unauthorized connections.

[----] E, [2018-06-29T18:04:41.322613 #2153:8678448] ERROR – : An unauthorized connection attempt was rejected
[----] E, [2018-06-29T18:04:41.322806 #2153:8678448] ERROR – : Failed to upgrade to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)

I click something, see that rendering stuff happing without issues and after sitting around that reject und upgrade errors appear in production log.

The other stuff that looks good is just rendering stuff like dashboard and provision window i opened for testing.


#29

Checked session_store: cache
Checked memcached process is runnig
Checked database… running, disk io/memory okay.
Checked httpd running

currently checking other logs for strange stuff, but so far production.log is not helpful except the two errors i mentioned.


#30

@Sven_Jansen we’ve seen a 502 error in the UI because the web service worker deadlocks. I’m not sure what versions are affected besides the master branch and gaprindashvili. Here is the proposed workaround for now: https://github.com/ManageIQ/manageiq-api/pull/416


#31

Hi Joe,

i applied your patch for β€œlib/api/environment.rb” and it solved my issue, i see no more deadlocks or 502 errors.

Thank you very much for your help!

Sven


#32

@Sven_Jansen that’s great to hear! I’ll try to update this thread when we figure out a solution instead of the workaround above. Either way, it should be transparent.


#33

The 502 Bad Gateway Error is an indication that something has gone wrong within the server of your application, as opposed to the client side request.

How to Fix a 502 Error

Perform a hard-refresh in your browser. On Macs, this is done by pressing Cmd + Shift + R.

This problem is due to poor IP communication between back-end computers, possibly including the Web server at the site you are trying to visit. Before analysing this problem, you should clear your browser cache completely.

If you get this problem for only some of the Web sites you try to visit then it is likely to be a problem at those sites i.e. one of their pieces of equipment is failing/overloaded. Contact the people at those sites.

If you are surfing the Web and see this problem for all Web sites you try to visit, then either 1) your ISP has a major equipment failure/overload or 2) there is something wrong with your internal Internet connection e.g. your firewall is not functioning correctly. In the first case, only your ISP can help you. In the second case, you need to fix whatever it is that is preventing you reaching the Internet.

In some cases, this error caused by low computer hard disk space, you can go check the free space of your computer hard disk. If there is really not enough free space left, clean your computer hard disk well.

Start your browser in Safe Mode. Running a browser in Safe Mode means to run it with default settings and without add-ons or extensions, including toolbars.

If your web application is configured to listen on a socket, ensure that the socket exists in the correct location and that it has the proper permissions

Finally, restart your computer/networking equipment. Some temporary issues with your computer and how it’s connecting to your network could be causing 502 errors, especially if you’re seeing the error on more than one website. In these cases, a restart would help.


#34

what sort of random shit is this? a bot?