502 Proxy Error after migrating from Fine to Gaprindashvili

gaprindashvili

#21

@Sven_Jansen I believe apache times out a request after 2 minutes so grep your UI and web service worker(UI needs to hit /api to do things) PIDs from the production.log. If all puma threads (5 per ui/web service worker) are tied up or a single request takes more than 2 minutes, you’ll get 502 errors.


#22

Okay i just loggged in and use your updated grep command, i can find several entries just right after logging in…

zgrep -E β€˜Started|(Completed|Successfully)’ production.log | grep -Eo β€œ#[0-9]+[:a-f0-9]+” | sort | uniq -c
16 #2106:184c140
10 #2106:184c30c
14 #2106:184c99c
20 #2106:184cd98
12 #2106:184d1a8
6 #2153:4514024
6 #2153:4514808
4 #2153:4514a9c
4 #2153:4517580
6 #2153:4517a44


#23

@Sven_Jansen try a similar grep but look for the start and end and see if you have any long requests for those pids:

zgrep -E '(2106|2153)Started|(Completed|Successfully)' production.log


#24
[----] I, [2018-06-29T18:02:33.862614 #2106:184cd98]  INFO -- : Completed 200 OK in 14ms (Views: 7.6ms | ActiveRecord: 0.8ms)

[----] I, [2018-06-29T18:02:51.604546 #2106:184d1a8] INFO – : Completed 200 OK in 11ms (Views: 5.6ms | ActiveRecord: 0.6ms)
[----] I, [2018-06-29T18:03:11.076083 #2106:184c30c] INFO – : Completed 200 OK in 54ms (Views: 0.5ms | ActiveRecord: 10.8ms)
[----] I, [2018-06-29T18:03:11.407593 #2106:184c30c] INFO – : Completed 200 OK in 194ms (Views: 142.1ms | ActiveRecord: 24.7ms)
[----] I, [2018-06-29T18:03:12.314553 #2153:4517a44] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:12.318670 #2106:184c140] INFO – : Completed 200 OK in 1ms (Views: 0.7ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.453171 #2106:184c99c] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.496238 #2106:184c140] INFO – : Completed 200 OK in 23ms (Views: 0.3ms | ActiveRecord: 3.0ms)
[----] I, [2018-06-29T18:03:12.510054 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.6ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.569911 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.589562 #2106:184c99c] INFO – : Completed 200 OK in 36ms (Views: 0.5ms | ActiveRecord: 19.7ms)
[----] I, [2018-06-29T18:03:12.593542 #2106:184c140] INFO – : Completed 200 OK in 40ms (Views: 0.4ms | ActiveRecord: 22.7ms)
[----] I, [2018-06-29T18:03:12.666187 #2106:184cd98] INFO – : Completed 200 OK in 28ms (Views: 0.4ms | ActiveRecord: 11.3ms)
[----] I, [2018-06-29T18:03:12.675919 #2106:184c99c] INFO – : Completed 200 OK in 17ms (Views: 16.8ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.689404 #2106:184c140] INFO – : Completed 200 OK in 31ms (Views: 0.3ms | ActiveRecord: 5.8ms)
[----] I, [2018-06-29T18:03:12.689625 #2106:184d1a8] INFO – : Completed 200 OK in 12ms (Views: 11.8ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.730478 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.750139 #2106:184d1a8] INFO – : Completed 200 OK in 3ms (Views: 2.6ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.772700 #2106:184c99c] INFO – : Completed 200 OK in 31ms (Views: 0.3ms | ActiveRecord: 11.4ms)
[----] I, [2018-06-29T18:03:12.794241 #2106:184c140] INFO – : Completed 200 OK in 35ms (Views: 0.4ms | ActiveRecord: 12.7ms)
[----] I, [2018-06-29T18:03:12.812284 #2106:184cd98] INFO – : Completed 200 OK in 20ms (Views: 0.8ms | ActiveRecord: 2.6ms)
[----] I, [2018-06-29T18:03:12.821941 #2106:184d1a8] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.852657 #2106:184c140] INFO – : Completed 200 OK in 1ms (Views: 0.8ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:12.861915 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.7ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:18.800490 #2106:184c30c] INFO – : Completed 200 OK in 221ms (Views: 132.6ms | ActiveRecord: 14.4ms)
[----] I, [2018-06-29T18:03:19.680920 #2106:184c99c] INFO – : Completed 200 OK in 1ms (Views: 0.6ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:19.701942 #2153:4514024] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:20.464955 #2106:184c140] INFO – : Completed 200 OK in 346ms (Views: 7.4ms | ActiveRecord: 29.8ms)
[----] I, [2018-06-29T18:03:20.467957 #2106:184cd98] INFO – : Completed 200 OK in 3ms (Views: 2.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:20.517662 #2106:184cd98] INFO – : Completed 200 OK in 1ms (Views: 0.5ms | ActiveRecord: 0.0ms)
[----] I, [2018-06-29T18:03:29.179953 #2153:4514808] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:41.263502 #2106:184c30c] INFO – : Completed 200 OK in 90ms (Views: 9.6ms | ActiveRecord: 4.5ms)
[----] I, [2018-06-29T18:03:41.638065 #2153:4517580] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:41.935035 #2106:184c99c] INFO – : Completed 200 OK in 433ms (Views: 7.8ms | ActiveRecord: 37.6ms)
[----] I, [2018-06-29T18:03:44.372358 #2106:184d1a8] INFO – : Completed 200 OK in 95ms (Views: 1.3ms | ActiveRecord: 13.8ms)
[----] I, [2018-06-29T18:03:46.503227 #2106:184c140] INFO – : Completed 200 OK in 12ms (Views: 0.5ms | ActiveRecord: 1.0ms)
[----] I, [2018-06-29T18:03:49.700941 #2153:4514a9c] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:03:50.477289 #2106:184cd98] INFO – : Completed 200 OK in 1822ms (Views: 1.1ms | ActiveRecord: 82.9ms)
[----] I, [2018-06-29T18:03:53.164469 #2106:184c30c] INFO – : Completed 200 OK in 281ms (Views: 98.9ms | ActiveRecord: 19.0ms)
[----] I, [2018-06-29T18:03:54.166204 #2106:184c99c] INFO – : Completed 200 OK in 259ms (Views: 120.8ms | ActiveRecord: 17.4ms)
[----] I, [2018-06-29T18:03:55.604659 #2106:184d1a8] INFO – : Completed 200 OK in 622ms (Views: 371.9ms | ActiveRecord: 19.4ms)
[----] I, [2018-06-29T18:03:58.667404 #2153:4517a44] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:08.401851 #2153:4514024] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:18.815553 #2153:4514808] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:29.808237 #2153:4517580] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:41.321950 #2153:4514a9c] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:04:53.330078 #2153:4517a44] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:05:05.762953 #2153:4514024] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)
[----] I, [2018-06-29T18:05:18.591062 #2153:4514808] INFO – : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)


#25

Keep in mind, a UI login requires:

  1. web service worker running so the UI can do requests on /api for authentication and other things
  2. obviously, ui worker running
  3. memcached running if using session_store: cache in advanced settings
  4. database running and properly configured
  5. apache/httpd running (we start it automatically, it could fail to start if configuration is invalid)
    …
    Maybe other things

#26

I didnt changed anything on the appliance except IP , FreeIPA Authentication, DNS Name, Changed SSL Certificates for the Web Interface and added my first Hypervisor which is VMware vSphere 6.0 with around 120 VMs. I put a little bit more memory in it because 8GB was not enough, currently running 12GB.


#27

@Sven_Jansen I don’t see anything in those log lines that are relevant. Maybe grep-i timeout in the ssl_error.log. This should live in /var/www/miq/vmdb/log/apache/ssl_error.log . From this log, get the ports and times that are seeing timeouts.

Then, grep the top_output.log for the failing port to find the process id that was using that port at that time. For example, if the timeout was on port 3000:

grep -E "3000|timesync|top" top_output.log

Once you know the process id from top, you can grep the production log for just that process id (12345 for example).

grep "12345" production.log


#28

I found that production.log ist quite small and easy to review because no one is working with it atm. the only strange thing i found is upgrade errors and unauthorized connections.

[----] E, [2018-06-29T18:04:41.322613 #2153:8678448] ERROR – : An unauthorized connection attempt was rejected
[----] E, [2018-06-29T18:04:41.322806 #2153:8678448] ERROR – : Failed to upgrade to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: WebSocket)

I click something, see that rendering stuff happing without issues and after sitting around that reject und upgrade errors appear in production log.

The other stuff that looks good is just rendering stuff like dashboard and provision window i opened for testing.


#29

Checked session_store: cache
Checked memcached process is runnig
Checked database… running, disk io/memory okay.
Checked httpd running

currently checking other logs for strange stuff, but so far production.log is not helpful except the two errors i mentioned.


#30

@Sven_Jansen we’ve seen a 502 error in the UI because the web service worker deadlocks. I’m not sure what versions are affected besides the master branch and gaprindashvili. Here is the proposed workaround for now: https://github.com/ManageIQ/manageiq-api/pull/416


#31

Hi Joe,

i applied your patch for β€œlib/api/environment.rb” and it solved my issue, i see no more deadlocks or 502 errors.

Thank you very much for your help!

Sven


#32

@Sven_Jansen that’s great to hear! I’ll try to update this thread when we figure out a solution instead of the workaround above. Either way, it should be transparent.