Closed Bug 968838 Opened 10 years ago Closed 5 years ago

Crash on australis first-run when right side of window is off screen

Categories

(Core :: Layout, defect)

30 Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox29 + fixed
firefox30 + fixed

People

(Reporter: Pike, Assigned: mstange)

References

(Depends on 1 open bug, )

Details

(Keywords: crash, Whiteboard: [Australis:P-])

Attachments

(5 files)

I reproducably crash when opening https://www.mozilla.org/en-US/firefox/29.0a2/whatsnew/ when the right side of the window is off screen, i.e., where the mixed popup would show up.

If the window is fully visible, the page works fine.

I've got three stack traces, all useless.

https://crash-stats.mozilla.com/report/index/2a5652cf-ad07-4a8e-8b90-33a6f2140206
https://crash-stats.mozilla.com/report/index/bp-ce6428bf-8639-40a9-9443-aa6b12140206
https://crash-stats.mozilla.com/report/index/bp-6d090c2c-1a10-4eae-b9e9-927a42140206

As there's no stack, I don't have a good idea where to file.

I didn't have a chance to test on non-mac.
Can you catch it in a debugger?
Flags: needinfo?(l10n)
I tested this on OS X 10.8 and couldn't get a crash. I tried both just moving the right side of the window offscreen (the popup adjusted to be on screen) and with the right side of the window on an external monitor.

What version of OS X did you use?
I'm on 10.8.5. I'll try to reproduce in a debug build later today.
Flags: needinfo?(l10n)
(In reply to Axel Hecht [:Pike] from comment #4)
> Created attachment 8372266 [details]
> Problem report from OSX

That's me reproducing this crasher in a fresh debug build from today. Does this help? Is there something I can help with? I have to say though that objective c is code I never looked at in earnest, so I'm not sure how much help I'll be.
Thank you that helps a lot.

Looks like we are hitting a stack overflow, hitting a loop where we move a window, which triggers a viewWillDraw from the OS, which makes us update our current rendering state, which moves any windows which need to be moved, which triggers a viewWillDraw, and repeat.

Markus, haven't we run into this kind of problem before? (And by before I mean in the past 5 or so years)
Flags: needinfo?(mstange)
Blocks: fx-UITour
Keywords: crash
Whiteboard: [Australis:P2]
In the absence of being able to reproduce this myself, Axel, do you have the ability to make your own builds? Or use a custom build with some logging?
Flags: needinfo?(l10n)
Talked with Axel on irc, got some stacks from gdb that I need to analyze. I will upload them here so they don't get lost.
Flags: needinfo?(l10n)
I was just commenting that I can reproduce on 10.9 but I see that you got the info from Axel.

In a clean profile, I position the window about half-way off-screen on the right side of my primary display (with my secondary display on the right). Opening the URL for the tour then causes a crash most of the time.
(In reply to Timothy Nikkel (:tn) from comment #6)
> Markus, haven't we run into this kind of problem before? (And by before I
> mean in the past 5 or so years)

Indeed, in bug 708278 and bug 788189. But those were caused by the context menu offset... I can't think of a reason that would cause us to oscillate between two completely different rects, which is what attachment 8378668 [details] seems to show (and aHeight=2880 really seems a bit much).
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #12)
> (and aHeight=2880 really seems a bit much).

I'm testing this on a 24' monitor in portrait mode, height arguments can be big. The browser window wasn't, though.
(In reply to Markus Stange [:mstange] from comment #12)
> (In reply to Timothy Nikkel (:tn) from comment #6)
> > Markus, haven't we run into this kind of problem before? (And by before I
> > mean in the past 5 or so years)
> 
> Indeed, in bug 708278 and bug 788189. But those were caused by the context
> menu offset... I can't think of a reason that would cause us to oscillate
> between two completely different rects, which is what attachment 8378668 [details]
> [details] seems to show (and aHeight=2880 really seems a bit much).

Actually, it seems like it starts out with quite small parameters and does a "multiply by 3, divide by 2" sequence (swapping between two screen positions), presumably until it hits either window or screen constraints. Here's a grep through the loop for CocoaWindow::DoResize:

#42397 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42439 aX=4706, aY=150, aWidth=1800, aHeight=2880,
#42462 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42504 aX=4706, aY=150, aWidth=1800, aHeight=2880,
#42527 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42569 aX=4706, aY=150, aWidth=1800, aHeight=2880,
#42592 aX=1568.5, aY=50, aWidth=600, aHeight=960,
#42634 aX=4706, aY=150, aWidth=1800, aHeight=2367,
#42657 aX=1568.5, aY=50, aWidth=600, aHeight=789,
#42699 aX=4706, aY=150, aWidth=1476, aHeight=1578,
#42722 aX=1568.5, aY=50, aWidth=492, aHeight=526,
#42764 aX=4706, aY=150, aWidth=984, aHeight=1052,
#42787 aX=1568.5, aY=50, aWidth=328, aHeight=350.5,
#42829 aX=4706, aY=150, aWidth=656, aHeight=701,
#42852 aX=1568.5, aY=50, aWidth=218.5, aHeight=233.5,
#42894 aX=4706, aY=150, aWidth=437, aHeight=467,
#42917 aX=1568.5, aY=50, aWidth=145.5, aHeight=155.5,


From the other attachment, it's clear we started at:

#43177 0x000000010305e94b in nsCocoaWindow::DoResize (this=0x126a47820, aX=1560.5, aY=111.5, aWidth=23, aHeight=23, aRepaint=true, aConstrainToCurrentScreen=false) at nsCocoaWindow.mm:1373 .

Which looks like it's probably the circular highlight on the main panel menu?
This sounds pretty serious (especially since all of our users will be exposed to this when Australis is released), and it would likely be a startup crash, so P1.

Timothy, can you own this? (I will assign assuming yes :), feel free to reassign if you can't). Do note that beta uplift is just a few short weeks away.
Assignee: nobody → tnikkel
Whiteboard: [Australis:P2] → [Australis:P1]
I don't know if I will have cycles for this before the current train ships, so if you need this you might want to find another owner.
(In reply to Timothy Nikkel (:tn) from comment #16)
> I don't know if I will have cycles for this before the current train ships,
> so if you need this you might want to find another owner.

Could you suggest someone who is sufficiently at home in this code to stand a chance of fixing it? :-)
Flags: needinfo?(tnikkel)
Perhaps one of mats, smichaud, enndeakin, mstange, jfkthame. It seems to be an issue with retina displays, so someone who has one can probably reproduce and get to the bottom quicker (unlike myself).
Flags: needinfo?(tnikkel)
I'll take it.
Assignee: tnikkel → mstange
Any updates on this, mstange? We're quickly approaching Beta uplift point...
Flags: needinfo?(mstange)
I don't have a patch yet, but I know roughly what the problem is.

There are at least two problems here:
 1. The fact that nsCocoaWindow::DoResize attempts the widget to the target screen.
 2. The mismatch of scale factors between a popup widget and its parent window.

I reproduced the crash like this: I connected a 1920x1080 external, non-Retina monitor to my Retina Macbook. I made the external monitor the main screen, and positioned the internal Retina screen to the left hand side of it. Now the screens were sized and positioned like this in screen coordinates:
 - main screen, external, non-Retina, positioned at 0,0 and sized 1920x1080
 - secondary screen, internal, Retina, positioned at -1440,0 and sized 1440x900

Now I opened the Firefox window on the external screen, and moved it to the right so that the hamburger button was off-screen. Then I navigated to https://www.mozilla.org/en-US/firefox/29.0a2/whatsnew/ and watched the crash.

With the window position / size in my test, nsView::DoResetWidgetBounds wants to resize the circular highlight popup to this rect: 2018, 124, 46, 46. This rect is off the screen. So the first call to nsCocoaWindow::DoResize constrains the target rect to the main screen, and ends up with 1874, 124, 46, 46. During the resize call, viewWillDraw is called, which ends up in nsView::DoResetWidgetBounds again, and notices that the popup is not at the position that it was asked to be at. So nsCocoaWindow::Move is called and the popup is moved properly off screen, to 2018, 124, 46, 46.
Now what happens is that the popup's backing scale factor changes. It's not on any screen, which makes it default to a scale factor of 2 (I don't know why), so now its mBounds are doubled (4036, 248, 92, 92). When nsView::DoResetWidgetBounds is called the next time, it compares the widget's client bounds, which are now 4036, 248, 92, 92, to the intended bounds (still 2018, 124, 46, 46), doesn't notice that the mismatch is due to backing scale factor differences, and calls ResizeClient again. This brings us back to the beginning and we enter an infinite loop (growing the stack all the time).
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #21)
> I don't have a patch yet, but I know roughly what the problem is.
> 
> There are at least two problems here:
>  1. The fact that nsCocoaWindow::DoResize attempts the widget to the target
> screen.
>  2. The mismatch of scale factors between a popup widget and its parent
> window.
> 
> I reproduced the crash like this: I connected a 1920x1080 external,
> non-Retina monitor to my Retina Macbook. I made the external monitor the
> main screen, and positioned the internal Retina screen to the left hand side
> of it. Now the screens were sized and positioned like this in screen
> coordinates:
>  - main screen, external, non-Retina, positioned at 0,0 and sized 1920x1080
>  - secondary screen, internal, Retina, positioned at -1440,0 and sized
> 1440x900
> 
> Now I opened the Firefox window on the external screen, and moved it to the
> right so that the hamburger button was off-screen. Then I navigated to
> https://www.mozilla.org/en-US/firefox/29.0a2/whatsnew/ and watched the crash.
> 
> With the window position / size in my test, nsView::DoResetWidgetBounds
> wants to resize the circular highlight popup to this rect: 2018, 124, 46,
> 46. This rect is off the screen. So the first call to
> nsCocoaWindow::DoResize constrains the target rect to the main screen, and
> ends up with 1874, 124, 46, 46. During the resize call, viewWillDraw is
> called, which ends up in nsView::DoResetWidgetBounds again, and notices that
> the popup is not at the position that it was asked to be at. So
> nsCocoaWindow::Move is called and the popup is moved properly off screen, to
> 2018, 124, 46, 46.

Dumb question: can we make DoResetWidgetBounds be aware of 'off screen' and not try to move things off screen?
(In reply to :Gijs Kruitbosch from comment #22)
> Dumb question: can we make DoResetWidgetBounds be aware of 'off screen' and
> not try to move things off screen?

We could do that, but then the highlight circle would no longer be over the thing that it's supposed to highlight. I think our case is a very valid use case for placing popups partly or completely offscreen.

Incidentally, the highlighter popup (#UITourHighlightContainer) has flip="none" set on it, which means "don't attempt to keep this panel on-screen".

For other flip values, nsMenuPopupFrame.cpp already takes care of keeping the popup on-screen, so DoResetWidgetBounds shouldn't need to handle it.
This fixes the endless recursion because nsView::DoResetWidgetBounds now no longer needs to move the popup to the right place, so it only calls Resize; and even though curBounds in newBounds still differ due to HiDPI scaling, recursion is stopped at the second Resize call because nsCocoaWindow::DoResize exits early (because newFrame == [mWindow frame]).

Constraining popups to the screen shouldn't be needed in nsCocoaWindow::Resize because nsMenuPopupFrame.cpp should already have done all necessary constrain-to-screen calculations. And any constraining in nsCocoaWindow::Resize will be reverted by nsView::DoResetWidgetBounds in another call to nsCocoaWindow::Move.
Attachment #8386772 - Flags: review?(enndeakin)
In case there are other ways to hit an infinite recursion here, this additional safety measure should at least prevent us from crashing.
Attachment #8386773 - Flags: review?(smichaud)
Comment on attachment 8386772 [details] [diff] [review]
part 1: don't constrain popups in nsCocoaWindow::DoResize

This seems fine for nsMenuPopupFrame. There are two other cases where eWindowType_popup is used (html:select and the popup flag to window.open) where this might be needed but I'm not familiar with the former enough to know.
(In reply to Neil Deakin from comment #26)
> html:select

Hmm, looks like we don't constrain those to the screen at all.
In any case, this also sizes its widget through an nsView, so it goes through nsView::DoResetWidgetBounds, so it probably does the same resize->move dance as XUL popups.

> and the popup flag to window.open

Huh! I didn't know about that one. I'll do some tests.
(In reply to Markus Stange [:mstange] from comment #27)
> Huh! I didn't know about that one. I'll do some tests.

I think we probably should end up removing support for the flag entirely. It's not documented and I think it is only used for the alert notifications.
(In reply to Neil Deakin from comment #28)
> (In reply to Markus Stange [:mstange] from comment #27)
> > Huh! I didn't know about that one. I'll do some tests.
> 
> I think we probably should end up removing support for the flag entirely.
> It's not documented and I think it is only used for the alert notifications.

I agree. I couldn't find any other users apart from nsXULAlerts.cpp either.
Comment on attachment 8386773 [details] [diff] [review]
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure

Looks fine to me.
Attachment #8386773 - Flags: review?(smichaud) → review+
I've pushed part 2 to inbound, because it already fixes the crash on its own:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0daba7f5bcf7

Neil, about part 1: Do you want me to make any changes, or do any more tests, before you review it?
Keywords: leave-open
I assumed you might test or ask someone more familiar with the other popup uses.
Markus, can we land the crash fix on 29 as well? (ie get aurora or beta approval)
Flags: needinfo?(mstange)
Comment on attachment 8386773 [details] [diff] [review]
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure

Sure.

[Approval Request Comment]
I'm requesting approval for 29.

Bug caused by (feature/regressing bug #): unknown, probably in the initial HiDPI implementation
User impact if declined: crash when the UI tour starts and the user has multiple monitors, one of them HiDPI, and positions the window offscreen
Testing completed (on m-c, etc.): manual testing + 1 day on mozilla-central
Risk to taking this patch (and alternatives if risky): very low risk, this patch is just a recursion guard
String or IDL/UUID changes made by this patch: none
Attachment #8386773 - Flags: approval-mozilla-beta?
Attachment #8386773 - Flags: approval-mozilla-aurora?
Flags: needinfo?(mstange)
Comment on attachment 8386773 [details] [diff] [review]
part 2: refuse reentrant nsCocoaWindow::DoResize as an additional safety measure

28 is going to be publish in a few days. We cannot accept this patch for beta.
Attachment #8386773 - Flags: approval-mozilla-beta?
Attachment #8386773 - Flags: approval-mozilla-beta-
Attachment #8386773 - Flags: approval-mozilla-aurora?
Attachment #8386773 - Flags: approval-mozilla-aurora+
What's this staying open for? Tests?
Flags: needinfo?(mstange)
Part 1 needs review and landing
Flags: needinfo?(mstange)
(as the crash is now fixed, I think the remainder of this is much less important - we have few users on mac, and even fewer will have a second screen, and still fewer will open their browser halfway on each screen)
Whiteboard: [Australis:P1] → [Australis:P3]
Removing from the Australis tracking dashboard.
Whiteboard: [Australis:P3]
Whiteboard: [Australis:P-]
Attachment #8386772 - Flags: review?(enndeakin) → review+
Bug 985947 blocks (and is triggered by) this bug's part 2 patch ("refuse reentrant nsCocoaWindow::DoResize as an additional safety measure").
The leave-open keyword is there and there is no activity for 6 months.
:mstange, maybe it's time to close this bug?
Flags: needinfo?(mstange)

The leave-open keyword is there and there is no activity for 6 months.
:mstange, maybe it's time to close this bug?

Flags: needinfo?(mstange)

I think this is no longer an issue because modern versions of macOS don't allow having a window visible across screens with different HiDPI-ness.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(mstange)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: