Claus Brod


Jason Hardester posted today about some of the decisions made at Microsoft around WER, and concluded with a call to action to collect requirements for WER from a developer point of view. I'm glad to oblige. (And thanks, Jason, for your helpful post.)

My main issue with the current implementation of WER, particularly on Vista, is that it makes it hard for a developer to find out whether a crash report actually made it to Microsoft's Watson servers, and whether that report actually contains the kind of data he expects. At least this was my main problem while implementing WER for one of our applications. It took me weeks to understand all the little requirements until I finally saw my first crash reports at Winqual.

Several factors contributed to this. First, the documentation on the new WER APIs is far from sufficient. I thought that the WER API was meant to be used to report crashes to Microsoft - but it really isn't. If you want to use it for this purpose, you have to learn undocumented subtleties about those APIs, which I would never have been able to figure out without some hints from Jason and Saar here in this forum. There is no sample code out there which uses the new WER APIs. (There is now, of course - on my blog.) True, there are some examples which are based on the older ReportFault(), but that API is marked as obsolete in the documentation, and developers are referred to the new WER APIs instead - but the documentation never explains how to use WerReportCreate() & friends so that they can actually replace the older ReportFault().

The second issue is the new minidump protocol on Vista. On Vista, minidumps are only produced if the Watson servers request them. That request, however, will never occur if your application isn't signed properly, or is not registered, or not mapped. Even if you go through those steps, chances are that the server will request the minidumps only after you go to the Winqual portal, check the latest crash reports, and then explicitly "request" minidumps to be collected the next time the same kind of crash occurs. Once you've mastered that step, you'll still have to wait at least three days until you see your crashdump on Winqual. If you're lucky, that is. In my case, I once had to wait over a week.

Issue #2 wouldn't have hurt that badly if CER was still around. With CER, you could very easily configure a developer or test machine so that a) the crashdumps were always produced and b) they were written to some shared directory. Very simple scheme, simple to understand, and reasonably convenient to use.

With Vista, CER is no longer an option. Instead, if you want to collect crash information locally, you have to install and configure Microsoft Operations Manager and its AEM module. This week, I tried this on my Windows XP x64 development system. The download was 280 MB, and when I started the installer, it checked my local configuration and then came up with 10 issues or so which prevented the setup from proceeding - it wanted things like SQL Server, ASP.NET and other server-ish products which I would never want to install on my development system, particularly not for the simple task of writing a few crashdumps to a directory. But even if I was willing to go through the lengthy setup procedure requiring me to install the other 10 packages first before attempting MOM installation again - it still wouldn't let me because it refuses to install on anything but Windows Server 2003 SP2. Which is an OS version which we use only in our IT group here, and not in the labs.

In practice, the hurdle of using MOM/AEM is insurmountable for a developer, IMHO. There are other options to force the system to produce crashdumps and error information, as I've outlined in my blog. But many of them are not exactly obvious, and some require configuration or some setup which is acceptable on a developer system, but not necessarily on a test or customer system.

Now that I've figured out most of the issues, I can describe them reasonably calmly. (Although I'm still not 100% sure that my implementation now works correctly, mostly because whenever we test a new version, we have to wait and wait and wait until we see the results on Winqual. If we ever see them there, that is.) But those who have followed the previous discussions will have noticed that for about a week or so, the whole WER situation, particularly on Vista systems, made me go almost berserk. (Thanks to Jason for explaining and for calming me down; without this, I probably would have burst into flames...)

So what would I like to see?
  • I want CER back. Yes, it's somewhat kludgy, but it does the job of collecting crashdumps for testing purposes.
  • Alternatively, please give us a client which uses the new http/https-based protocol used for AEM and which can be deployed easily on an average developer or tester machine.
  • Fix the WER API documentation, and document which options need to be used to send crash reports to Microsoft.
  • Provide sample code which uses the WER API (and not just the application recovery APIs).
There's probably other stuff that I forget right now, but the above would be a great start.

Thanks a lot for listening!

Claus

http://www.clausbrod.de/Blog




Re: WER wish list for developers 8-)

Jason Hardester - MSFT


Thanks Claus, for starting this up! ...keep 'em coming!

The idea behind WER is to improve the software ecosystem by offering a service that can report, analyze (either automatically using debugger automation or manually), and close the loop with a response for a fix if appropriate. Early development scenarios are very interesting since this is prevention and enables the ecosystem to improve without user impact. It is most cost-effective to fix issues early in development. The reactive scenarios (crashes happening in release code) will always exist though, and it is good to think about these scenarios as well when we think about features.

Claus, I envision that the CER request (or the new 'client') is more along the lines of redirect and auto-process so the volumes of issues (maybe by function within the file) are clear. Maybe bring this into the IDE? Clearly we (Microsoft) need to provide a solution for developers to locally redirect and access crash dumps in early development!

Fixing documentation is known and planned and code samples are being discussed.

What about new features in the WER client or the WER Service? Think about what is hard to do right now around fixing WER crashes, and distribution of the fix. Is it easy enough to set up a symbol server? How about tracking the effectiveness of the fix? There are endless service aspects, and lots of interesting client scenarios that could enable folks to more easily identify problems in released code, prioritize those fixes, release, and notify users of the availability of the fix (and how to install it).

Thanks,

-Jason







Re: WER wish list for developers 8-)

Claus Brod

> Clearly we (Microsoft) need to provide a solution for developers to locally redirect
> and access crash dumps in early development!

Not just in early development. Imagine the following scenario: A customer finds a problem in my software which leads to a crash. He calls our support folks and wants a fix yesterday. Because the problem is not readily reproducible, support asks him for the crashdump data. The customer goes to an easily discoverable directory where crashdump data are sorted by application/crashing module, and sends the crashdump to our support group via email, and they can then analyse it and/or forward it to development. From crash to fix development in 10 minutes!

Alternatively, what if WER provided options to the user to send their crash data to the software vendor rather than to Microsoft? The software vendor would then set up their own (small-scale) equivalent of Microsoft's Watson servers, and could configure them to operate in "eager" mode (always send crashdump data no matter what).

Yet another alternative: If the Watson/Winqual process didn't take as long as it does today, that would also help to deal with such situations. Customer runs app, app crashes, app reports to Microsoft, crash data appear on Winqual 10 minutes later - instead of four days or a week later.

For illustration: To test our WER code, I provoked some crashes on Friday, and today (Tuesday), I still don't see any results on Winqual. This kind of turnaround time is pretty much hopeless, frankly.

While I love the ideas you presented, such as improved IDE integration etc., I'd rather see the current process fixed first. Once that has been done, more and more developers will use it, and you'll be swamped with ideas anyway.

Claus






Re: WER wish list for developers 8-)

DHON

Further to what Claus has mentioned I would like to add the following points.

Some of the documentation on WER available on the net is obsolete, so developers are misled. I mentioned this in one of my posts. The available information stated that once you map a version, say x.x, you do not need to map later versions of the same product separately. Later on, Jason confirmed that mapping has to be explicit for each release, and agreed to have that part of the documentation updated. For some time I had a different understanding. So one of the first steps should be to update all the available information on WER on the net.

One minor change I would like to see: use either "Bucket ID" in both Problem Reports and Solutions and in Winqual, or "Event ID" in both places, to make the concept easier to understand.

Also, please improve the turnaround time for reports in Winqual. 1 - 4 days is too much from a developer's perspective.

Thanks,

DHON





Re: WER wish list for developers 8-)

Jason Hardester - MSFT

Thanks DHON,

We can make the change in text from Event ID to Bucket ID to match the client logging, and update the documentation (folks are already working on that).

What does the turnaround time (aggregating and calculating growth trends for WER reports from around the world) need to be for developers to be effective?

Thanks,

-Jason






Re: WER wish list for developers 8-)

Claus Brod

Many support contracts with customers promise a reaction within 24 hours - that would be the upper limit for the processing delay which makes the Winqual portal a viable tool for customer support in crash situations.

However, it is very desirable to have turnaround times even significantly lower than that (less than an hour).

I realize this is asking a lot of the Watson/Winqual servers; after all, they probably have to process huge amounts of data each day. This is why I would love to see an option for ISVs to set up their own local "Watson servers" to which customers could send crash data when the WER dialog appears.

For instance, an application could use a WER API configuration function to specify an alternate WER server. In our case, this could be something like watson.cocreate.com. When a crash occurs in one of our apps, WER would display its usual dialogs. Instead of presenting the simple choice between "Don't Send" and "Send", the user would be informed that he can send the crash data to Microsoft (default), to us, or to both.

From a WER API point of view, this is probably a simple feat. However, it means that we would have to set up a publicly accessible CER/AEM-like server - which could be a little challenging, for example from a security point of view. Also, since the WER API needs to change, it would mean we wouldn't see any improvement at all before at least Vista SP1 or SP2, probably even later.

But alternatively, what if we could tell the Winqual portal that crashes reported for our applications should be routed immediately to one of our servers? This way, the WER API wouldn't need to change at all, and since only Microsoft would know about the whereabouts and details of our server connection, secure operation would be easier to achieve.

So once we'd register our own Watson-like server with Microsoft, here's a typical sequence of events:
  • Application crashes at customer site
  • WER dialog is displayed, customer chooses to send crash data
  • Watson server at Microsoft checks if there is an ISV registration for the crashing app, and whether the ISV has also registered an ISV-local Watson server
  • If so, the crash report is immediately forwarded to the ISV's server.
  • Process the crash reports as usual, and make them available through Winqual.
In this scenario, Winqual delays of 3-4 days or even a week wouldn't be a big issue anymore for an ISV.
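
The sequence above could be sketched roughly as follows. This is only an illustration of the proposed flow: the registry of ISV servers, the `forward` callback and all class and function names are hypothetical and not part of any existing WER or Winqual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CrashReport:
    app_name: str   # crashing application
    payload: bytes  # opaque crash data (for example a cab file)


@dataclass
class WatsonRouter:
    """Hypothetical central server that can forward reports to ISV servers."""
    # app name -> address of the ISV's own Watson-like server (assumed registry)
    isv_servers: Dict[str, str] = field(default_factory=dict)
    # reports queued for the usual Winqual processing
    winqual_queue: List[CrashReport] = field(default_factory=list)

    def register_isv_server(self, app_name: str, url: str) -> None:
        self.isv_servers[app_name] = url

    def receive(self, report: CrashReport,
                forward: Callable[[str, CrashReport], None]) -> Optional[str]:
        """Handle one report; return the ISV server it was forwarded to, if any."""
        url = self.isv_servers.get(report.app_name)
        if url is not None:
            forward(url, report)            # immediate hand-off to the ISV
        self.winqual_queue.append(report)   # normal processing happens regardless
        return url
```

The point of the design is that forwarding and the regular Winqual processing are independent, so the ISV gets the data right away even if the portal takes days to catch up.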

Claus

http://www.clausbrod.de/Blog





Re: WER wish list for developers 8-)

DHON

Hi Jason,

The application I am working on already sends a crash report to one of our company's email addresses. So I am not too worried about addressing customer issues (crashes), as I don't need to depend on Winqual for that.

But at present I am doing an implementation for Vista certification, and one of the requirements is that WER reports must be sent to Winqual. If I post a report, it takes a long time until the report shows up in Winqual. So my point is that developers have to wait a long time to verify their changes. I would like to see this turnaround time reduced.

Regards,

Dhon





Re: WER wish list for developers 8-)

Jason Hardester - MSFT

Thanks Dhon,

I understand... reduce the time it takes for a report to be viewable in Winqual.

On an aside, I want to point out that sending a crash to a mail client does not prioritize or organize these crashes but rather provides a simple transport. There is still the tax of evaluating each report and determining the priority and severity of the crash. If your reporting scenario is smart enough to provide a classification that can be automatically processed and 'triaged' against a bug bar based on volumes and growth over a period of time relative to the development cycle of the product and assign the bugs to the developer ...that is gold. WER Services does the classification and prioritization and also offers a mechanism to automatically share information with your end-users when a solution to their crash is available.
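
As a rough illustration of what triage by volume and growth could look like, here is a minimal sketch. The bucket IDs, the `bug_bar` threshold and the ranking rule are assumptions made up for this example; this is not how WER Services actually classifies or prioritizes reports.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def triage(previous: Iterable[str], current: Iterable[str],
           bug_bar: int = 5) -> List[Tuple[str, int, int]]:
    """Rank crash buckets by current volume, then by growth since the previous period.

    `previous` and `current` hold one bucket ID per received report.
    Returns (bucket, volume, growth) for buckets whose volume meets the bug bar.
    """
    before = Counter(previous)
    now = Counter(current)
    rows = [(bucket, count, count - before[bucket])
            for bucket, count in now.items() if count >= bug_bar]
    # Highest volume first; ties broken by largest growth, then bucket name.
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))
```

A mailbox full of individual crash mails gives you none of this for free; somebody has to do the counting per bucket before any ranking is possible.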

Do you have any ideas for features that would make your life easier around Error Reporting? Maybe in the way we classify the reports, or a view that enables you to select different builds of a process and show the crash reports in a visual way (maybe an area chart) that overlays the process information to show improvements build over build?






Re: WER wish list for developers 8-)

Claus Brod

Can't speak for DHON, and my own priorities will probably change once we're flooded by crash reports in Winqual, but right now, the nicest visual classification and representation wouldn't buy me anything unless the turnaround times can be improved very significantly.

It would also help a lot if the Winqual web server itself would be snappier. Right now, any click on pretty much any link on that site takes at least 5, often 15 seconds or more until something happens... not very useful when you want to browse a number of incoming bug reports.

Don't get me wrong, the whole WER/Winqual idea is great, and it's fantastic that Microsoft shares this service with ISVs and developers, but is it possible that all those crash reports are simply too many to be processed efficiently at Microsoft? If so, wouldn't a decentralized approach make more sense?

Just my 0.02E,

Claus





Re: WER wish list for developers 8-)

DHON

WER is quite new compared to the existing flow of my application. I agree that one has to pay a tax for this sort of flow (in my application). If WER can solve that part, great. But we still have to wait and see to what extent Winqual can help us in solving our crash issues, and most importantly how it affects the turnaround time. This view is with respect to my product; maybe other people have already started benefitting from Winqual.

Right now my priority is to get WER implemented. Once we feel that WER is speeding up our turnaround time for crashes, then of course we can change the flow of our application. Otherwise, changing the flow of an age-old application is the biggest hurdle in any product development.

Regards,

Dhon





Re: WER wish list for developers 8-)

Matt Houser

We used to have our own crash report system, but we moved to using WER simply because it was a requirement for Certification.

As such, we're finding that there is some good and bad to using WER.

(a) We cannot link a crash with a customer. If a customer calls us up about a crash they're having, we used to be able to look through the crash reports for their email/name and process the crash. We cannot do this in WER (even if the turnaround time was faster); the link to a customer just does not exist (at least as far as I have been able to see). We need an option for a customer to provide info to help us.

(b) Sometimes a crash is due to specific data. It would be nice if a customer could "attach" sample files or data to the crash report being sent to Microsoft and then downloaded by the developer.

(c) It would be nice to have a Crash-by-date (not all grouped by event) view so we can see the crashes as they come in.

(d) We used to be able to just glance at our crash data (because it was text) and in many cases, we could determine it's the same crash as one in another version (i.e. link events between versions). However, using WER, we cannot know any details of a crash until we download the dump and load it in a dev environment. This process is much slower because it means we need to fish out the old builds. Maybe there's some tool I am not aware of that can combine dumps and MAP files to get a stack in text format?

(e) Add an option to download all cabs for an event. Recently it was changed to download a cab per crash. If I want them all, then I have to download many.
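
On (d), for what it's worth: the lookup half of such a tool is conceptually simple. Below is a rough sketch, assuming the MAP file has already been parsed into (start address, symbol name) pairs and that the stack addresses taken from the dump have been rebased to match. The parsing step is left out, and the function name and output format are made up for illustration; this does not describe any existing tool.

```python
import bisect
from typing import List, Sequence, Tuple


def resolve_stack(symbols: Sequence[Tuple[int, str]],
                  addresses: Sequence[int]) -> List[str]:
    """Map each address to 'symbol+0xoffset' using the nearest preceding symbol."""
    table = sorted(symbols)                  # sort by start address
    starts = [start for start, _ in table]
    lines = []
    for addr in addresses:
        # Index of the last symbol that starts at or before addr.
        i = bisect.bisect_right(starts, addr) - 1
        if i < 0:
            lines.append(f"<unknown>+0x{addr:x}")
        else:
            start, name = table[i]
            lines.append(f"{name}+0x{addr - start:x}")
    return lines
```

With stacks in this text form, two crashes from different builds could be compared by symbol name without loading either dump into a debugger.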

Thanks,

...Matt





Re: WER wish list for developers 8-)

Matt Houser

(f) Allow us to add comments to an event and also add comments and/or otherwise mark cabs so we know what we've downloaded and looked at.

Thanks,

...Matt





Re: WER wish list for developers 8-)

Jason Hardester - MSFT

Thanks Matt! These are very good suggestions, and we will review them today. Some are already understood (like the ability to download all cabs).

Offhand:

For A), the privacy policy prevents us from doing this for legal reasons since the privacy laws are different in various parts of the world. Can you elaborate how this would be helpful to you (connecting an individual user to a crash report)? At Microsoft, we use the ID to correlate an issue (bucket of reports) to a user if the user calls our support asking for general help with crashes.

For B), we will be enabling you to set custom data types to collect through the service. The Error Reporting client is smart enough to collect the additional data requested on the server side at the time of the report. This will be one of our new features in the very next WER Service release.

For C) ...this isn't something we have thought about. Can you talk a bit here on how this will be helpful? Offhand my first reaction was that you would not be able to properly prioritize your issues and would simply try to look at each report for that day.

For D), Let me talk with the teams here and see if there is a solution that maps to what you are looking for today.

For E) ...Yep.

For F), would you use these comments for tracking an issue to closure, or for general information purposes? Would this request dovetail with assigning ownership?

Kind regards,

-Jason






Re: WER wish list for developers 8-)

Matt Houser

(a) The common case is that a customer calls support, we tell them to submit the crash to Microsoft, but we then have no idea which crash report belongs to that customer (for the developers to examine and provide feedback to support/customer).

(b) Great.

(c) This may help out with (a) above. But for instance, if I know that Q/A had a crash on Thursday and I want to track it down to debug, then a view by date would be helpful. It's impossible in a by-event view.

(d) Thanks.

(e) Thanks.

(f) Part of it is to provide some comments from the developer. For example, say a customer has a crash and our developer, via (a) above, figures out which one is his; then we could say "oh, that's the so-and-so crash". Otherwise, we have to download the cabs and debug just to determine that we already know the details of the crash. People could put their bug-tracking IDs in it also. As well, when looking at the "Hotlist", when I see the top items, I cannot tell if we've fixed them or not.
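
To illustrate (c): the view I have in mind is just a different grouping of the same reports. A small sketch follows; the report fields (`event_id`, `received`) are hypothetical and not Winqual's actual data model.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, NamedTuple


class Report(NamedTuple):
    event_id: str   # bucket/event the report was classified into
    received: date  # day the report arrived


def by_event(reports: Iterable[Report]) -> Dict[str, List[Report]]:
    """Group reports per event, roughly what a by-event view shows."""
    groups: Dict[str, List[Report]] = defaultdict(list)
    for report in reports:
        groups[report.event_id].append(report)
    return dict(groups)


def by_date(reports: Iterable[Report]) -> Dict[date, List[Report]]:
    """Group reports per arrival day, oldest day first."""
    groups: Dict[date, List[Report]] = defaultdict(list)
    for report in sorted(reports, key=lambda r: r.received):
        groups[report.received].append(report)
    return dict(groups)
```

With the by-date grouping, "the crash Q/A had on Thursday" is a single lookup by day instead of a search through every event.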

...Matt





Re: WER wish list for developers 8-)

Joel Stein

Jason Hardester - MSFT wrote:

> For A), the privacy policy prevents us from doing this for legal reasons since the privacy laws are different in various parts of the world. Can you elaborate how this would be helpful to you (connecting an individual user to a crash report)? At Microsoft, we use the ID to correlate an issue (bucket of reports) to a user if the user calls our support asking for general help with crashes.

In our case it would be helpful to know if the report came from an internal (QA) or external (customer) source. Would privacy policy concerns allow you to categorize that far?

We often have product modules on different release cycles because of inter-dependencies, so it is not unusual for product A to be released, and still get crash reports from Product B (in test).

thanks!

-j.