Error Handling and Reporting in Applications

Every application has to decide what to do with errors. An unexpected condition is bound to happen one rainy day, no matter how good your QA is.

When a problem does happen, does your application display a meaningful message to the user, or something to the tune of “A system error occurred. Try again later.”? Are application errors automatically monitored, collected, triaged, and reported to developers?

Handling of Expected and Unexpected Errors in Code

In languages with structured exception handling, a runtime error that happens inside code is encapsulated in some sort of Exception object. The basic question to ask when handling it is whether the error is, by its nature, an unexpected error condition or not.

Unexpected error conditions are:

a) problems happening outside of your system control: for example, the network went down, the system password expired, memory ran out, etc.

b) defects in code

In both of these cases, code receiving an unexpected error can’t do much about it. Retry the call? Well, if the called code has a reproducible defect, you’re just going to reproduce it again. Retries might help with temporary outages: e.g. the database went down, but DBAs are busy restarting it right now. However, retries usually have a narrow scope of usefulness and are intrinsically limited: by the number of retries, the time interval, the user expecting a timely response, etc. Once the limit is reached, the retry logic has to be abandoned and we’re back to handling the original problem.
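The limits on retries described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a recommended implementation: TransientError, call_with_retries, and the parameter names are hypothetical and do not come from any particular library.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that may go away on their own (e.g. a database restart)."""


def call_with_retries(operation, max_attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Call operation(); retry only on TransientError, up to max_attempts calls in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Limit reached: abandon the retry logic and let the original error propagate.
                raise
            sleep(delay_seconds)
```

Any other exception type (for example, one caused by a defect in code) is not caught at all, so it fails the execution path on the first attempt.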

Error Swallowing Anti-Pattern

I’ve seen this anti-pattern in most projects and languages I dealt with:

// pseudo-code

try {
  doSomething(); // this call errors out
} catch (Exception) {
  // we're supposed to handle errors, right?
  // log some message, perhaps?
}

// continuing... ?
// why do we end up here? the rest of the code might not expect this exceptional condition to have happened

This is what’s known as Error Swallowing. Sweeping under the rug, isn’t it? What happens when an error is “swallowed” could be quite unpredictable. Perhaps, a report rendered ends up with a blank space instead of a piece of data. This situation could lead to bad business decisions and outcomes beyond the immediate system issue.

Error Swallowing is a violation of an important technique in software development: Fail Fast (a.k.a. Fail Early).

Failing fast is a nonintuitive technique: “failing immediately and visibly” sounds like it would make your software more fragile, but it actually makes it more robust. Bugs are easier to find and fix, so fewer go into production.

Jim Shore, September/October 2004 IEEE SOFTWARE

This is an anti-pattern with the shortest fix imaginable: simply not doing this is often a good enough fix.

// pseudo-code

doSomething(); // this call errors out
// unexpected errors propagate

Advice #1. Let unexpected errors propagate and fail the execution path fast.

Further discussion of the Error Swallowing anti-pattern can be found in Effective Java, 3rd Edition, Item 77: Don’t ignore exceptions.

Expected Errors and Exception Translation

Of course, some exceptional conditions can be expected by system developers. What if a user enters an invalid piece of data? The system should ideally validate all user input and, when validation fails, respond with a good human-readable message. Inside the layers of code that handle expected exceptional conditions, the raised error should be specific to the particular failed condition. In HTTP, user input that fails validation maps to 400 (Bad Request), and a request the user is not allowed to make maps to 403 (Forbidden). A RESTful service on the server side should return them with an error message payload. More generally, you can use e.g. java.lang.IllegalArgumentException in Java or System.ArgumentException in C#.

Should code intercept low-level exceptions and wrap them into higher-level exceptions? This practice is known as Exception Translation. The usefulness of such translation for unexpected exceptions is limited. So what if you wrapped DbPasswordExpiredException into ServiceUnavailableError? It would not make a difference for an end user. The bottom line is that the system errors out unexpectedly because it broke internally.

Such a translation makes the most sense when you manage to anticipate a particular problem and display a meaningful message to the user. If you, say, anticipate an error when a user’s credit card is valid but over the limit or blocked, you can intercept a particular exception from a lower-level payment provider service and translate it into a custom error, resulting in “Your credit card was declined. Please update your payment method.”
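The credit card case could look roughly like this in code. It is only a sketch: CardDeclinedError, PaymentDeclined, and charge are hypothetical names invented for illustration, and a real payment provider client would define its own exception types.

```python
class CardDeclinedError(Exception):
    """Hypothetical low-level error raised by a payment provider client."""


class PaymentDeclined(Exception):
    """Hypothetical application-level error carrying a user-facing message."""

    def __init__(self, user_message):
        super().__init__(user_message)
        self.user_message = user_message


def charge(provider_charge, amount):
    """Translate only the anticipated failure; every other exception propagates untouched."""
    try:
        return provider_charge(amount)
    except CardDeclinedError as error:
        # Keep the low-level error attached as the cause for developers.
        raise PaymentDeclined(
            "Your credit card was declined. Please update your payment method."
        ) from error
```

Only the one expected condition is translated here. An unexpected failure inside provider_charge still surfaces as itself, in line with Advice #1.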

Reporting of Unexpected Errors

From a system development perspective, developers should receive as much technical information as possible when unexpected errors happen. This could happen in one of two ways: either from the backend (think, error logs) or from user reports.

All errors could be logged; logs could be monitored and unexpected error reports could be extracted and sent to a developer team’s notification pipeline.

One thing to keep in mind when developing a notification pipeline for unexpected errors is to avoid flooding developers with notifications:

  • consider implementing periodic (e.g. nightly) rather than real-time notifications
  • consider filtering out repeating errors which are caused by known (filed) defects
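Both suggestions can be combined into a periodic digest. The sketch below assumes each logged error has already been reduced to a "signature" string (for example, exception type plus the top stack frame); build_digest and the signature format are hypothetical.

```python
from collections import Counter


def build_digest(error_signatures, known_defects):
    """Return (signature, count) pairs for errors not tied to known defects, most frequent first."""
    counts = Counter(s for s in error_signatures if s not in known_defects)
    return counts.most_common()
```

A nightly job could send the returned list as a single notification instead of one message per occurrence.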

Advice #2. Monitor and report unexpected system errors in Production.

When an unexpected error inevitably happens, what should the user see?

Unexpected error report in macOS: iCal application

There are security guidelines that warn against showing technical details like this to users on the Internet because bad actors can make the system fail and then get clues out of technical details. This practice of “Error Detail Hiding” is a part of guidelines in the OWASP’s “Improper Error Handling” risk.

Error Detail Hiding, though, if followed blindly, might have a large cost. The development feedback loop should be as short as possible. The development loop is making a change, deploying code to a target environment, and then observing the result. When a system is being developed, each error cause needs to be as obvious to developers as possible. While developers can often run the system on their machines (that is, in a Local environment) and just watch the error details in logs, the same can’t be said about all other non-production environments.

Non-production environments are not available to end users and tend to be far less exposed to security risks. While security considerations are, obviously, important, each system architecture strikes some balance of functional, User Experience, and other requirements: performance, ease of maintenance, resilience, etc. While trying to strike that balance, keep in mind that Error Detail Hiding is quite detrimental to ease of maintenance and speed of development.

Advice #3. Integrate detailed error reporting into a system and always make technical error details immediately accessible in non-Production environments.

The Improper Error Handling risk used to be in the OWASP Top 10 risks (in 2003-2007) and people, understandably, were reluctant to show error details in their apps. The risk is no longer there as of this writing (in the 2021 Top 10 version). It is not hard to guess why: for example, a basic technology stack can be detected without the arduous process of trying to make the system fail. BuiltWith, as of this writing, detects 142 technologies used to run

But, what if you still want to minimize the risk? If robust error monitoring/reporting is in place on the backend, then it is fine to hide the details in production because developers would get reports anyway. But if you have nothing like that for your system, then Error Detail Hiding would substantially harm development productivity.

Advice #4. Only hide error details in Production when you have robust system error monitoring (per Advice #2) in place.
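Advice #3 and Advice #4 together might be sketched like this. The function name, the environment labels, and the payload fields are all assumptions made for illustration; the point is only that details are always logged, while their visibility to the user depends on the environment.

```python
import traceback
import uuid


def build_error_response(error, environment, log=print):
    """Build a user-facing payload; technical details are included only outside Production."""
    error_id = uuid.uuid4().hex  # lets a user report be matched to the logged details
    details = "".join(traceback.format_exception(type(error), error, error.__traceback__))
    log(f"error_id={error_id}\n{details}")  # always recorded on the backend
    response = {"message": "A system error occurred.", "error_id": error_id}
    if environment != "production":
        response["details"] = details
    return response
```

The error_id gives Production users something to quote in a report even when the details themselves are hidden.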


How slow can you go?

System performance testing is often a high-ceremony effort performed for major releases only:

  • A project in its own right, involving:
    • Requirement Analysis and Gathering
    • PoC/Tool selection
    • Performance Test Plan
    • Performance Test Modeling
    • Test Execution
    • Test Analysis
  • Done in a Test environment designed similar to Production
    • Highly controlled environment
  • Aiming to produce high loads and hard results: numbers, pass/fail
  • Done by performance testing professionals

In today’s Agile and DevOps era, teams crave rapid feedback, and that certainly includes the impact of changes on system performance from one build to another. Performance testing can be done with reduced ceremony, continuously:

  • As a part of a build pipeline
  • In a lower-end Test environment
  • Aiming to produce build trends because results could be more volatile
  • Done by a DevOps team

Continuous Performance Testing is not difficult to set up on most projects. All you need is the following three ingredients, and each of them could be an open-source tool:

  1. Load Generator Tool
  2. Continuous Integration Server
  3. Chart Generator

The overall architecture (the big picture) of a solution relies on the usual Continuous Integration approach:


Load Generator Tools

Load generators can load a system-under-test with (HTTP) requests simulating the activity of virtual users. They support load logic (# of simulated clients, # of loops, spikes, fixed time testing, etc.) as well as simple scripting and assertion logic. Load generation could also be distributed to multiple machines.

Open source choices include:

  • Apache JMeter: a mature performance testing IDE. Tests are developed primarily via composition and configuration of JMeter components.
  • Gatling: Scala code-based tests, which can be generated via a recorder UI
  • Locust: Python code-based tests

You can find a more comprehensive comparison here: Open Source Load Testing Tools: Which One Should You Use?

Commercial tools include LoadRunner, Telerik Test Studio and New Relic Synthetics. Notably, Microsoft performance testing offerings in Visual Studio Enterprise and Azure DevOps were deprecated in 2019 and Microsoft recommends a migration path to JMeter.

Finally, load generators can be hosted and used as a service. Offerings in this space include BlazeMeter and WebPagetest (a cross of performance testing and web page loading profiler).

Continuous Integration (CI) Servers

Jenkins is a popular open-source choice offering a wide variety of plug-ins, including chart generators. Any other CI server is also an option.

Chart Generators

The critical part of Continuous Performance Testing is providing feedback to the DevOps team, ideally on every build. Feedback can take the form of generated charts showing performance trends. One way to produce them is to feed raw output data (e.g. CSVs, XMLs) to dashboards. An easier solution to implement, though, is to use the CI server’s built-in or third-party chart generation capabilities. There are two good options if you’re running Jenkins.

Jenkins Performance Plug-In

Jenkins Performance Plug-In can be used to chart the runtime of executed tests. It understands JMeter, JUnit and Taurus formats. Taurus is a wrapper around 20+ other load generators.

Usage Tips:

  • Use Advanced > Display Performance Report Per Test Case = on. This charts one line per performance test case (e.g. a JMeter Sampler), using its average data point.
  • Chart titles = data file names. You can omit the .xml suffix and give files verbose names with spaces to use them as chart titles. Patterns containing spaces would be matched. For example, the build/jmeter-report/ * pattern would match any data file name that starts with a space.
  • Performance Plug-in would publish reports only for data files found in the latest build. Data files previously found in builds, but not present in the latest build are ignored. If you want to start a fresh baseline, you can do this by simply renaming data file(s).
  • If there is a single data file, its charts are published on the main job page. If there are several reports, charts could be found on the job’s Performance Trend page, ordered by titles.
  • You have to have at least two builds in order to see trend charts. If you have a test, which was temporarily disabled or removed then re-enabled, you would need to have it in at least two builds in order to see a corresponding trend line.
  • Deleted builds are automatically removed from reports.
  • It is recommended to split reports containing more than 10-20 tests so that they are easier to read.


Jenkins Plot Plug-In

Jenkins Plot Plug-In can be used to chart arbitrary generated data. It’s useful for fixed-time performance tests generating, for example, Transactions Per Second (TPS) metrics.
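A build step could compute the TPS figure and write it to a small data file for the plug-in to pick up. The sketch below assumes the plug-in's properties-file input, which to my understanding reads the plotted value from a YVALUE key; treat that key, the file name, and both function names as assumptions to verify against the plug-in documentation.

```python
def transactions_per_second(completed_transactions, duration_seconds):
    """TPS for a fixed-time test: completed transactions divided by the test duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return completed_transactions / duration_seconds


def write_plot_properties(path, tps):
    """Write a one-line properties file holding the value to plot (assumed key: YVALUE)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"YVALUE={tps:.2f}\n")
```

For example, 1200 transactions completed in a 60-second run give 20 TPS, which would be stored as one data point per build.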

Using OneDrive for Business to Sync Large Volumes of Data

Do you use Office 365? If you do, you can also use 1 TB of cloud storage at no additional cost, courtesy of OneDrive for Business. This gives Microsoft a leg up on competitors: Dropbox, Box Sync, Google Drive, iCloud, etc. As we learned during the First Browser War (IE vs Netscape Navigator), it’s hard for a paid product to compete with a free one. On the other hand, OneDrive places a lot of restrictions on file names, file path lengths and so on and might be a bit rough around the edges as of this writing (February 2017). Nevertheless, why not give it a shot?

I took the plunge on my Mac, put about 150 GB of data into OneDrive on Office 365 and have been syncing it for several months. The road was bumpy, to say the least, but in the end I was able to resolve all the issues I have faced so far.

Here is what I learned. (These tips apply to macOS or Windows. Included scripts are Bash scripts for Mac, but they could be potentially ported to PowerShell for Windows.)

The information below applies to syncing your local files to OneDrive cloud and between computers. If you need to sync existing (e.g. on premises) SharePoint libraries, you would face additional restrictions.

Tip 0. Give up on syncing OS-specific data.

On Macs, the following artifacts either don’t sync or sync incorrectly:

  1. Symbolic links
  2. Aliases
  3. Extended file attributes: tags, etc.

Tip 1. Scan your files and folders for long path names.

File path length restriction: The entire path, including the file name, must contain fewer than 255 characters.

Long file paths were immediately killing my OneDrive client (with a pop-up dialog message). They were fatal.

You can scan for these problems before putting your files into OneDrive (see shell script in the Tip 2 section). Once you identify long file paths, shorten your folder structure or archive files as necessary.

Tip 2. Scan your files and folder names for problematic characters.

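OneDrive on Office 365 does not sync files or folders that:

  • Include any of the following characters: \#%:"|?*/
  • Begin or end with a space
  • End with .
  • Begin with ..
  • Have very long path names (255 characters or more)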

(BTW, practically all characters are allowed in macOS filenames except ‘/’ at the Unix layer. ‘:’ is not allowed at the Carbon layer, but it is possible to create filenames with ‘:’.)

The problematic files or folders can’t be synced and manifest themselves as sync errors (that is as those badges with numbers on a OneDrive Dock icon on Mac). If you have many of them, they could cause other issues, for example, an unresponsive OneDrive client:


Or very wide OneDrive UI windows:

[Screenshot: a very wide OneDrive window]

Here is the combined shell script to detect Tip 1 and Tip 2 problems:

#!/usr/bin/env bash
#set -x
if [[ "$1" == "-h" ]]; then
exec 1>&2
cat << EOF
Finds files and directories in OneDrive that are invalid to sync

Practically all characters are allowed in macOS filenames except / at the Unix layer.
: is not allowed at the Carbon layer, but it is possible to create filenames with :. See and

OneDrive does not sync files or folders that:
* Include any of the following characters: \#%:"|?*/
* Begin or end with a space
* End with .
* Begin with ..
* Have very long pathnames (>= 255 characters)

Usage: $0 [-h]

-h Help
Ignores unreachable directories (and other errors)
EOF
exit 127
fi

# Comment out the following line to search prior to putting files into OneDrive
cd ~/OneDrive*
echo "Searching for files and directories that are invalid to sync in $PWD..."

find -E . -regex '(.*[\#%:"|?*].*)|.{255,}' -or -name ' *' -or -name '* ' -or -name '?*.' -or -name '..*' 2>/dev/null
# -regex pattern is true if the whole path of the file matches pattern.
# For example: to match a file named `./foo/xyzzy`, you can use the regular expression `.*/[xyz]*` or `.*/foo/.*`, but not `xyzzy` or `/foo/`.
# -name '*/*' could be added to test for filenames including /, but those are illegal in macOS.
# \ (backslash) inside a pattern character class is used as escape only when followed by any of these: ^-]\
# '?*.' glob pattern prevents the root directory (.) from matching

echo "Done."
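For machines without BSD find (the -E flag above is specific to it), the same rules can be expressed in a portable way. The following Python sketch mirrors the rules listed in the script's help text; the function names and the wording of the problem labels are my own, and paths are assumed to be relative with "/" separators.

```python
INVALID_CHARS = set('\\#%:"|?*/')
MAX_PATH_LENGTH = 255  # a path must be shorter than this


def name_problems(name):
    """Return the reasons why a single file or folder name would not sync."""
    problems = []
    if INVALID_CHARS & set(name):
        problems.append("invalid character")
    if name.startswith(" ") or name.endswith(" "):
        problems.append("leading or trailing space")
    if name.endswith("."):
        problems.append("trailing dot")
    if name.startswith(".."):
        problems.append("leading double dot")
    return problems


def path_problems(path):
    """Check the overall length of a relative path and then each of its components."""
    problems = []
    if len(path) >= MAX_PATH_LENGTH:
        problems.append("path too long")
    for part in path.split("/"):
        if part in ("", "."):
            continue  # skip empty components and the current-directory marker
        problems.extend(name_problems(part))
    return problems
```

Calling path_problems on every path produced by a directory walk gives a report equivalent in spirit to the find command, though the two have not been compared side by side.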

Once you identify problematic names, you have several choices on how to resolve them:

  1. Rename
  2. Archive (into ZIP, RAR, etc.)
  3. Move out of OneDrive
  4. Delete if not needed.

Tip 3. Watch for sync conflicts.

If you sync among two or more computers, be prepared for conflicts to occur sooner or later. A conflict is a situation where OneDrive can’t decide on the definitive version of a file to use. In this case it silently creates a second file in the same directory using the naming convention {original name}-{computer name}.{original extension}. Often the file contents are the same, but to avoid losing data or wasting storage you need to detect and manually resolve these conflicts.
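The naming convention makes conflict copies easy to find with a one-off scan. Here is a minimal Python sketch of such a scan; is_conflict_copy and find_conflicts are hypothetical helper names, and the check can produce false positives for files whose names legitimately end with the computer name.

```python
import os


def is_conflict_copy(filename, computer_name):
    """True if the base name (without extension) ends with '-<computer name>'."""
    stem, _extension = os.path.splitext(filename)
    return stem.endswith("-" + computer_name)


def find_conflicts(root, computer_name):
    """Walk root and return the sorted paths of files that look like conflict copies."""
    conflicts = []
    for directory, _subdirs, files in os.walk(root):
        for filename in files:
            if is_conflict_copy(filename, computer_name):
                conflicts.append(os.path.join(directory, filename))
    return sorted(conflicts)
```

A scan like this only reports conflicts that already exist at the time it runs.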

The following shell script watches for conflicts in real time. When a conflict is found, the script logs the conflict and notifies via Growl. This script is designed to run in the background. You can launch it when your computer starts, e.g. using Login Items or launchd:

#!/usr/bin/env bash
#set -x
# Send all output (stdout and stderr) to the log file
exec 1>>~/Library/Logs/conflict-watches.log 2>&1
echo "$(date) conflict-watch-onedrive> Starting ..."

# Assumption: the original definition of watchedPath did not survive in this post;
# this mirrors the ~/OneDrive* location used by the scan script in Tip 2
watchedPath=$(echo ~/OneDrive*)

# Base file name marker, e.g. "Communications-MyComputer.msf" or "Communications-MyComputer"
conflictMarker="-$(hostname -s)"

echo "$(date) conflict-watch-onedrive> Watching $watchedPath/ ..."

# Enters the waiting loop
# Includes only Created and Renamed (= moved) events. Deletions, updates and other events are ignored.
fswatch -r0 --event Created --event Renamed -e ".*" -i "$conflictMarker" "$watchedPath" | while IFS= read -r -d "" filename; do
  echo "$(date) OneDrive conflict: $filename"
  growlnotify "OneDrive conflict" -m "$filename"
done

Tip 4. Reboot less often.

After OneDrive starts, it takes hours on my machine to finish scanning of about 100 GB of data. During all that time OneDrive consumes about 100% of one core’s CPU:


The temperature goes up and the fans spin all that time. This is a lot of wasted resources. To avoid this, I changed my habits and stopped shutting down my laptop daily.

Tip 5. Sync only what needs to be synced. Don’t sync large, frequently changed files.

If your data already lives on some servers and gets synced among your devices, you don’t need to put it into OneDrive. For example, corporate Exchange email is stored on Exchange servers and any changes in it get automatically synced among Outlook clients. On the other hand, personal Outlook archives might be locally stored files, which you might want to put into OneDrive.

I have private email stored in a Mozilla Thunderbird profile, in a combination of Thunderbird IMAP and POP mailboxes and email archives. I tried to put the entire profile into OneDrive to sync all email along with all the Thunderbird settings. It turned out to be a bad idea because of the way Thunderbird stores and caches mail. Even for IMAP mailboxes, I have, for example, a 1.7 GB (after compacting) GMail All Mail folder, which is stored in a single file.

In OneDrive for Business, unlimited versioning is usually turned on by default (this might depend on how your IT administrators configured your Office 365 SharePoint). Unlike in, for example, Dropbox, previous versions in OneDrive consume storage and count towards the 1 TB storage limit.

Frequent updates to the 1.7 GB file and my other large mailbox folders were stored as separate versions, and they quickly consumed all 1 TB of OneDrive space. The solution for this issue was to move the Thunderbird profile out of OneDrive and sync it by other means.

Tip 6. Watch for phantom syncs.

A couple of my files caused OneDrive to never finish: files were always shown in the process of being synced, e.g.:


Earlier versions of the OneDrive client didn’t have a detailed progress window to see what’s going on. After the progress window was added in later versions, I was able to catch these phantoms and dig deeper. It turned out that the OneDrive server was reporting some sort of virus problem in a couple of my old mailbox folders, and that’s probably why the phantoms were occurring. Once I removed the offending files, the problem was resolved.

Tip 7. Use selective sync to solve round-robin crashes.

OneDrive synced all the files, but after some time entered a round-robin crash cycle that went like this:

  1. OneDrive keeps crashing after start on computer A and prompts to reset it. Reset OneDrive on computer A.
  2. Wait hours for initial sync to complete on computer A.
  3. OneDrive crashes on computer B immediately and keeps crashing after start. Reset OneDrive on computer B.
  4. Wait hours for initial sync to complete on computer B.
  5. Go to step 1.

I resolved this situation by selectively syncing only some files on both computers and gradually increasing the sync scope until the whole drive was covered and full sync (sync all files) could be turned on.

When you sync selectively, the OneDrive folder is emptied initially. Hence, if you need files from it in the meantime, move them aside. A full backup is highly recommended.

And here it is: a successfully synced OneDrive.


July 17, 2017 Update: added Tip 0 and Tip 7.

March 29, 2017 Update: script has been updated to respond only to Created and Renamed events.

The Quest for Great Coffee: Kiev

Recently I was in Kiev on vacation and, taking advantage of the occasion, explored its vibrant coffee scene. Years ago (think former Soviet Union times), Kiev had several great cafés making decent espresso, Turkish coffee, drinking chocolate, etc.

Nowadays, the scene is a mixture of numerous shops of several competing espresso chains, individual specialized coffeehouses, restaurants and even mobile “espresso trucks”.

The competition, especially downtown, is fierce. I noticed one small building (Chervonoarmiis’ka St, 48) that had 3 different shops.

My project was to find high-end shops that make a great “wet” cappuccino. After some initial Internet research, I ended up with a list of 11 shops. I tried cappuccino in all of these shops and also tried food in most. The prices for a small cappuccino varied roughly from US $1.5 to US $6.

I graded them on the scale of 1 to 5. Note: I am not a professional master taster; I am just a coffee lover with a passion for a great cappuccino. I’m not affiliated in any way with any of these shops.

| Coffeeshop | Kiev’s Location | Seating / Décor | WiFi | Food | Cappuccino |
|---|---|---|---|---|---|
| Espreso Kіmnata (Еспресо Кімната) | Khreshchatyk St, 40/1 (вул. Xрещатик, 40/1) | 2 | No | N/A | 3 |
| Kofe Khauz (Кофе Хауз) | Velyka Zhytomyrs’ka St, 8/14 (Велика Житомирська вул., 8/14) + 27 locations | 4 | No (?) | 5- | 4- |
| Dom Kofe “Olympyyskyy” (Дом Кофе “Олимпийский”) | Chervonoarmiis’ka St, 72 (Червоноармійська вул., 72) | 3 | No | N/A | 5 |
| Espressamente Illy at Arena | Baseina St, 2a, Arena, 2nd floor (Басейна вул., 2а, Арена, 2й поверх) | 3 | Yes | 4 | 3+ |
| Restoran “Bel’veder” (Ресторан “Бельведер”) | Dniprovs’kyi descent, 1 (Дніпровський узвіз, 1) | 5+ | Yes | N/A | 3 |
| boutiquebar Biancoro | Baseina St, 4 (Басейна вул., 4) | 4 | Yes | N/A | 3+ |
| Aroma Espresso Bar KIEV | Dymytrova St, 5 (вул. Димитрова, 5) + 2 other locations | 5 | No (?) | 3+ | 3+ |
| Golden Ducat (Золотий дукат) | Chervonoarmiis’ka St, 48 (Червоноармійська вул., 48); Instytuts’ka St, 16 (вул. Інститутська, 16) + 1 other location | 5+ | No (?) | 4+ | 3 |
| Coffee Life | Maidan Nezalezhnosti, 2 (майдан Незалежності, 2) + 5 other locations | 5 | Yes | 5 | 3+ |
| Kaffa (“Каффа” на Подоле) | Hryhoriya Skovorody St, 5 (вул. Григорія Сковороди, 5) | 5 | No (?) | N/A | 2 |
| Double Coffee | Mykhailivs’ka St, 8а (Михайлівська вул., 8а) + 7 other locations | 5 | Yes | 5+ | 3+ |

Most of these shops do make “wet” cappuccino, but their quality can be improved. Often, the taste is not bitter, as in a bad espresso, but rather stale or harsh. Still, most of them do a better job than Starbucks. Starbucks would get a grade of 2 to 3- on my scale.

The clear winner for me is Dom Kofe “Olympyyskyy” (Дом Кофе “Олимпийский”). It’s a very small shop, but it makes a great cappuccino and offers a selection of beans for your espresso drinks.

Automate your Feature and Acceptance Tests in Four Easy Steps

Any significant web application at some point faces the question of how to approach testing of user-facing functionality. Developers could be (hopefully) happily churning out their JUnit-s, NUnit-s and Test::Unit-s, practicing TDD and Continuous Integration, but what does testing your classes, methods, and functions have to do with what your customers actually need from your system?

The answer is: not much, really. Testing a web application from the end-user perspective involves, first, dealing with test cases written in a business language and, second, firing up multiple browsers and hunting down why a particular feature works perfectly fine in browser Foo and does not work at all in browser Baz. The right tools for automating this process are Behavior-Driven Development tools and browser “drivers”. These tools come from different development worlds and communities: Ruby, .NET, Java.

Fortunately, you can use many of these tools to test any web application regardless of your application platform, whether it’s JavaEE, .NET, ColdFusion, PHP or Ruby on Rails. The tools have come a long way since the early days of Fit/Fitnesse, and you can get from zero feature tests to automation in a few easy steps using any desktop OS: Windows, Mac OS X or Linux. I promise.

What might be the most difficult part of this process is the bewildering number of choices to make. A lot of choice is a Good Thing and all the tools mentioned in this post are free, open-source tools, but how it all fits together could be quite confusing. Fear not. Let’s walk through the steps.

Step 1. Select and Install a BDD Tool

Behavior-Driven Development takes automated unit testing a level higher. The emphasis is on the language and readability of tests (or rather scenarios, broken down into steps in BDD parlance). One of the most active communities championing BDD tools is the Ruby world. The framework that has generated a lot of buzz in the last couple of years is Cucumber. It is a flexible Ruby framework that uses a wonderful domain-specific language for writing your scenarios: Gherkin. Gherkin is pretty much structured English or one of 37 other supported spoken languages. Cucumber natively uses Ruby for glue code (step definitions) that ties your features, scenarios and steps to browser drivers (or to native target applications).

Now, if you don’t know Ruby you might be tempted to go with one of the Cucumber ports, for example cucumber-jvm (Java) or SpecFlow (.NET). You would get to write glue code in your programming language of choice, but the setup is often more complex, and ports could be less mature or lag behind the flagship Ruby tools. It is hard to come up with other benefits of ports for testing web applications. Why not use this opportunity and learn a fun dynamic language with a lot of interesting ideas? It is used as the first language to teach kids, after all, and you don’t have to learn Rails and a lot of other Ruby frameworks, just the language itself.

My choice: Cucumber

Other choices: Fit/Fitnesse, SpecFlow, RSpec, easyb, etc.

What to do:

  • Windows: Install Ruby 1.8.x and add it to the PATH. Latest versions of Mac OS X come with Ruby 1.8.x. Don’t use Ruby 1.9.x just yet (see below).
  • Set RUBYOPT=-rubygems in your environment.
  • Go to your command line and install the Cucumber Ruby gem.
    >gem install cucumber

    Note: Any gem installation might fail with the following:
    ERROR: does not appear to be a repository
    This problem is transient. Try again in 5 minutes. Another option is to download needed .gem manually and install it:

    >cd {download directory}
    >gem install -lV cucumber

    The local installation (-l) will install Gem dependencies only if they are available in the Gem caches or in the local directory. You might need to download dependencies manually as well.

  • Windows: (optional, but highly recommended in order to display colored output) Install ANSICON.
  • Install RSpec for writing assertions in your glue code.
    >gem install rspec