Design Principles

Five Independent Parameters of a Diagnostic

In addition to the main message, the API should allow an implementer to easily specify the following five parameters of a diagnostic, and it should be possible to specify them independently.

  1. Whether the program terminates after the diagnostic is sent. This is indicated by the choice between emit (for non-fatal diagnostics) and fatal (for fatal ones).
  2. A succinct code for searching (e.g., Googling), for example V0003. A succinct code makes it easy for an end user to report a bug or ask for help.
  3. How seriously the end user should take the message. Is it a warning, an error, or just a hint? See the type severity for available classifications. In practice, diagnostics with the same message tend to have the same severity, and thus our API requires an implementer to specify a default severity level for each message. While this seems to violate the independence constraint, our API allows overriding the default severity level at each call of emit or fatal.
  4. A stack backtrace. It should be straightforward to push new stack frames. Our implementation provides trace for this purpose.
  5. Additional notes. It should be possible to attach any number of additional notes with location information. Currently, emit and fatal take an optional argument extra_remarks.
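
As a concrete illustration, here is a minimal sketch of what these parameters might look like at a call site. The message Greeting follows the examples later in this document; the severity constructor, the helper Diagnostic.loctext, and the exact spellings of the optional arguments are assumptions for illustration rather than a verbatim transcript of the API.

Reporter.trace ~loc "while greeting the user" @@ fun () ->
(* override the default severity and attach an extra remark (parameters 3 and 5) *)
Reporter.emit ~severity:Asai.Diagnostic.Warning
  ~extra_remarks:[Asai.Diagnostic.loctext "the user was already greeted here"]
  Greeting "hello, again";
(* [fatal] sends the diagnostic and then aborts the computation (parameter 1) *)
Reporter.fatal Greeting "goodbye"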

Free-Form and Structured Reporting

We realized there are two distinct modes of use when it comes to message reporting:

  1. One focuses on free-form text: the implementer directly specifies the long explanation every time a diagnostic is sent, and the messages serve only as a loose categorization of the long explanations.
  2. The other focuses on fully structured messages: the implementer directly specifies a structured message (e.g., an element of a variant type), and the long explanation is determined by the message. The message captures all the information in the long explanation.

We should support at least the free-form use mode, and ideally support both. The free-form reporting is implemented as Reporter and the structured one is implemented as StructuredReporter.
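
For instance, a free-form Reporter might be instantiated with a message type that only carries a default severity and a short code, the long explanation being supplied at each call site; a StructuredReporter would additionally derive the long explanation from the message itself. The following is a minimal sketch of the free-form side; the field names default_severity and short_code and the exact functor path are assumptions based on the parameters described above.

module Message = struct
  (* messages are only a loose categorization of long explanations *)
  type t = Greeting | TypeError

  (* each message carries a default severity, overridable at each call *)
  let default_severity = function
    | Greeting -> Asai.Diagnostic.Hint
    | TypeError -> Asai.Diagnostic.Error

  (* a succinct code for searching and bug reports *)
  let short_code = function
    | Greeting -> "V0001"
    | TypeError -> "V0002"
end

module Reporter = Asai.Reporter.Make (Message)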

Compositionality: Using Libraries that Use asai

It should be easy for an application to use other libraries that themselves use asai, even if the application and the libraries make different choices between free-form and structured reporting. Our current implementation uses the same diagnostic type for both reporting styles and allows an application to adopt diagnostics from a library.
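
If an application with messages of its own wants to run code from such a library, the adoption might look roughly like the sketch below. Here Cool_lib, its reporter's run function, the adopt and Diagnostic.map combinators, and the application message LibraryError are all assumptions for illustration; the point is only that the library's diagnostics are re-tagged into the application's message type before being re-emitted.

(* run the library's reporter inside the application's reporter,
   mapping the library's messages into the application's message type *)
Reporter.adopt
  (Asai.Diagnostic.map (fun _lib_msg -> LibraryError))
  Cool_lib.Reporter.run
  (fun () -> Cool_lib.do_something ())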

Inheritance of Location Information

The original design of asai did not retain any location information across API calls. That is, the location used to create a backtrace frame will not be inherited by inner API calls:

Reporter.trace ~loc "outer trace with a location" @@ fun () ->
Reporter.emit Greeting "hello"
(* the message ["hello"] originally would not have a location *)

The idea was to prevent an excessive amount of location information. However, all early adopters of this library implemented their own mechanisms to retain the location information, which indicated that the original design was cumbersome in practice.

The new design (partially introduced in 0.1 and then fully implemented in 0.3) is to remember the locations of backtrace frames (the argument loc when calling trace) and use the innermost one as the default location for emit or fatal. To avoid duplicated locations in a backtrace, however, an inner backtrace frame will not inherit the location of an outer frame (unless an implementer explicitly specifies the same location). We believe this strikes a good balance between convenience and succinctness. For example,

Reporter.trace ~loc "outer trace with a location" @@ fun () ->
Reporter.trace "inner trace" @@ fun () ->
(* the frame ["inner trace"] will not have a location *)
Reporter.emit Greeting "hello"
(* the message ["hello"] will have [loc] as its location *)

Note: inherited locations can be overwritten by the optional argument loc at any time.
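
For example, an explicit loc passed to emit takes precedence over the location inherited from the enclosing frame (more_precise_loc below is a hypothetical, narrower location):

Reporter.trace ~loc "outer trace with a location" @@ fun () ->
Reporter.emit ~loc:more_precise_loc Greeting "hello"
(* the message ["hello"] uses [more_precise_loc], not the inherited [loc] *)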

Unicode Art

There is a long history of using ASCII printable characters and ANSI escape sequences, and recently also non-ASCII Unicode characters, to draw pictures on terminals. To display compiler diagnostics, this technique has been used to assemble line numbers, code from the end user, code highlighting, and other pieces of information in a visually pleasing way. Non-ASCII Unicode characters (from the implementer or from the end user) greatly expand the vocabulary of ASCII art, and we will call the new art form Unicode art to signify the use of non-ASCII characters.

In asai, we made the unusual choice to abandon column numbers (and any Unicode art that depends on them) so that our Unicode art remains flawless in the presence of tricky Unicode character sequences. In particular, the following highlighting method goes against our design principle:

let foo = f x
    ^^^ variable name too ugly

The highlighting assumes that the visual location of foo is at column 4 (counting from 0), an assumption that asai forbids. The following explains why flawless support of Unicode necessarily leads to discarding the concept of column numbers completely.

Note: "Unicode characters" are not really defined in the Unicode standard, and here they mean Unicode scalar values, that is, all Unicode code points except the surrogate code points (special code points for UTF-16 to represent all scalar values). Although the word "character" has many incompatible meanings and usages, we decided to call scalar values "Unicode characters" anyway because (1) most people are not familiar with the official term "scalar values" and (2) scalar values are the only context-independent unit one can work with in a programming language.

No Column Numbers (but Still with Highlighting)

The arrival of non-ASCII Unicode characters imposes new challenges as their visual widths are unpredictable without knowing the exact terminal (or terminal emulator), the exact font, etc. Unicode emoji sequences might be one of the most challenging cases: a pirate flag (🏴‍☠️) may be shown as a single emoji flag on supported platforms but as a sequence with a black flag (🏴) and a skull (☠️) on other platforms. This means the visual width of the pirate flag is unpredictable. (See Unicode Emoji Section 2.2.) The rainbow flag (🏳️‍🌈), skin tones, and many other emoji sequences have the same issue. Other less chaotic but still challenging cases include characters whose East Asian width is "Ambiguous". (See UAX #11 for more information about East Asian width.) These challenges bear some similarity with the unpredictability of the visual width of horizontal tabulations, but in a much wilder way.

Due to the unpredictability of visual widths, it is wise to think twice before using emoji sequences or other tricky characters in Unicode art. However, it is difficult to make visually pleasing Unicode art without any assumptions. To quantify the degree to which a piece of Unicode art can remain visually pleasing on different platforms, we specify four levels of display stability, from Level 0 (the most unstable, which makes the most assumptions) to Level 3 (the most stable, which makes almost none). Note that if an implementer decides to integrate content from the end user into their Unicode art, the end user should have the freedom to include arbitrary emoji sequences and tricky characters in that content. The final Unicode art must remain visually pleasing (under the assumptions allowed by the chosen display stability level) for any user content.

Unlike most implementations, which only reach Level 1, our terminal handler strives to achieve Level 2. This means we must not make any assumptions about the visual width of the end user's code; without such (incorrect) strong assumptions about how Unicode characters are rendered, visual column numbers are ill-defined. On the other hand, Level 3 seems too restrictive for compiler diagnostics, because it would prevent us from showing line numbers alongside the end user's code. (At Level 3 we cannot even assume that the numbers "10" and "99" have the same visual width.)

Note: a fixed-width font with enough glyphs to cover many Unicode characters is often technically duospaced, not monospaced, because many CJK characters occupy double the visual width. Thus, we do not use the terminology "monospaced".

Caveat: No Support of Bidirectional Text Yet

Proper support of bidirectional text will benefit many potential end users, but unfortunately, we currently do not have the capacity to implement it. The general support of bidirectional text in most system libraries and tools is lacking, and without dedicated effort, it is hard to display bidirectional text properly. This is the area where our current implementation falls short.

On a related note, Unicode Source Code Handling suggests that source code should be segmented into atoms and their display order should remain the same throughout the document to maintain the lexical structure. Each atom should then be displayed via the usual Unicode Bidirectional Algorithm with a few exceptions. Our current implementation cannot follow this advice because it does not know the lexical structure of the end user content.

Raw Bytes as Positions

All positions should be byte-oriented. We believe the other popular alternative proposals are worse:

  1. Unicode characters (Unicode scalar values): This is a technically well-defined choice. The main problem is that counting characters from raw bytes takes linear time without a clever data structure (unless the text is encoded in UTF-32), and the result often does not match what the end user perceives as "characters". That is, it takes time to count characters, yet the count still does not match the visual perception. (See the sketch after this list.)
  2. Code units used in UTF-16: This is also a technically well-defined choice. It is somewhat similar to Unicode characters, but with quirks from UTF-16: a Unicode scalar value above U+FFFF (such as 😎) requires two code units forming a surrogate pair. Therefore, it is arguably worse than just using Unicode characters. This scheme was unfortunately chosen by the Language Server Protocol (LSP) as the default unit, and until LSP version 3.17 it was the only choice. The developers of the protocol probably made this decision because Visual Studio Code is written in JavaScript (and TypeScript), whose strings use the UTF-16 encoding. It still takes linear time to count code units from other encodings (such as UTF-8), and the count still does not match the visual perception; even worse, UTF-16 usually takes more space than UTF-8 when ASCII is dominant.
  3. (Extended) grapheme clusters or user-perceived characters: The notion of grapheme clusters can help segment a Unicode text so that the end user can edit or select part of it in an "intuitive" way. It is not trivial to implement the segmentation algorithm (though the OCaml library uuseg implements the default algorithm). Moreover, the default rules can (and maybe should) be overridden for each application. The complexity and locale dependency of grapheme clusters make them an unreliable unit for specifying positions. It also takes at least linear time to count the number of grapheme clusters from raw bytes.
  4. Visual column numbers (the visual width of a string in display): As analyzed above, this is the most ill-defined unit of all, and a heuristic that can give passable results in most cases still takes linear time.
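
To make the linear-time claims in items 1 and 2 concrete, here is a small self-contained sketch (using only the OCaml standard library, OCaml 4.14 or later) that scans a UTF-8 string byte by byte to count scalar values and UTF-16 code units. Both counts require walking the entire prefix of the string, whereas byte-oriented positions are available for free.

(* count Unicode scalar values and UTF-16 code units in a UTF-8 encoded string;
   both require a pass over all the bytes, unlike byte-oriented positions *)
let count_units (s : string) : int * int =
  let rec go i scalars utf16 =
    if i >= String.length s then (scalars, utf16)
    else
      let d = String.get_utf_8_uchar s i in
      let u = Uchar.utf_decode_uchar d in
      (* a scalar value above U+FFFF needs a surrogate pair: two UTF-16 code units *)
      let units = if Uchar.to_int u > 0xFFFF then 2 else 1 in
      go (i + Uchar.utf_decode_length d) (scalars + 1) (utf16 + units)
  in
  go 0 0 0

let () =
  let scalars, utf16 = count_units "hello 😎" in
  assert (scalars = 7 && utf16 = 8)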