Light lex repair - a feature that might have been

Neil Hodgson

2014-03-25 12:46:27 UTC

This is the story of a feature I designed for Scintilla that would have increased lexing performance and reduced redrawing but decided not to include.

When the document changes, Scintilla (in effect) resets the end of the validly lexed range to the start of the line changed. All of the the changed line and following lines will eventually have to be relexed and refolded. Years ago, I implemented a different model for the SinkWorld prototype: damage and repair. The range of the change was considered damaged and a lexer would be called to repair the lexical information for that range and it could also expand the range considered damaged if the change had later effects such as starting a stream comment. This feature proved too difficult to implement so I stopped working on it.

Almost all of the changes made to the document in most cases are from the user typing and deleting with the backspace and delete keys. These changes affect only a single character and mostly have little effect on the lexical state and may leave all of the characters remaining with the same style value. For typing, the style value for the newly typed character will have to be chosen although it is often the same as the previous character.

A new lexer method could be added to check to see if a change was trivially repairable, say
class RepairContext { enum {typing, deleting}; int character; int previousStyle; ... }
int Repair(RepairContext &context)
with the return value being the style of the new character for typing, 0 when deleting is trivial, or a special PERFORM_FULL_LEX value if the change can not be fixed. The method would not be called when a new line is typed as that may cause wider changes. For a language with "/*" stream comments and "#" line comments, this might be implemented like

int Repair(RepairContext &context) {
if (context.typing) {
switch (context.previousStyle) {
case STYLE_COMMENT_LINE:
return STYLE_COMMENT_LINE;
case STYLE_COMMENT_STREAM:
return (context.character == '/') ? PERFORM_FULL_LEX : STYLE_COMMENT_STREAM;
}
} else {
switch (context.previousStyle) {
case STYLE_COMMENT_LINE:
return (context.character == '#') ? PERFORM_FULL_LEX : STYLE_COMMENT_LINE;
case STYLE_COMMENT_STREAM:
return (context.character in "/*") ? PERFORM_FULL_LEX : STYLE_COMMENT_STREAM;
}
}
return PERFORM_FULL_LEX;
}

In many languages most lines contain whitespace, identifiers, numbers, and operators. Identifiers are matched against keyword lists, so any changes in or next to identifiers will need to check the surrounding text and determine if the change has produced or destroyed a keyword match. Whitespace and operators are mostly easier although they may also require keyword searches if they split or combine identifiers. This is just the start of the potential complexity in an implementation of Repair.

There are strong benefits to Repair as simple implementations like that above will perform very rapidly. If successful, it would avoid much of the cost of the full lexing/folding machinery. It would be reasonable to perform this synchronously in the keystroke handler instead of delaying to perform in the normal idle/redraw handler. The other major benefit is that it retains much more lexical, fold, and screen state which may be covering many lines when further lines of the document are being lexed in the background or even being drawn in the background as occurs on Cocoa. I'd estimate that as few as 5% of user typing/deletion keystrokes in C++ would require a full lex.

A problem with this technique are that there are now two places implementing the lexing logic with different structures and both these implementations would have to remain exactly synchronised. Whenever a change was made to the main lexer, then the Repair method would also need to be checked. Its likely this would lead to many faults, particularly for complex features.

As an example, the C++ folder has fold.cpp.explicit.* settings that allow, for example, to start a fold if the text "ant" is found in any styles and end the fold if "ze bra" is found in any styles. This feature would require significant logic inside Repair so its likely that Repair would return PERFORM_FULL_LEX if any of the fold.cpp.explicit.* settings are active.

Development and extension of more complex and dynamic features like fold.cpp.explicit.* would be inhibited by the need to modify Repair. Its likely that lexers will evolve towards these sorts of features. For example there is a good chance that the C++ lexer will eventually create symbol tables in order to style variables and expressions based upon variable scope.

Due to the likelihood of implementation faults, I won't be adding Repair to Scintilla at this time. It is, however, an approach that others may want to pursue particularly if they have a simple language and huge documents.

Neil

--
You received this message because you are subscribed to the Google Groups "scintilla-interest" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scintilla-interest+***@googlegroups.com.
To post to this group, send email to scintilla-***@googlegroups.com.
Visit this group at http://groups.google.com/group/scintilla-interest.
For more options, visit https://groups.google.com/d/optout.