Input filtering

Neil Hodgson

2014-04-14 10:24:28 UTC

Over Scintilla’s history there have been requests to enable filtering of user input to Scintilla and for this to be easy to implement in one place without having to understand how to intercept keyboard, paste, drag-and-drop, and other events.

One example would be to disallow the entry of most control characters and the associated blobs in Scintilla. Some users are apparently confused when they type an unassigned control character and see something like [BEL]. Another situation is where a technical limitation or policy decision requires that the name “Löwis” be inserted as “Loewis” or “₩” be inserted as “\u20A9”. Characters that look very similar to other characters, like Cyrillic ‘о’, could be inserted as a character reference such as “&#1086;” in an HTML file.
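As a rough sketch of the “Löwis” and “₩” cases, a filter over UTF-8 text could look something like the following. The names NextCodePoint and AsciiOnly are made up for illustration and are not part of Scintilla, and the decoder assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: decode the UTF-8 sequence starting at s[i], advance i
// past it and return the code point. Assumes well-formed input; a stray or
// invalid byte is returned unchanged as its own value.
static std::uint32_t NextCodePoint(const std::string &s, std::size_t &i) {
    const unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80)
        return lead;
    int trail = 0;
    std::uint32_t cp = 0;
    if ((lead & 0xE0) == 0xC0) { trail = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { trail = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { trail = 3; cp = lead & 0x07; }
    else return lead;
    while (trail-- > 0 && i < s.size()) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Hypothetical filter: ASCII passes through, three umlauts are transliterated,
// and every other code point becomes a \uXXXX escape. Code points above U+FFFF
// would produce more than four hex digits; a real filter needs a policy for them.
std::string AsciiOnly(const std::string &utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const std::uint32_t cp = NextCodePoint(utf8, i);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp == 0xE4) {        // a with diaeresis
            out += "ae";
        } else if (cp == 0xF6) {        // o with diaeresis
            out += "oe";
        } else if (cp == 0xFC) {        // u with diaeresis
            out += "ue";
        } else {
            char esc[16];
            std::snprintf(esc, sizeof esc, "\\u%04X", static_cast<unsigned>(cp));
            out += esc;
        }
    }
    return out;
}
```

Which characters get transliterated and which get escaped is a policy choice for the application; the table here is only an example.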

Some applications keep Scintilla in Unicode mode and translate between UTF-8 and the file encoding when loading and saving. With filtering they can ensure that any characters that cannot be saved in the destination encoding are transformed into something that can be saved. Since this transformation occurs immediately, any mistake is more likely to be seen and fixed in context than if an encoding check is performed at save time.
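For instance, if the destination encoding were Latin-1 (an assumption made only for this sketch), a filter could exploit the fact that code points up to U+00FF are encoded in UTF-8 either as a single byte or as a two-byte sequence led by 0xC2 or 0xC3. The function names below are hypothetical, and overlong or otherwise malformed sequences are not validated.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence introduced by this lead byte.
// Invalid lead bytes and stray trail bytes are treated as one-byte sequences.
static std::size_t SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical filter: keep characters that fit in Latin-1 and replace every
// other character with a substitute that can be saved.
std::string MakeLatin1Safe(const std::string &utf8, const std::string &substitute = "?") {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        const std::size_t len = std::min(SequenceLength(lead), utf8.size() - i);
        // U+0000..U+007F is one byte; U+0080..U+00FF has lead byte 0xC2 or 0xC3.
        const bool fits = lead < 0x80 || lead == 0xC2 || lead == 0xC3;
        if (fits) {
            out.append(utf8, i, len);
        } else {
            out += substitute;
        }
        i += len;
    }
    return out;
}
```

Because the substitute appears as soon as the text is typed or pasted, the user sees the "?" in context instead of discovering the loss at save time.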

A problem with implementing this has been that Scintilla itself inserts text with the expectation that the insertion will succeed unchanged, so that code would have to change to allow for filtering. There are over 40 call sites where text is inserted into the document. One approach would be to have two calls for inserting text, one that runs the filter and another that always inserts the exact requested text. I currently think that there may be useful transformations that the application could want in all circumstances and that it is possible to rewrite the call sites.
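The two-call approach might be pictured roughly as below. FilteredDoc is a made-up stand-in and not Scintilla's own document class; the point is only that a filtering insert has to report how much text was actually inserted, so that a call site can no longer assume the length it asked for.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a document with two insertion entry points.
class FilteredDoc {
public:
    using Filter = std::function<std::string(const std::string &)>;

    void SetFilter(Filter f) { filter = std::move(f); }

    // Always inserts exactly the requested text.
    void InsertExact(std::size_t pos, const std::string &text) {
        content.insert(pos, text);
    }

    // Runs the filter first and returns the length actually inserted,
    // which may differ from text.length().
    std::size_t InsertFiltered(std::size_t pos, const std::string &text) {
        const std::string changed = filter ? filter(text) : text;
        content.insert(pos, changed);
        return changed.length();
    }

    const std::string &Text() const { return content; }

private:
    std::string content;
    Filter filter;
};
```

A call site that moves the caret after inserting would then advance by the returned length instead of by the length of the text it passed in.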

To implement this, there needs to be a notification (say, SC_MOD_INSERTCHECK) from Scintilla to the application that some text is about to be inserted into the document. Then there needs to be a call to change the text if wanted. Call that SCI_CHANGEINSERTION. An example implementation that changes control characters to octal escapes and changes spaces to the Unicode small white square character could look like this:

std::string sInsertion(notification->text, notification->length);
std::string sChanged;
for (unsigned char ch : sInsertion) {
    if (ch == '\r' || ch == '\n' || ch == '\t') {
        sChanged.push_back(ch);
    } else if (ch == ' ') {
        // Small white square
        sChanged.append("\xe2\x96\xab");
    } else if (ch < ' ') {
        char szOctal[10];
        sprintf(szOctal, "\\%03o", ch);
        sChanged.append(szOctal);
    } else {
        sChanged.push_back(ch);
    }
}
if (sChanged != sInsertion) {
    wEditor.CallString(SCI_CHANGEINSERTION, sChanged.length(), sChanged.c_str());
}

When Scintilla is itself performing insertions, it is often for whitespace formatting using the space, tab, new line and carriage return characters, and transforming these characters may confuse Scintilla or application code. However, it could conceivably be useful to insert a Unicode line end character like PS (paragraph separator) for new line or a non-breaking space for space.

This feature would be purely to remove or transform individual characters. It would not be suitable for generic translation of one string to another as the triggering string could be entered over multiple insert operations, possibly interleaved with other modifications. Text inserted for undo and redo would not trigger the notification.
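To illustrate why string-level translation does not fit, here is a small simulation with made-up names. The filter replaces the whole string “Löwis” but only ever sees one insertion at a time, so it works for a paste and silently does nothing when the same text is typed one character at a time.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-insertion filter that attempts a whole-string substitution.
std::string ReplaceWhole(std::string text) {
    const std::string from = "L\xC3\xB6wis";  // "Löwis" in UTF-8
    const std::string to = "Loewis";
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.length())) {
        text.replace(pos, from.length(), to);
    }
    return text;
}

// Simulates a sequence of insertions at the end of an empty document,
// running the filter separately on each one.
std::string InsertAll(const std::vector<std::string> &insertions) {
    std::string document;
    for (const std::string &piece : insertions) {
        document += ReplaceWhole(piece);
    }
    return document;
}
```

Pasting the name as one insertion yields “Loewis”, while typing it as five insertions leaves “Löwis” in the document, since no single insertion contains the trigger string. A per-character transformation does not have this problem.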

Neil
