Discussion:
Using BOM with UTF-8 in wxSTC
Paul K
2013-12-25 22:49:22 UTC
Happy holidays to all the readers!

I have a question about BOM handling for UTF-8 encoded files. I'm using
wxSTC and have run into an interesting issue with UTF-8 files that start with a BOM (0xEF,0xBB,0xBF).
The BOM is displayed in the editor (as one zero-width character),
and when I add any character at position 0, it ends up *before* the
BOM in the file. I also checked the latest SciTE and it works correctly:
the BOM always stays at the beginning of the file.

Do you do any special processing for files with a BOM (for example, remove
the BOM after loading and add it back before saving the file), or is it expected
to work with Scintilla-based components without any additional steps? I
believe the current version of wxSTC is based on Scintilla v3.2.1. Thank you.

Paul.
--
You received this message because you are subscribed to the Google Groups "scintilla-interest" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scintilla-interest+***@googlegroups.com.
To post to this group, send email to scintilla-***@googlegroups.com.
Visit this group at http://groups.google.com/group/scintilla-interest.
For more options, visit https://groups.google.com/groups/opt_out.
Matthew Brush
2013-12-26 05:21:25 UTC
Geany removes it before putting the text into Scintilla:
https://github.com/geany/geany/blob/d80bc7ce56d8f69b1cfe933bc02c4d58dadc2a7e/src/encodings.c#L878

And puts it back before saving the file:
https://github.com/geany/geany/blob/1f2279aefe165b874a42457f4a24af498f92dc27/src/document.c#L1737
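
For anyone wanting to do the same in their own editor, here is a minimal sketch of the strip-on-load / restore-on-save idea. It is not Geany's code and not part of wxSTC or Scintilla; the helper names `StripUtf8Bom` and `RestoreUtf8Bom` are made up for illustration, and it assumes the file contents are already in a `std::string` of raw bytes.

```cpp
#include <string>

// UTF-8 byte order mark: 0xEF, 0xBB, 0xBF.
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: removes a leading UTF-8 BOM from freshly loaded bytes.
// Returns true if a BOM was found, so the caller can remember it per document.
bool StripUtf8Bom(std::string& text) {
    if (text.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0) {
        text.erase(0, kUtf8Bom.size());
        return true;
    }
    return false;
}

// Hypothetical helper: returns the bytes to write to disk, with the BOM
// put back in front only if the file originally had one.
std::string RestoreUtf8Bom(const std::string& text, bool hadBom) {
    return hadBom ? kUtf8Bom + text : text;
}
```

The stripped text would be what gets handed to the editor control, and the `hadBom` flag would be stored alongside the document until it is saved. That way position 0 in the control is always the first real character.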

FWIW.

Cheers,
Matthew Brush
Paul K
2013-12-26 05:41:27 UTC
That's what I suspected. Thank you for the details on Geany processing.

Paul.