Discussion:
Annotation encoding in C#
Misha Konvisar
2013-08-15 20:40:21 UTC
Hello everyone,

I'm having a minor issue.
I'm writing a plugin for Notepad++ that adds annotations to lines in the
document.
The problem is that I can't display Cyrillic text in an annotation when
the document is in UTF-8 encoding.
When I switch the document to ANSI, the Cyrillic annotation text displays
correctly, but the Cyrillic text in the document becomes unreadable.
In UTF-8 mode I see only strange, unreadable characters in the annotation box.

This is how I add the annotation:
public static void AddCommentToLine(int position, string text)
{
    Encoding unicode = Encoding.Unicode;

    // .GetEncoding(1253): I've tried different output encodings here, but no result...
    Encoding encoder = Encoding.UTF8;

    string strEncoded = encoder.GetString(
        Encoding.Convert(unicode, encoder, unicode.GetBytes(text)));
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, strEncoded);
}

I had a similar problem when trying to read a line from the document, but
that was solved by handling the Scintilla output as UTF-8; after that my C#
code was able to work with Scintilla text correctly.

Has anybody faced this problem?
I think my problem is in transferring a Unicode string to the Scintilla editor.
Should I use some styles for annotations?

Thanks in advance!
--
You received this message because you are subscribed to the Google Groups "scintilla-interest" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scintilla-interest+***@googlegroups.com.
To post to this group, send email to scintilla-***@googlegroups.com.
Visit this group at http://groups.google.com/group/scintilla-interest.
For more options, visit https://groups.google.com/groups/opt_out.
Neil Hodgson
2013-08-16 00:01:42 UTC
Post by Misha Konvisar
Writing a plugin for Notepad++. This plugin is adding annotations to lines in document.
The problem is that I'm not able to display a Cyrillic text in annotation when I have my document in UTF encoding.
The encoding used for annotations is the same as for the document. For a document in UTF-8, the annotations should also be UTF-8.
Post by Misha Konvisar
Should I use some styles for annotations?
Yes, you should set the styles for annotations. First try the same style settings as the text being annotated.

Neil
Misha Konvisar
2013-08-16 08:33:50 UTC
Hi Neil,

Thank you for the help.

I'm trying to set the annotation style this way:

// Before adding the annotation to the line, read the style at the position
// (I guess this should be a character position, not a line).
int style = (int)Win32.SendMessage(curScintilla, SciMsg.SCI_GETSTYLEAT, position, 0);

// Add the annotation to the line.
Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, text);

// Apply the saved style to the annotation.
Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETSTYLE, position, style);

But my annotation is still unreadable.

When I switch the Npp encoding to ANSI, annotations are displayed correctly,
but Cyrillic text in the document is unreadable.
[image: Inline image 1]

When the Npp encoding is switched to UTF-8, the situation is reversed.
[image: Inline image 2]
In the Scintilla documentation I couldn't find any style message that
sets the annotation encoding.
Could you please clarify how to "first try the same style settings as
the text being annotated"? Is the SCI_GETSTYLEAT message a good way to do that?


Thank you.
zebrox
2013-08-16 13:00:16 UTC
Still no success.
I've defined AnnotationStyleId = 3 and set its character set to Cyrillic:

Win32.SendMessage(curScintilla, SciMsg.SCI_STYLESETCHARACTERSET,
    AnnotationStyleId, (int)SciMsg.SC_CHARSET_CYRILLIC);

Then, in the code that adds the annotation, I convert the text to UTF-8, add
the annotation, and apply the AnnotationStyleId style to it.
But I still get those strange characters...

public static void AddCommentToLine(int position, string text)
{
    Encoding encSrc = Encoding.Unicode;
    Encoding encDest = Encoding.UTF8;
    string strEncoded = encDest.GetString(
        Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));

    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, strEncoded);
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETSTYLE, position, AnnotationStyleId);
}

Now I'm a bit stuck...
Neil Hodgson
2013-08-16 13:21:27 UTC
Post by zebrox
Encoding encSrc = Encoding.Unicode;
Encoding encDest = Encoding.UTF8;
string strEncoded = encDest.GetString(Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));
That looks confusing. Dump the bytes before and after conversion along with what you think the text should be.

Neil
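
A minimal sketch of such a byte dump, offered only as an illustration. The class name ByteDump and its methods are hypothetical and do not come from the plugin; only standard .NET calls are used. It shows the bytes of a string in a chosen encoding, and it also reproduces the conversion quoted above so the result can be compared with the input.

```csharp
using System;
using System.Text;

// Hypothetical helper for inspecting encodings; not part of the plugin or of Scintilla.
public static class ByteDump
{
    // Returns the bytes of `text` in `encoding` as space-separated uppercase hex.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text)).Replace("-", " ");
    }

    // Reproduces the conversion from the quoted code: UTF-16 bytes are converted
    // to UTF-8 bytes, which are then decoded straight back into a .NET string.
    public static string QuotedConversion(string text)
    {
        Encoding encSrc = Encoding.Unicode;
        Encoding encDest = Encoding.UTF8;
        return encDest.GetString(Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));
    }
}
```

For the single character "\u044B" (Cyrillic "ы"), ByteDump.Hex(Encoding.Unicode, "\u044B") gives "4B 04" and ByteDump.Hex(Encoding.UTF8, "\u044B") gives "D1 8B". ByteDump.QuotedConversion returns a string equal to its input, so the quoted conversion leaves the text unchanged.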
Misha Konvisar
2013-08-16 13:36:13 UTC
Hi Neil,

Thanks for the answer. I already did that, but I don't know how to interpret
the results...

The code is as follows:
Encoding encSrc = Encoding.Unicode;
Encoding encDest = Encoding.UTF8;

text = "ыыыы";
ShowBytes("before", encSrc, text);
string strEncoded = encDest.GetString(
    Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));
ShowBytes("after", encDest, strEncoded);

and the ShowBytes outputs are:
[image: Inline image 1][image: Inline image 2]
zebrox
2013-08-16 14:11:20 UTC
public static void AddCommentToLine(int position, string text)
{
    // Error 1: direction of conversion
    Encoding encSrc = Encoding.UTF8;
    Encoding encDest = Encoding.Unicode;

    // Error 2: wrong procedure
    string strEncoded = encDest.GetString(encSrc.GetBytes(text));

    // Error 3: some tricks of passing managed strings to unmanaged code
    // http://stackoverflow.com/questions/11090427/make-intptr-in-c-net-point-to-string-value
    IntPtr strPtr = Marshal.StringToHGlobalUni(strEncoded);
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, strPtr);
    Marshal.FreeHGlobal(strPtr);
}
So finally I got my annotations in Russian!

Thanks for help, Neil.
Dave Brotherstone
2013-08-16 15:20:47 UTC
You shouldn't actually need to marshal the string yourself when passing
UTF-8 to Scintilla (that is only needed when you want Scintilla to fill a
buffer for you, and even then a StringBuilder with a reserved capacity is
normally marshalled correctly automatically). The problem was your encoding
and subsequent decoding of the string. Strings in C# are always UTF-16 (that
is, Encoding.Unicode). Whenever you have a string object, it is internally
encoded in UTF-16; there is no such thing as a C# "string" object with a
different encoding. So when you call encSrc.GetBytes(text), you get the
UTF-8 bytes of the string. When you then pass those to
encDest.GetString(...), you are handing it a UTF-8 byte sequence and asking
it to treat it as UTF-16, which it then converts to a string object. This
obviously comes out as garbage. What you want to do is *just* convert the
string to UTF-8, then pass *those* bytes to Scintilla.

I don't know the signature of the Win32.SendMessage method, but I expect
the following would do what you're after.

byte[] utf8Text = Encoding.UTF8.GetBytes(text);
Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position,
utf8Text);

Depending on the signature, you might need to cast it to something.

Hope that helps,
Dave.

PS If you've not seen it before,
http://www.joelonsoftware.com/articles/Unicode.html is a great article on
how all this unicode/utf8/utf16 stuff fits together.
Misha Konvisar
2013-08-16 22:44:35 UTC
Hi Dave,

Thanks for the comment and the interesting link.
Your suggestion works. No need for strange conversions.

But there were two things I still had to do:
1. Add a terminating zero to the original string, because Scintilla was
displaying random characters at the end of the annotation.
2. Since I don't have a Win32.SendMessage overload that accepts byte[] as
the fourth parameter, I had to obtain an IntPtr to the UTF-8 byte[] array.

So the final code looks like this:
public static void AddCommentToLine(int position, string text)
{
    // Add a terminating zero to the string.
    text += char.MinValue;
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

    // http://stackoverflow.com/questions/537573/how-to-get-intptr-from-byte-in-c-sharp
    IntPtr unmanagedPointer = Marshal.AllocHGlobal(utf8Bytes.Length);
    Marshal.Copy(utf8Bytes, 0, unmanagedPointer, utf8Bytes.Length);
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, unmanagedPointer);
    Marshal.FreeHGlobal(unmanagedPointer);
}
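
The two steps above (zero-terminated UTF-8 bytes, then a temporary unmanaged copy) could also be separated into small helpers. This is only a sketch under stated assumptions: the class AnnotationBytes and both method names are hypothetical, and the actual SendMessage call is left to a caller-supplied delegate because the plugin's Win32 wrapper is not shown in this thread. The try/finally is an addition so the buffer is freed even if the call throws.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helpers; not part of the Notepad++ plugin template or of Scintilla.
public static class AnnotationBytes
{
    // Returns the UTF-8 encoding of `text` followed by a single zero byte.
    public static byte[] ToNullTerminatedUtf8(string text)
    {
        int count = Encoding.UTF8.GetByteCount(text);
        byte[] buffer = new byte[count + 1]; // zero-initialized, so the last byte stays 0
        Encoding.UTF8.GetBytes(text, 0, text.Length, buffer, 0);
        return buffer;
    }

    // Copies `bytes` into unmanaged memory, hands the pointer to `send`,
    // and frees the memory afterwards even if `send` throws.
    public static void WithUnmanagedCopy(byte[] bytes, Action<IntPtr> send)
    {
        IntPtr ptr = Marshal.AllocHGlobal(bytes.Length);
        try
        {
            Marshal.Copy(bytes, 0, ptr, bytes.Length);
            send(ptr);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

Assuming the same IntPtr overload of Win32.SendMessage used in the code above, the call site would then be along the lines of AnnotationBytes.WithUnmanagedCopy(AnnotationBytes.ToNullTerminatedUtf8(text), p => Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, p)).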