Discussion:
A patch for Korean IME on the spot on GTK+ 2.0
johnsonj
2014-07-05 08:16:15 UTC
Hi, Korean Hangul users.
I am really surprised that it works the same as on Win32.
This is a patch for Korean IME on the spot on GTK+ 2.0.
Try it.
--
You received this message because you are subscribed to the Google Groups "scintilla-interest" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scintilla-interest+***@googlegroups.com.
To post to this group, send email to scintilla-***@googlegroups.com.
Visit this group at http://groups.google.com/group/scintilla-interest.
For more options, visit https://groups.google.com/d/optout.
Neil Hodgson
2014-07-06 01:53:34 UTC
Post by johnsonj
This is a patch for Korean IME on the spot on GTK+ 2.0.
The exception handling "try { ..." should be retained around the whole of each method, since Scintilla is C++ and so may throw, but GTK+ is C and so will not handle exceptions.
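A minimal sketch of the pattern described above, assuming a simplified callback signature: the C-compatible static callback wraps the entire C++ method call in try/catch, so an exception can never unwind into GTK+'s C frames. `ImeEditor`, `CommitThis` and `OnCommit` are hypothetical names here, and `void *` stands in for the real `GtkIMContext *` and `gpointer` parameters so the code compiles without GTK+.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>

// Hypothetical C++ class whose methods may throw.
class ImeEditor {
public:
	int commits = 0;

	void CommitThis(const char *text) {
		if (!text)
			throw std::invalid_argument("null commit string");
		++commits;
	}

	// C-compatible static callback of the kind registered with GTK+.
	// void * stands in for GtkIMContext * and gpointer.
	// The whole body is inside try/catch, so no exception reaches C code.
	static void OnCommit(void *context, const char *text, void *userData) noexcept {
		(void)context;
		try {
			static_cast<ImeEditor *>(userData)->CommitThis(text);
		} catch (const std::exception &e) {
			std::fprintf(stderr, "OnCommit: %s\n", e.what());
		} catch (...) {
			std::fprintf(stderr, "OnCommit: unknown exception\n");
		}
	}
};
```

Marking the callback `noexcept` documents the contract: if anything did escape, the program would terminate in C++ rather than corrupt GTK+'s C stack.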

Neil

johnsonj
2014-07-20 02:32:19 UTC
A patch for Korean IME on the spot on GTK+ 2.0.
GTK+ 3.0 is not installed here, so I have not tested this with GTK+ 3.0.
I think GTK+ 3.0 could also share this same code.

It looks and works almost the same as on Win32.

How easy it was to implement!
How beautiful it is!
Thank you very much for TentativeUndo().

Neil Hodgson
2014-07-21 13:14:43 UTC
Post by johnsonj
A patch for Korean IME on the spot on GTK+ 2.0.
The Windows version looks at the *input* code page so it activates for files in either Code Page 949 or UTF-8. This version looks at the code page information in Scintilla and compares that to 949/1361 so it doesn't activate for UTF-8 files. Possibly GTK+ has an API to ask which IME language mode is running.

I'm also seeing memory bugs but they may have been present before this change.

Neil

johnsonj
2014-07-21 14:22:02 UTC
I have not found the API yet.
I found a similar solution,
which I used in my first patch.
===================================
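char inputLang[3] = "";               // zero-filled, so always terminated
const char *locale = getenv("LANG");  // NULL when LANG is unset
if (locale)
	strncpy(inputLang, locale, 2);    // e.g. "ko" from "ko_KR.UTF-8"
bool koreanIME = (strcmp(inputLang, "ko") == 0);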
====================================
The charset depends on the locale.
If the locale changes, the charset should also change,
so I decided to use the charset instead of the locale.
What do you think?

I don't understand "so doesn't activate for UTF-8 files".
It works well in UTF-8 and CP949.
It activates autocompletion and calltips on the spot for UTF-8 files.
The document and the calltip dictionary are all in UTF-8.

Neil Hodgson
2014-07-22 07:30:19 UTC
Post by johnsonj
char *locale = getenv("LANG");
...
Post by johnsonj
If the locale changes, the charset should also change,
so I decided to use the charset instead of the locale.
What do you think?
Neither works well. My Linux $LANG is typically en_AU.UTF-8 and it's difficult to update this temporarily for an application run from the system menu. I'll generally use UTF-8 files in preference to DBCS as more tools understand UTF-8 and I can use a variety of languages.

Code page is the more important setting for determining encoding in Scintilla. Character set should be subordinate to code page and was originally there to act as a hint when asking the platform to initialise a font. There may be some bugs in the GTK+ platform code on this subject.
Post by johnsonj
I don't understand "so doesn't activate for UTF-8 files".
My copy of SciTE will have character.set=0, so CharacterSetID() returns "ISO-8859-1", but when the file is in UTF-8 I should still be able to enter Korean characters with the IME. You have probably set character.set=129.

On Windows, the input code page changes when you switch the current language from English to Korean to Russian. This lets me, with a computer set to Australian English, work in English, Russian, and Korean in one session - or it would if I actually understood Korean.

With the patch files, there is a problem with the scope of the localeval variable. This char array is local to the "if (conv) {" block but it is assigned to the hanval variable which is used in an outer scope after localeval is out-of-scope so potentially dead and overwritten. Move the declaration of localeval next to the declaration of hanval.
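A minimal sketch of the suggested scope fix, assuming a hypothetical `FakeConvert` stand-in for the iconv-based `Converter`: the buffer is declared next to `hanval`, so the storage `hanval` may point at is still alive after the inner `if` block ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the iconv-based Converter: copies in to out if it fits.
static bool FakeConvert(const char *in, char *out, std::size_t outSize) {
	if (std::strlen(in) + 1 > outSize)
		return false;
	std::strcpy(out, in);
	return true;
}

std::string CommitText(const char *utfval, bool haveConverter) {
	const char *hanval = utfval;
	// Declared beside hanval, not inside the if block, so the storage
	// hanval may point at is still alive where hanval is used below.
	char localeval[4] = "";
	if (haveConverter) {
		if (FakeConvert(utfval, localeval, sizeof(localeval)))
			hanval = localeval;
	}
	return std::string(hanval);  // safe: localeval is still in scope
}
```

If `localeval` were declared inside `if (haveConverter) { ... }` instead, the final line would read from a dead array: undefined behaviour that may appear to work until the stack slot is reused.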

Neil

johnsonj
2014-07-21 14:37:33 UTC
Memory leaks:

Sorry, I forgot two lines in CommitThis():
+ g_free(utfval);
+ pango_attr_list_unref(attrs);

johnsonj
2014-07-22 01:31:14 UTC
I made lots of mistakes.
"Make haste, make mistakes."

Let me put off detecting the keyboard language (KoreanIME()).

A patch is attached.

johnsonj
2014-07-22 23:01:02 UTC
I totally understand what you said.
I have installed ibus for multi-language testing
(it was hard work to make ibus work).
As you know, multiple languages work only in UTF-8.

I think using the locale instead of the charset is correct.

It works the same as on Win32.

I rewrote the patch
and wrote a test file.
Please have a look.
Thanks

johnsonj
2014-07-22 23:04:57 UTC
You probably know this already, but for your information:

input method : ibus
code.page : 65001
character.set : 129

It also works well in CP949.

johnsonj
2014-07-22 23:23:47 UTC
It only works in the ko_KR locale.
That is the question.

Let me sleep on a correct KoreanIME().

Neil Hodgson
2014-07-23 00:53:16 UTC
Post by johnsonj
I have installed ibus for multi-language testing
(it was hard work to make ibus work).
Yes, I'm never sure of all the steps: download Korean fonts, install ibus, add Hangul mode to ibus, start the ibus widget, maybe logout to make it work.
Post by johnsonj
It only works in the ko_KR locale.
That is the question.
I will accept using $LANG matching "ko*", since this will be an advance for people that set up their machines for Korean. It's just that it would be better for multi-lingual use and testing for it to be based on the chosen language in the ibus widget.
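A sketch of the `$LANG`-based check, matching "ko*". As an extra assumption not discussed in the thread, it also honours the usual POSIX precedence of `LC_ALL`, then `LC_CTYPE`, then `LANG` for the character-type category; `KoreanLocale` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Returns true when the effective character-type locale starts with "ko".
bool KoreanLocale() {
	// Assumed precedence (POSIX): LC_ALL, then LC_CTYPE, then LANG.
	const char *const names[] = {"LC_ALL", "LC_CTYPE", "LANG"};
	for (const char *name : names) {
		const char *value = std::getenv(name);
		if (value && *value)  // the first non-empty variable decides
			return std::strncmp(value, "ko", 2) == 0;
	}
	return false;
}
```

With `LANG=en_AU.UTF-8` this returns false, which is exactly the multi-lingual limitation described above: the result reflects the session locale, not the language selected in the ibus widget.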

Neil

johnsonj
2014-07-24 08:06:25 UTC
I found this code on the web.
I think XLocaleOfIM() may work.
I am at a loss how to reach it from GTK+.
Is there any way?

==========================================
/*
XLocaleOfIM()
Returns the locale associated with a given IM.
The returned string is owned by Xlib
and should not be freed by the client.
It will be freed by Xlib when the IM is closed.
*/

// to compile : gcc -Wall main.c -o main -lX11

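#include <X11/Xlib.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
	/* Xlib uses the C library locale, so set it from the environment
	   first; otherwise everything runs in the "C" locale. */
	setlocale(LC_ALL, "");
	Display *dpy = XOpenDisplay(NULL);
	if (!dpy) {
		fprintf(stderr, "failed to open display\n");
		return 1;
	}
	if (XSupportsLocale())
		printf("Xlib supports current locale\n");
	else
		printf("Xlib does not support current locale\n");
	char *p = XSetLocaleModifiers("");  /* NULL on failure */
	printf("current locale modifiers: %s\n", p ? p : "(null)");
	XIM im = XOpenIM(dpy, NULL, NULL, NULL);
	if (!im) {
		fprintf(stderr, "failed to open IM\n");
		XCloseDisplay(dpy);
		return 2;
	}
	/* The returned string is owned by Xlib; do not free it. */
	printf("locale of IM: %s\n", XLocaleOfIM(im));
	XCloseIM(im);
	XCloseDisplay(dpy);
	return 0;
}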

Neil Hodgson
2014-07-25 00:15:18 UTC
Post by johnsonj
I found this code on the web.
I think XLocaleOfIM() may work.
I am at a loss how to reach it from GTK+.
Is there any way?
It can be difficult to bridge from GTK+ to X and it will not always be running on top of X. Even on Linux, X will be replaced with Wayland.

I haven't found any API that reveals the current IME or which language it is handling. So it will have to be $LANG.

Neil

johnsonj
2014-07-25 14:44:24 UTC
I have made another implementation.
I think it works well,
but it needs more testing.

May I ask you one question?
Here is the code,
in CommitThis():

+ if (IsUnicodeMode()) {
+ AddCharUTF(utfVal, strlen(utfVal));
+ } else {
+ const char *source = CharacterSetID();
+ if (*source) {
+ Converter conv(source, "UTF-8", true);
+ if (conv) {
+ char localeVal[4] = "\0\0\0";
+ char *pin = utfVal;
+ size_t inLeft = strlen(utfVal);
+ char *pout = localeVal;
+ size_t outLeft = sizeof(localeVal);
+ size_t conversions = conv.Convert(&pin, &inLeft, &pout, &outLeft);
+ if (conversions != ((size_t)(-1))) {
+ *pout = '\0';
+ AddCharUTF(localeVal, strlen(localeVal), true);
+ } else {
+ fprintf(stderr, "Conversion failed '%s'\n", utfVal);
+ }
+ }
+ }
+ }


Please take a look at AddCharUTF() in the DBCS branch.
Originally it was AddChar(dbcsval[i]).
In rectangular input, deleting one character needs two undos,
and a single undo leaves the (vertical) characters visually broken.

Is there any reason to use AddChar(dbcsval[i]) instead of AddCharUTF()?
I fear my replacement might affect other IMEs.

Neil Hodgson
2014-07-26 04:42:58 UTC
Post by johnsonj
May I ask you one question?
Here is the code,
in CommitThis():
...
Please take a look at AddCharUTF() in the DBCS branch.
Originally it was AddChar(dbcsval[i]).
In rectangular input, deleting one character needs two undos,
and a single undo leaves the (vertical) characters visually broken.
Is there any reason to use AddChar(dbcsval[i]) instead of AddCharUTF()?
AddChar was the original method and then AddCharUTF was added for Unicode.
Post by johnsonj
I fear my replacement might affect other IMEs.
That AddChar code is old, first appearing in 2003. I think we were trying to support EUC encodings which could produce 2-4 bytes for a character and these byte sequences should not be treated as UTF-8.

I don't think EUC ever really worked correctly in Scintilla and only DBCS encodings that have 1 or 2 byte characters are handled.
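A sketch of why DBCS is simpler to handle than EUC: in a DBCS code page every character is one or two bytes, decided by the first byte alone. The lead-byte range 0x81 to 0xFE for code page 949 is an assumption here (it matches Unified Hangul Code as I understand it), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: in code page 949 a byte in 0x81..0xFE starts a two-byte
// character; any other byte is a single-byte (ASCII) character.
bool IsLeadByte949(unsigned char ch) {
	return ch >= 0x81 && ch <= 0xFE;
}

// Splits a CP949 byte string into per-character byte lengths (1 or 2).
std::vector<int> CharacterLengths949(const std::string &bytes) {
	std::vector<int> lengths;
	for (std::size_t i = 0; i < bytes.size();) {
		const unsigned char ch = static_cast<unsigned char>(bytes[i]);
		// A lead byte at the very end has no trail byte, so count it as 1.
		const int len = (IsLeadByte949(ch) && i + 1 < bytes.size()) ? 2 : 1;
		lengths.push_back(len);
		i += len;
	}
	return lengths;
}
```

EUC encodings break this assumption because some sequences need 3 or 4 bytes, which is why a one-or-two-byte model does not cover them.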

It should be safe to change the code to call AddCharUTF. It is possible that client code is expecting to receive an SCN_CHARADDED notification for each byte with DBCS. However, that would be incompatible with Windows code that uses AddCharUTF with the treatAsDBCS argument true.

http://en.wikipedia.org/wiki/Extended_Unix_Code

Neil

johnsonj
2014-07-26 06:52:16 UTC
Thank you for the tip!
I made a KoreanIME() for myself.
It works well for me.
I wonder whether this patch works well for you too.

I am waiting for your instructions.
A patch is attached.

Neil Hodgson
2014-07-27 00:40:50 UTC
Post by johnsonj
I made a KoreanIME() for myself.
It works well for me.
I wonder whether this patch works well for you too.
The patch worked for me.

I can see some small issues.

AddCharUTF is called with last argument true, even when the text isn't DBCS. This argument can probably be !UnicodeMode() here. The subsequent call to NotifyChar is only looking at 2 bytes which is correct for DBCS but not UTF-8.

There are some variables with "Exteneded" in their names which should be "Extended" (3 'e's, not 4).

Scintilla provides a UnicodeFromUTF8 function (in UnicodeFromUTF8.h) which will convert a valid UTF-8 string into a Unicode character value and this should be used in new code instead of writing out the calculation like line 47 of the patch. So,
int unicode = UnicodeFromUTF8(reinterpret_cast<unsigned char *>(utfVal));
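For illustration only, a sketch of the decoding a function like `UnicodeFromUTF8` performs, assuming the input is a valid, complete UTF-8 sequence. `CodePointFromUTF8` is a hypothetical name; real code should call Scintilla's own function as suggested above.

```cpp
#include <cassert>

// Hypothetical re-implementation for illustration. Assumes us points at a
// valid, complete UTF-8 sequence of 1 to 4 bytes.
int CodePointFromUTF8(const unsigned char *us) {
	if (us[0] < 0x80)  // 0xxxxxxx: ASCII
		return us[0];
	if (us[0] < 0xE0)  // 110xxxxx 10xxxxxx
		return ((us[0] & 0x1F) << 6) | (us[1] & 0x3F);
	if (us[0] < 0xF0)  // 1110xxxx 10xxxxxx 10xxxxxx (all Hangul syllables)
		return ((us[0] & 0x0F) << 12) | ((us[1] & 0x3F) << 6) | (us[2] & 0x3F);
	// 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
	return ((us[0] & 0x07) << 18) | ((us[1] & 0x3F) << 12) |
		((us[2] & 0x3F) << 6) | (us[3] & 0x3F);
}
```

For 한 (bytes ED 95 9C) this gives U+D55C. Jamming the same three bytes together with shifts instead gives 0xED959C, which is larger than 0x10FFFF and so is not a code point an application can look up.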

Neil

johnsonj
2014-07-27 01:51:22 UTC
Thank you for your kind instructions.
I changed my patch as you suggested:
corrected "exteneded" to "extended" (sorry for my carelessness);
replaced the calculation with UnicodeFromUTF8 and included UnicodeFromUTF8.h.

Now let me talk about a very important issue!
My notify() code was added through a lot of trial and error.
It works well in practice in both DBCS and UTF-8 on both Win32 and GTK+
within KoreanIME.
I think "it is very mysterious why it works, but it works, and that's enough".
I bet it works well in practice.

I confess I do not know why in theory.
If you could sort it out for me, I would appreciate it!

johnsonj
2014-07-27 07:53:20 UTC
Now I remember why I chose treatAsDBCS.

Comparison in autocompletion seems to use the char as it is.
"unicode" in Editor::AddCharUTF works incompletely.
Only treatAsDBCS works.

In UTF-8, it uses the first two bytes of one three-byte char.
It happens that the two bytes correspond to Hangul chars.
Since the composing IME only cares about Hangul chars, there is no problem.

If I take other IMEs into consideration, I think this is the right thing:

(utf8) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 16) |
	(static_cast<unsigned char>(hanval[1]) << 8) |
	static_cast<unsigned char>(hanval[2]));

(dbcs) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 8) |
	static_cast<unsigned char>(hanval[1]));

I am not sure about Scintilla's internals.
I am waiting for your instructions!!!

Neil Hodgson
2014-07-28 00:16:43 UTC
Post by johnsonj
Now I remember why I chose treatAsDBCS.
Comparison in autocompletion seems to use the char as it is.
The autocompletion code may be faulty for non-ASCII characters.
Post by johnsonj
"unicode" in Editor::AddCharUTF works incompletely.
Only treatAsDBCS works.
In UTF-8, it uses the first two bytes of one three-byte char.
It happens that the two bytes correspond to Hangul chars.
Since the composing IME only cares about Hangul chars, there is no problem.
It looks the other way around to me. If treatAsDBCS is true then only the first 2 bytes are used. If treatAsDBCS is false then it is UTF-8 and has to follow the UTF-8 rules for byte values; so 1, 2, or 3 bytes are combined together to make a Unicode code point which is then sent to the application through NotifyChar.
Post by johnsonj
If I take other IMEs into consideration, I think this is the right thing.
(utf8) Editor::NotifyChar( (static_cast<unsigned char>(hanval[0]) << 16)|
(static_cast<unsigned char>(hanval[1]) << 8) | static_cast<unsigned char>(hanval[2]));
The best thing would be to call UnicodeFromUTF8 to send the Unicode code point.
Post by johnsonj
(dbcs) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 8) |
static_cast<unsigned char>(hanval[1]));
For DBCS, there doesn't appear to be an easily determined code point value so the bytes are just jammed together. An application can then take that apart and recognise particular characters. If there was a simple standard for producing a code point value for DBCS, this would have been a better solution.
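A sketch of "jamming the bytes together" for DBCS, and of how an application could take the value apart again to recognise a particular character. The function names are hypothetical; 0xC7 0xD1 is, as far as I know, 한 in code page 949.

```cpp
#include <cassert>

// Packs the lead and trail bytes of a DBCS character into one int, in the
// same byte order as the (hanval[0] << 8) | hanval[1] expression quoted above.
int PackDBCS(unsigned char lead, unsigned char trail) {
	return (lead << 8) | trail;
}

// Lets an application recover the original bytes from the packed value.
unsigned char LeadByte(int packed) {
	return static_cast<unsigned char>((packed >> 8) & 0xFF);
}

unsigned char TrailByte(int packed) {
	return static_cast<unsigned char>(packed & 0xFF);
}
```

The packed value is only meaningful together with the code page it came from: 0xC7D1 identifies a character in CP949 but is not a Unicode code point.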

Neil

johnsonj
2014-07-27 15:18:10 UTC
This is a patch following your instructions.

johnsonj
2014-07-28 02:09:58 UTC
Please take a look at the patch above.
It works well.

Let me put it simply.
I would like to talk only about UTF-8 mode in KoreanIME
(ignore AddCharUTF() and DBCS).

In HandleCompositionKoreanIME():

This never works well:
(utf8) int unicode = UnicodeFromUTF8(reinterpret_cast<unsigned char *>(utfVal));
Editor::NotifyChar(unicode);

This works well:
(utf8) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 16) |
	(static_cast<unsigned char>(hanval[1]) << 8) |
	static_cast<unsigned char>(hanval[2]));
or this works well:
(utf8) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 8) |
	static_cast<unsigned char>(hanval[1]));

I think if DBCS uses raw byte values, then UTF-8 must use raw byte values too.
Does Scintilla use Unicode for autocompletion comparison internally?

Neil Hodgson
2014-07-28 02:15:49 UTC
Post by johnsonj
Let me put it simply.
I would like to talk only about UTF-8 mode in KoreanIME
(ignore AddCharUTF() and DBCS).
In HandleCompositionKoreanIME():
This never works well:
(utf8) int unicode = UnicodeFromUTF8(reinterpret_cast<unsigned char *>(utfVal));
Editor::NotifyChar(unicode);
NotifyChar goes to the application. Which application are you using? What exactly do you mean by "never works well"? What isn't working?
Post by johnsonj
I think if DBCS uses raw byte values, then UTF-8 must use raw byte values too.
Does Scintilla use Unicode for autocompletion comparison internally?
Scintilla does not receive NotifyChar as it is there for the application. Autocomplete does not see NotifyChar.

Neil

johnsonj
2014-07-28 02:23:15 UTC
Of course, SciTE.

I have been modifying AddCharUTF().

Neil Hodgson
2014-07-28 03:40:13 UTC
Post by johnsonj
Of course, SciTE.
I have been modifying AddCharUTF().
SciTE doesn't really handle characters greater than 255 since it passes the notification's sc.ch as a char to SciTEBase::CharAdded, so the character is being truncated and may work or not randomly.
Post by johnsonj
or this works well.
(utf8) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 8) |
That is probably going into SciTEBase::CharAdded as 0 since the low byte is 0. As 0 it is likely to be ignored. Perhaps SciTEBase::CharAdded shouldn't be called for characters > 255 or > 127.

Neil

johnsonj
2014-07-28 02:44:21 UTC
This is the problem.
I wonder why this works.
It should not work in theory.
-----------------------------------------------------------------------------------------------------------------------------------------
This works well:
(utf8) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 16) |
	(static_cast<unsigned char>(hanval[1]) << 8) |
	static_cast<unsigned char>(hanval[2]));
or this works well:
(utf8) Editor::NotifyChar((static_cast<unsigned char>(hanval[0]) << 8) |
-----------------------------------------------------------------------------------------------------------------------------------------

johnsonj
2014-07-28 05:02:02 UTC
" SciTE doesn’t really handle characters greater than 255 since it passes
the notification’s sc.ch as a char to SciTEBase::CharAdded, so the
character is being truncated and may work or not randomly. "

In Editor::AddCharUTF():
Yes, it may work or not randomly with raw byte values.
It never works with Unicode code points, except for the first preedit char.

But in the Korean IME:
it works well with raw byte values, never with code points.
Even NotifyChar(static_cast<unsigned char>(hanval[0])); works well.

I see SciTEBase::CharAdded uses char.
If I had known this, I would not have tried to make autocompletion work in
KoreanIME.
A char cannot hold Korean Hangul characters, which number over 10,000.
I wonder why it works well with raw byte values in KoreanIME.
Now I am at a loss.

Neil Hodgson
2014-07-29 00:01:06 UTC
Post by johnsonj
I see SciTEBase::CharAdded uses char.
If I had known this, I would not have tried to make autocompletion work in KoreanIME.
A char cannot hold Korean Hangul characters, which number over 10,000.
I wonder why it works well with raw byte values in KoreanIME.
Now I am at a loss.
CharAdded is normally looking for ASCII characters like '(', ')', or '.' to trigger call tips and autocompletions. It's unlikely it should be interpreting any Korean characters but there may be false matches when a byte from a Korean character is interpreted as ASCII.
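A small sketch of the truncation being discussed, assuming the notification's `int` character is narrowed to a single byte on its way to `CharAdded`. `Truncate` is a hypothetical helper, not SciTE code.

```cpp
#include <cassert>

// Keeps only the low byte of a character value, as happens when an int
// character is passed on as a char.
unsigned char Truncate(int ch) {
	return static_cast<unsigned char>(ch & 0xFF);
}
```

U+D55C (한) truncates to 0x5C, the ASCII backslash: exactly the kind of false ASCII match described above. U+AC00 (가) truncates to 0, which is likely to be ignored, and the packed raw value 0xED959C truncates to 0x9C, a non-ASCII byte that an ASCII trigger check would simply skip.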

Neil

johnsonj
2014-07-29 00:41:23 UTC
Now I understand, as you said:
even NotifyChar(static_cast<unsigned char>(hanval[0])); works well.
It works by triggering autocompletion, not by comparing.

What should I do?
I want Korean IME on the spot to be committed.

Neil Hodgson
2014-07-29 00:52:34 UTC
Post by johnsonj
Now I understand, as you said:
even NotifyChar(static_cast<unsigned char>(hanval[0])); works well.
It works by triggering autocompletion, not by comparing.
What should I do?
I want Korean IME on the spot to be committed.
The most recent full patch you sent on July 28 (koreanIMEForGTK.patch) appears OK except for extra calls to Editor::NotifyChar which I want to remove.

I'd also prefer to not define the PreEditStart/End functions since they currently do nothing. If they are ever needed then they can be added.

If you think this is a good set of changes it can be committed.

Neil

johnsonj
2014-07-29 01:38:07 UTC
This is the patch following your instruction.
johnsonj
2014-07-31 01:30:19 UTC
While testing multilingual IMEs,
I fixed the inOverstrike behavior.
I changed localeVal[4] to localeVal[200].

For other IMEs, localeVal[4] causes a fatal error.

Thank you for your hard work.
Neil Hodgson
2014-08-02 02:05:19 UTC
Post by johnsonj
While testing multilingual IMEs,
I fixed the inOverstrike behavior.
I changed localeVal[4] to localeVal[200].
OK. Are you really sure that the Korean IME will never overflow the one remaining localeVal[4]?

AddCharUTF was recently changed to take a const char * first argument so it is no longer necessary to cast "" to (char *). It's better to keep other values that may point to constants (like hanval) as const char *. This prevents accidentally modifying values.
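A minimal sketch of the const-correctness point, assuming a hypothetical DemoEditor whose AddCharUTF takes a const char * first argument like the changed signature described above. It is not Scintilla's class, and the real method may take further parameters.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for an editor class; only the const char * first
// parameter mirrors the AddCharUTF change discussed in this thread.
struct DemoEditor {
	std::string text;
	void AddCharUTF(const char *s, unsigned int len) {
		text.append(s, len);
	}
};

// hanval is declared const char *, so a stray write such as hanval[0] = 0
// is rejected by the compiler instead of corrupting the IME string.
void InsertCommitted(DemoEditor &ed, const char *hanval) {
	ed.AddCharUTF("", 0);  // a string literal now passes without a (char *) cast
	ed.AddCharUTF(hanval, static_cast<unsigned int>(std::strlen(hanval)));
}
```

Calling InsertCommitted(ed, "\xed\x95\x9c") appends the three UTF-8 bytes of the Korean syllable U+D55C without any casts.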

There are some calls added to ShowCaretAtCurrentPosition and I can't see why these are needed.

In PreeditChangedThis, the call to gtk_im_context_get_preedit_string, its arguments and the calls to free the arguments are duplicated for the Korean and non-Korean branches. This method could be reduced by moving this code outside the Korean-or-not branch.

Neil
johnsonj
2014-08-02 13:52:50 UTC
"""OK. Are you really sure that the Korean IME will never overflow the
one remaining localeVal[4]?"""

It was my carelessness. I forgot to change it. localeVal[4] is enough on
Win32, but on Linux it has to be at least localeVal[5] for the
Korean IME.
I am studying how to input Japanese, and I realized the meaning of
maxLenInputIME in ScintillaWin.
It is better for me to respect the author's intention, so
maxLenInputIME = 200;
utfval = maxLenInputIME * 3;
dbcsval = maxLenInputIME * 2;
Global IME on the spot is getting close. Be ready for IME on the spot.
What do you think? I will follow your instructions.

"""AddCharUTF was recently changed to take a const char * first argument
so it is no longer necessary to cast “” to (char *). It's better to keep
other values that may point to constants (like hanval) as const char *.
This prevents accidentally modifying values."""

Thank you for the tip. I will fix it.

"""There are some calls added to ShowCaretAtCurrentPosition and I can’t
see why these are needed."""

In my old patch the caret was shown regularly, but now the caret flickers
constantly.
I will delete those calls.


"""In PreeditChangedThis, the call to gtk_im_context_get_preedit_string,
its arguments and the calls to free the arguments are duplicated
for the Korean and non-Korean branches. This method could be reduced by
moving this code outside the Korean-or-not branch."""
I think it is better if the original code remains untouched where possible.

I will rewrite the patch according to your instructions.

I hope the Korean IME can become a global on-the-spot IME sooner or later.
Thank you very much
Neil Hodgson
2014-08-03 08:24:46 UTC
I am studying how to input Japanese, and I realized the meaning of maxLenInputIME in ScintillaWin.
It is better for me to respect the author's intention, so
maxLenInputIME = 200;
utfval = maxLenInputIME * 3;
On GTK+ utfval comes from the system so should be the basis of other strings. You might want to allocate other strings dynamically so that the results always fit the utfval.
dbcsval = maxLenInputIME * 2;
For CJK languages, DBCS will normally take less space than UTF-8 since CJK characters mostly take 2 bytes in DBCS and 3 bytes in UTF-8. Making the DBCS string the same length as the UTF-8 string should be safe.
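A minimal sketch of sizing the DBCS buffer from the UTF-8 string actually received, rather than from a fixed maxLenInputIME multiple. DbcsBufferFor is a hypothetical helper, and the conversion that would fill the buffer is not shown; the sizing relies on the observation above that CJK text takes no more bytes in a common DBCS code page than in UTF-8.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: allocate a DBCS output buffer sized from the UTF-8
// input. CJK characters are typically 3 bytes in UTF-8 and 2 in DBCS, and
// ASCII is 1 byte in both, so utf8 length + 1 (terminator) is enough.
std::vector<char> DbcsBufferFor(const char *utf8) {
	const size_t utf8Len = std::strlen(utf8);
	return std::vector<char>(utf8Len + 1, '\0');
}
```

For "한글" the UTF-8 form is 6 bytes and the CP949 form 4 bytes, so the 7-byte buffer leaves room for the terminator. Deriving the size from the input also removes any need to reject long compositions just to protect a fixed-size array.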
Global IME on the spot is getting close. Be ready for IME on the spot.
What do you think? I will follow your instructions.
The gtk_im_context_get_preedit_string returns a PangoAttrList which contains drawing instructions like displaying particular underlines. The block caret appearance may not be sufficient to represent this.

I don't know how to make this appear on GTK+ but on Windows using the Japanese IME the underlines start as wavy then become straight when space is pressed and it displays candidate lists. The active part of the preedit string (where the selection from the candidate list will go) is thick and the inactive parts thin. OS X is similar, starting with a thick continuous underline then a non-continuous underline with thick for the active part.

If the on-the-spot IME doesn't provide these visuals there could be a setting to allow client code to choose between on-the-spot and original appearance.
"""In PreeditChangedThis, the call to gtk_im_context_get_preedit_string, its arguments and the calls to free the arguments are duplicated for the Korean and non-Korean branches. This method could be reduced by moving this code outside the Korean-or-not branch."""
I think it is better if the original code remains untouched where possible.
OK. The original code should be indented from its current position to make the indentation consistent.

Neil
johnsonj
2014-08-03 01:14:42 UTC
Here is the patch attached.
I like it. I hope you like it too.
johnsonj
2014-08-03 01:16:30 UTC
Here is the patch attached.
I like it. I hope you like it too.
johnsonj
2014-08-03 09:17:50 UTC
maxLenInputIME is for humans:
it means a user can type up to maxLenInputIME characters.

If a user types more than maxLenInputIME characters,
I have seen it cause SciTE to exit abruptly.

I think we have to check the utfval length before allocation.
---- for example
if (IsUnicodeMode()) {
	if (strlen(utfval) > maxLenInputIME * 3) {
		// process error
	}
} else {
	if (strlen(utfval) > maxLenInputIME * 2) {
		// process error
	}
}
------
I think it is better to follow the semantics: do not type more than 200
characters.
Although not now, the Korean IME may also have to follow this rule in the
future.
Are you still sure that "Making the DBCS string the same length as the UTF-8
string should be safe."?
If so, I will follow you.


I am struggling with the problems of "IME on the spot" that you pointed out.
The Japanese IME is a real goblin to me,
so I implemented the Korean IME first.
"""If the on-the-spot IME doesn’t provide these visuals there could be a
setting to allow client code to choose between on-the-spot and original
appearance."""
I agree with you.
Let me open a new thread to discuss it sooner or later.
johnsonj
2014-08-03 14:35:44 UTC
Sorry for all my carelessness.
Thank you for your kind instructions.
I have learned a lot from you.

My patch keeps improving thanks to you.
Here is my patch attached.
I added code to check the UTF-8 string length.
I cleaned up the source manually and used only the "diff -u" option to keep
the text formatting consistent.

I would be glad to hear your comments.
Neil Hodgson
2014-08-03 23:37:14 UTC
Post by johnsonj
My patch keeps improving thanks to you.
Here is my patch attached.
I added code to check the UTF-8 string length.
I can accept this version if you are happy with it being in the next release. There are now 3 separate definitions of maxLenInputIME and there should be only one. Constants normally go towards the top of the file, probably just before the enum that defines COMMAND_SIGNAL unless you can find a better place. Yes, ScintillaWin.cxx has 2 definitions of maxLenInputIME and it should be fixed too.

There are now 6 calls to gtk_im_context_get_preedit_string in ScintillaGTK.cxx and each allocates some resources. It would simplify the code and ensure the resources are freed to define a small wrapper:

class PreEditString {
public:
	gchar *str;
	gint cursor_pos;
	PangoAttrList *attrs;

	PreEditString(GtkIMContext *im_context) {
		gtk_im_context_get_preedit_string(im_context, &str, &attrs, &cursor_pos);
	}
	~PreEditString() {
		g_free(str);
		pango_attr_list_unref(attrs);
	}
};

This then replaces bare calls to gtk_im_context_get_preedit_string like so:

gboolean ScintillaGTK::DrawPreeditThis(GtkWidget *widget, cairo_t *cr) {
	try {
		PreEditString pes(im_context);
		PangoLayout *layout = gtk_widget_create_pango_layout(PWidget(wText), pes.str);
		pango_layout_set_attributes(layout, pes.attrs);

		cairo_move_to(cr, 0, 0);
		pango_cairo_show_layout(cr, layout);

		g_object_unref(layout);
	} catch (...) {
		errorStatus = SC_STATUS_FAILURE;
	}
	return TRUE;
}
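A sketch of the same acquire-in-constructor, release-in-destructor pattern that can run without GTK+. demo_get_preedit and demo_free are hypothetical stand-ins for gtk_im_context_get_preedit_string and g_free, and DemoPreEdit is not Scintilla's class. One refinement worth considering for such a wrapper: copying it would free the same pointer twice, so copying is forbidden here with C++11 `= delete` (on a pre-C++11 compiler the same effect comes from declaring the copy operations private and leaving them undefined).

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Counters so the sketch can show every acquire is matched by a release.
static int g_allocs = 0;
static int g_frees = 0;

// Hypothetical C-style API standing in for the GTK+/GLib calls.
static void demo_get_preedit(const char *source, char **out) {
	*out = static_cast<char *>(std::malloc(std::strlen(source) + 1));
	std::strcpy(*out, source);
	++g_allocs;
}
static void demo_free(char *p) {
	std::free(p);
	++g_frees;
}

// RAII wrapper in the style of PreEditString: every return path, and any
// exception, runs the destructor and frees the string exactly once.
class DemoPreEdit {
public:
	char *str;
	explicit DemoPreEdit(const char *source) : str(nullptr) {
		demo_get_preedit(source, &str);
	}
	~DemoPreEdit() {
		demo_free(str);
	}
	DemoPreEdit(const DemoPreEdit &) = delete;             // no double free
	DemoPreEdit &operator=(const DemoPreEdit &) = delete;
};

// Has an early return; the wrapper still releases on both paths.
static bool StartsWithA(const char *source) {
	DemoPreEdit pes(source);
	if (pes.str[0] != 'a')
		return false;
	return true;
}
```

After StartsWithA("xyz") and StartsWithA("abc"), both counters are 2: the early return did not leak.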

Neil
Neil Hodgson
2014-08-05 11:34:15 UTC
The PreEditString class is now committed so that it can be used in future patches.

This is purely a code clean-up. There should be no behavioural difference with this change set.

https://sourceforge.net/p/scintilla/code/ci/dddb5e32707cb09be9255680d4ce7da1aaa81776/

Neil
johnsonj
2014-08-04 01:03:15 UTC
I have applied it.
It looks better.
It works well.

I'd like to ask you a question.
Will it be a problem to replace this code in commitThis() with
AddCharUTF()?

for (int i = 0; localeVal[i]; i++) {
AddChar(localeVal[i]);
}

I am sure it is better to replace it.
If you allow it, commitThis() can be unified and reduced to half its size.
Neil Hodgson
2014-08-04 02:38:01 UTC
Will it be a problem to replace this code in commitThis() with AddCharUTF()?
for (int i = 0; localeVal[i]; i++) {
AddChar(localeVal[i]);
}
I am sure it is better to replace it.
I think that code is wrong in that it reports each byte of the DBCS string instead of each character.

What was probably wanted there was to report each of the characters. With the Japanese IME, you may have multiple characters in the composition string such as “エド”. When the composition is committed, the application should receive an SCN_CHARADDED for each character.
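A sketch of reporting a committed composition one character at a time instead of one byte at a time. CommitPerCharacter and its notify callback are hypothetical; in Scintilla the per-character report would be whatever ends up sending SCN_CHARADDED. It assumes valid UTF-8 and uses the rule that a new character starts at every byte that is not a continuation byte (binary 10xxxxxx).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Calls notify once per UTF-8 character of utf8. A character ends where the
// next non-continuation byte begins, or at the end of the string.
template <typename Notify>
void CommitPerCharacter(const std::string &utf8, Notify notify) {
	size_t start = 0;
	for (size_t i = 1; i <= utf8.size(); i++) {
		const bool atEnd = (i == utf8.size());
		if (atEnd || (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
			notify(utf8.substr(start, i - start));
			start = i;
		}
	}
}
```

For "エド" this reports two 3-byte characters rather than six single bytes, which is the granularity an application expects from SCN_CHARADDED.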

Neil
johnsonj
2014-08-04 02:44:43 UTC
Here is my patch attached.
More improved! It gets better day by day.
I like it more.

I have applied your instructions.

I replaced AddChar() with AddCharUTF() in commitThis().
It works well.
But I would like you to comment on this change.
I am afraid it may cause a problem.
johnsonj
2014-08-04 02:51:14 UTC
"""When the composition is committed, the application should receive an
SCN_CHARADDED for each character."""

I see.
That shows AddChar() is not right, both in theory and in practice.
Maybe NotifyChar() in AddCharUTF() has to be fixed for SCN_CHARADDED to
work.
Neil Hodgson
2014-08-04 23:47:12 UTC
"""When the composition is committed, the application should receive an SCN_CHARADDED for each character."""
I see.
That shows AddChar() is not right, both in theory and in practice.
Maybe NotifyChar() in AddCharUTF() has to be fixed for SCN_CHARADDED to work.
The branch of AddCharUTF that is used almost exclusively in the current release is the Unicode path, that is, with treatAsDBCS false. It was your patches that added calls with treatAsDBCS true.

The only route through the current release code that passes treatAsDBCS true is in AddCharBytes and that only occurs with WM_IME_CHAR (which never appear to trigger). For WM_CHAR, there is a path to AddCharBytes but it passes '\0' as the first argument so misses treatAsDBCS true. It's possible that WM_IME_CHAR triggered on previous editions of Windows or with languages I am not trying.

I'd be reasonably happy now with dropping the treatAsDBCS parameter and only using the treatAsDBCS=false path.

BTW, to see some more complex IME composition text visualisations for Japanese on Windows, open the Text Services and Input Languages dialog (right click on the IME control and select Settings) select the "Microsoft IME" under Japanese, click Properties..., on the Editing Page choose VJE for the Color Template. A similar path reveals more options for Korean. The set of options may change when third-party IMEs are added.

Neil
johnsonj
2014-08-05 01:23:34 UTC
"""I’d be reasonably happy now with dropping the treatAsDBCS parameter and
only using the treatAsDBCS=false path."""
It is good that you are settling on a single Unicode path.
I will go that way too. I am with you.
I may be the first and the last to have walked the other way.

"""It's possible that WM_IME_CHAR triggered on previous editions of Windows
or with languages I am not trying."""
I do not know how WM_IME_CHAR works, but there appears to be something to
discuss.
As far as I know, in theory WM_IME_CHAR should exit with break, but Scintilla's
exits with return 0;.
I wonder why it does not cause a problem.

The Japanese IME has too many options to understand.
The Korean and Chinese IMEs have just as many options.
Even as a Korean, I do not know all the options.

For autocompletion and inOverstrike:
I want to know how to call AddCharUTF() character by character rather than with a byte stream.
On Win32 I have Unicode. On GTK+, UTF-8 comes in.
With Unicode I can handle characters, but how can I tell where the characters are in a UTF-8 string?
For example, I want to send characters one by one, like AddChar(dbcsval[i]);

for (i = 1; i < last; i++) {
addCharUTF(charval[i], charlen);
}

Is this possible with UTF-8?
Neil Hodgson
2014-08-05 01:44:49 UTC
"""It's possible that WM_IME_CHAR triggered on previous editions of Windows or with languages I am not trying."""
I do not know how WM_IME_CHAR works, but there appears to be something to discuss.
As far as I know, in theory WM_IME_CHAR should exit with break, but Scintilla's exits with return 0;.
I wonder why it does not cause a problem.
In this code "break" is equivalent to "return 0" since the break immediately exits the switch and executes the last statement of the method which is "return 0l".
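A small illustration of why the two forms are equivalent here. The handler names and message numbers are made up, and the equivalence holds only because nothing runs between the end of the switch and the final return.

```cpp
#include <cassert>

// Two hypothetical message handlers shaped like a window procedure.
// In WithBreak, "break" leaves the switch and reaches the final
// "return 0", so for message 1 both functions return the same value.
static long WithBreak(int msg) {
	switch (msg) {
	case 1:
		// ... handle the message ...
		break;
	case 2:
		return 42;
	}
	return 0;
}

static long WithReturn(int msg) {
	switch (msg) {
	case 1:
		// ... handle the message ...
		return 0;
	case 2:
		return 42;
	}
	return 0;
}
```

If code were later added after the switch, the two would diverge: only the break version would run it.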
I want to know how to call AddCharUTF() character by character rather than with a byte stream.
On Win32 I have Unicode. On GTK+, UTF-8 comes in.
With Unicode I can handle characters, but how can I tell where the characters are in a UTF-8 string?
UTF-8 is defined so the first byte in a character specifies how many bytes there are in that character. This article may help explain:
http://en.wikipedia.org/wiki/Utf-8#Codepage_layout

For Scintilla, this is implemented by the UTF8BytesOfLead array in UniConversion.h. It is important to index this array by an unsigned value which may require casting. The char type may be signed which will fail when used as an index.

To count the characters in a UTF-8 string on GTK+, there is a function
https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#g-utf8-strlen
Inside ScintillaGTK.cxx you can use either Scintilla's Unicode features or those provided by GTK+ - whichever is easier for you.
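A sketch of the lead-byte approach, using a local table built in the spirit of UTF8BytesOfLead (a re-creation for illustration, not the contents of UniConversion.h). It also shows the cast mentioned above: where plain char is signed, a lead byte such as 0xED would otherwise become a negative index.

```cpp
#include <cassert>
#include <cstring>

// Bytes in a UTF-8 character, indexed by its lead byte. Continuation and
// invalid bytes map to 1 so a scanning loop always makes progress.
static int bytesOfLead[256];

static void InitBytesOfLead() {
	for (int b = 0; b < 256; b++) {
		if (b >= 0xC2 && b <= 0xDF) bytesOfLead[b] = 2;
		else if (b >= 0xE0 && b <= 0xEF) bytesOfLead[b] = 3;
		else if (b >= 0xF0 && b <= 0xF4) bytesOfLead[b] = 4;
		else bytesOfLead[b] = 1;
	}
}

// Count characters by jumping from lead byte to lead byte.
// The unsigned char cast keeps the index in 0..255.
static size_t CountUtf8Chars(const char *s) {
	const size_t len = std::strlen(s);
	size_t count = 0;
	for (size_t i = 0; i < len; i += bytesOfLead[static_cast<unsigned char>(s[i])])
		count++;
	return count;
}
```

For valid UTF-8, g_utf8_strlen(s, -1) should give the same count, so inside ScintillaGTK.cxx either route works.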

Neil
Neil Hodgson
2014-08-07 23:58:46 UTC
The current release doesn't call AddCharUTF(..., ..., true) under any observed circumstances. The patched version of ScintillaWin.cxx, for Korean IME, uses true for DBCS character sets. While this is fulfilling the design, it will make it more difficult, and thus less likely, for applications to use the reported character value successfully.

For SciTE, it would be good to update the autocompletion code to handle Korean characters correctly. To do so, it needs to extend its concept of a character set so Korean characters can be included. There are different implementations possible for character sets but requiring that they work for both DBCS and UTF-8 will be around twice the work.

Therefore, I want to change the Korean IME mode in ScintillaWin.cxx to always call AddCharUTF with final parameter false.

Neil
johnsonj
2014-08-06 09:00:57 UTC
Thank you for the tips.

I fixed the IME to send characters instead of a byte stream.
On-the-spot IMEs now behave correctly in inOverstrike mode.
That is what I wanted.

Please take a look.
I hope you like it.
Neil Hodgson
2014-08-06 09:33:15 UTC
Post by johnsonj
I fixed the IME to send characters instead of a byte stream.
On-the-spot IMEs now behave correctly in inOverstrike mode.
That is what I wanted.
It needs to g_free the utfChar variable which was allocated by g_utf8_substring.
https://developer.gnome.org/glib/2.30/glib-Unicode-Manipulation.html#g-utf8-substring

g_utf8_substring only became available in GLib version 2.30. Scintilla currently supports GTK+ 2.8 and requiring 2.30 would be too much of a jump. It should be possible to use other calls like g_utf8_find_next_char which are available on older versions of GLib.

New releases of Scintilla are made approximately 6 weeks apart and there is a 'quiet time' before each release where changes are not made except for fixing regressions. This will start in a couple of days so we need to work out whether or not this is stable enough to be committed and which pieces should be committed.

Neil
johnsonj
2014-08-06 11:07:44 UTC
"""g_utf8_substring only became available in GLib version 2.30"""
Sorry, I did not know that.

I have already tried
g_utf8_next_char() and g_utf8_find_next_char().

They return a pointer, not one character.

I got characters like this with the Japanese on-the-spot IME.
As the for loop runs:
1st round "かなた"
2nd round "なた"
3rd round "た"

I cannot handle the pointer.
How can I get the one character at an index?
Can you show me an example?
johnsonj
2014-08-06 13:17:45 UTC
I have used this code before.

char *utfChar;
for (utfChar = utfVal; *utfChar != 0; utfChar = g_utf8_find_next_char(utfChar, NULL)) {
	. . .
}

It works well in preedit_changed() thanks to TentativeUndo(),
but in commit() it behaves as described above.
I am struggling with IME on the spot.
Neil Hodgson
2014-08-06 22:54:36 UTC
Post by johnsonj
char *utfChar;
for (utfChar=utfVal; *utfChar != 0; utfChar = g_utf8_find_next_char(utfChar, NULL)) {
. . .
}
It works well in preedit_changed() thanks to TentativeUndo(),
but in commit() it behaves as described above.
That should work. Make sure you are using UTF8BytesOfLead or the GTK+ equivalent for the length argument to AddCharUTF.

The call to TentativeStart in the most recent patch appears to occur for every character and it should probably occur once for the whole composition.
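A sketch combining both points, assuming valid UTF-8 input: walk the string with a pointer as in the loop above, pass each character's own byte length to AddCharUTF, and make the once-per-composition call outside the loop. StubEditor only records calls, NextChar stands in for g_utf8_find_next_char(p, NULL), and InsertComposition is a hypothetical name; where exactly the real patch calls TentativeStart is not shown here. The character at the pointer is simply the bytes from p up to next, which also answers the earlier question of how to get one character from a pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records calls so the sketch can be checked; not Scintilla's Editor.
struct StubEditor {
	int tentativeStarts = 0;
	std::vector<std::string> added;
	void TentativeStart() { tentativeStarts++; }
	void AddCharUTF(const char *s, unsigned int len) {
		added.push_back(std::string(s, len));
	}
};

// Advance to the next character by skipping continuation bytes (10xxxxxx).
// Assumes *p is not the terminating NUL.
static const char *NextChar(const char *p) {
	++p;
	while ((static_cast<unsigned char>(*p) & 0xC0) == 0x80)
		++p;
	return p;
}

// One TentativeStart for the whole composition, then one AddCharUTF per
// character; each character is the byte range [p, next).
static void InsertComposition(StubEditor &ed, const char *utfVal) {
	ed.TentativeStart();
	for (const char *p = utfVal; *p != '\0';) {
		const char *next = NextChar(p);
		ed.AddCharUTF(p, static_cast<unsigned int>(next - p));
		p = next;
	}
}
```

For "かなた" this makes one TentativeStart call and three AddCharUTF calls of 3 bytes each, instead of the shrinking suffixes "かなた", "なた", "た" that result from passing the remaining string.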

Neil
johnsonj
2014-08-07 07:27:33 UTC
I have strayed too far from the Korean IME.
There is far too much work left for IME on the spot in general.

I should go back to the Korean IME.
There is no time left.

Please take care with commitThis(), which may affect other IMEs.
I am waiting for you to tell me what should be fixed.
Neil Hodgson
2014-08-07 23:58:44 UTC
Post by johnsonj
I should go back to the Korean IME.
There is no time left.
Please take care with commitThis(), which may affect other IMEs.
I am waiting for you to tell me what should be fixed.
I can commit this patch if you haven't seen any problems with it.

Using AddCharUTF("", 0, false) to remove overstrike characters does not look good as it is not something that AddCharUTF was designed to do. It might work but there may also be problems.

Neil
Neil Hodgson
2014-08-07 23:58:48 UTC
An IME feature request has appeared on the tracker. I don't know if surrounding context is important for Korean.
https://sourceforge.net/p/scintilla/feature-requests/1066/

Neil
johnsonj
2014-08-08 02:47:29 UTC
"""Using AddCharUTF(“”, 0, false) to remove overstrike characters"""

It is OK to delete it.
It is of no practical use for the Korean IME.

It is also OK to keep it.
It does not affect other IMEs.
I am confident it does not cause problems in the Korean IME.

I already no longer use the third parameter of AddCharUTF() anywhere else.

I will follow you.
Thank you.
Neil Hodgson
2014-08-08 05:50:09 UTC
The removal of the third argument to AddCharUTF in ScintillaWin.cxx was committed. This change set also moves the maxLenInputIME constant to ScintillaBase.h where it can be shared by each of the platform layers.

https://sourceforge.net/p/scintilla/code/ci/70e9770a6e03f65359c193f25f7c8e898643a09b/
"""Using AddCharUTF("", 0, false) to remove overstrike characters"""
It is OK to delete it.
It is of no practical use for the Korean IME.
It is committed now with some minor changes, particularly to comments, as
https://sourceforge.net/p/scintilla/code/ci/6ba41407f820b40e49a41882d3d1df77d88cb5c5/

Neil
johnsonj
2014-08-11 00:32:54 UTC
I found that I made a mistake.
While trying to preserve the indentation, I introduced a critical bug.

Sorry to have troubled you.
Neil Hodgson
2014-08-11 04:27:40 UTC
Post by johnsonj
I found that I made a mistake.
While trying to preserve the indentation, I introduced a critical bug.
strlen returns an unsigned number so it can't be less than 0 and testing it <= 0 may cause warnings. So committed as

if ((strlen(utfval.str) == 0) || strlen(utfval.str) > maxLenInputIME * 3) {
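A sketch of the same check written so strlen runs once and the unsigned length is only compared with 0 for equality. CompositionLengthOK and its maxChars parameter are hypothetical names; the factor of 3 matches the bytes-per-character assumption used for maxLenInputIME earlier in the thread.

```cpp
#include <cassert>
#include <cstring>

// True when a UTF-8 composition string is non-empty and fits a limit of
// maxChars characters at up to 3 bytes each. strlen returns size_t, which
// is unsigned, so "len <= 0" could only ever mean "len == 0".
static bool CompositionLengthOK(const char *utf8, size_t maxChars) {
	const size_t len = std::strlen(utf8);
	return len != 0 && len <= maxChars * 3;
}
```

This is the logical negation of the committed rejection test, so a caller would bail out when it returns false.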

https://sourceforge.net/p/scintilla/code/ci/a431f85e13c9499923b79d4890e8642604863a27/

Neil