Discussion:
Reverse lexing and Nested Comments
Charly Dante
2014-03-11 14:28:26 UTC
Hello,

I currently have a problem with my lexer and I don't really know how to
solve it. I am writing this lexer for a language that supports *Nested
Block-Comments*. This means that each comment opened by a /* has to
be closed by a matching */, even if it was opened inside another
block-comment. The problem is that this works in both directions: forwards
*and* backwards. To give an example:

int myfunction(int n)
{
return n;
}

If I now add a normal opening block-comment at the top of that function,
everything after it gets highlighted green. This is also how it
works in C++, which would look like this:


/* int myfunction(int n)
{
return n;
}

However, in languages that support nested block-comments, the reverse
is also possible: adding a */ at the *end* of the code will also
comment out everything before it, just like this:

int myfunction(int n)
{
return n;
} */

This obviously doesn't work in C++, as C++ doesn't have nested
block-comments. There, the highlighting for this case looks like this:

int myfunction(int n)
{
return n;
} */


And that's exactly my problem: the highlighting is not applied in
this case, because it would require the lexer to iterate bi-directionally.

Several languages support nested block-comments, and at the
moment I see no way to provide reliable highlighting for them. Especially
with multiple nested statements, it's very confusing that one cannot
see whether all comments are closed, because the highlighting breaks.
How can I solve this issue?

Best Regards,
CD
--
You received this message because you are subscribed to the Google Groups "scintilla-interest" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scintilla-interest+***@googlegroups.com.
To post to this group, send email to scintilla-***@googlegroups.com.
Visit this group at http://groups.google.com/group/scintilla-interest.
For more options, visit https://groups.google.com/d/optout.
Philippe Lhoste
2014-03-11 14:51:39 UTC
Post by Charly Dante
I currently have a problem with my lexer and I don't really know how to solve it. The
thing is I am writing this lexer for a language that supports *Nested
Block-Comments*. This means that each comment that is opened by a /* has to be closed by
another */, even if it was opened inside another block-comment. The problem is, that this
works in both directions: forwards *and* backwards. To provide an example:
int myfunction(int n)
{
return n;
}
If I now add a normal opening block-comment at the top of that function everything beyond
it gets highlighted green. This is also the way how it works in C++, which would look like
this:
/* int myfunction(int n)
{
return n;
}
However, in languages that support nested block-comments, the other way round is also
possible and adding a */ at the *end* of the code will also comment out everything before,
just like this:
int myfunction(int n)
{
return n;
} */

I don't see the point there, nor the relation to nested comments.

Post by Charly Dante
This doesn't work in C++ obviously as C++ doesn't have nested block-comments. The
highlighting there for this case will look like this:
int myfunction(int n)
{
return n;
} */
And that's exactly my problem now, that the highlighting is not applied in this case
because it would require the lexer to iterate bi-directional.
There are several languages supporting nested block-comments and at the moment I see no
way to provide reliable highlighting for them? Especially when having multiple nested
statements it's very confusing that one cannot see if all comments are closed now or not
because the highlighting bugs....
How can I solve this issue?

You can take a look at LexPOV (C-style comments) or at LexLua (which supports nesting
levels, and so needs a state).
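
A minimal sketch of that stateful approach, written in Python rather than as a real Scintilla lexer. The function name `style_nested` and the two style names are made up for illustration, and strings and line comments are ignored for brevity. The nesting depth is the only state the scanner has to carry forward:

```python
COMMENT, CODE = "comment", "code"


def style_nested(text):
    """Return one style per character, scanning strictly left to right."""
    styles = []
    depth = 0  # current comment nesting level; this is the lexer state
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            # Every opener increases the depth, even inside a comment.
            depth += 1
            styles += [COMMENT, COMMENT]
            i += 2
        elif pair == "*/" and depth > 0:
            # A closer only counts while a comment is open.
            styles += [COMMENT, COMMENT]
            depth -= 1
            i += 2
        else:
            styles.append(COMMENT if depth > 0 else CODE)
            i += 1
    return styles


sample = "a /* b /* c */ d */ e"
styles = style_nested(sample)
print(styles[sample.index("d")])  # still inside the outer comment
```

With this rule the `d` in the sample is still styled as a comment, whereas a non-nesting C lexer would end the comment at the first `*/`.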
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --
Charly Dante
2014-03-11 15:47:42 UTC
OK,

let me explain further then. The code above was only a minimal example. I
know how C-style comments work and I know how Lua comments work. Neither
approach covers what I am trying to achieve.

C-style comments are not nestable at all, so they are not really
interesting for me. The Lua lexer, by the way, has the same problem that I
currently have, so I will illustrate my example with the Lua lexer.
Consider the following Lua code:


local foo = bar

local test1 = 1

local test2 = 2


Now let's say I want to comment out the second line using a
block-comment. I would write something like:

local foo = bar
--[[
local test1 = 1
]]
local test2 = 2


So far everything is fine. Now let's say I want to comment out the whole
block. I can easily do this by using the --[=[ operator. If I add this at
the top of the code, everything looks fine and is highlighted in
green:

--[=[ local foo = bar
--[[
local test1 = 1
]]
local test2 = 2


However, consider now doing it the other way round. I don't start
commenting at the top, but at the bottom. I first add the closing
operator --]=] at the end of the code. It will look like this:

local foo = bar
--[[
local test1 = 1
]]
local test2 = 2 --]=]

And that's exactly the problem: while in the forward direction everything after
the comment operator gets highlighted, in the backward direction this
doesn't happen. Now imagine a language where such comment blocks are opened
and closed with the same operators as in C, namely /* and */, but where,
unlike in C, nested block comments are allowed.
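
For reference, here is a rough Python sketch of how Lua's comment rules apply to the snippets above. The helper name `lua_comment_spans` is hypothetical and the sketch ignores string literals. In Lua, `--` followed by a long bracket such as `[=[` opens a comment that only the closing bracket of the same level ends; any other `--`, including `--]=]`, starts an ordinary comment that runs to the end of the line:

```python
import re

# "--" immediately followed by an opening long bracket of any level.
_OPEN = re.compile(r"--\[(=*)\[")


def lua_comment_spans(text):
    """Return (start, end) offsets of Lua comments, end exclusive."""
    spans = []
    i = 0
    while i < len(text):
        if text.startswith("--", i):
            m = _OPEN.match(text, i)
            if m:
                # Long comment: ends at the closer with the same number of "=".
                closer = "]" + m.group(1) + "]"
                j = text.find(closer, m.end())
                end = len(text) if j < 0 else j + len(closer)
            else:
                # Short comment: runs to the end of the line.
                j = text.find("\n", i)
                end = len(text) if j < 0 else j
            spans.append((i, end))
            i = end
        else:
            i += 1
    return spans


forward = "--[=[ local foo = bar\n--[[\nlocal test1 = 1\n]]\nlocal test2 = 2"
backward = "local foo = bar\nlocal test2 = 2 --]=]"
print(lua_comment_spans(forward))   # one span covering the whole text
print(lua_comment_spans(backward))  # only the trailing "--]=]"
```

Under these rules the trailing `--]=]` is a short comment covering only itself, so a forward-only highlighter has nothing earlier in the file to recolor.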

If you now have the following situation:

int myfunction(int n)
{
int test1 = 1;
int test2 = 2;
return n;
}

and comment out the first two lines of the function with a block-comment, it
would look like this:

int myfunction(int n)
{
/*int test1 = 1;
int test2 = 2;*/
return n;
}

OK, everything is fine. If I now add another comment opening operator *inside* this
comment block and don't close it, it will throw a syntax error due to
an unclosed block-comment. The highlighting will look like this:

int myfunction(int n)
{
/*int test1 = 1; /*
int test2 = 2;*/
return n;
}


Luckily I can directly see the error from the highlighting: Somewhere there
must be an open comment block, because my return n; statement is also
highlighted green. So I can directly see the error here.

But what happens if I do the opposite? Instead of adding an opening comment
operator and forgetting to close it, I add a closing comment operator and
forget to open it. Then the highlighting will look like this:

int myfunction(int n)
{
/*int test1 = 1; */
int test2 = 2;*/
return n;
}

No difference is observable. However, as such nested comments work
*bi-directionally*, the highlighting for them should also work bi-directionally.
What I would like it to look like is:

int myfunction(int n)
{
/*int test1 = 1; */
int test2 = 2;*/
return n;
}

Then it would be consistent with the behaviour in the forward direction. In Lua
this problem is less severe, as you have to add a further equals sign for
each new nesting level of your block-comment, like --[=[ then --[==[ and so on.
But in languages where you only have /* and */ and the nesting level has to
be detected by the lexer itself, proper highlighting would really help in
many situations.

Can you now see the point and the relation to nested block-comments? They
are treated equally in both directions by the compilers of those languages,
and so they should be by the lexer.
Philippe Lhoste
2014-03-11 16:23:05 UTC
Post by Charly Dante
let me explain further then. The code above was only a minimal example. I know how C-Style
Comments work and I know how Lua Comments work. Both approaches do not cover what I am
trying to achieve.
C-Style comments are not nestable at all, so they are not really interesting for me.

POV-Ray comments are C-style /* */ but are nestable, that's what I meant.

Post by Charly Dante
The Lua lexer btw has the same problem which I currently have.

Ah, OK.

Post by Charly Dante
No difference observable. However, as such nested comments work *bi-directional* the
highlighting for them should also work bi-directional.

I still fail to understand, but I will blame my feeble mind.
My understanding is that if you have a closing comment without a matching opening one, it
is a syntax error, and it should be highlighted as such. Except that in C-like languages
the comment delimiters are also sequences of valid operators (* and /), and most (if not
all) simple Scintilla lexers simply accept them.

The fact that an opening comment without a closing one runs down to the end of the document
is closely related to the way Scintilla lexers work: they highlight from the current point
to the bottom of the screen.

Perhaps lexers should detect that they have reached the end of the document in comment mode
without finding a closing tag, and change the state to error.

And I suppose that if such a lexer finds */ (outside strings, etc.) when not in block
comment mode, it should complain, as this sequence of operators isn't valid (in all
languages I know, except perhaps Scala... Mmm, even there, it should not be legal!).

An improvement that should be contributed to the project... :)
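
The two suggestions above could look roughly like this. It is a sketch in Python, not Scintilla code: the style names and `style_with_errors` are hypothetical, strings are ignored, and restyling the whole unterminated comment once the end of the text is reached is only one possible reading of "change the state to error":

```python
CODE, COMMENT, ERROR = "code", "comment", "error"


def style_with_errors(text):
    """Forward scan that flags stray closers and unterminated comments."""
    styles = [CODE] * len(text)
    depth = 0
    outer_start = None  # offset where the outermost open comment began
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            if depth == 0:
                outer_start = i
            depth += 1
            styles[i:i + 2] = [COMMENT, COMMENT]
            i += 2
        elif pair == "*/":
            if depth > 0:
                depth -= 1
                styles[i:i + 2] = [COMMENT, COMMENT]
            else:
                # Closer found while not in comment mode: flag it in place.
                styles[i:i + 2] = [ERROR, ERROR]
            i += 2
        else:
            if depth > 0:
                styles[i] = COMMENT
            i += 1
    if depth > 0:
        # End of text reached in comment mode: flag the unterminated comment.
        styles[outer_start:] = [ERROR] * (len(text) - outer_start)
    return styles


print(style_with_errors("x */ y"))
print(style_with_errors("a /* b"))
```

Both error cases are found in a single left-to-right pass, and a stray `*/` only changes the style of its own two characters.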
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --
Philippe Lhoste
2014-03-12 11:54:03 UTC
KHMan and Mike Lischke expressed the objections I tried to raise more clearly than I did.

In other words, your reverse lexing makes sense only if a compiler / interpreter treats all
the text from the beginning of the source to a lone */ as a comment.

It is more probable that the language parser will treat this lone */ as an error on the
line where it stands. That's why I suggested that the Scintilla lexer do the same.

This could be an improvement to the current lexers.
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --
KHMan
2014-03-11 17:23:13 UTC
[snipped everything]
Everybody scans from beginning to end. It's unambiguous behaviour
and everybody knows what to expect.

"Bi-directional nested comments" in the way you described is just
bad news.

So it looks like I can have a 1 MB source code file where the
interpretation of the first character can depend on the
last two characters of the file (say, */). But the meaning of the last two
characters of the file depends on the preceding text. Thus you need
to do some semantic analysis on the _entire_file_ to recognize
that extra block-comment-end. That will totally kill the
performance of editors not mainly using a compiled lexer.

The entire file can now be highlighted as a comment block because
of say an ending */, yet we should keep all the semantic
information gathered during the forward scan as highlighting state.

Or maybe we scan from end to beginning, but then how do we know
it's an extra block-comment-end? Oh, now we need to scan right to
the beginning to sort out the ambiguity.
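
To make that dependency concrete, here is a small Python sketch. The function names are hypothetical, and `bidirectional_styles` encodes the rule requested in this thread as understood here (a stray `*/` turns everything before it into a comment), which is an assumption rather than the behaviour of any known language. Strings are ignored:

```python
def scan(text):
    """Forward scan; returns (styles, offsets of stray closers)."""
    styles, strays, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            styles += ["comment"] * 2
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            styles += ["comment"] * 2
            i += 2
        elif pair == "*/":
            strays.append(i)  # closer with nothing to close
            styles += ["code"] * 2
            i += 2
        else:
            styles.append("comment" if depth else "code")
            i += 1
    return styles, strays


def forward_styles(text):
    return scan(text)[0]


def bidirectional_styles(text):
    """Assumed rule: the last stray closer comments out all text before it."""
    styles, strays = scan(text)
    if strays:
        end = strays[-1] + 2
        styles[:end] = ["comment"] * end
    return styles


prefix = "int f() { return 1; }\n"
# Forward scanning: styles of the prefix do not depend on what follows.
print(forward_styles(prefix + "*/")[:len(prefix)] == forward_styles(prefix))
# Assumed bidirectional rule: two appended characters restyle offset 0.
print(bidirectional_styles(prefix)[0], bidirectional_styles(prefix + "*/")[0])
```

The stray closer can only be identified after the whole text before it has been scanned, so under the assumed rule no prefix can be styled definitively until the end of the file is known.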

Novel is not always good. I don't recall any existing programming
language doing this sort of thing, thank god. Good luck, anyway.
--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia
Charly Dante
2014-03-11 17:01:17 UTC
Hi,

first of all thanks for your interest in this problem :)
Post by Philippe Lhoste
POV-Ray comments are C-style /* */ but are nestable, that's what I
meant.

OK, I didn't know that; I thought they worked like normal C comments.

Post by Philippe Lhoste
My understanding is that if you have a closing comment without a matching
opening one, it is a syntax error, and it should be highlighted as such.
Except that C-like comments are also successive valid operators, and most
(if not all) simple Scintilla lexers just are OK with them.

Yes and no. Yes, it is a syntax error, but no, it should not be highlighted
as a syntax error. Scintilla doesn't do error checking of the code at all
and I don't want to change that.

What I mean is really only the comment highlighting. Another minimal
example (without nesting, but describing the same problem):


int myfunction(int n)
{
/* int test1 = 1;
int test2 = 2;
return n;
}

Here you can *directly* see from the highlighting of the code that
something is wrong, because the return n; statement is commented out. As all
code until the end of the document is now green, you see visually that
*somewhere* you have an unclosed block-comment operator.

You don't need any error signal from Scintilla or syntax-error highlighting;
you can just *directly* see it, as all your code turns green.

What I want now is the same behavior for exactly the opposite situation:

int myfunction(int n)
{
int test1 = 1;
int test2 = 2;
return n; */
}

Here we have basically the same situation, the only difference being that
it is not an opening but a closing comment operator that is missing its
counterpart.

I now want the code to be highlighted like this:

int myfunction(int n)
{
int test1 = 1;
int test2 = 2;
return n; */
}

Then you would again see *directly* from the highlighting that there has to
be an unopened closing comment operator, as again all your code turns green,
consistent with the behavior for the opening operator.

This functionality would be especially useful for languages with nested
block-comments, as it can get very tricky to see which nesting level you are
currently on while coding when the highlighting doesn't work in both
directions.

Post by Philippe Lhoste
The fact that an opening comment without a closing one go down to the end
of the document is quite related to the way Scintilla lexers work,
highlighting from the current point to the bottom of the screen.

Exactly. But I want it to work in both directions: from the current point
to the bottom *and* to the top.
Philippe Lhoste
2014-03-11 17:22:25 UTC
Post by Charly Dante
Yes and no. Yes it is a syntax error, but no it should not be highlighted as a syntax
error. Scintilla
doesn't do error checking of the code at all and I don't want to change that.

Not really. In languages with single-line strings, an unclosed string is highlighted with
an error style.
The C lexer highlights doc comment tags that are not in the list of supported tags.
Likewise the HTML lexer with unknown tags / attributes, etc.

So, specifically highlighting an error isn't shocking / new in a Scintilla lexer...
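
As an illustration of the first case, a minimal Python sketch of styling an unclosed single-line string differently from a closed one. The function `style_line` and the style names are invented for this example and do not correspond to actual Scintilla constants; only double-quoted strings with backslash escapes are handled:

```python
def style_line(line):
    """Style one line: 'string', 'string_eol' (unterminated) or 'code'."""
    styles = ["code"] * len(line)
    i = 0
    while i < len(line):
        if line[i] == '"':
            j = i + 1
            # Look for the closing quote, skipping backslash escapes.
            while j < len(line) and line[j] != '"':
                j += 2 if line[j] == "\\" else 1
            closed = j < len(line)
            end = j + 1 if closed else len(line)
            style = "string" if closed else "string_eol"
            styles[i:end] = [style] * (end - i)
            i = end
        else:
            i += 1
    return styles


print(style_line('x = "ok";'))
print(style_line('x = "oops'))
```

The error style here is decided from the current line alone, which is why it fits a forward-only lexer without any extra pass.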
--
Philippe Lhoste
-- (near) Paris -- France
-- http://Phi.Lho.free.fr
-- -- -- -- -- -- -- -- -- -- -- -- -- --
Andreas Tscharner
2014-03-12 08:33:16 UTC
Post by Charly Dante
Hello,

Hello,

Post by Charly Dante
I currently have a problem with my lexer and I don't really know how to
solve it. The thing is I am writing this lexer for a language that
supports *Nested Block-Comments*. This means that each comment that is
opened by a /* has to be closed by another */, even if it was opened
inside another block-comment. The problem is, that this works in both
directions: forwards *and* backwards.

What about lexing backwards after the forward pass, maybe only for comments?
It should be possible to add a second for-loop in the lexer code after
the first one.
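
A rough sketch of what such a second, backward loop might look like, in Python and restricted to comment delimiters. The function name is hypothetical and strings are ignored. Walking right to left, each `*/` waits for an opener and each `/*` satisfies the nearest waiting closer; whatever is still waiting at the start of the text is a closer with no opener:

```python
def stray_closers_backward(text):
    """Scan right to left and return offsets of closers with no opener."""
    pending = []  # offsets of closers still waiting for an opener
    i = len(text) - 2
    while i >= 0:
        pair = text[i:i + 2]
        if pair == "*/":
            pending.append(i)
            i -= 2
        elif pair == "/*":
            if pending:
                pending.pop()  # matches the nearest closer to its right
            i -= 2
        else:
            i -= 1
    return sorted(pending)


source = "int a;\n/* x */\nint b; */\n"
print(stray_closers_backward(source) == [source.rindex("*/")])
```

Two caveats apply to this sketch. The loop cannot report a stray closer until it has reached the very start of the text, so it always costs a full extra pass. It can also tokenize differently from a forward scan: in `/*/` a forward scan sees an opener followed by `/`, while this backward scan sees `/` followed by a closer.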

HTH and best regards
Andreas
--
Andreas Tscharner ***@gmail.com
------------------------------------------------------------------------
The decisive advantage of a chat over a normal telephone
call is that the former is slower and costs more (for the
vital exchange of information such as "hya folks", "C U
l8er" and ":-)") ... From Murphy's Computer Laws
Mike Lischke
2014-03-12 10:51:04 UTC
Post by Andreas Tscharner
Post by Charly Dante
I currently have a problem with my lexer and I don't really know how to
solve it. The thing is I am writing this lexer for a language that
supports *Nested Block-Comments*. This means that each comment that is
opened by a /* has to be closed by another */, even if it was opened
inside another block-comment. The problem is, that this works in both
directions: forwards *and* backwards.
What about lexing after forward also backwards, only for comments maybe? It should be possible to add a second for-loop in the lexer code after the first one?

Backwards lexing is utter nonsense. How can this actually work? Do you want to reverse the characters in words? Certainly not. Do you want to reverse the word order? Certainly not, either. The only situation where backwards lexing makes sense is when you lex a language that is written bottom to top or at least right to left. No sane programming language is designed like that, though.

Even if only scanning for multiline comments, it would not make sense at all. It would mean changing the meaning of a perfectly fine text at the top if there's an unbalanced closing multiline comment delimiter at the bottom. This would be plain wrong.

It's by design of programming languages, parsers/lexers and syntax highlighters that a text part *never* influences previous text parts, never ever. The original idea of showing the text as commented is wrong in all possible scenarios, as it shows wrong information.

And btw, multiline comments are the syntax element with the most impact on a text; no other type produces more work, especially when changed in a large file. Adding another pass certainly will not make the situation better.

Mike
--
www.soft-gems.net