Discussion:
C++11 regex
Neil Hodgson
2013-07-24 00:41:09 UTC
Attached is an updated patch to use C++11's regular expression implementation within Scintilla. Unlike previous versions it performs backwards regex searches. The limited state of regex implementations in C++ support libraries means this isn't being committed now. The libstdc++ library used with g++ has an empty regex implementation, so this will not work at all with g++.

Other implementations have problems with '^' and '$'. The Visual C++ runtime only recognises \n as a line end when matching the start or end of a line, not \r\n or \r. libc++, used by Clang on OS X, doesn't match '^' or '$' inside a search range, only at its ends. To get around these issues, the patch performs line-by-line searches, just like Scintilla's current limited regex. This means that regex searches will not match over line ends. It may be possible to add another flag to regex to choose between line-by-line and whole-range searches, but the results of whole-range searches are still going to be platform dependent.
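
As a rough illustration of the line-by-line approach described above (not the code from the patch), the sketch below splits a buffer on \r\n, \r or \n and runs std::regex_search on each line separately, so '^' and '$' only ever have to match at the ends of the searched range. The names LineMatch and FindLineByLine are made up for this example, and it searches a std::string rather than a Scintilla document.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch: offset and length of the match
// within the whole text, or found == false.
struct LineMatch {
    bool found;
    std::size_t pos;
    std::size_t length;
};

// Search each line on its own so that '^' and '$' are only needed at the
// ends of the range handed to std::regex_search.
LineMatch FindLineByLine(const std::string &text, const std::regex &re) {
    std::size_t lineStart = 0;
    for (;;) {
        // A line ends at the next \r or \n, or at the end of the text.
        std::size_t lineEnd = text.find_first_of("\r\n", lineStart);
        if (lineEnd == std::string::npos)
            lineEnd = text.size();

        std::smatch m;
        const auto first = text.cbegin() + static_cast<std::ptrdiff_t>(lineStart);
        const auto last = text.cbegin() + static_cast<std::ptrdiff_t>(lineEnd);
        if (std::regex_search(first, last, m, re)) {
            return {true,
                    lineStart + static_cast<std::size_t>(m.position(0)),
                    static_cast<std::size_t>(m.length(0))};
        }

        if (lineEnd == text.size())
            break;  // that was the last line
        // Step over the line end, treating \r\n as a single line end.
        const bool crlf = text[lineEnd] == '\r' &&
                          lineEnd + 1 < text.size() && text[lineEnd + 1] == '\n';
        lineStart = lineEnd + (crlf ? 2 : 1);
    }
    return {false, 0, 0};
}
```

Because every search range stops at a line end, a pattern such as a\s+b cannot match text that spans two lines, which is the limitation mentioned above.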

I still want to add C++11 regex to Scintilla at some point but, for now, applications that want good regex support should implement this in the application code, possibly using calls like SCI_GETRANGEPOINTER for fast access to the text inside Scintilla.
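
For the application-side route, one possible shape is sketched below: once the application has a pointer to a contiguous range of document bytes (for example the pointer returned by SCI_GETRANGEPOINTER), std::regex_search can run over it directly without copying into a std::string. The helper name FindInRange is invented for this example, the call into Scintilla itself is left out, and the text is assumed to be searched bytewise.

```cpp
#include <cstddef>
#include <regex>

// Hypothetical helper: search [range, range + length) and report the match
// as an offset and length relative to the start of the range.
bool FindInRange(const char *range, std::size_t length, const std::regex &re,
                 std::size_t &pos, std::size_t &len) {
    std::cmatch m;  // match_results over const char *
    if (!std::regex_search(range, range + length, m, re))
        return false;
    pos = static_cast<std::size_t>(m.position(0));
    len = static_cast<std::size_t>(m.length(0));
    return true;
}
```

The offsets are relative to the start of the range, so the application would add the range's start position to get a document position. Such a pointer is best treated as short-lived and not held across calls that could modify the document.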

Neil
--
You received this message because you are subscribed to the Google Groups "scintilla-interest" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scintilla-interest+***@googlegroups.com.
To post to this group, send email to scintilla-***@googlegroups.com.
Visit this group at http://groups.google.com/group/scintilla-interest.
For more options, visit https://groups.google.com/groups/opt_out.
minico
2014-05-11 07:06:35 UTC
Hi Neil,
I applied this patch and Scintilla crashed when I ran a search. I think the problem is in this part of the patch:
+ if (matched) {
+     for (size_t co=0; co < matches; co++) {
+         size_t lenMatch = search.eopat[co] - search.bopat[co];
+         search.pat[co] = new char[lenMatch + 1];
+         for (int iPos=search.bopat[co]; iPos < search.eopat[co]; iPos++)
+             search.pat[co][iPos - search.bopat[co]] = doc->CharAt(iPos);
+         search.pat[co][lenMatch] = '\0';
+     }
+     posMatch = search.bopat[0];
+     *length = search.eopat[0] - search.bopat[0];
+ }


"pat" is defined as a string array: std::string pat[MAXTAG];
but it seems to be used as a pointer array.


Neil Hodgson
2014-05-11 11:45:23 UTC
Post by minico
"pat" is defined as a string array: std::string pat[MAXTAG];
but it seems to be used as a pointer array.
Yes, Scintilla has moved on from that patch. You could probably fix up the patched code to use std::string, possibly along the lines of RESearch::GrabMatches.
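
One way the quoted loop might be reworked for a std::string pat[] is sketched below. It is not the code from RESearch::GrabMatches or from any Scintilla release; FakeDocument and FakeSearch are stand-ins invented here that model only the members the loop touches, and MAXTAG is given an arbitrary value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-ins invented for this sketch; the real Scintilla classes differ.
const std::size_t MAXTAG = 10;  // arbitrary value for the example

struct FakeDocument {
    std::string text;
    char CharAt(int pos) const { return text[static_cast<std::size_t>(pos)]; }
};

struct FakeSearch {
    int bopat[MAXTAG];        // start position of each matched group
    int eopat[MAXTAG];        // end position (exclusive) of each group
    std::string pat[MAXTAG];  // text of each group
};

// Copy each matched group into pat[] by appending to the std::string,
// instead of storing a new char[] pointer in it.
void CopyGroups(FakeSearch &search, const FakeDocument &doc, std::size_t matches) {
    assert(matches <= MAXTAG);
    for (std::size_t co = 0; co < matches; co++) {
        search.pat[co].clear();
        for (int iPos = search.bopat[co]; iPos < search.eopat[co]; iPos++)
            search.pat[co].push_back(doc.CharAt(iPos));
    }
}
```

With std::string owning the storage there is no new[] to pair with a delete[] and no terminating '\0' to write by hand.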

Neil
minico
2014-05-11 12:21:35 UTC
Thank you, Neil:)

Piotr Komoda
2014-08-04 08:10:46 UTC
Hi Neil,
I've compiled Scintilla with the patch you posted using Visual Studio 2010 and I'm a bit disappointed. I was hoping to get full support for lookarounds, because Scintilla currently lacks this regex feature entirely. Unfortunately, only positive lookahead works as it should; negative lookahead malfunctions, and neither form of lookbehind works at all.

Is there any chance you could update this patch? Pretty please :-)
Neil Hodgson
2014-08-04 22:18:44 UTC
It is most likely that these limitations are with the implementation of regex in Visual Studio, since the patch is restricted to providing a view of the document to that code and is not responsible for implementing particular regular expression features. There are options set when assigning the wregex object and when calling regex_search and there may be changes that could be made to those.

You could write a small test program to investigate whether Visual Studio implements the features you want using std::string for the text being searched.
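
A test along those lines could look roughly like the sketch below, which compiles a pattern with std::regex and reports whether it matched, failed to match, or was rejected by the library. The enum, the function name Probe and the sample patterns are all made up for illustration. Note that the ECMAScript grammar std::regex uses by default has syntax for lookahead, (?=...) and (?!...), but none for lookbehind, so a std::regex_error for (?<=...) and (?<!...) patterns would be expected rather than a sign of a broken library.

```cpp
#include <regex>
#include <string>

enum class ProbeResult { Matched, NotMatched, Rejected };

// Try one pattern against one subject; Rejected means the library threw
// std::regex_error while compiling or running the pattern.
ProbeResult Probe(const std::string &pattern, const std::string &subject) {
    try {
        const std::regex re(pattern);  // default ECMAScript grammar
        return std::regex_search(subject, re) ? ProbeResult::Matched
                                              : ProbeResult::NotMatched;
    } catch (const std::regex_error &) {
        return ProbeResult::Rejected;
    }
}
```

Calling Probe("foo(?=bar)", "foobar"), Probe("foo(?!bar)", "foobar") and Probe("(?<=foo)bar", "foobar") from a small main and printing the results on each compiler would show which features that library handles.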

Neil
Piotr Komoda
2014-08-06 13:33:05 UTC
Neil, I suck at programming so I wouldn't be able to test that, but after
some reading I'm starting to think that you might be right and it's a
limitation of Visual C++ or the .NET Framework. That's probably the reason why
the guys from the Notepad++ team are using the Boost libraries. Thank you for
your input on this matter.