
RegEx to split camelCase or TitleCase (advanced)

Java, regular expressions

繁星淼淼 2019-10-08 15:06:15

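I found an excellent RegEx for extracting the parts of a camelCase or TitleCase expression:

(?<!^)(?=[A-Z])

It works as expected:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value

For example, in Java:

String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
// words equals new String[]{"lorem", "Ipsum"}

My problem is that it does not work in some cases:

Case 1: VALUE -> V / A / L / U / E
Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext

In my view, the result should be:

Case 1: VALUE
Case 2: eclipse / RCP / Ext

In other words, given n uppercase characters:

if the n characters are followed by lowercase characters, the groups should be: (n-1 characters) / (n-th character + lowercase characters)
if the n characters are at the end, the group should be: (n characters).

Any ideas on how to improve this regex?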
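To make the two failing cases concrete, the following sketch runs the pattern from the question over all five inputs. The class name `OriginalSplitDemo` and the helper `split` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

public class OriginalSplitDemo {
    // Pattern from the question: split before every uppercase letter,
    // except when that letter is at the very start of the string.
    static final String ORIGINAL = "(?<!^)(?=[A-Z])";

    static String[] split(String s) {
        return s.split(ORIGINAL);
    }

    public static void main(String[] args) {
        String[] inputs = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : inputs) {
            // Join the pieces with " / " to mirror the notation used in the question.
            System.out.println(s + " -> " + String.join(" / ", split(s)));
        }
    }
}
```

Because the lookahead only asks whether the next character is uppercase, every letter of an all-caps run becomes its own split point, which is why `VALUE` and `RCP` fall apart.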


3 Answers

拉风的咖菲猫

Contributed 1995 experience points, received more than 2 likes

The following regex works for all of the examples above:

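public class SplitCamelCase {
    public static void main(String[] args) {
        // First alternative: split before an uppercase letter that is preceded by
        // neither the start of the string nor another uppercase letter.
        // Second alternative: split before an uppercase letter that is followed by
        // a lowercase letter, except at the start of the string.
        for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
            System.out.println(w);
        }
    }
}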

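It works by forcing the negative lookbehind to reject not only a match at the start of the string, but also any match where the uppercase letter is preceded by another uppercase letter. This handles cases like "VALUE".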

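The first part of the regex on its own fails on "eclipseRCPExt" because it cannot split between "RCP" and "Ext". That is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every uppercase letter that is followed by a lowercase letter, except at the start of the string.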
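As a quick check of the explanation above, this sketch applies the combined pattern to every input from the question. The class name `LookbehindSplitDemo` and the helper `split` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

public class LookbehindSplitDemo {
    // Clause 1: an uppercase letter not preceded by the string start or another uppercase letter.
    // Clause 2: an uppercase letter followed by a lowercase letter, but not at the string start.
    static final String COMBINED = "(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])";

    static String[] split(String s) {
        return s.split(COMBINED);
    }

    public static void main(String[] args) {
        String[] inputs = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : inputs) {
            System.out.println(s + " -> " + String.join(" / ", split(s)));
        }
    }
}
```

In `eclipseRCPExt`, clause 1 produces the split before `R` and clause 2 produces the split before `E`; no clause fires inside `RCP` or anywhere in `VALUE`.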

2019-10-08

狐的传说

Contributed 1804 experience points, received more than 3 likes

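You seem to be making this more complicated than it needs to be. For camelCase, the split location is simply any position where an uppercase letter immediately follows a lowercase letter: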

(?<=[a-z])(?=[A-Z])

Here is how this regex splits your example data:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value
VALUE -> VALUE
eclipseRCPExt -> eclipse / RCPExt

The only difference from your desired output is with eclipseRCPExt, which I would argue is split correctly here.

Addendum: improved version

Note: this answer recently got an upvote, and I realized that there is a better way...

By adding a second alternative to the regex above, all of the OP's test cases are split correctly.

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])

Here is how the improved regex splits your example data:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value
VALUE -> VALUE
eclipseRCPExt -> eclipse / RCP / Ext
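For readers who want to try both patterns from this answer in Java, here is a minimal sketch that runs them side by side. The class name `CaseSplitComparison`, the constants `SIMPLE` and `IMPROVED`, and the helper `joined` are hypothetical names introduced for illustration.

```java
public class CaseSplitComparison {
    // Split wherever a lowercase letter is followed by an uppercase letter.
    static final String SIMPLE = "(?<=[a-z])(?=[A-Z])";

    // Same as SIMPLE, plus a split inside an uppercase run just before
    // the last uppercase letter when that letter starts a lowercase word.
    static final String IMPROVED = "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    // Splits s with the given pattern and joins the pieces with " / ".
    static String joined(String pattern, String s) {
        return String.join(" / ", s.split(pattern));
    }

    public static void main(String[] args) {
        String[] inputs = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : inputs) {
            System.out.println(s + " | simple: " + joined(SIMPLE, s)
                    + " | improved: " + joined(IMPROVED, s));
        }
    }
}
```

Both patterns use only positive lookarounds, so neither needs a special case for the start of the string: position 0 has no preceding character and can never satisfy the lookbehind.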

2019-10-08

斯蒂芬大帝

Contributed 1827 experience points, received more than 8 likes

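I couldn't get aix's solution to work (and it doesn't work on RegExr either), so I came up with my own, which I have tested and which seems to do exactly what you are looking for: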

((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))

Here is an example of using it:

; Regex Breakdown: This will match against each word in Camel and Pascal case strings, while properly handling acronyms.

; (^[a-z]+) Match against any lower-case letters at the start of the string.

; ([A-Z]{1}[a-z]+) Match against Title case words (one upper case followed by lower case letters).

; ([A-Z]+(?=([A-Z][a-z])|($))) Match against multiple consecutive upper-case letters, leaving the last upper-case letter out of the match if it is followed by lower-case letters, and including it if it is followed by the end of the string.

newString := RegExReplace(oldCamelOrPascalString, "((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))", "$1 ")

newString := Trim(newString)

Here I am separating each word with a space, so here are some examples of how strings are transformed:

ThisIsATitleCASEString => This Is A Title CASE String

andThisOneIsCamelCASE => and This One Is Camel CASE
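The snippet above is AutoHotkey. Since the question is about Java, here is a rough Java sketch built on the same pattern. Note that it swaps the technique: instead of replacing each match with itself plus a space, it collects the matches into a list with `Matcher.find()`. The class name `CamelWords` and the method `words` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CamelWords {
    // Pattern copied from the answer: a leading lowercase word, a Title-case word,
    // or an uppercase run that stops before a Title-case word or at the end of the string.
    private static final Pattern WORD = Pattern.compile(
            "((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))");

    // Returns every match of WORD in s, in order of appearance.
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", words("ThisIsATitleCASEString")));
        System.out.println(String.join(" ", words("andThisOneIsCamelCASE")));
    }
}
```

One difference worth keeping in mind: collecting matches silently drops any characters that no alternative matches, whereas the replace-based version leaves them in place.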

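The solution above does what the original post asks for, but I also needed a regex that handles camel and Pascal case strings containing numbers, so I came up with this variation that includes numbers: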

((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))

And an example of using it:

; Regex Breakdown: This will match against each word in Camel and Pascal case strings, while properly handling acronyms and including numbers.

; (^[a-z]+) Match against any lower-case letters at the start of the string.

; ([0-9]+) Match against one or more consecutive numbers (anywhere in the string, including at the start).

; ([A-Z]{1}[a-z]+) Match against Title case words (one upper case followed by lower case letters).

; ([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))) Match against multiple consecutive upper-case letters, leaving the last upper-case letter out of the match if it is followed by lower-case letters, and including it if it is followed by the end of the string or a number.

newString := RegExReplace(oldCamelOrPascalString, "((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))", "$1 ")

newString := Trim(newString)

Here are some examples of how strings containing numbers are transformed with this regex:

myVariable123 => my Variable 123

my2Variables => my 2 Variables

3rdVariableIsHere => 3 rdVariable Is Here

12345NumsAtTheStartIncludedToo => 12345 Nums At The Start Included Too
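A possible Java counterpart of the AutoHotkey replace-and-trim approach for the number-aware pattern is sketched below, assuming `Matcher.replaceAll` with a `$1` back-reference is an acceptable stand-in for `RegExReplace`. The class name `NumberAwareSpacer` and the method `spaceWords` are hypothetical.

```java
import java.util.regex.Pattern;

public class NumberAwareSpacer {
    // Number-aware pattern copied from the answer. Group 1 wraps the whole match.
    private static final Pattern WORD = Pattern.compile(
            "((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))");

    // Appends a space after every matched word, then trims the trailing space.
    static String spaceWords(String s) {
        return WORD.matcher(s).replaceAll("$1 ").trim();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "myVariable123", "my2Variables", "3rdVariableIsHere", "12345NumsAtTheStartIncludedToo"
        };
        for (String s : inputs) {
            System.out.println(s + " => " + spaceWords(s));
        }
    }
}
```

In `3rdVariableIsHere`, the lowercase run `rd` is matched by none of the alternatives, because it is neither at the start of the string nor preceded by an uppercase letter. The replacement therefore leaves it untouched and it stays attached to `Variable`.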

2019-10-08
