I'd like to be able to constrain user input to a white list of valid characters, but I don't want to prevent people from other cultures from signing up. So far, I have this:
^[a-zA-Z0-9èéêëàáâãäçìíîïòóôõöùúûü-_]*$
It allows for most French accents, but the list of accented characters in the Latin character set is immense! I would prefer to use a white list instead of a black list, in case I miss something.
Note: this will be for C#, but I'd like to use the same regex for client-side validation to be consistent on both sides. I'm HTML encoding the input when I save it to the database as well.
Is there a more elegant way of making the regex accent insensitive, but still restrictive enough to prevent XSS? I don't want to alienate my users.
I would like to allow some punctuation without opening myself up to XSS attacks. For example, I want someone to be able to enter their company name: if someone worked at Yahoo!, they should be able to sign up.
Asked Apr 14, 2011 at 15:24 by Dave Harding; edited Apr 14, 2011 at 15:41.
- The ECMAScript RegExp class does not support Unicode, beyond the \u.... escape to match a single code point: [ECMA-262 Standard][1]. For example, the \w escape only includes the ASCII letters and digits, plus "_". [1]: ecma-international/publications/files/ECMA-ST/ECMA-262.pdf – odrm Commented Apr 14, 2011 at 15:40
- Am I going about this the wrong way? I guess the broader question is what's the best validation on the server side to prevent XSS (other than simply HTML encoding everything)? – Dave Harding Commented Apr 14, 2011 at 15:46
- I'm going to split up the server side functions as having one for only alphanumeric and one with punctuation. Thank you for your help! – Dave Harding Commented Apr 14, 2011 at 17:46
6 Answers
Maybe you could use a Unicode range like [\u00C0-\u017E], which probably covers all bases for accents (but you should check a character map to make sure, as I don't know which accents Italian uses).
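A minimal sketch of the range idea, assuming ASCII alphanumerics plus "_" and "-" as the rest of the white list. The names `NAME_PATTERN` and `isAllowedName` are hypothetical, not from the thread. Note that the range is contiguous, so it also admits a few non-letters such as U+00D7 (multiplication sign) and U+00F7 (division sign).

```javascript
// Hypothetical white list: ASCII alphanumerics, the \u00C0-\u017E range
// suggested above, and "_" and "-". The hyphen is placed last so that it is
// read as a literal character rather than as a range operator.
const NAME_PATTERN = /^[a-zA-Z0-9\u00C0-\u017E_-]*$/;

function isAllowedName(input) {
  return NAME_PATTERN.test(input);
}

// "Hélène_Müller", written with escapes so the accents are precomposed.
console.log(isAllowedName("H\u00E9l\u00E8ne_M\u00FCller")); // true
console.log(isAllowedName("<script>"));                     // false
// The multiplication sign falls inside the range even though it is not a letter.
console.log(isAllowedName("\u00D7"));                       // true
```

Input typed with combining marks (for example "e" followed by U+0301) would not match this range; calling `input.normalize("NFC")` before testing is one way to handle that case.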
FWIW: I use a home-brew function that returns a RegExp for all diacritics:
function diacritsRegEx(global, caseinsitive, multiline){
  // Build the flag string from the three boolean arguments.
  var modifiers = (global ? 'g' : '')
                + (multiline ? 'm' : '')
                + (caseinsitive ? 'i' : '');
  // The character classes use octal escapes for Latin-1 code points.
  return new RegExp(
    ['[\\.\\-a-z\\s]|',                   // [a-z, . - and space]
     '[\\300-\\306\\340-\\346]|',         // all accented A, a
     '[\\310-\\313\\350-\\353]|',         // all accented E, e
     '[\\314-\\317\\354-\\357]|',         // all accented I, i
     '[\\322-\\326\\330\\362-\\366\\370]|', // all accented O, o (skips the × and ÷ signs)
     '[\\331-\\334\\371-\\374]|',         // all accented U, u
     '[\\321\\361]|',                     // Ñ, ñ (two code points, not a range)
     '[\\307\\347]'                       // Ç, ç (two code points, not a range)
    ]
    .join(''), modifiers);
}
^\w+$
Couldn't you just use the \w (word character) class? I believe that accepts the accents.
In some regex implementations a simple \w will cover all those. See http://www.regular-expressions.info/charclass.html
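Whether \w covers accented letters depends on the engine. In .NET it matches Unicode letters by default, while in JavaScript it is limited to ASCII. A hedged sketch of the difference on the client side, using Unicode property escapes (available with the `u` flag since ES2018) as a rough approximation of a Unicode-aware \w; the variable names are illustrative only:

```javascript
// In JavaScript, \w is ASCII-only: it is equivalent to [A-Za-z0-9_].
const asciiWord = /^\w+$/;

// Unicode property escapes require the "u" flag.
// \p{L} matches any letter and \p{N} matches any numeric character.
const unicodeWord = /^[\p{L}\p{N}_]+$/u;

const sample = "Andr\u00E9"; // "André" with a precomposed é

console.log(asciiWord.test(sample));   // false
console.log(unicodeWord.test(sample)); // true
console.log(unicodeWord.test("<b>"));  // false
```

The two sides will not be byte-for-byte identical, since .NET's \w also includes some mark and connector-punctuation categories, so the server-side check should remain the authoritative one.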
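If you want to allow letters (with or without diacritics) and some punctuation, you can use:
^[\w_-]+$
where \w stands for any letter, digit, or underscore (including accented letters in engines whose \w is Unicode-aware), and - is the extra punctuation character allowed; the _ is already covered by \w. Don't forget to put the - at the end of the character class if it is used, so that it is not read as a range.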
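A small sketch of why the hyphen position matters, followed by a white list that would accept a company name such as "Yahoo!". The pattern `COMPANY_NAME` and its particular punctuation set are assumptions chosen for illustration, not something taken from the thread.

```javascript
// A hyphen between two characters defines a range. "+" is U+002B and "_" is
// U+005F, so [+-_] covers everything in between, including uppercase letters.
const accidentalRange = /^[+-_]$/;
console.log(accidentalRange.test("A")); // true

// With the hyphen last, it is a literal and only the three characters match.
const literalHyphen = /^[+_-]$/;
console.log(literalHyphen.test("A")); // false
console.log(literalHyphen.test("-")); // true

// Hypothetical white list: letters, digits, space, "!", ".", "_" and "-".
const COMPANY_NAME = /^[\p{L}\p{N} !._-]+$/u;
console.log(COMPANY_NAME.test("Yahoo!"));                          // true
console.log(COMPANY_NAME.test("Soci\u00E9t\u00E9 G\u00E9n\u00E9rale")); // true
console.log(COMPANY_NAME.test("<script>alert(1)</script>"));       // false
```

A white list like this narrows the input, but it complements HTML encoding on output rather than replacing it.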
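For user input in an order form I'm using this: [^\w\s+\/_,.@-] This negated class matches any character outside the allowed set, so a match means the input is rejected. The allowed set covers the characters needed for emails, zip codes, first names, last names, etc.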
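One way to use a negated class like this is to collect the offending characters so they can be reported back to the user. The sketch below assumes that workflow; `DISALLOWED` and `findDisallowed` are hypothetical names. Because \w is ASCII-only in JavaScript, accented letters are flagged here too.

```javascript
// The negated class from the answer above, with the "g" flag so that every
// offending character is returned rather than only the first one.
const DISALLOWED = /[^\w\s+\/_,.@-]/g;

// Returns an array of the characters that fall outside the allowed set.
function findDisallowed(input) {
  return input.match(DISALLOWED) || [];
}

console.log(findDisallowed("john.doe+tag@example.com")); // []
console.log(findDisallowed("<b>Bob</b>"));               // [ '<', '>', '<', '>' ]
console.log(findDisallowed("Ren\u00E9"));                // [ 'é' ]
```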