A procedural approach to the problem.
An excellent object-oriented approach can be found here:
cyberai.com (I used it as a reference.)
<?php
/* FILE INFO
Form Package
File: form_package.inc.php
Last Update: Dec 2005
Author: Tom Atwell (klenwell@gmail.com)
FUNCTIONS: validate_textarea(), sanitize_string(), strip_attributes()
NOTES:
/*______________________________________________*/
// *** GLOBAL SETTINGS
// bad tag attributes
$THIS_FORM['bad_atts'] = 'src\s*=\s*java|action|background|codebase|dynsrc|lowsrc|'.
'onAbort|onActivate|onAfterPrint|onAfterUpdate|onBeforeActivate|'.
'onBeforeCopy|onBeforeCut|onBeforeDeactivate|onBeforeEditFocus|'.
'onBeforePaste|onBeforePrint|onBeforeUnload|onBlur|onBounce|'.
'onCellChange|onChange|onClick|onContextMenu|onControlSelect|onCopy|'.
'onCut|onDataAvailable|onDataSetChanged|onDataSetComplete|onDblClick|'.
'onDeactivate|onDrag|onDragEnd|onDragLeave|onDragEnter|onDragOver|'.
'onDragDrop|onDrop|onError|onErrorUpdate|onExit|onFilterChange|'.
'onFinish|onFocus|onFocusIn|onFocusOut|onHelp|onKeyDown|onKeyPress|'.
'onKeyUp|onLayoutComplete|onLoad|onLoseCapture|onMouseDown|'.
'onMouseEnter|onMouseLeave|onMouseMove|onMouseOut|onMouseOver|'.
'onMouseUp|onMouseWheel|onMove|onMoveEnd|onMoveStart|onPaste|'.
'onProgress|onPropertyChange|onReadyStateChange|onReset|onResize|'.
'onResizeEnd|onResizeStart|onRowEnter|onRowExit|onRowDelete|'.
'onRowInserted|onScroll|onSelect|onSelectionChange|onSelectStart|'.
'onStart|onStop|onSubmit|onUnload';
/* fx validate_textarea
*************************************************/
function validate_textarea($text, $required=TRUE, $max_words=250, $min_words=0, $allowed_tags=FALSE)
{
// *** DATA
# internal
$_prompt = '';
$_word_num = 0;
$_word_label = 'word';
# return
$is_valid = 0;
$prompt = '';
$text_out = $text;
$REPORT = array();
// *** MANIPULATE
# required
if ( $required && empty($text) )
{
$is_valid = 0;
$prompt = 'Please fill in the box.';
$REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => '' );
return $REPORT;
}
# sanitize
$text_out = sanitize_string($text, $allowed_tags);
# get prompt verbiage
# get word num
$_word_num = str_word_count(strip_tags($text_out));
# get word label
if ( $_word_num != 1 )
{
$_word_label = 'words';
}
# get word descrip
$_word_descrip = "$_word_num $_word_label";
# minimum length
if ( $_word_num < $min_words )
{
$is_valid = 0;
$prompt = "Too short. Your submission must be at least $min_words words long; it currently has $_word_descrip.";
$REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => $text_out );
return $REPORT;
}
# maximum length
if ( $_word_num > $max_words )
{
$is_valid = 0;
$prompt = "Too long. Your submission must be no more than $max_words words long; it currently has $_word_descrip.";
$REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => $text_out );
return $REPORT;
}
# passed
$is_valid = 1;
$prompt = 'validated';
# build REPORT
$REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => $text_out );
// *** RETURN
return $REPORT;
} # end Fx
/*______________________________________________*/
/* fx sanitize_string
*************************************************/
function sanitize_string($string, $allowed_tags=FALSE) {
// *** DATA
# default allowed tags
$good_tags = '<a><blockquote><br><br /><b><div><em><h1><h2><h3><h4><h5><h6><i>'
. '<img><li><ol><p><pre><span><strong><table><tr><td><th><u><ul>';
# regex
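# (s lets (.*?) match tags that span several lines)
$_regex = '/<(.*?)>/s';
# return
$clean_string = '';
// *** MANIPULATE
# default allowed tags
if ( empty($allowed_tags) )
{
$allowed_tags = $good_tags;
}
# magic quotes (removed in PHP 5.4; get_magic_quotes_gpc() itself is gone in PHP 8)
if ( version_compare(PHP_VERSION, '5.4.0', '<') && get_magic_quotes_gpc() )
{
$string = stripslashes($string);
}
# strip tags
$string = strip_tags($string, $allowed_tags);
# sanitize string: pass the inside of each remaining tag to strip_attributes()
# (preg_replace_callback replaces the /e modifier, which PHP 7 removed)
$clean_string = preg_replace_callback($_regex, function ($m) {
return '<' . strip_attributes($m[1]) . '>';
}, $string);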
// *** RETURN
return $clean_string;
} // end Fx
/*______________________________________________*/
/* fx strip_attributes
*************************************************/
function strip_attributes($string)
{
// *** DATA
# globals
global $THIS_FORM;
# regex
$_regex = '/(' . $THIS_FORM['bad_atts'] . ')/i';
# internal
$_replace = 'xxx' . substr(md5(uniqid()),-3) . '_';
# return
$stripped_string = '';
// *** MANIPULATE
# strip unwanted tag markup
$stripped_string = preg_replace($_regex, $_replace, $string);
// *** RETURN
return $stripped_string;
} # end Fx
/*______________________________________________*/
?>
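Two edge cases in validate_textarea() are easy to miss. empty() treats the string "0" as blank but does not treat a whitespace-only submission as blank, and counting words with str_word_count(strip_tags(...)) merges words separated only by a tag (such as an allowed <br>) and ignores digits. The sketch below checks both; is_blank() and count_words() are hypothetical helpers for illustration, not part of the package.

```php
<?php
// Sketch only: is_blank() and count_words() are hypothetical helpers,
// not functions from form_package.inc.php.

// A trim()-based blank test, for comparison with empty().
function is_blank($text)
{
    return trim($text) === '';
}

// Counts words the way validate_textarea() does: strip markup first, then
// let str_word_count() count runs of letters (plus ' and -).
function count_words($html)
{
    return str_word_count(strip_tags($html));
}

var_dump(empty('0'), is_blank('0'));     // bool(true), bool(false)
var_dump(empty('   '), is_blank('   ')); // bool(false), bool(true)

echo count_words('<p>Hello <b>PHP</b> world</p>'), "\n"; // 3
echo count_words('one<br>two'), "\n";    // 1: strip_tags() leaves "onetwo"
echo count_words('I have 3 cats'), "\n"; // 3: the digit is not counted
```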
I'm going to try to post it to Google Base when I have a chance and will continue refining it.
The following script can be used to test it:
<?php
// TEST (assumes form_package.inc.php is in the same directory)
require_once 'form_package.inc.php';
$wicked_string = 'I like <div good="blah" onContextMenu=blah>php</div> and <span style="color:red;">certain tags like <span> <b><b></b></span> but not <br> XSS <img src=javascript:alert(\'bad!\')>';
$wicked_string_esc = htmlspecialchars($wicked_string);
echo "wicked_string:<br /> $wicked_string <br /><br /><br />";
echo "wicked_string_esc:<br /> $wicked_string_esc <br /><br /><br />";
$REPORT = validate_textarea($wicked_string);
print_r($REPORT);
$cleaned = $REPORT['text_out'];
$cleaned_esc = htmlspecialchars($cleaned);
echo "<br /><br /><br />cleaned:<br /> $cleaned <br /><br /><br />";
echo "cleaned_esc:<br /> $cleaned_esc";
Test output:
wicked_string:
I like
php
and certain tags like <span> <b> but not
XSS
wicked_string_esc:
I like <div good="blah" onContextMenu=blah>php</div> and <span style="color:red;">certain tags like <span> <b><b></b></span> but not <br> XSS <img src=javascript:alert('bad!')>
Array ( [is_valid] => 1 [prompt] => validated [text_out] => I like
php
and certain tags like <span> <b> but not
XSS )
cleaned:
I like
php
and certain tags like <span> <b> but not
XSS
cleaned_esc:
I like <div good=\"blah\" xxx916_=blah>php</div> and <span style=\"color:red;\">certain tags like <span> <b><b></b></span> but not <br> XSS <img xxx6b3_script:alert('bad!')>
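The escaped quotes in the cleaned_esc output (good=\"blah\") are a side effect of the /e modifier this output was generated with: /e backslash-escapes quotes in backreferences before evaluating the replacement as PHP code, and the modifier was deprecated in PHP 5.5 and removed in PHP 7.0. A minimal sketch of the same kind of tag pass with preg_replace_callback() follows. BAD_ATTS is a shortened, hypothetical subset of $THIS_FORM['bad_atts'], and neutralize_tags() is a hypothetical name; a fixed xxx_ placeholder stands in for the random one so the results are predictable.

```php
<?php
// Hypothetical, shortened subset of $THIS_FORM['bad_atts'].
const BAD_ATTS = 'src\s*=\s*java|action|onClick|onContextMenu|onLoad';

// Rewrites the text between < and > of every tag, replacing any listed
// attribute text with a placeholder. The /s modifier lets (.*?) cross
// newlines, so a tag split across lines is still inspected.
function neutralize_tags($html, $placeholder = 'xxx_')
{
    return preg_replace_callback('/<(.*?)>/s', function ($m) use ($placeholder) {
        $inner = preg_replace('/(' . BAD_ATTS . ')/i', $placeholder, $m[1]);
        return '<' . $inner . '>';
    }, $html);
}

// Quotes come through unescaped: <div good="blah" xxx_=blah>php</div>
echo neutralize_tags('<div good="blah" onContextMenu=blah>php</div>'), "\n";

// A tag split across lines is still caught: <img (newline) xxx_script:alert('bad!')>
echo neutralize_tags("<img\nsrc=javascript:alert('bad!')>"), "\n";

// False positive: "action" also matches inside the URL: <a href="/transxxx_">
echo neutralize_tags('<a href="/transaction">'), "\n";
```

The last call shows a limitation of matching the list anywhere inside a tag: attribute values that merely contain a listed word are rewritten too.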
keywords: PHP, MySQL, XSS, form, sanitize, security