
Phosphorus and Lime

A Developer's Broadsheet

This blog has been deprecated. Please visit my new blog at klenwell.com/press.
PHP: input filters
A procedural approach to filtering form input: whitelisting HTML tags and neutralizing XSS-prone attributes.

An excellent object-oriented approach can be found at cyberai.com. I used it as a reference.

<?php

/* FILE INFO

Form Package

File: form_package.inc.php
Last Update: Dec 2005
Author: Tom Atwell (klenwell@gmail.com)

FUNCTIONS:
validate_textarea()
sanitize_string()
strip_attributes()

NOTES:

/*______________________________________________*/


// *** GLOBAL SETTINGS

// bad tag attributes
$THIS_FORM['bad_atts'] = '(?:src|href)\s*=\s*["\']?\s*java|action|background|codebase|dynsrc|lowsrc|'.
'onAbort|onActivate|onAfterPrint|onAfterUpdate|onBeforeActivate|'.
'onBeforeCopy|onBeforeCut|onBeforeDeactivate|onBeforeEditFocus|'.
'onBeforePaste|onBeforePrint|onBeforeUnload|onBlur|onBounce|'.
'onCellChange|onChange|onClick|onContextMenu|onControlSelect|onCopy|'.
'onCut|onDataAvailable|onDataSetChanged|onDataSetComplete|onDblClick|'.
'onDeactivate|onDrag|onDragEnd|onDragLeave|onDragEnter|onDragOver|'.
'onDragDrop|onDrop|onError|onErrorUpdate|onExit|onFilterChange|'.
'onFinish|onFocus|onFocusIn|onFocusOut|onHelp|onKeyDown|onKeyPress|'.
'onKeyUp|onLayoutComplete|onLoad|onLoseCapture|onMouseDown|'.
'onMouseEnter|onMouseLeave|onMouseMove|onMouseOut|onMouseOver|'.
'onMouseUp|onMouseWheel|onMove|onMoveEnd|onMoveStart|onPaste|'.
'onProgress|onPropertyChange|onReadyStateChange|onReset|onResize|'.
'onResizeEnd|onResizeStart|onRowEnter|onRowExit|onRowDelete|'.
'onRowInserted|onScroll|onSelect|onSelectionChange|onSelectStart|'.
'onStart|onStop|onSubmit|onUnload';


/* fx validate_textarea
*************************************************/
function validate_textarea($text, $required=TRUE, $max_words=250, $min_words=0, $allowed_tags=FALSE)
{
    // *** DATA

    # internal
    $_word_num = 0;
    $_word_label = 'word';

    # return
    $is_valid = 0;
    $prompt = '';
    $text_out = $text;

    $REPORT = array();


    // *** MANIPULATE

    # required (whitespace-only input counts as empty; empty() would also reject "0")
    if ( $required && trim((string) $text) === '' )
    {
        $is_valid = 0;
        $prompt = 'Please fill in the box.';
        $REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => '' );
        return $REPORT;
    }

    # sanitize
    $text_out = sanitize_string($text, $allowed_tags);

    # build prompt verbiage

    # get word count (tags stripped so markup is not counted)
    $_word_num = str_word_count(strip_tags($text_out));

    # get word label (plural for 0 and for 2 or more)
    if ( $_word_num != 1 )
    {
        $_word_label = 'words';
    }

    # get word description, e.g. "12 words"
    $_word_descrip = "$_word_num $_word_label";


    # minimum length
    if ( $_word_num < $min_words )
    {
        $is_valid = 0;
        $prompt = "Too short. Your submission must be at least $min_words words long (it is currently $_word_descrip).";
        $REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => $text_out );
        return $REPORT;
    }

    # maximum length
    if ( $_word_num > $max_words )
    {
        $is_valid = 0;
        $prompt = "Too long. Your submission must be no more than $max_words words long (it is currently $_word_descrip).";
        $REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => $text_out );
        return $REPORT;
    }

    # passed
    $is_valid = 1;
    $prompt = 'validated';

    # build REPORT
    $REPORT = array( 'is_valid' => $is_valid, 'prompt' => $prompt, 'text_out' => $text_out );


    // *** RETURN

    return $REPORT;

} # end Fx
/*______________________________________________*/



/* fx sanitize_string
*************************************************/
function sanitize_string($string, $allowed_tags=FALSE)
{
    // *** DATA

    # default allowed tags
    $good_tags = '<a><blockquote><br><br /><b><div><em><h1><h2><h3><h4><h5><h6><i>'
        . '<img><li><ol><p><pre><span><strong><table><tr><td><th><u><ul>';

    # regex: captures the inside of each tag; "s" lets a tag span several lines
    $_regex = '/<(.*?)>/s';

    # return
    $clean_string = '';


    // *** MANIPULATE

    # default allowed tags
    if ( empty($allowed_tags) )
    {
        $allowed_tags = $good_tags;
    }

    # magic quotes (get_magic_quotes_gpc() no longer exists in PHP 8)
    if ( function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc() )
    {
        $string = stripslashes($string);
    }

    # strip tags not on the whitelist
    $string = strip_tags($string, $allowed_tags);

    # strip bad attributes from every remaining tag
    # (preg_replace_callback replaces the /e modifier, which was removed in PHP 7)
    $clean_string = preg_replace_callback($_regex, function ($m) {
        return '<' . strip_attributes($m[1]) . '>';
    }, $string);


    // *** RETURN

    return $clean_string;

} // end Fx
/*______________________________________________*/



/* fx strip_attributes
*************************************************/
function strip_attributes($string)
{
    // *** DATA

    # globals
    global $THIS_FORM;

    # regex (case-insensitive; no /e, since the replacement is a plain string)
    $_regex = '/(' . $THIS_FORM['bad_atts'] . ')/i';

    # internal: random marker that renames a bad attribute, e.g. "xxx916_"
    $_replace = 'xxx' . substr(md5(uniqid()), -3) . '_';

    # return
    $stripped_string = '';


    // *** MANIPULATE

    # strip unwanted tag markup
    $stripped_string = preg_replace($_regex, $_replace, $string);


    // *** RETURN

    return $stripped_string;

} # end Fx
/*______________________________________________*/

?>


I'm going to try to post it to Google Base when I have a chance, and I will continue refining it.

The following script can be used to test it:

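<?php
// TEST (assumes the package above is saved as form_package.inc.php)
require_once 'form_package.inc.php';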
$wicked_string = 'I like <div good="blah" onContextMenu=blah>php</div> and <span style="color:red;">certain tags like &lt;span&gt; <b>&lt;b&gt;</b></span> but not <br> XSS <img src=javascript:alert(\'bad!\')>';
$wicked_string_esc = htmlspecialchars($wicked_string);

echo "wicked_string:<br /> $wicked_string <br /><br /><br />";
echo "wicked_string_esc:<br /> $wicked_string_esc <br /><br /><br />";
$REPORT = validate_textarea($wicked_string);
print_r($REPORT);
$cleaned = $REPORT['text_out'];
$cleaned_esc = htmlspecialchars($cleaned);
echo "<br /><br /><br />cleaned:<br /> $cleaned <br /><br /><br />";
echo "cleaned_esc:<br /> $cleaned_esc";


Sample output (as rendered in a browser):

wicked_string:
I like
php
and certain tags like <span> <b> but not
XSS


wicked_string_esc:
I like <div good="blah" onContextMenu=blah>php</div> and <span style="color:red;">certain tags like &lt;span&gt; <b>&lt;b&gt;</b></span> but not <br> XSS <img src=javascript:alert('bad!')>


Array ( [is_valid] => 1 [prompt] => validated [text_out] => I like
php
and certain tags like <span> <b> but not
XSS )


cleaned:
I like
php
and certain tags like <span> <b> but not
XSS


cleaned_esc:
I like <div good=\"blah\" xxx916_=blah>php</div> and <span style=\"color:red;\">certain tags like &lt;span&gt; <b>&lt;b&gt;</b></span> but not <br> XSS <img xxx6b3_script:alert('bad!')>


keywords: PHP, MySQL, XSS, form, sanitize, security