Update core 8.3.0

This commit is contained in:
Rob Davies 2017-04-13 15:53:35 +01:00
parent da7a7918f8
commit cd7a898e66
6144 changed files with 132297 additions and 87747 deletions

View file

@ -0,0 +1,26 @@
# Changelog
All notable changes to this project will be documented in this file, in reverse chronological order by release.
## 2.5.2 - 2016-06-30
### Added
- [#11](https://github.com/zendframework/zend-escaper/pull/11),
[#12](https://github.com/zendframework/zend-escaper/pull/12), and
[#13](https://github.com/zendframework/zend-escaper/pull/13) prepare and
publish documentation to https://zendframework.github.io/zend-escaper/
### Deprecated
- Nothing.
### Removed
- Nothing.
### Fixed
- [#3](https://github.com/zendframework/zend-escaper/pull/3) updates the
the escaping mechanism to add support for escaping characters outside the Basic
Multilingual Plane when escaping for JS, CSS, or HTML attributes.

View file

@ -0,0 +1,43 @@
# Contributor Code of Conduct
The Zend Framework project adheres to [The Code Manifesto](http://codemanifesto.com)
as its guidelines for contributor interactions.
## The Code Manifesto
We want to work in an ecosystem that empowers developers to reach their
potential — one that encourages growth and effective collaboration. A space that
is safe for all.
A space such as this benefits everyone that participates in it. It encourages
new developers to enter our field. It is through discussion and collaboration
that we grow, and through growth that we improve.
In the effort to create such a place, we hold to these values:
1. **Discrimination limits us.** This includes discrimination on the basis of
race, gender, sexual orientation, gender identity, age, nationality, technology
and any other arbitrary exclusion of a group of people.
2. **Boundaries honor us.** Your comfort levels are not everyones comfort
levels. Remember that, and if brought to your attention, heed it.
3. **We are our biggest assets.** None of us were born masters of our trade.
Each of us has been helped along the way. Return that favor, when and where
you can.
4. **We are resources for the future.** As an extension of #3, share what you
know. Make yourself a resource to help those that come after you.
5. **Respect defines us.** Treat others as you wish to be treated. Make your
discussions, criticisms and debates from a position of respectfulness. Ask
yourself, is it true? Is it necessary? Is it constructive? Anything less is
unacceptable.
6. **Reactions require grace.** Angry responses are valid, but abusive language
and vindictive actions are toxic. When something happens that offends you,
handle it assertively, but be respectful. Escalate reasonably, and try to
allow the offender an opportunity to explain themselves, and possibly correct
the issue.
7. **Opinions are just that: opinions.** Each and every one of us, due to our
background and upbringing, have varying opinions. The fact of the matter, is
that is perfectly acceptable. Remember this: if you respect your own
opinions, you should respect the opinions of others.
8. **To err is human.** You might not intend it, but mistakes do happen and
contribute to build experience. Tolerate honest mistakes, and don't hesitate
to apologize if you make one yourself.

View file

@ -227,3 +227,8 @@ repository, we suggest doing some cleanup of these branches.
```console
$ git push {username} :<branchname>
```
## Conduct
Please see our [CONDUCT.md](CONDUCT.md) to understand expected behavior when interacting with others in the project.

View file

@ -5,10 +5,9 @@
The OWASP Top 10 web security risks study lists Cross-Site Scripting (XSS) in
second place. PHPs sole functionality against XSS is limited to two functions
of which one is commonly misapplied. Thus, the `Zend\Escaper` component was written.
of which one is commonly misapplied. Thus, the zend-escaper component was written.
It offers developers a way to escape output and defend from XSS and related
vulnerabilities by introducing contextual escaping based on peer-reviewed rules.
- File issues at https://github.com/zendframework/zend-escaper/issues
- Documentation is at http://framework.zend.com/manual/current/en/index.html#zend-escaper
- Documentation is at https://zendframework.github.io/zend-escaper/

View file

@ -13,7 +13,7 @@
}
},
"require": {
"php": ">=5.3.23"
"php": ">=5.5"
},
"minimum-stability": "dev",
"prefer-stable": true,
@ -32,4 +32,4 @@
"fabpot/php-cs-fixer": "1.7.*",
"phpunit/PHPUnit": "~4.0"
}
}
}

View file

@ -0,0 +1,21 @@
# Configuration
`Zend\Escaper\Escaper` has only one configuration option available, and that is
the encoding to be used by the `Escaper` instance.
The default encoding is **utf-8**. Other supported encodings are:
- iso-8859-1
- iso-8859-5
- iso-8859-15
- cp866, ibm866, 866
- cp1251, windows-1251
- cp1252, windows-1252
- koi8-r, koi8-ru
- big5, big5-hkscs, 950, gb2312, 936
- shift\_jis, sjis, sjis-win, cp932
- eucjp, eucjp-win
- macroman
If an unsupported encoding is passed to `Zend\Escaper\Escaper`, a
`Zend\Escaper\Exception\InvalidArgumentException` will be thrown.

View file

@ -0,0 +1,74 @@
# Escaping Cascading Style Sheets
CSS is similar to [escaping Javascript](escaping-javascript.md). CSS escaping
excludes only basic alphanumeric characters and escapes all other characters
into valid CSS hexadecimal escapes.
## Example of Bad CSS Escaping
In most cases developers forget to escape CSS completely:
```php
<?php header('Content-Type: application/xhtml+xml; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
body {
background-image: url('http://example.com/foo.jpg?</style><script>alert(1)</script>');
}
INPUT;
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Unescaped CSS</title>
<meta charset="UTF-8"/>
<style>
<?= $input ?>
</style>
</head>
<body>
<p>User controlled CSS needs to be properly escaped!</p>
</body>
</html>
```
In the above example, by failing to escape the user provided CSS, an attacker
can execute an XSS attack fairly easily.
## Example of Good CSS Escaping
By using `escapeCss()` method in the CSS context, such attacks can be prevented:
```php
<?php header('Content-Type: application/xhtml+xml; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
body {
background-image: url('http://example.com/foo.jpg?</style><script>alert(1)</script>');
}
INPUT;
$escaper = new Zend\Escaper\Escaper('utf-8');
$output = $escaper->escapeCss($input);
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Escaped CSS</title>
<meta charset="UTF-8"/>
<style>
<?php
// output will look something like
// body\20 \7B \A \20 \20 \20 \20 background\2D image\3A \20 url\28 ...
echo $output;
?>
</style>
</head>
<body>
<p>User controlled CSS needs to be properly escaped!</p>
</body>
</html>
```
By properly escaping user controlled CSS, we can prevent XSS attacks in our web
applications.

View file

@ -0,0 +1,128 @@
# Escaping HTML Attributes
Escaping data in **HTML Attribute** contexts is most often done incorrectly, if
not overlooked completely by developers. Regular [HTML
escaping](escaping-html.md) can be used for escaping HTML attributes *only* if
the attribute value can be **guaranteed as being properly quoted**! To avoid
confusion, we recommend always using the HTML Attribute escaper method when
dealing with HTTP attributes specifically.
To escape data for an HTML Attribute, use `Zend\Escaper\Escaper`'s
`escapeHtmlAttr()` method. Internally it will convert the data to UTF-8, check
for its validity, and use an extended set of characters to escape that are not
covered by `htmlspecialchars()` to cover the cases where an attribute might be
unquoted or quoted illegally.
## Examples of Bad HTML Attribute Escaping
An example of incorrect HTML attribute escaping:
```php
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
' onmouseover='alert(/ZF2!/);
INPUT;
/**
* NOTE: This is equivalent to using htmlspecialchars($input, ENT_COMPAT)
*/
$output = htmlspecialchars($input);
?>
<html>
<head>
<title>Single Quoted Attribute</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div>
<?php
// the span tag will look like:
// <span title='' onmouseover='alert(/ZF2!/);'>
?>
<span title='<?= $output ?>'>
What framework are you using?
</span>
</div>
</body>
</html>
```
In the above example, the default `ENT_COMPAT` flag is being used, which does
not escape single quotes, thus resulting in an alert box popping up when the
`onmouseover` event happens on the `span` element.
Another example of incorrect HTML attribute escaping can happen when unquoted
attributes are used (which is, by the way, perfectly valid HTML5):
```php
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
faketitle onmouseover=alert(/ZF2!/);
INPUT;
// Tough luck using proper flags when the title attribute is unquoted!
$output = htmlspecialchars($input, ENT_QUOTES);
?>
<html>
<head>
<title>Quoteless Attribute</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div>
<?php
// the span tag will look like:
// <span title=faketitle onmouseover=alert(/ZF2!/);>
?>
<span title=<?= $output ?>>
What framework are you using?
</span>
</div>
</body>
</html>
```
The above example shows how it is easy to break out from unquoted attributes in
HTML5.
## Example of Good HTML Attribute Escaping
Both of the previous examples can be avoided by simply using the
`escapeHtmlAttr()` method:
```php
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
faketitle onmouseover=alert(/ZF2!/);
INPUT;
$escaper = new Zend\Escaper\Escaper('utf-8');
$output = $escaper->escapeHtmlAttr($input);
?>
<html>
<head>
<title>Quoteless Attribute</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div>
<?php
// the span tag will look like:
// <span title=faketitle&#x20;onmouseover&#x3D;alert&#x28;&#x2F;ZF2&#x21;&#x2F;&#x29;&#x3B;>
?>
<span title=<?= $output ?>>
What framework are you using?
</span>
</div>
</body>
</html>
```
In the above example, the malicious input from the attacker becomes completely
harmless as we used proper HTML attribute escaping!

View file

@ -0,0 +1,74 @@
# Escaping HTML
Probably the most common escaping happens for **HTML body** contexts. There are
very few characters with special meaning in this context, yet it is quite common
to escape data incorrectly, namely by setting the wrong flags and character
encoding.
For escaping data to use within an HTML body context, use
`Zend\Escaper\Escaper`'s `escapeHtml()` method. Internally it uses PHP's
`htmlspecialchars()`, correctly setting the flags and encoding for you.
```php
// Outputting this without escaping would be a bad idea!
$input = '<script>alert("zf2")</script>';
$escaper = new Zend\Escaper\Escaper('utf-8');
// somewhere in an HTML template
<div class="user-provided-input">
<?= $escaper->escapeHtml($input) // all safe! ?>
</div>
```
One thing a developer needs to pay special attention to is the encoding in which
the document is served to the client, as it **must be the same** as the encoding
used for escaping!
## Example of Bad HTML Escaping
An example of incorrect usage:
```php
<?php
$input = '<script>alert("zf2")</script>';
$escaper = new Zend\Escaper\Escaper('utf-8');
?>
<?php header('Content-Type: text/html; charset=ISO-8859-1'); ?>
<!DOCTYPE html>
<html>
<head>
<title>Encodings set incorrectly!</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
<?php
// Bad! The escaper's and the document's encodings are different!
echo $escaper->escapeHtml($input);
?>
</body>
```
## Example of Good HTML Escaping
An example of correct usage:
```php
<?php
$input = '<script>alert("zf2")</script>';
$escaper = new Zend\Escaper\Escaper('utf-8');
?>
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<!DOCTYPE html>
<html>
<head>
<title>Encodings set correctly!</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<?php
// Good! The escaper's and the document's encodings are same!
echo $escaper->escapeHtml($input);
?>
</body>
```

View file

@ -0,0 +1,93 @@
# Escaping Javascript
Javascript string literals in HTML are subject to significant restrictions due
to the potential for unquoted attributes and uncertainty as to whether
Javascript will be viewed as being `CDATA` or `PCDATA` by the browser. To
eliminate any possible XSS vulnerabilities, Javascript escaping for HTML extends
the escaping rules of both ECMAScript and JSON to include any potentially
dangerous character. Very similar to HTML attribute value escaping, this means
escaping everything except basic alphanumeric characters and the comma, period,
and underscore characters as hexadecimal or unicode escapes.
Javascript escaping applies to all literal strings and digits. It is not
possible to safely escape other Javascript markup.
To escape data in the **Javascript context**, use `Zend\Escaper\Escaper`'s
`escapeJs()` method. An extended set of characters are escaped beyond
ECMAScript's rules for Javascript literal string escaping in order to prevent
misinterpretation of Javascript as HTML leading to the injection of special
characters and entities.
## Example of Bad Javascript Escaping
An example of incorrect Javascript escaping:
```php
<?php header('Content-Type: application/xhtml+xml; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
bar&quot;; alert(&quot;Meow!&quot;); var xss=&quot;true
INPUT;
$output = json_encode($input);
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Unescaped Entities</title>
<meta charset="UTF-8"/>
<script type="text/javascript">
<?php
// this will result in
// var foo = "bar&quot;; alert(&quot;Meow!&quot;); var xss=&quot;true";
?>
var foo = <?= $output ?>;
</script>
</head>
<body>
<p>json_encode() is not good for escaping javascript!</p>
</body>
</html>
```
The above example will show an alert popup box as soon as the page is loaded,
because the data is not properly escaped for the Javascript context.
## Example of Good Javascript Escaping
By using the `escapeJs()` method in the Javascript context, such attacks can be
prevented:
```php
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
bar&quot;; alert(&quot;Meow!&quot;); var xss=&quot;true
INPUT;
$escaper = new Zend\Escaper\Escaper('utf-8');
$output = $escaper->escapeJs($input);
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Escaped Entities</title>
<meta charset="UTF-8"/>
<script type="text/javascript">
<?php
// this will look like
// var foo =
bar\x26quot\x3B\x3B\x20alert\x28\x26quot\x3BMeow\x21\x26quot\x3B\x29\x3B\x20var\x20xss\x3D\x26quot\x3Btrue;
?>
var foo = <?= $output ?>;
</script>
</head>
<body>
<p>Zend\Escaper\Escaper::escapeJs() is good for escaping javascript!</p>
</body>
</html>
```
In the above example, the Javascript parser will most likely report a
`SyntaxError`, but at least the targeted application remains safe from such
attacks.

View file

@ -0,0 +1,57 @@
# Escaping URLs
This method is basically an alias for PHP's `rawurlencode()` which has applied
RFC 3986 since PHP 5.3. It is included primarily for consistency.
URL escaping applies to data being inserted into a URL and not to the whole URL
itself.
## Example of Bad URL Escaping
XSS attacks are easy if data inserted into URLs is not escaped properly:
```php
<?php header('Content-Type: application/xhtml+xml; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
" onmouseover="alert('zf2')
INPUT;
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Unescaped URL data</title>
<meta charset="UTF-8"/>
</head>
<body>
<a href="http://example.com/?name=<?= $input ?>">Click here!</a>
</body>
</html>
```
## Example of Good URL Escaping
By properly escaping data in URLs by using `escapeUrl()`, we can prevent XSS
attacks:
```php
<?php header('Content-Type: application/xhtml+xml; charset=UTF-8'); ?>
<!DOCTYPE html>
<?php
$input = <<<INPUT
" onmouseover="alert('zf2')
INPUT;
$escaper = new Zend\Escaper\Escaper('utf-8');
$output = $escaper->escapeUrl($input);
?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Unescaped URL data</title>
<meta charset="UTF-8"/>
</head>
<body>
<a href="http://example.com/?name=<?= $output ?>">Click here!</a>
</body>
</html>
```

View file

@ -0,0 +1,10 @@
<div class="container">
<div class="jumbotron">
<h1>zend-escaper</h1>
<p>Securely and safely escape HTML, HTML attributes, JavaScript, CSS, and URLs.</p>
<pre><code class="language-bash">$ composer require zendframework/zend-escaper</code></pre>
</div>
</div>

View file

@ -0,0 +1 @@
../../README.md

View file

@ -0,0 +1,51 @@
# Introduction
The [OWASP Top 10 web security risks](https://www.owasp.org/index.php/Top_10_2010-Main)
study lists Cross-Site Scripting (XSS) in second place. PHP's sole functionality
against XSS is limited to two functions of which one is commonly misapplied.
Thus, the zend-escaper component was written. It offers developers a way to
escape output and defend from XSS and related vulnerabilities by introducing
**contextual escaping based on peer-reviewed rules**.
zend-escaper was written with ease of use in mind, so it can be used completely stand-alone from
the rest of the framework, and as such can be installed with Composer:
```bash
$ composer install zendframework/zend-escaper
```
Several Zend Framework components provide integrations for consuming
zend-escaper, including [zend-view](https://github.com/zendframework/zend-view),
which provides a set of helpers that consume it.
> ### Security
>
> zend-escaper is a security related component. As such, if you believe you have
> found an issue, we ask that you follow our [Security Policy](http://framework.zend.com/security/)
> and report security issues accordingly. The Zend Framework team and the
> contributors thank you in advance.
## Overview
zend-escaper provides one class, `Zend\Escaper\Escaper`, which in turn provides
five methods for escaping output. Which method to use depends on the context in
which the output is used. It is up to the developer to use the right methods in
the right context.
`Zend\Escaper\Escaper` has the following escaping methods available for each context:
- `escapeHtml`: escape a string for an HTML body context.
- `escapeHtmlAttr`: escape a string for an HTML attribute context.
- `escapeJs`: escape a string for a Javascript context.
- `escapeCss`: escape a string for a CSS context.
- `escapeUrl`: escape a string for a URI or URI parameter context.
Usage of each method will be discussed in detail in later chapters.
## What zend-Escaper is not
zend-escaper is meant to be used only for *escaping data for output*, and as
such should not be misused for *filtering input data*. For such tasks, use
[zend-filter](https://zendframework.github.io/zend-filter/),
[HTMLPurifier](http://htmlpurifier.org/) or PHP's
[Filter](http://php.net/filter) functionality should be used.

View file

@ -0,0 +1,147 @@
# Theory of Operation
zend-escaper provides methods for escaping output data, dependent on the context
in which the data will be used. Each method is based on peer-reviewed rules and
is in compliance with the current OWASP recommendations.
The escaping follows a well-known and fixed set of encoding rules defined by
OWASP for each key HTML context. These rules cannot be impacted or negated by
browser quirks or edge-case HTML parsing unless the browser suffers a
catastrophic bug in its HTML parser or Javascript interpreter &mdash; both of
these are unlikely.
The contexts in which zend-escaper should be used are **HTML Body**, **HTML
Attribute**, **Javascript**, **CSS**, and **URL/URI** contexts.
Every escaper method will take the data to be escaped, make sure it is utf-8
encoded data (or try to convert it to utf-8), perform context-based escaping,
encode the escaped data back to its original encoding, and return the data to
the caller.
The actual escaping of the data differs between each method; they all have their
own set of rules according to which escaping is performed. An example will allow
us to clearly demonstrate the difference, and how the same characters are being
escaped differently between contexts:
```php
$escaper = new Zend\Escaper\Escaper('utf-8');
// &lt;script&gt;alert(&quot;zf2&quot;)&lt;/script&gt;
echo $escaper->escapeHtml('<script>alert("zf2")</script>');
// &lt;script&gt;alert&#x28;&quot;zf2&quot;&#x29;&lt;&#x2F;script&gt;
echo $escaper->escapeHtmlAttr('<script>alert("zf2")</script>');
// \x3Cscript\x3Ealert\x28\x22zf2\x22\x29\x3C\x2Fscript\x3E
echo $escaper->escapeJs('<script>alert("zf2")</script>');
// \3C script\3E alert\28 \22 zf2\22 \29 \3C \2F script\3E
echo $escaper->escapeCss('<script>alert("zf2")</script>');
// %3Cscript%3Ealert%28%22zf2%22%29%3C%2Fscript%3E
echo $escaper->escapeUrl('<script>alert("zf2")</script>');
```
More detailed examples will be given in later chapters.
## The Problem with Inconsistent Functionality
At present, programmers orient towards the following PHP functions for each
common HTML context:
- **HTML Body**: `htmlspecialchars()` or `htmlentities()`
- **HTML Attribute**: `htmlspecialchars()` or `htmlentities()`
- **Javascript**: `addslashes()` or `json_encode()`
- **CSS**: n/a
- **URL/URI**: `rawurlencode()` or `urlencode()`
In practice, these decisions appear to depend more on what PHP offers, and if it
can be interpreted as offering sufficient escaping safety, than it does on what
is recommended in reality to defend against XSS. While these functions can
prevent some forms of XSS, they do not cover all use cases or risks and are
therefore insufficient defenses.
Using `htmlspecialchars()` in a perfectly valid HTML5 unquoted attribute value,
for example, is completely useless since the value can be terminated by a space
(among other things), which is never escaped. Thus, in this instance, we have a
conflict between a widely used HTML escaper and a modern HTML specification,
with no specific function available to cover this use case. While it's tempting
to blame users, or the HTML specification authors, escaping just needs to deal
with whatever HTML and browsers allow.
Using `addslashes()`, custom backslash escaping, or `json_encode()` will
typically ignore HTML special characters such as ampersands, which may be used
to inject entities into Javascript. Under the right circumstances, the browser
will convert these entities into their literal equivalents before interpreting
Javascript, thus allowing attackers to inject arbitrary code.
Inconsistencies with valid HTML, insecure default parameters, lack of character
encoding awareness, and misrepresentations of what functions are capable of by
some programmers &mdash; these all make escaping in PHP an unnecessarily
convoluted quest.
To circumvent the lack of escaping methods in PHP, zend-escaper addresses the
need to apply context-specific escaping in web applications. It implements
methods that specifically target XSS and offers programmers a tool to secure
their applications without misusing other inadequate methods, or using, most
likely incomplete, home-grown solutions.
## Why Contextual Escaping?
To understand why multiple standardised escaping methods are needed, what
follows are several quick points; they are by no means a complete set of
reasons, however!
### HTML escaping of unquoted HTML attribute values still allows XSS
This is probably the best known way to defeat `htmlspecialchars()` when used on
attribute values, since any space (or character interpreted as a space &mdash;
there are a lot) lets you inject new attributes whose content can't be
neutralised by HTML escaping. The solution (where this is possible) is
additional escaping as defined by the OWASP ESAPI codecs. The point here can be
extended further &mdash; escaping only works if a programmer or designer knows
what they're doing. In many contexts, there are additional practices and gotchas
that need to be carefully monitored since escaping sometimes needs a little
extra help to protect against XSS &mdash; even if that means ensuring all
attribute values are properly double quoted despite this not being required for
valid HTML.
### HTML escaping of CSS, Javascript or URIs is often reversed when passed to non-HTML interpreters by the browser
HTML escaping is just that &mdsash; it's designed to escape a string for HTML
(i.e. prevent tag or attribute insertion), but not alter the underlying meaning
of the content, whether it be text, Javascript, CSS, or URIs. For that purpose,
a fully HTML-escaped version of any other context may still have its unescaped
form extracted before it's interpreted or executed. For this reason we need
separate escapers for Javascript, CSS, and URIs, and developers or designers
writing templates **must** know which escaper to apply to which context. Of
course, this means you need to be able to identify the correct context before
selecting the right escaper!
### DOM-based XSS requires a defence using at least two levels of different escaping in many cases
DOM-based XSS has become increasingly common as Javascript has taken off in
popularity for large scale client-side coding. A simple example is Javascript
defined in a template which inserts a new piece of HTML text into the DOM. If
the string is only HTML escaped, it may still contain Javascript that will
execute in that context. If the string is only Javascript-escaped, it may
contain HTML markup (new tags and attributes) which will be injected into the
DOM and parsed once the inserting Javascript executes. Damned either way? The
solution is to escape twice &mdash; first escape the string for HTML (make it
safe for DOM insertion), and then for Javascript (make it safe for the current
Javascript context). Nested contexts are a common means of bypassing naive
escaping habits (e.g. you can inject Javascript into a CSS expression within an
HTML attribute).
### PHP has no known anti-XSS escape functions (only those kidnapped from their original purposes)
A simple example, widely used, is when you see `json_encode()` used to escape
Javascript, or worse, some kind of mutant `addslashes()` implementation. These
were never designed to eliminate XSS, yet PHP programmers use them as such. For
example, `json_encode()` does not escape the ampersand or semi-colon characters
by default. That means you can easily inject HTML entities which could then be
decoded before the Javascript is evaluated in a HTML document. This lets you
break out of strings, add new JS statements, close tags, etc. In other words,
using `json_encode()` is insufficient and naive. The same, arguably, could be
said for `htmlspecialchars()` which has its own well known limitations that make
a singular reliance on it a questionable practice.

View file

@ -0,0 +1,17 @@
docs_dir: doc/book
site_dir: doc/html
pages:
- index.md
- Intro: intro.md
- Reference:
- "Theory of Operation": theory-of-operation.md
- Configuration: configuration.md
- "Escaping HTML": escaping-html.md
- "Escaping HTML Attributes": escaping-html-attributes.md
- "Escaping Javascript": escaping-javascript.md
- "Escaping CSS": escaping-css.md
- "Escaping URLs": escaping-url.md
site_name: zend-escaper
site_description: zend-escaper
repo_url: 'https://github.com/zendframework/zend-escaper'
copyright: 'Copyright (c) 2016 <a href="http://www.zend.com/">Zend Technologies USA Inc.</a>'

View file

@ -24,12 +24,12 @@ class Escaper
*
* @var array
*/
protected static $htmlNamedEntityMap = array(
protected static $htmlNamedEntityMap = [
34 => 'quot', // quotation mark
38 => 'amp', // ampersand
60 => 'lt', // less-than sign
62 => 'gt', // greater-than sign
);
];
/**
* Current encoding for escaping. If not UTF-8, we convert strings from this encoding
@ -41,13 +41,11 @@ class Escaper
/**
* Holds the value of the special flags passed as second parameter to
* htmlspecialchars(). We modify these for PHP 5.4 to take advantage
* of the new ENT_SUBSTITUTE flag for correctly dealing with invalid
* UTF-8 sequences.
* htmlspecialchars().
*
* @var string
* @var int
*/
protected $htmlSpecialCharsFlags = ENT_QUOTES;
protected $htmlSpecialCharsFlags;
/**
* Static Matcher which escapes characters for HTML Attribute contexts
@ -75,7 +73,7 @@ class Escaper
*
* @var array
*/
protected $supportedEncodings = array(
protected $supportedEncodings = [
'iso-8859-1', 'iso8859-1', 'iso-8859-5', 'iso8859-5',
'iso-8859-15', 'iso8859-15', 'utf-8', 'cp866',
'ibm866', '866', 'cp1251', 'windows-1251',
@ -85,12 +83,11 @@ class Escaper
'big5-hkscs', 'shift_jis', 'sjis', 'sjis-win',
'cp932', '932', 'euc-jp', 'eucjp',
'eucjp-win', 'macroman'
);
];
/**
* Constructor: Single parameter allows setting of global encoding for use by
* the current object. If PHP 5.4 is detected, additional ENT_SUBSTITUTE flag
* is set for htmlspecialchars() calls.
* the current object.
*
* @param string $encoding
* @throws Exception\InvalidArgumentException
@ -116,14 +113,13 @@ class Escaper
$this->encoding = $encoding;
}
if (defined('ENT_SUBSTITUTE')) {
$this->htmlSpecialCharsFlags|= ENT_SUBSTITUTE;
}
// We take advantage of ENT_SUBSTITUTE flag to correctly deal with invalid UTF-8 sequences.
$this->htmlSpecialCharsFlags = ENT_QUOTES | ENT_SUBSTITUTE;
// set matcher callbacks
$this->htmlAttrMatcher = array($this, 'htmlAttrMatcher');
$this->jsMatcher = array($this, 'jsMatcher');
$this->cssMatcher = array($this, 'cssMatcher');
$this->htmlAttrMatcher = [$this, 'htmlAttrMatcher'];
$this->jsMatcher = [$this, 'jsMatcher'];
$this->cssMatcher = [$this, 'cssMatcher'];
}
/**
@ -248,7 +244,7 @@ class Escaper
* replace it with while grabbing the integer value of the character.
*/
if (strlen($chr) > 1) {
$chr = $this->convertEncoding($chr, 'UTF-16BE', 'UTF-8');
$chr = $this->convertEncoding($chr, 'UTF-32BE', 'UTF-8');
}
$hex = bin2hex($chr);
@ -281,7 +277,13 @@ class Escaper
return sprintf('\\x%02X', ord($chr));
}
$chr = $this->convertEncoding($chr, 'UTF-16BE', 'UTF-8');
return sprintf('\\u%04s', strtoupper(bin2hex($chr)));
$hex = strtoupper(bin2hex($chr));
if (strlen($hex) <= 4) {
return sprintf('\\u%04s', $hex);
}
$highSurrogate = substr($hex, 0, 4);
$lowSurrogate = substr($hex, 4, 4);
return sprintf('\\u%04s\\u%04s', $highSurrogate, $lowSurrogate);
}
/**
@ -297,7 +299,7 @@ class Escaper
if (strlen($chr) == 1) {
$ord = ord($chr);
} else {
$chr = $this->convertEncoding($chr, 'UTF-16BE', 'UTF-8');
$chr = $this->convertEncoding($chr, 'UTF-32BE', 'UTF-8');
$ord = hexdec(bin2hex($chr));
}
return sprintf('\\%X ', $ord);