Imagine visiting a popular blog and suddenly seeing an alert pop-up or being redirected to a suspicious site. This is often a result of unsanitized HTML content. Sanitizing HTML responses mitigates security vulnerabilities, ensuring a safe and trustworthy user experience.
Understanding HTML Sanitization
HTML sanitization is the process of cleaning HTML content to ensure it only contains safe elements. This is crucial because HTML content can include scripts and other elements that pose security risks.
Common Threats
XSS (Cross-Site Scripting): Malicious scripts injected into webpages. Attackers can exploit unsanitized content to execute malicious scripts in the user's browser.
Code Injection: Harmful code that can affect users’ browsers and systems. This can lead to unauthorized data access or further malicious attacks.
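To make the XSS threat concrete, the sketch below shows what happens when a hostile comment is treated as text rather than markup. The `escapeHtml` helper and the sample payload are illustrative assumptions, not part of any library discussed in this post. Escaping is the bluntest defense because it removes all formatting, which is why the rest of the post focuses on sanitization instead.

```javascript
// Hypothetical helper: escape the five characters that are significant in HTML
// so that user input is always rendered as text, never as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Assumed example of attacker-controlled input submitted as a blog comment.
const comment = '<img src=x onerror="alert(1)">';

const safeComment = escapeHtml(comment);
console.log(safeComment);
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

Rendered in a page, the escaped string displays the payload as visible text and the `onerror` handler never runs.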
The Role of HTML Sanitization in CMS
Content Management Systems (CMSs) help users create and manage content easily. However, unscreened HTML from CMSs can introduce XSS and other security risks.
When a CMS delivers content, it can include user-generated content that may not always be trustworthy. Hence, sanitizing HTML responses from a CMS is vital to ensure content safety and to protect end-users from potential security threats.
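One way to picture this is to sanitize the HTML field at the point where the CMS response enters the application, before any rendering code sees it. The sketch below assumes a hypothetical response shape (`title`, `content`) and a hypothetical `sanitizePost` helper that takes the sanitizer as a parameter. The stand-in sanitizer simply escapes every `<`; in a real project you would pass a proper sanitizer such as the one introduced in the next section.

```javascript
// Hypothetical shape of a post returned by a CMS API.
const cmsResponse = {
  title: "Hello",
  content: "<p>Welcome</p><script>alert(1)</script>",
};

// Sanitize the HTML field once, at the boundary where CMS data enters the app,
// and return a new object so the raw response is left untouched.
function sanitizePost(post, sanitize) {
  return { ...post, content: sanitize(post.content) };
}

// Stand-in sanitizer for this sketch only: it escapes every "<" so that
// nothing in the content can open a tag. It is not a real HTML sanitizer.
const standInSanitizer = (html) => html.replace(/</g, "&lt;");

const safePost = sanitizePost(cmsResponse, standInSanitizer);
console.log(safePost.content);
```

Passing the sanitizer in as an argument keeps the boundary logic independent of any particular library and makes it easy to test.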
The sanitize-html Library
The sanitize-html library sanitizes HTML against an allow-list of tags and attributes, so that only safe, intended content is displayed.
Key Features and Benefits
Customizable Tags and Attributes: Control over which HTML tags and attributes are permitted.
Filtering Specific Tags: Allows elimination or transformation of unwanted tags.
Handling CSS: Specifies which CSS properties and values are allowed.
Disallowed Tags Management: Manages non-permitted tags by either discarding their contents or escaping them.
Installation
npm:
npm install sanitize-html
Yarn:
yarn add sanitize-html
Usage Examples
In the Browser
import sanitizeHtml from 'sanitize-html';
const html = "<strong>hello world</strong>";
console.log(sanitizeHtml(html)); // Output: <strong>hello world</strong>
In Node.js
// Importing in ES modules
import sanitizeHtml from 'sanitize-html';
// Importing in CommonJS (use this instead of the import above, not alongside it)
// const sanitizeHtml = require('sanitize-html');
const dirty = 'some really tacky HTML';
const clean = sanitizeHtml(dirty);
console.log(clean); // Sanitized HTML content
Filtering Specific Tags
const clean = sanitizeHtml(dirty, {
  // Extend the default allow-list so that <img> tags are kept
  allowedTags: sanitizeHtml.defaults.allowedTags.concat(['img'])
});
Disallowed Tags Management
To discard disallowed tags together with their contents:
const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'completelyDiscard'
});
To escape disallowed tags instead of removing them:
const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'escape'
});
Best Practices
Server-Side Sanitization: Perform sanitization on the server, where an attacker cannot bypass or tamper with it the way they can with code running in the browser.
Maintain Original Case for SVG:
const clean = sanitizeHtml(dirty, {
  allowedTags: ['svg', 'g', 'defs', 'linearGradient', 'stop', 'circle'],
  parser: {
    lowerCaseTags: false,
    lowerCaseAttributeNames: false
  }
});
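Support for CSS Styles:
const clean = sanitizeHtml(dirty, {
  // allowedStyles only takes effect where the style attribute itself is allowed
  allowedAttributes: {
    ...sanitizeHtml.defaults.allowedAttributes,
    '*': ['style']
  },
  allowedStyles: {
    '*': {
      'color': [/^#[0-9a-fA-F]{3,6}$/], // Allow only hex colors
      'text-align': [/^left$/, /^right$/, /^center$/]
    },
    'p': {
      'font-size': [/^\d+rem$/]
    }
  }
});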
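Because `allowedStyles` is driven by regular expressions, it is worth checking what each pattern really accepts before relying on it. The sketch below exercises the patterns from the example above with plain `RegExp.test`, without calling sanitize-html. It assumes that a value is kept when any one pattern for its property matches. The sample values and the two stricter patterns at the end are assumptions chosen to show the edge cases.

```javascript
// Patterns copied from the allowedStyles example.
const hexColor = /^#[0-9a-fA-F]{3,6}$/;
const textAlign = [/^left$/, /^right$/, /^center$/];
const fontSize = /^\d+rem$/;

// Assumption: a value passes if at least one pattern for the property matches.
const matchesAny = (patterns, value) => patterns.some((p) => p.test(value));

console.log(hexColor.test("#fff"));            // true
console.log(hexColor.test("#1a2b3c"));         // true
console.log(hexColor.test("#abcde"));          // true: five digits satisfy {3,6} but are not a valid CSS color
console.log(hexColor.test("red"));             // false: named colors are rejected
console.log(matchesAny(textAlign, "center"));  // true
console.log(matchesAny(textAlign, "justify")); // false
console.log(fontSize.test("2rem"));            // true
console.log(fontSize.test("1.5rem"));          // false: \d+ does not allow a decimal point

// Assumed stricter alternatives, not taken from the original example.
const strictHexColor = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;
const decimalRem = /^\d+(?:\.\d+)?rem$/;
console.log(strictHexColor.test("#abcde"));    // false
console.log(decimalRem.test("1.5rem"));        // true
```

If your content uses fractional sizes or you want to reject malformed colors, patterns along the lines of the last two may be a better fit.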
Practical Implementation: Wisp CMS Blog Post Content
Wisp CMS offers a seamless headless CMS experience, providing tools to manage and deliver content securely. Let's explore how sanitization is implemented within the Wisp CMS blog starter kit.
Sanitization in Wisp CMS
The BlogPostContent.tsx component is responsible for displaying sanitized blog post content. It uses sanitize-html to ensure that the content is safe before rendering it.
import sanitize, { defaults } from "sanitize-html";
export const PostContent = ({ content }: { content: string }) => {
  const sanitizedContent = sanitize(content, {
    allowedTags: [
      "b", "i", "em", "strong", "a", "img", "h1", "h2", "h3", "code", "pre",
      "p", "li", "ul", "ol", "blockquote", "td", "th", "table", "tr", "tbody",
      "thead", "tfoot", "small", "div", "iframe"
    ],
    allowedAttributes: {
      ...defaults.allowedAttributes,
      "*": ["style"],
      iframe: ["src", "allowfullscreen", "style"],
    },
    allowedIframeHostnames: ["www.youtube.com", "www.youtube-nocookie.com"],
  });
  return (
    <div
      className="blog-content mx-auto"
      dangerouslySetInnerHTML={{ __html: sanitizedContent }}
    ></div>
  );
};
This code ensures that only a defined set of HTML tags and attributes is allowed, and that iframes can only load from the listed YouTube hostnames, reducing security risks. Because the content is sanitized first, passing it to dangerouslySetInnerHTML is considerably safer than rendering the raw CMS response.
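The `allowedIframeHostnames` option restricts iframe sources by hostname. The sketch below illustrates that idea with the standard `URL` class. It is not sanitize-html's internal implementation, and `isAllowedIframeSrc` is a hypothetical helper written only for this illustration. It shows why an exact hostname comparison matters: a lookalike domain that merely starts with an allowed name must not pass.

```javascript
// Same hostnames as the allowedIframeHostnames list in the component.
const allowedHostnames = ["www.youtube.com", "www.youtube-nocookie.com"];

// Hypothetical helper: accept an iframe src only if it parses as an absolute
// http(s) URL whose hostname is exactly one of the allowed values.
function isAllowedIframeSrc(src) {
  let url;
  try {
    url = new URL(src);
  } catch {
    return false; // relative or malformed URLs are rejected in this sketch
  }
  const isWeb = url.protocol === "https:" || url.protocol === "http:";
  return isWeb && allowedHostnames.includes(url.hostname);
}

console.log(isAllowedIframeSrc("https://www.youtube.com/embed/abc123"));              // true
console.log(isAllowedIframeSrc("https://www.youtube.com.evil.example/embed/abc123")); // false
console.log(isAllowedIframeSrc("https://youtube.com/embed/abc123"));                  // false: hostname must match exactly
console.log(isAllowedIframeSrc("javascript:alert(1)"));                               // false
```

Comparing the parsed hostname, rather than searching the raw string for "youtube.com", is what keeps lookalike domains out.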
Benefits of Sanitizing HTML Responses
Improved Security: Sanitization blocks common XSS vectors, such as injected script tags and event-handler attributes, before they reach the browser.
Content Integrity: Ensures safe and accurate content delivery.
User Trust: Maintains and enhances user trust by ensuring secure content.
Conclusion
Sanitizing HTML responses from a CMS is crucial to secure content delivery and protect users from malicious threats. Using tools like sanitize-html makes this task efficient and reliable.
Next Steps
Try the Blog Starter Kit: Discover the benefits by trying out our Wisp CMS blog starter kit.
Explore Wisp CMS: Consider using Wisp as a headless CMS for your website’s content needs. Learn more at our Wisp blog.