When working with a content management system (CMS), you should treat the HTML it returns as untrusted input. Unsanitized HTML can introduce security vulnerabilities, most notably Cross-Site Scripting (XSS). In this article, we explore how to sanitize HTML responses from a CMS using the sanitize-html library, and we walk through a practical implementation in the Wisp CMS blog starter kit.
Why HTML Sanitization is Important
HTML sanitization is a critical process for web security. It helps in:
Preventing XSS Attacks: Malicious scripts can be embedded in HTML content to steal user data or hijack user sessions.
Ensuring Content Integrity: Sanitization maintains the integrity of your content by allowing only safe HTML tags and attributes.
Enhancing User Experience: Clean HTML ensures that the content is displayed correctly across different browsers and devices.
Overview of the sanitize-html Library
sanitize-html is a popular library designed to clean HTML content. It offers flexible configuration options to permit or block specific tags and attributes, ensuring that only safe content is rendered.
Installation
To install sanitize-html, you can use either npm or yarn:
npm install sanitize-html
# or
yarn add sanitize-html
Basic Usage
The basic usage of sanitize-html is straightforward:
import sanitizeHtml from 'sanitize-html';
const dirty = '<strong>Hello World</strong>';
const clean = sanitizeHtml(dirty);
console.log(clean); // Output: <strong>Hello World</strong>
Configuration Options
sanitize-html provides several configuration options to customize the sanitization process:
allowedTags: Specifies which HTML tags are permitted.
const clean = sanitizeHtml(dirty, {
  allowedTags: [ 'b', 'i', 'em', 'strong', 'a' ]
});
allowedAttributes: Defines which attributes are allowed on specific tags.
const clean = sanitizeHtml(dirty, {
  allowedAttributes: {
    'a': [ 'href' ]
  }
});
allowedSchemes: Controls the URL schemes permitted in attributes.
const clean = sanitizeHtml(dirty, {
  allowedSchemes: [ 'http', 'https' ]
});
disallowedTagsMode: Determines how disallowed tags are handled (e.g., discarded or escaped).
const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'discard'
});
Advanced Configuration: The library allows for more granular control with options like allowedIframeHostnames, allowedClasses, and allowedStyles.
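The options listed above can be combined in a single object. The following is a minimal sketch rather than a recommended policy: the tag list, the language-* class pattern, and the text-align rule are illustrative assumptions, and the name combinedOptions is hypothetical. The object is plain TypeScript so the snippet stands alone; it is meant to be passed as the second argument to sanitizeHtml.

```typescript
// Illustrative options object (an assumption, not the library's defaults).
// Intended use: sanitizeHtml(dirty, combinedOptions).
const combinedOptions = {
  // Only these tags survive; other tags are handled per disallowedTagsMode.
  allowedTags: ["p", "b", "i", "em", "strong", "a", "code", "pre", "iframe"],
  allowedAttributes: {
    a: ["href"],
    p: ["style"], // "style" has to be allowed on the tag for allowedStyles to matter
    iframe: ["src"],
  },
  // URL schemes accepted in attributes such as href and src.
  allowedSchemes: ["http", "https"],
  disallowedTagsMode: "discard" as const,
  // Class names permitted on <code>; the glob pattern is an example.
  allowedClasses: { code: ["language-*"] },
  // CSS properties permitted on <p>, each with regexes for accepted values.
  allowedStyles: { p: { "text-align": [/^(left|right|center)$/] } },
  // Hosts accepted in iframe src values.
  allowedIframeHostnames: ["www.youtube.com"],
};
```

Keeping the policy in one named object makes it easier to review and to reuse across components than scattering options over several calls.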
Implementation in Wisp CMS Blog Starter Kit
The Wisp CMS blog starter kit incorporates the sanitize-html library to sanitize blog content before rendering it. Here's how it's implemented:
Importing Dependencies
In the BlogPostContent.tsx component, the sanitize-html library is imported:
import sanitize, { defaults } from "sanitize-html";
Sanitizing Content
The PostContent component sanitizes the HTML content of a blog post using the following configuration:
export const PostContent = ({ content }: { content: string }) => {
  // Sanitize the CMS-provided HTML before it is injected into the page.
  const sanitizedContent = sanitize(content, {
    allowedTags: [
      "b", "i", "em", "strong", "a", "img", "h1", "h2", "h3", "code", "pre",
      "p", "li", "ul", "ol", "blockquote", "td", "th", "table", "tr",
      "tbody", "thead", "tfoot", "small", "div", "iframe"
    ],
    allowedAttributes: {
      ...defaults.allowedAttributes,
      "*": ["style"],
      iframe: ["src", "allowfullscreen", "style"],
    },
    // iframe src values that point at any other host are removed.
    allowedIframeHostnames: ["www.youtube.com", "www.youtube-nocookie.com"],
  });
  return (
    <div
      className="blog-content mx-auto"
      dangerouslySetInnerHTML={{ __html: sanitizedContent }}
    ></div>
  );
};
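One detail of this configuration is worth a second look: the "*": ["style"] entry permits the style attribute on every allowed tag, and as far as we can tell from the library's documentation, style declarations are not filtered per property unless an allowedStyles option is also supplied. If your content only needs a few CSS properties, a tighter variant could add an allowlist like the sketch below. The chosen properties and value patterns are assumptions made for illustration and are not part of the starter kit.

```typescript
// Hypothetical addition, not part of the Wisp starter kit: an allowedStyles
// map that could sit next to allowedAttributes in the sanitize() call.
const allowedStyles = {
  // "*" applies the rules to every tag on which "style" is allowed.
  "*": {
    // Accept only the four keyword values.
    "text-align": [/^(left|right|center|justify)$/],
    // Accept only plain pixel or percentage widths, e.g. "320px" or "100%".
    width: [/^\d+(px|%)$/],
  },
};
```

With a map like this in place, any declaration whose property is not listed, or whose value matches none of the patterns, would be expected to be dropped from the style attribute.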
Displaying Content
The BlogPostContent component uses PostContent to render the sanitized content, ensuring that only safe HTML is displayed:
export const BlogPostContent = ({ post }: { post: GetPostResult["post"] }) => {
  if (!post) return null;
  const { title, publishedAt, createdAt, content, tags } = post;
  return (
    <div>
      <div className="prose lg:prose-xl dark:prose-invert mx-auto lg:prose-h1:text-4xl mb-10 lg:mt-20 break-words">
        <h1>{title}</h1>
        <PostContent content={content} />
        <div className="mt-10 opacity-40 text-sm">
          {tags.map((tag) => (
            <Link
              key={tag.id}
              href={`/tag/${tag.name}`}
              className="text-primary mr-2"
            >
              #{tag.name}
            </Link>
          ))}
        </div>
        <div className="text-sm opacity-40 mt-4">
          {Intl.DateTimeFormat("en-US").format(
            new Date(publishedAt || createdAt)
          )}
        </div>
      </div>
    </div>
  );
};
Best Practices for HTML Sanitization
Server-Side Sanitization: Always sanitize on the server, where the step cannot be skipped or tampered with by the client.
Regular Updates: Keep the sanitize-html library and other dependencies updated to mitigate any vulnerabilities.
Custom Configurations: Tailor the sanitization configuration to your specific use case, allowing only necessary tags and attributes.
Testing: Regularly test the sanitization process to ensure that it effectively blocks harmful content.
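To make the testing advice concrete, here is a small sketch of a regression check. It takes the sanitizer as a parameter so the snippet stands alone; in a real project you would pass something like (html) => sanitizeHtml(html, options). The names Sanitizer, samplePayloads, forbiddenPatterns, and findUnsafeOutputs are hypothetical, and the payload list is a tiny, non-exhaustive sample rather than a complete XSS test suite.

```typescript
// A sanitizer is any function from untrusted HTML to cleaned HTML.
type Sanitizer = (html: string) => string;

// Small, non-exhaustive sample of XSS payloads (illustrative only).
const samplePayloads: string[] = [
  '<script>alert(1)</script>',
  '<img src="x" onerror="alert(1)">',
  '<a href="javascript:alert(1)">click</a>',
];

// Coarse patterns that should not appear in sanitized output.
const forbiddenPatterns: RegExp[] = [
  /<script\b/i, // script tags
  /\son\w+\s*=/i, // inline event handlers such as onerror=
  /javascript:/i, // javascript: URLs
];

// Returns the payloads whose sanitized output still matches a forbidden pattern.
function findUnsafeOutputs(
  sanitizer: Sanitizer,
  payloads: string[] = samplePayloads
): string[] {
  return payloads.filter((payload) => {
    const output = sanitizer(payload);
    return forbiddenPatterns.some((pattern) => pattern.test(output));
  });
}
```

An empty result means none of the sample payloads survived. The patterns are deliberately blunt, so they can also flag harmless output, for example when disallowed tags are escaped into visible text instead of being discarded; treat a hit as a prompt to inspect the output, not as proof of a vulnerability.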
Next Steps
If you're looking for a secure and efficient way to manage and sanitize your blog content, try out our Wisp CMS blog starter kit. It comes with built-in sanitization using the sanitize-html library, ensuring your content is always safe. Additionally, explore Wisp as a headless CMS to serve content seamlessly to your website.
By following these guidelines and utilizing the sanitize-html library, you can effectively safeguard your CMS against potential threats and ensure a smooth and secure user experience.