How to Sanitize HTML Response from a CMS?

When you render HTML returned by a content management system (CMS), you need to make sure it is safe first. Unchecked HTML can lead to security vulnerabilities, including Cross-Site Scripting (XSS) attacks. In this article, we explore how to sanitize HTML responses from a CMS using the sanitize-html library, and we walk through how the Wisp CMS blog starter kit implements sanitization.

Why HTML Sanitization is Important

HTML sanitization is a critical process for web security. It helps in:

  • Preventing XSS Attacks: Malicious scripts can be embedded in HTML content to steal user data or hijack user sessions.

  • Ensuring Content Integrity: Sanitization maintains the integrity of your content by allowing only safe HTML tags and attributes.

  • Enhancing User Experience: Sanitized output contains only the markup you expect, which helps content render consistently across browsers and devices.

Overview of the sanitize-html Library

sanitize-html is a popular JavaScript library for cleaning HTML content. It works from an allowlist: you configure which tags and attributes to permit, and anything outside that list is removed or escaped, so that only safe content is rendered.

Installation

To install sanitize-html, you can use either npm or yarn:

npm install sanitize-html
# or
yarn add sanitize-html

Basic Usage

The basic usage of sanitize-html is straightforward:

import sanitizeHtml from 'sanitize-html';

const dirty = '<strong>Hello World</strong>';
const clean = sanitizeHtml(dirty);
console.log(clean); // Output: <strong>Hello World</strong>
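
The input in that example is already safe, so it comes back unchanged. As a minimal sketch of what the default settings do with hostile markup, consider the snippet below. The sample string and the steal() handler are made up for illustration, and the output shown assumes the library's default allowlist.

```typescript
import sanitizeHtml from 'sanitize-html';

// Hypothetical untrusted input, as it might arrive from a CMS field.
const untrusted =
  '<p onclick="steal()">Hello</p><script>alert("xss")</script>';

// With no options, the library's default allowlist applies:
// <p> is allowed, onclick is not, and <script> is dropped with its contents.
const safe = sanitizeHtml(untrusted);

console.log(safe); // <p>Hello</p>
```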

Configuration Options

sanitize-html provides several configuration options to customize the sanitization process:

  • allowedTags: Specifies which HTML tags are permitted.

const clean = sanitizeHtml(dirty, {
  allowedTags: [ 'b', 'i', 'em', 'strong', 'a' ]
});

  • allowedAttributes: Defines which attributes are allowed on each tag. Attributes not listed for a tag are removed.

const clean = sanitizeHtml(dirty, {
  allowedAttributes: {
    'a': [ 'href' ]
  }
});

  • allowedSchemes: Controls which URL schemes are permitted in URL-bearing attributes such as href and src.

const clean = sanitizeHtml(dirty, {
  allowedSchemes: [ 'http', 'https' ]
});

  • disallowedTagsMode: Determines how disallowed tags are handled. The default, 'discard', drops the tag; 'escape' outputs it as escaped text instead.

const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'discard'
});

  • Advanced Configuration: The library allows for more granular control with options like allowedIframeHostnames, allowedClasses, and allowedStyles, which restrict iframe sources, CSS class names, and inline style properties respectively.
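
The options above are usually combined in a single call. The following is a rough sketch of how they interact rather than a recommended policy. The tag list, the sample links, and the variable names are assumptions chosen for illustration.

```typescript
import sanitizeHtml from 'sanitize-html';

// Illustrative policy: a small set of inline tags, links limited to http(s).
const options: sanitizeHtml.IOptions = {
  allowedTags: ['p', 'a', 'strong', 'em'],
  allowedAttributes: { a: ['href'] },
  allowedSchemes: ['http', 'https'],
  disallowedTagsMode: 'discard',
};

// Hypothetical input mixing acceptable and unacceptable markup.
const dirty =
  '<p>Read <a href="javascript:alert(1)" target="_blank">this</a> ' +
  'and <a href="https://example.com">that</a>.</p>' +
  '<h1>Big</h1>';

// javascript: is not in allowedSchemes, so that href is removed;
// target is not listed for <a>, so it is removed as well.
// <h1> is not in allowedTags; in 'discard' mode the tag goes but its text stays.
const clean = sanitizeHtml(dirty, options);

console.log(clean);
```

Starting from a short allowlist like this and adding tags only when real content needs them tends to be easier to reason about than starting broad and removing things later.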

Implementation in Wisp CMS Blog Starter Kit

The Wisp CMS blog starter kit incorporates the sanitize-html library to sanitize blog content before rendering it. Here's how it's implemented:

Importing Dependencies

In the BlogPostContent.tsx component file, the sanitize-html library is imported together with its defaults object:

import sanitize, { defaults } from "sanitize-html";

Sanitizing Content

The PostContent component sanitizes the HTML content of a blog post using the following configuration:

export const PostContent = ({ content }: { content: string }) => {
  const sanitizedContent = sanitize(content, {
    allowedTags: [
      "b", "i", "em", "strong", "a", "img", "h1", "h2", "h3", "code", "pre", 
      "p", "li", "ul", "ol", "blockquote", "td", "th", "table", "tr", 
      "tbody", "thead", "tfoot", "small", "div", "iframe"
    ],
    allowedAttributes: {
      ...defaults.allowedAttributes,
      "*": ["style"],
      iframe: ["src", "allowfullscreen", "style"],
    },
    allowedIframeHostnames: ["www.youtube.com", "www.youtube-nocookie.com"],
  });
  return (
    <div
      className="blog-content mx-auto"
      dangerouslySetInnerHTML={{ __html: sanitizedContent }}
    ></div>
  );
};
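
To see what allowedIframeHostnames does in a configuration like this one, the options can be lifted into a plain function and exercised outside React. This is a sketch under assumptions: the name sanitizePostHtml and the sample URLs are hypothetical and not part of the starter kit, and the tag list is shortened for brevity.

```typescript
import sanitize from "sanitize-html";

// Hypothetical helper: a trimmed-down copy of the PostContent options.
function sanitizePostHtml(content: string): string {
  return sanitize(content, {
    allowedTags: ["p", "a", "strong", "em", "div", "iframe"],
    allowedAttributes: {
      ...sanitize.defaults.allowedAttributes,
      iframe: ["src", "allowfullscreen"],
    },
    allowedIframeHostnames: ["www.youtube.com", "www.youtube-nocookie.com"],
  });
}

// An embed from an allowed hostname: the src attribute is kept.
const embedded = sanitizePostHtml(
  '<iframe src="https://www.youtube.com/embed/VIDEO_ID"></iframe>'
);

// An embed from an unlisted hostname: the src attribute is removed.
const foreign = sanitizePostHtml(
  '<iframe src="https://evil.example/frame"></iframe>'
);

console.log(embedded);
console.log(foreign);
```

Keeping the options in one function like this also makes it possible to run the same sanitization on the server and in tests, instead of only inside the component.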

Displaying Content

The BlogPostContent component uses PostContent to render the sanitized content, ensuring that only safe HTML is displayed:

export const BlogPostContent = ({ post }: { post: GetPostResult["post"] }) => {
  if (!post) return null;
  const { title, publishedAt, createdAt, content, tags } = post;
  return (
    <div>
      <div className="prose lg:prose-xl dark:prose-invert mx-auto lg:prose-h1:text-4xl mb-10 lg:mt-20 break-words">
        <h1>{title}</h1>
        <PostContent content={content} />
        <div className="mt-10 opacity-40 text-sm">
          {tags.map((tag) => (
            <Link
              key={tag.id}
              href={`/tag/${tag.name}`}
              className="text-primary mr-2"
            >
              #{tag.name}
            </Link>
          ))}
        </div>
        <div className="text-sm opacity-40 mt-4">
          {Intl.DateTimeFormat("en-US").format(
            new Date(publishedAt || createdAt)
          )}
        </div>
      </div>
    </div>
  );
};

Best Practices for HTML Sanitization

  • Server-Side Sanitization: Sanitize on the server, where the step cannot be skipped or tampered with by the client, so that untrusted HTML is cleaned before it reaches the browser.

  • Regular Updates: Keep the sanitize-html library and other dependencies up to date so that you pick up security fixes.

  • Custom Configurations: Tailor the sanitization configuration to your specific use case, allowing only the tags and attributes your content actually needs.

  • Testing: Regularly test the sanitization process with known attack payloads to confirm that it blocks harmful content.
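
The testing practice above can be approximated with a small smoke test. The sketch below is library-agnostic: it accepts any sanitizer function and reports which sample payloads still look dangerous after sanitization. The names findLeaks, samplePayloads, and dangerPatterns are hypothetical, and the payload list is deliberately short.

```typescript
// A sanitizer is any function from untrusted HTML to HTML considered safe.
type Sanitizer = (html: string) => string;

// Hypothetical sample payloads; a real suite would use a larger, maintained list.
const samplePayloads: string[] = [
  '<script>alert(1)</script>',
  '<img src="x" onerror="alert(1)">',
  '<a href="javascript:alert(1)">click</a>',
  '<div onclick="alert(1)">hi</div>',
];

// Crude markers of active content that should not survive sanitization.
const dangerPatterns: RegExp[] = [
  /<script\b/i, // script elements
  /<[^>]*\son\w+\s*=/i, // inline event handlers inside a tag
  /<[^>]*(?:href|src)\s*=\s*["']?\s*javascript:/i, // javascript: URLs
];

// Returns the payloads whose sanitized output still matches a danger pattern.
function findLeaks(
  sanitizer: Sanitizer,
  payloads: string[] = samplePayloads
): string[] {
  return payloads.filter((payload) => {
    const output = sanitizer(payload);
    return dangerPatterns.some((pattern) => pattern.test(output));
  });
}
```

In a test suite, you would pass the sanitizer used in production (for instance, a function that wraps sanitizeHtml with your options) and assert that findLeaks returns an empty array. Pattern matching like this only catches obvious regressions and is no substitute for the sanitizer itself.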

Next Steps

If you're looking for a secure and efficient way to manage and sanitize your blog content, try out our Wisp CMS blog starter kit. It comes with built-in sanitization using the sanitize-html library, ensuring your content is always safe. Additionally, explore Wisp as a headless CMS to serve content seamlessly to your website.

By following these guidelines and using the sanitize-html library, you can protect the pages that render your CMS content against XSS and give your readers a smooth and secure experience.

Raymond Yeh

Published on 28 October 2024
