Should I Sanitize HTML Responses from a CMS?

Imagine visiting a popular blog and suddenly seeing an alert pop-up or being redirected to a suspicious site. This is often the result of HTML content that was rendered without being sanitized. Sanitizing HTML responses before rendering them mitigates this class of vulnerability and helps keep the user experience safe and trustworthy.

Understanding HTML Sanitization

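HTML sanitization is the process of cleaning HTML content so that it contains only elements and attributes you consider safe, typically by checking it against an allowlist. This is crucial because HTML content can include scripts, event-handler attributes such as onerror, and other constructs that pose security risks.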

Common Threats

XSS (Cross-Site Scripting): Attackers inject scripts into a webpage through unsanitized content, and those scripts then run in the visitor's browser as if they were part of the page.
Code Injection: More broadly, injected markup or code that the page ends up rendering or executing. This can lead to unauthorized data access or further malicious attacks.
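
To make the XSS case concrete, here is a minimal sketch of a payload and of the crudest possible defense, escaping. The payload string and the escapeHtml helper are illustrative names that do not come from any library. Escaping turns all markup into visible text, whereas a sanitizer keeps the safe parts and removes the rest.

```typescript
// Hypothetical payload an attacker might store in a CMS field.
const payload = '<img src="x" onerror="alert(1)">';

// Minimal escaping helper (an illustration, not a full sanitizer).
// The ampersand is replaced first so later entities are not double-escaped.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const escaped = escapeHtml(payload);
console.log(escaped);
// &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;
```

Rendered as-is, the payload would run its onerror handler; the escaped version is displayed as plain text instead.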

The Role of HTML Sanitization in CMS

Content Management Systems (CMSs) help users create and manage content easily. However, unsanitized HTML from a CMS can introduce XSS and other security risks.

When a CMS delivers content, it can include user-generated content that may not always be trustworthy. Hence, sanitizing HTML responses from a CMS is vital to ensure content safety and to protect end-users from potential security threats.

The sanitize-html Library

The sanitize-html library is an npm package that cleans HTML against an allowlist of tags and attributes, so that only safe and intended content is displayed.

Key Features and Benefits

Customizable Tags and Attributes: Control over which HTML tags and attributes are permitted.
Filtering Specific Tags: Allows elimination or transformation of unwanted tags.
Handling CSS: Specifies which CSS properties and values are allowed.
Disallowed Tags Management: Controls what happens to non-permitted tags: remove the tag, remove the tag together with its contents, or escape it so that it appears as text.
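
As a rough map from these features to option names, the sketch below builds an options object of the kind that would be passed as the second argument to sanitizeHtml. It deliberately leaves out the library import so that it stands alone, and the specific tags, attributes, and patterns are illustrative assumptions rather than recommended settings.

```typescript
// Options sketch only; the values are illustrative assumptions.
const options = {
  // Customizable tags and attributes
  allowedTags: ["p", "a", "ul", "li", "strong", "em"],
  allowedAttributes: { a: ["href"], "*": ["style"] },
  // Filtering specific tags: rewrite <ol> as <ul>
  transformTags: { ol: "ul" },
  // Handling CSS: only these text-align values survive in style attributes
  allowedStyles: {
    "*": { "text-align": [/^left$/, /^right$/, /^center$/] },
  },
  // Disallowed tags management: show non-permitted tags as escaped text
  disallowedTagsMode: "escape",
};

console.log(Object.keys(options).join(", "));
// allowedTags, allowedAttributes, transformTags, allowedStyles, disallowedTagsMode
```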

Installation

npm:
```
npm install sanitize-html
```
Yarn:
```
yarn add sanitize-html
```

Usage Examples

In the Browser

import sanitizeHtml from 'sanitize-html';

const html = "<strong>hello world</strong>";
console.log(sanitizeHtml(html)); // Output: <strong>hello world</strong>

In Node.js

// Importing in ES modules
import sanitizeHtml from 'sanitize-html';

// Or, in CommonJS (use one style or the other, not both in the same file):
// const sanitizeHtml = require('sanitize-html');

const dirty = 'some really tacky HTML';
const clean = sanitizeHtml(dirty);
console.log(clean); // Sanitized HTML content

Filtering Specific Tags

// Keep the library's default allowed tags and additionally permit <img>
const clean = sanitizeHtml(dirty, {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat(['img'])
});

Disallowed Tags Management

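By default, a disallowed tag is removed while the text inside most tags is kept. To discard disallowed tags together with everything inside them: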

const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'completelyDiscard'
});

To escape disallowed tags so that they appear as literal text instead of being removed:

const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'escape'
});

Best Practices

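Server-Side Sanitization: Perform sanitization on the server where possible, so that content is cleaned before it reaches any browser and the step cannot be skipped or tampered with on the client.
Maintain Original Case for SVG: SVG tag and attribute names such as linearGradient are case-sensitive, so disable the parser's lowercasing: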

const clean = sanitizeHtml(dirty, {
  allowedTags: ['svg', 'g', 'defs', 'linearGradient', 'stop', 'circle'],
  parser: {
    lowerCaseTags: false,
    lowerCaseAttributeNames: false
  }
});

Support for CSS Styles:

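const clean = sanitizeHtml(dirty, {
  // The style attribute must be allowed for allowedStyles to take effect;
  // this assumes the default allowed attributes are otherwise acceptable.
  allowedAttributes: {
    ...sanitizeHtml.defaults.allowedAttributes,
    '*': ['style']
  },
  allowedStyles: {
    '*': {
      'color': [/^#[0-9a-fA-F]{3,6}$/], // Allow only hex colors
      'text-align': [/^left$/, /^right$/, /^center$/]
    },
    'p': {
      'font-size': [/^\d+rem$/] // Whole-number rem sizes on <p> only
    }
  }
});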
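
Since the style allowlist is only as strict as its regular expressions, it can help to try the patterns against sample values before relying on them. The sketch below reuses the color and font-size patterns from the example above; matchesAny is a hypothetical helper written for this illustration and is not part of sanitize-html.

```typescript
// The same patterns as in the allowedStyles example above.
const colorPatterns = [/^#[0-9a-fA-F]{3,6}$/];
const fontSizePatterns = [/^\d+rem$/];

// Hypothetical helper: true if any pattern matches the CSS value.
function matchesAny(value: string, patterns: RegExp[]): boolean {
  return patterns.some((pattern) => pattern.test(value));
}

console.log(matchesAny("#fff", colorPatterns));      // true
console.log(matchesAny("#1a2b3c", colorPatterns));   // true
console.log(matchesAny("#12345", colorPatterns));    // true: five digits pass, although that is not a valid CSS color
console.log(matchesAny("red", colorPatterns));       // false: named colors are not covered
console.log(matchesAny("2rem", fontSizePatterns));   // true
console.log(matchesAny("1.5rem", fontSizePatterns)); // false: \d+ does not match a decimal point
```

If fractional rem values or named colors should be accepted, the patterns would need to be widened accordingly.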

Practical Implementation: Wisp CMS Blog Post Content

Wisp CMS offers a seamless headless CMS experience, providing tools to manage and deliver content securely. Let's explore how sanitization is implemented within the Wisp CMS blog starter kit.

Sanitization in Wisp CMS

The BlogPostContent.tsx component is responsible for displaying sanitized blog post content. It utilizes sanitize-html to ensure that the content is safe before rendering it.

Code Example

import sanitize, { defaults } from "sanitize-html";

export const PostContent = ({ content }: { content: string }) => {
  const sanitizedContent = sanitize(content, {
    allowedTags: [
      "b", "i", "em", "strong", "a", "img", "h1", "h2", "h3", "code", "pre", 
      "p", "li", "ul", "ol", "blockquote", "td", "th", "table", "tr", "tbody", 
      "thead", "tfoot", "small", "div", "iframe"
    ],
    allowedAttributes: {
      ...defaults.allowedAttributes,
      "*": ["style"],
      iframe: ["src", "allowfullscreen", "style"],
    },
    allowedIframeHostnames: ["www.youtube.com", "www.youtube-nocookie.com"],
  });

  return (
    <div
      className="blog-content mx-auto"
      dangerouslySetInnerHTML={{ __html: sanitizedContent }}
    ></div>
  );
};

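This code allows only a defined set of HTML tags and attributes, which reduces security risks, and it restricts iframes to the two YouTube hostnames listed in allowedIframeHostnames. React's dangerouslySetInnerHTML inserts a string as raw HTML without escaping it, so it should only ever receive sanitized output, as it does here. Note that "*": ["style"] permits inline styles on every allowed tag; since no allowedStyles option is set, those styles are not restricted to specific properties or values.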
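
The hostname restriction on iframes can be pictured as a simple allowlist check. The sketch below is an illustration of that idea only: isAllowedIframeSrc is a hypothetical helper, and this is not a description of how sanitize-html implements the option internally.

```typescript
// Same hostnames as in the component's allowedIframeHostnames option.
const allowedIframeHostnames = ["www.youtube.com", "www.youtube-nocookie.com"];

// Hypothetical helper: accept a src only if its exact hostname is on the list.
function isAllowedIframeSrc(src: string): boolean {
  try {
    const url = new URL(src);
    return allowedIframeHostnames.indexOf(url.hostname) !== -1;
  } catch (_err) {
    return false; // not an absolute URL, so there is no hostname to check
  }
}

console.log(isAllowedIframeSrc("https://www.youtube.com/embed/abc123"));              // true
console.log(isAllowedIframeSrc("https://www.youtube.com.evil.example/embed/abc123")); // false
console.log(isAllowedIframeSrc("/embed/abc123"));                                     // false
```

Comparing the parsed hostname exactly, rather than checking whether the URL string contains "youtube.com", is what keeps lookalike domains out.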

Benefits of Sanitizing HTML Responses

Improved Security: Sanitization greatly reduces exposure to XSS and related injection vulnerabilities.
Content Integrity: Ensures safe and accurate content delivery.
User Trust: Maintains and enhances user trust by ensuring secure content.

Conclusion

Sanitizing HTML responses from a CMS is crucial to secure content delivery and protect users from malicious threats. Using tools like sanitize-html makes this task efficient and reliable.

Next Steps

Try the Blog Starter Kit: Discover the benefits by trying out our Wisp CMS blog starter kit.
Explore Wisp CMS: Consider using Wisp as a headless CMS for your website’s content needs. Learn more at our Wisp blog.