Should I Sanitize HTML Response from a CMS?

Imagine visiting a popular blog and suddenly seeing an alert pop-up or being redirected to a suspicious site. This is often a result of unsanitized HTML content. Sanitizing HTML responses mitigates security vulnerabilities, ensuring a safe and trustworthy user experience.

Understanding HTML Sanitization

HTML sanitization is the process of removing or neutralizing any markup that falls outside an allowlist of safe tags and attributes. This is crucial because HTML content can carry scripts, event handlers, and other elements that pose security risks.

Common Threats

  • XSS (Cross-Site Scripting): Malicious scripts injected into webpages. Attackers can exploit unsanitized content to execute malicious scripts in the user's browser.

  • Code Injection: Harmful code that can affect users’ browsers and systems. This can lead to unauthorized data access or further malicious attacks.
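To make these threats concrete, the sketch below shows a typical payload as it might arrive in a CMS field, and how escaping turns it into inert text. The `escapeHtml` helper is a hypothetical illustration written for this example, not part of any library. Escaping everything is the bluntest defense and throws away all formatting, which is why an allowlist-based sanitizer is usually the better fit for rich content.

```javascript
// Hypothetical helper: escapes the five characters that are significant in HTML.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// A field value as an attacker might submit it through a CMS form.
const payload = '<img src=x onerror="alert(1)">';

// Once escaped, the browser displays the markup as text instead of executing it.
const inert = escapeHtml(payload);
console.log(inert); // &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```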

The Role of HTML Sanitization in CMS

Content Management Systems (CMSs) help users create and manage content easily. However, unscreened HTML from CMSs can introduce XSS and other security risks.

When a CMS delivers content, it can include user-generated content that may not always be trustworthy. Hence, sanitizing HTML responses from a CMS is vital to ensure content safety and to protect end-users from potential security threats.

The sanitize-html Library

The sanitize-html library is an npm package that cleans HTML against a configurable allowlist of tags and attributes, so that only safe and intended content is displayed.

Key Features and Benefits

  1. Customizable Tags and Attributes: Control over which HTML tags and attributes are permitted.

  2. Filtering Specific Tags: Allows elimination or transformation of unwanted tags.

  3. Handling CSS: Specifies which CSS properties and values are allowed.

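  4. Disallowed Tags Management: Controls what happens to non-permitted tags. By default the tag is dropped and its text is generally kept, but it can instead be discarded along with its contents or escaped as visible text.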

Installation

  • npm:

    npm install sanitize-html

  • Yarn:

    yarn add sanitize-html

Usage Examples

In the Browser

// Requires a bundler that can resolve the package import
import sanitizeHtml from 'sanitize-html';

const html = "<strong>hello world</strong>";
console.log(sanitizeHtml(html)); // Output: <strong>hello world</strong>

In Node.js

// ES modules
import sanitizeHtml from 'sanitize-html';

// CommonJS alternative (use one style or the other, not both in the same file)
// const sanitizeHtml = require('sanitize-html');

const dirty = 'some really tacky HTML';
const clean = sanitizeHtml(dirty);
console.log(clean); // Sanitized HTML content

Filtering Specific Tags

// Extend the default allowlist so that <img> tags are kept
const clean = sanitizeHtml(dirty, {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat(['img'])
});

Disallowed Tags Management

To discard disallowed tags together with their contents:

const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'completelyDiscard'
});

To escape disallowed tags:

const clean = sanitizeHtml(dirty, {
  disallowedTagsMode: 'escape'
});
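
The difference between the modes is easiest to see side by side. The sketch below is a toy model in plain JavaScript that handles a single disallowed tag name with string operations; the `applyMode` function is hypothetical and exists only to show the expected shape of the output for `discard` (the library's default), `completelyDiscard`, and `escape`. sanitize-html itself works on parsed HTML, and real input should never be sanitized with string matching like this.

```javascript
// Toy model of three disallowedTagsMode values for ONE disallowed tag name.
// Illustration only: this is not how sanitize-html is implemented.
function applyMode(html, tag, mode) {
  const open = `<${tag}>`;
  const close = `</${tag}>`;
  switch (mode) {
    case 'discard': // drop the tag itself, keep the text inside it
      return html.split(open).join('').split(close).join('');
    case 'completelyDiscard': // drop the tag together with its content
      return html.replace(new RegExp(`${open}[\\s\\S]*?${close}`, 'g'), '');
    case 'escape': // keep the tag, but as visible text rather than markup
      return html
        .split(open).join(`&lt;${tag}&gt;`)
        .split(close).join(`&lt;/${tag}&gt;`);
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

const dirty = '<p>Hi <blink>there</blink></p>';
console.log(applyMode(dirty, 'blink', 'discard'));           // <p>Hi there</p>
console.log(applyMode(dirty, 'blink', 'completelyDiscard')); // <p>Hi </p>
console.log(applyMode(dirty, 'blink', 'escape'));            // <p>Hi &lt;blink&gt;there&lt;/blink&gt;</p>
```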

Best Practices

  1. Server-Side Sanitization: Sanitize on the server, so that untrusted markup is cleaned before it reaches the browser and the step cannot be skipped by a modified client.

  2. Maintain Original Case for SVG:

const clean = sanitizeHtml(dirty, {
  allowedTags: ['svg', 'g', 'defs', 'linearGradient', 'stop', 'circle'],
  parser: {
    lowerCaseTags: false,
    lowerCaseAttributeNames: false
  }
});

  3. Support for CSS Styles:

const clean = sanitizeHtml(dirty, {
  allowedStyles: {
    '*': {
      'color': [/^#[0-9a-fA-F]{3,6}$/], // Allow only hex colors
      'text-align': [/^left$/, /^right$/, /^center$/]
    },
    'p': {
      'font-size': [/^\d+rem$/]
    }
  }
});
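
Each entry in allowedStyles is a list of regular expressions, and a declaration is kept only when its value matches at least one of them. The sketch below runs the patterns from the example above against a few sample values in plain JavaScript, without the library, to show what they accept. The `matchesAny` helper and the sample values are made up for illustration.

```javascript
// Patterns copied from the allowedStyles example above.
const colorPatterns = [/^#[0-9a-fA-F]{3,6}$/];
const textAlignPatterns = [/^left$/, /^right$/, /^center$/];
const fontSizePatterns = [/^\d+rem$/];

// Hypothetical helper: true if the value matches at least one pattern.
const matchesAny = (patterns, value) => patterns.some((re) => re.test(value));

console.log(matchesAny(colorPatterns, '#ff0000'));     // true
console.log(matchesAny(colorPatterns, 'red'));         // false: named colors are rejected
console.log(matchesAny(colorPatterns, '#ff00'));       // true: {3,6} also accepts 4 and 5 digits
console.log(matchesAny(textAlignPatterns, 'justify')); // false
console.log(matchesAny(fontSizePatterns, '2rem'));     // true
console.log(matchesAny(fontSizePatterns, '1.5rem'));   // false: \d+ does not allow decimals
```

Checking patterns this way before shipping a configuration can reveal values that are rejected unintentionally, such as fractional rem sizes, or accepted more loosely than intended.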

Practical Implementation: Wisp CMS Blog Post Content

Wisp CMS offers a seamless headless CMS experience, providing tools to manage and deliver content securely. Let's explore how sanitization is implemented within the Wisp CMS blog starter kit.

Sanitization in Wisp CMS

The BlogPostContent.tsx component is responsible for displaying sanitized blog post content. It utilizes sanitize-html to ensure that the content is safe before rendering it.

Code Example
import sanitize, { defaults } from "sanitize-html";

export const PostContent = ({ content }: { content: string }) => {
  const sanitizedContent = sanitize(content, {
    allowedTags: [
      "b", "i", "em", "strong", "a", "img", "h1", "h2", "h3", "code", "pre", 
      "p", "li", "ul", "ol", "blockquote", "td", "th", "table", "tr", "tbody", 
      "thead", "tfoot", "small", "div", "iframe"
    ],
    allowedAttributes: {
      ...defaults.allowedAttributes,
      "*": ["style"],
      iframe: ["src", "allowfullscreen", "style"],
    },
    allowedIframeHostnames: ["www.youtube.com", "www.youtube-nocookie.com"],
  });

  return (
    <div
      className="blog-content mx-auto"
      dangerouslySetInnerHTML={{ __html: sanitizedContent }}
    ></div>
  );
};

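This configuration permits only an explicit set of HTML tags and attributes and restricts iframes to YouTube hostnames, which reduces the attack surface. Because dangerouslySetInnerHTML inserts markup without any escaping, it should only ever receive the sanitized string, as it does here. Note that the style attribute is allowed on every tag without an allowedStyles filter, so inline CSS is not restricted by this configuration.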
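
The allowedIframeHostnames option in the component above limits which sites an iframe may embed. The idea behind hostname allowlisting can be sketched in plain JavaScript with the standard `URL` class. The `isAllowedIframeSrc` function below is a hypothetical illustration, not sanitize-html's actual implementation, and its requirement that the URL use https is an extra assumption of this sketch.

```javascript
// Hostnames taken from the component above.
const allowedIframeHostnames = ['www.youtube.com', 'www.youtube-nocookie.com'];

// Hypothetical check: parse the src and compare its hostname exactly.
function isAllowedIframeSrc(src) {
  try {
    const { protocol, hostname } = new URL(src);
    // Requiring https is an assumption of this sketch.
    return protocol === 'https:' && allowedIframeHostnames.includes(hostname);
  } catch {
    return false; // relative or malformed URLs are rejected in this sketch
  }
}

console.log(isAllowedIframeSrc('https://www.youtube.com/embed/abc123'));       // true
console.log(isAllowedIframeSrc('https://www.youtube.com.evil.example/embed')); // false
console.log(isAllowedIframeSrc('javascript:alert(1)'));                        // false
```

Comparing the parsed hostname exactly, rather than searching the raw string for "youtube.com", is what keeps lookalike domains such as the second example out.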

Benefits of Sanitizing HTML Responses

  1. Improved Security: Sanitization greatly reduces the risk of XSS and related injection vulnerabilities.

  2. Content Integrity: Ensures safe and accurate content delivery.

  3. User Trust: Maintains and enhances user trust by ensuring secure content.

Conclusion

Sanitizing HTML responses from a CMS is crucial to secure content delivery and protect users from malicious threats. Using tools like sanitize-html makes this task efficient and reliable.

Next Steps

  • Try the Blog Starter Kit: Discover the benefits by trying out our Wisp CMS blog starter kit.

  • Explore Wisp CMS: Consider using Wisp as a headless CMS for your website’s content needs. Learn more at our Wisp blog.

Raymond Yeh

Published on 01 November 2024
