`,
expected: "Clean text",
},
{
name: "Extracts links correctly",
input: `Visit my <a href="https://example.com">website</a> for info.`,
expected: "Visit my [website](https://example.com) for info.",
},
{
name: "Converts headers (H1, H2, H3)",
input: `
<h1>Main Title</h1>
<h2>Subtitle</h2>
<h3>Section</h3>
`,
expected: "# Main Title\n\n## Subtitle\n\n### Section",
},
{
name: "Handles bold and italics",
input: `Text <b>bold</b> and <strong>strong</strong>, then <i>italic</i> and <em>em</em>.`,
expected: "Text **bold** and **strong**, then *italic* and *em*.",
},
{
name: "Converts lists",
input: `
<ul>
<li>First element</li>
<li>Second element</li>
</ul>
`,
expected: "- First element\n- Second element",
},
{
name: "Handles paragraphs and line breaks (<br>)",
input: `
<p>First paragraph</p>
<p>Second paragraph with<br>a line break.</p>
`,
expected: "First paragraph\n\nSecond paragraph with\na line break.",
},
{
name: "Decodes HTML entities",
input: `Math: 5 &gt; 3 &amp; 2 &lt; 4. A &quot;quote&quot;.`,
expected: "Math: 5 > 3 & 2 < 4. A \"quote\".",
},
{
name: "Cleans up residual HTML tags",
input: `
<div>Text inside div <span>and span</span></div>
`,
expected: "Text inside div and span",
},
{
name: "Removes multiple spaces and excessive empty lines",
input: `This text   has    too many     spaces.



And too many newlines.`,
expected: "This text has too many spaces.\n\nAnd too many newlines.",
},
{
name: "Nested lists with indentation",
input: "<ul><li>One<ul><li>Two</li></ul></li></ul>",
// Expect the sub-element to have 4 spaces of indentation
expected: "- One\n    - Two",
},
{
name: "Image support",
input: ``,
// Correct Markdown syntax for images
expected: "",
},
{
name: "Image support without alt-text",
input: ``,
// If alt is missing, square brackets remain empty
expected: "",
},
{
name: "XSS Bypass on Links (Obfuscated HTML entities)",
// The Go HTML parser resolves entities, so this becomes "javascript:alert(1)"
input: `Click here`,
// Our isSafeHref (if updated with net/url) should neutralize it to "#"
expected: "[Click here](#)",
},
{
name: "Empty link or used as anchor",
input: ``,
// With no text or href, it shouldn't print anything (not even empty brackets)
expected: "",
},
{
name: "Link without href but with text (Textual anchor)",
input: `Back to top`,
// Should extract only plain text, without generating a broken Markdown link like [Back to top](#) or [Back to top]()
expected: "Back to top",
},
{
name: "Badly spaced bold and italics (Edge Case)",
input: ` Text `,
// CommonMark parsers do not render `** Text **` as bold because the delimiters touch whitespace. The output should be `**Text**`.
expected: "**Text**",
},
{
name: "Complex Test - Real Article",
input: `
`,
// Note: the indentation of the realistic HTML input produces extra whitespace
// that the regex cleanup is expected to remove.
expected: "# Article Title\n\nThis is an **introductory text** with a [link](http://link.com).\n\n## Subtitle\n\n- Point one\n- Point two",
},
{
name: "Ordered list (OL)",
input: `
<ol>
<li>First</li>
<li>Second</li>
<li>Third</li>
</ol>
`,
expected: "1. First\n2. Second\n3. Third",
},
{
name: "Ordered list nested in unordered list",
input: `
`,
expected: "Use the command `go test ./...` to run the tests.",
},
{
name: "Simple blockquote",
input: `
<blockquote>An important quote.</blockquote>
`,
expected: "> An important quote.",
},
{
name: "Multiline blockquote",
input: `
<blockquote>
<p>First line of the quote.</p>
<p>Second line of the quote.</p>
</blockquote>
`,
expected: "> First line of the quote.\n>\n> Second line of the quote.",
},
{
name: "Strikethrough text (del/s)",
input: `This text is <del>deleted</del> and this is <s>crossed out</s>.`,
expected: "This text is ~~deleted~~ and this is ~~crossed out~~.",
},
{
name: "Horizontal separator (HR)",
input: `
<p>Above the line</p>
<hr>
<p>Below the line</p>
`,
expected: "Above the line\n\n---\n\nBelow the line",
},
{
name: "Bold nested in link",
input: `<a href="https://example.com"><strong>Linked bold text</strong></a>`,
expected: "[**Linked bold text**](https://example.com)",
},
{
name: "data-src Image (lazy loading)",
input: ``,
expected: "",
},
{
name: "Image with javascript: src blocked",
input: ``,
// src is not safe, so the image is not emitted
expected: "",
},
{
name: "Link with data: href blocked",
input: `Click`,
expected: "[Click](#)",
},
{
name: "Deeply nested divs",
input: `
<div><div><div><p><strong>Important:</strong> read the <strong><em>critical instructions</em></strong> <em>carefully</em>.</p></div></div></div>
`,
expected: "**Important:** read the ***critical instructions*** *carefully*.",
},
{
name: "Article with nav and aside sections (noise to filter)",
input: `