Which browser-side libraries can parse HTML and return the raw text value of URL attributes?

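In the browser, if you need to parse HTML and read the raw, un-normalized values of URL attributes (such as href, src, and action), the following libraries and approaches can help: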

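1. `htmlparser2` (pure parsing, no DOM normalization)

Use case: lightweight, fast HTML parsing where relative URLs must not be resolved into absolute ones by the browser.
Install: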

npm install htmlparser2

Example:

import { Parser } from 'htmlparser2';

const html = `
    <a href="/path?query=1">Link</a>
    <img src="image.png">
`;

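// Attributes that carry URLs; extend the list as needed.
const urlAttrs = ['href', 'src', 'action'];
const attributes = [];
const parser = new Parser({
    // attrs maps each attribute name to its value, without any URL resolution
    onopentag(name, attrs) {
        for (const attr of urlAttrs) {
            // Use `in` so that empty values such as href="" are still reported
            if (attr in attrs) attributes.push({ tag: name, attr, value: attrs[attr] });
        }
    }
});

parser.write(html);
parser.end();

console.log(attributes); // prints the raw attribute values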

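Advantages:

- Does not depend on the browser DOM; it parses the raw text directly.
- Preserves the original values (for example relative paths and unencoded query parameters).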
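
To make "preserving the original value" concrete, the sketch below shows what URL resolution would otherwise do to such values. It uses the standard `URL` constructor, which performs the same kind of resolution that the `el.href` and `el.src` DOM properties apply. The base address `https://example.com/docs/page.html` and the sample values are assumptions chosen for illustration only.

```javascript
// Hypothetical page URL, used only to show what resolution against a base does.
const base = 'https://example.com/docs/page.html';

// Raw values as they might appear in the markup.
const rawValues = ['/path?query=1', 'image.png', '../a b.png'];

// Pair each raw value with its resolved, normalized form.
const comparison = rawValues.map(raw => ({ raw, normalized: new URL(raw, base).href }));

console.log(comparison);
// '/path?query=1' -> 'https://example.com/path?query=1'
// 'image.png'     -> 'https://example.com/docs/image.png'
// '../a b.png'    -> 'https://example.com/a%20b.png'
```

A parser that never builds a live DOM hands back the left-hand values, which is what you want when the exact text in the markup matters.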

2. `cheerio` (jQuery-like DOM manipulation, originally built for the server)

Use case: you want a jQuery-like API while still avoiding URL normalization.
Install:

npm install cheerio

Example:

import * as cheerio from 'cheerio';

const html = `
    <a href="/path?query=1">Link</a>
    <img src="image.png">
`;

const $ = cheerio.load(html);
const elements = $('[href], [src]').map((_, el) => {
    const $el = $(el);
    // Compare against undefined so that empty values such as href="" still count
    const attr = $el.attr('href') !== undefined ? 'href' : 'src';
    return { tag: el.tagName, attr, value: $el.attr(attr) };
}).get();

console.log(elements); // prints the raw attribute values

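Advantages:

- jQuery-compatible API, easy to pick up.
- `.attr()` returns the attribute value without resolving it against a base URL (that only happens if you call new URL() yourself).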
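
When an absolute URL is needed after all, the raw value can be resolved explicitly while still keeping the original text. The following is a minimal sketch of that idea; the helper name `withResolved`, the sample records, and the base URL are all hypothetical and not part of cheerio.

```javascript
// Records in the { tag, attr, value } shape used in the examples; the data is made up.
const records = [
    { tag: 'a', attr: 'href', value: '/path?query=1' },
    { tag: 'img', attr: 'src', value: 'image.png' },
    { tag: 'a', attr: 'href', value: 'https://exa mple.com/' } // invalid host
];

// Adds a `resolved` field next to the raw value; null when the URL cannot be parsed.
function withResolved(items, baseUrl) {
    return items.map(item => {
        let resolved = null;
        try {
            resolved = new URL(item.value, baseUrl).href;
        } catch {
            // new URL() throws a TypeError for unparseable input; keep null
        }
        return { ...item, resolved };
    });
}

const resolvedRecords = withResolved(records, 'https://example.com/docs/page.html');
console.log(resolvedRecords);
```

Keeping both fields lets later code decide per use whether it needs the text as written or the absolute address.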

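3. Native approach: regular expressions + `DOMParser`

If you would rather not add a third-party library, you can combine a regular expression with DOMParser (see method 2 in the earlier answer):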

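function getRawAttributes(html) {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    const elements = [...doc.querySelectorAll('[href], [src]')];

    return elements.map(el => {
        const attrName = el.hasAttribute('href') ? 'href' : 'src';
        // outerHTML is re-serialized by the browser, so values are always double-quoted
        // and & or " inside them appear as &amp; or &quot;.
        // (?:^|\s) prevents attributes such as data-href from matching.
        const pattern = new RegExp(`(?:^|\\s)${attrName}="([^"]*)"`, 'i');
        const rawValue = el.outerHTML.match(pattern)?.[1];
        // Alternative: el.getAttribute(attrName) yields the same value unescaped and,
        // unlike the el.href / el.src properties, does not resolve it against a base URL.

        return { tag: el.tagName, attr: attrName, value: rawValue };
    });
}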

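Summary of key differences

Approach	Normalizes URLs?	Environment	Best for
`htmlparser2`	❌ No, keeps the raw value	Node.js / browser	Low-level parsing where performance matters most
`cheerio`	❌ No, keeps the raw value	Node.js / browser	jQuery-like API with HTML5-compliant parsing
`DOMParser` + regex	❌ No (needs extra handling)	Browser only	Dependency-free native solution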

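Notes

Encoding:
- Depending on the library version and options (for example htmlparser2's `decodeEntities`), the values returned by htmlparser2/cheerio may still contain HTML entities (such as `&amp;` in place of `&`); if they do, decode them manually: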
```
const decodedValue = new DOMParser().parseFromString(value, 'text/html').body.textContent;
```
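
DOMParser is not available outside the browser (for example in plain Node.js). As a rough fallback for that situation, the sketch below decodes only numeric character references and five common named entities. The function name `decodeBasicEntities` is hypothetical, and this is deliberately incomplete compared with a real HTML entity decoder.

```javascript
// Only a handful of named entities are covered; this is an intentional simplification.
const NAMED_ENTITIES = new Map([
    ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"]
]);

function decodeBasicEntities(text) {
    // A single pass avoids double decoding: "&amp;lt;" becomes "&lt;", not "<".
    return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
        if (body[0] === '#') {
            const isHex = body[1] === 'x' || body[1] === 'X';
            const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
            // Leave out-of-range references untouched instead of throwing
            return code <= 0x10ffff ? String.fromCodePoint(code) : match;
        }
        // Unknown names are returned unchanged
        return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
    });
}

console.log(decodeBasicEntities('/path?a=1&amp;b=2')); // prints "/path?a=1&b=2"
```
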
XSS protection:
- If you parse user-supplied HTML, sanitize the results before using them (for example with DOMPurify).
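
Raw URL values deserve a check of their own, because an href such as `javascript:alert(1)` survives parsing unchanged. The sketch below is one possible allowlist check under stated assumptions: the helper name `isAllowedUrl`, the base URL, and the choice to allow only http and https are illustrative, and it is meant to complement a sanitizer such as DOMPurify, not to replace it.

```javascript
// Assumption: only http(s) links are acceptable; adjust the set for your application.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Returns true when the raw value resolves to a URL with an allowed protocol.
function isAllowedUrl(rawValue, baseUrl) {
    try {
        // The URL parser trims surrounding whitespace and lowercases the scheme,
        // so variants like "  JavaScript:..." are caught as well.
        return ALLOWED_PROTOCOLS.has(new URL(rawValue, baseUrl).protocol);
    } catch {
        return false; // unparseable values are rejected
    }
}

const pageUrl = 'https://example.com/docs/page.html'; // hypothetical base
console.log(isAllowedUrl('/path?query=1', pageUrl));        // true
console.log(isAllowedUrl('javascript:alert(1)', pageUrl));  // false
```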