How a JavaScript Obfuscator Works

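I spent some time studying JavaScript obfuscation. The existing tools are either too heavyweight or too complicated to configure, so I decided to build my own and write up the core techniques along the way.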

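What is obfuscation, really?

Obfuscation ≠ encryption. Encrypted data needs a key to recover; obfuscation only makes code hard to read and analyze. Good obfuscation should:

Raise the cost of reverse engineering: cracking the code should take far longer than rewriting it
Preserve behavior: the obfuscated code must behave exactly like the original
Keep size growth in check: the output must not balloon in size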

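Core technique 1: identifier renaming

The most basic technique, and among the most effective. Replace meaningful names with meaningless ones:

// Original code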
function calculateTotal(price, quantity) {
  const discount = 0.9;
  return price * quantity * discount;
}

// After obfuscation
function _0x1a2b(_0x3c4d, _0x5e6f) {
  const _0x7a8b = 0.9;
  return _0x3c4d * _0x5e6f * _0x7a8b;
}

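Four naming strategies

Strategy	Example	Characteristics
hexadecimal	`_0x1a2b`	Most common; unlikely to collide
mangled	`a, b, c`	Shortest, but may collide with short names already in use
mangled-shuffled	`x, m, k`	Short names in shuffled letter order
dictionary	`apple, banana`	Names drawn from a custom dictionary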
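
As a rough illustration of how the hexadecimal and mangled strategies could produce names from a counter, here is a minimal sketch. The helper names `hexName` and `mangledName` and the 4-digit padding are assumptions made for this example, not taken from any particular tool.

```javascript
// Hypothetical helpers, for illustration only.

// hexadecimal strategy: "_0x" followed by the counter in hex, padded to 4 digits
function hexName(index) {
  return '_0x' + index.toString(16).padStart(4, '0');
}

// mangled strategy: a, b, ..., z, aa, ab, ... (bijective base 26)
function mangledName(index) {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  let name = '';
  let n = index + 1;
  while (n > 0) {
    n -= 1;
    name = letters[n % 26] + name;
    n = Math.floor(n / 26);
  }
  return name;
}

console.log(hexName(26));     // _0x001a
console.log(mangledName(0));  // a
console.log(mangledName(25)); // z
console.log(mangledName(26)); // aa
```

A real generator would also need to skip reserved words (the mangled sequence eventually produces `do`, `if` and `in`) and any name already visible in the scope being renamed.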

Implementation note: keep a mapping table so that the same identifier gets the same obfuscated name everywhere it appears:

class IdentifierRenamer {
  private map = new Map<string, string>()
  private counter = 0
  
  rename(original: string): string {
    if (this.map.has(original)) {
      return this.map.get(original)!
    }
    const newName = `_0x${this.counter.toString(16)}`
    this.map.set(original, newName)
    this.counter++
    return newName
  }
}

Core technique 2: string array

Move all string literals into one array and access them by index:

// Original code
const message = "Hello, World!";
console.log("User: " + username);
alert("Operation completed");

// After obfuscation
var _0x4f3a = ["Hello, World!", "User: ", "Operation completed"];
const message = _0x4f3a[0];
console.log(_0x4f3a[1] + username);
alert(_0x4f3a[2]);
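
To make the transformation concrete, here is a toy extractor. It is only a sketch: it uses a regular expression and handles nothing but double-quoted literals without escapes, whereas a real obfuscator would work on the AST. The function name `extractStrings` is made up for this example.

```javascript
// Toy extractor: handles only double-quoted literals without escapes.
// A real obfuscator would walk the AST instead of using a regular expression.
function extractStrings(source, arrayName = '_0x4f3a') {
  const table = [];
  const body = source.replace(/"([^"\\\n]*)"/g, (_match, text) => {
    let index = table.indexOf(text);
    if (index === -1) {
      index = table.push(text) - 1; // one slot per distinct string
    }
    return `${arrayName}[${index}]`;
  });
  // Prepend the array declaration to the rewritten source
  return `var ${arrayName} = ${JSON.stringify(table)};\n${body}`;
}

const input = 'function greet(name) { return "Hello, " + name + "!" + "Hello, "; }';
const output = extractStrings(input);
console.log(output);
```

Repeated literals share one slot in the array, which is also why this technique does not necessarily add much to the output size.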

Encoding and encryption

The string array can be hardened further by encoding it (Base64) or encrypting it (RC4):

// Base64 encoding
var _0x4f3a = ["SGVsbG8sIFdvcmxkIQ==", "VXNlcjog", "T3BlcmF0aW9uIGNvbXBsZXRlZA=="];
function _0xdecode(idx) {
  return atob(_0x4f3a[idx]);
}

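// RC4 encryption (harder to reverse)
var _0x4f3a = ["\x1a\x2b\x3c...", "\x4d\x5e\x6f...", ...];
function _0xrc4(idx, key) {
  // RC4 decryption logic
}

Performance: RC4 is about 30-35% slower than Base64, but the original strings are harder to recover. Since the key ships with the code, this raises the effort needed rather than providing real secrecy.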
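
For reference, the RC4 step left as a placeholder above could look roughly like the following. This is the standard RC4 algorithm applied to character codes; the function name `rc4` and the sample key are assumptions for the example, and RC4 is used here purely to hide strings, not as a secure cipher.

```javascript
// Standard RC4 over character codes. The same function encrypts and decrypts.
function rc4(key, input) {
  // Key-scheduling: permute 0..255 according to the key
  const s = Array.from({ length: 256 }, (_, i) => i);
  let j = 0;
  for (let i = 0; i < 256; i++) {
    j = (j + s[i] + key.charCodeAt(i % key.length)) & 255;
    [s[i], s[j]] = [s[j], s[i]];
  }
  // Keystream generation, XORed with each input character
  let i = 0;
  j = 0;
  let out = '';
  for (let k = 0; k < input.length; k++) {
    i = (i + 1) & 255;
    j = (j + s[i]) & 255;
    [s[i], s[j]] = [s[j], s[i]];
    out += String.fromCharCode(input.charCodeAt(k) ^ s[(s[i] + s[j]) & 255]);
  }
  return out;
}

const key = 'k3y'; // illustrative key
const encrypted = rc4(key, 'Hello, World!');
console.log(rc4(key, encrypted)); // Hello, World!
```

In obfuscated output, the encrypted strings would sit in the array and a decoder like this would run on each lookup, which is where the extra runtime cost comes from.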

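Core technique 3: control flow flattening

The most complex technique, and one of the most effective. It turns straight-line code into switch-case dispatch:

// Original code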
function validate(user) {
  if (!user) return false;
  if (!user.email) return false;
  if (!user.password) return false;
  return true;
}

// After obfuscation
function validate(user) {
  var _0xflow = "3|1|4|2|0".split("|");
  var _0xidx = 0;
  while (true) {
    switch (_0xflow[_0xidx++]) {
      case "0":
        return true;
      case "1":
        if (!user) return false;
        break;
      case "2":
        if (!user.password) return false;
        break;
      case "3":
        continue;
      case "4":
        if (!user.email) return false;
        break;
    }
    if (_0xidx >= _0xflow.length) break;
  }
}

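How it works: the code blocks are shuffled, and a state machine drives the order of execution. That makes the logic hard to follow when reverse engineering.

Performance impact: this technique costs the most at runtime; the code can run up to 1.5x slower. Set a threshold (such as 0.75) so that only a portion of the code blocks are flattened.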
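
Here is a small sketch of how such a dispatcher could be generated. It is a toy: it takes straight-line statements as source strings and emits the flattened body as text, where a real implementation would transform AST nodes. The name `flattenStatements` and the `_flow`/`_i` variables are made up for this example, and the statements are assumed to share state through `var`, because `let` or `const` declared inside the switch would be re-created on every loop iteration.

```javascript
// Toy flattener for straight-line statements given as source strings.
function flattenStatements(statements, random = Math.random) {
  // labels[k] is the case label given to statement k (a random permutation)
  const labels = statements.map((_, k) => k);
  for (let i = labels.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // Fisher-Yates shuffle
    [labels[i], labels[j]] = [labels[j], labels[i]];
  }
  // Listing the labels in statement order encodes the real execution order
  const order = labels.join('|');
  // Emitting the cases sorted by label scrambles the textual order of the statements
  const cases = statements
    .map((code, k) => ({ label: labels[k], code }))
    .sort((a, b) => a.label - b.label)
    .map(({ label, code }) => `    case "${label}": ${code} continue;`)
    .join('\n');
  return [
    `var _flow = "${order}".split("|"), _i = 0;`,
    'while (true) {',
    '  switch (_flow[_i++]) {',
    cases,
    '  }',
    '  break; // reached only when the order string is exhausted',
    '}',
  ].join('\n');
}

const body = flattenStatements(['var a = 2;', 'var b = a * 3;', 'return a + b;']);
console.log(body);
console.log(new Function(body)()); // 8
```

Every statement now costs a string comparison and a loop iteration, which gives a feel for why this technique is the most expensive at runtime.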

Core technique 4: dead code injection

Mix junk code that never runs into the real code:

// Original code
function add(a, b) {
  return a + b;
}

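// After dead code injection
function add(a, b) {
  // _0xdead is never defined, so this condition is always false and the branch never runs
  if (typeof _0xdead !== 'undefined') {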
    var _0xfake1 = Math.random() * 1000;
    var _0xfake2 = "never executed";
    console.log(_0xfake1 + _0xfake2);
  }
  return a + b;
}

Cost: code size can grow by up to 200%, so weigh size against protection.
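
A guard that is always false but not obviously so is often called an opaque predicate. Below is a minimal sketch of wrapping junk code in one; the helper `injectDeadCode`, the `_n` variable and the seed value are assumptions for illustration only.

```javascript
// Hypothetical helper: wraps junk statements in a branch guarded by an opaque predicate.
// _n * (_n + 1) is a product of two consecutive integers, so it is always even
// and the condition below is always false for any integer seed.
function injectDeadCode(realBody, junkBody, seed = 7) {
  return [
    `var _n = ${seed};`,
    'if ((_n * (_n + 1)) % 2 === 1) {',
    `  ${junkBody}`,
    '}',
    realBody,
  ].join('\n');
}

const add = new Function('a', 'b', injectDeadCode('return a + b;', 'return "never executed";'));
console.log(add(2, 3)); // 5
```

The junk branch still has to be parsed and shipped, which is the source of the size increase.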

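Core technique 5: debug protection

Make it harder to debug the code in the developer tools:

1. Blocking the debugger

// With DevTools open, the debugger statement pauses execution, so the elapsed time jumps
setInterval(function() {
  var start = Date.now();
  debugger;
  if (Date.now() - start > 100) {
    location.reload(); // or enter an infinite loop
  }
}, 4000);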

2. Disabling console

(function() {
  var _console = console;
  ['log', 'info', 'warn', 'error', 'debug'].forEach(function(method) {
    _console[method] = function() {};
  });
})();

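3. Self-defending

// Breaks once the code has been reformatted
(function() {
  // Written on one line with no extra spaces on purpose; a beautifier changes this source text
  function probe(){return 1}
  function check() {
    // Function.prototype.toString returns the function's source text as written
    var source = Function.prototype.toString.call(probe);
    if (source !== 'function probe(){return 1}') {
      // The code was modified
      throw new Error('Tampered!');
    }
  }
  setInterval(check, 1000);
})();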

Core technique 6: domain lock

Restrict the code to running on specific domains:

(function() {
  var allowed = ['jsokit.com', '.jsokit.com'];
  var current = location.hostname;
  var valid = allowed.some(function(domain) {
    if (domain.startsWith('.')) {
      return current.endsWith(domain) || current === domain.slice(1);
    }
    return current === domain;
  });
  if (!valid) {
    // Not an allowed domain: throw so the rest of the script does not run
    throw new Error('Domain not allowed');
  }
})();
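
The matching rule is easier to check when it is pulled out into a pure function that does not touch `location`. The following is a sketch of the same logic; the name `isHostAllowed` is introduced here for illustration.

```javascript
// Same matching rule as above, as a pure function that can run outside a browser.
function isHostAllowed(hostname, allowed) {
  return allowed.some((domain) => {
    if (domain.startsWith('.')) {
      // ".example.com" matches example.com itself and any subdomain
      return hostname === domain.slice(1) || hostname.endsWith(domain);
    }
    return hostname === domain; // exact match only
  });
}

const allowed = ['.jsokit.com'];
console.log(isHostAllowed('jsokit.com', allowed));          // true
console.log(isHostAllowed('tools.jsokit.com', allowed));    // true
console.log(isHostAllowed('evil-jsokit.com', allowed));     // false
console.log(isHostAllowed('jsokit.com.evil.net', allowed)); // false
```

Keeping the leading dot in the suffix check is what stops look-alike hosts such as `evil-jsokit.com` from passing.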

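In practice: three presets

Pick an obfuscation strategy according to the level of protection needed:

Low security (for performance-sensitive code)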

{
  compact: true,              // emit compact output
  stringArray: true,          // move strings into an array
  stringArrayThreshold: 0.5,  // transform 50% of strings
  controlFlowFlattening: false,
  deadCodeInjection: false
}

Size growth: ~20%
Performance cost: <5%

Medium security (balanced)

{
  compact: true,
  stringArray: true,
  stringArrayEncoding: ['base64'],
  stringArrayThreshold: 0.75,
  controlFlowFlattening: true,
  controlFlowFlatteningThreshold: 0.75,
  deadCodeInjection: true,
  deadCodeInjectionThreshold: 0.4
}

Size growth: ~100%
Performance cost: ~30%

High security (maximum protection)

{
  compact: true,
  stringArray: true,
  stringArrayEncoding: ['rc4'],
  stringArrayThreshold: 1,
  controlFlowFlattening: true,
  controlFlowFlatteningThreshold: 1,
  deadCodeInjection: true,
  deadCodeInjectionThreshold: 1,
  debugProtection: true,
  selfDefending: true,
  disableConsoleOutput: true
}

Size growth: ~200%
Performance cost: ~50%
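
In practice a preset is usually a starting point that gets tweaked. One possible way to organize that is sketched below: start from a preset, apply overrides, and reject thresholds outside 0 to 1. The `presets` object and the `buildOptions` helper are hypothetical; only the option names and values come from the presets above (the high preset is left out to keep the sketch short).

```javascript
// Hypothetical preset table; option names and values are copied from the presets above.
const presets = {
  low: {
    compact: true,
    stringArray: true,
    stringArrayThreshold: 0.5,
    controlFlowFlattening: false,
    deadCodeInjection: false,
  },
  medium: {
    compact: true,
    stringArray: true,
    stringArrayEncoding: ['base64'],
    stringArrayThreshold: 0.75,
    controlFlowFlattening: true,
    controlFlowFlatteningThreshold: 0.75,
    deadCodeInjection: true,
    deadCodeInjectionThreshold: 0.4,
  },
};

// Hypothetical helper: merge overrides into a preset and validate every *Threshold option.
function buildOptions(level, overrides = {}) {
  if (!Object.prototype.hasOwnProperty.call(presets, level)) {
    throw new Error(`Unknown preset: ${level}`);
  }
  const options = { ...presets[level], ...overrides };
  for (const [name, value] of Object.entries(options)) {
    const isThreshold = name.endsWith('Threshold');
    if (isThreshold && !(typeof value === 'number' && value >= 0 && value <= 1)) {
      throw new RangeError(`${name} must be a number between 0 and 1`);
    }
  }
  return options;
}

const options = buildOptions('medium', { controlFlowFlatteningThreshold: 0.5 });
console.log(options.controlFlowFlatteningThreshold); // 0.5
console.log(options.deadCodeInjectionThreshold);     // 0.4
```

Lowering only the control flow threshold, as in the usage line, trades a little protection for less of the most expensive transformation.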

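Pitfalls I ran into

1. Renaming global variables

Renaming global variables can break external dependencies:

// ❌ Risky: may break third-party libraries
renameGlobals: true

// ✅ Safe: rename only local variables
renameGlobals: false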

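2. eval and new Function

Code passed to eval as a string can stop working after obfuscation:

// Original code
eval("console.log('test')");

// May fail after obfuscation
eval("_0x1a2b('0x1')"); // _0x1a2b does not exist in the scope where eval runs

Fix: use target: 'browser-no-eval' so that the generated code does not use eval.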
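
A related way eval can break, shown as a small runnable sketch: the string handed to eval refers to a local variable by name, the renamer changes the declaration, and the string is left behind. The function and variable names here are illustrative, and the renaming was done by hand.

```javascript
// Before renaming: the string passed to eval refers to a local variable by name.
function original() {
  const secret = 42;
  return eval('secret');
}

// After renaming (by hand, for illustration): the local has a new name,
// but the string still says "secret".
function renamed() {
  const _0x1a2b = 42;
  try {
    return eval('secret');
  } catch (error) {
    return error.name;
  }
}

console.log(original()); // 42
console.log(renamed());  // ReferenceError
```

The same applies to new Function, whose body is also just a string that the renamer does not see as code.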

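3. Property renaming trap

// Original code
const obj = { name: 'John' };
console.log(obj.name);
console.log(obj['na' + 'me']);

// After obfuscation
const obj = { _0x1a: 'John' };
console.log(obj._0x1a);        // 'John': the key and the dot access were renamed together
console.log(obj['na' + 'me']); // ❌ undefined: obj no longer has a name property

Why: the renamer rewrites the keys and accesses it can resolve statically, such as obj.name. A key computed at runtime is an ordinary string that it cannot trace, so that lookup still asks for name, which no longer exists.

Fix: use renameProperties with caution, or make sure every property access uses dot syntax.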
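
One way to be cautious with property renaming is to keep a reserved list of names that outside code or data depends on, such as fields in JSON from a server. The following is a sketch of that idea only; `createPropertyRenamer` is a hypothetical helper, not an API of any tool.

```javascript
// Hypothetical helper: renames property names, except those on a reserved list.
function createPropertyRenamer(reserved) {
  const map = new Map();
  return function rename(name) {
    if (reserved.has(name)) {
      return name; // outside code or data depends on this name, so keep it
    }
    if (!map.has(name)) {
      map.set(name, `_0x${map.size.toString(16)}`);
    }
    return map.get(name); // the same input always gets the same output
  };
}

const rename = createPropertyRenamer(new Set(['name']));
console.log(rename('name'));          // name
console.log(rename('internalCache')); // _0x0
console.log(rename('internalCache')); // _0x0
console.log(rename('retryCount'));    // _0x1
```

With `name` reserved, an object parsed from external JSON still matches the accesses in the obfuscated code.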

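Online tool

Based on the techniques above, I built an online obfuscation tool: JavaScript Obfuscator

Main features:

Three presets (low/medium/high security)
30+ tunable parameters
Live preview of the obfuscated output
Source Map download

Obfuscation is not a silver bullet, but it can raise the cost of reverse engineering significantly. Only in combination with measures such as server-side validation and code splitting does it add up to a more complete front-end security setup.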


Related tools: Code Minifier | Base64 Encoder/Decoder