mingle-website/llm-txt.md at 1179d6832e113f58e69970ef55f1f4bf5a991c8c

mingle/mingle-website

Fork 0

68893236+KINDNICK@users.noreply.github.com 1179d6832e ADD : 클로드 스킬 추가

2026-01-30 21:59:38 +09:00

17 KiB

Raw Blame History

description

argument-hint

生成 llm.txt 文件（用于 LLM 爬虫）

--verbose

生成 llm.txt 文件，帮助 AI/LLM 爬虫（如 GPTBot, ClaudeBot, Perplexity）更好地理解和索引网站内容。llm.txt 类似于 robots.txt，但是专门为 LLM 爬虫设计的协议。

功能

✅ 生成标准的 llm.txt 文件
✅ 提供 LLM 爬虫许可和指引
✅ 描述网站内容和结构
✅ 列出重要页面链接
✅ 指定内容授权和使用范围
✅ 帮助 AI 更准确地引用和展示网站内容

参数

--verbose: 生成更详细的描述（可选）
- 包含更详细的内容说明
- 添加更多重要页面链接
- 提供内容类型和格式说明

背景

什么是 llm.txt？

llm.txt 是一个新兴的标准，类似于 robots.txt，但是专门为 AI/LLM 爬虫设计的。它：

告诉 LLM 爬虫哪些内容可以被使用
提供网站上下文和内容摘要
帮助 AI 更准确地理解和引用内容
指定内容使用的授权范围

为什么需要 llm.txt？

AI 引用准确性 - 帮助 AI 正确引用来源
内容可见性 - 控制 LLM 是否可以索引你的内容
品牌保护 - 确保内容被正确呈现
流量引导 - 引导用户回到原始来源

使用示例

示例 1：基本用法

/llm-txt

输出：

# llm.txt 文件

为你的 Next.js 项目生成的 llm.txt 文件：

```txt
# LLM Crawler Directives

# Allow LLM crawlers to index this site
User-agent: *
Allow: /

# Site Information
Name: YourBrand
Description: [网站描述 - 1-2 句话]
Website: https://yourdomain.com
Language: zh-CN, en-US

# Content Scope
Topics: [主要话题列表]
Content-Type: blog, documentation, tutorials

# Important Pages
Homepage: https://yourdomain.com
About: https://yourdomain.com/about
Blog: https://yourdomain.com/blog

# Attribution Requirements
Require-Attribution: true
Attribution-URL: https://yourdomain.com

# License
License: https://yourdomain.com/license
Content-Use: educational, informational

# Contact
Contact-Email: contact@yourdomain.com

文件位置

将此文件放置在项目根目录的 public/ 文件夹中：

路径： public/llm.txt

复制上面的内容到 public/llm.txt 文件。

验证

创建后，访问 https://yourdomain.com/llm.txt 验证文件可访问。

填写说明

Name（网站名称）

你的品牌或网站名称
示例：SF Plumbing Services 或 Tech Blog

Description（网站描述）

1-2 句话描述你的网站
清晰简洁，包含主要话题
示例：提供专业的管道维修和维护服务，覆盖旧金山湾区。包括紧急维修、预防性维护和管道安装服务。

Topics（主要话题）

逗号分隔的话题列表
示例：plumbing, emergency services, home maintenance, DIY tutorials

Content-Type（内容类型）

主要内容类型
常见类型：blog, documentation, tutorials, news, products, services

Important Pages（重要页面）

列出你希望 LLM 了解的关键页面
包含：关于页面、主要服务页面、重要文章

Require-Attribution（要求归属）

true - 要求 AI 引用时提供链接
false - 不要求

License（许可证）

指向你的内容许可证页面
可以是标准许可证（如 CC BY）
或自定义许可证

Content-Use（内容使用范围）

允许的使用场景
示例：educational, informational
或 all（允许所有用途）


### 示例 2：详细模式

```bash
/llm-txt --verbose

输出：

# llm.txt 文件（详细版）

```txt
# LLM Crawler Directives

# Allow LLM crawlers to index this site
User-agent: *
Allow: /

# Disallow admin and private areas
Disallow: /admin/
Disallow: /api/
Disallow: /private/

# Site Information
Name: YourBrand
Description: YourBrand 是一个专注于 [主题] 的权威网站。我们提供 [服务类型]，帮助 [目标受众] [解决问题]。内容包括 [内容类型]，旨在 [目标]。

Website: https://yourdomain.com
Language: zh-CN, en-US
Founded: 2020

# Content Scope
Topics: [话题 1], [话题 2], [话题 3], [话题 4]
Content-Type: blog posts, tutorials, guides, case studies
Target-Audience: [受众描述]
Update-Frequency: Weekly

# Important Pages
Homepage: https://yourdomain.com
About: https://yourdomain.com/about
Blog: https://yourdomain.com/blog
Services: https://yourdomain.com/services
Contact: https://yourdomain.com/contact

# Featured Content
Featured-Article-1: https://yourdomain.com/blog/[article-1] - [描述]
Featured-Article-2: https://yourdomain.com/blog/[article-2] - [描述]
Featured-Guide: https://yourdomain.com/guides/[guide] - [描述]

# Content Guidelines
Content-Style: Professional, educational, practical
Tone: Informative, helpful, expert
Accuracy: All content is fact-checked and regularly updated

# Attribution Requirements
Require-Attribution: true
Attribution-URL: https://yourdomain.com
Attribution-Text: "Source: YourBrand (https://yourdomain.com)"

# License
License: https://yourdomain.com/license
Content-License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Content-Use: educational, informational, with attribution
Commercial-Use: Contact for permission

# Contact
Contact-Email: contact@yourdomain.com
Contact-Form: https://yourdomain.com/contact
Social-Media: https://twitter.com/yourbrand, https://linkedin.com/company/yourbrand

# Additional Information
Last-Updated: 2024-01-15
API-Documentation: https://yourdomain.com/api-docs
RSS-Feed: https://yourdomain.com/feed.xml

详细字段说明

Site Information（网站信息）

Description（详细描述）：

2-3 句话详细描述
包含：你是谁，你提供什么，你的目标
示例：SF Plumbing Services 是旧金山湾区的专业管道维修公司。我们提供 24/7 紧急管道服务、预防性维护和管道安装。我们的团队由持证专业管道工组成，拥有超过 15 年的服务经验。

Language（语言）：

主要语言代码
可以是多个：zh-CN, en-US, es-ES

Founded（成立时间）：

网站或公司成立年份
帮助 AI 理解内容的时间背景

Content Scope（内容范围）

Target-Audience（目标受众）：

描述你的主要受众
示例：Homeowners, property managers, DIY enthusiasts

Update-Frequency（更新频率）：

内容更新频率
示例：Daily, Weekly, Monthly

Content-Style（内容风格）：

内容的专业程度和风格
示例：Professional, educational, practical

Featured Content（精选内容）

列出你希望 AI 特别关注的内容：

Featured-Article-1: https://yourdomain.com/blog/seo-guide - Complete SEO guide for beginners
Featured-Article-2: https://yourdomain.com/blog/ai-trends - AI trends in 2024
Featured-Guide: https://yourdomain.com/guides/plumbing-diy - DIY plumbing maintenance guide

Content Guidelines（内容指南）

Accuracy（准确性）：

说明内容质量和准确性
示例：All content is fact-checked and regularly updated

Tone（语调）：

内容的语调风格
示例：Informative, helpful, expert-friendly

License（许可证）

Content-License（内容许可证）：

可以使用标准许可证：
- CC BY 4.0 - Creative Commons Attribution
- CC BY-NC 4.0 - Non-commercial only
- CC BY-SA 4.0 - ShareAlike
- All Rights Reserved - 保留所有权利

Commercial-Use（商业使用）：

Allowed - 允许商业使用
Contact for permission - 需要联系许可
Not allowed - 不允许商业使用

Additional Information（附加信息）

API-Documentation（API 文档）：

如果提供 API，列出文档链接

RSS-Feed（RSS 订阅）：

帮助爬虫发现内容更新

Last-Updated（最后更新）：

llm.txt 文件的最后更新日期


### 示例 3：博客网站

```bash
/llm-txt --verbose

输出（针对博客）：

# llm.txt for Blog

```txt
# LLM Crawler Directives

# Allow LLM crawlers
User-agent: *
Allow: /

# Site Information
Name: Tech Insights Blog
Description: Tech Insights 是一个技术博客，专注于编程、Web 开发和人工智能。我们提供实用的教程、深入的技术分析和最新的行业趋势。我们的目标是为开发者社区提供有价值的学习资源。

Website: https://techinsights.com
Language: en-US, zh-CN
Founded: 2020
Author: [Your Name]

# Content Scope
Topics: programming, web development, JavaScript, React, Next.js, AI, machine learning, tutorials, best practices
Content-Type: blog posts, tutorials, code examples, case studies
Target-Audience: Software developers, web developers, students
Update-Frequency: Weekly

# Important Pages
Homepage: https://techinsights.com
About: https://techinsights.com/about
Blog: https://techinsights.com/blog
Archive: https://techinsights.com/archive

# Featured Content
Featured-Post-1: https://techinsights.com/blog/nextjs-SEO - Next.js SEO 完全指南
Featured-Post-2: https://techinsights.com/blog/react-hooks - React Hooks 深度解析
Featured-Tutorial: https://techinsights.com/tutorials/graphql - GraphQL 入门教程

# Content Guidelines
Content-Style: Technical, educational, practical
Code-Quality: All code examples are tested and maintained
Tone: Professional, approachable, beginner-friendly

# Attribution Requirements
Require-Attribution: true
Attribution-URL: https://techinsights.com
Attribution-Text: "Source: Tech Insights Blog (https://techinsights.com)"

# License
License: https://techinsights.com/license
Content-License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Content-Use: educational, informational, with attribution
Commercial-Use: Contact for permission

# Contact
Contact-Email: hello@techinsights.com
Contact-Form: https://techinsights.com/contact
Social-Media: https://twitter.com/techinsights, https://github.com/techinsights

# Additional Information
Last-Updated: 2024-01-15
RSS-Feed: https://techinsights.com/feed.xml
Newsletter: https://techinsights.com/subscribe


### 示例 4：商业服务网站

```bash
/llm-txt

输出（针对商业网站）：

# llm.txt for Business Website

```txt
# LLM Crawler Directives

# Allow LLM crawlers
User-agent: *
Allow: /

# Disallow private areas
Disallow: /admin/
Disallow: /dashboard/
Disallow: /api/

# Site Information
Name: SF Plumbing Services
Description: SF Plumbing Services 是旧金山湾区的专业管道维修公司。我们提供 24/7 紧急管道服务、预防性维护和管道安装。我们的团队由持证专业管道工组成，拥有超过 15 年的服务经验。

Website: https://sfplumbing.com
Language: en-US
Founded: 2008

# Content Scope
Topics: plumbing, emergency services, home maintenance, pipe repair, drain cleaning, water heaters
Content-Type: service pages, guides, FAQ, blog posts
Target-Audience: Homeowners, property managers, businesses

# Important Pages
Homepage: https://sfplumbing.com
Services: https://sfplumbing.com/services
Emergency: https://sfplumbing.com/emergency
About: https://sfplumbing.com/about
Blog: https://sfplumbing.com/blog
Contact: https://sfplumbing.com/contact

# Featured Content
Service-Guide: https://sfplumbing.com/guide/emergency-plumbing - 紧急管道问题处理指南
FAQ: https://sfplumbing.com/faq - 常见问题解答

# Content Guidelines
Content-Style: Professional, service-oriented
Accuracy: All information is verified by licensed plumbers
Tone: Helpful, trustworthy, expert

# Attribution Requirements
Require-Attribution: true
Attribution-URL: https://sfplumbing.com
Attribution-Text: "Source: SF Plumbing Services (https://sfplumbing.com)"

# License
License: https://sfplumbing.com/terms
Content-License: All Rights Reserved
Content-Use: informational, with attribution
Commercial-Use: Not allowed without permission

# Contact
Contact-Email: info@sfplumbing.com
Phone: +1-415-555-0123
Contact-Form: https://sfplumbing.com/contact
Social-Media: https://twitter.com/sfplumbing, https://facebook.com/sfplumbing

# Additional Information
Last-Updated: 2024-01-15
Service-Areas: San Francisco, Oakland, San Jose, Peninsula
Hours: 24/7 Emergency Service

商业网站特殊字段

Service-Areas（服务区域）：

地理性服务范围
帮助 AI 理解你的地理覆盖

Hours（营业时间）：

特别是 24/7 服务
帮助 AI 准确描述你的服务

Phone（电话）：

直接联系方式
增强可信度


### 示例 5：禁止 LLM 索引

```bash
/llm-txt

输出（禁止 LLM）：

# llm.txt for Restrictive Policy

```txt
# LLM Crawler Directives

# Disallow all LLM crawlers
User-agent: *
Disallow: /

# Site Information
Name: YourBrand
Description: [描述]
Website: https://yourdomain.com

# Content Policy
Content-Use: Not allowed for LLM training or indexing
License: All Rights Reserved
Contact: legal@yourdomain.com for inquiries

# Alternative: Allow with attribution
# User-agent: *
# Allow: /
# Require-Attribution: true
# Content-License: CC BY 4.0

禁止策略说明

完全禁止：

不允许任何 LLM 使用你的内容
适用于敏感或专有内容

允许但不授权：

允许 LLM 阅读和引用
但不能用于训练模型

有条件允许：

要求归属和链接
指定使用范围
商业使用需要许可


## 最佳实践

### 1. 明确内容许可

```txt
# 推荐
Content-License: CC BY 4.0
Content-Use: educational, informational
Require-Attribution: true

# 避免
Content-License: None
Content-Use: Unknown

2. 保持更新

Last-Updated: 2024-01-15
Update-Frequency: Weekly

3. 提供联系方式

Contact-Email: contact@yourdomain.com
Contact-Form: https://yourdomain.com/contact

4. 列出重要内容

Featured-Article-1: https://yourdomain.com/article-1
Featured-Article-2: https://yourdomain.com/article-2
Featured-Guide: https://yourdomain.com/guide

5. 指定内容风格

Content-Style: Professional, educational
Accuracy: All content is fact-checked
Tone: Informative, helpful

Next.js 集成

选项 1：静态文件

将 llm.txt 放在 public/ 目录：

public/
  llm.txt
  robots.txt
  sitemap.xml

选项 2：动态生成（未来支持）

app/llm.txt/route.ts:

import { NextResponse } from 'next/server'

export async function GET() {
  const llmTxt = `
# LLM Crawler Directives

User-agent: *
Allow: /

Name: YourBrand
Description: Your description
Website: https://yourdomain.com
...
  `.trim()

  return new NextResponse(llmTxt, {
    headers: {
      'Content-Type': 'text/plain',
      'Cache-Control': 'public, max-age=86400',
    },
  })
}

验证和测试

1. 本地验证

# 开发环境
curl http://localhost:3000/llm.txt

# 生产环境
curl https://yourdomain.com/llm.txt

2. 在线验证

访问 https://yourdomain.com/llm.txt 确认：

文件可访问（200 状态码）
内容格式正确
所有链接有效

3. 测试清单

文件位于根目录：/llm.txt
文件名小写：llm.txt
文件可访问：返回 200 状态码
内容类型正确：text/plain
所有链接有效
许可证明确
联系信息准确

常见问题

Q: llm.txt 和 robots.txt 有什么区别？

robots.txt: 控制搜索引擎爬虫（Googlebot, Bingbot）
llm.txt: 控制 AI/LLM 爬虫（GPTBot, ClaudeBot）

两者应该同时使用以获得最佳控制。

Q: 是否必须创建 llm.txt？

A: 不是必须的，但建议创建。如果没有 llm.txt：

LLM 爬虫可能不知道如何处理你的内容
可能无法正确引用你的内容
无法控制内容使用范围

Q: 如何防止 LLM 使用我的内容？

A: 使用以下配置：

User-agent: *
Disallow: /

Q: llm.txt 是否有法律效力？

A: llm.txt 本身不是法律文件，但它：

声明了你的意图和要求
可以作为法律行动的证据
帮助 AI 公司了解你的政策

对于法律保护，建议：

在网站的服务条款中明确说明
使用适当的内容许可证
咨询法律专业人士

Q: 哪些 LLM 支持 llm.txt？

A: 越来越多的 AI 公司正在支持：

OpenAI (GPTBot)
Anthropic (Claude)
Perplexity
其他 AI 搜索引擎

随着标准的普及，支持会越来越广泛。

Q: 应该如何设置内容许可证？

A: 取决于你的目标：

开放共享（推荐用于博客）：

Content-License: CC BY 4.0
Require-Attribution: true

限制使用：

Content-License: All Rights Reserved
Content-Use: Contact for permission

商业网站：

Content-License: All Rights Reserved
Content-Use: Informational only
Commercial-Use: Not allowed

注意事项

llm.txt 是一个新兴的标准，支持还在发展中
文件应位于网站根目录
文件名必须小写：llm.txt
定期更新以反映当前政策
与服务条款和许可证保持一致
监控 AI 如何引用你的内容
如果不希望 AI 使用你的内容，明确禁止

17 KiB Raw Blame History Unescape Escape

功能

参数

背景

使用示例

示例 1：基本用法

文件位置

验证

填写说明

Name（网站名称）

Description（网站描述）

Topics（主要话题）

Content-Type（内容类型）

Important Pages（重要页面）

Require-Attribution（要求归属）

License（许可证）

Content-Use（内容使用范围）

详细字段说明

Site Information（网站信息）

Content Scope（内容范围）

Featured Content（精选内容）

Content Guidelines（内容指南）

License（许可证）

Additional Information（附加信息）

商业网站特殊字段

禁止策略说明

2. 保持更新

3. 提供联系方式

4. 列出重要内容

5. 指定内容风格

Next.js 集成

选项 1：静态文件

选项 2：动态生成（未来支持）

验证和测试

1. 本地验证

2. 在线验证

3. 测试清单

常见问题

Q: llm.txt 和 robots.txt 有什么区别？

Q: 是否必须创建 llm.txt？

Q: 如何防止 LLM 使用我的内容？

Q: llm.txt 是否有法律效力？

Q: 哪些 LLM 支持 llm.txt？

Q: 应该如何设置内容许可证？

相关标准

相关命令

注意事项

17 KiB

Raw Blame History