
Deploying GPT OSS 20B on Jetson

Deploy GPT OSS 20B to an NVIDIA Jetson over SSH with one click, using a prebuilt Docker image that automatically starts the inference service.

Tags: Getting Started · 10 min · LLM · Jetson · Docker · edge-ai

What This Capability Does

Deploys the GPT OSS 20B model to an NVIDIA Jetson device over SSH with one click. Once deployment completes, the container automatically starts llama-server, which serves an OpenAI-compatible HTTP inference API on port 8080.

Output Interface

| Interface Type | Description | Port/Path | Data Format |
| --- | --- | --- | --- |
| HTTP API | OpenAI-compatible chat completions endpoint | :8080/v1/chat/completions | JSON |

After deployment completes, open in a browser:

http://<jetson-ip>:8080

Integration Scenarios

  • Use as a local AI chat backend for chatbots or voice assistants
  • Pair with an OpenClaw gateway to bridge multiple chat platforms such as WeChat and Telegram
  • Provide offline LLM inference on edge devices, with no cloud API required

Usage Notes

Hardware Requirements

  • Jetson Orin NX 16GB or higher (the 20B model needs roughly 12-15 GB of GPU memory)
  • reComputer J4012 is verified to work; other Jetson Orin models must be checked for sufficient GPU memory

How to Call the API

  • API endpoint: http://<jetson-ip>:8080/v1/chat/completions
  • OpenAI-compatible format; existing OpenAI SDKs work directly
  • Python example: import openai; openai.api_base = "http://<jetson-ip>:8080/v1" (openai SDK < 1.0; with SDK ≥ 1.0 use openai.OpenAI(base_url=...))
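The SDK is optional; a minimal client sketch using only the Python standard library, assuming a placeholder address of 192.168.1.100 and an illustrative model name (llama-server serves whichever model it loaded, regardless of this field):

```python
import json
import urllib.request

# Assumed address: replace with your Jetson's actual IP.
BASE_URL = "http://192.168.1.100:8080"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": "gpt-oss-20b",  # illustrative; llama-server largely ignores it
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST to /v1/chat/completions and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# reply = chat("Hello from a Jetson client!")  # requires the service to be running
```

The long timeout matters for the very first request, which may block while the model warms up.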

First-Request Latency

  • The first call after deployment may take 2-5 minutes (model loading and warm-up)
  • Check http://<jetson-ip>:8080/v1/models to see whether the service is ready
  • After warm-up, subsequent requests respond quickly (typically 1-3 seconds)
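That readiness check can be automated; a sketch that polls /v1/models until the server answers with a model list (the timing values are illustrative, not part of the deployment preset):

```python
import json
import time
import urllib.error
import urllib.request

def models_loaded(body: bytes) -> bool:
    """True if a /v1/models response body lists at least one model."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return bool(data.get("data"))

def wait_until_ready(base_url: str, timeout_s: int = 300, interval_s: int = 5) -> bool:
    """Poll /v1/models until the server returns 200 with a model list."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
                if resp.status == 200 and models_loaded(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or still returning 503 while loading
        time.sleep(interval_s)
    return False

# wait_until_ready("http://192.168.1.100:8080")  # placeholder IP
```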

Tokens and Context

  • The default context window is about 2048 tokens and can be adjusted at deployment time
  • For a larger context, raise the Llama Context parameter in the configuration (this uses more GPU memory)
  • Keep individual requests under about 1000 tokens to avoid running out of GPU memory
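To sanity-check a request against that budget before sending it, a rough heuristic can be sketched (about 4 characters per token for English text is an assumption; the model's real tokenizer will differ):

```python
def rough_token_count(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text.
    Use only as a pre-flight sanity check, not an exact count."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, max_new_tokens: int, context_window: int = 2048) -> bool:
    """Check that prompt tokens plus requested completion tokens fit the window."""
    return rough_token_count(prompt) + max_new_tokens <= context_window
```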

Technical Specifications

| Metric | Value |
| --- | --- |
| Model | GPT OSS 20B |
| Inference framework | llama.cpp (llama-server) |
| Supported hardware | reComputer J4012 (Jetson Orin NX 16GB) |
| Service port | 8080 |
| API format | OpenAI-compatible |

Integration Interface

http

OpenAI-compatible chat completions API

/v1/chat/completions · Port: 8080 · Method: POST
{"choices":[{"message":{"content":"response text"}}]}
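Pulling the reply text out of that response takes one traversal; a minimal helper, assuming the JSON shape shown above:

```python
import json

def extract_reply(response_json: str) -> str:
    """Return the assistant's text from an OpenAI-style chat completion response."""
    body = json.loads(response_json)
    return body["choices"][0]["message"]["content"]
```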

Deployment Options

edge_device

Download & Installation

Preset: Jetson GPT OSS 20B Service {#jetson_got_oss}

Deploy GPT OSS 20B to your Jetson device with one click from this platform.

| Device | Purpose |
| --- | --- |
| NVIDIA Jetson (reComputer) | Runs GPT OSS 20B in Docker |

Step 1: Deploy GPT OSS 20B Service {#deploy_got_oss type=docker_deploy required=true config=devices/jetson.yaml}

Deploy the containerized GPT OSS 20B runtime to your Jetson over SSH.

Deployment Target: Remote (Jetson) {#jetson_remote type=remote config=devices/jetson.yaml default=true}

Deploy to your Jetson over SSH with one click.

Wiring

  1. Connect Jetson and your computer to the same network.
  2. Fill in Jetson IP, SSH username, and password.
  3. Click Deploy.

Deployment Complete

  1. The GPT OSS 20B container is running on your Jetson.
  2. llama-server is started inside the container.
  3. The service endpoint is available at http://<jetson-ip>:8080.
  4. Readiness endpoint is available at http://<jetson-ip>:8080/v1/models.

Troubleshooting

| Issue | Solution |
| --- | --- |
| SSH connection failed | Verify Jetson IP, username, password, and SSH service status |
| Docker runtime check failed | Ensure Docker is installed and the NVIDIA runtime is available |
| Docker Compose unavailable | Ensure docker compose or docker-compose is installed |
| Service start failed | Inspect logs on the Jetson: docker compose logs --tail=200 |
| 503 {"message":"Loading model"} on /v1/models | Model is still warming up; the first run can take several minutes |
| Out-of-memory at startup | Reduce settings, for example set Llama NGL=16 and Llama Context=512 |

Deployment Target: Local Machine {#jetson_local type=local config=devices/jetson_local.yaml}

Deploy directly on the current machine (requires an NVIDIA GPU with sufficient memory).

Wiring

  1. Make sure Docker and the NVIDIA Container Toolkit are installed.
  2. Click Deploy to start the installation.

Tip: the first start may take 15-30 minutes to download the Docker image and load the model. At least 20 GB of free disk space is required.

Deployment Complete

  1. Open http://localhost:8080 in a browser.
  2. You will see the GPT OSS chat interface and can start a conversation.

Troubleshooting

| Issue | Solution |
| --- | --- |
| NVIDIA runtime not found | Install the NVIDIA Container Toolkit: sudo apt install nvidia-container-toolkit && sudo systemctl restart docker |
| Port 8080 already in use | Stop the other service using that port |
| Container keeps restarting | Check logs: docker compose logs --tail=200 |
| Insufficient GPU memory | The 20B model needs substantial GPU memory; try a smaller model variant |

Step 2: Open Service Link {#preview_service type=preview required=false config=devices/preview.yaml}

Use this step to open the Jetson service URL directly in a new browser tab.

Wiring

  1. Enter Jetson IP in this step.
  2. Click Connect.
  3. The platform opens http://<jetson-ip>:8080 in a new tab.

Deployment Complete

  1. The service page opens in your browser.
  2. You can return here and click Connect again to reopen it.

Troubleshooting

| Issue | Solution |
| --- | --- |
| Invalid host input | Enter a valid IP or hostname, for example 192.168.1.100 |
| New tab not opened | Allow pop-ups for this site and retry |
| Service page not reachable | Confirm the Jetson service is listening on 8080 and the network is reachable |

Deployment Complete

GPT OSS 20B runtime has been deployed successfully on your Jetson.

Validation Checklist

  1. Step 1 deployment status shows success.
  2. The GPT OSS 20B container stays in running state.
  3. Clicking Connect in Step 2 opens http://<jetson-ip>:8080.