Model Introduction

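Intern-S1-mini is built on an 8B dense language model (Qwen3) and a 0.3B vision encoder (InternViT). It was further pretrained on 5 trillion tokens of multimodal data, including more than 2.5 trillion tokens from scientific domains. As a result, the model keeps strong general capabilities while performing well on specialized scientific tasks such as interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes, which makes Intern-S1-mini a research assistant suited to real-world scientific applications.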

Model Loading

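from modelscope import AutoProcessor, AutoModelForCausalLM
import torch

model_name = "Shanghai_AI_Laboratory/Intern-S1-mini"
# trust_remote_code=True lets the loader run the model's custom code from the repository.
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
# device_map="auto" and torch_dtype="auto" leave device placement and precision to the loader.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)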

Downloading Model from https://www.modelscope.cn to directory: /home/six/.cache/modelscope/hub/models/Shanghai_AI_Laboratory/Intern-S1-mini
If tokenization with SMILES formula is of necessity, please 'pip install RDKit' for better tokenization quality.
Downloading Model from https://www.modelscope.cn to directory: /home/six/.cache/modelscope/hub/models/Shanghai_AI_Laboratory/Intern-S1-mini
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.48it/s]
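The loader's message recommends installing RDKit for better SMILES tokenization. A minimal sketch for checking whether the package is importable in the current environment before loading the processor; the helper name `has_rdkit` is hypothetical and not part of ModelScope or the model repository:

```python
import importlib.util


def has_rdkit() -> bool:
    """Return True if the `rdkit` package can be found on the import path."""
    return importlib.util.find_spec("rdkit") is not None


if not has_rdkit():
    # Mirrors the hint printed by the loader; nothing is installed automatically.
    print("RDKit not found; consider installing it if you tokenize SMILES strings.")
```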

Model Structure

model

InternS1ForConditionalGeneration(
  (model): InternS1Model(
    (vision_tower): InternS1VisionModel(
      (embeddings): InternS1VisionEmbeddings(
        (patch_embeddings): InternS1VisionPatchEmbeddings(
          (projection): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14))
        )
        (dropout): Dropout(p=0.0, inplace=False)
      )
      (encoder): InternS1VisionEncoder(
        (layer): ModuleList(
          (0-23): 24 x InternS1VisionLayer(
            (attention): InternS1VisionAttention(
              (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (projection_layer): Linear(in_features=1024, out_features=1024, bias=True)
              (projection_dropout): Identity()
              (q_norm): Identity()
              (k_norm): Identity()
            )
            (mlp): InternS1VisionMLP(
              (activation_fn): GELUActivation()
              (fc1): Linear(in_features=1024, out_features=4096, bias=True)
              (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            )
            (layernorm_before): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (layernorm_after): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (drop_path1): Identity()
            (drop_path2): Identity()
          )
        )
      )
      (layernorm): Identity()
    )
    (multi_modal_projector): InternS1MultiModalProjector(
      (layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (linear_1): Linear(in_features=4096, out_features=4096, bias=True)
      (act): GELUActivation()
      (linear_2): Linear(in_features=4096, out_features=4096, bias=True)
    )
    (language_model): Qwen3Model(
      (embed_tokens): Embedding(153216, 4096)
      (layers): ModuleList(
        (0-35): 36 x Qwen3DecoderLayer(
          (self_attn): Qwen3Attention(
            (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
            (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
            (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (q_norm): Qwen3RMSNorm((128,), eps=1e-06)
            (k_norm): Qwen3RMSNorm((128,), eps=1e-06)
          )
          (mlp): Qwen3MLP(
            (gate_proj): Linear(in_features=4096, out_features=12288, bias=False)
            (up_proj): Linear(in_features=4096, out_features=12288, bias=False)
            (down_proj): Linear(in_features=12288, out_features=4096, bias=False)
            (act_fn): SiLU()
          )
          (input_layernorm): Qwen3RMSNorm((4096,), eps=1e-06)
          (post_attention_layernorm): Qwen3RMSNorm((4096,), eps=1e-06)
        )
      )
      (norm): Qwen3RMSNorm((4096,), eps=1e-06)
      (rotary_emb): Qwen3RotaryEmbedding()
    )
  )
  (lm_head): Linear(in_features=4096, out_features=153216, bias=False)
)
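As a rough cross-check of the "8B" and "0.3B" figures, the layer shapes in the printout can be turned into parameter counts by hand. The sketch below is plain arithmetic, not a call into the model: the helper names (`linear`, `layer_norm`, `rms_norm`) are hypothetical, RMSNorm is assumed to hold a single weight vector, and only weights visible in the module printout are counted, so any tensors that the printout does not list (for example position embeddings in the vision tower) are left out.

```python
# Parameter counts derived only from the shapes shown in the printed module tree.
# Helper names are hypothetical; parameters not listed in the printout are ignored.

def linear(n_in, n_out, bias):
    """Weight matrix of a Linear layer, plus the bias vector if present."""
    return n_in * n_out + (n_out if bias else 0)

def layer_norm(dim):
    """LayerNorm with elementwise_affine=True has a weight and a bias."""
    return 2 * dim

def rms_norm(dim):
    """Assumption: RMSNorm holds a single weight vector."""
    return dim

# Language model (Qwen3): 36 decoder layers
attn = (linear(4096, 4096, False) + 2 * linear(4096, 1024, False)
        + linear(4096, 4096, False) + 2 * rms_norm(128))
mlp = 2 * linear(4096, 12288, False) + linear(12288, 4096, False)
decoder_layer = attn + mlp + 2 * rms_norm(4096)
language_model = 153216 * 4096 + 36 * decoder_layer + rms_norm(4096)
# Whether lm_head shares weights with embed_tokens is not visible in the printout.
lm_head = linear(4096, 153216, False)

# Vision tower (InternViT): patch embedding plus 24 encoder layers
patch_embed = 3 * 1024 * 14 * 14 + 1024  # Conv2d weight + bias
vision_layer = (4 * linear(1024, 1024, True) + linear(1024, 4096, True)
                + linear(4096, 1024, True) + 2 * layer_norm(1024))
vision_tower = patch_embed + 24 * vision_layer

# Multimodal projector
projector = layer_norm(4096) + 2 * linear(4096, 4096, True)

for name, count in [("language_model", language_model), ("lm_head", lm_head),
                    ("vision_tower", vision_tower), ("projector", projector)]:
    print(f"{name}: {count / 1e9:.3f}B")
```

Under these assumptions the language model plus `lm_head` comes to roughly 8.2B parameters and the vision tower to roughly 0.3B, consistent with the sizes quoted in the introduction.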


Model Test

# The prompt text means: "Please introduce yourself in Chinese."
messages = [
    {"role": "user", "content": [{"type": "text", "text": "请用中文介绍一下你自己。"}]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=32768)
# Decode only the newly generated tokens, skipping the prompt tokens.
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(decoded_output)

Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
好的，用户让我用中文介绍一下自己。首先，我需要确定用户的需求是什么。可能他们想了解我的功能，或者测试我的中文能力，或者只是出于好奇。

接下来，我要考虑如何组织回答。应该包括我的身份、功能、特点，以及如何帮助用户。用户可能希望得到一个简洁明了的回答，所以需要避免冗长。同时，要突出我的优势，比如多语言支持、知识广度、实时信息获取等。还要注意用词亲切，让用户感觉友好。

另外，用户可能没有明确说明，但潜在需求可能是想确认我是否可靠，或者能否处理复杂任务。因此，在回答中需要强调准确性和安全性，比如提到遵守道德准则，保护隐私。

还要考虑用户可能的后续问题，比如如何使用我，或者我的局限性。所以在介绍中可以适当提到这些方面，但不需要展开，保持回答的针对性。

最后，检查语言是否自然流畅，避免技术术语，让不同背景的用户都能理解。确保回答结构清晰，分点说明，但不要用列表形式，而是用连贯的段落。
</think>

你好！我是一个名叫Intern-S1的人工智能助手，由上海人工智能实验室开发。我擅长处理各种语言任务，比如回答问题、提供定义和解释、将文本从一种语言翻译成另一种语言、总结文本、生成文本、编写故事、分析情感、提供推荐、开发算法、编写代码等等。我致力于通过执行常见的基于语言的任务和提供建议来帮助人类。如果你有任何需要帮助的地方，欢迎随时向我提问！
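The decoded output contains the model's reasoning trace, then a `</think>` marker, then the final answer. A minimal sketch for separating the two, assuming the marker appears at most once in the decoded text; the function name `split_reasoning` is hypothetical and not part of the model's API:

```python
def split_reasoning(decoded_output: str, marker: str = "</think>"):
    """Split decoded text into (reasoning, answer) around the first marker.

    If the marker is absent, the whole text is treated as the answer.
    """
    reasoning, sep, answer = decoded_output.partition(marker)
    if not sep:
        return "", decoded_output.strip()
    return reasoning.strip(), answer.strip()


# Shortened stand-in for the real decoded output.
sample = "好的，用户让我用中文介绍一下自己。\n</think>你好！我是一个名叫Intern-S1的人工智能助手。"
reasoning, answer = split_reasoning(sample)
print(answer)
```

With the real `decoded_output` from the test code, `split_reasoning(decoded_output)` would give the self-introduction alone as the second element.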