- 
                Notifications
    
You must be signed in to change notification settings  - Fork 7.1k
 
Description
Self Checks
- I have searched for existing issues, including closed ones.
 - I confirm that I am using English to submit this report (Language Policy).
 - Non-English title submissions will be closed directly ( 非英文标题的提交将会被直接关闭 ) (Language Policy).
 - Please do not modify this template :) and fill in all the required fields.
 
Describe your problem
Hi @dosubot,
I deployed RAGFlow 0.21.1 on macOS from source code.
I encountered two issues:
1. I tried to parse a .doc file with the BAAI/bge-large-zh-v1.5 model, which is already listed in llm_factories.json, but the log still showed the following error:
```
2025-10-28 18:01:08,333 ERROR    44839 Fail to bind embedding model: Model(BAAI/bge-large-zh-v1.5@BAAI) not authorized
Traceback (most recent call last):
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/svr/task_executor.py", line 799, in do_handle_task
embedding_model = LLMBundle(task_tenant_id, LLMType.EMBEDDING, llm_name=task_embedding_id, lang=task_language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/llm_service.py", line 72, in init
super().init(tenant_id, llm_type, llm_name, lang, **kwargs)
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/tenant_llm_service.py", line 250, in init
self.mdl = TenantLLMService.model_instance(tenant_id, llm_type, llm_name, lang=lang, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/tenant_llm_service.py", line 132, in model_instance
model_config = TenantLLMService.get_model_config(tenant_id, llm_type, llm_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/tenant_llm_service.py", line 120, in get_model_config
raise LookupError(f"Model({mdlnm}@{fid}) not authorized")
LookupError: Model(BAAI/bge-large-zh-v1.5@BAAI) not authorized
2025-10-28 18:01:08,343 INFO     44839 set_progress(05f20e5ab3e511f0b9f6ea8e869d4ad9), progress: -1, progress_msg: 18:01:08 [ERROR][Exception]: Model(BAAI/bge-large-zh-v1.5@BAAI) not authorized
2025-10-28 18:01:08,344 ERROR    44839 handle_task got exception for task {"id": "05f20e5ab3e511f0b9f6ea8e869d4ad9", "doc_id": "232f8e5eb3e311f0a59aea8e869d4ad9", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "6f76425ab3e111f0a59aea8e869d4ad9", "parser_id": "naive", "parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "toc_extraction": false, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "name": "\u8f66\u8f86\u64cd\u4f5c\u89c4\u8303\u53ca\u5e38\u89c1\u6545\u969c\u6392\u67e5\u6307\u535720220208.docx", "type": "doc", "location": "\u8f66\u8f86\u64cd\u4f5c\u89c4\u8303\u53ca\u5e38\u89c1\u6545\u969c\u6392\u67e5\u6307\u535720220208.docx", "size": 2560931, "tenant_id": "74f31f7eb3cc11f093cd5e95a36a973e", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "toc_extraction": false, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. 
Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "gpt-oss:120b@Ollama", "update_time": 1761645667725, "task_type": ""}
Traceback (most recent call last):
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/svr/task_executor.py", line 972, in handle_task
await do_handle_task(task)
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/utils/api_utils.py", line 775, in async_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/svr/task_executor.py", line 799, in do_handle_task
embedding_model = LLMBundle(task_tenant_id, LLMType.EMBEDDING, llm_name=task_embedding_id, lang=task_language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/llm_service.py", line 72, in init
super().init(tenant_id, llm_type, llm_name, lang, **kwargs)
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/tenant_llm_service.py", line 250, in init
self.mdl = TenantLLMService.model_instance(tenant_id, llm_type, llm_name, lang=lang, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/tenant_llm_service.py", line 132, in model_instance
model_config = TenantLLMService.get_model_config(tenant_id, llm_type, llm_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/tenant_llm_service.py", line 120, in get_model_config
raise LookupError(f"Model({mdlnm}@{fid}) not authorized")
LookupError: Model(BAAI/bge-large-zh-v1.5@BAAI) not authorized
```
2. I tried to parse the .doc file with the bge-m3:latest model, which is running locally via Ollama, but a different error appeared in the log:
```
Please check that Ollama is downloaded, running and accessible. https://ollama.com/download
2025-10-28 18:28:25,932 ERROR    44839 Fail to bind embedding model: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download
Traceback (most recent call last):
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/svr/task_executor.py", line 800, in do_handle_task
vts, _ = embedding_model.encode(["ok"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x3a202e700>", line 31, in encode
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/llm_service.py", line 84, in encode
embeddings, used_tokens = self.mdl.encode(texts)
^^^^^^^^^^^^^^^^^^^^^^
File "<@beartype(rag.llm.embedding_model.OllamaEmbed.encode) at 0x3a15fda80>", line 31, in encode
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/llm/embedding_model.py", line 278, in encode
res = self.client.embeddings(prompt=txt, model=self.model_name, options={"use_mmap": True}, keep_alive=self.keep_alive)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/ollama/_client.py", line 401, in embeddings
return self._request(
^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/ollama/_client.py", line 189, in _request
return cls(**self._request_raw(*args, **kwargs).json())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/ollama/_client.py", line 135, in _request_raw
raise ConnectionError(CONNECTION_ERROR_MESSAGE) from None
ConnectionError: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download
2025-10-28 18:28:25,939 INFO     44839 set_progress(d65d4278b3e811f0b9f6ea8e869d4ad9), progress: -1, progress_msg: 18:28:25 [ERROR][Exception]: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download
2025-10-28 18:28:25,939 ERROR    44839 handle_task got exception for task {"id": "d65d4278b3e811f0b9f6ea8e869d4ad9", "doc_id": "232f8e5eb3e311f0a59aea8e869d4ad9", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "6f76425ab3e111f0a59aea8e869d4ad9", "parser_id": "naive", "parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "toc_extraction": false, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "name": "\u8f66\u8f86\u64cd\u4f5c\u89c4\u8303\u53ca\u5e38\u89c1\u6545\u969c\u6392\u67e5\u6307\u535720220208.docx", "type": "doc", "location": "\u8f66\u8f86\u64cd\u4f5c\u89c4\u8303\u53ca\u5e38\u89c1\u6545\u969c\u6392\u67e5\u6307\u535720220208.docx", "size": 2560931, "tenant_id": "74f31f7eb3cc11f093cd5e95a36a973e", "language": "English", "embd_id": "bge-m3:latest@Ollama", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "toc_extraction": false, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. 
Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}}, "img2txt_id": "", "asr_id": "", "llm_id": "gpt-oss:120b@Ollama", "update_time": 1761647305884, "task_type": ""}
Traceback (most recent call last):
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/svr/task_executor.py", line 972, in handle_task
await do_handle_task(task)
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/utils/api_utils.py", line 775, in async_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/svr/task_executor.py", line 800, in do_handle_task
vts, _ = embedding_model.encode(["ok"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<@beartype(api.db.services.llm_service.LLMBundle.encode) at 0x3a202e700>", line 31, in encode
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/api/db/services/llm_service.py", line 84, in encode
embeddings, used_tokens = self.mdl.encode(texts)
^^^^^^^^^^^^^^^^^^^^^^
File "<@beartype(rag.llm.embedding_model.OllamaEmbed.encode) at 0x3a15fda80>", line 31, in encode
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/rag/llm/embedding_model.py", line 278, in encode
res = self.client.embeddings(prompt=txt, model=self.model_name, options={"use_mmap": True}, keep_alive=self.keep_alive)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/ollama/_client.py", line 401, in embeddings
return self._request(
^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/ollama/_client.py", line 189, in _request
return cls(**self._request_raw(*args, **kwargs).json())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/M2SSD/Space/Workspace/llm/ragflow_0211/.venv/lib/python3.11/site-packages/ollama/_client.py", line 135, in _request_raw
raise ConnectionError(CONNECTION_ERROR_MESSAGE) from None
ConnectionError: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download
```
**I found other similar issues where you suggested that we must register the model first, but what does "register model" mean? There is no documentation explaining it.
Please kindly provide a detailed solution. Thanks.**