
Commit dde5f86

FEAT: Job manager concurrent job bug (#1640)
* adding CLI for batch submission
* adding CLI for batch submission
* chore: adding changelog file 1635.added.md [dependabot-skip]
* siwave valcheck
* job manager max concurrent cli
* chore: adding changelog file 1640.added.md [dependabot-skip]
* job service modified
* local job with massive job fixed
* local job with massive job fixed #3
* Val fix

---------

Co-authored-by: pyansys-ci-bot <92810346+pyansys-ci-bot@users.noreply.github.com>

1 parent 266a866 commit dde5f86

File tree

10 files changed: +800 -123 lines changed

doc/changelog.d/1635.added.md

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+Adding CLI for batch submission

doc/changelog.d/1640.added.md

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+Job manager concurrent job bug

doc/source/workflows/drc/drc.rst

Lines changed: 3 additions & 3 deletions

@@ -1,7 +1,7 @@
 .. _ref_drc:

 ==================================================================
-Design-rule checking (DRC)—self-contained, multi-threaded engine
+Design-rule checking (DRC)—self-contained, multi-threaded engine
 ==================================================================

 .. currentmodule:: pyedb.workflows.drc.drc

@@ -85,7 +85,7 @@ Rule models
    BackDrillStubLength
    CopperBalance

-DRC engine
+DRC Engine
 ~~~~~~~~~~

 .. autosummary::

@@ -129,7 +129,7 @@ Load a rule deck from JSON
    rules = Rules.from_dict(json.load(f))

 Export violations to CSV
-~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: python

doc/source/workflows/job_manager/submit_job.rst

Lines changed: 125 additions & 1 deletion

@@ -44,7 +44,7 @@ It exposes:
 * REST & Web-Socket endpoints (``http://localhost:8080`` by default)
 * Thread-safe synchronous façade for scripts / Jupyter
 * Native async API for advanced integrations
-* CLI utilities ``submit_local_job`` and ``submit_job_on_scheduler`` for shell / CI pipelines
+* CLI utilities ``submit_local_job``, ``submit_batch_jobs``, and ``submit_job_on_scheduler`` for shell / CI pipelines

 The **same backend code path** is used regardless of front-end style; the difference is
 **who owns the event loop** and **how control is returned to the caller**.

@@ -176,6 +176,130 @@ Example—CLI (cluster)
 The command returns immediately after the job is **queued**; use the printed ID
 with ``wait_until_done`` or monitor via the web UI.

+CLI—``submit_batch_jobs``
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+For bulk submissions, use ``submit_batch_jobs`` to automatically discover and submit
+multiple projects from a directory tree.
+
+Synopsis
+""""""""
+.. code-block:: bash
+
+   $ python submit_batch_jobs.py --root-dir <DIRECTORY> [options]
+
+Key features
+""""""""""""
+* **Automatic discovery**: Scans for all ``.aedb`` folders and ``.aedt`` files
+* **Smart pairing**: When both ``.aedb`` and ``.aedt`` exist, uses the ``.aedt`` file
+* **Asynchronous submission**: Submits jobs concurrently for faster processing
+* **Recursive scanning**: Optional recursive directory traversal
+
+Options
+"""""""
+.. list-table::
+   :widths: 30 15 55
+   :header-rows: 1
+
+   * - Argument
+     - Default
+     - Description
+   * - ``--root-dir``
+     - *(required)*
+     - Root directory to scan for projects
+   * - ``--host``
+     - ``localhost``
+     - Job manager host address
+   * - ``--port``
+     - ``8080``
+     - Job manager port
+   * - ``--num-cores``
+     - ``8``
+     - Number of cores to allocate per job
+   * - ``--max-concurrent``
+     - ``5``
+     - Maximum concurrent job submissions
+   * - ``--delay-ms``
+     - ``100``
+     - Delay in milliseconds between job submissions
+   * - ``--recursive``
+     - ``False``
+     - Scan subdirectories recursively
+   * - ``--verbose``
+     - ``False``
+     - Enable debug logging
+
+Example—batch submission (local)
+"""""""""""""""""""""""""""""""""
+.. code-block:: bash
+
+   # Submit all projects in a directory
+   $ python submit_batch_jobs.py --root-dir "D:\Temp\test_jobs"
+
+   # Recursive scan with custom core count
+   $ python submit_batch_jobs.py \
+       --root-dir "D:\Projects\simulations" \
+       --num-cores 16 \
+       --recursive \
+       --verbose
+
+Example output
+""""""""""""""
+.. code-block:: text
+
+   2025-11-07 10:30:15 - __main__ - INFO - Scanning D:\Temp\test_jobs for projects (recursive=False)
+   2025-11-07 10:30:15 - __main__ - INFO - Found AEDB folder: D:\Temp\test_jobs\project1.aedb
+   2025-11-07 10:30:15 - __main__ - INFO - Found AEDT file: D:\Temp\test_jobs\project2.aedt
+   2025-11-07 10:30:15 - __main__ - INFO - Using AEDB folder for project: D:\Temp\test_jobs\project1.aedb
+   2025-11-07 10:30:15 - __main__ - INFO - Using standalone AEDT file: D:\Temp\test_jobs\project2.aedt
+   2025-11-07 10:30:15 - __main__ - INFO - Found 2 project(s) to submit
+   2025-11-07 10:30:15 - __main__ - INFO - Starting batch submission of 2 project(s) to http://localhost:8080
+   2025-11-07 10:30:16 - __main__ - INFO - ✓ Successfully submitted: project1.aedb (status=200)
+   2025-11-07 10:30:16 - __main__ - INFO - ✓ Successfully submitted: project2.aedt (status=200)
+   2025-11-07 10:30:16 - __main__ - INFO - ============================================================
+   2025-11-07 10:30:16 - __main__ - INFO - Batch submission complete:
+   2025-11-07 10:30:16 - __main__ - INFO - Total projects: 2
+   2025-11-07 10:30:16 - __main__ - INFO - ✓ Successful: 2
+   2025-11-07 10:30:16 - __main__ - INFO - ✗ Failed: 0
+   2025-11-07 10:30:16 - __main__ - INFO - ============================================================
+
+How it works
+""""""""""""
+1. **Scanning phase**:
+
+   * Searches for all ``.aedb`` folders in the root directory
+   * Searches for all ``.aedt`` files in the root directory
+   * For each ``.aedb`` folder, checks if a corresponding ``.aedt`` file exists:
+
+     - If yes: Uses the ``.aedt`` file
+     - If no: Uses the ``.aedb`` folder
+
+   * Standalone ``.aedt`` files (without corresponding ``.aedb``) are also included
+
+2. **Submission phase**:
+
+   * Creates job configurations for each project
+   * Submits jobs asynchronously to the job manager REST API
+   * Limits concurrent submissions using a semaphore (default: 5)
+   * Reports success/failure for each submission
+
+3. **Results**:
+
+   * Displays a summary with total, successful, and failed submissions
+   * Logs detailed information about each submission
+
+.. note::
+   The script does **not** wait for jobs to complete, only for submission confirmation.
+   Job execution happens asynchronously in the job manager service.
+
+.. tip::
+   * Use ``--max-concurrent`` to limit load on the job manager service when submitting
+     large batches.
+   * Use ``--delay-ms`` to control the pause between submissions (default: 100ms).
+     This ensures HTTP requests are fully sent before the next submission starts.
+   * Set ``--delay-ms 0`` to disable the delay if your network is very fast and reliable.
+   * For very large batch submissions, consider increasing the timeout in the code if
+     network latency is high.
+
 Programmatic—native asyncio
 """""""""""""""""""""""""""""
 .. code-block:: python
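
The new documentation above describes the scanning, pairing, and throttled submission behaviour in prose. Below is a minimal, self-contained sketch of that behaviour, assuming hypothetical helper names (discover_projects, submit_batch) and an asyncio.sleep placeholder standing in for the actual REST call; it is not the code of submit_batch_jobs.py itself.

import asyncio
import logging
from pathlib import Path
from typing import List

log = logging.getLogger(__name__)


def discover_projects(root: Path, recursive: bool = False) -> List[Path]:
    # Pair .aedb folders with .aedt files as described above: prefer the
    # .aedt file when both exist, otherwise keep the .aedb folder, and
    # include standalone .aedt files.
    glob = root.rglob if recursive else root.glob
    aedb_dirs = [p for p in glob("*.aedb") if p.is_dir()]
    aedt_files = [p for p in glob("*.aedt") if p.is_file()]

    projects, paired = [], set()
    for aedb in aedb_dirs:
        twin = aedb.with_suffix(".aedt")
        if twin in aedt_files:
            projects.append(twin)      # prefer the .aedt file
            paired.add(twin)
        else:
            projects.append(aedb)      # fall back to the .aedb folder
    projects.extend(f for f in aedt_files if f not in paired)
    return projects


async def submit_batch(projects: List[Path], max_concurrent: int = 5, delay_ms: int = 100) -> int:
    sem = asyncio.Semaphore(max_concurrent)    # cap in-flight submissions (--max-concurrent)

    async def submit_one(project: Path) -> bool:
        async with sem:
            await asyncio.sleep(0.05)              # placeholder for the HTTP POST round-trip
            log.info("Submitted %s", project.name)
            await asyncio.sleep(delay_ms / 1000)   # pause between submissions (--delay-ms)
            return True

    results = await asyncio.gather(*(submit_one(p) for p in projects))
    return sum(results)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    found = discover_projects(Path(r"D:\Temp\test_jobs"))
    ok = asyncio.run(submit_batch(found))
    print(f"{ok}/{len(found)} submission(s) confirmed")

The semaphore caps in-flight submissions at the --max-concurrent value and the per-job sleep mirrors --delay-ms, which is the same throttling pattern the tip box above recommends tuning for large batches.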

ignore_words.txt

Lines changed: 4 additions & 0 deletions

@@ -32,3 +32,7 @@ aline
 COM
 gRPC
 Toolkits
+Cohn
+Pydantic
+pydantic
+Drc

src/pyedb/workflows/job_manager/backend/job_manager_handler.py

Lines changed: 3 additions & 2 deletions

@@ -231,8 +231,9 @@ def __init__(self, edb=None, version=None, host="localhost", port=8080):
         else:
             self.ansys_path = os.path.join(installed_versions[version], "ansysedt.exe")
         self.scheduler_type = self._detect_scheduler()
-        self.manager = JobManager(scheduler_type=self.scheduler_type)
-        self.manager.resource_limits = ResourceLimits(max_concurrent_jobs=1)
+        # Create resource limits with default values
+        resource_limits = ResourceLimits(max_concurrent_jobs=1)
+        self.manager = JobManager(resource_limits=resource_limits, scheduler_type=self.scheduler_type)
         self.manager.jobs = {}  # In-memory job store -TODO add persistence database
         # Pass the detected ANSYS path to the manager
         self.manager.ansys_path = self.ansys_path
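
The change above constructs JobManager with its ResourceLimits instead of assigning the attribute after construction. As a hedged illustration of why that ordering can matter, here is a toy class (not the real pyedb JobManager) whose concurrency gate is derived from the limits inside __init__; assigning the attribute afterwards never resizes the gate.

import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyResourceLimits:
    max_concurrent_jobs: int = 4


class ToyJobManager:
    # Illustrative only -- not the real pyedb JobManager.
    def __init__(self, resource_limits: Optional[ToyResourceLimits] = None):
        self.resource_limits = resource_limits or ToyResourceLimits()
        # The concurrency gate is built once, here, from the limits object...
        self._slots = asyncio.Semaphore(self.resource_limits.max_concurrent_jobs)

    async def run(self, job_id: str) -> None:
        async with self._slots:   # ...so later attribute writes do not resize it
            await asyncio.sleep(0.1)


# Constructor injection (the pattern used in the fix) takes effect:
strict = ToyJobManager(resource_limits=ToyResourceLimits(max_concurrent_jobs=1))

# Assigning afterwards (the old pattern) leaves the default-sized gate in place:
late = ToyJobManager()
late.resource_limits = ToyResourceLimits(max_concurrent_jobs=1)  # gate still allows 4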

src/pyedb/workflows/job_manager/backend/job_submission.py

Lines changed: 8 additions & 1 deletion

@@ -68,6 +68,7 @@
 from datetime import datetime
 import enum
 import getpass
+import hashlib
 import logging
 import os
 import platform

@@ -77,6 +78,7 @@
 import subprocess  # nosec B404
 import tempfile
 from typing import Any, Dict, List, Optional, Union
+import uuid

 from pydantic import BaseModel, Field

@@ -468,7 +470,12 @@ def __init__(self, **data):
         else:
             self.ansys_edt_path = os.path.join(list(installed_versions.values())[-1], "ansysedt.exe")  # latest
         if not self.jobid:
-            self.jobid = f"JOB_ID_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+            # Generate unique job ID using timestamp and UUID to avoid collisions
+            # when submitting multiple jobs rapidly (batch submissions)
+            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+            # Use short UUID (first 8 chars) for readability while ensuring uniqueness
+            unique_id = str(uuid.uuid4())[:8]
+            self.jobid = f"JOB_{timestamp}_{unique_id}"
         if "auto" not in data:  # user did not touch it
             data["auto"] = self.scheduler_type != SchedulerType.NONE
         self.validate_fields()
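
A small standalone illustration of the new job-ID scheme introduced above: the second-resolution timestamp alone collides for jobs created within the same second, while the short UUID suffix keeps every ID distinct.

import uuid
from datetime import datetime


def make_jobid() -> str:
    # Same scheme as the patched constructor: timestamp plus short UUID suffix.
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"JOB_{timestamp}_{str(uuid.uuid4())[:8]}"


# Before the fix every job created in the same second received the identical
# "JOB_ID_<timestamp>" value; with the suffix the IDs stay unique.
batch = [make_jobid() for _ in range(5)]
print("\n".join(batch))
assert len(set(batch)) == len(batch)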
