@hppritcha
Member

No description provided.

Matthew-Whitlock and others added 30 commits October 9, 2025 15:10
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
Fork Sync: Update from parent repository
This error is also displayed in cases where files or directories do not
exist and is not only caused by missing permissions.

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
The standard has been updated to allow the log and
tool_connected upcalls to return status codes.
Update here to support them.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Need to do it based on PMIx capabilities

Signed-off-by: Ralph Castain <rhc@pmix.org>
Fork Sync: Update from parent repository
There was an initial thought that we would generate some
uber-list of defined flags for capabilities across all PMIx
versions, and then indicate which ones were supported by
this particular version by OR'ing them together into
some more general value. This has proven unworkable as you
get into a giant game of bit-counting to create the definitions.
Instead, we only define flags that this specific version
supports - thus, the value of the individual flag is irrelevant
and no general value is required.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Ralph Castain <rhc@pmix.org>
Fork Sync: Update from parent repository
Allow the target node list to follow the ordering inside a provided hostfile
and dash-host specification by not assigning a bookmark based on the DVM job.

Add support for the missing default-hostfile cmd line option. We have the support
for the user to specify it via MCA param, but somehow we lost the integration
to pick it up off of the prte and prterun cmd lines.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Fork Sync: Update from parent repository
Need to clear the character arrays between calculating
binding location for each proc as snprintf doesn't
terminate the string.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Fork Sync: Update from parent repository
PPR placement policy requests are uniform - i.e., the specified
number of procs must be placed on every object of the directed
type. When the request includes a cpu/proc directive, then there
must also be enough CPUs to meet the request on every object.

When that isn't the case, then we need to error out and not
just place the proc without binding it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
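The uniformity rule in the PPR commit above can be sketched as a simple pre-check. This is only an illustration under stated assumptions: `example_object_t` and `ppr_request_fits` are hypothetical names, not PRRTE types or functions, and the real mapper tracks far more state than a single CPU count per object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one target object (e.g., a package or
 * NUMA domain). This is NOT a PRRTE type. */
typedef struct {
    int ncpus_available;
} example_object_t;

/* Return 1 if `ppr` procs, each bound to `cpus_per_proc` CPUs, fit on
 * EVERY object in the array. Return 0 as soon as one object falls
 * short, which is the case where the caller should error out rather
 * than place the proc unbound. */
static int ppr_request_fits(const example_object_t *objs, size_t nobjs,
                            int ppr, int cpus_per_proc)
{
    assert(objs != NULL || nobjs == 0);
    assert(ppr > 0 && cpus_per_proc > 0);

    for (size_t i = 0; i < nobjs; i++) {
        if (objs[i].ncpus_available < ppr * cpus_per_proc) {
            return 0;
        }
    }
    return 1;
}
```

The point of checking all objects up front is that a PPR request either succeeds everywhere or fails as a whole; there is no partial placement.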
Fork Sync: Update from parent repository
If we are using the seq or rankfile mapper and have multiple
apps on the cmd line, then allow the mappers to compute
their own num procs if one or more are not given.

Signed-off-by: Ralph Castain <rhc@pmix.org>
The empty nodes were not properly being added to the list
of names to be used by the mapper.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Per note in the OMPI project, at least one compiler family is removing the "sprintf" function. Replace all uses of that function with the safer "snprintf" version.

Signed-off-by: Ralph Castain <rhc@pmix.org>
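As a minimal sketch of the sprintf-to-snprintf replacement described in the commit above: the helper name `format_proc_name` and the "host:rank" format are assumptions made for illustration and do not come from the PRRTE sources. Only the standard C `snprintf` behavior is relied upon.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from PRRTE): format "<host>:<rank>" into a
 * caller-supplied buffer. Returns 0 on success, -1 if the output was
 * truncated or an encoding error occurred. */
static int format_proc_name(char *buf, size_t buflen,
                            const char *host, int rank)
{
    assert(buf != NULL && buflen > 0 && host != NULL);

    /* Unlike sprintf, snprintf writes at most buflen bytes (including
     * the terminating NUL) and returns the length the complete output
     * would have had, so truncation can be detected. */
    int n = snprintf(buf, buflen, "%s:%d", host, rank);
    if (n < 0 || (size_t) n >= buflen) {
        return -1;
    }
    return 0;
}
```

Checking the return value against the buffer size is what makes the replacement safer in practice; calling `snprintf` and ignoring the result only prevents the overflow, not the silent truncation.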
Fork Sync: Update from parent repository
When a timeout is specified and the primary job is timed-out,
then we need to ensure we also report and kill any child jobs
it started. This includes reporting any requested stack
traces.

Also allow inheritance of output directives like tag and timestamp.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Port the "launching-apps" section from the OMPI docs over
to PRRTE since it specifically deals with prterun usage.
Add some updates about gridengine support courtesy of
open-mpi/ompi#13450.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Fork Sync: Update from parent repository
Ensure the IDs provided are interpreted as core and not
hwt values. Properly error out when an ID is provided
that does not exist on the node, and note that this usually
is because the IDs address hwt's while we are treating cores
as CPUs.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Fork Sync: Update from parent repository
…embership

The PMIx_Group_construct API itself is not order sensitive
in the provided proc array - i.e., we ignore that order when executing the
operation. This allows each caller to provide the proc array in arbitrary
order - which is advantageous for most libraries.

However, some users may need us to return a specific order of the procs in the
final membership. Allow them to specify the order in a new attribute. This can be specified by individual processes
or by namespace (with a wildcard rank). If multiple participants
provide this attribute, then the values must match - i.e., the
desired final membership order must be identical.

Signed-off-by: Ralph Castain <rhc@pmix.org>
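The matching requirement in the commit above (every participant that supplies an ordering must supply the same one) can be sketched as follows. The type `example_proc_t`, the use of -1 as a wildcard rank, and the function `orders_match` are all hypothetical and stand in for whatever PMIx actually uses; this does not call or reproduce any PMIx API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical process identifier, loosely modeled on a
 * (namespace, rank) pair. This is NOT the PMIx pmix_proc_t.
 * A rank of -1 is used here as an assumed wildcard marker. */
typedef struct {
    char nspace[32];
    int rank;
} example_proc_t;

/* Return 1 if two requested membership orders are identical entry by
 * entry (same length, same namespace and rank at each position),
 * otherwise 0. */
static int orders_match(const example_proc_t *a, size_t na,
                        const example_proc_t *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));

    if (na != nb) {
        return 0;
    }
    for (size_t i = 0; i < na; i++) {
        if (strcmp(a[i].nspace, b[i].nspace) != 0 || a[i].rank != b[i].rank) {
            return 0;
        }
    }
    return 1;
}
```

Note that the comparison is positional: two lists containing the same procs in a different order do not match, which is the behavior the commit message asks for.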
Print the cpulist itself, and not its address

Signed-off-by: Ralph Castain <rhc@pmix.org>
Fork Sync: Update from parent repository
Use the hwloc synthetic topology string as the signature
instead of our custom attempt at counting the number of each
type of object - the synthetic string retains some hierarchical
info and hopefully does a little better job of detecting when
hetero nodes are in use.

Signed-off-by: Ralph Castain <rhc@pmix.org>
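To illustrate the idea in the commit above, the sketch below treats each node's signature as an opaque string and reports whether any node differs from the first. The function name `topologies_differ` and the sample strings in the usage note are assumptions for illustration; in practice the strings would come from hwloc (for example via `hwloc_topology_export_synthetic`), which this sketch does not call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from PRRTE): given one topology signature
 * string per node, return 1 if at least one node's signature differs
 * from the first node's, i.e. the allocation looks heterogeneous.
 * Return 0 if all signatures are identical or the list is empty. */
static int topologies_differ(const char *const *sigs, size_t n)
{
    assert(sigs != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (strcmp(sigs[0], sigs[i]) != 0) {
            return 1;
        }
    }
    return 0;
}
```

For example, two nodes reporting "Package:2 Core:8 PU:2" compare equal, while a third reporting "Package:1 Core:16 PU:1" would be flagged, even though a plain count of cores (16 in both cases) would not distinguish them.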
rhc54 and others added 5 commits November 8, 2025 08:40
Update the MCA param help message to clarify what the param
does and what values it supports. Clean up an error where we
would overwrite the resulting list of signals to forward.
Clean up the return value so we don't generate spurious
error log output. Provide verbose output showing the
signals being forwarded.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Further improve automatic handling of hetero nodes
by making the non-symmetric signature unique, thereby
forcing collection of the full topology from each
such node. Fix an error in the topology retrieval
procedure whereby we double-counted cached nodes,
thereby causing us to quit collecting topologies early.

Signed-off-by: Ralph Castain <rhc@pmix.org>
Need to init the ess framework to have the signal forwarding list initialized

Signed-off-by: Ralph Castain <rhc@pmix.org>
@hppritcha requested a review from jsquyres November 12, 2025 21:19
@hppritcha changed the title from "Sync ompi main with master bff13fb" to "Sync ompi main with master at bff13fb" Nov 12, 2025
@jsquyres merged commit d083d73 into open-mpi:ompi_main Nov 14, 2025
17 checks passed
5 participants