Skip to content

Conversation

@ankostis
Copy link

@ankostis ankostis commented Oct 3, 2019

This PR refactors the DAG-solver to prune correctly when intermediate data are given as input.

ENH: NEW DAG SOLVER

  • Pruning now works by breaking incoming provide-links
    to any given intermedediate inputs and retrofitting satisfaction detector
    to include those operations without outputs due to these broken links.

REFACT: Network class COMPILE+COMPUTE:

  • Read the doc-only c570a18 commit to understand changes.
  • Renamed/Refactored:
    --- net.steps --> net.execution_plan.
       ,-- _find_necessary_steps() ------------.
    ---+-- (old)compile() ---------------------+--> (new)compile() + _solve_dag()         
       `-- _collect_unsatisfied_operations --'
    
  • compile() became the master function invoking _solve_dag() &
    _build-execution_plan(), and do the caching.
  • WIP/refact show_layers() to allow yielding string-list
    (not just printing).
  • refact(net): move check if value in asked outputs before cache-evicting it when building DelInstructs in execution-plan; compute methods don't need outputs anymore.
  • TCs: speed up parallel/multihtread TCs by reducing delays & repetitions.
  • refact: network adopted stray functions for parallel processing - they all work on net.graph attribute; pave the way for a separate class.
  • refact: consistent use of networkx API when indexing dag.nodes views.
  • enh: add log message when deleting in parallel (in par with sequential code).
  • Raise AssertionError when invalid class in plan.
    it's a logic error, not a language type-error.
  • refact: var-renames, if-then-else simplifications, pythonisms.
  • doc: A lot!

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

were writing in text-mode in PY3. and failing as encoding error.
receiving partial inputs, needed for other operations.
+ The x2 TCs added just before are now passing.
NOTE dict are not deterministic in <PY3.6.
So this commit would not improve determinism
in those pythons.

+ build: add `boltons` dependency for ndexedSet.
+ doc: mark all set usage if affect determinism.
+ e.g. see reproducibility problem in yahoo#14.
needed to refactor the new pruning algorithm.

- expected 2 TCs fail for yet-to-be-solved yahoo#24 & yahoo#25 bugs.
override intermediate data.

More changes only for newer pruning TCs:
+ refact(test): rename graph-->netop vars for results of compose(),
  to avoid of `graph.net.graph`.
+ Explain failure modes in v1.2.4 & this merged branch (yahoo#19 + yahoo#23).
not after compose().

+ All TCs pass ok.
+ NOTE this is not yet what is described in yahoo#21.
@ankostis ankostis force-pushed the refact-new_dag_solver branch from 7e851b1 to 3ee344b Compare October 3, 2019 13:55
to pass +TC checking DeleteInst vary when inputs change.

- x4 TCs still failing, and need revamp of dag-solution.
+ Read the next doc-only commit to understand changes.
+ Renamed:
  + net.steps --> net.execution_plan.
  + (old)compile() --> _build_execution_plan()
  + _find_necessary_steps() --> (new)compile() + _solve_dag()
    compile() became the master function invoking _solve_dag &
    _build-execution_plan(), and do the caching.
+ refact(compute()): extract common tasks from sequential/parallel.
+ refact show_layers() to allow full-print, geting also string
  (not just printing), and using custom classes for representation.
+ Raise AssertionError when invalid class in plan.
  it's a logic error, not a language type-error.
Probaly unreported bug in v1.2.4 for '_neccessary_steps_cache`.
@ankostis ankostis force-pushed the refact-new_dag_solver branch 4 times, most recently from e67e57c to 31aae2f Compare October 3, 2019 15:12
+ Pruning behaves correctly also when outputs given;
  this happens by breaking incoming provide-links
  to any given intermedediate inputs.
+ Unsatisfied detection now includes those without outputs
  due to broken links (above).
+ Remove some uneeded "glue" from unsatisfied-detection code,
  leftover from previous compile() refactoring.
+ Renamed satisfiable --> satisfied.
+ Improved unknown output requested raise-message.
+ x3 TCs PASS, x1 in yahoo#24 and the first x2 in yahoo#25.
- 1x TCs in yahoo#25 still FAIL, and need "Pinning" of given-inputs
  (the operation MUST and MUST NOT run in these cases).
@ankostis ankostis force-pushed the refact-new_dag_solver branch from 31aae2f to 0830b7c Compare October 3, 2019 17:13
ankostis added a commit to ankostis/graphtik that referenced this pull request Oct 4, 2019
…must run

+ Insert "PinInstructions" in the execution-plan to avoid overwritting.
+ Add `_overwrite_collector` in `compose()` to collect re-calculated values.
+ FIX the last TC in yahoo#25.
@ankostis ankostis force-pushed the refact-new_dag_solver branch from 008d501 to b870451 Compare October 4, 2019 02:37
…must run

- WIP: PARALLEL execution not adding PINS!
+ Insert "PinInstructions" in the execution-plan to avoid overwritting.
+ Add `_overwrite_collector` in `compose()` to collect re-calculated values.
+ FIX the last TC in yahoo#25.
@ankostis ankostis force-pushed the refact-new_dag_solver branch from b870451 to 0dc1293 Compare October 4, 2019 02:41
@codecov-io
Copy link

codecov-io commented Oct 4, 2019

Codecov Report

Merging #26 into master will increase coverage by 0.44%.
The diff coverage is 85.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #26      +/-   ##
==========================================
+ Coverage   77.87%   78.31%   +0.44%     
==========================================
  Files           5        5              
  Lines         348      392      +44     
==========================================
+ Hits          271      307      +36     
- Misses         77       85       +8
Impacted Files Coverage Δ
graphkit/functional.py 93.82% <100%> (+0.07%) ⬆️
graphkit/base.py 75.34% <69.23%> (-4.03%) ⬇️
graphkit/network.py 73.16% <86.02%> (+2.95%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e70718b...fb1b074. Read the comment docs.

@ankostis ankostis force-pushed the refact-new_dag_solver branch from 7c94891 to 91c7b11 Compare October 4, 2019 10:53
- STILL buggy PIN on PARALLEL, 2 DISABLED TCs FAIL:
  - test_pruning_with_given_intermediate_and_asked_out()
  - test_unsatisfied_operations_same_out()
+ move check if value in asked outputs before cache-evicting it
  in build-execution-plan time - compute methods
  don't need outputs anymore.
+ test: speed up parallel/multihtread TCs
  by reducing delays & repetitions.
+ refact: network rightfully adopted stray functions
  for parallel processing - they all worke on the net.graph,
+ upd: networkx api by indexing on `dag.nodes` views.
+ enh: add log message when deleting in parallel
  (in par with sequential code).
+ refact: var-renames, if-then-else simplifications, pythonisms.
+ doc: A lot!
@ankostis ankostis force-pushed the refact-new_dag_solver branch from 91c7b11 to 06f6554 Compare October 4, 2019 21:25
@ankostis ankostis force-pushed the refact-new_dag_solver branch from 4603bbb to 1cc733e Compare October 4, 2019 21:35
ankostis added a commit to ankostis/graphtik that referenced this pull request Oct 4, 2019
- WIP: x4 TCs FAIL and still not discovered th bug :-(
+ BUT ALL+AUGMENTED PARALLEL TCs pass
  (yahoo#26 were failing some)
+ refact: net stores also `pruned_dag` (not only `steps`).
+ refact: _solve_dag() --> _prune_dag().
+ doc: +a lot.
+ TODO: store pruned_dag in own ExePlan class.
@ankostis
Copy link
Author

ankostis commented Oct 4, 2019

Can merge now:

  • STILL buggy PIN on PARALLEL, but code in 2 failing TCs is DISABLED:
    • test_pruning_with_given_intermediate_and_asked_out()
    • test_unsatisfied_operations_same_out()
  • refact(net): move check if value in asked outputs before cache-evicting it when building DelInstructs in execution-plan; compute methods don't need outputs anymore.
  • TCs: speed up parallel/multihtread TCs by reducing delays & repetitions.
  • refact: network adopted stray functions for parallel processing - they all work on net.graph attribute; pave the way for a separate class.
  • refact: consistent use of networkx API when indexing dag.nodes views.
  • enh: add log message when deleting in parallel (in par with sequential code).
  • refact: var-renames, if-then-else simplifications, pythonisms.
  • doc: A lot!

ankostis referenced this pull request in syamajala/graphkit Oct 8, 2019
ankostis added a commit to ankostis/graphtik that referenced this pull request Oct 8, 2019
many commits ago.

Never got it bc TC were not checking merges!
ankostis added a commit to ankostis/graphtik that referenced this pull request Oct 8, 2019
many commits ago.

Never got it bc TC were not checking merges!
@ankostis
Copy link
Author

ankostis commented Oct 8, 2019

$ date
Tue 08 Oct 2019 07:00:21 PM EEST


$ cloc --by-file-by-lang  graphkit 
       5 text files.
       5 unique files.                              
       0 files ignored.

github.com/AlDanial/cloc v 1.82  T=0.01 s (525.9 files/s, 119687.0 lines/s)
------------------------------------------------------------------------------------
File                                  blank        comment           code
------------------------------------------------------------------------------------
graphkit/network.py                     145            260            260
graphkit/functional.py                   51             75             89
graphkit/base.py                         38             86             84
graphkit/__init__.py                      3              3              5
graphkit/modifiers.py                     8             29              2
------------------------------------------------------------------------------------
SUM:                                    245            453            440
------------------------------------------------------------------------------------

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           5            245            453            440
-------------------------------------------------------------------------------
SUM:                             5            245            453            440
-------------------------------------------------------------------------------

@ankostis
Copy link
Author

Just pushed x1 TC for multithreading-executions (originally given n huyng#1 and repeated in #31), it passes ok.

- WIP: x4 TCs FAIL and still not discovered th bug :-(
+ BUT ALL+AUGMENTED PARALLEL TCs pass
  (yahoo#26 were failing some)
+ refact: net stores also `pruned_dag` (not only `steps`).
+ refact: _solve_dag() --> _prune_dag().
+ doc: +a lot.
+ TODO: store pruned_dag in own ExePlan class.
... bugged in the opening commit d403783 of this PR, and
discovered 68(!) commits later, and all that time had to live
with x4 broken TCs with asked-outputs.
+ Partial fix deterministic results (yahoo#22-2.4.3i).
bc subgraph was taken on plain string outputs.

+ minor upd err-msg terminology.
due to bad node check, evicting parallels it nevered kicked in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: pruning when output given unjustly drops ancestors

2 participants