4.6. ws3 Parallel Model Building + HiGHS (highspy) — Hands-On

This notebook shows how to:

  1. Build a ws3 optimization problem model in both serial and parallel modes.

  2. Solve the problem with HiGHS via the highspy bindings.

  3. Measure wall time to find the parallel parameter sweet spot.

Note that this notebook builds and solves a ws3 optimization problem using up to 64 cores. If fewer than 64 cores are available in the environment where you run this code, the runs with higher worker counts will not behave as intended and their timings will not be comparable to the ones shown here. Interpret the output accordingly.
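
To make the core-count caveat concrete, here is a minimal sketch that filters a list of worker counts down to those the current machine can actually serve. The helper name `usable_worker_counts` is hypothetical (it is not part of ws3), and the sketch uses only the Python standard library.

```python
import os

def usable_worker_counts(candidates, available=None):
    """Return the candidate worker counts that do not exceed the available cores.

    Hypothetical helper for illustration; not part of ws3.
    """
    if available is None:
        if hasattr(os, "sched_getaffinity"):
            # On Linux this honours the CPU affinity mask of the current process.
            available = len(os.sched_getaffinity(0))
        else:
            # Fallback for platforms without sched_getaffinity.
            available = os.cpu_count() or 1
    return [w for w in candidates if w <= available]

sweep = [1, 2, 4, 8, 16, 32, 64]  # worker counts tested in this notebook
print(usable_worker_counts(sweep))
```

Worker counts that are filtered out can still be run, but their timings mostly measure oversubscription rather than parallel speedup.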

4.6.1. Set up environment

Below we (optionally) install the ws3 package from the local source code in this repository, using the -e flag so that the package is installed in editable mode (i.e., any changes you make to the source code take effect the next time you run the notebook). This is not necessary if you have already installed ws3 using pip or another method.

Note that the code in this notebook has been tested with the version of the ws3 package that is included in this repository.

Set auto-reload to reload modules when they are changed.

[1]:
%load_ext autoreload
%autoreload 2
[2]:
clobber_ws3 = False
if clobber_ws3:
    %pip uninstall -y ws3
    %pip install -e ..

4.6.2. Step 0: Setup

On Linux, fork is the default process start method and the fastest. On Windows and macOS, or in scripts that must be portable across platforms, use spawn instead.

[3]:
import os, sys, time, platform, cProfile, pstats, io
import ws3
from ws3 import opt, forest
import highspy


# (Optional) force a portable start method in your launcher if needed:
# ws3.forest.MP_CONTEXT = "spawn"
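
If you are unsure which start methods your platform supports, the standard library can tell you. The sketch below uses only `multiprocessing` and does not touch ws3; how ws3 consumes its `MP_CONTEXT` setting is not shown here, so treat this purely as a way to inspect the environment.

```python
import multiprocessing as mp

# Supported start methods on this platform; per the Python docs,
# the first entry in the list is the platform default.
methods = mp.get_all_start_methods()
print("supported start methods:", methods)
print("platform default:", methods[0])

# A context object selects a start method locally,
# without changing the global default for the whole process.
ctx = mp.get_context("spawn")
print("context start method:", ctx.get_start_method())
```

The spawn method is available on every platform, which is why it is the portable choice; it starts each worker from a fresh interpreter and is therefore slower to initialize than fork.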

4.6.3. Step 1: Load data and configure model

Set some reasonable model parameters.

[4]:
base_year = 2020    # base year for the problem
horizon = 20        # number of periods in the simulation horizon
period_length = 10  # period length (in years)
max_age = 1000      # maximum age of a stand (in years)
tvy_name = "totvol" # name for total volume yield component

Create a new ForestModel object and import a model dataset from the data directory.

[5]:
fm = ws3.forest.ForestModel(model_name="tsa24",
                            model_path="data/woodstock_model_files_tsa24",
                            base_year=base_year,
                            horizon=horizon,
                            period_length=period_length,
                            max_age=max_age)
fm.import_landscape_section()
fm.import_areas_section(convert_periods_to_years=period_length)
fm.import_yields_section(convert_periods_to_years=period_length)
fm.import_actions_section(convert_periods_to_years=period_length)
fm.import_transitions_section(convert_periods_to_years=period_length)
fm.initialize_areas()
fm.add_null_action()
fm.reset_actions()

fm.actions["harvest"].is_harvest = True # set harvest action to be a harvest action (needed for `cmp_c_z` to work correctly)

[6]:
import time
import functools
from util import cmp_c_z, cmp_c_caa, cmp_c_ci
from util import compile_scenario, plot_scenario
[7]:
acodes = ("null", "harvest")
expr = "0.85 * totvol"

# Row builders / coefficient functions (project-specific)
coeff_funcs = {
    "z": functools.partial(cmp_c_z, expr=expr),
    "cflw_hv": functools.partial(cmp_c_caa, expr=expr, acodes=["harvest"]),
    "cflw_ha": functools.partial(cmp_c_caa, expr="1.", acodes=["harvest"]),
    "cgen_gs" : functools.partial(cmp_c_ci, yname=tvy_name, mask=None)
}

# Flow-constraint epsilons and reference period
cflw_e = {
    "cflw_hv": ({p:0.05 for p in fm.periods}, 1),
    "cflw_ha": ({p:0.05 for p in fm.periods}, 1),
}

# General bounds (per period)
gs_lb_rhs = fm.inventory(0, "totvol") * 0.90 # 90% of initial growing stock level
cgen_data = {
    "cgen_gs": {"lb":{10:gs_lb_rhs}, "ub":{10:999999999.}}}

Build and solve the problem with 1 core. Before running the rest of this notebook, consider opening an interactive system terminal so you can watch memory allocation and CPU core activity while the model builds and solves. This gives a real-time view of system resource utilization under different parameter values.

[8]:
workers = 1
[9]:
t0 = time.perf_counter()
problem = fm.add_problem(
    name="test",
    coeff_funcs=coeff_funcs,
    cflw_e=cflw_e,
    cgen_data=cgen_data,
    acodes=acodes,
    sense=ws3.opt.SENSE_MAXIMIZE,
    mask=None,
    workers=workers,
    verbose=True
)
t1 = time.perf_counter()
print(f"Build time: {t1 - t0:.2f}s")
add_problem: build problem
generate trees using 1 workers
process trees
_bld_p_m1: build problem
_bld_p_m1: done building problem
add_problem: compile flow constraints
_cmp_cflw_m1: phase 1
_cmp_cflw_m1: phase 2
_cmp_cflw_m1: phase 3
add_problem: compile general constraints
Build time: 205.91s

Now we have a reference time for building the problem in serial mode.
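
The same `time.perf_counter()` pattern is repeated for every worker count in this notebook. As an optional convenience, it can be wrapped in a small context manager. The name `timed` and the `timings` dictionary are hypothetical, and `time.sleep` stands in for the real `fm.add_problem(...)` call so that the sketch runs on its own.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Print the wall time of the enclosed block and optionally record it."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - t0
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f}s")

timings = {}
with timed("build (workers=1)", timings):
    time.sleep(0.05)  # stand-in for fm.add_problem(...)
```

Collecting the timings in one dictionary makes it easier to compare worker counts at the end of a sweep.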

We will also solve the problem with 1 core and compile and display the solution just to make sure the model is working correctly before proceeding to test various parallel model parameter values.

[10]:
problem.solve(verbose=True, threads=workers)

Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
WARNING: LP matrix packed vector contains 3 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 4.54747e-13] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [7.27596e-12, 7.27596e-12] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 1.45519e-11] less than or equal to 1e-09: ignored
LP   has 7778 rows; 461956 cols; 10014103 nonzeros
Coefficient ranges:
  Matrix [1e-03, 1e+07]
  Cost   [3e+00, 2e+07]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+09]
Presolving model
4476 rows, 458658 cols, 9697481 nonzeros  3s
Dependent equations search running on 4398 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.02s (limit = 1000.00s)
4475 rows, 458658 cols, 9413474 nonzeros  5s
Presolve : Reductions: rows 4475(-3303); columns 458658(-3298); elements 9413474(-600629)
Solving the presolved LP
WARNING: Number of threads available = 1 < 8 = Simplex concurrency to be used: Parallel performance may be less than anticipated
Using EKK parallel dual simplex solver - SIP with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 7s
         72    -1.0611783165e+10 Pr: 4434(1.52937e+09); Du: 0(1.72505e-07) 133s
        144    -8.3945478108e+09 Pr: 4386(2.55624e+08) 160s
        447    -7.5791507202e+09 Pr: 4186(1.16792e+08) 165s
       1374    -6.4218954439e+09 Pr: 3702(1.01576e+08) 170s
       2441    -5.3558475700e+09 Pr: 3528(6.53618e+07) 176s
       3299    -4.7823649957e+09 Pr: 3415(1.09355e+08) 181s
       4183    -3.9937973149e+09 Pr: 3586(9.95246e+07) 186s
       5945    -3.3607201147e+09 Pr: 3442(1.1896e+08) 191s
       8392    -2.4218065109e+09 Pr: 2861(6.12309e+07); Du: 0(3.2306e-08) 197s
      10886    -1.9272268074e+09 Pr: 2453(3.86569e+07) 202s
      12740    -1.7360223742e+09 Pr: 2455(2.60278e+07) 208s
      15022    -1.7049224331e+09 Pr: 2251(6.11247e+07); Du: 0(8.4082e-08) 213s
      16937    -1.6728486671e+09 Pr: 2153(3.21198e+07) 218s
      18965    -1.6112238062e+09 Pr: 2182(2.64841e+07) 224s
      21024    -1.5777114067e+09 Pr: 1900(2.30809e+07) 229s
      23368    -1.5647453176e+09 Pr: 1867(8.2494e+07); Du: 0(2.13932e-08) 235s
      25893    -1.5499435701e+09 Pr: 1790(1.33966e+07) 240s
      28123    -1.4933861118e+09 Pr: 1601(5.46795e+06); Du: 0(4.15042e-09) 246s
      29835    -1.4659928811e+09 Pr: 1221(3.05762e+06); Du: 0(7.24469e-09) 252s
      31567    -1.4461292456e+09 Pr: 506(1.81901e+06); Du: 0(1.28772e-08) 259s
      32297    -1.4441396268e+09 Pr: 174(36813.7); Du: 0(8.41078e-08) 264s
      32478    -1.4440246751e+09 Pr: 0(0); Du: 0(5.81036e-07) 265s
WARNING: Using concurrency of 1 for parallel strategy rather than minimum number (2) specified in options
Using EKK primal simplex solver
  Iteration        Objective     Infeasibilities num(sum)
      32478    -1.4440293920e+09 Pr: 0(0); Du: 3663(0.00175359) 265s
      32507    -1.4440246751e+09 Pr: 0(0); Du: 0(8.1066e-06) 266s
Solving the original LP from the solution after postsolve
Model status        : Optimal
Simplex   iterations: 32507
Objective value     : -1.4440246751e+09
P-D objective error :  1.6510699791e-15
HiGHS run time      :        266.48
[11]:
if problem.status() != ws3.opt.STATUS_OPTIMAL:
    print('Model not optimal.')
    df = None
else:
    sch = fm.compile_schedule(problem)
    fm.apply_schedule(sch,
        force_integral_area=False,
        override_operability=False,
        fuzzy_age=False,
        recourse_enabled=False,
        verbose=False,
        compile_c_ycomps=True
    )
    df = compile_scenario(fm)
    print(df)
    fig, ax = plot_scenario(df)
    period            oha           ohv           ogs
0        1  323917.431871  7.202118e+07  6.927929e+08
1        2  307721.560278  6.842012e+07  6.798218e+08
2        3  307721.560278  6.842012e+07  6.760743e+08
3        4  307721.560278  6.842012e+07  6.789138e+08
4        5  340113.303465  6.842012e+07  6.827610e+08
5        6  340113.303465  6.842012e+07  6.854458e+08
6        7  340113.303465  6.842012e+07  6.901670e+08
7        8  333083.385036  6.842012e+07  6.975081e+08
8        9  307721.560278  6.842012e+07  7.018079e+08
9       10  307721.560278  6.842012e+07  6.997713e+08
10      11  307721.560278  7.562224e+07  6.761588e+08
11      12  340113.303465  7.562224e+07  6.450464e+08
12      13  340113.303465  7.562224e+07  6.144736e+08
13      14  340113.303465  7.562224e+07  5.893272e+08
14      15  340113.303465  7.562224e+07  5.739232e+08
15      16  340113.303465  7.562224e+07  5.680156e+08
16      17  340113.303465  7.562224e+07  5.648146e+08
17      18  340113.303465  7.562224e+07  5.603040e+08
18      19  340113.303465  7.562224e+07  5.530622e+08
19      20  340113.303465  7.562224e+07  5.426653e+08
../_images/examples_023_ws3_model_example-optimize-parallel_20_1.png
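
As a quick sanity check on this solution, we can confirm that the harvested area (oha) and harvested volume (ohv) stay within the flow-constraint tolerance. The sketch below assumes that the `cflw_e` settings defined earlier (epsilon 0.05, reference period 1) mean a band of plus or minus 5% around the period-1 value, which is consistent with the printed output. The values are copied from the table above, and the helper `within_flow_band` is hypothetical.

```python
# Harvested volume and area per period, copied from the printed table.
ohv = [7.202118e07] + [6.842012e07] * 9 + [7.562224e07] * 10
oha = ([323917.431871] + [307721.560278] * 3 + [340113.303465] * 3
       + [333083.385036] + [307721.560278] * 3 + [340113.303465] * 9)

def within_flow_band(series, eps=0.05, ref_index=0, rtol=1e-6):
    """Check that every value lies within +/- eps of the reference value.

    rtol absorbs the rounding of the printed (7 significant digit) values.
    """
    ref = series[ref_index]
    lo, hi = (1 - eps) * ref, (1 + eps) * ref
    return all(lo * (1 - rtol) <= v <= hi * (1 + rtol) for v in series)

print("ohv within band:", within_flow_band(ohv))
print("oha within band:", within_flow_band(oha))
print("ohv min/ref:", round(min(ohv) / ohv[0], 4))
print("ohv max/ref:", round(max(ohv) / ohv[0], 4))
```

In this solution the harvest volume sits at the lower edge of the band for periods 2 to 10 and at the upper edge from period 11 onward, which suggests the flow constraints are binding.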

Rebuild and solve the same problem, but with 2 cores.

[12]:
workers = 2
[13]:
t0 = time.perf_counter()
problem = fm.add_problem(
    name="test",
    coeff_funcs=coeff_funcs,
    cflw_e=cflw_e,
    cgen_data=cgen_data,
    acodes=acodes,
    sense=ws3.opt.SENSE_MAXIMIZE,
    mask=None,
    workers=workers,
    verbose=True
)
t1 = time.perf_counter()
print(f"Build time: {t1 - t0:.2f}s")
add_problem: build problem
generate trees using 2 workers
process trees
_bld_p_m1: build problem
_bld_p_m1: done building problem
add_problem: compile flow constraints
_cmp_cflw_m1: phase 1
_cmp_cflw_m1: phase 2
_cmp_cflw_m1: phase 3
add_problem: compile general constraints
Build time: 224.74s
[14]:
problem.solve(verbose=True, threads=workers)

Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
WARNING: LP matrix packed vector contains 3 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 4.54747e-13] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [7.27596e-12, 7.27596e-12] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 1.45519e-11] less than or equal to 1e-09: ignored
LP   has 7778 rows; 461956 cols; 10014103 nonzeros
Coefficient ranges:
  Matrix [1e-03, 1e+07]
  Cost   [3e+00, 2e+07]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+09]
Presolving model
4476 rows, 458658 cols, 9697481 nonzeros  3s
Dependent equations search running on 4398 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.02s (limit = 1000.00s)
4475 rows, 458658 cols, 9413474 nonzeros  5s
Presolve : Reductions: rows 4475(-3303); columns 458658(-3298); elements 9413474(-600629)
Solving the presolved LP
WARNING: Number of threads available = 2 < 8 = Simplex concurrency to be used: Parallel performance may be less than anticipated
Using EKK parallel dual simplex solver - SIP with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 7s
         62    -9.3778657921e+09 Pr: 4412(6.28471e+08) 146s
        127    -8.1932871410e+09 Pr: 4203(4.61045e+08); Du: 0(9.41877e-08) 158s
        489    -7.4080343917e+09 Pr: 4081(7.61578e+07); Du: 0(2.55574e-09) 164s
       1788    -5.9517248380e+09 Pr: 3633(8.25822e+07); Du: 0(2.98644e-08) 169s
       2754    -5.1174280373e+09 Pr: 3403(9.00023e+07); Du: 0(3.30517e-08) 175s
       3935    -4.0599762397e+09 Pr: 3524(9.52876e+07) 181s
       6048    -3.1846461666e+09 Pr: 3531(1.18075e+08) 186s
       8802    -2.2111406738e+09 Pr: 2864(3.54672e+07) 191s
      11645    -1.8548556807e+09 Pr: 2452(6.9844e+07) 196s
      14859    -1.6961273901e+09 Pr: 2289(2.0375e+07) 202s
      17684    -1.6239667123e+09 Pr: 2082(1.86446e+07) 207s
      20715    -1.5689207037e+09 Pr: 1775(1.85489e+07); Du: 0(8.79974e-09) 212s
      24110    -1.5624119647e+09 Pr: 1935(3.77035e+07) 218s
      27254    -1.4958470606e+09 Pr: 1742(7.29301e+06); Du: 0(3.48364e-08) 224s
      29916    -1.4647766851e+09 Pr: 1188(5.49377e+06); Du: 0(3.24601e-08) 229s
      31827    -1.4461284711e+09 Pr: 510(337654); Du: 0(6.59284e-08) 236s
      32675    -1.4440246751e+09 Pr: 0(0); Du: 0(4.25428e-09) 240s
Solving the original LP from the solution after postsolve
Model status        : Optimal
Simplex   iterations: 32675
Objective value     : -1.4440246751e+09
P-D objective error :  2.0638374739e-15
HiGHS run time      :        240.43

Rebuild and solve the same problem, but with 4 cores.

[15]:
workers = 4
[16]:
t0 = time.perf_counter()
problem = fm.add_problem(
    name="test",
    coeff_funcs=coeff_funcs,
    cflw_e=cflw_e,
    cgen_data=cgen_data,
    acodes=acodes,
    sense=ws3.opt.SENSE_MAXIMIZE,
    mask=None,
    workers=workers,
    verbose=True
)
t1 = time.perf_counter()
print(f"Build time: {t1 - t0:.2f}s")
add_problem: build problem
generate trees using 4 workers
process trees
_bld_p_m1: build problem
_bld_p_m1: done building problem
add_problem: compile flow constraints
_cmp_cflw_m1: phase 1
_cmp_cflw_m1: phase 2
_cmp_cflw_m1: phase 3
add_problem: compile general constraints
Build time: 180.17s
[17]:
problem.solve(verbose=True, threads=workers)

Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
WARNING: LP matrix packed vector contains 1 |values| in [7.27596e-12, 7.27596e-12] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 3 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 4.54747e-13] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 1.45519e-11] less than or equal to 1e-09: ignored
LP   has 7778 rows; 461956 cols; 10014103 nonzeros
Coefficient ranges:
  Matrix [1e-03, 1e+07]
  Cost   [3e+00, 2e+07]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+09]
Presolving model
4476 rows, 458658 cols, 9697481 nonzeros  3s
Dependent equations search running on 4398 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.02s (limit = 1000.00s)
4475 rows, 458658 cols, 9413474 nonzeros  5s
Presolve : Reductions: rows 4475(-3303); columns 458658(-3298); elements 9413474(-600629)
Solving the presolved LP
WARNING: Number of threads available = 4 < 8 = Simplex concurrency to be used: Parallel performance may be less than anticipated
Using EKK parallel dual simplex solver - SIP with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 7s
         76    -1.0090968799e+10 Pr: 4435(5.65204e+08) 132s
        134    -8.3735976761e+09 Pr: 4198(2.35676e+08) 142s
        551    -7.3666232510e+09 Pr: 4085(7.74913e+07) 148s
       1881    -5.9582324420e+09 Pr: 3658(7.05966e+07) 153s
       3044    -5.0283147476e+09 Pr: 3442(9.61591e+07) 159s
       3997    -4.0718214698e+09 Pr: 3463(8.85617e+07) 164s
       6680    -3.0293157251e+09 Pr: 3469(1.21311e+08) 169s
      10081    -2.0157907527e+09 Pr: 2749(7.67172e+07) 175s
      13365    -1.7283759175e+09 Pr: 2391(4.09505e+07) 180s
      16623    -1.6795285974e+09 Pr: 2133(3.02162e+07) 185s
      19761    -1.5877275648e+09 Pr: 2154(1.78827e+07) 191s
      22848    -1.5680189680e+09 Pr: 1940(5.94837e+07); Du: 0(7.01371e-08) 196s
      26728    -1.4953484146e+09 Pr: 1641(4.0448e+06); Du: 0(6.26156e-08) 202s
      29335    -1.4647734459e+09 Pr: 1212(2.73558e+06); Du: 0(2.48074e-07) 207s
      31201    -1.4461285715e+09 Pr: 528(675780) 213s
      32064    -1.4440246751e+09 Pr: 0(0); Du: 0(3.44383e-09) 216s
Solving the original LP from the solution after postsolve
Model status        : Optimal
Simplex   iterations: 32064
Objective value     : -1.4440246751e+09
P-D objective error :  3.7974609520e-15
HiGHS run time      :        217.26

Rebuild and solve the same problem, but with 8 cores.

[18]:
workers = 8
[19]:
t0 = time.perf_counter()
problem = fm.add_problem(
    name="test",
    coeff_funcs=coeff_funcs,
    cflw_e=cflw_e,
    cgen_data=cgen_data,
    acodes=acodes,
    sense=ws3.opt.SENSE_MAXIMIZE,
    mask=None,
    workers=workers,
    verbose=True
)
t1 = time.perf_counter()
print(f"Build time: {t1 - t0:.2f}s")
add_problem: build problem
generate trees using 8 workers
process trees
_bld_p_m1: build problem
_bld_p_m1: done building problem
add_problem: compile flow constraints
_cmp_cflw_m1: phase 1
_cmp_cflw_m1: phase 2
_cmp_cflw_m1: phase 3
add_problem: compile general constraints
Build time: 199.71s
[20]:
problem.solve(verbose=True, threads=workers)

Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
WARNING: LP matrix packed vector contains 1 |values| in [7.27596e-12, 7.27596e-12] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 3 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 4.54747e-13] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 1.45519e-11] less than or equal to 1e-09: ignored
LP   has 7778 rows; 461956 cols; 10014103 nonzeros
Coefficient ranges:
  Matrix [1e-03, 1e+07]
  Cost   [3e+00, 2e+07]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+09]
Presolving model
4476 rows, 458658 cols, 9697481 nonzeros  4s
Dependent equations search running on 4398 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.02s (limit = 1000.00s)
4475 rows, 458658 cols, 9413474 nonzeros  5s
Presolve : Reductions: rows 4475(-3303); columns 458658(-3298); elements 9413474(-600629)
Solving the presolved LP
Using EKK parallel dual simplex solver - SIP with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 7s
         71    -9.1403711680e+09 Pr: 4398(7.6073e+08) 147s
        129    -8.1701817935e+09 Pr: 4203(2.90735e+08) 155s
       1144    -6.5908877337e+09 Pr: 3836(1.60856e+08) 161s
       2370    -5.3544632983e+09 Pr: 3576(6.67971e+07) 166s
       3529    -4.3716887834e+09 Pr: 3356(1.09251e+08) 173s
       5293    -3.3461327758e+09 Pr: 3669(1.11313e+08) 178s
       8607    -2.1856485784e+09 Pr: 2881(4.00959e+07); Du: 0(9.98316e-08) 183s
      11756    -1.7688183088e+09 Pr: 2475(7.34996e+07) 188s
      15545    -1.6829120005e+09 Pr: 2175(5.19987e+07); Du: 0(3.77496e-08) 194s
      19078    -1.5857819348e+09 Pr: 2031(2.38788e+07) 199s
      22510    -1.5685961999e+09 Pr: 1697(1.11435e+07) 204s
      26509    -1.5164456066e+09 Pr: 1814(9.75099e+06); Du: 0(3.98685e-08) 209s
      29621    -1.4613754431e+09 Pr: 930(1.98947e+06); Du: 0(1.65603e-09) 215s
      31540    -1.4441720306e+09 Pr: 205(119988); Du: 0(1.24444e-07) 220s
      31783    -1.4440246751e+09 Pr: 0(0); Du: 0(1.28472e-07) 221s
WARNING: Using concurrency of 1 for parallel strategy rather than minimum number (2) specified in options
Using EKK primal simplex solver
  Iteration        Objective     Infeasibilities num(sum)
      31783    -1.4440246751e+09 Pr: 0(0); Du: 715(0.000372316) 221s
      31797    -1.4440246751e+09 Pr: 0(0); Du: 0(1.04817e-05) 222s
Solving the original LP from the solution after postsolve
Model status        : Optimal
Simplex   iterations: 31797
Objective value     : -1.4440246751e+09
P-D objective error :  7.4298149061e-16
HiGHS run time      :        222.54

Rebuild and solve the same problem, but with 16 cores.

[21]:
workers = 16
[22]:
t0 = time.perf_counter()
problem = fm.add_problem(
    name="test",
    coeff_funcs=coeff_funcs,
    cflw_e=cflw_e,
    cgen_data=cgen_data,
    acodes=acodes,
    sense=ws3.opt.SENSE_MAXIMIZE,
    mask=None,
    workers=workers,
    verbose=True
)
t1 = time.perf_counter()
print(f"Build time: {t1 - t0:.2f}s")
add_problem: build problem
generate trees using 16 workers
process trees
_bld_p_m1: build problem
_bld_p_m1: done building problem
add_problem: compile flow constraints
_cmp_cflw_m1: phase 1
_cmp_cflw_m1: phase 2
_cmp_cflw_m1: phase 3
add_problem: compile general constraints
Build time: 227.73s
[23]:
problem.solve(verbose=True, threads=workers)

Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
WARNING: LP matrix packed vector contains 1 |values| in [7.27596e-12, 7.27596e-12] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 3 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 4.54747e-13] less than or equal to 1e-09: ignored
LP   has 7778 rows; 461956 cols; 10014103 nonzeros
Coefficient ranges:
  Matrix [1e-03, 1e+07]
  Cost   [3e+00, 2e+07]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+09]
Presolving model
4476 rows, 458658 cols, 9697481 nonzeros  3s
Dependent equations search running on 4398 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.02s (limit = 1000.00s)
4475 rows, 458658 cols, 9413474 nonzeros  5s
Presolve : Reductions: rows 4475(-3303); columns 458658(-3298); elements 9413474(-600629)
Solving the presolved LP
Using EKK parallel dual simplex solver - SIP with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 7s
         66    -1.1251912865e+10 Pr: 4375(1.73568e+09) 151s
        121    -8.4309021230e+09 Pr: 4410(3.91691e+08) 179s
        448    -7.5477651357e+09 Pr: 4106(1.65751e+08) 185s
       1667    -6.0596936456e+09 Pr: 3561(9.06301e+07) 191s
       2653    -5.1788871593e+09 Pr: 3488(1.07611e+08) 196s
       3810    -4.3729156030e+09 Pr: 3536(7.98353e+07) 202s
       5526    -3.5430741540e+09 Pr: 3520(6.83297e+07) 207s
       8576    -2.2154244405e+09 Pr: 2872(5.07237e+07) 213s
      11998    -1.7614115077e+09 Pr: 2395(4.49786e+07) 218s
      15479    -1.6936998494e+09 Pr: 2020(1.84433e+07) 224s
      19102    -1.6008342410e+09 Pr: 2075(2.84657e+07) 229s
      22465    -1.5684003360e+09 Pr: 1740(1.46312e+07) 234s
      26609    -1.5087149887e+09 Pr: 1828(5.55282e+06) 240s
      29126    -1.4634396103e+09 Pr: 1130(1.56525e+06); Du: 0(5.75094e-08) 245s
      30982    -1.4444263286e+09 Pr: 332(943478); Du: 0(2.00021e-08) 250s
      31453    -1.4440246751e+09 Pr: 0(0); Du: 0(3.40625e-08) 252s
WARNING: Using concurrency of 1 for parallel strategy rather than minimum number (2) specified in options
Using EKK primal simplex solver
  Iteration        Objective     Infeasibilities num(sum)
      31453    -1.4440246751e+09 Pr: 0(0); Du: 132(0.000116388) 252s
      31454    -1.4440246751e+09 Pr: 0(0); Du: 0(7.28502e-06) 252s
Solving the original LP from the solution after postsolve
Model status        : Optimal
Simplex   iterations: 31454
Objective value     : -1.4440246751e+09
P-D objective error :  2.2289444718e-15
HiGHS run time      :        253.24

Rebuild and solve the same problem, but with 32 cores.

[24]:
workers = 32
[25]:
t0 = time.perf_counter()
problem = fm.add_problem(
    name="test",
    coeff_funcs=coeff_funcs,
    cflw_e=cflw_e,
    cgen_data=cgen_data,
    acodes=acodes,
    sense=ws3.opt.SENSE_MAXIMIZE,
    mask=None,
    workers=workers,
    verbose=True
)
t1 = time.perf_counter()
print(f"Build time: {t1 - t0:.2f}s")
add_problem: build problem
generate trees using 32 workers
process trees
_bld_p_m1: build problem
_bld_p_m1: done building problem
add_problem: compile flow constraints
_cmp_cflw_m1: phase 1
_cmp_cflw_m1: phase 2
_cmp_cflw_m1: phase 3
add_problem: compile general constraints
Build time: 254.56s
[26]:
problem.solve(verbose=True, threads=workers)

Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
WARNING: LP matrix packed vector contains 3 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 4.54747e-13] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [7.27596e-12, 7.27596e-12] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 1.45519e-11] less than or equal to 1e-09: ignored
LP   has 7778 rows; 461956 cols; 10014103 nonzeros
Coefficient ranges:
  Matrix [1e-03, 1e+07]
  Cost   [3e+00, 2e+07]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+09]
Presolving model
4476 rows, 458658 cols, 9697481 nonzeros  3s
Dependent equations search running on 4398 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.02s (limit = 1000.00s)
4475 rows, 458658 cols, 9413474 nonzeros  5s
Presolve : Reductions: rows 4475(-3303); columns 458658(-3298); elements 9413474(-600629)
Solving the presolved LP
Using EKK parallel dual simplex solver - SIP with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 7s
         68    -9.8287125279e+09 Pr: 4436(4.32668e+08) 154s
        128    -8.2487637044e+09 Pr: 4122(2.29633e+08) 173s
        574    -7.2733102617e+09 Pr: 4108(8.02629e+07) 178s
       1944    -5.8241401830e+09 Pr: 3519(7.09098e+07) 184s
       3189    -4.8691855008e+09 Pr: 3283(8.24689e+07) 189s
       4115    -4.0208477673e+09 Pr: 3543(1.4801e+08) 195s
       6467    -3.0020163843e+09 Pr: 3480(1.08353e+08) 200s
       9930    -1.9947083169e+09 Pr: 2704(2.31275e+08) 206s
      13367    -1.7211583486e+09 Pr: 2322(4.99249e+07) 211s
      16730    -1.6626995865e+09 Pr: 2198(2.38206e+07) 216s
      20489    -1.5753482098e+09 Pr: 1964(6.11287e+07) 221s
      24818    -1.5617086064e+09 Pr: 1933(1.5556e+07) 227s
      29155    -1.4822839970e+09 Pr: 1387(4.64393e+06); Du: 0(4.05335e-08) 232s
      31526    -1.4509350067e+09 Pr: 763(2.60451e+06); Du: 0(5.62414e-08) 238s
      32923    -1.4440246751e+09 Pr: 0(0); Du: 0(3.45811e-08) 242s
WARNING: Using concurrency of 1 for parallel strategy rather than minimum number (2) specified in options
Using EKK primal simplex solver
  Iteration        Objective     Infeasibilities num(sum)
      32923    -1.4440246751e+09 Pr: 0(0); Du: 83(8.98855e-05) 242s
      32929    -1.4440246751e+09 Pr: 0(0); Du: 0(5.31874e-06) 242s
Solving the original LP from the solution after postsolve
Model status        : Optimal
Simplex   iterations: 32929
Objective value     : -1.4440246751e+09
P-D objective error :  1.3208559833e-15
HiGHS run time      :        243.22

Rebuild and solve the same problem, but with 64 cores.

[27]:
workers = 64
[28]:
t0 = time.perf_counter()
problem = fm.add_problem(
    name="test",
    coeff_funcs=coeff_funcs,
    cflw_e=cflw_e,
    cgen_data=cgen_data,
    acodes=acodes,
    sense=ws3.opt.SENSE_MAXIMIZE,
    mask=None,
    workers=workers,
    verbose=True
)
t1 = time.perf_counter()
print(f"Build time: {t1 - t0:.2f}s")
add_problem: build problem
generate trees using 64 workers
process trees
_bld_p_m1: build problem
_bld_p_m1: done building problem
add_problem: compile flow constraints
_cmp_cflw_m1: phase 1
_cmp_cflw_m1: phase 2
_cmp_cflw_m1: phase 3
add_problem: compile general constraints
Build time: 284.35s
[29]:
problem.solve(verbose=True, threads=workers)

Running HiGHS 1.11.0 (git hash: 364c83a): Copyright (c) 2025 HiGHS under MIT licence terms
WARNING: LP matrix packed vector contains 3 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 4.54747e-13] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [7.27596e-12, 7.27596e-12] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 1 |values| in [1.45519e-11, 1.45519e-11] less than or equal to 1e-09: ignored
WARNING: LP matrix packed vector contains 2 |values| in [4.54747e-13, 1.45519e-11] less than or equal to 1e-09: ignored
LP   has 7778 rows; 461956 cols; 10014103 nonzeros
Coefficient ranges:
  Matrix [1e-03, 1e+07]
  Cost   [3e+00, 2e+07]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 1e+09]
Presolving model
4476 rows, 458658 cols, 9697481 nonzeros  4s
Dependent equations search running on 4398 equations with time limit of 1000.00s
Dependent equations search removed 0 rows and 0 nonzeros in 0.02s (limit = 1000.00s)
4475 rows, 458658 cols, 9413474 nonzeros  5s
Presolve : Reductions: rows 4475(-3303); columns 458658(-3298); elements 9413474(-600629)
Solving the presolved LP
Using EKK parallel dual simplex solver - SIP with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 7s
         67    -9.4437714651e+09 Pr: 4442(2.68216e+09) 173s
        125    -8.2555413978e+09 Pr: 4145(2.33438e+08) 186s
        545    -7.3750742902e+09 Pr: 4206(7.90811e+07) 192s
       1851    -5.9849268130e+09 Pr: 3481(1.21812e+08); Du: 0(6.81872e-08) 197s
       3023    -5.0439835211e+09 Pr: 3465(1.49353e+08) 202s
       4045    -4.2461134672e+09 Pr: 3484(8.90032e+07) 207s
       6646    -3.0152325529e+09 Pr: 3535(9.47888e+07) 212s
      10192    -1.9960665728e+09 Pr: 2720(4.36531e+07) 218s
      13704    -1.7268481182e+09 Pr: 2362(2.45244e+07) 223s
      17530    -1.6369445728e+09 Pr: 2097(6.80344e+07) 228s
      20812    -1.5764268782e+09 Pr: 1864(1.90361e+07) 233s
      25475    -1.5336011536e+09 Pr: 1873(2.71848e+07); Du: 0(2.59463e-08) 239s
      28624    -1.4757911322e+09 Pr: 1288(4.43103e+06) 244s
      30905    -1.4506586473e+09 Pr: 719(788199); Du: 0(1.41957e-07) 250s
      32465    -1.4440342403e+09 Pr: 0(0); Du: 0(4.43213e-07) 255s
      32465    -1.4440246751e+09 Pr: 0(0); Du: 0(4.51458e-08) 255s
WARNING: Using concurrency of 1 for parallel strategy rather than minimum number (2) specified in options
Using EKK primal simplex solver
  Iteration        Objective     Infeasibilities num(sum)
      32465    -1.4440246751e+09 Pr: 0(0); Du: 191(0.000131132) 256s
      32499    -1.4440246751e+09 Pr: 0(0); Du: 0(5.60054e-06) 256s
Solving the original LP from the solution after postsolve
Model status        : Optimal
Simplex   iterations: 32499
Objective value     : -1.4440246751e+09
P-D objective error :  6.6042799166e-16
HiGHS run time      :        256.79

These results show that throwing more cores at this problem improves performance, but only up to a point: we do not get a benefit past 8 cores. As with many parallel processing problems, the performance benefit of adding more cores is eventually outweighed by the fixed cost of worker process initialization (i.e., the IPC cost of serializing the data that is sent to and from worker processes at the start and end of work batches). The current parallel implementation in ws3 requires a substantial amount of data to be sent back and forth, so the benefit plateaus relatively quickly. Hopefully a future release can reduce IPC cost and squeeze more benefit out of additional cores where they are available.
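To get a rough feel for the serialization cost described above, the following sketch times one pickle round trip on a made-up payload. The payload shape (`fake_batch`) and the helper name `pickle_round_trip` are assumptions for illustration only; they do not reproduce the data structures ws3 actually sends between processes.

```python
import pickle
import time


def pickle_round_trip(payload):
    """Return (n_bytes, seconds, restored) for one dumps/loads round trip."""
    t0 = time.perf_counter()
    blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
    elapsed = time.perf_counter() - t0
    return len(blob), elapsed, restored


# Hypothetical stand-in for one batch of worker results:
# a list of (column index, {row name: coefficient}) pairs.
fake_batch = [
    (j, {"row_%d" % i: float(i * j) for i in range(20)})
    for j in range(5000)
]

n_bytes, seconds, restored = pickle_round_trip(fake_batch)
print(f"{n_bytes / 1e6:.2f} MB pickled and unpickled in {seconds:.3f} s")
```

Each batch sent to or from a worker pays this cost once, so many large batches can add up to a noticeable share of wall time.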

Next, we profile model-building runs with 1, 2, 4, 8, 16, 32, and 64 workers to get more insight into what dominates runtime in each case.

[30]:
for workers in [1, 2, 4, 8, 16, 32, 64]:
    print("Building problem with", workers, "workers")
    pr = cProfile.Profile()
    pr.enable()
    problem = fm.add_problem(
        name="test",
        coeff_funcs=coeff_funcs,
        cflw_e=cflw_e,
        cgen_data=cgen_data,
        acodes=acodes,
        sense=ws3.opt.SENSE_MAXIMIZE,
        mask=None,
        workers=workers,
        verbose=False
    )
    pr.disable()

    s = io.StringIO()
    pstats.Stats(pr, stream=s).sort_stats("cumtime").print_stats(30)
    print(s.getvalue())
Building problem with 1 workers
         687767462 function calls (685741537 primitive calls) in 407.821 seconds

   Ordered by: cumulative time
   List reduced from 277 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       81    5.150    0.064 1050.072   12.964 /usr/lib/python3.12/asyncio/base_events.py:1910(_run_once)
        1    0.016    0.016  268.823  268.823 /home/gep/projects/ws3/ws3/forest.py:968(_bld_p_m1)
2033458/7696   18.879    0.000  236.965    0.031 /home/gep/projects/ws3/ws3/forest.py:1103(_bld_tree_m1)
       80   31.158    0.389  104.831    1.310 /usr/lib/python3.12/selectors.py:451(select)
  2682351   11.983    0.000   74.048    0.000 /home/gep/projects/ws3/ws3/forest.py:1495(compile_product)
      375   34.520    0.092   72.136    0.192 {built-in method time.sleep}
      2/1    4.492    2.246   65.126   65.126 /home/gep/projects/ws3/ws3/forest.py:872(add_problem)
  2487718   28.985    0.000   53.620    0.000 /home/gep/projects/ws3/ws3/forest.py:1664(apply_action)
   923912    6.263    0.000   51.945    0.000 /home/gep/projects/ws3/examples/util.py:128(cmp_c_caa)
    30784    0.461    0.000   39.164    0.001 /home/gep/projects/ws3/ws3/common.py:1109(paths)
   461956    5.348    0.000   37.198    0.000 /home/gep/projects/ws3/examples/util.py:83(cmp_c_z)
  2309780   23.139    0.000   35.454    0.000 /home/gep/projects/ws3/ws3/common.py:1093(path)
   461956    6.659    0.000   31.380    0.000 /home/gep/projects/ws3/examples/util.py:152(cmp_c_ci)
  2682351   27.895    0.000   29.770    0.000 {built-in method builtins.eval}
  9239120   10.397    0.000   23.955    0.000 /home/gep/projects/ws3/ws3/forest.py:1420(inventory)
       20    5.205    0.260   23.415    1.171 /home/gep/projects/ws3/ws3/forest_helper.py:285(worker_cmp_cgen_phase3)
  9708772    5.224    0.000   23.274    0.000 /home/gep/projects/ws3/ws3/common.py:82(hex_id)
       40   20.450    0.511   20.450    0.511 /home/gep/projects/ws3/ws3/forest_helper.py:220(worker_cmp_cflw_phase3)
 19189992    7.743    0.000   20.175    0.000 /home/gep/projects/ws3/ws3/core.py:324(__getitem__)
  4502450   16.137    0.000   20.152    0.000 /home/gep/projects/ws3/ws3/forest.py:503(grow)
        4    1.201    0.300   13.670    3.417 /home/gep/projects/ws3/ws3/forest_helper.py:105(worker_summarize_tree_batch)
     7778    0.006    0.000   12.702    0.002 /home/gep/projects/ws3/ws3/opt.py:176(add_constraint)
        1    0.027    0.027   12.623   12.623 /home/gep/projects/ws3/ws3/forest.py:1173(_cmp_cflw_m1)
     7778    0.009    0.000   12.589    0.002 /home/gep/projects/ws3/ws3/opt.py:69(__init__)
     7778    2.695    0.000   12.577    0.002 {built-in method builtins.all}
 19189992    7.848    0.000   12.432    0.000 /home/gep/projects/ws3/ws3/core.py:45(__call__)
  9708772   10.735    0.000   10.735    0.000 {built-in method _pickle.dumps}
 38350126    6.527    0.000    9.963    0.000 /home/gep/projects/ws3/ws3/opt.py:72(<genexpr>)
 71675351    9.344    0.000    9.344    0.000 {method 'append' of 'list' objects}
 86847728    9.214    0.000    9.214    0.000 /home/gep/projects/ws3/ws3/common.py:996(data)



Building problem with 2 workers
         199808762 function calls (199680982 primitive calls) in 390.364 seconds

   Ordered by: cumulative time
   List reduced from 545 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       83    1.429    0.017 1217.433   14.668 /usr/lib/python3.12/asyncio/base_events.py:1910(_run_once)
       83   47.774    0.576  938.323   11.305 /usr/lib/python3.12/selectors.py:451(select)
       83   11.567    0.139  414.717    4.997 {method 'poll' of 'select.epoll' objects}
      2/1    0.000    0.000  390.363  390.363 /home/gep/projects/ws3/ws3/forest.py:872(add_problem)
        1    0.000    0.000  390.042  390.042 /home/gep/projects/ws3/ws3/forest_helper.py:347(__exit__)
        1    0.003    0.003  390.042  390.042 /usr/lib/python3.12/concurrent/futures/process.py:864(shutdown)
  159/158    0.001    0.000  390.027    2.469 /usr/lib/python3.12/threading.py:1153(_wait_for_tstate_lock)
      2/1    1.311    0.656  390.027  390.027 /usr/lib/python3.12/threading.py:1115(join)
   474/73    0.002    0.000  388.935    5.328 {method 'acquire' of '_thread.lock' objects}
      2/1    0.000    0.000  388.934  388.934 /usr/lib/python3.12/threading.py:1016(_bootstrap)
      2/1    0.000    0.000  388.934  388.934 /usr/lib/python3.12/threading.py:1056(_bootstrap_inner)
        1    0.004    0.004  388.934  388.934 /usr/lib/python3.12/concurrent/futures/process.py:340(run)
        1    0.000    0.000  388.929  388.929 /usr/lib/python3.12/concurrent/futures/process.py:574(join_executor_internals)
        1    0.000    0.000  388.929  388.929 /usr/lib/python3.12/concurrent/futures/process.py:578(_join_executor_internals)
        4    0.000    0.000  387.904   96.976 /usr/lib/python3.12/multiprocessing/util.py:208(__call__)
        1    0.000    0.000  387.904  387.904 /usr/lib/python3.12/multiprocessing/queues.py:147(join_thread)
      298  194.424    0.652  301.104    1.010 {built-in method time.sleep}
        1    0.015    0.015  185.773  185.773 /home/gep/projects/ws3/ws3/forest.py:968(_bld_p_m1)
       32    0.029    0.001  117.171    3.662 /usr/lib/python3.12/concurrent/futures/process.py:415(wait_result_broken_or_wakeup)
       91    0.002    0.000   98.245    1.080 /usr/lib/python3.12/multiprocessing/connection.py:1122(wait)
       91    0.036    0.000   97.502    1.071 /usr/lib/python3.12/selectors.py:402(select)
        1    0.000    0.000   55.185   55.185 /usr/lib/python3.12/multiprocessing/queues.py:214(_finalize_join)
       28   43.702    1.561   43.702    1.561 {method 'dump' of '_pickle.Pickler' objects}
       55    0.001    0.000   26.860    0.488 /usr/lib/python3.12/multiprocessing/connection.py:182(send_bytes)
       55    0.014    0.000   26.591    0.483 /usr/lib/python3.12/multiprocessing/connection.py:406(_send_bytes)
        1    0.075    0.075   24.116   24.116 /home/gep/projects/ws3/ws3/forest.py:1279(_cmp_cgen_m1)
    15392    0.260    0.000   23.805    0.002 /home/gep/projects/ws3/ws3/common.py:1109(paths)
       73    0.001    0.000   21.663    0.297 /usr/lib/python3.12/multiprocessing/connection.py:381(_send)
   923912   13.205    0.000   21.373    0.000 /home/gep/projects/ws3/ws3/common.py:1093(path)
       26   18.830    0.724   18.830    0.724 {built-in method _pickle.loads}



Building problem with 4 workers
         199820085 function calls (199692674 primitive calls) in 294.212 seconds

   Ordered by: cumulative time
   List reduced from 543 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       68    0.992    0.015  687.612   10.112 /usr/lib/python3.12/asyncio/base_events.py:1910(_run_once)
        1    0.000    0.000  294.211  294.211 /home/gep/projects/ws3/ws3/forest.py:872(add_problem)
        1    0.000    0.000  293.884  293.884 /home/gep/projects/ws3/ws3/forest_helper.py:347(__exit__)
        1    0.004    0.004  293.884  293.884 /usr/lib/python3.12/concurrent/futures/process.py:864(shutdown)
  127/122    0.000    0.000  293.865    2.409 /usr/lib/python3.12/threading.py:1153(_wait_for_tstate_lock)
      2/1    0.000    0.000  293.865  293.865 /usr/lib/python3.12/threading.py:1115(join)
  789/169    0.004    0.000  291.478    1.725 {method 'acquire' of '_thread.lock' objects}
      2/1    0.000    0.000  291.474  291.474 /usr/lib/python3.12/threading.py:1016(_bootstrap)
      2/1    0.000    0.000  291.474  291.474 /usr/lib/python3.12/threading.py:1056(_bootstrap_inner)
        1    0.001    0.001  291.474  291.474 /usr/lib/python3.12/concurrent/futures/process.py:340(run)
        1    0.000    0.000  291.474  291.474 /usr/lib/python3.12/concurrent/futures/process.py:574(join_executor_internals)
        1    0.000    0.000  291.474  291.474 /usr/lib/python3.12/concurrent/futures/process.py:578(_join_executor_internals)
        6    0.000    0.000  290.265   48.378 /usr/lib/python3.12/multiprocessing/util.py:208(__call__)
        1    0.000    0.000  290.265  290.265 /usr/lib/python3.12/multiprocessing/queues.py:147(join_thread)
       68   46.245    0.680  277.255    4.077 /usr/lib/python3.12/selectors.py:451(select)
       66    0.695    0.011  152.453    2.310 /usr/lib/python3.12/concurrent/futures/process.py:415(wait_result_broken_or_wakeup)
      214   99.358    0.464  142.185    0.664 {built-in method time.sleep}
        1    0.019    0.019  112.891  112.891 /home/gep/projects/ws3/ws3/forest.py:968(_bld_p_m1)
      193    0.003    0.000   58.819    0.305 /usr/lib/python3.12/multiprocessing/connection.py:1122(wait)
      193    0.087    0.000   58.634    0.304 /usr/lib/python3.12/selectors.py:402(select)
       64   50.125    0.783   50.125    0.783 {method 'dump' of '_pickle.Pickler' objects}
        1    0.000    0.000   24.129   24.129 /home/gep/projects/ws3/ws3/forest.py:1173(_cmp_cflw_m1)
    15392    0.254    0.000   23.324    0.002 /home/gep/projects/ws3/ws3/common.py:1109(paths)
       60    8.006    0.133   22.857    0.381 /usr/lib/python3.12/multiprocessing/connection.py:246(recv)
   923912   13.283    0.000   21.281    0.000 /home/gep/projects/ws3/ws3/common.py:1093(path)
       60   17.061    0.284   17.061    0.284 {built-in method _pickle.loads}
       68    6.393    0.094   16.267    0.239 {method 'poll' of 'select.epoll' objects}
      125    0.001    0.000   14.102    0.113 /usr/lib/python3.12/multiprocessing/connection.py:182(send_bytes)
      125    0.001    0.000   14.101    0.113 /usr/lib/python3.12/multiprocessing/connection.py:406(_send_bytes)
      169    0.002    0.000   14.100    0.083 /usr/lib/python3.12/multiprocessing/connection.py:381(_send)



Building problem with 8 workers
         199851201 function calls (199723359 primitive calls) in 301.239 seconds

   Ordered by: cumulative time
   List reduced from 545 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       86    7.032    0.082  681.010    7.919 /usr/lib/python3.12/asyncio/base_events.py:1910(_run_once)
        1    0.000    0.000  301.236  301.236 /home/gep/projects/ws3/ws3/forest.py:872(add_problem)
        1    0.000    0.000  300.902  300.902 /home/gep/projects/ws3/ws3/forest_helper.py:347(__exit__)
        1    0.004    0.004  300.902  300.902 /usr/lib/python3.12/concurrent/futures/process.py:864(shutdown)
  155/154    0.001    0.000  300.881    1.954 /usr/lib/python3.12/threading.py:1153(_wait_for_tstate_lock)
      2/1    0.000    0.000  300.881  300.881 /usr/lib/python3.12/threading.py:1115(join)
 1330/334    0.028    0.000  295.999    0.886 {method 'acquire' of '_thread.lock' objects}
      2/1    0.000    0.000  295.447  295.447 /usr/lib/python3.12/threading.py:1016(_bootstrap)
      2/1    0.000    0.000  295.447  295.447 /usr/lib/python3.12/threading.py:1056(_bootstrap_inner)
        1    0.001    0.001  295.446  295.446 /usr/lib/python3.12/concurrent/futures/process.py:340(run)
        1    0.000    0.000  295.446  295.446 /usr/lib/python3.12/concurrent/futures/process.py:574(join_executor_internals)
        1    0.000    0.000  295.446  295.446 /usr/lib/python3.12/concurrent/futures/process.py:578(_join_executor_internals)
       10    0.000    0.000  293.868   29.387 /usr/lib/python3.12/multiprocessing/util.py:208(__call__)
        1    0.000    0.000  293.868  293.868 /usr/lib/python3.12/multiprocessing/queues.py:147(join_thread)
       86   57.895    0.673  191.861    2.231 /usr/lib/python3.12/selectors.py:451(select)
      193   65.939    0.342  144.224    0.747 {built-in method time.sleep}
      115    1.552    0.013  140.632    1.223 /usr/lib/python3.12/concurrent/futures/process.py:415(wait_result_broken_or_wakeup)
      339    0.010    0.000   87.351    0.258 /usr/lib/python3.12/multiprocessing/connection.py:1122(wait)
      339    0.016    0.000   86.429    0.255 /usr/lib/python3.12/selectors.py:402(select)
      116   64.641    0.557   64.641    0.557 {method 'dump' of '_pickle.Pickler' objects}
        1    0.001    0.001   38.528   38.528 /home/gep/projects/ws3/ws3/forest.py:1173(_cmp_cflw_m1)
      108    8.978    0.083   31.152    0.288 /usr/lib/python3.12/multiprocessing/connection.py:246(recv)
      339    8.134    0.024   24.589    0.073 {method 'poll' of 'select.poll' objects}
    15392    0.255    0.000   23.410    0.002 /home/gep/projects/ws3/ws3/common.py:1109(paths)
   923912   12.962    0.000   20.943    0.000 /home/gep/projects/ws3/ws3/common.py:1093(path)
      108   20.139    0.186   20.139    0.186 {built-in method _pickle.loads}
      225    0.002    0.000   16.733    0.074 /usr/lib/python3.12/multiprocessing/connection.py:182(send_bytes)
        1    0.000    0.000   16.463   16.463 /home/gep/projects/ws3/ws3/forest.py:1279(_cmp_cgen_m1)
      225    0.002    0.000   16.425    0.073 /usr/lib/python3.12/multiprocessing/connection.py:406(_send_bytes)
      301    0.003    0.000   16.418    0.055 /usr/lib/python3.12/multiprocessing/connection.py:381(_send)



Building problem with 16 workers
         199920526 function calls (199791902 primitive calls) in 276.475 seconds

   Ordered by: cumulative time
   List reduced from 543 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      114    0.012    0.000  475.528    4.171 /usr/lib/python3.12/asyncio/base_events.py:1910(_run_once)
      114   59.178    0.519  279.821    2.455 /usr/lib/python3.12/selectors.py:451(select)
        1    0.000    0.000  276.474  276.474 /home/gep/projects/ws3/ws3/forest.py:872(add_problem)
        1    0.000    0.000  276.143  276.143 /home/gep/projects/ws3/ws3/forest_helper.py:347(__exit__)
        1    0.004    0.004  276.143  276.143 /usr/lib/python3.12/concurrent/futures/process.py:864(shutdown)
  195/194    0.001    0.000  276.118    1.423 /usr/lib/python3.12/threading.py:1153(_wait_for_tstate_lock)
      2/1    0.000    0.000  276.117  276.117 /usr/lib/python3.12/threading.py:1115(join)
 2347/681    0.191    0.000  267.994    0.394 {method 'acquire' of '_thread.lock' objects}
      2/1    0.000    0.000  265.797  265.797 /usr/lib/python3.12/threading.py:1016(_bootstrap)
      2/1    0.000    0.000  265.797  265.797 /usr/lib/python3.12/threading.py:1056(_bootstrap_inner)
        1    0.004    0.004  265.797  265.797 /usr/lib/python3.12/concurrent/futures/process.py:340(run)
        1    0.000    0.000  265.793  265.793 /usr/lib/python3.12/concurrent/futures/process.py:574(join_executor_internals)
        1    0.000    0.000  265.793  265.793 /usr/lib/python3.12/concurrent/futures/process.py:578(_join_executor_internals)
       18    0.001    0.000  263.906   14.661 /usr/lib/python3.12/multiprocessing/util.py:208(__call__)
        1    0.000    0.000  263.906  263.906 /usr/lib/python3.12/multiprocessing/queues.py:147(join_thread)
      168   25.576    0.152  129.085    0.768 {built-in method time.sleep}
      202    1.900    0.009   92.160    0.456 /usr/lib/python3.12/concurrent/futures/process.py:415(wait_result_broken_or_wakeup)
      601    0.009    0.000   83.874    0.140 /usr/lib/python3.12/multiprocessing/connection.py:1122(wait)
      212   80.261    0.379   80.261    0.379 {method 'dump' of '_pickle.Pickler' objects}
      601    0.012    0.000   80.223    0.133 /usr/lib/python3.12/selectors.py:402(select)
        1    0.001    0.001   27.911   27.911 /home/gep/projects/ws3/ws3/forest.py:1173(_cmp_cflw_m1)
      196    9.684    0.049   26.920    0.137 /usr/lib/python3.12/multiprocessing/connection.py:246(recv)
      601    6.826    0.011   24.492    0.041 {method 'poll' of 'select.poll' objects}
    15392    0.263    0.000   23.368    0.002 /home/gep/projects/ws3/ws3/common.py:1109(paths)
      409    0.003    0.000   23.187    0.057 /usr/lib/python3.12/multiprocessing/connection.py:182(send_bytes)
      409    0.029    0.000   22.783    0.056 /usr/lib/python3.12/multiprocessing/connection.py:406(_send_bytes)
   923912   12.893    0.000   20.852    0.000 /home/gep/projects/ws3/ws3/common.py:1093(path)
      541    0.340    0.001   19.535    0.036 /usr/lib/python3.12/multiprocessing/connection.py:381(_send)
      196   19.088    0.097   19.088    0.097 {built-in method _pickle.loads}
      541    9.699    0.018   12.911    0.024 {built-in method posix.write}



Building problem with 32 workers
         200065847 function calls (199936173 primitive calls) in 324.315 seconds

   Ordered by: cumulative time
   List reduced from 545 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      184    0.036    0.000  551.675    2.998 /usr/lib/python3.12/asyncio/base_events.py:1910(_run_once)
        1    0.000    0.000  324.311  324.311 /home/gep/projects/ws3/ws3/forest.py:872(add_problem)
        1    0.000    0.000  323.968  323.968 /home/gep/projects/ws3/ws3/forest_helper.py:347(__exit__)
        1    0.004    0.004  323.968  323.968 /usr/lib/python3.12/concurrent/futures/process.py:864(shutdown)
  303/302    0.001    0.000  323.933    1.073 /usr/lib/python3.12/threading.py:1153(_wait_for_tstate_lock)
      2/1    1.824    0.912  323.932  323.932 /usr/lib/python3.12/threading.py:1115(join)
3966/1404    0.037    0.000  304.071    0.217 {method 'acquire' of '_thread.lock' objects}
      2/1    0.000    0.000  303.528  303.528 /usr/lib/python3.12/threading.py:1016(_bootstrap)
      2/1    0.000    0.000  303.528  303.528 /usr/lib/python3.12/threading.py:1056(_bootstrap_inner)
        1    0.002    0.002  303.528  303.528 /usr/lib/python3.12/concurrent/futures/process.py:340(run)
        1    0.000    0.000  303.526  303.526 /usr/lib/python3.12/concurrent/futures/process.py:574(join_executor_internals)
        1    0.000    0.000  303.526  303.526 /usr/lib/python3.12/concurrent/futures/process.py:578(_join_executor_internals)
       34    0.001    0.000  300.907    8.850 /usr/lib/python3.12/multiprocessing/util.py:208(__call__)
        1    0.000    0.000  300.906  300.906 /usr/lib/python3.12/multiprocessing/queues.py:147(join_thread)
      354    1.587    0.004  188.595    0.533 /usr/lib/python3.12/concurrent/futures/process.py:415(wait_result_broken_or_wakeup)
      184   62.645    0.340  128.705    0.699 /usr/lib/python3.12/selectors.py:451(select)
     1057    0.017    0.000  110.747    0.105 /usr/lib/python3.12/multiprocessing/connection.py:1122(wait)
     1057    0.025    0.000  108.046    0.102 /usr/lib/python3.12/selectors.py:402(select)
      380   97.451    0.256   97.451    0.256 {method 'dump' of '_pickle.Pickler' objects}
      190   25.519    0.134   87.420    0.460 {built-in method time.sleep}
      353    0.044    0.000   77.885    0.221 /usr/lib/python3.12/concurrent/futures/_base.py:199(as_completed)
        1    1.872    1.872   58.850   58.850 /home/gep/projects/ws3/ws3/forest.py:1029(_gen_vars_m1)
      348    6.866    0.020   46.064    0.132 /usr/lib/python3.12/multiprocessing/connection.py:246(recv)
      729    0.009    0.000   38.524    0.053 /usr/lib/python3.12/multiprocessing/connection.py:182(send_bytes)
     1057    6.164    0.006   36.126    0.034 {method 'poll' of 'select.poll' objects}
      949    4.569    0.005   31.493    0.033 /usr/lib/python3.12/multiprocessing/connection.py:381(_send)
      729    0.006    0.000   31.489    0.043 /usr/lib/python3.12/multiprocessing/connection.py:406(_send_bytes)
        1    0.000    0.000   24.081   24.081 /usr/lib/python3.12/multiprocessing/queues.py:214(_finalize_join)
    15392    0.262    0.000   23.443    0.002 /home/gep/projects/ws3/ws3/common.py:1109(paths)
      348   22.085    0.063   22.085    0.063 {built-in method _pickle.loads}



Building problem with 64 workers
         200475142 function calls (200343762 primitive calls) in 385.793 seconds

   Ordered by: cumulative time
   List reduced from 543 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      323    1.237    0.004  556.204    1.722 /usr/lib/python3.12/asyncio/base_events.py:1910(_run_once)
        1    0.000    0.000  385.776  385.776 /home/gep/projects/ws3/ws3/forest.py:872(add_problem)
        1    0.000    0.000  385.438  385.438 /home/gep/projects/ws3/ws3/forest_helper.py:347(__exit__)
        1    0.019    0.019  385.438  385.438 /usr/lib/python3.12/concurrent/futures/process.py:864(shutdown)
  519/514    0.002    0.000  385.385    0.750 /usr/lib/python3.12/threading.py:1153(_wait_for_tstate_lock)
      2/1    1.965    0.982  385.383  385.383 /usr/lib/python3.12/threading.py:1115(join)
6975/2952    0.064    0.000  342.594    0.116 {method 'acquire' of '_thread.lock' objects}
      2/1    0.000    0.000  341.896  341.896 /usr/lib/python3.12/threading.py:1016(_bootstrap)
      2/1    0.001    0.000  341.896  341.896 /usr/lib/python3.12/threading.py:1056(_bootstrap_inner)
        1    0.001    0.001  341.895  341.895 /usr/lib/python3.12/concurrent/futures/process.py:340(run)
        1    0.000    0.000  341.895  341.895 /usr/lib/python3.12/concurrent/futures/process.py:574(join_executor_internals)
        1    0.000    0.000  341.894  341.894 /usr/lib/python3.12/concurrent/futures/process.py:578(_join_executor_internals)
       66    0.002    0.000  338.380    5.127 /usr/lib/python3.12/multiprocessing/util.py:208(__call__)
        1    0.000    0.000  338.378  338.378 /usr/lib/python3.12/multiprocessing/queues.py:147(join_thread)
      323   65.398    0.202  268.061    0.830 /usr/lib/python3.12/selectors.py:451(select)
     1921    0.031    0.000  133.928    0.070 /usr/lib/python3.12/multiprocessing/connection.py:1122(wait)
     1921    0.045    0.000  131.868    0.069 /usr/lib/python3.12/selectors.py:402(select)
        1    1.052    1.052  126.511  126.511 /home/gep/projects/ws3/ws3/forest.py:1173(_cmp_cflw_m1)
      642    0.585    0.001  116.736    0.182 /usr/lib/python3.12/concurrent/futures/process.py:415(wait_result_broken_or_wakeup)
      700  104.076    0.149  104.076    0.149 {method 'dump' of '_pickle.Pickler' objects}
      223   30.212    0.135   90.084    0.404 {built-in method time.sleep}
     1337    0.022    0.000   61.920    0.046 /usr/lib/python3.12/multiprocessing/connection.py:182(send_bytes)
      636   25.215    0.040   55.017    0.087 /usr/lib/python3.12/multiprocessing/connection.py:246(recv)
        1    0.000    0.000   50.587   50.587 /usr/lib/python3.12/multiprocessing/queues.py:214(_finalize_join)
     1921    8.294    0.004   49.576    0.026 {method 'poll' of 'select.poll' objects}
        1    0.001    0.001   43.486   43.486 /usr/lib/python3.12/concurrent/futures/process.py:791(_launch_processes)
       64    0.004    0.000   43.485    0.679 /usr/lib/python3.12/concurrent/futures/process.py:799(_spawn_process)
       64    0.005    0.000   43.471    0.679 /usr/lib/python3.12/multiprocessing/process.py:110(start)
       64    0.004    0.000   43.452    0.679 /usr/lib/python3.12/multiprocessing/context.py:279(_Popen)
       64    0.002    0.000   43.447    0.679 /usr/lib/python3.12/multiprocessing/popen_fork.py:15(__init__)



4.7. Picking max_workers in ws3: quick guidance + deeper dive

This note summarizes what we learned from running the build phase under cProfile at 1, 2, 4, 8, 16, 32, and 64 workers. It has two parts:

  • Part A (quick start): practical advice for folks who just want a good max_workers value.

  • Part B (deep dive): how to read the cProfile output, what’s actually fast vs slow, and what to tweak if you’re hacking on the parallel code.


4.7.1. Part A — Quick start (what number should I use?)

TL;DR for this dataset/machine
Our wall-time results (build phase only):

workers   build wall time (s)
1         407.8
2         390.4
4         294.2
8         301.2
16        276.5  ← best here
32        324.3
64        385.8

Recommendation:

  • Start with max_workers=16 on similar problem sizes/hardware.

  • If you have fewer physical cores, try half your cores (e.g., 8 on a 16-core box).

  • If you have many more cores (72–150), don’t jump straight to huge worker counts—diminishing returns and overhead kick in quickly.

When to reduce workers:

  • If you see more time spent with many workers than with fewer ones, drop to the smallest N near the best time (here, 16).

  • If the model is small, stick with serial (max_workers=1)—parallel overhead can dominate.

HiGHS threads:
If you build and solve in the same session, don’t give both the builder and HiGHS all cores. Example: 16 build workers + 8 solver threads is often a good split.
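One way to turn that advice into numbers is sketched below. The helper `split_cores` and its default two-thirds build share are assumptions chosen to reproduce the 16 + 8 example on a 24-core machine; they are not part of ws3 or HiGHS.

```python
import os


def split_cores(total_cores=None, build_share=2 / 3):
    """Return (build_workers, solver_threads) that together fit in total_cores."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    if total_cores < 2:
        return 1, 1
    # Leave at least one core for the solver.
    build_workers = max(1, min(total_cores - 1, round(total_cores * build_share)))
    solver_threads = max(1, total_cores - build_workers)
    return build_workers, solver_threads


build_workers, solver_threads = split_cores(24)
print(build_workers, solver_threads)  # 16 8

# build_workers would go to the workers argument of fm.add_problem (as in the
# profiling cell above). Assuming you hold the HiGHS object directly,
# solver_threads could be applied with setOptionValue("threads", solver_threads).
```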

4.7.2. Part B — Deep dive (reading the profile, tuning, and hacking)

4.7.2.1. 1) Interpreting the profiles

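In good scaling regions (e.g., 4–16 workers above), productive compute frames still show up in the cumulative-time list. They are clearest in the serial profile, because cProfile only instruments the parent process and work done inside worker processes is not recorded: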

  • forest._bld_tree_m1 (DFS tree construction)

  • forest.compile_product, forest.apply_action

  • common.paths / common.path

  • Your coefficient functions (e.g., examples.util.cmp_c_*)

As you overshoot the sweet spot (32–64 workers above), overhead climbs into the top slots:

  • selectors.select, connection.wait, poll (multiprocessing pipes/queues)

  • Pickler.dump / _pickle.loads (task/result serialization)

  • threading._wait_for_tstate_lock / mass acquire calls (contention)

  • concurrent.futures.process join/shutdown

  • time.sleep backoffs in the executor

  • asyncio.base_events._run_once (Jupyter event loop noise)

When overhead frames rival or exceed your compute frames in the top 10 cumulative list, you’ve passed the knee of the curve.
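The check above can be roughed out in code. This is a sketch under assumptions: the marker substrings are lifted from the overhead list above, and simple substring matching on file and function names is assumed to be good enough. The function reads the `stats` mapping held by a `pstats.Stats` object, whose keys are `(filename, lineno, funcname)` and whose values are `(cc, nc, tottime, cumtime, callers)`.

```python
# Substrings that mark a frame as parallel overhead (assumed, not exhaustive).
OVERHEAD_MARKERS = (
    "selectors.py", "multiprocessing", "futures", "threading.py", "asyncio",
    "_pickle", "time.sleep", "_thread.lock", "select.epoll", "select.poll",
)


def overhead_share(stats, top_n=10):
    """Fraction of the top_n frames (by cumulative time) that look like overhead."""
    ranked = sorted(stats.items(), key=lambda kv: kv[1][3], reverse=True)[:top_n]
    if not ranked:
        return 0.0
    hits = sum(
        1
        for (filename, _lineno, funcname), _values in ranked
        if any(m in filename or m in funcname for m in OVERHEAD_MARKERS)
    )
    return hits / len(ranked)


# Four rows copied from the 64-worker profile above, in pstats.Stats.stats layout.
demo = {
    ("/usr/lib/python3.12/selectors.py", 451, "select"): (323, 323, 65.398, 268.061, {}),
    ("/home/gep/projects/ws3/ws3/forest.py", 1173, "_cmp_cflw_m1"): (1, 1, 1.052, 126.511, {}),
    ("~", 0, "<method 'dump' of '_pickle.Pickler' objects>"): (700, 700, 104.076, 104.076, {}),
    ("~", 0, "<built-in method time.sleep>"): (223, 223, 30.212, 90.084, {}),
}
print(overhead_share(demo, top_n=4))  # 0.75
```

With a real profile, `overhead_share(pstats.Stats(pr).stats)` gives the same ratio for the top 10 frames; a value well above one half suggests you are past the knee.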

4.7.2.2. 2) Choosing max_workers methodically

  1. Run a small sweep: 1, 2, 4, 8, 16, 32 (and 64 only if you must).

  2. Plot wall time vs workers (or just eyeball the printed numbers).

  3. Pick the smallest worker count near the lowest wall time where compute still dominates the top of the profile.

    • In our run: 16.
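The wall-time part of step 3 can be sketched as follows. The helper name `pick_workers` and the 5% tolerance are assumptions; the check that compute still dominates the profile is not automated here.

```python
def pick_workers(wall_times, tolerance=0.05):
    """Smallest worker count whose wall time is within `tolerance` of the best."""
    best = min(wall_times.values())
    cutoff = best * (1.0 + tolerance)
    return min(n for n, t in wall_times.items() if t <= cutoff)


# Build wall times (s) from the profiled sweep in this notebook.
wall_times = {1: 407.8, 2: 390.4, 4: 294.2, 8: 301.2, 16: 276.5, 32: 324.3, 64: 385.8}

print(pick_workers(wall_times))        # 16 (4 workers is about 6% slower than the best)
print(pick_workers(wall_times, 0.10))  # 4
```

The answer is sensitive to the tolerance: at 10%, 4 workers already qualifies, which may be the better choice on a shared machine.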

A simple heuristic for first tries:

  • Small models: 1 (serial).

  • Medium: 4–8.

  • Large: 8–16.

  • Very large: try 16 first; 32 only if you see clear gains (rare on a single machine unless tasks are very chunky and memory is ample).

4.7.2.3. 3) Batch sizing

We batch work per process to amortize scheduling and pickling costs. A practical rule:

```text
batch_size ≈ len(tasks) // (k * max_workers) + 1, with k between 2 and 4
```
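A minimal sketch of that rule, assuming the tasks are held in a plain list. The helper names and the default of 3 batches per worker are illustrative choices, not ws3 API.

```python
def batch_size(n_tasks, max_workers, batches_per_worker=3):
    """Batch size giving each worker roughly `batches_per_worker` batches."""
    return n_tasks // (batches_per_worker * max_workers) + 1


def make_batches(tasks, max_workers, batches_per_worker=3):
    """Split `tasks` into consecutive batches of the size computed above."""
    size = batch_size(len(tasks), max_workers, batches_per_worker)
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]


tasks = list(range(1000))  # stand-in for real work items
batches = make_batches(tasks, max_workers=8)
print(len(batches[0]), len(batches))  # 42 24
```

A few batches per worker lets faster workers pick up extra work while keeping the number of pickled messages small.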
