CELF

CELF (Cost-Effective Lazy Forward) is a greedy acceleration strategy based on submodularity. Instead of recomputing every candidate's marginal gain at every round, CELF maintains a priority queue and performs lazy re-evaluation only when necessary.

Compared with naive greedy selection, CELF-style methods usually reduce the number of Monte-Carlo spread estimations by a large margin, which directly improves IM runtime.

Available Algorithms

CELFIM

class fs_gplib.InfluenceMaximization.celf.CELFIM(model, seed_size: int, influenced_type: List[int] | None = None, MC: int = 1000, iterations_times: int = 100, verbose: bool = True)[source]

Bases: BaseInfluenceMaximizer

CELF (Cost-Effective Lazy Forward) Influence Maximization.

CELF dramatically reduces the number of spread evaluations compared to the naive greedy algorithm by exploiting two properties:

  1. Submodularity: The marginal gain of adding a node decreases as more nodes are already selected.

  2. Lazy Evaluation: Using a max-heap, CELF only recomputes marginal gains when they could potentially exceed the current best. Many nodes are "lazy" and never get re-evaluated after their initial estimate.

This results in O(k * n) average-case complexity instead of O(k * n * MC) spread estimates, where k is seed size, n is number of nodes, and MC is Monte-Carlo simulations.

Parameters:
  • model -- A diffusion model from fs_gplib.Epidemics.

  • seed_size -- Number of seeds to select.

  • influenced_type -- Set of node state values that count as influenced.

  • MC -- Monte-Carlo simulations per spread estimate.

  • iterations_times -- Diffusion steps per simulation.

  • verbose -- Whether to show progress and statistics.

Example:

from fs_gplib.InfluenceMaximization import CELFIM
from fs_gplib.Epidemics import SIRModel

model = SIRModel(
    data=graph_data,
    seeds=[],
    infection_beta=0.05,
    recovery_lambda=0.01
)

im = CELFIM(
    model=model,
    seed_size=10,
    influenced_type=[1, 2],
    MC=500,
    iterations_times=100
)
seeds = im.fit()
print(f"Total spread estimates: {im.estimator.get_call_count()}")
fit() List[int][source]

Run CELF algorithm to select seed nodes.

Uses a max-heap with lazy evaluation to minimize spread estimates. Implements the original CELF logic from Leskovec et al. (KDD 2007): each node's marginal_gain always stores the true marginal gain f(S {v}) - f(S) relative to the seed set at the time of its last evaluation. A node is re-evaluated only when its cached gain is from a previous round (i.e. stale due to submodularity).

Returns:

List of selected seed node indices.

Return type:

list[int]

get_estimator() SpreadEstimator[source]

Return the spread estimator for accessing statistics.

Returns:

The SpreadEstimator instance.

Return type:

SpreadEstimator

CELFPlusPlus

CELFPlusPlus implements the Greedy CELF++ lazy-forward state machine. Each candidate keeps four cached fields:

  • mg1: marginal gain \(\Delta_u(S)\) with respect to current seed set.

  • mg2: look-ahead marginal gain \(\Delta_u(S \cup \{prev\_best\})\).

  • prev_best: the best examined node used to compute mg2.

  • flag: the iteration in which the cached gains were last refreshed.

In each iteration, last_seed records the most recently selected seed and cur_best tracks the best marginal-gain node among candidates examined in the current iteration. If prev_best == last_seed, mg2 can be reused directly without recomputing spread.

class fs_gplib.InfluenceMaximization.celf.CELFPlusPlus(model, seed_size: int, influenced_type: List[int] | None = None, MC: int = 1000, iterations_times: int = 100, verbose: bool = True)[source]

Bases: BaseInfluenceMaximizer

CELF++ (Cost-Effective Lazy Forward++) for Influence Maximization.

Parameters:
  • model -- A diffusion model from fs_gplib.Epidemics.

  • seed_size -- Number of seeds to select.

  • influenced_type -- Set of node state values that count as influenced.

  • MC -- Monte-Carlo simulations per spread estimate.

  • iterations_times -- Diffusion steps per simulation.

  • verbose -- Whether to show progress.

Example:

from fs_gplib.InfluenceMaximization import CELFPlusPlus

im = CELFPlusPlus(
    model=model,
    seed_size=10,
    influenced_type=[1, 2]
)
seeds = im.fit()
fit() List[int][source]

Run CELF++ algorithm to select seed nodes.

Implements the Greedy CELF++ state machine from the paper: each candidate stores mg1 (as marginal_gain), mg2 (as second_gain), prev_best, and flag (as last_eval_iter). During lazy updates, stale candidates can reuse mg2 when prev_best == last_seed.

Variables follow the pseudocode semantics: S is the current seed set, last_seed is the most recently selected seed, and cur_best tracks the node with the maximum marginal gain among candidates examined in the current iteration.

Returns:

List of selected seed node indices.

Return type:

list[int]

get_estimator() SpreadEstimator[source]

Return the spread estimator for accessing statistics.

Advanced Interface

Both CELFIM and CELFPlusPlus expose get_estimator() for accessing spread-estimation statistics and cache usage.