CELF
CELF (Cost-Effective Lazy Forward) is a greedy acceleration strategy based on submodularity [1]. Instead of recomputing every candidate's marginal gain at every round, CELF maintains a priority queue and performs lazy re-evaluation only when necessary.
Compared with naive greedy selection, CELF-style methods usually reduce the number of Monte-Carlo spread estimations by a large margin, which directly improves IM runtime.
Available Algorithms
CELFIM
- class fs_gplib.InfluenceMaximization.celf.CELFIM(model, seed_size: int, influenced_type: List[int] | None = None, MC: int = 1000, iterations_times: int = 100, verbose: bool = True)[source]
Bases:
BaseInfluenceMaximizerCELF (Cost-Effective Lazy Forward) Influence Maximization.
CELF dramatically reduces the number of spread evaluations compared to the naive greedy algorithm by exploiting two properties:
Submodularity: The marginal gain of adding a node decreases as more nodes are already selected.
Lazy Evaluation: Using a max-heap, CELF only recomputes marginal gains when they could potentially exceed the current best. Many nodes are "lazy" and never get re-evaluated after their initial estimate.
This results in O(k * n) average-case complexity instead of O(k * n * MC) spread estimates, where k is seed size, n is number of nodes, and MC is Monte-Carlo simulations.
- Parameters:
model -- A diffusion model from
fs_gplib.Epidemics.seed_size -- Number of seeds to select.
influenced_type -- Set of node state values that count as influenced.
MC -- Monte-Carlo simulations per spread estimate.
iterations_times -- Diffusion steps per simulation.
verbose -- Whether to show progress and statistics.
Example:
from fs_gplib.InfluenceMaximization import CELFIM from fs_gplib.Epidemics import SIRModel model = SIRModel( data=graph_data, seeds=[], infection_beta=0.05, recovery_lambda=0.01 ) im = CELFIM( model=model, seed_size=10, influenced_type=[1, 2], MC=500, iterations_times=100 ) seeds = im.fit() print(f"Total spread estimates: {im.estimator.get_call_count()}")
- fit() List[int][source]
Run CELF algorithm to select seed nodes.
Uses a max-heap with lazy evaluation to minimize spread estimates. Implements the original CELF logic from Leskovec et al. (KDD 2007): each node's
marginal_gainalways stores the true marginal gainf(S ∪ {v}) - f(S)relative to the seed set at the time of its last evaluation. A node is re-evaluated only when its cached gain is from a previous round (i.e. stale due to submodularity).- Returns:
List of selected seed node indices.
- Return type:
list[int]
CELFPlusPlus
CELFPlusPlus implements the Greedy CELF++ lazy-forward state machine [2].
Each candidate keeps four cached fields:
mg1: marginal gain \(\Delta_u(S)\) with respect to current seed set.mg2: look-ahead marginal gain \(\Delta_u(S \cup \{prev\_best\})\).prev_best: the best examined node used to computemg2.flag: the iteration in which the cached gains were last refreshed.
In each iteration, last_seed records the most recently selected seed and
cur_best tracks the best marginal-gain node among candidates examined in
the current iteration. If prev_best == last_seed, mg2 can be reused
directly without recomputing spread.
- class fs_gplib.InfluenceMaximization.celf.CELFPlusPlus(model, seed_size: int, influenced_type: List[int] | None = None, MC: int = 1000, iterations_times: int = 100, verbose: bool = True)[source]
Bases:
BaseInfluenceMaximizerCELF++ (Cost-Effective Lazy Forward++) for Influence Maximization.
- Parameters:
model -- A diffusion model from
fs_gplib.Epidemics.seed_size -- Number of seeds to select.
influenced_type -- Set of node state values that count as influenced.
MC -- Monte-Carlo simulations per spread estimate.
iterations_times -- Diffusion steps per simulation.
verbose -- Whether to show progress.
Example:
from fs_gplib.InfluenceMaximization import CELFPlusPlus im = CELFPlusPlus( model=model, seed_size=10, influenced_type=[1, 2] ) seeds = im.fit()
- fit() List[int][source]
Run CELF++ algorithm to select seed nodes.
Implements the Greedy CELF++ state machine from the paper: each candidate stores
mg1(asmarginal_gain),mg2(assecond_gain),prev_best, andflag(aslast_eval_iter). During lazy updates, stale candidates can reusemg2whenprev_best == last_seed.Variables follow the pseudocode semantics:
Sis the current seed set,last_seedis the most recently selected seed, andcur_besttracks the node with the maximum marginal gain among candidates examined in the current iteration.- Returns:
List of selected seed node indices.
- Return type:
list[int]
Advanced Interface
Both CELFIM and CELFPlusPlus expose get_estimator() for
accessing spread-estimation statistics and cache usage.