Influence Maximization
Overview
Influence Maximization (IM) asks a classical question in network science: given a budget \(k\), which \(k\) seed nodes should be activated so that the expected propagation range is maximized under a diffusion model.
This problem is central to viral marketing, public opinion guidance, information campaign planning, and epidemic intervention. For common diffusion settings (for example Independent Cascade and threshold families), the spread objective is monotone and submodular under classical assumptions, enabling a greedy strategy with \((1 - 1/e)\) approximation guarantee in that setting. In practical implementations, spread is typically estimated by Monte-Carlo simulation.
In practice, the main runtime cost of greedy-style IM comes from repeatedly
estimating spread through propagation simulation. FS_GPlib implements these
simulation computations efficiently, so greedy algorithms that rely on repeated
propagation estimation can be significantly accelerated:
GreedyIM: standard greedy baseline with explicit marginal-gain evaluation.GreedyIMWithCaching: avoids repeated spread computation on duplicate seed sets.CELFIM: lazy-forward strategy that dramatically reduces re-evaluation.CELFPlusPlus: CELF++ lazy-forward variant with cached look-ahead gains.
Compared with naive greedy evaluation, CELF-style methods typically achieve substantial speedups while preserving nearly identical seed quality in most benchmark settings.
Base Class & Common API
All IM algorithms inherit from
BaseInfluenceMaximizer,
which provides a shared workflow for:
model compatibility checks and influenced-state validation;
estimating expected spread by Monte-Carlo simulation;
consistent
fit()/get_seeds()workflow.
- class fs_gplib.InfluenceMaximization.base.BaseInfluenceMaximizer(model: DiffusionModel, seed_size: int, influenced_type: List[int] = [1], MC: int = 1000, iterations_times: int = 100, verbose: bool = True)[source]
Base class for influence maximization algorithms.
This class provides the shared infrastructure for computing influence spread and managing the seed selection process. Concrete algorithms (Greedy, CELF, etc.) inherit from this base class.
Core Concept: The spread range of a seed set \(S\) is defined as the expected number of nodes that reach states specified by
influenced_typewhen diffusion starts from \(S\).- Parameters:
model (DiffusionModel) -- A diffusion model instance from
fs_gplib.Epidemics(e.g.SIRModel,SIModel). The model's seeds will be set to empty during IM computation.seed_size (int) -- Number of seed nodes to select.
influenced_type (list[int] or set[int]) -- Set of node state values that count as "influenced". For SIR:
[1, 2]means infected or recovered nodes. For SI:[1]means infected nodes.MC (int) -- Number of Monte-Carlo simulations for spread estimation. Higher values give more accurate estimates but increase computation time.
iterations_times (int) -- Number of diffusion steps per simulation.
verbose (bool) -- Whether to show progress during computation.
Note
The model should be initialized with
seeds=None. The IM algorithm will set seeds during computation.Warning
This implementation is not thread-safe when sharing the same diffusion model instance across multiple threads/tasks. Influence estimation mutates model-internal seed/state via
model._set_seed(...)before running simulations.Example:
from fs_gplib.InfluenceMaximization import GreedyIM from fs_gplib.Epidemics import SIRModel # Create model with empty seeds model = SIRModel( data=graph_data, seeds=[], infection_beta=0.05, recovery_lambda=0.01 ) # Create IM algorithm im = GreedyIM( model=model, seed_size=10, influenced_type=[1, 2], # Infected + Recovered MC=500, iterations_times=100 ) seeds = im.fit()
Common Workflow
All IM algorithms expose the same user-facing workflow:
Method |
Description |
|---|---|
|
Run the seed-selection algorithm and return the selected seed nodes. |
|
Return the selected seeds after |
- BaseInfluenceMaximizer.fit() List[int][source]
Select seed nodes using the configured algorithm.
This is the main entry point. Subclasses implement the actual seed selection logic in this method.
- Returns:
List of selected seed node indices.
- Return type:
list[int]
- BaseInfluenceMaximizer.get_seeds() List[int][source]
Return the selected seed nodes.
- Returns:
List of seed node indices.
- Return type:
list[int]
Note
IM algorithms mutate the wrapped diffusion model by updating its seed set during spread estimation. If you want to compare multiple IM algorithms, create a fresh diffusion model instance for each run.
Note
influenced_type valid states depend on the diffusion model.
Typical choices include [1, 2] for SIRModel and [1] for SIModel.
Quick Example
The example below shows a practical IM workflow.
import torch
from torch_geometric.datasets import Planetoid
from fs_gplib.Epidemics import SIRModel
from fs_gplib.InfluenceMaximization import GreedyIM, CELFIM
# 1) Load Cora graph
dataset = Planetoid(root="./dataset", name="Cora")
data = dataset[0]
# 2) Build diffusion model for Greedy (seeds must be None)
model = SIRModel(
data=data,
seeds=None,
infection_beta=0.05,
recovery_lambda=0.01,
use_weight=False
)
# 3) Greedy baseline
greedy = GreedyIM(
model=model,
seed_size=20,
influenced_type=[1, 2], # Infected + Recovered
MC=400,
iterations_times=100,
verbose=True,
)
greedy_seeds = greedy.fit()
model._set_seed(None)
# 4) Faster CELF alternative
celf = CELFIM(
model=model,
seed_size=20,
influenced_type=[1, 2],
MC=400,
iterations_times=100,
verbose=True,
)
celf_seeds = celf.fit()
print("Greedy seeds:", greedy_seeds[:5], "...")
print("CELF seeds:", celf_seeds[:5], "...")