Random Coordinate Descent Methods

This is a reading note on the paper “Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions” by A. ENE, Huy L. NGUYEN

arxiv

Knowledge prerequisite on submodular function (in Chinese):

子模函数以及Lovasz拓展

Preliminaries

Let .

Regarding as a modular function .

Let be a submodular function of the form: where each is a simple submodular function.

Additionally, we assume minimizing is easy, i.e., there exists efficient oracles to find the minimizer of for each and .

We want to minimize .

We pivot to the minimization of the Lovász extension .

Let be the Lovász extension of written as the support function of the base polytope : where .

And let be the Lovász extension of , be the base polytope of .

The paper considers a proximal version of the problem:

Given an optimal solution to the proximal problem, one can obtain an optimal solution to the discrete problem by thresholding at ; more precisely, the optimal solution to the discrete problem is given by .

Lemma 1

The dual of the proximal problem is given by:

The primal and dual variables are related by:

Application on Densest Subgraph Problem

To see the definition, please refer to the Densest Subgraph Problem post.

The quadratic formulation of DDS problem is defined as follows:

The dual of such quadratic program is formulated as follows:

Lagrangian relaxation

Now consider the KKT conditions:

KKT Conditions

Stationarity Conditions

Define:

Then:

Primal Feasibility Conditions

Dual Feasibility Conditions

Complementary Slackness Conditions

Analysis & Simplification

From the stationarity conditions, we notice that for any vertex , the quantity must be equal to for all edges incident to . This gives us:

for all edges incident to vertex .

Similarly, for any vertex :

Let us define:

Then we have:

Substituting stationarity conditions back to Lagrangian:

To maximize the dual objective, we want to maximize which means minimizing subject to:

This gives us .

Case 1: If , set and . Then .

Case 2: If , set and . Then .

Therefore: .

So the dual objective function can be simplified to:

And the constraints are given by the primal feasibility conditions, which is equivalent to:

Because a valid solution of primal can be constructed by using flow theory.

Dual Quadratic Program

Set new , we can rewrite the dual quadratic program as follows:

Submodular Function Representation

Let define the submodular function as follows:

is defined as follows:

The base polytope is defined as follows:

Obviously we have for all . (Here use to denote the vertex corresponding to in the second copy of .)

And .

the Lovász extension of is given by:

which just equals to .

Thus, with normalized from to ,

the dual quadratic program can be rewritten as:

Another Derivation

Just ignore the quadratic term, we can rewrite the dual quadratic program as a ratio linear program (RLP):

Let’s introduce the definition of c-biased density:

And the c-biased DDS is defined as the subgraph G(S,T) such that is maximized.

We have the following theorem: #### Theorem 1

The optimal value of , and the optimal solution can be recovered by thresholding.

Proof

First, we show that .

For any vertex sets , consider the feasible solution to RLP:

for , for

for , for

The constraint gives us:

The objective value is:

To maximize this, we want to maximize subject to the constraint.

If , then:

Objective =

This can be rewritten as:

Therefore:

Now we show that .

The LP objective can be rewritten as:

For each threshold , define:

  • (vertices in first part with value ≥ t)
  • (vertices in second part with value ≥ t)

Then the number of edges between and is given by:

Consider the constraint:

So

By averaging principle:

We get:

Thus:

This completes the proof of Theorem 1.

And can be considered as a proximal version of .

Complexity Analysis

Let ,

Key Sets and Definitions

Feasible Set:

Null Space:

Optimal Solution Set:

Alternative Characterization of :

where and .

Geometric Interpretation:

  • represents the set of points in the feasible region that are closest to satisfying the constraint
  • Points in are optimal solutions to our proximal optimization problem
  • The set connects to optimal solutions of the original discrete problem via thresholding

Analogue of theorem 2:

Proof:

The result is identical to the one in the original paper.

Proof of gemini

Analogue of lemma 7:

Let be a random subset where each is included independently with probability . For vectors , let be defined by if and otherwise. Then:

Proof:

We have:

The key term to bound is:

Using the matrix norm bound:

From standard coordinate descent analysis:

Since :

Therefore:

Analogue of theorem 8:

Consider iteration of the APPROX algorithm. Let , Let is the optimal solution that is closest to . Then we have: Proof:

This follows directly from applying Fercoq-Richtárik’s Theorem 3 with:

  • (sampling parameter)
  • for each

Geometric Bound from Theorem 2 Analogue

We need the relationship between function values and distances.

Key Geometric Relationship: For our specific objective function , we have:

This uses the fundamental theorem of calculus. For :

Since :

Therefore:

Since is optimal, (first-order optimality condition).

Thus:

Applying Theorem 2 Analog

From our Theorem 2 analog:

Therefore:

Geometric Bound:

Single Epoch Analysis

Consider epoch . Let be the solution constructed by the APPROX algorithm after running for iterations starting with (so ).

Let be the optimal solution closest to .

By Theorem 8 Analog with :

From Step 3:

Therefore:

For large , the dominant term is , so:

Epoch Length Calculation

To achieve a factor of improvement per epoch, we need:

This gives us:

For large , we can approximate:

To ensure the bound holds robustly, we choose:

This gives us:

Conclusion

From now on, here an iteration is defined as one pass through the entire set of edges for the convenience of comparison.

After epochs of the ACDM algorithm (equivalently, iterations), we have:

where is the optimal solution in that is closest to .

Theorem 5.6

Suppose , then

Now we can use the above result to give a guarantee for the -approximation c-DDS subproblem.

To solve the -approximation c-DDS subproblem, we need

which requires: Thus, in expectation, we need to run the algorithm for epochs, where:

We upper bound by the maximum value of the objective function .

Denote , where and denote the maximum outdegree and indegree of directed graph , respectively

We can show that is upper bounded by

Thus, we have:

So we need to run the algorithm in iterations of

In terms of asymptotic complexity, this gives us: to solve a c-DDS subproblem.

The set of C is defined as

For the DDS problem, define

Then the total complexity of solving all c-DDS subproblems is:


Random Coordinate Descent Methods
https://notdesigned.github.io/2025/06/14/Random-Coordinate-Descent-Methods/
Author
Luocheng Liang
Posted on
June 14, 2025
Licensed under