Random Coordinate Descent Methods
This is a reading note on the paper “Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions” by A. ENE, Huy L. NGUYEN
Knowledge prerequisite on submodular function (in Chinese):
Preliminaries
Let
Regarding
Let
Additionally, we assume minimizing
We want to minimize
We pivot to the minimization of the Lovász extension
Let
And let
The paper considers a proximal version of the problem:
Given an optimal solution
Lemma 1
The dual of the proximal problem is given by:
The primal and dual variables are related by:
Application on Densest Subgraph Problem
To see the definition, please refer to the Densest Subgraph Problem post.
The quadratic formulation of DDS problem is defined as follows:
The dual of such quadratic program is formulated as follows:
Lagrangian relaxation
Now consider the KKT conditions:
KKT Conditions
Stationarity Conditions
Define:
Then:
Primal Feasibility Conditions
Dual Feasibility Conditions
Complementary Slackness Conditions
Analysis & Simplification
From the stationarity conditions, we notice that for any vertex
for all edges
Similarly, for any vertex
Let us define:
Then we have:
Substituting stationarity conditions back to Lagrangian:
To maximize the dual objective, we want to maximize
This gives us
Case 1: If
Case 2: If
Therefore:
So the dual objective function can be simplified to:
And the constraints are given by the primal feasibility conditions, which is equivalent to:
Because a valid solution of primal can be constructed by using flow theory.
Dual Quadratic Program
Set new
Submodular Function Representation
Let define the submodular function
The base polytope
Obviously we have
And
the Lovász extension of
which just equals to
Thus, with
the dual quadratic program can be rewritten as:
Another Derivation
Just ignore the quadratic term, we can rewrite the dual quadratic
program as a ratio linear program (RLP):
Let’s introduce the definition of c-biased density:
And the c-biased DDS is defined as the subgraph G(S,T) such that
We have the following theorem: #### Theorem 1
The optimal value of
Proof
First, we show that
For any vertex sets
The constraint gives us:
The objective value is:
To maximize this, we want to maximize
If
Objective =
This can be rewritten as:
Therefore:
Now we show that
The LP objective can be rewritten as:
For each threshold
(vertices in first part with value ≥ t) (vertices in second part with value ≥ t)
Then the number of edges between
Consider the constraint:
So
By averaging principle:
We get:
Thus:
This completes the proof of Theorem 1.
And
Complexity Analysis
Let
Key Sets and Definitions
Feasible Set:
Null Space:
Optimal Solution Set:
Alternative Characterization of
where
Geometric Interpretation:
represents the set of points in the feasible region that are closest to satisfying the constraint- Points in
are optimal solutions to our proximal optimization problem - The set
connects to optimal solutions of the original discrete problem via thresholding
Analogue of theorem 2:
Proof:
The result is identical to the one in the original paper.
Analogue of lemma 7:
Let
Proof:
We have:
The key term to bound is:
Using the matrix norm bound:
From standard coordinate descent analysis:
Since
Therefore:
Analogue of theorem 8:
Consider iteration
This follows directly from applying Fercoq-Richtárik’s Theorem 3 with:
(sampling parameter) for each
Geometric Bound from Theorem 2 Analogue
We need the relationship between function values and distances.
Key Geometric Relationship: For our specific
objective function
This uses the fundamental theorem of calculus. For
Since
Therefore:
Since
Thus:
Applying Theorem 2 Analog
From our Theorem 2 analog:
Therefore:
Geometric Bound:
Single Epoch Analysis
Consider epoch
Let
By Theorem 8 Analog with
From Step 3:
Therefore:
For large
Epoch Length Calculation
To achieve a factor of
This gives us:
For large
To ensure the bound holds robustly, we choose:
This gives us:
Conclusion
From now on, here an iteration is defined as one pass through the
entire set of edges
After
where
Theorem 5.6
Suppose
Now we can use the above result to give a guarantee for the
To solve the
which requires:
We upper bound
Denote
We can show that
Thus, we have:
So we need to run the algorithm in iterations of
In terms of asymptotic complexity, this gives us:
The set of C is defined as
For the DDS problem, define
Then the total complexity of solving all c-DDS subproblems is: