Visualizing high-dimensional loss landscapes with Hessian directions

Abstract

Analyzing geometric properties of high-dimensional loss functions, such as local curvature and the existence of other optima around a certain point in loss space, can help provide a better understanding of the interplay between neural network structure, implementation attributes, and learning performance. In this work, we combine concepts from high-dimensional probability and differential geometry to study how curvature properties in lower-dimensional loss representations depend on those in the original loss space. We show that saddle points in the original space are rarely correctly identified as such in lower-dimensional representations if random projections are used. In such projections, the expected curvature in a lower-dimensional representation is proportional to the mean curvature in the original loss space. Hence, the mean curvature in the original loss space determines if saddle points appear, on average, as either minima, maxima, or almost flat regions. We use the connection between expected curvature and mean curvature (i.e., the normalized Hessian trace) to estimate the trace of Hessians without calculating the Hessian or Hessian-vector products as in Hutchinson’s method. Because random projections are not able to correctly identify saddle information, we propose to study projections along Hessian directions that are associated with the largest and smallest principal curvatures. We connect our findings to the ongoing debate on loss landscape flatness and generalizability. Finally, we illustrate our method in numerical experiments on different image classifiers with up to about 7 × 10^6 parameters.

Links

PhilArchive



    Upload a copy of this work     Papers currently archived: 91,435

External links

  • This entry has no external links. Add one.
Setup an account with your affiliations in order to access resources via your University's proxy server

Through your library

  • Only published works are available at libraries.

Similar books and articles

交叉的突然変異による適応的近傍探索 だましのある多峰性関数の最適化.木村 周平 高橋 治 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16:175-184.
The Use and Misuse of Counterfactuals in Ethical Machine Learning.Atoosa Kasirzadeh & Andrew Smart - 2021 - In ACM Conference on Fairness, Accountability, and Transparency (FAccT 21).

Analytics

Added to PP
2022-12-20

Downloads
0

6 months
0

Historical graph of downloads

Sorry, there are not enough data points to plot this chart.
How can I increase my downloads?

Author's Profile

Gregory Wheeler
Frankfurt School Of Finance And Management

Citations of this work

No citations found.

Add more citations

References found in this work

No references found.

Add more references