You must log in or # to comment.
• Researchers have discovered that gradient descent has a theory that is not fully understood. • Gradient descent uses the cost function to find the lowest value on the graph. • Gradient descent algorithms pick a point and calculate the slope of the curve around it. • Conventional wisdom is to take small steps, but automated testing techniques have shown that large steps can be optimal. • Research has shown that the optimal stride length depends on the number of steps in the sequence. • The cyclic approach to gradient descent may be more efficient, but will not change its current use. • The results of the study raise a theoretical puzzle about the structure that governs the best decisions.