The type of a web page directly affects how well user preferences can be recognized. Automatic web page classification, which is the basis for extracting user preferences, suffers from several problems: rigid category schemes, inefficient discriminant algorithms, and poor generality. To address these problems, this article first reclassifies web pages according to their function and uses a novel double centerline method to reduce page noise. It then selects two features, Standard Deviation of Blocks Area and Link Number per Height, and applies a Naive Bayes classifier to train ...
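
The two features and the Naive Bayes step named above can be illustrated with a minimal sketch. Everything concrete in it is an assumption rather than the article's method: the feature definitions (population standard deviation of block areas, and link count divided by page height), the Gaussian form of Naive Bayes, the class labels `content` and `index`, and the toy training pages are all hypothetical. The double centerline noise reduction is not modeled; the block areas are taken as already cleaned.

```python
import math
from statistics import mean, pstdev


def block_area_std(block_areas):
    """Standard Deviation of Blocks Area (assumed: population std of block areas)."""
    return pstdev(block_areas)


def links_per_height(link_count, page_height):
    """Link Number per Height (assumed: number of links / page height in pixels)."""
    return link_count / page_height


def page_features(block_areas, link_count, page_height):
    """Build the two-element feature vector for one page."""
    return (block_area_std(block_areas), links_per_height(link_count, page_height))


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (an assumed variant)."""

    def fit(self, X, y):
        self.priors = {}
        self.stats = {}
        for label in set(y):
            rows = [x for x, target in zip(X, y) if target == label]
            self.priors[label] = len(rows) / len(y)
            # Per-feature (mean, variance); the floor avoids division by zero.
            self.stats[label] = [
                (mean(col), max(pstdev(col) ** 2, 1e-9)) for col in zip(*rows)
            ]
        return self

    def _log_posterior(self, x, label):
        # Log prior plus the sum of per-feature Gaussian log densities.
        total = math.log(self.priors[label])
        for value, (mu, var) in zip(x, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total

    def predict(self, x):
        return max(self.priors, key=lambda label: self._log_posterior(x, label))


# Hypothetical training pages: (block areas, link count, page height, label).
training_pages = [
    ([120000, 118000, 121000], 12, 3000, "content"),
    ([200000, 195000, 205000], 8, 4000, "content"),
    ([5000, 90000, 300000], 150, 2500, "index"),
    ([2000, 60000, 250000], 200, 2000, "index"),
]

X = [page_features(areas, links, height) for areas, links, height, _ in training_pages]
y = [label for *_, label in training_pages]
model = GaussianNaiveBayes().fit(X, y)

# Classify an unseen page with evenly sized blocks and few links.
print(model.predict(page_features([100000, 101000, 99000], 10, 3500)))
```

The intuition the sketch tries to capture is that a page with evenly sized blocks and few links per unit of height looks different from a link-dense page with very uneven blocks. Whether the article uses these exact definitions or classes cannot be determined from the text above.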