Is it a Must-Do step to standardize/normalize/rescale variables in Automated Valuation Models (AVMs) of real estate?

09 May 2020

While submitting my recent work, I received two particularly valuable comments from the reviewers, and I would like to continue the discussion here. I will first expand on one of the issues and hope to discuss it with you.

Whether to standardize data is a long-discussed question in model building in many fields. In the following discussion, I will focus on the field of property mass appraisal.

For the scale of variables:

Price: 1000-100000+ RMB yuan/m²

Age: 0-100+ years

Bedroom: 1-6?

Decoration condition: 0-1

Ratio of Elevator: 0-1

Floor Area Ratio: 0-15+?

Green Ratio: 0-0.6+?

Distance to POIs: 0-2 km? or 0-10 km? or 0-2000 m?

.........

(Example source: Article Mass Appraisal Modeling of Real Estate in Urban Centers by G...)

As can be seen, because the variables are measured in different units, their numerical ranges differ greatly. (The distribution, or density estimate, of each variable is another important issue.)
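Since "standardize", "normalize", and "rescale" are used inconsistently across fields, it may help to fix what each could mean here. The definitions below are one common convention, assumed for this discussion rather than taken from the reviewers' comments; $x$ is a raw variable, $\mu$ and $\sigma$ its sample mean and standard deviation, and $c$ a unit-conversion constant.

```latex
% Standardization (z-score): mean 0, standard deviation 1
z = \frac{x - \mu}{\sigma}

% Normalization (min-max): mapped onto [0, 1]
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

% Rescaling (change of unit), e.g. metres to kilometres with c = 1/1000
x'' = c \, x
```

For example, a distance of 500 m in a sample ranging from 0 to 2000 m has a min-max value of $500 / 2000 = 0.25$, and a rescaled value of 0.5 km.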

Here are some points I considered.

1. For a linear regression model, standardizing the variables or not does not affect the fit of the model, such as the value of R²; only the scale of the coefficients changes.

2. To compare the coefficients of the same variable across different models (e.g. Hedonic Model 1 vs Hedonic Model 2, or Hedonic Model 1 vs Tree-based Model 1), the same standardization method must be used in each model.

3. For models such as neural networks, PCA, and support vector machines, standardization is a good and must-do choice for the data sets. For models such as linear regression, logistic regression, and decision trees (fitted without regularization), standardization will not affect the predictions.

4. On the other hand, standardization removes the original units, so it is no longer clear what is being compared between different variables.
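Point 1 can be checked numerically. The following is a minimal sketch using only the Python standard library, on synthetic data: the variables `age`, `dist_m`, and `price`, their ranges, and the coefficients used to generate them are all made up for illustration and do not come from any real appraisal data set. It fits ordinary least squares on raw and on z-scored predictors and compares R² and the coefficients.

```python
import random
import statistics


def zscore(column):
    """Standardize one column to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Fit y = b0 + sum(b_j * x_j) by least squares; return (coefficients, R^2)."""
    rows = [[1.0, *vals] for vals in zip(*columns)]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = statistics.fmean(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - y_mean) ** 2 for t in y)
    return beta, 1.0 - ss_res / ss_tot


# Synthetic sample (hypothetical values): age in years, distance in metres.
rng = random.Random(0)
n = 200
age = [rng.uniform(0, 100) for _ in range(n)]
dist_m = [rng.uniform(0, 2000) for _ in range(n)]
price = [30000 - 150 * a - 5 * d + rng.gauss(0, 2000) for a, d in zip(age, dist_m)]

beta_raw, r2_raw = ols([age, dist_m], price)
beta_std, r2_std = ols([zscore(age), zscore(dist_m)], price)

print("R^2 raw vs standardized:", r2_raw, r2_std)
print("coefficients raw:", beta_raw)
print("coefficients standardized:", beta_std)
```

The two R² values agree up to floating-point error, while each standardized slope equals the raw slope multiplied by that variable's standard deviation. This is also the trade-off in point 4: the raw slope reads as "price change per year of age", whereas the standardized slope reads as "price change per one standard deviation of age". The sketch does not cover the scale-sensitive models in point 3.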

Besides, if you think standardization is needed, which software and which functions or modules would you use or recommend?

Thank you in advance!
