Dear all, I am wondering whether I can argue that my instruments of choice are valid and defensible (i.e. uncorrelated with the unobserved effects).

I want to estimate the effect that a particular shop, or individual in-shop policies, have on customers' multichannel shopping behaviour.

The situation

I have a cross-sectional dataset with 26 shops spread around the country and 25,000 shop customers. The customers are mostly firms with 1 to 10 employees in the MRO business. The shops sell only to professionals (B2B). Customers are grouped by division, which refers to the specific industry they work in (this affects the products they buy).

The shops carry a limited inventory: they stock fewer than 10% of the products available in the catalogue.

Customers can also buy through two other channels: the e-commerce webstore, and a dedicated sales representative who calls or visits them, depending on their size/importance.

Estimation

What I want to estimate:

(number of sales channels used) = B1 + B2 * (customer buys through sales rep) + B3 * (customer buys in webstore) + B4 * (customer buys in shop) + B5 * (shop the customer shops at) + B6 * (shop policy 1) + ... + Bn * (shop policy n) + unobserved effect + error
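Written in notation (the labels are my own shorthand, not from the original post), with i indexing customers, a_i the unobserved effect and u_i the error, the model and the two standard conditions an instrument z_i has to meet are roughly as follows. Both conditions are meant after partialling out the controls; the exogeneity condition is exactly the "not related to the unobserved effects" requirement asked about above.

```latex
\begin{aligned}
\text{Channels}_i &= \beta_1 + \beta_2\,\text{Rep}_i + \beta_3\,\text{Web}_i + \beta_4\,\text{Shop}_i + \beta_5\,\text{ShopID}_i \\
&\quad + \beta_6\,\text{Policy}_{1,i} + \dots + \beta_n\,\text{Policy}_{n,i} + a_i + u_i \\[4pt]
\text{relevance:}\quad & \operatorname{Cov}(z_i, \text{Policy}_{k,i}) \neq 0 \\
\text{exogeneity:}\quad & \operatorname{Cov}(z_i, a_i + u_i) = 0
\end{aligned}
```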

The unobserved effect could reflect a number of factors. For example, the person doing the buying may be young, and thus more tech-savvy and more likely to procure online through the webstore. Or the sales rep and the customer may be family, so the customer buys only through him.

I am interested in the size of B5 through Bn.

As controls for the company's sales influence, I have:

-assigned sales rep


As controls for customer characteristics, I have:

-order size ($)

-number of orders

-sector/industry

-sales potential ($)

-sales exploitation

-age of the sales account (months)

I suspect these do not capture all relevant factors.

Instruments

#1

I know the locations of the shops and of the customers (x, y coordinates). Customers closer to a shop are more likely to shop there and thus to be exposed to its policies ('receive treatment').

Distance is, I hope, unrelated to the unobserved factors.

Problem: neither the shops nor the customers are placed at random on the map.
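A minimal sketch of how the distance instrument could be computed with numpy. It assumes the x, y coordinates are in a projected (planar) system, so Euclidean distance is meaningful; with latitude/longitude a great-circle (haversine) distance would be needed instead. The function names are hypothetical.

```python
import numpy as np


def distance_to_shops(customer_xy: np.ndarray, shop_xy: np.ndarray) -> np.ndarray:
    """Euclidean distance from every customer to every shop.

    customer_xy: (n_customers, 2) array of x, y coordinates
    shop_xy:     (n_shops, 2) array of x, y coordinates
    returns:     (n_customers, n_shops) distance matrix
    """
    # Broadcasting: (n_customers, 1, 2) - (1, n_shops, 2) -> (n_customers, n_shops, 2)
    diff = customer_xy[:, None, :] - shop_xy[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def nearest_shop_instrument(customer_xy: np.ndarray, shop_xy: np.ndarray):
    """Index of each customer's nearest shop and the distance to it."""
    d = distance_to_shops(customer_xy, shop_xy)
    return d.argmin(axis=1), d.min(axis=1)


customers = np.array([[0.0, 0.0], [10.0, 0.0]])
shops = np.array([[0.0, 3.0], [10.0, 4.0]])
nearest, dist = nearest_shop_instrument(customers, shops)
print(nearest, dist)  # [0 1] [3. 4.]
```

With 25,000 customers and 26 shops the full distance matrix has only 650,000 entries, so computing it in one broadcast step is cheap.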

#2

I also know that shops cater to the specific audience that shops there: if many customers in a region are in industry X, the local shop will stock more products from the X line.

Therefore, if customer diversity in a shop's area is low, the majority group is more likely to visit the shop because it caters to them more, and is thus more likely to be exposed to shop policies ('receive treatment').

Problem: customers are not randomly distributed on the map; they may be clustered for specific reasons.
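One way the diversity instrument could be made concrete, sketched under assumptions not in the post: each customer has already been assigned to one shop area (for example the nearest shop), and (lack of) diversity is measured by the Herfindahl-Hirschman index (HHI) of industry shares in that area. The per-customer instrument is then the area HHI for customers in the area's majority industry and 0 otherwise, i.e. "low diversity x majority membership". All names are hypothetical.

```python
from collections import Counter


def area_concentration(shop_of_customer, industry_of_customer):
    """Industry concentration of each shop area.

    Returns {shop: (hhi, majority_industry)}, where hhi is the
    Herfindahl-Hirschman index of industry shares (1.0 = a single industry).
    """
    counts_by_shop = {}
    for shop, industry in zip(shop_of_customer, industry_of_customer):
        counts_by_shop.setdefault(shop, Counter())[industry] += 1

    result = {}
    for shop, counts in counts_by_shop.items():
        total = sum(counts.values())
        hhi = sum((c / total) ** 2 for c in counts.values())
        # most_common orders equal counts by first occurrence, so ties go to the first seen
        majority_industry = counts.most_common(1)[0][0]
        result[shop] = (hhi, majority_industry)
    return result


def diversity_instrument(shop_of_customer, industry_of_customer):
    """Per-customer instrument: area HHI if the customer belongs to the
    area's majority industry, else 0.0."""
    stats = area_concentration(shop_of_customer, industry_of_customer)
    z = []
    for shop, industry in zip(shop_of_customer, industry_of_customer):
        hhi, majority = stats[shop]
        z.append(hhi if industry == majority else 0.0)
    return z


shops = ["A", "A", "A", "A", "B", "B", "B"]
industries = ["X", "X", "X", "Y", "Y", "Y", "Z"]
print(diversity_instrument(shops, industries))
```

In this toy input, area A has industry shares 0.75/0.25 (HHI 0.625) and area B has 2/3 and 1/3 (HHI about 0.556), so only the X customers in A and the Y customers in B get a non-zero value.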

Do these problems make my instruments invalid? Can I get away with a basic IV approach?
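A "basic IV approach" here would typically be two-stage least squares (2SLS). Below is a minimal numpy sketch, assuming linear effects and instruments (such as distance or a diversity measure) that enter the first stage linearly; the function name is hypothetical. It returns point estimates only, because the naive second-stage standard errors are not valid for 2SLS, so for inference a dedicated implementation (e.g. `IV2SLS` in the `linearmodels` package) is preferable. It also reports the usual first-stage F statistic of the excluded instruments, which speaks to relevance but not to exogeneity.

```python
import numpy as np


def _ols(y, X):
    """OLS coefficients via least squares (y may be 1-D or 2-D)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, x_endog, x_exog, z):
    """2SLS point estimates plus a first-stage F statistic per endogenous regressor.

    y:       (n,) outcome, e.g. number of channels used
    x_endog: (n, k1) endogenous regressors, e.g. exposure to a shop policy
    x_exog:  (n, k2) exogenous controls, INCLUDING a column of ones
    z:       (n, m) excluded instruments, m >= k1
    Returns (coefficients on [x_endog, x_exog], list of first-stage F stats).
    """
    n, m = z.shape
    k2 = x_exog.shape[1]
    if m < x_endog.shape[1]:
        raise ValueError("need at least as many instruments as endogenous regressors")

    w = np.column_stack([x_exog, z])  # all exogenous variables
    # First stage: project each endogenous regressor on controls + instruments
    x_endog_hat = w @ _ols(x_endog, w)

    # First-stage F: do the excluded instruments add explanatory power?
    f_stats = []
    for j in range(x_endog.shape[1]):
        xj = x_endog[:, j]
        rss_u = np.sum((xj - w @ _ols(xj, w)) ** 2)
        rss_r = np.sum((xj - x_exog @ _ols(xj, x_exog)) ** 2)
        f_stats.append(((rss_r - rss_u) / m) / (rss_u / (n - k2 - m)))

    # Second stage: regress y on fitted endogenous regressors and controls
    beta = _ols(y, np.column_stack([x_endog_hat, x_exog]))
    return beta, f_stats
```

On simulated data where an unobserved effect drives both treatment and outcome (true effect 2.0), OLS is biased upward while this sketch recovers roughly 2.0 when the simulated instrument is valid by construction. With real data, no statistic can confirm exogeneity; with more instruments than endogenous regressors an overidentification test is possible, but it relies on at least one instrument being valid.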
