Shivani, in general I do not see any problem with using AIC for both linear and nonlinear models. The index is based on the maximized value of the log-likelihood function for each model, a quantity you can obtain for both kinds of model. Are you talking about structural equation modeling?
The AIC is one of many measures that allow you to compare the fit of different probability distributions, and it can be used for both linear and nonlinear models.
Basically, yes. You can use AIC to compare quite different models to one another, as long as the models are fit to the same data (e.g., don't add or drop data points, and don't transform endogenous variables), and as long as you can calculate an AIC for each model in the first place. To calculate a model's AIC, you need two things: the value of the loglikelihood at the MLE for your dataset, and the number of free parameters. If the model in question is not fit by maximum likelihood (or by minimizing a loss function interpretable as a minus loglikelihood), as is the case for some nonparametric methods, then there's no loglikelihood to use for calculating AIC. The number of free parameters is unambiguous in many cases, but there are exceptions. For instance, in ridge regression, the effective number of free parameters is generally not an integer; for some data-mining/machine-learning models, the "number of free parameters" may not have a clear sensible definition.
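To make the two ingredients concrete, here is a minimal sketch in Python of comparing a linear and a nonlinear model on the same data. Everything in it is an assumption for illustration, not something from the question: the simulated data, the helper name gaussian_aic, the i.i.d. Gaussian error model (under which least squares is maximum likelihood), and the choice to count the error variance as one free parameter. The nonlinear fit uses a crude grid search over the rate parameter, so it only approximates the MLE; a proper optimizer would be used in practice.

```python
import numpy as np

def gaussian_aic(y, y_hat, n_mean_params):
    """AIC = 2k - 2*loglik for a least-squares fit, assuming i.i.d. Gaussian errors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    sigma2 = rss / n  # MLE of the error variance
    # Gaussian loglikelihood evaluated at the MLE
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = n_mean_params + 1  # +1 because the variance is also estimated
    return 2.0 * k - 2.0 * loglik

# Simulated data (hypothetical): exponential growth plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 60)
y = 2.0 * np.exp(0.7 * x) + rng.normal(0.0, 1.0, x.size)

# Linear model: y = a + b*x (2 mean parameters)
b, a = np.polyfit(x, y, 1)
aic_linear = gaussian_aic(y, a + b * x, 2)

# Nonlinear model: y = c*exp(r*x) (2 mean parameters).
# For each r on a grid, the best c has a closed form; keep the best pair.
best_rss, best_fit = np.inf, None
for r in np.linspace(0.1, 1.5, 1401):
    basis = np.exp(r * x)
    c = (basis @ y) / (basis @ basis)
    fit = c * basis
    rss = float(np.sum((y - fit) ** 2))
    if rss < best_rss:
        best_rss, best_fit = rss, fit
aic_nonlinear = gaussian_aic(y, best_fit, 2)

# Same y, untransformed, in both fits, so the two AIC values are comparable
print("AIC linear:   ", aic_linear)
print("AIC nonlinear:", aic_nonlinear)
```

Both models are fit to the same untransformed y, so the lower AIC indicates the preferred model. If one model were fit to log(y) instead, the two loglikelihoods would no longer be on the same scale and the comparison would not be valid as it stands.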
BTW, if your candidate set of models contains both continuous and discrete models--that is, some models have a likelihood based on a probability density function whereas others have a likelihood based on a probability mass function--there is a bit of extra work involved to ensure such comparisons are valid. See D. I. Warton, 2005, Environmetrics 16: 275-289.