For computerized tasks of simple RT and choice RT, is there a recommended number of trials to include, or a formula to calculate an appropriate number of trials needed to account for variability?
The number of trials required depends on several factors related to the task itself and the characteristics of the participants. What you are looking for is a measure of stable performance.
For a new participant completing a typical RT task, you will tend to find that it takes several to many responses to get used to the task. There will then be a period of stable performance, and eventually fatigue or loss of attention to the task. You can see this by plotting RT as a time series: RT is relatively high to begin with and slowly decreases to a point where it levels off. It then varies around a fairly consistent "average" for a while, before changing again (e.g. increasing) as fatigue or loss of attention sets in.
You want the middle "stationary" section as your measure of performance. One way to find it is to calculate the mean and SD for consecutive blocks of, e.g., 5-10 trials. The block at which the mean and SD become relatively stable is the starting point, and the block at which they start to change again is the end point. You then calculate your summary measure (e.g. the mean RT) over the trials in between.
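As a rough illustration of the blocked mean and SD check, here is a minimal Python sketch. The function name block_stats, the block size of 5, and the RT values are all made up for the example, and deciding what counts as "relatively stable" is still a judgment call you make by looking at the output.

```python
from statistics import mean, stdev

def block_stats(rts, block_size=5):
    """Return (block_index, mean, sd) for consecutive non-overlapping blocks.

    Any trailing trials that do not fill a whole block are ignored.
    """
    stats = []
    for start in range(0, len(rts) - block_size + 1, block_size):
        block = rts[start:start + block_size]
        stats.append((start // block_size, mean(block), stdev(block)))
    return stats

# Hypothetical RTs in ms for one participant (not real data):
# slow at first, then settling around 350 ms.
rts = [520, 480, 450, 430, 410,
       352, 348, 355, 350, 345,
       351, 349, 353, 347, 350]

for idx, m, s in block_stats(rts):
    print(f"block {idx}: mean = {m:.1f} ms, sd = {s:.1f} ms")
```

With data like these, the first block has a noticeably higher mean and SD than the later blocks, which would suggest starting the stable section at the second block.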
Alternatively, you can just look at the time series plot and pick a point where it looks like it is stable.
In practice, you could pilot the task under realistic conditions and see where the stable section tends to be, then apply this to all your participants.
So, for example, a student of mine is using a simple RT task with 40 trials (NB: you should also include a practice block of trials). Looking at the mean performance for the first 5 participants (i.e. we averaged each trial across the 5 participants to get one time series), it appeared that performance was stable after 10 trials and remained so for the rest of the trials, which leaves 30 usable trials per participant. Simple RT is not one of our main variables (it is more exploratory), so I think 30 samples is sufficient, but if it were a main variable I would probably use more, and have more than one block of trials per participant.
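The pilot check in this example (average each trial across participants, then drop the early trials) could be sketched along these lines. The function names and the numbers are hypothetical and only there to show the mechanics; in the real case you would have 40 trials per participant and a cutoff of 10.

```python
from statistics import mean

def trialwise_mean(rt_by_participant):
    """Average RT at each trial position across participants.

    rt_by_participant: one list of RTs per participant, all the same length.
    """
    return [mean(trial) for trial in zip(*rt_by_participant)]

def stable_mean(rts, first_stable_trial, last_stable_trial=None):
    """Mean RT over the stable section (0-based start, end exclusive)."""
    return mean(rts[first_stable_trial:last_stable_trial])

# Hypothetical pilot data in ms: 3 participants x 4 trials (not real data).
pilot = [
    [400, 300, 300, 300],
    [500, 320, 310, 290],
    [600, 340, 320, 310],
]

series = trialwise_mean(pilot)   # one time series to plot and inspect
print(series)

# Suppose the plot suggests performance is stable from the second trial on;
# the same cutoff is then applied to each participant's own data.
for rts in pilot:
    print(stable_mean(rts, first_stable_trial=1))
```

The cutoff itself still comes from looking at the averaged series (or the blocked means and SDs), so the code only applies a decision you have already made.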
The other thing to keep in mind is that there will be individual and group differences you may want to consider. For example, the performance of a group of people who play a lot of computer games might stabilise a lot more quickly than that of a control group. If this is the case, you would want to make sure that you only include the stable section, e.g. by using the longer cutoff for both groups (but this might have to be balanced with other considerations in order to equate what you are measuring).
Usually, you would want to give participants enough practice so that their performance is relatively stable before the experimental trials.
Finally, it is always a good idea to have a quick look at research similar to yours (or in journals where you want to publish the research) that includes RT and see how the authors went about collecting their RT data. This can at least give you a starting point.